XPath injection
Construct XPath queries to guard against malicious input
What is XPath injection?
XPath is a query language for XML documents. It was designed to simplify selecting nodes within the document structure.
XPath injection is a type of attack that can change the intent of an XPath query executed on an application’s backend. An application may be vulnerable when user-supplied input is concatenated, unfiltered, with other strings to construct an XPath query that is then executed against an XML document. An attacker can inject special characters into that input to alter the query.
The impact of this attack can include bypassing authentication logic or disclosing sensitive data within the XML document being queried.
About this lesson
In this lesson, you will learn how XPath injection works and how to protect your applications against it. We will begin by exploiting an XPath injection vulnerability in a simple application. Then we will analyze the vulnerable code and explore some options for remediation and prevention.
The Red Hills County Softball League has a web application built for its members and fans to view information about upcoming games, match results, and teams. But these small web applications aren't always built with security in mind.
How does XPath injection work?
Firstly, let's recap what took place in the interactive example above:
- Certain special characters, like a single quote, that were injected in the user-supplied query-string parameter “team” caused an error condition in the application
- The error messages presented by the application suggested that the server-side code was likely using XPath queries to retrieve data that was displayed to the user
- A combination of special characters was found, including injecting a null byte, which manipulated the XPath query in a way that changed the extent of the query but still resulted in valid XPath syntax
- As a result of the injection attack, the application retrieved more data than was supposed to be displayed and disclosed it to the attacker
We will have a look at this vulnerable application in more detail by going through the server-side code.
In our example attack, the hacker injected the string: Bears']%00
The null byte (%00) terminated the string representing the XPath query, so the rest of the string concatenated after the user input was ignored and the XPath query effectively became the following string: /teams/team[name='Bears']<null character>
The query returned all child nodes of the <team> node where the <name> node’s value was Bears, including nodes such as the team members’ dates of birth, addresses, and emails, which were supposed to be private and not displayed to the application user.
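The effect of the injected string on the query can be reproduced with plain string concatenation. The sketch below is an assumption about how the vulnerable code builds its query, inferred from the patched code shown later in this lesson. `buildQuery` and `truncateAtNull` are hypothetical helpers, and the truncation step only simulates an XPath engine that treats the null character as the end of the string.

```javascript
// Hypothetical reconstruction of the vulnerable query construction:
// user input is concatenated into the XPath string without any filtering.
function buildQuery(teamName) {
  return "/teams/team[name='" + teamName + "']/members/member/name/text()";
}

// Simulates an engine that stops reading the query at the first null character.
// Whether this happens in practice depends on the XPath library in use.
function truncateAtNull(query) {
  const nullIndex = query.indexOf("\0");
  return nullIndex === -1 ? query : query.slice(0, nullIndex);
}

// "%00" in the query string is URL-decoded to a null character.
const benign = decodeURIComponent("Bears");
const malicious = decodeURIComponent("Bears']%00");

console.log(truncateAtNull(buildQuery(benign)));
// /teams/team[name='Bears']/members/member/name/text()
console.log(truncateAtNull(buildQuery(malicious)));
// /teams/team[name='Bears']
```

With the benign input, the query is limited to member names. With the malicious input, the restricting path after the predicate is cut off, so the whole <team> node is selected.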
Impacts of XPath injection
By exploiting XPath injection, a malicious actor could disclose sensitive data within the XML document being queried. For example, in the vulnerable application we looked at above, the personal information of team members was leaked, resulting in a violation of privacy for those individuals.
If the application uses the XML data for any security-related decisions, such as a database of usernames and passwords to authenticate users against, then authentication could be bypassed.
Disclosure of the contents of other files on the application server’s file system may also be possible, depending on the XPath library in use and its configuration. The doc() and doc-available() XPath functions, when implemented in the XPath library, can allow the reading of files on the local filesystem.
Use an allowlist
By restricting user-supplied input that is used to construct the XPath query to only known safe characters, the query can be securely constructed:
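A minimal sketch of such a check, assuming team names consist only of letters, digits, spaces, and hyphens and are at most 50 characters long. The pattern, the length limit, and the `isSafeTeamName` and `buildTeamQuery` names are illustrative assumptions rather than the lesson's original code:

```javascript
// Assumed allowlist: letters, digits, spaces, and hyphens, 1 to 50 characters.
const SAFE_TEAM_NAME = /^[A-Za-z0-9 -]{1,50}$/;

function isSafeTeamName(teamName) {
  return typeof teamName === "string" && SAFE_TEAM_NAME.test(teamName);
}

// Builds the query only when the input passes the allowlist; otherwise rejects it.
function buildTeamQuery(teamName) {
  if (!isSafeTeamName(teamName)) {
    throw new Error("Invalid team name");
  }
  return "/teams/team[name='" + teamName + "']/members/member/name/text()";
}

console.log(isSafeTeamName("Bears"));      // true
console.log(isSafeTeamName("Bears']\0"));  // false
```

Rejecting invalid input outright, rather than trying to strip the offending characters, keeps the check simple and avoids surprises from partially cleaned values.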
Encode user input
By encoding special characters injected into the user-supplied input, such as the single quote to its XML entity representation, we can avoid a situation where malicious user input breaks out of the intended XPath query syntax:
import {encode} from 'html-entities';
//…
const nodes = xpath.select("/teams/team[name='" + encode(teamName) + "']/members/member/name/text()", doc);
In this modified application code, the malicious user input demonstrated in the attack above would be encoded as: Bears&apos;]
And the resulting constructed XPath query would be:
/teams/team[name='Bears&apos;]']/members/member/name/text()
Parameterized XPath queries
A better option is to use parameterized XPath queries. Whether this is possible depends on the specific XPath library or API the application uses, as some libraries do not implement query parameterization. As with parameterized SQL queries, the user input is passed to the query as a variable, so any special characters in that input either cause the query to fail or are automatically escaped, and cannot change the syntax of the query. A parameterized XPath query may look similar to the following:
//team[name = $teamname]
Where $teamname is supplied as an argument to the parameterization method call.
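Where the XPath library in use offers no variable binding, a similar effect can be approximated by rendering the value as an XPath string literal before it is placed in the query. The sketch below is an assumption, not part of the original application: `toXPathLiteral` is a hypothetical helper that relies on the fact that XPath 1.0 string literals have no escape sequences, so a value containing both quote characters has to be assembled with `concat()`. Rejecting null characters outright is a conservative choice here rather than an XPath requirement.

```javascript
// Hypothetical helper: returns an XPath 1.0 expression that evaluates to `value`.
function toXPathLiteral(value) {
  if (value.includes("\0")) {
    throw new Error("Null characters are not allowed");
  }
  if (!value.includes("'")) {
    return "'" + value + "'";
  }
  if (!value.includes('"')) {
    return '"' + value + '"';
  }
  // Both quote types present: split on single quotes and rejoin with concat().
  const parts = value.split("'").map((part) => "'" + part + "'");
  return "concat(" + parts.join(', "\'", ') + ")";
}

const query = "//team[name = " + toXPathLiteral("Bears']") + "]";
console.log(query);
// //team[name = "Bears']"]
```

The injected quote stays inside the string literal, so the query simply looks for a team whose name is the literal text Bears'] and matches nothing. A library's native variable binding, where available, remains preferable to a hand-rolled helper like this one.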
Static analysis tool
Adding a static application security testing (SAST) tool to your DevOps pipeline as an additional line of defense is an excellent way to catch vulnerabilities before they make it to production. There are many, but Snyk Code is our personal favorite, as it scans in real-time, provides actionable remediation advice, and is available from your favorite IDE.
Keep learning
To learn more about XPath injection, check out some other great content:
- OWASP guide to XPath injection
- Take a look at our other injection lessons