If you’ve taken a look at the 2017 OWASP Top 10, updated for the first time since 2013, you might be wondering what in the world XML External Entity (XXE) processing is and how it took the number four spot among the most critical web application security risks. According to OWASP, it is also an issue that is “not commonly tested as of 2017.” Don’t panic: here’s a quick rundown of what it is and why you should care.
First, it’s important to know a few basic underlying concepts. Extensible Markup Language, or XML for short, is a language designed to transport data while being readable by both humans and machines. Entities are XML shortcuts for special characters, strings, URIs, and more. A reference to an entity looks like &name; in the document.
They will always start with an ampersand, end with a semi-colon, and have the entity name in the middle. You can think of them as XML’s version of variables. Lastly, entities can be either internal or external. Internal means they are defined locally in the XML document or part of XML itself. External entities are defined by an outside source such as a URL or a file on the local system.
You might use an XML entity in place of special characters such as less than (<), greater than (>), or ampersand (&). Using the entity rather than the character itself ensures the XML processor doesn’t error out on unexpected characters during parsing. Another use case is setting an entity’s value to a string. Say you have a long user agreement that needs to be repeated several times throughout a document. You can define an entity named agreement once and reference it as &agreement; wherever it is needed, rather than pasting the whole text over and over.
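As a rough illustration of the agreement example, the Python sketch below declares an internal entity and lets the standard library's xml.etree.ElementTree expand it. The document, the entity text, and the element names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DOCTYPE declares an internal entity named
# "agreement", and the body references it twice as &agreement;.
DOCUMENT = """<!DOCTYPE contract [
  <!ENTITY agreement "You agree to the terms of service.">
]>
<contract>
  <intro>&agreement;</intro>
  <footer>Reminder: &agreement;</footer>
</contract>"""

root = ET.fromstring(DOCUMENT)

# The parser substitutes the entity's value wherever it is referenced.
intro = root.find("intro").text
footer = root.find("footer").text
print(intro)
print(footer)
```

The text is written once in the declaration and the parser fills it in at each reference, which is the "variable" behavior described above.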
However, as the name would imply, this vulnerability deals with external entities. And even though we’ll be using these to attack applications, they do have legitimate use cases. For instance, you may call another XML document stored on the local machine to be included in another. Or maybe you reach out over the internet to a document type definition (DTD) file, used to check and validate your XML.
The issue comes into play when XML documents are parsed by a weakly configured parser. The parser expects to see a legitimate entity like <!ENTITY include SYSTEM "https://versprite.com/docs/appsec.xml"> but an attacker may inject an entity such as <!ENTITY xxe SYSTEM "file:///etc/passwd"> instead. In cases where the contents of the entity are reflected back to the user, the /etc/passwd file of the web server would then be displayed.
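One way to see what a parser is being asked to do is to list the external entities a document declares before anything is resolved. The sketch below is only an illustration, not a complete defense: it uses Python's built-in xml.parsers.expat module, and the function name and sample document are assumptions made for this example.

```python
import xml.parsers.expat


def external_entity_declarations(xml_text):
    """Return (name, system_id) for each external entity declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system identifier.
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No ExternalEntityRefHandler is installed, so nothing is fetched here.
    parser.Parse(xml_text, True)
    return found


# Hypothetical input mixing the legitimate and the injected declaration.
SUSPICIOUS = """<!DOCTYPE doc [
  <!ENTITY include SYSTEM "https://versprite.com/docs/appsec.xml">
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<doc>&xxe;</doc>"""

declarations = external_entity_declarations(SUSPICIOUS)
for name, system_id in declarations:
    print(name, "->", system_id)
```

To the parser, both declarations look the same; only the system identifier tells the remote include apart from the local file read.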
Aside from grabbing /etc/passwd, you may try looking for config files with hardcoded passwords such as wp-config.php, Windows’ unattended.xml or Sysprep files, or take it even further by grabbing files over a network.
Along these same lines, you can attempt to access specific ports and compare error messages to see if the ports are open or closed.
Another example of XXE exploitation is a denial of service known as the Billion Laughs attack. This attack depends on the parser expanding nested entities, each of which references the previous entity several times. A few lines of XML can force the server to process and emit far more data than was put in. The more levels of nesting used, the more likely it is to result in a denial of service.
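To make the amplification concrete, here is a small Python sketch that builds a scaled-down version of such a payload and calculates how large it becomes once expanded. The helper names, the entity names (e0, e1, ...), and the seed string are assumptions for illustration. Only a tiny payload is actually parsed, and recent versions of the Expat library used by Python cap this kind of amplification.

```python
import xml.etree.ElementTree as ET


def build_nested_entities(depth, fanout, seed="lol"):
    """Build a document whose entity e<depth> expands to fanout**depth seeds."""
    lines = [f'<!ENTITY e0 "{seed}">']
    for level in range(1, depth + 1):
        # Each level references the previous level `fanout` times.
        refs = f"&e{level - 1};" * fanout
        lines.append(f'<!ENTITY e{level} "{refs}">')
    dtd = "\n  ".join(lines)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE lolz [\n  {dtd}\n]>\n"
        f"<lolz>&e{depth};</lolz>"
    )


def expanded_length(depth, fanout, seed="lol"):
    """Number of characters the top-level entity expands to."""
    return len(seed) * fanout ** depth


# A deliberately small payload: 3 levels with 10 references per level.
payload = build_nested_entities(depth=3, fanout=10)
root = ET.fromstring(payload)
print(len(payload), "characters in,", len(root.text), "characters out")

# The classic attack uses 9 levels of 10; computed here, never parsed.
print(expanded_length(depth=9, fanout=10), "characters for the full attack")
```

Each extra level multiplies the output by the fanout, which is why a document of under a kilobyte can expand to gigabytes.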
Another denial of service may occur when an entity points at an endless file such as /dev/urandom. Lastly, if you are very lucky, you may find an XXE vulnerability on a server running PHP with the expect wrapper enabled. In this case, pointing an external entity at an expect:// URI (for example, expect://id) can run shell commands on the server.
If the contents are not reflected back, there are still tricks for getting the information back to you. For example, you might exfiltrate the information gained from one entity by using another: one entity reads a file’s contents via XXE, and a second sends those contents in a request to an attacker’s server, where they will be visible in the web server logs.
While these examples show HTTP requests, XXE issues aren’t just for web applications. This issue can occur in desktop apps, AJAX requests, OpenID logins (used by Login with Facebook / Google functionality), and SOAP web service requests as well.
As with most of the OWASP Top 10, the best way to avoid the issue is to practice secure coding. Disable external entities when they are not needed. The specifics of how to do this vary from language to language and parser to parser, so look out for any documentation mentioning DTDs, external entities, entity expansion, or entity substitution.
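As one example of what "disable external entities" can look like in practice, the sketch below uses Python's standard xml.sax module and switches off the external general and parameter entity features before parsing. The class and function names and the sample input are assumptions for illustration, and other languages and parsers expose different switches. Recent Python 3 releases already default to not resolving external general entities, so setting the feature explicitly mainly documents the intent.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Collects the character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text_without_external_entities(xml_bytes):
    parser = xml.sax.make_parser()
    # Do not resolve external general entities (the classic XXE vector).
    parser.setFeature(feature_external_ges, False)
    # Do not resolve external parameter entities or external DTD subsets.
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


# Hypothetical untrusted input that tries to read a local file.
UNTRUSTED = b"""<!DOCTYPE doc [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<doc>before-&xxe;-after</doc>"""

text = parse_text_without_external_entities(UNTRUSTED)
print(text)
```

With the features off, the reference to the external entity is skipped rather than resolved, so the file's contents never reach the application. A stricter option, where the application never needs DTDs at all, is to reject any document that contains a DOCTYPE declaration.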
Applications should be sandboxed so that damage is minimized if an XXE issue is successfully exploited. Google pays up to $13,337 for unsandboxed XXE issues and much less for sandboxed ones.
Consider switching from XML to JSON. While parts of the JSON ecosystem have had security issues, simply parsing it is safe.
Lastly, if all else fails, make sure your XML parser doesn’t display errors. While this won’t completely prevent an attack, it may slow an attacker down and delay their discovery of the issue on the system.
Also, be sure to check out the presentation on XXE attacks that VerSprite gave when hosted by OWASP Atlanta, I Don’t Always Exploit Web Apps, But When I Do I Prefer XXE.