Python: Copy Xxe/XmlBomb queries from JS

After internal discussion, these will replace the `XmlEntityInjection` query, so that DoS and the other (more serious) attacks can have separate severities. Note: these clearly don't work, since they are verbatim copies of the JS code, but I split the work into multiple commits to clearly highlight what changes were made.
2026-02-23 18:33:42 +01:00 · 2022-03-24 14:13:12 +01:00
parent 48015e5a2e
commit 65907c9762
12 changed files with 352 additions and 0 deletions
--- a/python/ql/src/experimental/Security/NEW/CWE-611/Xxe.qhelp
+++ b/python/ql/src/experimental/Security/NEW/CWE-611/Xxe.qhelp
@@ -0,0 +1,57 @@
+<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
+<qhelp>
+
+<overview>
+<p>
+Parsing untrusted XML files with a weakly configured XML parser may lead to an
+XML External Entity (XXE) attack. This type of attack uses external entity references
+to access arbitrary files on a system, carry out denial-of-service (DoS) attacks, or server-side
+request forgery. Even when the result of parsing is not returned to the user, DoS attacks are still possible
+and out-of-band data retrieval techniques may allow attackers to steal sensitive data.
+</p>
+</overview>
+
+<recommendation>
+<p>
+The easiest way to prevent XXE attacks is to disable external entity handling when
+parsing untrusted data. How this is done depends on the library being used. Note that some
+libraries, such as recent versions of <code>libxml</code>, disable entity expansion by default,
+so unless you have explicitly enabled entity expansion, no further action needs to be taken.
+</p>
+</recommendation>
+
+<example>
+<p>
+The following example uses the <code>libxml</code> XML parser to parse a string <code>xmlSrc</code>.
+If that string is from an untrusted source, this code may be vulnerable to an XXE attack, since
+the parser is invoked with the <code>noent</code> option set to <code>true</code>:
+</p>
+<sample src="examples/Xxe.js"/>
+
+<p>
+To guard against XXE attacks, the <code>noent</code> option should be omitted or set to
+<code>false</code>. This means that no entity expansion is undertaken at all, not even for standard
+internal entities such as <code>&amp;amp;</code> or <code>&amp;gt;</code>. If desired, these
+entities can be expanded in a separate step using utility functions provided by libraries such
+as <a href="http://underscorejs.org/#unescape">underscore</a>,
+<a href="https://lodash.com/docs/4.17.15#unescape">lodash</a> or
+<a href="https://github.com/mathiasbynens/he">he</a>.
+</p>
+<sample src="examples/XxeGood.js"/>
+</example>
+
+<references>
+<li>
+OWASP:
+<a href="https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing">XML External Entity (XXE) Processing</a>.
+</li>
+<li>
+Timothy Morgen:
+<a href="https://research.nccgroup.com/2014/05/19/xml-schema-dtd-and-entity-attacks-a-compendium-of-known-techniques/">XML Schema, DTD, and Entity Attacks</a>.
+</li>
+<li>
+Timur Yunusov, Alexey Osipov:
+<a href="https://www.slideshare.net/qqlan/bh-ready-v4">XML Out-Of-Band Data Retrieval</a>.
+</li>
+</references>
+</qhelp>
--- a/python/ql/src/experimental/Security/NEW/CWE-611/Xxe.ql
+++ b/python/ql/src/experimental/Security/NEW/CWE-611/Xxe.ql
@@ -0,0 +1,23 @@
+/**
+ * @name XML external entity expansion
+ * @description Parsing user input as an XML document with external
+ *              entity expansion is vulnerable to XXE attacks.
+ * @kind path-problem
+ * @problem.severity error
+ * @security-severity 9.1
+ * @precision high
+ * @id js/xxe
+ * @tags security
+ *       external/cwe/cwe-611
+ *       external/cwe/cwe-827
+ */
+
+import javascript
+import semmle.javascript.security.dataflow.XxeQuery
+import DataFlow::PathGraph
+
+from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
+where cfg.hasFlowPath(source, sink)
+select sink.getNode(), source, sink,
+  "A $@ is parsed as XML without guarding against external entity expansion.", source.getNode(),
+  "user-provided value"
--- a/python/ql/src/experimental/Security/NEW/CWE-611/examples/Xxe.js
+++ b/python/ql/src/experimental/Security/NEW/CWE-611/examples/Xxe.js
@@ -0,0 +1,7 @@
+const app = require("express")(),
+  libxml = require("libxmljs");
+
+app.post("upload", (req, res) => {
+  let xmlSrc = req.body,
+    doc = libxml.parseXml(xmlSrc, { noent: true });
+});
--- a/python/ql/src/experimental/Security/NEW/CWE-611/examples/XxeGood.js
+++ b/python/ql/src/experimental/Security/NEW/CWE-611/examples/XxeGood.js
@@ -0,0 +1,7 @@
+const app = require("express")(),
+  libxml = require("libxmljs");
+
+app.post("upload", (req, res) => {
+  let xmlSrc = req.body,
+    doc = libxml.parseXml(xmlSrc);
+});
--- a/python/ql/src/experimental/Security/NEW/CWE-776/XmlBomb.qhelp
+++ b/python/ql/src/experimental/Security/NEW/CWE-776/XmlBomb.qhelp
@@ -0,0 +1,60 @@
+<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
+<qhelp>
+
+<overview>
+<p>
+Parsing untrusted XML files with a weakly configured XML parser may be vulnerable to
+denial-of-service (DoS) attacks exploiting uncontrolled internal entity expansion.
+</p>
+<p>
+In XML, so-called <i>internal entities</i> are a mechanism for introducing an abbreviation
+for a piece of text or part of a document. When a parser that has been configured
+to expand entities encounters a reference to an internal entity, it replaces the entity
+by the data it represents. The replacement text may itself contain other entity references,
+which are expanded recursively. This means that entity expansion can increase document size
+dramatically.
+</p>
+<p>
+If untrusted XML is parsed with entity expansion enabled, a malicious attacker could
+submit a document that contains very deeply nested entity definitions, causing the parser
+to take a very long time or use large amounts of memory. This is sometimes called an
+<i>XML bomb</i> attack.
+</p>
+</overview>
+
+<recommendation>
+<p>
+The safest way to prevent XML bomb attacks is to disable entity expansion when parsing untrusted
+data. How this is done depends on the library being used. Note that some libraries, such as
+recent versions of <code>libxmljs</code> (though not its SAX parser API), disable entity expansion
+by default, so unless you have explicitly enabled entity expansion, no further action is needed.
+</p>
+</recommendation>
+
+<example>
+<p>
+The following example uses the XML parser provided by the <code>node-expat</code> package to
+parse a string <code>xmlSrc</code>. If that string is from an untrusted source, this code may be
+vulnerable to a DoS attack, since <code>node-expat</code> expands internal entities by default:
+</p>
+<sample src="examples/XmlBomb.js"/>
+
+<p>
+At the time of writing, <code>node-expat</code> does not provide a way of controlling entity
+expansion, but the example could be rewritten to use the <code>sax</code> package instead,
+which only expands standard entities such as <code>&amp;amp;</code>:
+</p>
+<sample src="examples/XmlBombGood.js"/>
+</example>
+
+<references>
+<li>
+Wikipedia:
+<a href="https://en.wikipedia.org/wiki/Billion_laughs">Billion Laughs</a>.
+</li>
+<li>
+Bryan Sullivan:
+<a href="https://msdn.microsoft.com/en-us/magazine/ee335713.aspx">Security Briefs - XML Denial of Service Attacks and Defenses</a>.
+</li>
+</references>
+</qhelp>
--- a/python/ql/src/experimental/Security/NEW/CWE-776/XmlBomb.ql
+++ b/python/ql/src/experimental/Security/NEW/CWE-776/XmlBomb.ql
@@ -0,0 +1,23 @@
+/**
+ * @name XML internal entity expansion
+ * @description Parsing user input as an XML document with arbitrary internal
+ *              entity expansion is vulnerable to denial-of-service attacks.
+ * @kind path-problem
+ * @problem.severity warning
+ * @security-severity 7.5
+ * @precision high
+ * @id js/xml-bomb
+ * @tags security
+ *       external/cwe/cwe-776
+ *       external/cwe/cwe-400
+ */
+
+import javascript
+import semmle.javascript.security.dataflow.XmlBombQuery
+import DataFlow::PathGraph
+
+from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
+where cfg.hasFlowPath(source, sink)
+select sink.getNode(), source, sink,
+  "A $@ is parsed as XML without guarding against uncontrolled entity expansion.", source.getNode(),
+  "user-provided value"
--- a/python/ql/src/experimental/Security/NEW/CWE-776/examples/XmlBomb.js
+++ b/python/ql/src/experimental/Security/NEW/CWE-776/examples/XmlBomb.js
@@ -0,0 +1,10 @@
+const app = require("express")(),
+  expat = require("node-expat");
+
+app.post("upload", (req, res) => {
+  let xmlSrc = req.body,
+    parser = new expat.Parser();
+  parser.on("startElement", handleStart);
+  parser.on("text", handleText);
+  parser.write(xmlSrc);
+});
--- a/python/ql/src/experimental/Security/NEW/CWE-776/examples/XmlBombGood.js
+++ b/python/ql/src/experimental/Security/NEW/CWE-776/examples/XmlBombGood.js
@@ -0,0 +1,10 @@
+const app = require("express")(),
+  sax = require("sax");
+
+app.post("upload", (req, res) => {
+  let xmlSrc = req.body,
+    parser = sax.parser(true);
+  parser.onopentag = handleStart;
+  parser.ontext = handleText;
+  parser.write(xmlSrc);
+});
--- a/python/ql/src/experimental/semmle/python/security/dataflow/XmlBombCustomizations.qll
+++ b/python/ql/src/experimental/semmle/python/security/dataflow/XmlBombCustomizations.qll
@@ -0,0 +1,49 @@
+/**
+ * Provides default sources, sinks and sanitizers for reasoning about
+ * XML-bomb vulnerabilities, as well as extension points for adding
+ * your own.
+ */
+
+import javascript
+import semmle.javascript.security.dataflow.DOM
+
+module XmlBomb {
+  /**
+   * A data flow source for XML-bomb vulnerabilities.
+   */
+  abstract class Source extends DataFlow::Node { }
+
+  /**
+   * A data flow sink for XML-bomb vulnerabilities.
+   */
+  abstract class Sink extends DataFlow::Node { }
+
+  /**
+   * A sanitizer for XML-bomb vulnerabilities.
+   */
+  abstract class Sanitizer extends DataFlow::Node { }
+
+  /** A source of remote user input, considered as a flow source for XML bomb vulnerabilities. */
+  class RemoteFlowSourceAsSource extends Source {
+    RemoteFlowSourceAsSource() { this instanceof RemoteFlowSource }
+  }
+
+  /**
+   * An access to `document.location`, considered as a flow source for XML bomb vulnerabilities.
+   */
+  class LocationAsSource extends Source, DataFlow::ValueNode {
+    LocationAsSource() { isLocation(astNode) }
+  }
+
+  /**
+   * A call to an XML parser that performs internal entity expansion, viewed
+   * as a data flow sink for XML-bomb vulnerabilities.
+   */
+  class XmlParsingWithEntityResolution extends Sink, DataFlow::ValueNode {
+    XmlParsingWithEntityResolution() {
+      exists(XML::ParserInvocation parse | astNode = parse.getSourceArgument() |
+        parse.resolvesEntities(XML::InternalEntity())
+      )
+    }
+  }
+}
--- a/python/ql/src/experimental/semmle/python/security/dataflow/XmlBombQuery.qll
+++ b/python/ql/src/experimental/semmle/python/security/dataflow/XmlBombQuery.qll
@@ -0,0 +1,27 @@
+/**
+ * Provides a taint tracking configuration for reasoning about
+ * XML-bomb vulnerabilities.
+ *
+ * Note, for performance reasons: only import this file if
+ * `XmlBomb::Configuration` is needed, otherwise
+ * `XmlBombCustomizations` should be imported instead.
+ */
+
+import javascript
+import XmlBombCustomizations::XmlBomb
+
+/**
+ * A taint-tracking configuration for reasoning about XML-bomb vulnerabilities.
+ */
+class Configuration extends TaintTracking::Configuration {
+  Configuration() { this = "XmlBomb" }
+
+  override predicate isSource(DataFlow::Node source) { source instanceof Source }
+
+  override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }
+
+  override predicate isSanitizer(DataFlow::Node node) {
+    super.isSanitizer(node) or
+    node instanceof Sanitizer
+  }
+}
--- a/python/ql/src/experimental/semmle/python/security/dataflow/XxeCustomizations.qll
+++ b/python/ql/src/experimental/semmle/python/security/dataflow/XxeCustomizations.qll
@@ -0,0 +1,52 @@
+/**
+ * Provides default sources, sinks and sanitizers for reasoning about
+ * XML External Entity (XXE) vulnerabilities, as well as extension
+ * points for adding your own.
+ */
+
+import javascript
+import semmle.javascript.security.dataflow.DOM
+
+module Xxe {
+  /**
+   * A data flow source for XXE vulnerabilities.
+   */
+  abstract class Source extends DataFlow::Node { }
+
+  /**
+   * A data flow sink for XXE vulnerabilities.
+   */
+  abstract class Sink extends DataFlow::Node { }
+
+  /**
+   * A sanitizer for XXE vulnerabilities.
+   */
+  abstract class Sanitizer extends DataFlow::Node { }
+
+  /** A source of remote user input, considered as a flow source for XXE vulnerabilities. */
+  class RemoteFlowSourceAsSource extends Source {
+    RemoteFlowSourceAsSource() { this instanceof RemoteFlowSource }
+  }
+
+  /**
+   * An access to `document.location`, considered as a flow source for XXE vulnerabilities.
+   */
+  class LocationAsSource extends Source, DataFlow::ValueNode {
+    LocationAsSource() { isLocation(astNode) }
+  }
+
+  /**
+   * A call to an XML parser that performs external entity expansion, viewed
+   * as a data flow sink for XXE vulnerabilities.
+   */
+  class XmlParsingWithExternalEntityResolution extends Sink, DataFlow::ValueNode {
+    XmlParsingWithExternalEntityResolution() {
+      exists(XML::ParserInvocation parse | astNode = parse.getSourceArgument() |
+        parse.resolvesEntities(XML::ExternalEntity(_))
+        or
+        parse.resolvesEntities(XML::ParameterEntity(true)) and
+        parse.resolvesEntities(XML::InternalEntity())
+      )
+    }
+  }
+}
--- a/python/ql/src/experimental/semmle/python/security/dataflow/XxeQuery.qll
+++ b/python/ql/src/experimental/semmle/python/security/dataflow/XxeQuery.qll
@@ -0,0 +1,27 @@
+/**
+ * Provides a taint tracking configuration for reasoning about XML
+ * External Entity (XXE) vulnerabilities.
+ *
+ * Note, for performance reasons: only import this file if
+ * `Xxe::Configuration` is needed, otherwise `XxeCustomizations`
+ * should be imported instead.
+ */
+
+import javascript
+import XxeCustomizations::Xxe
+
+/**
+ * A taint-tracking configuration for reasoning about XXE vulnerabilities.
+ */
+class Configuration extends TaintTracking::Configuration {
+  Configuration() { this = "Xxe" }
+
+  override predicate isSource(DataFlow::Node source) { source instanceof Source }
+
+  override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }
+
+  override predicate isSanitizer(DataFlow::Node node) {
+    super.isSanitizer(node) or
+    node instanceof Sanitizer
+  }
+}