Add query for polynomial ReDoS

2026-04-28 02:05:14 +02:00 · 2021-08-26 18:02:24 +01:00
parent 86073776b7
commit d62b41bdf4
8 changed files with 751 additions and 3 deletions
--- a/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelp
+++ b/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelp
@@ -0,0 +1,113 @@
+<!DOCTYPE qhelp PUBLIC
+"-//Semmle//qhelp//EN"
+"qhelp.dtd">
+
+<qhelp>
+
+	<include src="ReDoSIntroduction.inc.qhelp" />
+
+	<example>
+		<p>
+
+			Consider this use of a regular expression, which removes
+			all leading and trailing whitespace in a string:
+
+		</p>
+
+		<sample language="ruby">
+			text.gsub!(/^\s+|\s+$/, '') # BAD
+		</sample>
+
+		<p>
+
+			The sub-expression <code>"\s+$"</code> will match the
+			whitespace characters in <code>text</code> from left to
+			right, but it can start matching anywhere within a
+			whitespace sequence. This is problematic for strings
+			that do <strong>not</strong> end with a whitespace
+			character. Such a string will force the regular
+			expression engine to process each whitespace sequence
+			once per whitespace character in the sequence.
+
+		</p>
+
+		<p>
+
+			This ultimately means that the time cost of trimming a
+			string is quadratic in the length of the string. So a
+			string like <code>"a b"</code> will take milliseconds to
+			process, but a similar string with a million spaces
+			instead of just one will take several minutes.
+
+		</p>
+
+		<p>
+
+			Avoid this problem by rewriting the regular expression
+			to not contain the ambiguity about when to start
+			matching whitespace sequences. For instance, by using a
+			negative look-behind
+			(<code>/^\s+|(?&lt;!\s)\s+$/</code>), or just by using
+			the built-in strip method (<code>text.strip!</code>).
+
+		</p>
+
+		<p>
+
+			Note that the sub-expression <code>"^\s+"</code> is
+			<strong>not</strong> problematic as the <code>^</code>
+			anchor restricts when that sub-expression can start
+			matching, and as the regular expression engine matches
+			from left to right.
+
+		</p>
+
+	</example>
+
+	<example>
+
+		<p>
+
+			As a similar, but slightly subtler problem, consider the
+			regular expression that matches lines with numbers, possibly written
+			using scientific notation:
+		</p>
+
+		<sample language="ruby">
+			/^0\.\d+E?\d+$/ # BAD
+		</sample>
+
+		<p>
+
+			The problem with this regular expression is in the
+			sub-expression <code>\d+E?\d+</code> because the second
+			<code>\d+</code> can start matching digits anywhere
+			after the first match of the first <code>\d+</code> if
+			there is no <code>E</code> in the input string.
+
+		</p>
+
+		<p>
+
+			This is problematic for strings that do
+			<strong>not</strong> end with a digit. Such a string
+			will force the regular expression engine to process each
+			digit sequence once per digit in the sequence, again
+			leading to a quadratic time complexity.
+
+		</p>
+
+		<p>
+
+			To make the processing faster, the regular expression
+			should be rewritten such that the two <code>\d+</code>
+			sub-expressions do not have overlapping matches:
+			<code>/^0\.\d+(E\d+)?$/</code>.
+
+		</p>
+
+	</example>
+
+	<include src="ReDoSReferences.inc.qhelp"/>
+
+</qhelp>
--- a/ql/src/queries/security/cwe-1333/PolynomialReDoS.ql
+++ b/ql/src/queries/security/cwe-1333/PolynomialReDoS.ql
@@ -0,0 +1,30 @@
+/**
+ * @name Polynomial regular expression used on uncontrolled data
+ * @description A regular expression that can require polynomial time
+ *              to match may be vulnerable to denial-of-service attacks.
+ * @kind path-problem
+ * @problem.severity warning
+ * @precision high
+ * @id rb/polynomial-redos
+ * @tags security
+ *       external/cwe/cwe-1333
+ *       external/cwe/cwe-730
+ *       external/cwe/cwe-400
+ */
+
+import DataFlow::PathGraph
+import codeql.ruby.DataFlow
+import codeql.ruby.regexp.PolynomialReDoS
+import codeql.ruby.regexp.SuperlinearBackTracking
+
+from
+  PolynomialReDoS::Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink,
+  PolynomialReDoS::Sink sinkNode, PolynomialBackTrackingTerm regexp
+where
+  config.hasFlowPath(source, sink) and
+  sinkNode = sink.getNode() and
+  regexp = sinkNode.getRegExp()
+select sinkNode.getHighlight(), source, sink,
+  "This $@ that depends on $@ may run slow on strings " + regexp.getPrefixMessage() +
+    "with many repetitions of '" + regexp.getPumpString() + "'.", regexp, "regular expression",
+  source.getNode(), "a user-provided value"