Add query for polynomial ReDoS

This commit is contained in:
Nick Rolfe
2021-08-26 18:02:24 +01:00
parent 86073776b7
commit d62b41bdf4
8 changed files with 751 additions and 3 deletions

View File

@@ -0,0 +1,113 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<include src="ReDoSIntroduction.inc.qhelp" />
<example>
<p>
Consider this use of a regular expression, which removes
all leading and trailing whitespace in a string:
</p>
<sample language="ruby">
text.gsub!(/^\s+|\s+$/, '') # BAD
</sample>
<p>
The sub-expression <code>"\s+$"</code> will match the
whitespace characters in <code>text</code> from left to
right, but it can start matching anywhere within a
whitespace sequence. This is problematic for strings
that do <strong>not</strong> end with a whitespace
character. Such a string will force the regular
expression engine to process each whitespace sequence
once per whitespace character in the sequence.
</p>
<p>
This ultimately means that the time cost of trimming a
string is quadratic in the length of the string. So a
string like <code>"a b"</code> will take milliseconds to
process, but a similar string with a million spaces
instead of just one will take several minutes.
</p>
<p>
Avoid this problem by rewriting the regular expression
to not contain the ambiguity about when to start
matching whitespace sequences. For instance, by using a
negative look-behind
(<code>/^\s+|(?&lt;!\s)\s+$/</code>), or just by using
the built-in strip method (<code>text.strip!</code>).
</p>
<p>
Note that the sub-expression <code>"^\s+"</code> is
<strong>not</strong> problematic as the <code>^</code>
anchor restricts when that sub-expression can start
matching, and as the regular expression engine matches
from left to right.
</p>
</example>
<example>
<p>
As a similar, but slightly subtler problem, consider the
regular expression that matches lines with numbers, possibly written
using scientific notation:
</p>
<sample language="ruby">
/^0\.\d+E?\d+$/ # BAD
</sample>
<p>
The problem with this regular expression is in the
sub-expression <code>\d+E?\d+</code> because the second
<code>\d+</code> can start matching digits anywhere
after the first match of the first <code>\d+</code> if
there is no <code>E</code> in the input string.
</p>
<p>
This is problematic for strings that do
<strong>not</strong> end with a digit. Such a string
will force the regular expression engine to process each
digit sequence once per digit in the sequence, again
leading to a quadratic time complexity.
</p>
<p>
To make the processing faster, the regular expression
should be rewritten such that the two <code>\d+</code>
sub-expressions do not have overlapping matches:
<code>/^0\.\d+(E\d+)?$/</code>.
</p>
</example>
<include src="ReDoSReferences.inc.qhelp"/>
</qhelp>

View File

@@ -0,0 +1,30 @@
/**
* @name Polynomial regular expression used on uncontrolled data
* @description A regular expression that can require polynomial time
* to match may be vulnerable to denial-of-service attacks.
* @kind path-problem
* @problem.severity warning
* @precision high
* @id rb/polynomial-redos
* @tags security
* external/cwe/cwe-1333
* external/cwe/cwe-730
* external/cwe/cwe-400
*/
import DataFlow::PathGraph
import codeql.ruby.DataFlow
import codeql.ruby.regexp.PolynomialReDoS
import codeql.ruby.regexp.SuperlinearBackTracking
from
PolynomialReDoS::Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink,
PolynomialReDoS::Sink sinkNode, PolynomialBackTrackingTerm regexp
where
config.hasFlowPath(source, sink) and
sinkNode = sink.getNode() and
regexp = sinkNode.getRegExp()
select sinkNode.getHighlight(), source, sink,
"This $@ that depends on $@ may run slow on strings " + regexp.getPrefixMessage() +
"with many repetitions of '" + regexp.getPumpString() + "'.", regexp, "regular expression",
source.getNode(), "a user-provided value"