JS: add js/exploitable-polynomial-redos

This commit is contained in:
Esben Sparre Andreasen
2020-02-12 12:56:55 +01:00
parent 091c6c063c
commit 1b73cee692
15 changed files with 775 additions and 63 deletions

View File

@@ -1,70 +1,34 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
Some regular expressions take a very long time to match certain input strings to the point where
the time it takes to match a string of length <i>n</i> is proportional to <i>2<sup>n</sup></i>.
Such regular expressions can negatively affect performance, or even allow a malicious user to
perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular
expression to match.
</p>
<p>
The regular expression engines provided by many popular JavaScript platforms use backtracking
non-deterministic finite automata to implement regular expression matching. While this approach
is space-efficient and allows supporting advanced features like capture groups, it is not
time-efficient in general. The worst-case time complexity of such an automaton can be exponential,
meaning that for strings of a certain shape, increasing the input length by ten characters may
make the automaton about 1000 times slower.
</p>
<p>
Typically, a regular expression is affected by this problem if it contains a repetition of the
form <code>r*</code> or <code>r+</code> where the sub-expression <code>r</code> is ambiguous in
the sense that it can match some string in multiple ways. More information about the precise
circumstances can be found in the references.
</p>
</overview>
<include src="ReDoSIntroduction.qhelp" />
<recommendation>
<p>
Modify the regular expression to remove the ambiguity.
</p>
</recommendation>
<example>
<p>
Consider this regular expression:
</p>
<sample language="javascript">
/^_(__|.)+_$/
</sample>
<p>
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting
of an odd number of underscores followed by some other character will cause the regular
expression engine to run for an exponential amount of time before rejecting the input.
</p>
<p>
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
the two branches of the alternative inside the repetition:
</p>
<sample language="javascript">
/^_(__|[^_])+_$/
</sample>
</example>
<example>
<p>
Consider this regular expression:
</p>
<sample language="javascript">
/^_(__|.)+_$/
</sample>
<p>
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting
of an odd number of underscores followed by some other character will cause the regular
expression engine to run for an exponential amount of time before rejecting the input.
</p>
<p>
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
the two branches of the alternative inside the repetition:
</p>
<sample language="javascript">
/^_(__|[^_])+_$/
</sample>
</example>
<include src="ReDoSReferences.qhelp"/>
<references>
<li>
OWASP:
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.
</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li>
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke:
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>.
</li>
</references>
</qhelp>