mirror of
https://github.com/github/codeql.git
synced 2025-12-24 04:36:35 +01:00
Merge pull request #7723 from joefarebrother/redos
Java: Add ReDoS queries
This commit is contained in:
108
java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.qhelp
Normal file
108
java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.qhelp
Normal file
@@ -0,0 +1,108 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
|
||||
<qhelp>
|
||||
|
||||
<include src="ReDoSIntroduction.inc.qhelp" />
|
||||
|
||||
<example>
|
||||
<p>
|
||||
|
||||
Consider this use of a regular expression, which removes
|
||||
all leading and trailing whitespace in a string:
|
||||
|
||||
</p>
|
||||
|
||||
<sample language="java">
|
||||
Pattern.compile("^\\s+|\\s+$").matcher(text).replaceAll("") // BAD
|
||||
</sample>
|
||||
|
||||
<p>
|
||||
|
||||
The sub-expression <code>"\\s+$"</code> will match the
|
||||
whitespace characters in <code>text</code> from left to right, but it
|
||||
can start matching anywhere within a whitespace sequence. This is
|
||||
problematic for strings that do <strong>not</strong> end with a whitespace
|
||||
character. Such a string will force the regular expression engine to
|
||||
process each whitespace sequence once per whitespace character in the
|
||||
sequence.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
This ultimately means that the time cost of trimming a
|
||||
string is quadratic in the length of the string. So a string like
|
||||
<code>"a b"</code> will take milliseconds to process, but a similar
|
||||
string with a million spaces instead of just one will take several
|
||||
minutes.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
Avoid this problem by rewriting the regular expression to
|
||||
not contain the ambiguity about when to start matching whitespace
|
||||
sequences. For instance, by using a negative look-behind
|
||||
(<code>"^\\s+|(?<!\\s)\\s+$"</code>), or just by using the built-in trim
|
||||
method (<code>text.trim()</code>).
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
Note that the sub-expression <code>"^\\s+"</code> is
|
||||
<strong>not</strong> problematic as the <code>^</code> anchor restricts
|
||||
when that sub-expression can start matching, and as the regular
|
||||
expression engine matches from left to right.
|
||||
|
||||
</p>
|
||||
|
||||
</example>
|
||||
|
||||
<example>
|
||||
|
||||
<p>
|
||||
|
||||
As a similar, but slightly subtler problem, consider the
|
||||
regular expression that matches lines with numbers, possibly written
|
||||
using scientific notation:
|
||||
</p>
|
||||
|
||||
<sample language="java">
|
||||
"^0\\.\\d+E?\\d+$""
|
||||
</sample>
|
||||
|
||||
<p>
|
||||
|
||||
The problem with this regular expression is in the
|
||||
sub-expression <code>\d+E?\d+</code> because the second
|
||||
<code>\d+</code> can start matching digits anywhere after the first
|
||||
match of the first <code>\d+</code> if there is no <code>E</code> in
|
||||
the input string.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
This is problematic for strings that do <strong>not</strong>
|
||||
end with a digit. Such a string will force the regular expression
|
||||
engine to process each digit sequence once per digit in the sequence,
|
||||
again leading to a quadratic time complexity.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
To make the processing faster, the regular expression
|
||||
should be rewritten such that the two <code>\d+</code> sub-expressions
|
||||
do not have overlapping matches: <code>"^0\\.\\d+(E\\d+)?$"</code>.
|
||||
|
||||
</p>
|
||||
|
||||
</example>
|
||||
|
||||
<include src="ReDoSReferences.inc.qhelp"/>
|
||||
|
||||
</qhelp>
|
||||
24
java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.ql
Normal file
24
java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.ql
Normal file
@@ -0,0 +1,24 @@
|
||||
/**
|
||||
* @name Polynomial regular expression used on uncontrolled data
|
||||
* @description A regular expression that can require polynomial time
|
||||
* to match may be vulnerable to denial-of-service attacks.
|
||||
* @kind path-problem
|
||||
* @problem.severity warning
|
||||
* @security-severity 7.5
|
||||
* @precision high
|
||||
* @id java/polynomial-redos
|
||||
* @tags security
|
||||
* external/cwe/cwe-730
|
||||
* external/cwe/cwe-400
|
||||
*/
|
||||
|
||||
import java
|
||||
import semmle.code.java.security.performance.PolynomialReDoSQuery
|
||||
import DataFlow::PathGraph
|
||||
|
||||
from DataFlow::PathNode source, DataFlow::PathNode sink, PolynomialBackTrackingTerm regexp
|
||||
where hasPolynomialReDoSResult(source, sink, regexp)
|
||||
select sink, source, sink,
|
||||
"This $@ that depends on $@ may run slow on strings " + regexp.getPrefixMessage() +
|
||||
"with many repetitions of '" + regexp.getPumpString() + "'.", regexp, "regular expression",
|
||||
source.getNode(), "a user-provided value"
|
||||
34
java/ql/src/Security/CWE/CWE-730/ReDoS.qhelp
Normal file
34
java/ql/src/Security/CWE/CWE-730/ReDoS.qhelp
Normal file
@@ -0,0 +1,34 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
|
||||
<qhelp>
|
||||
|
||||
<include src="ReDoSIntroduction.inc.qhelp" />
|
||||
|
||||
<example>
|
||||
<p>
|
||||
Consider this regular expression:
|
||||
</p>
|
||||
<sample language="java">
|
||||
^_(__|.)+_$
|
||||
</sample>
|
||||
<p>
|
||||
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the
|
||||
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two
|
||||
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting
|
||||
of an odd number of underscores followed by some other character will cause the regular
|
||||
expression engine to run for an exponential amount of time before rejecting the input.
|
||||
</p>
|
||||
<p>
|
||||
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
|
||||
the two branches of the alternative inside the repetition:
|
||||
</p>
|
||||
<sample language="java">
|
||||
^_(__|[^_])+_$
|
||||
</sample>
|
||||
</example>
|
||||
|
||||
<include src="ReDoSReferences.inc.qhelp"/>
|
||||
|
||||
</qhelp>
|
||||
26
java/ql/src/Security/CWE/CWE-730/ReDoS.ql
Normal file
26
java/ql/src/Security/CWE/CWE-730/ReDoS.ql
Normal file
@@ -0,0 +1,26 @@
|
||||
/**
|
||||
* @name Inefficient regular expression
|
||||
* @description A regular expression that requires exponential time to match certain inputs
|
||||
* can be a performance bottleneck, and may be vulnerable to denial-of-service
|
||||
* attacks.
|
||||
* @kind problem
|
||||
* @problem.severity error
|
||||
* @security-severity 7.5
|
||||
* @precision high
|
||||
* @id java/redos
|
||||
* @tags security
|
||||
* external/cwe/cwe-730
|
||||
* external/cwe/cwe-400
|
||||
*/
|
||||
|
||||
import java
|
||||
import semmle.code.java.security.performance.ExponentialBackTracking
|
||||
|
||||
from RegExpTerm t, string pump, State s, string prefixMsg
|
||||
where
|
||||
hasReDoSResult(t, pump, s, prefixMsg) and
|
||||
// exclude verbose mode regexes for now
|
||||
not t.getRegex().getAMode() = "VERBOSE"
|
||||
select t,
|
||||
"This part of the regular expression may cause exponential backtracking on strings " + prefixMsg +
|
||||
"containing many repetitions of '" + pump + "'."
|
||||
61
java/ql/src/Security/CWE/CWE-730/ReDoSIntroduction.inc.qhelp
Normal file
61
java/ql/src/Security/CWE/CWE-730/ReDoSIntroduction.inc.qhelp
Normal file
@@ -0,0 +1,61 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
<qhelp>
|
||||
<overview>
|
||||
<p>
|
||||
|
||||
Some regular expressions take a long time to match certain
|
||||
input strings to the point where the time it takes to match a string
|
||||
of length <i>n</i> is proportional to <i>n<sup>k</sup></i> or even
|
||||
<i>2<sup>n</sup></i>. Such regular expressions can negatively affect
|
||||
performance, or even allow a malicious user to perform a Denial of
|
||||
Service ("DoS") attack by crafting an expensive input string for the
|
||||
regular expression to match.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
The regular expression engine provided by Java uses a backtracking non-deterministic finite
|
||||
automata to implement regular expression matching. While this approach
|
||||
is space-efficient and allows supporting advanced features like
|
||||
capture groups, it is not time-efficient in general. The worst-case
|
||||
time complexity of such an automaton can be polynomial or even
|
||||
exponential, meaning that for strings of a certain shape, increasing
|
||||
the input length by ten characters may make the automaton about 1000
|
||||
times slower.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
Typically, a regular expression is affected by this
|
||||
problem if it contains a repetition of the form <code>r*</code> or
|
||||
<code>r+</code> where the sub-expression <code>r</code> is ambiguous
|
||||
in the sense that it can match some string in multiple ways. More
|
||||
information about the precise circumstances can be found in the
|
||||
references.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Note that Java versions 9 and above have some mitigations against ReDoS; however they aren't perfect
|
||||
and more complex regular expressions can still be affected by this problem.
|
||||
</p>
|
||||
</overview>
|
||||
|
||||
<recommendation>
|
||||
|
||||
<p>
|
||||
|
||||
Modify the regular expression to remove the ambiguity, or
|
||||
ensure that the strings matched with the regular expression are short
|
||||
enough that the time-complexity does not matter.
|
||||
|
||||
Alternatively, an alternate regex library that guarantees linear time execution, such as Google's RE2J, may be used.
|
||||
|
||||
</p>
|
||||
|
||||
</recommendation>
|
||||
</qhelp>
|
||||
16
java/ql/src/Security/CWE/CWE-730/ReDoSReferences.inc.qhelp
Normal file
16
java/ql/src/Security/CWE/CWE-730/ReDoSReferences.inc.qhelp
Normal file
@@ -0,0 +1,16 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
<qhelp>
|
||||
<references>
|
||||
<li>
|
||||
OWASP:
|
||||
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.
|
||||
</li>
|
||||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
|
||||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li>
|
||||
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke:
|
||||
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>.
|
||||
</li>
|
||||
</references>
|
||||
</qhelp>
|
||||
Reference in New Issue
Block a user