Mirror of https://github.com/github/codeql.git (synced 2025-12-24 04:36:35 +01:00)
Update docs to be about Java
@@ -14,13 +14,13 @@
|
|||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<sample language="python">
|
<sample language="java">
|
||||||
re.sub(r"^\s+|\s+$", "", text) # BAD
|
Pattern.compile("^\\s+|\\s+$").matcher(text).replaceAll("") // BAD
|
||||||
</sample>
|
</sample>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
|
|
||||||
The sub-expression <code>"\s+$"</code> will match the
|
The sub-expression <code>"\\s+$"</code> will match the
|
||||||
whitespace characters in <code>text</code> from left to right, but it
|
whitespace characters in <code>text</code> from left to right, but it
|
||||||
can start matching anywhere within a whitespace sequence. This is
|
can start matching anywhere within a whitespace sequence. This is
|
||||||
problematic for strings that do <strong>not</strong> end with a whitespace
|
problematic for strings that do <strong>not</strong> end with a whitespace
|
||||||
@@ -45,14 +45,14 @@
|
|||||||
Avoid this problem by rewriting the regular expression to
|
Avoid this problem by rewriting the regular expression to
|
||||||
not contain the ambiguity about when to start matching whitespace
|
not contain the ambiguity about when to start matching whitespace
|
||||||
sequences. For instance, by using a negative look-behind
|
sequences. For instance, by using a negative look-behind
|
||||||
(<code>^\s+|(?<!\s)\s+$</code>), or just by using the built-in strip
|
(<code>"^\\s+|(?<!\\s)\\s+$"</code>), or just by using the built-in trim
|
||||||
method (<code>text.strip()</code>).
|
method (<code>text.trim()</code>).
|
||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
|
|
||||||
Note that the sub-expression <code>"^\s+"</code> is
|
Note that the sub-expression <code>"^\\s+"</code> is
|
||||||
<strong>not</strong> problematic as the <code>^</code> anchor restricts
|
<strong>not</strong> problematic as the <code>^</code> anchor restricts
|
||||||
when that sub-expression can start matching, and as the regular
|
when that sub-expression can start matching, and as the regular
|
||||||
expression engine matches from left to right.
|
expression engine matches from left to right.
|
||||||
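The rewrite described in the hunk above can be sketched in Java as follows. This is a minimal illustration, not part of the commit: the class and method names are hypothetical, and the pattern is the look-behind variant quoted in the help text. Note that `String.trim()` and `\s` do not agree on every character, so the two methods are close but not strictly interchangeable.

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; names are illustrative only.
final class TrimWhitespaceSketch {
    // The negative look-behind (?<!\s) only lets the trailing branch start
    // at the first character of a whitespace run, removing the ambiguity
    // about where "\s+$" may begin matching.
    private static final Pattern EDGE_WHITESPACE =
            Pattern.compile("^\\s+|(?<!\\s)\\s+$");

    // Removes leading and trailing whitespace using the rewritten pattern.
    static String trimWithRegex(String text) {
        return EDGE_WHITESPACE.matcher(text).replaceAll("");
    }

    // The simpler alternative mentioned in the help text.
    static String trimBuiltIn(String text) {
        return text.trim();
    }

    public static void main(String[] args) {
        String sample = "  hello world \t";
        System.out.println("[" + trimWithRegex(sample) + "]");
        System.out.println("[" + trimBuiltIn(sample) + "]");
    }
}
```

Where the built-in method is sufficient, it is usually the simpler choice because no regular expression is evaluated at all.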
@@ -70,8 +70,8 @@
|
|||||||
using scientific notation:
|
using scientific notation:
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<sample language="python">
|
<sample language="java">
|
||||||
^0\.\d+E?\d+$ # BAD
|
"^0\\.\\d+E?\\d+$" // BAD
|
||||||
</sample>
|
</sample>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
@@ -97,7 +97,7 @@
|
|||||||
|
|
||||||
To make the processing faster, the regular expression
|
To make the processing faster, the regular expression
|
||||||
should be rewritten such that the two <code>\d+</code> sub-expressions
|
should be rewritten such that the two <code>\d+</code> sub-expressions
|
||||||
do not have overlapping matches: <code>^0\.\d+(E\d+)?$</code>.
|
do not have overlapping matches: <code>"^0\\.\\d+(E\\d+)?$"</code>.
|
||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
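As a rough illustration of the rewritten scientific-notation pattern from the hunk above, the following sketch compiles it once and uses it as a validator. The class and method names are hypothetical; only the pattern string comes from the help text.

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; names are illustrative only.
final class ScientificNotationSketch {
    // The exponent digits are only attempted after a literal 'E', so the
    // two \d+ sub-expressions can no longer match the same digits.
    private static final Pattern SAFE_NUMBER =
            Pattern.compile("^0\\.\\d+(E\\d+)?$");

    // Returns true when the input matches the rewritten pattern.
    static boolean isValid(String input) {
        return SAFE_NUMBER.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("0.25"));
        System.out.println(isValid("0.25E3"));
        System.out.println(isValid("0.25E"));
    }
}
```

With this form, a long run of digits followed by an unexpected character is rejected after a single backward pass over the digits, instead of the engine retrying every way of splitting the digits between two `\d+` groups.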
|
|||||||
@@ -10,7 +10,7 @@
|
|||||||
<p>
|
<p>
|
||||||
Consider this regular expression:
|
Consider this regular expression:
|
||||||
</p>
|
</p>
|
||||||
<sample language="python">
|
<sample language="java">
|
||||||
^_(__|.)+_$
|
^_(__|.)+_$
|
||||||
</sample>
|
</sample>
|
||||||
<p>
|
<p>
|
||||||
@@ -24,7 +24,7 @@
|
|||||||
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
|
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
|
||||||
the two branches of the alternative inside the repetition:
|
the two branches of the alternative inside the repetition:
|
||||||
</p>
|
</p>
|
||||||
<sample language="python">
|
<sample language="java">
|
||||||
^_(__|[^_])+_$
|
^_(__|[^_])+_$
|
||||||
</sample>
|
</sample>
|
||||||
</example>
|
</example>
|
||||||
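The disambiguated pattern from the hunk above can be exercised with a small sketch like the one below. The class and method names are hypothetical and only the pattern string is taken from the help text.

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; names are illustrative only.
final class UnderscoreDelimitedSketch {
    // Inside the repetition, "__" and "[^_]" can never match the same
    // text, so each position has at most one way to be consumed.
    private static final Pattern SAFE =
            Pattern.compile("^_(__|[^_])+_$");

    // Returns true when the input matches the rewritten pattern.
    static boolean matches(String input) {
        return SAFE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("_a_"));
        System.out.println(matches("_a__b_"));
        System.out.println(matches("__"));
    }
}
```

The original pattern used `.` as the second branch, which also matches an underscore; that overlap with `__` is what allowed many different ways to consume a run of underscores.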
|
|||||||
@@ -17,7 +17,7 @@
|
|||||||
|
|
||||||
<p>
|
<p>
|
||||||
|
|
||||||
The regular expression engine provided by Python uses a backtracking non-deterministic finite
|
The regular expression engine provided by Java uses a backtracking non-deterministic finite
|
||||||
automaton to implement regular expression matching. While this approach
|
automaton to implement regular expression matching. While this approach
|
||||||
is space-efficient and allows supporting advanced features like
|
is space-efficient and allows supporting advanced features like
|
||||||
capture groups, it is not time-efficient in general. The worst-case
|
capture groups, it is not time-efficient in general. The worst-case
|
||||||
@@ -38,6 +38,11 @@
|
|||||||
references.
|
references.
|
||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Note that Java versions 9 and above have some mitigations against ReDoS; however, they aren't perfect
|
||||||
|
and more complex regular expressions can still be affected by this problem.
|
||||||
|
</p>
|
||||||
</overview>
|
</overview>
|
||||||
|
|
||||||
<recommendation>
|
<recommendation>
|
||||||
@@ -48,6 +53,8 @@
|
|||||||
ensure that the strings matched with the regular expression are short
|
ensure that the strings matched with the regular expression are short
|
||||||
enough that the time-complexity does not matter.
|
enough that the time-complexity does not matter.
|
||||||
|
|
||||||
|
Alternatively, a regex library that guarantees linear-time execution, such as Google's RE2J, may be used.
|
||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
</recommendation>
|
</recommendation>
|
||||||
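The recommendation to keep matched strings short could be sketched as a simple length guard in front of the match. Everything here is an assumption for illustration: the class name, the helper `findBounded`, and the limit of 1000 characters are hypothetical and would need to be chosen per application.

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; names and the limit are illustrative only.
final class BoundedMatchSketch {
    // Assumed upper bound on input length; not a value from the help text.
    static final int MAX_INPUT_LENGTH = 1000;

    // Rejects over-long inputs before the regex engine ever sees them.
    static boolean findBounded(Pattern pattern, CharSequence input) {
        if (input.length() > MAX_INPUT_LENGTH) {
            return false;
        }
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern number = Pattern.compile("^0\\.\\d+(E\\d+)?$");
        System.out.println(findBounded(number, "0.25E3"));
    }
}
```

A guard like this bounds the cost of a match but does not remove the underlying ambiguity, so rewriting the pattern remains the more robust fix where it is possible.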
|
|||||||