mirror of
https://github.com/github/codeql.git
synced 2025-12-21 19:26:31 +01:00
Update docs to be about Java
@@ -14,13 +14,13 @@

 </p>

-<sample language="python">
-re.sub(r"^\s+|\s+$", "", text) # BAD
+<sample language="java">
+Pattern.compile("^\\s+|\\s+$").matcher(text).replaceAll("") // BAD
 </sample>

 <p>

-The sub-expression <code>"\s+$"</code> will match the
+The sub-expression <code>"\\s+$"</code> will match the
 whitespace characters in <code>text</code> from left to right, but it
 can start matching anywhere within a whitespace sequence. This is
 problematic for strings that do <strong>not</strong> end with a whitespace
@@ -45,14 +45,14 @@
 Avoid this problem by rewriting the regular expression to
 not contain the ambiguity about when to start matching whitespace
 sequences. For instance, by using a negative look-behind
-(<code>^\s+|(?<!\s)\s+$</code>), or just by using the built-in strip
-method (<code>text.strip()</code>).
+(<code>"^\\s+|(?<!\\s)\\s+$"</code>), or just by using the built-in trim
+method (<code>text.trim()</code>).

 </p>

 <p>

-Note that the sub-expression <code>"^\s+"</code> is
+Note that the sub-expression <code>"^\\s+"</code> is
 <strong>not</strong> problematic as the <code>^</code> anchor restricts
 when that sub-expression can start matching, and as the regular
 expression engine matches from left to right.
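As an illustration of the recommendation in the hunk above, the following is a minimal sketch comparing the look-behind pattern with the built-in `trim` method. The class and method names (`TrimWhitespace`, `stripWithRegex`, `stripWithTrim`) are hypothetical and not part of the commit.

```java
import java.util.regex.Pattern;

class TrimWhitespace {
    // Look-behind variant from the documentation: the trailing branch can
    // only start at the first whitespace character of a whitespace run.
    private static final Pattern EDGE_WHITESPACE =
            Pattern.compile("^\\s+|(?<!\\s)\\s+$");

    // Removes leading and trailing whitespace using the regular expression.
    static String stripWithRegex(String text) {
        return EDGE_WHITESPACE.matcher(text).replaceAll("");
    }

    // Removes leading and trailing whitespace using the built-in method.
    static String stripWithTrim(String text) {
        return text.trim();
    }

    public static void main(String[] args) {
        String text = "  hello world \t\n";
        System.out.println("[" + stripWithRegex(text) + "]");
        System.out.println("[" + stripWithTrim(text) + "]");
    }
}
```

The two approaches are not exactly equivalent: `trim()` removes every leading and trailing character with a code point of U+0020 or below, while `\s` by default matches only space, tab, newline, vertical tab, form feed, and carriage return. For ordinary text the results agree.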
@@ -70,8 +70,8 @@
 using scientific notation:
 </p>

-<sample language="python">
-^0\.\d+E?\d+$ # BAD
+<sample language="java">
+"^0\\.\\d+E?\\d+$"
 </sample>

 <p>
@@ -97,7 +97,7 @@

 To make the processing faster, the regular expression
 should be rewritten such that the two <code>\d+</code> sub-expressions
-do not have overlapping matches: <code>^0\.\d+(E\d+)?$</code>.
+do not have overlapping matches: <code>"^0\\.\\d+(E\\d+)?$"</code>.

 </p>

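The rewritten expression from the hunk above can be exercised with a short sketch like the following. The class name `ScientificNotation` and the method `isFraction` are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

class ScientificNotation {
    // Rewritten pattern: the exponent digits are only reachable after an
    // explicit 'E', so the two \d+ sub-expressions cannot overlap.
    private static final Pattern FRACTION =
            Pattern.compile("^0\\.\\d+(E\\d+)?$");

    // Returns true when the whole input is a fraction such as 0.125 or 0.125E10.
    static boolean isFraction(String input) {
        return FRACTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFraction("0.125"));
        System.out.println(isFraction("0.125E10"));
        System.out.println(isFraction("0.125E"));
    }
}
```

Because the optional group `(E\d+)?` is entered only at an `E`, a long run of digits can be split between the two `\d+` parts in at most one way, which removes the overlap the documentation describes.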
@@ -10,7 +10,7 @@
 <p>
 Consider this regular expression:
 </p>
-<sample language="python">
+<sample language="java">
 ^_(__|.)+_$
 </sample>
 <p>
@@ -24,7 +24,7 @@
 This problem can be avoided by rewriting the regular expression to remove the ambiguity between
 the two branches of the alternative inside the repetition:
 </p>
-<sample language="python">
+<sample language="java">
 ^_(__|[^_])+_$
 </sample>
 </example>
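A minimal Java sketch of the unambiguous pattern from the hunk above, including an input of the kind that the documentation describes as troublesome for the original pattern. The names `UnderscoreDelimited`, `isDelimited`, and `nearMiss` are hypothetical.

```java
import java.util.regex.Pattern;

class UnderscoreDelimited {
    // Unambiguous variant: a single underscore can no longer be consumed by
    // the second branch, so each "__" pair is matched in exactly one way.
    private static final Pattern DELIMITED =
            Pattern.compile("^_(__|[^_])+_$");

    static boolean isDelimited(String input) {
        return DELIMITED.matcher(input).matches();
    }

    // Builds "_" followed by the given number of "__" pairs and a final '!',
    // which can never match because the input does not end in an underscore.
    static String nearMiss(int pairs) {
        StringBuilder sb = new StringBuilder("_");
        for (int i = 0; i < pairs; i++) {
            sb.append("__");
        }
        return sb.append('!').toString();
    }

    public static void main(String[] args) {
        System.out.println(isDelimited("_abc_"));
        System.out.println(isDelimited("_a__b_"));
        System.out.println(isDelimited(nearMiss(30)));
    }
}
```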
@@ -17,7 +17,7 @@

 <p>

-The regular expression engine provided by Python uses a backtracking non-deterministic finite
+The regular expression engine provided by Java uses a backtracking non-deterministic finite
 automata to implement regular expression matching. While this approach
 is space-efficient and allows supporting advanced features like
 capture groups, it is not time-efficient in general. The worst-case
@@ -38,6 +38,11 @@
 references.

 </p>
+
+<p>
+Note that Java versions 9 and above have some mitigations against ReDoS; however they aren't perfect
+and more complex regular expressions can still be affected by this problem.
+</p>
 </overview>

 <recommendation>
@@ -48,6 +53,8 @@
 ensure that the strings matched with the regular expression are short
 enough that the time-complexity does not matter.

+Alternatively, a regex library that guarantees linear-time execution, such as Google's RE2J, may be used.
+
 </p>

 </recommendation>
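The first recommendation in the hunk above, keeping matched strings short, could be sketched in Java roughly as follows. The class name `BoundedMatcher`, the method `matchesBounded`, and the limit `MAX_INPUT_LENGTH` are assumptions made for this illustration. A suitable limit depends on the pattern and the application.

```java
import java.util.regex.Pattern;

class BoundedMatcher {
    // Hypothetical upper bound on input length; tune it to the application.
    static final int MAX_INPUT_LENGTH = 256;

    private static final Pattern FRACTION =
            Pattern.compile("^0\\.\\d+(E\\d+)?$");

    // Rejects null or overly long input before the regex engine sees it,
    // so worst-case matching time is bounded by the length limit.
    static boolean matchesBounded(String input) {
        if (input == null || input.length() > MAX_INPUT_LENGTH) {
            return false;
        }
        return FRACTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesBounded("0.5E3"));
    }
}
```

Treating over-long input as a non-match is one possible policy. Depending on the caller, throwing an exception or reporting a validation error may be more appropriate.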