Update docs to be about Java

Joe Farebrother
2022-02-22 17:10:20 +00:00
parent c312b4b6b0
commit 5364001aa2
3 changed files with 19 additions and 12 deletions


@@ -14,13 +14,13 @@
</p>
-<sample language="python">
-re.sub(r"^\s+|\s+$", "", text) # BAD
+<sample language="java">
+Pattern.compile("^\\s+|\\s+$").matcher(text).replaceAll("") // BAD
</sample>
<p>
-The sub-expression <code>"\s+$"</code> will match the
+The sub-expression <code>"\\s+$"</code> will match the
whitespace characters in <code>text</code> from left to right, but it
can start matching anywhere within a whitespace sequence. This is
problematic for strings that do <strong>not</strong> end with a whitespace
@@ -45,14 +45,14 @@
Avoid this problem by rewriting the regular expression to
not contain the ambiguity about when to start matching whitespace
sequences. For instance, by using a negative look-behind
-(<code>^\s+|(?&lt;!\s)\s+$</code>), or just by using the built-in strip
-method (<code>text.strip()</code>).
+(<code>"^\\s+|(?&lt;!\\s)\\s+$"</code>), or just by using the built-in trim
+method (<code>text.trim()</code>).
</p>
<p>
-Note that the sub-expression <code>"^\s+"</code> is
+Note that the sub-expression <code>"^\\s+"</code> is
<strong>not</strong> problematic as the <code>^</code> anchor restricts
when that sub-expression can start matching, and as the regular
expression engine matches from left to right.
@@ -70,8 +70,8 @@
using scientific notation:
</p>
-<sample language="python">
-^0\.\d+E?\d+$ # BAD
+<sample language="java">
+"^0\\.\\d+E?\\d+$" // BAD
</sample>
<p>
@@ -97,7 +97,7 @@
To make the processing faster, the regular expression
should be rewritten such that the two <code>\d+</code> sub-expressions
-do not have overlapping matches: <code>^0\.\d+(E\d+)?$</code>.
+do not have overlapping matches: <code>"^0\\.\\d+(E\\d+)?$"</code>.
</p>
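As a rough check of the two rewrites recommended in this file, the sketch below compiles them with `java.util.regex.Pattern`. The class and method names (`RewrittenPatternsDemo`, `trimWithRegex`, `isDecimal`) are hypothetical and chosen only for illustration. The comparison with `String.trim()` is meant for ASCII whitespace such as the input used here; `\s` and `trim()` do not treat every character identically (`trim()` also removes other control characters up to U+0020).

```java
import java.util.regex.Pattern;

class RewrittenPatternsDemo {
    // Look-behind rewrite: the trailing branch can only start at a whitespace
    // character that is not itself preceded by whitespace.
    static final Pattern TRIM = Pattern.compile("^\\s+|(?<!\\s)\\s+$");

    // Rewrite of ^0\.\d+E?\d+$ : exponent digits are only matched after an
    // explicit 'E', so the two \d+ runs can no longer split the same digits.
    static final Pattern DECIMAL = Pattern.compile("^0\\.\\d+(E\\d+)?$");

    // Removes leading and trailing whitespace using the look-behind pattern.
    static String trimWithRegex(String text) {
        return TRIM.matcher(text).replaceAll("");
    }

    // Whole-string match against the rewritten decimal pattern.
    static boolean isDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String text = "  hello world \t\n";
        // Both lines print "hello world" for this ASCII-whitespace input.
        System.out.println(trimWithRegex(text));
        System.out.println(text.trim());

        System.out.println(isDecimal("0.5"));    // true
        System.out.println(isDecimal("0.12E3")); // true
        System.out.println(isDecimal("0.12E"));  // false
    }
}
```

The decimal rewrite is not an exact drop-in replacement: the original pattern needs at least two digits after the point, so `"0.5"` matches only the rewritten form.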


@@ -10,7 +10,7 @@
<p>
Consider this regular expression:
</p>
-<sample language="python">
+<sample language="java">
^_(__|.)+_$
</sample>
<p>
@@ -24,7 +24,7 @@
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
the two branches of the alternative inside the repetition:
</p>
-<sample language="python">
+<sample language="java">
^_(__|[^_])+_$
</sample>
</example>
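A minimal sketch of the rewritten pattern `^_(__|[^_])+_$` in Java follows. The class name `UnderscoreRegexDemo` and the test strings are hypothetical, and `String.repeat` assumes Java 11 or later. Only the rewritten pattern is run; the original `^_(__|.)+_$` is deliberately not exercised on the hostile input, since that is the case the text describes as exponentially slow.

```java
import java.util.regex.Pattern;

class UnderscoreRegexDemo {
    // Rewritten pattern: [^_] can never match '_', so at each position at most
    // one branch of the alternation can succeed and backtracking stays linear.
    static final Pattern SAFE = Pattern.compile("^_(__|[^_])+_$");

    static boolean matchesSafe(String s) {
        return SAFE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesSafe("_abc_"));  // true
        System.out.println(matchesSafe("_a__b_")); // true
        // Odd run of underscores followed by another character: the input
        // shape described as problematic for the original pattern.
        String hostile = "_" + "_".repeat(25) + "x";
        System.out.println(matchesSafe(hostile));  // false
    }
}
```

Because `[^_]` is narrower than `.`, the rewrite also accepts fewer strings: `"_a_b_"` matches the original pattern but not the rewritten one, so callers relying on single interior underscores would need a different rewrite.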


@@ -17,7 +17,7 @@
<p>
-The regular expression engine provided by Python uses a backtracking non-deterministic finite
+The regular expression engine provided by Java uses a backtracking non-deterministic finite
automaton to implement regular expression matching. While this approach
is space-efficient and allows supporting advanced features like
capture groups, it is not time-efficient in general. The worst-case
@@ -38,6 +38,11 @@
references.
</p>
+<p>
+Note that Java versions 9 and above include some mitigations against ReDoS; however, they are not complete,
+and more complex regular expressions can still be affected by this problem.
+</p>
</overview>
<recommendation>
@@ -48,6 +53,8 @@
ensure that the strings matched with the regular expression are short
enough that the time-complexity does not matter.
+Alternatively, a regular expression library that guarantees linear-time matching, such as Google's RE2J, may be used.
</p>
</recommendation>
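The recommendation to keep matched strings short can be sketched as a length check in front of the matcher. In this sketch, `BoundedMatch`, `matchesBounded`, and the limit `MAX_INPUT_LENGTH = 1_000` are hypothetical names and values for illustration; a suitable bound depends on the pattern and the application. It guards the problematic decimal pattern from the first file, and `String.repeat` assumes Java 11 or later.

```java
import java.util.regex.Pattern;

class BoundedMatch {
    // Hypothetical limit for illustration only; a suitable value depends on
    // the pattern's worst-case behaviour and the application's time budget.
    static final int MAX_INPUT_LENGTH = 1_000;

    // Problematic pattern: its two \d+ runs overlap, so a failing match can
    // take time polynomial in the input length.
    static final Pattern PATTERN = Pattern.compile("^0\\.\\d+E?\\d+$");

    // Rejects overlong input before it reaches the regex engine, so the
    // polynomial worst case is only paid on inputs of bounded size.
    static boolean matchesBounded(Pattern pattern, String input) {
        if (input.length() > MAX_INPUT_LENGTH) {
            return false;
        }
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesBounded(PATTERN, "0.25"));                  // true
        System.out.println(matchesBounded(PATTERN, "0." + "1".repeat(5000))); // false (too long)
    }
}
```

A length bound only helps when the worst case grows polynomially; for exponential patterns such as `^_(__|.)+_$`, even short inputs can be slow, so rewriting the pattern remains the safer fix.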