Update docs to be about Java

2025-12-21 19:26:31 +01:00 · 2022-02-22 17:10:20 +00:00
parent c312b4b6b0
commit 5364001aa2
3 changed files with 19 additions and 12 deletions
--- a/java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.qhelp
+++ b/java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.qhelp
@@ -14,13 +14,13 @@

 		</p>

-		<sample language="python">
-			re.sub(r"^\s+|\s+$", "", text) # BAD
+		<sample language="java">
+			Pattern.compile("^\\s+|\\s+$").matcher(text).replaceAll("") // BAD
 		</sample>

 		<p>

-			The sub-expression <code>"\s+$"</code> will match the
+			The sub-expression <code>"\\s+$"</code> will match the
 			whitespace characters in <code>text</code> from left to right, but it
 			can start matching anywhere within a whitespace sequence. This is
 			problematic for strings that do <strong>not</strong> end with a whitespace
@@ -45,14 +45,14 @@
 			Avoid this problem by rewriting the regular expression to
 			not contain the ambiguity about when to start matching whitespace
 			sequences. For instance, by using a negative look-behind
-			(<code>^\s+|(?&lt;!\s)\s+$</code>), or just by using the built-in strip
-			method (<code>text.strip()</code>).
+			(<code>"^\\s+|(?&lt;!\\s)\\s+$"</code>), or just by using the built-in trim
+			method (<code>text.trim()</code>).

 		</p>

 		<p>

-			Note that the sub-expression <code>"^\s+"</code> is
+			Note that the sub-expression <code>"^\\s+"</code> is
 			<strong>not</strong> problematic as the <code>^</code> anchor restricts
 			when that sub-expression can start matching, and as the regular
 			expression engine matches from left to right.
@@ -70,8 +70,8 @@
 			using scientific notation:
 		</p>

-		<sample language="python">
-			^0\.\d+E?\d+$ # BAD
+		<sample language="java">
+			"^0\\.\\d+E?\\d+$" // BAD
 		</sample>

 		<p>
@@ -97,7 +97,7 @@

 			To make the processing faster, the regular expression
 			should be rewritten such that the two <code>\d+</code> sub-expressions
-			do not have overlapping matches: <code>^0\.\d+(E\d+)?$</code>.
+			do not have overlapping matches: <code>"^0\\.\\d+(E\\d+)?$"</code>.

 		</p>

--- a/java/ql/src/Security/CWE/CWE-730/ReDoS.qhelp
+++ b/java/ql/src/Security/CWE/CWE-730/ReDoS.qhelp
@@ -10,7 +10,7 @@
 		<p>
 			Consider this regular expression:
 		</p>
-		<sample language="python">
+		<sample language="java">
 			^_(__|.)+_$
 		</sample>
 		<p>
@@ -24,7 +24,7 @@
 			This problem can be avoided by rewriting the regular expression to remove the ambiguity between
 			the two branches of the alternative inside the repetition:
 		</p>
-		<sample language="python">
+		<sample language="java">
 			^_(__|[^_])+_$
 		</sample>
 	</example>
--- a/java/ql/src/Security/CWE/CWE-730/ReDoSIntroduction.inc.qhelp
+++ b/java/ql/src/Security/CWE/CWE-730/ReDoSIntroduction.inc.qhelp
@@ -17,7 +17,7 @@

 		<p>

-			The regular expression engine provided by Python uses a backtracking non-deterministic finite
+			The regular expression engine provided by Java uses a backtracking non-deterministic finite
 			automata to implement regular expression matching. While this approach
 			is space-efficient and allows supporting advanced features like
 			capture groups, it is not time-efficient in general. The worst-case
@@ -38,6 +38,11 @@
 			references.

 		</p>
+
+		<p>
+			Note that Java versions 9 and above have some mitigations against ReDoS; however, they are not perfect,
+			and more complex regular expressions can still be affected by this problem.
+		</p>
 	</overview>

 	<recommendation>
@@ -48,6 +53,8 @@
 			ensure that the strings matched with the regular expression are short
 			enough that the time-complexity does not matter.

+			Alternatively, a regex library that guarantees linear-time execution, such as Google's RE2/J, may be used instead.
+
 		</p>

 	</recommendation>
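
Review note: the two rewritten patterns that PolynomialReDoS.qhelp recommends in this commit can be exercised with a short Java sketch. The class and method names below (`RewrittenPatternsSketch`, `stripEdges`, `isScientific`) are hypothetical and exist only for illustration; the pattern strings are copied from the help text. This is a minimal check that the patterns compile and behave as described, not a benchmark of their running time.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the CodeQL repository.
class RewrittenPatternsSketch {
    // The negative look-behind only lets the trailing branch start at the
    // beginning of a whitespace run, which removes the ambiguity.
    static final Pattern EDGE_WHITESPACE =
            Pattern.compile("^\\s+|(?<!\\s)\\s+$");

    // The exponent is grouped with its digits, so the two \d+ parts
    // can no longer match overlapping stretches of the input.
    static final Pattern SCIENTIFIC =
            Pattern.compile("^0\\.\\d+(E\\d+)?$");

    // Removes leading and trailing whitespace using the rewritten pattern.
    static String stripEdges(String text) {
        return EDGE_WHITESPACE.matcher(text).replaceAll("");
    }

    // Returns true when the whole input matches the rewritten number pattern.
    static boolean isScientific(String text) {
        return SCIENTIFIC.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println("[" + stripEdges("  hello world \t\n") + "]");
        System.out.println(isScientific("0.12E5"));
        System.out.println(isScientific("0.12E"));
    }
}
```

Whitespace in the middle of the input is left alone, since neither branch can match there. For plain trimming, `text.trim()` as suggested in the help text avoids the regular expression entirely.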