mirror of
https://github.com/github/codeql.git
synced 2025-12-24 12:46:34 +01:00
Python: Fix performance issue in charSet
Observed on `mozilla/bugbug` on the 2.8.0 CLI branch, we had the
following line in the timing report:
```
FullServerSideRequestForgery.ql-17:regex::RegexString::charSet_dispred#fff#antijoin_rhs ............... 1m13s
```
Inspecting the logs, we see the following join:
```
(644s) Tuple counts for regex::RegexString::charSet_dispred#fff#antijoin_rhs/5@f295d1bk after 1m13s:
1 ~0% {1} r1 = CONSTANT(unique string)["]"]
2389 ~4% {3} r2 = JOIN r1 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1', (Rhs.2 'arg1' + 1)
668873 ~0% {6} r3 = JOIN r2 WITH regex::RegexString::char_set_start_dispred#fff ON FIRST 1 OUTPUT Lhs.0 'arg0', "]", Lhs.1 'arg1', Lhs.2 'arg2', Rhs.1 'arg3', Rhs.2 'arg4'
537501371 ~4% {7} r4 = JOIN r3 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_021#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3', Lhs.5 'arg4', "]", Rhs.2
269085087 ~0% {7} r5 = SELECT r4 ON In.6 > In.4 'arg4'
89583155 ~3% {7} r6 = SELECT r5 ON In.6 < In.1 'arg1'
89583155 ~26634% {5} r7 = SCAN r6 OUTPUT In.0 'arg0', In.1 'arg1', In.2 'arg2', In.3 'arg3', In.4 'arg4'
return r7
```
Now, this is problematic not just because of the large intermediary join
but also because of the large number of tuples being materialised at the
end. The culprit in this case turns out to be this bit of `charSet`:
```
not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end)
```
Rewriting this to instead look for the minimum index at which a `]`
appears resulted in a much nicer join.
I also fixed up a similar issue surrounding the `\N` unicode escape.
Not that I think this will necessarily be relevant, but the `min`-based
solution is more robust either way.
This commit is contained in:
@@ -204,8 +204,7 @@ abstract class RegexString extends Expr {
|
||||
|
|
||||
end = inner_end + 1 and
|
||||
inner_end > inner_start and
|
||||
this.nonEscapedCharAt(inner_end) = "]" and
|
||||
not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end)
|
||||
inner_end = min(int i | this.nonEscapedCharAt(i) = "]" and inner_start < i)
|
||||
)
|
||||
}
|
||||
|
||||
@@ -344,9 +343,8 @@ abstract class RegexString extends Expr {
|
||||
this.escapingChar(start) and
|
||||
this.getChar(start + 1) = "N" and
|
||||
this.getChar(start + 2) = "{" and
|
||||
this.getChar(end - 1) = "}" and
|
||||
end > start and
|
||||
not exists(int i | start + 2 < i and i < end - 1 | this.getChar(i) = "}")
|
||||
end - 1 = min(int i | start + 2 < i and this.getChar(i) = "}")
|
||||
}
|
||||
|
||||
/**
|
||||
|
||||
Reference in New Issue
Block a user