Commit Graph

36 Commits

Author SHA1 Message Date
erik-krogh
0fdd06fff5 use my script to delete outdated deprecations 2024-09-03 20:30:58 +02:00
Taus
1c68c987b0 Python: Change all remaining occurrences of StrConst
Done using
```
git grep StrConst | xargs sed -i 's/StrConst/StringLiteral/g'
```
2024-04-22 12:00:09 +00:00
erik-krogh
a7f733ab8c move RegExpInterpretation into Concepts.qll 2023-05-01 10:42:15 +02:00
erik-krogh
2fad406b5c move StdLibRegExpInterpretation to Stdlib.qll 2023-05-01 10:42:15 +02:00
erik-krogh
a64848c022 simplify StdLibRegExpInterpretation to only consider re.compile, because the rest is handled by RegexExecution 2023-05-01 10:42:14 +02:00
erik-krogh
59cc90e547 move Regex into a ParseRegExp file, and rename the class to RegExp 2023-05-01 10:42:14 +02:00
erik-krogh
556bb41999 move all code to find Regex flag into a module 2023-05-01 10:42:14 +02:00
erik-krogh
f0254fc089 introduce RegExpInterpretation instead of RegexString, and move RegexTreeView.qll into a regexp folder 2023-05-01 10:42:13 +02:00
erik-krogh
e677b62241 use type-tracking instead of global dataflow for tracking regular expressions 2023-05-01 10:41:53 +02:00
erik-krogh
8bc8342c7c Py:don't parse regular expressions in system-code 2023-03-16 10:41:30 +01:00
erik-krogh
6e712b293a add tracking of strings to compile-sites for poly-redos, in the style of Ruby 2023-02-02 22:56:20 +01:00
Erik Krogh Kristensen
01f6862965 Merge pull request #11833 from erik-krogh/trackPyReg
PY: track string-constants to regular expression uses
2023-02-01 11:40:42 +01:00
Tony Torralba
d87c8c75d6 Python: Remove omittable exists variables 2023-01-10 13:37:35 +01:00
erik-krogh
10308f5875 track string-constants to regular expression uses 2023-01-06 13:17:31 +01:00
Josh Soref
441d5359cc spelling: recursion
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-10-13 11:21:09 -04:00
erik-krogh
5586c9a17e delete old deprecations 2022-08-16 22:27:15 +02:00
Asger F
181a53bd03 Python: Rename getAnImmediateUse -> asSource 2022-06-21 12:44:06 +02:00
Erik Krogh Kristensen
50bfc8eaa0 refactor uses of API::Node::getAUse() that should have been something else 2022-04-07 13:52:13 +02:00
Arthur Baars
af1d949d06 Merge pull request #8489 from aibaars/regex-refactor
Ruby: refactor regex libraries
2022-03-28 12:17:00 +02:00
Rasmus Wriedt Larsen
d51aaf2f91 Python: Import framework-modeling in regex.qll 2022-03-24 14:28:44 +01:00
Arthur Baars
1a9aaf4543 Apply suggestions from code review
Co-authored-by: yoff <lerchedahl@gmail.com>
2022-03-24 11:37:03 +01:00
Arthur Baars
74aea81fe3 Ruby: refactor regex libraries 2022-03-24 11:37:02 +01:00
Arthur Baars
5044f89105 Ruby/Python re-introduce normalCharacterSequence 2022-02-25 18:43:43 +01:00
Arthur Baars
9d9abaf1f9 Apply suggestions from code review
Co-authored-by: yoff <lerchedahl@gmail.com>
2022-02-25 12:27:20 +01:00
Arthur Baars
69ed121ecb Ruby/Python: regex parser: group sequences of 'normal' characters 2022-02-22 16:15:33 +01:00
Taus
67be20f368 Python: Remove implied inequalities
Also gets rid of `inner_end`, since we're already doing `end - 1 = ...`
in the other fix (and so this is more consistent).
2022-02-04 12:46:06 +00:00
Taus
22aa4c9379 Python: Fix performance issue in charSet
Observed on `mozilla/bugbug` on the 2.8.0 CLI branch, we had the
following line in the timing report:
```
FullServerSideRequestForgery.ql-17:regex::RegexString::charSet_dispred#fff#antijoin_rhs ............... 1m13s
```

Inspecting the logs, we see the following join:

```
(644s) Tuple counts for regex::RegexString::charSet_dispred#fff#antijoin_rhs/5@f295d1bk after 1m13s:
1         ~0%         {1} r1 = CONSTANT(unique string)["]"]
2389      ~4%         {3} r2 = JOIN r1 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1', (Rhs.2 'arg1' + 1)
668873    ~0%         {6} r3 = JOIN r2 WITH regex::RegexString::char_set_start_dispred#fff ON FIRST 1 OUTPUT Lhs.0 'arg0', "]", Lhs.1 'arg1', Lhs.2 'arg2', Rhs.1 'arg3', Rhs.2 'arg4'
537501371 ~4%         {7} r4 = JOIN r3 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_021#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3', Lhs.5 'arg4', "]", Rhs.2
269085087 ~0%         {7} r5 = SELECT r4 ON In.6 > In.4 'arg4'
89583155  ~3%         {7} r6 = SELECT r5 ON In.6 < In.1 'arg1'
89583155  ~26634%     {5} r7 = SCAN r6 OUTPUT In.0 'arg0', In.1 'arg1', In.2 'arg2', In.3 'arg3', In.4 'arg4'
                    return r7
```
Now, this is problematic not just because of the large intermediary join
but also because of the large number of tuples being materialised at the
end. The culprit in this case turns out to be this bit of `charSet`:
```
not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end)
```

Rewriting this to instead look for the minimum index at which a `]`
appears resulted in a much nicer join.

I also fixed up a similar issue surrounding the `\N` unicode escape.
Not that I think this will necessarily be relevant, but the `min`-based
solution is more robust either way.
2022-02-03 20:42:04 +00:00
Erik Krogh Kristensen
3c59aa319e Merge pull request #7245 from erik-krogh/explicit-this-all-the-places
All langs: apply the explicit-this patch to all remaining code
2021-12-07 10:40:26 +01:00
Erik Krogh Kristensen
6ff8d4de5c add all remaining explicit this 2021-11-26 13:50:10 +01:00
Nick Rolfe
df6ba43cca Python: treat \A, \Z, \b, \B as special chars, not escapes 2021-11-19 15:49:53 +00:00
Erik Krogh Kristensen
44afa34e37 Merge branch 'main' of github.com:github/codeql into htmlReg 2021-10-26 14:46:27 +02:00
Erik Krogh Kristensen
a3c55c2aec use set literal instead of big disjunction of literals 2021-10-26 12:55:25 +02:00
Erik Krogh Kristensen
fd64ff9ef1 don't give group numbers to non-capturing groups 2021-09-21 12:15:27 +02:00
Rasmus Lerchedahl Petersen
d37c14880f Python: Copy performance fix 2021-09-14 15:15:50 +02:00
Rasmus Lerchedahl Petersen
a01fca5d48 Merge branch 'main' of github.com:github/codeql into python-regex-parsing-consistency-checks
To fix conflicts
2021-08-30 18:40:12 +02:00
Andrew Eisenberg
3660c64328 Packaging: Rafactor Python core libraries
Extract the external facing `qll` files into the codeql/python-all
query pack.
2021-08-24 13:23:45 -07:00