Commit Graph

31 Commits

Author SHA1 Message Date
Tony Torralba
d87c8c75d6 Python: Remove omittable exists variables 2023-01-10 13:37:35 +01:00
erik-krogh
6c8b1cf4be changes based on Python review 2022-12-19 11:20:31 +01:00
erik-krogh
ba7321ac5c add qldoc to RegExpCharEscape 2022-12-18 17:23:45 +01:00
erik-krogh
26c5480ee6 share {js,rb}/regex/missing-regexp-anchor 2022-12-18 17:23:41 +01:00
erik-krogh
f67d0bc8c0 put the shared HostnameRegexp code in the shared regex pack 2022-12-17 17:26:18 +01:00
erik-krogh
95f35196e4 add missing additional keywords 2022-11-23 20:45:51 +01:00
Erik Krogh Kristensen
99636ba344 fix typo
Co-authored-by: yoff <lerchedahl@gmail.com>
2022-11-14 17:35:55 +01:00
erik-krogh
05605480ae drive-by simplification of the python regex-tree 2022-11-07 14:31:27 +01:00
erik-krogh
1aeaefca7f add a Python implementation of RegexTreeViewSig 2022-11-07 14:31:27 +01:00
erik-krogh
5fbcbbc584 move existing regex-tree into a module 2022-11-07 14:31:27 +01:00
erik-krogh
7675571daa fix RegExpEscape::getValue having multiple results for some escapes 2022-09-27 13:25:23 +02:00
Taus
38b8640582 Python: Fix bad join in RegExpBackRef::getGroup
Although this wasn't (as far as I know) causing any performance issues,
it was making the join-order badness report quite noisy, and so I
figured it was worth fixing.

Before:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@d3441d0b after 84ms:
1501195 ~3%     {2} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'result'
149     ~0%     {5} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Rhs.3, Lhs.1 'result', Lhs.0 'this'
149     ~1%     {3} r3 = JOIN r2 WITH regex::RegexString::numbered_backreference#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.3 'result', Rhs.3, Lhs.4 'this'
4       ~0%     {2} r4 = JOIN r3 WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.0 'result'

1501195 ~3%     {2} r5 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'result'
42526   ~0%     {5} r6 = JOIN r5 WITH RegexTreeView::RegExpGroup#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result', Rhs.1, Rhs.2, Rhs.3
22      ~0%     {8} r7 = JOIN r6 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.4, Lhs.1 'result', Lhs.0 'this', Rhs.1, Rhs.2, Rhs.3
0       ~0%     {6} r8 = JOIN r7 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.5, Lhs.6, Lhs.7, Rhs.3, Lhs.3 'result', Lhs.4 'this'
0       ~0%     {2} r9 = JOIN r8 WITH regex::RegexString::named_backreference#dispred#f0820431#ffff ON FIRST 4 OUTPUT Lhs.5 'this', Lhs.4 'result'

4       ~0%     {2} r10 = r4 UNION r9
                return r10
```

In this case I opted for a classical solution: tying together the
literal and number (or name) part of the backreference in order to
encourage a two-column join.

After:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@b0cc4d5n after 0ms:
898  ~1%     {3} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0 'result'
4    ~0%     {2} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef::hasLiteralAndNumber#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'

1110 ~0%     {5} r3 = JOIN RegexTreeView::RegExpGroup#31aac2a7#ffff WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0 'result', Rhs.1
146  ~0%     {3} r4 = JOIN r3 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.4, Rhs.3, Lhs.3 'result'
0    ~0%     {2} r5 = JOIN r4 WITH RegexTreeView::RegExpBackRef::hasLiteralAndName#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'

4    ~0%     {2} r6 = r2 UNION r5
            return r6
```
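
The effect described in this message can be illustrated outside QL. The Python sketch below is only an analogy, not the CodeQL evaluator or the actual library change: the relation layouts, the sample data, and the function names are all assumptions made for illustration. It compares joining on the literal alone and filtering on the group number afterwards with joining on the (literal, number) pair in one step, and counts the intermediate tuples each strategy touches.

```python
from collections import defaultdict

# Hypothetical relations, invented for this sketch:
#   GROUPS:   (literal, group_number, group_id)
#   BACKREFS: (literal, referenced_number, backref_id)
GROUPS = [
    ("re1", 1, "g1"), ("re1", 2, "g2"), ("re1", 3, "g3"),
    ("re2", 1, "g4"), ("re2", 2, "g5"),
]
BACKREFS = [("re1", 2, "b1"), ("re2", 1, "b2")]


def join_on_literal_then_filter(backrefs, groups):
    """Join on the literal column only, then filter on the number."""
    by_literal = defaultdict(list)
    for literal, number, group_id in groups:
        by_literal[literal].append((number, group_id))

    intermediate = 0
    results = []
    for literal, number, backref_id in backrefs:
        # Every group in the same literal is visited, matching or not.
        for group_number, group_id in by_literal.get(literal, []):
            intermediate += 1
            if group_number == number:
                results.append((backref_id, group_id))
    return results, intermediate


def join_on_literal_and_number(backrefs, groups):
    """Join on both columns at once, using a composite key."""
    by_key = defaultdict(list)
    for literal, number, group_id in groups:
        by_key[(literal, number)].append(group_id)

    intermediate = 0
    results = []
    for literal, number, backref_id in backrefs:
        # Only groups that already agree on literal and number are visited.
        for group_id in by_key.get((literal, number), []):
            intermediate += 1
            results.append((backref_id, group_id))
    return results, intermediate


one_col_results, one_col_tuples = join_on_literal_then_filter(BACKREFS, GROUPS)
two_col_results, two_col_tuples = join_on_literal_and_number(BACKREFS, GROUPS)
```

Both strategies return the same pairs; only the number of intermediate tuples differs. That mirrors, in a much simplified form, the drop in tuple counts between the two evaluations above.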
2022-06-28 16:51:09 +00:00
Arthur Baars
74aea81fe3 Ruby: refactor regex libraries 2022-03-24 11:37:02 +01:00
Arthur Baars
79cd7bf8ed Python: create semmle/python/dataflow/new/Regex.qll 2022-03-21 15:57:19 +01:00
Arthur Baars
9412b331db Revert "Revert "Python: switch to shared implementation of IncompleteHostnameRegExp.ql""
This reverts commit 6d24591416.
2022-03-18 16:31:22 +01:00
Arthur Baars
5044f89105 Ruby/Python re-introduce normalCharacterSequence 2022-02-25 18:43:43 +01:00
Arthur Baars
9d9abaf1f9 Apply suggestions from code review
Co-authored-by: yoff <lerchedahl@gmail.com>
2022-02-25 12:27:20 +01:00
Arthur Baars
69ed121ecb Ruby/Python: regex parser: group sequences of 'normal' characters 2022-02-22 16:15:33 +01:00
Nick Rolfe
df6ba43cca Python: treat \A, \Z, \b, \B as special chars, not escapes 2021-11-19 15:49:53 +00:00
Erik Krogh Kristensen
62e729501c make the RegExpEscape::getUnescaped predicate public in python 2021-10-26 15:25:14 +02:00
Erik Krogh Kristensen
44afa34e37 Merge branch 'main' of github.com:github/codeql into htmlReg 2021-10-26 14:46:27 +02:00
Rasmus Wriedt Larsen
7cd5e681dd Merge pull request #6693 from yoff/python/promote-regex-injection
Python: Promote `py/regex-injection`
2021-10-14 14:49:05 +02:00
Taus
a9c8163ab3 Python: Fix uses of implicit this
Quoting the style guide:

"14. _Always_ qualify _calls_ to predicates of the same class with
`this`."
2021-10-13 13:43:36 +00:00
Erik Krogh Kristensen
01e345c2cc implement RegExpWordBoundary in RegexTreeView 2021-09-21 12:13:37 +02:00
Erik Krogh Kristensen
8535e6f281 use toUnicode in RegexTreeView 2021-09-21 12:13:37 +02:00
Rasmus Lerchedahl Petersen
64685f31dc Python: Add missing qldoc
Also do some general cleanup
How was this allowed to be committed in the first place?
2021-09-16 16:51:43 +02:00
Taus
b99c075282 Merge pull request #6460 from yoff/python-regex-parsing-consistency-checks
Python: Add regex parsing consistency checks
2021-09-07 13:33:59 +02:00
Erik Krogh Kristensen
df04c5044c use concat instead of strictconcat in RegexTreeView.qll 2021-09-02 08:54:39 +02:00
Rasmus Lerchedahl Petersen
a01fca5d48 Merge branch 'main' of github.com:github/codeql into python-regex-parsing-consistency-checks
To fix conflicts
2021-08-30 18:40:12 +02:00
Erik Krogh Kristensen
f5a1a12435 support case insensitive regexps in the ReDoS queries 2021-08-30 09:59:33 +02:00
Andrew Eisenberg
3660c64328 Packaging: Refactor Python core libraries
Extract the external facing `qll` files into the codeql/python-all
query pack.
2021-08-24 13:23:45 -07:00