Commit Graph

4722 Commits

Author SHA1 Message Date
Tom Hvitved
2de892bfd8 Python: Points-to performance improvements 2022-02-10 10:29:30 +01:00
Tamás Vajk
6483a92587 Merge pull request #7865 from github/post-release-prep/codeql-cli-2.8.0
Post-release preparation for codeql-cli-2.8.0
2022-02-09 16:42:38 +01:00
Rasmus Wriedt Larsen
9d5e8d5bd8 Merge pull request #7842 from RasmusWL/consistency-queires
Misc: Streamline `consistency-queries/qlpack.yml`
2022-02-09 13:42:18 +01:00
Tom Hvitved
9440a45015 Merge branch 'main' into post-release-prep/codeql-cli-2.8.0 2022-02-09 09:40:33 +01:00
Rasmus Wriedt Larsen
eb109828c0 Merge pull request #7252 from museljh/feature/cwe-338
Python: CWE-338 insecureRandomness
2022-02-07 19:30:06 +01:00
Rasmus Wriedt Larsen
32cd7d6fa7 Add groups to all consistency-queries/qlpack.yml
as discussed in PR review
2022-02-07 11:15:48 +01:00
github-actions[bot]
b4ab86c020 Post-release preparation for codeql-cli-2.8.0 2022-02-06 23:34:07 +00:00
yoff
182c62f5c3 Merge pull request #7838 from tausbn/python-fix-charset-performance-problem
Python: Fix performance issue in `charSet`
2022-02-04 14:18:13 +01:00
Taus
67be20f368 Python: Remove implied inequalities
Also gets rid of `inner_end`, since we're already doing `end - 1 = ...`
in the other fix (and so this is more consistent).
2022-02-04 12:46:06 +00:00
Rasmus Wriedt Larsen
c817ba5718 Python: Add consistency-queries/qlpack.yml
But no queries yet
2022-02-04 12:08:54 +01:00
Rasmus Wriedt Larsen
e9b496ba73 Merge pull request #7831 from RasmusWL/printast-remove-regexp
Python: Remove `RegExpTerm` from PrintAST
2022-02-04 11:38:58 +01:00
Harry Maclean
ab7fd89653 Merge pull request #7663 from github/hmac/api-graph-subclass
Ruby: Add basic subclassing support to API Graphs
2022-02-04 10:19:07 +13:00
Taus
22aa4c9379 Python: Fix performance issue in charSet
Observed on `mozilla/bugbug` on the 2.8.0 CLI branch, we had the
following line in the timing report:
```
FullServerSideRequestForgery.ql-17:regex::RegexString::charSet_dispred#fff#antijoin_rhs ............... 1m13s
```

Inspecting the logs, we see the following join:

```
(644s) Tuple counts for regex::RegexString::charSet_dispred#fff#antijoin_rhs/5@f295d1bk after 1m13s:
1         ~0%         {1} r1 = CONSTANT(unique string)["]"]
2389      ~4%         {3} r2 = JOIN r1 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1', (Rhs.2 'arg1' + 1)
668873    ~0%         {6} r3 = JOIN r2 WITH regex::RegexString::char_set_start_dispred#fff ON FIRST 1 OUTPUT Lhs.0 'arg0', "]", Lhs.1 'arg1', Lhs.2 'arg2', Rhs.1 'arg3', Rhs.2 'arg4'
537501371 ~4%         {7} r4 = JOIN r3 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_021#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3', Lhs.5 'arg4', "]", Rhs.2
269085087 ~0%         {7} r5 = SELECT r4 ON In.6 > In.4 'arg4'
89583155  ~3%         {7} r6 = SELECT r5 ON In.6 < In.1 'arg1'
89583155  ~26634%     {5} r7 = SCAN r6 OUTPUT In.0 'arg0', In.1 'arg1', In.2 'arg2', In.3 'arg3', In.4 'arg4'
                    return r7
```
Now, this is problematic not just because of the large intermediary join
but also because of the large number of tuples being materialised at the
end. The culprit in this case turns out to be this bit of `charSet`:
```
not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end)
```

Rewriting this to instead look for the minimum index at which a `]`
appears resulted in a much nicer join.

I also fixed up a similar issue surrounding the `\N` unicode escape.
Not that I think this will necessarily be relevant, but the `min`-based
solution is more robust either way.
2022-02-03 20:42:04 +00:00
Chuan-kai Lin
c8bc5cfa75 Merge pull request #7825 from github/cklin/python-downgrade-scripts
Python: adjust downgrade script location and format
2022-02-03 11:40:07 -08:00
Rasmus Wriedt Larsen
8386b36217 Python: Apply suggestions from code review
Co-authored-by: Nick Rolfe <nickrolfe@github.com>
2022-02-03 15:00:04 +01:00
Rasmus Wriedt Larsen
cf68148316 Python: Add change-note 2022-02-03 14:29:02 +01:00
Rasmus Wriedt Larsen
e2de0e61ca Python: Remove RegExpTerm from PrintAST
Since this caused bad performance (as we had to evaluate points-to).

Fixes https://github.com/github/codeql/issues/6964

This approach was motivated by the comment on the issue from @tausbn:

> We discussed this internally in the CodeQL Python team, and have
> agreed that the best approach for now is to disable the printing of
> regex ASTs.

I tried to keep our RegExpTerm logic, but doing the fix below did not
work, and still evaluated RegExpTerm :| I guess we will just have to
revert this PR if we want it back

```diff
   TRegExpTermNode(RegExpTerm term) {
+    none() and
     exists(StrConst str | term.getRootTerm() = getParsedRegExp(str) and shouldPrint(str, _))
   }
```
2022-02-03 14:22:14 +01:00
Erik Krogh Kristensen
e93c46ad31 Merge pull request #7811 from erik-krogh/pyApiIpa
Python: refactor API-graph labels to an IPA type
2022-02-03 12:31:39 +01:00
Tom Hvitved
6bb71f051b Merge pull request #7791 from hvitved/dataflow/inline-local-flow-star
Data flow: Inline `local(Expr|Instruction)?(Flow|Taint)`
2022-02-03 09:02:43 +01:00
Chuan-kai Lin
df91ee6616 Python: adjust downgrade script location and format 2022-02-02 14:23:21 -08:00
Arthur Baars
33b97f3e0c Update synchronized files 2022-02-02 13:30:45 +01:00
CodeQL CI
7bb11b837c Merge pull request #7788 from yoff/python/remove-library-annotation
Approved by tausbn
2022-02-02 03:51:00 -08:00
liangjinhuang
1dd15fa235 style:auto format 2022-02-02 01:30:54 +08:00
liangjinhuang
976e484c57 style:move all source files under src/experimental & feat:modify source regular matching rules 2022-02-02 01:14:51 +08:00
Erik Krogh Kristensen
e06f6529f1 refactor API-graph labels to an IPA type 2022-02-01 17:32:08 +01:00
liangjinhuang
1885b683f7 style:formatDocument 2022-02-02 00:21:26 +08:00
liangjinhuang
af2e8ff8c6 feat:modify source regular matching rules 2022-02-02 00:10:15 +08:00
museljh
012434b152 Update python/ql/src/experimental/Security/CWE-338/InsecureRandomness.ql
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2022-02-01 19:00:06 +08:00
museljh
a6002186bd Update python/ql/src/experimental/Security/CWE-338/InsecureRandomness.ql
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2022-02-01 18:59:12 +08:00
Rasmus Wriedt Larsen
f7a0b17ed6 Merge pull request #7687 from yoff/python/PathInjection-FlowState
python: Rewrite path injection query to use flow state
2022-02-01 11:33:37 +01:00
Taus
4a29095e3b Python: Fix bad join order in TPythonTuple
TL;DR: Something introduced the following bad join order:
```
(227s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i2#8f58670w after 3m46s:
25000      ~0%     {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
24000      ~1%     {2} r2 = JOIN r1 WITH @py_scope#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.0
1076876712 ~6%     {3} r3 = JOIN r2 WITH Flow::TupleNode#class#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'context', Lhs.1
870129666  ~0%     {3} r4 = JOIN r3 WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.2, Lhs.0 'origin'
870129000  ~0%     {3} r5 = r4 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.2 'origin', Lhs.0 'context')
870129000  ~1%     {3} r6 = SCAN r5 OUTPUT In.2 'origin', In.1, In.0 'context'
9000       ~0%     {2} r7 = JOIN r6 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.0 'origin', Lhs.2 'context'
                    return r7
```
(...the above being the tuple counts _at the point when I cancelled the
query_!)

Rewriting the code to force a join between `TupleNode#class` and
`getScope` results in the following join orders:

```
(0s) Tuple counts for TObject::scope_loads_tuplenode#ff/2@b3cf0bo5 after 13ms:
37369 ~3%     {1} r1 = JOIN Flow::TupleNode#class#f WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.0 'origin'
37369 ~3%     {2} r2 = JOIN r1 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 1 OUTPUT Rhs.1 's', Lhs.0 'origin'
            return r2
```
and
```
(78s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i53#121c440w after 6ms:
34736 ~3%     {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
7370  ~5%     {2} r2 = JOIN r1 WITH TObject::scope_loads_tuplenode#ff ON FIRST 1 OUTPUT Lhs.1 'context', Rhs.1 'origin'
7370  ~5%     {2} r3 = r2 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.1 'origin', Lhs.0 'context')
7370  ~1%     {2} r4 = SCAN r3 OUTPUT In.1 'origin', In.0 'context'
            return r4
```
the latter being the largest iteration of `dom#TPythonTuple` throughout
the log.

No other major performance issues were observed.
2022-01-31 16:59:50 +00:00
Tom Hvitved
f2352d8272 Data flow: Inline local(Expr|Instruction)?(Flow|Taint)
Computing a full transitive closure is often bad; by inlining all calls we are
providing more context to the QL optimizer.
2022-01-31 14:33:41 +01:00
Rasmus Lerchedahl Petersen
211345c010 python: remove more annotations 2022-01-31 11:20:59 +01:00
Rasmus Lerchedahl Petersen
cac3862659 python: remove library annotation
to clean up QL warnings.
Should put these in a private module instead?
2022-01-31 08:50:37 +01:00
Rasmus Lerchedahl Petersen
0c3bce1415 python: deprecation
I am slightly concerned that the test now generates many more
intermediate results. I suppose that maes the analysis heavy.
Should the new library get a new name instead, so the old code
does not get evaluated?
2022-01-31 08:32:24 +01:00
Taus
47a57e0c0a Merge pull request #7635 from github/python/support-match
Python/support match
2022-01-28 11:55:46 +01:00
yoff
74d57bbb1a Update python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
Co-authored-by: Taus <tausbn@github.com>
2022-01-28 11:38:29 +01:00
Dave Bartolomeo
cca74e925f Merge pull request #7724 from github/aeisenberg/examples-groups
Add new groups for examples packs
2022-01-27 12:11:26 -05:00
Rasmus Lerchedahl Petersen
c60df7d69c Merge branch 'main' of github.com:github/codeql into python/support-match 2022-01-27 16:45:17 +01:00
yoff
4632c14280 Merge pull request #7654 from RasmusWL/remove-old-pointsto-queries
Python: Cleanup: Remove old points-to versions of queries
2022-01-27 16:39:01 +01:00
Rasmus Lerchedahl Petersen
b93c04bb79 python: Add reverse flow in some patterns
Particularly in value and literal patterns.
This is getting a little bit into the guards aspect of matching.
We could similarly add reverse flow in terms of
sub-patterns storing to a sequence pattern,
a flow step from alternatives to an-or-pattern, etc..
It does not seem too likely that sources are embedded in patterns
to begin with, but for secrets perhaps?

It is illustrated by the literal test. The value test still fails.
I believe we miss flow in general from the static attribute.
2022-01-27 15:20:23 +01:00
github-actions[bot]
634134f283 Release preparation for version 2.8.0 2022-01-27 10:40:20 +00:00
Rasmus Lerchedahl Petersen
cb52ab669e python: address review comments
The comment about `py_scopes` was simply removed
2022-01-27 11:17:00 +01:00
yoff
e28669e487 Apply suggestions from code review
Co-authored-by: Taus <tausbn@github.com>
2022-01-27 10:31:43 +01:00
Andrew Eisenberg
a7f755cf12 Add new groups for examples packs
Also, remove version numbers. Will make it easier to avoid publishing
the examples packs.
2022-01-26 14:49:18 -08:00
Rasmus Lerchedahl Petersen
47af3a69a5 Merge branch 'main' of github.com:github/codeql into python/support-match 2022-01-26 11:39:46 +01:00
Edoardo Pirovano
1b539eb4dc Merge branch rc/3.4 into main 2022-01-25 16:22:01 +00:00
Harry Maclean
517f2d0823 Add optional results to InlineExpectationsTest
The idea behind optional results is that there may be instances where
each line of source code has many results and you don't want to annotate
all of them, but you still want to ensure that any annotations you do
have are correct.

This change makes that possible by exposing a new predicate
`hasOptionalResult`, which has the same signature as `hasResult`.

Results produced by `hasOptionalResult` will be matched against any
annotations, but the lack of a matching annotation will not cause a
failure.

We will use this in the inline tests for the API edge getASubclass,
because for each API path that uses getASubclass there is always a
shorter path that does not use it, and thus we can't use the normal
shortest-path matching approach that works for other API Graph tests.
2022-01-25 16:41:49 +13:00
Rasmus Wriedt Larsen
301318020f Merge pull request #7455 from haby0/py/add-shutil-module-path-injection-sinks
Python: Add shutil module sinks for path injection query
2022-01-24 20:06:36 +01:00
Rasmus Lerchedahl Petersen
9aa4c4a6a7 python: Add missing input
also update test expectation
2022-01-21 13:55:33 +01:00