Commit Graph

2586 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
b3d3d6e142 Python: Move logical test of sanitizers
Don't know why it would ever have been under default sanitizers :D
2020-11-19 16:46:07 +01:00
Rasmus Lerchedahl Petersen
7cbbf3bbf7 Python: slightly nicer test 2020-11-19 16:20:57 +01:00
Rasmus Wriedt Larsen
4c7c940273 Python: Add example of Code Injection FP 2020-11-19 15:05:51 +01:00
Rasmus Wriedt Larsen
7e407d43d2 Python: Change (single) test to match codeql database create 2020-11-19 14:56:18 +01:00
Rasmus Wriedt Larsen
8ffcff0824 Python: Add example of top-level module shadowing stdlib
Although this test is added under the `wrong` folder, the current results from
this CodeQL test is actually correct (compared with the Python
interpreter). However, they don't match what the extractor does when invoked
with `codeql database create`.

Since I deemed it "more than an easy fix" to change the extractor behavior for
`codeql database create` to match the real python behavior, and it turned out to
be quite a challenge to change the extractor behavior for all tests, I'm just
going to make THIS ONE test-case behave like the extractor will with `codeql
database create`...

This is a first commit, to show how the extractor works with qltest by default.

Inspired by the debugging in https://github.com/github/codeql/issues/4640
2020-11-19 14:56:17 +01:00
Rasmus Lerchedahl Petersen
6cc8e5acf1 Python: support psycopg 2020-11-19 12:13:20 +01:00
Rasmus Lerchedahl Petersen
39f134c1c1 Python: reorganized and added to tests 2020-11-19 12:06:58 +01:00
Henning Makholm
a86679a377 Remove unit tests for duplicate-code detection
The old Semmle duplicate-code detection code has never been done when
extracting databases for the CodeQL CLI, except that `codeql test run`
will run it _just_ in order to support tests of the feature. With the
sunsetting of Odasa there's no need to even _test_ the feature anymore.

This commit removes those tests that fail when the duplicate-code
detector is turned off. Once it is merged and bumped, we can finally
remove it from `codeql`.
2020-11-18 16:37:29 +01:00
Rasmus Wriedt Larsen
ab856d6c01 Python: Show getCallableForArgument can have multiple results 2020-11-18 10:44:32 +01:00
Rasmus Wriedt Larsen
abf2902a69 Python: Fix QLDoc
Co-authored-by: yoff <lerchedahl@gmail.com>
2020-11-18 09:47:23 +01:00
Rasmus Wriedt Larsen
39590a39cb Python: Rename helper predicate based on review 2020-11-18 09:26:53 +01:00
Rasmus Wriedt Larsen
14136154d6 Python: Fix bad join order in TypeTracker::callStep
From a local evaluation against flask DB, after
https://github.com/github/codeql/pull/4649 was merged we would get:

```
Tuple counts for TypeTracker::callStep#ff/2@a21b71:
9876     ~0%     {3} r1 = SCAN DataFlowPrivate::DataFlowCall::getArg_dispred#fff AS I OUTPUT I.<2>, I.<0>, I.<1>
9876     ~2%     {3} r2 = JOIN r1 WITH project#DataFlowPrivate::DataFlowCall::getArg_dispred#fff AS R ON FIRST 1 OUTPUT r1.<2>, R.<0>, r1.<1>
72388997 ~0%     {4} r3 = JOIN r2 WITH DataFlowPublic::ParameterNode::isParameterOf_dispred#fff_201#join_rhs AS R ON FIRST 1 OUTPUT r2.<2>, R.<2>, r2.<1>, R.<1>
4952     ~0%     {2} r4 = JOIN r3 WITH DataFlowPrivate::DataFlowCall::getCallable_dispred#ff AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>
                     return r4
```
2020-11-18 09:17:31 +01:00
Anders Schack-Mulligen
f74fc0ff26 Dataflow: Fix bad join-orders. 2020-11-17 14:28:25 +01:00
Rasmus Lerchedahl Petersen
71830abda0 Python: remaining c# tests, except lambdas
both via nonlocal and via dict
2020-11-17 08:28:11 +01:00
Rasmus Lerchedahl Petersen
27b4c67b9f Python: Start of tests for captured variables 2020-11-16 17:25:39 +01:00
Anders Schack-Mulligen
3dbd48063c Dataflow: Add Unit type for all languages. 2020-11-16 09:02:44 +01:00
Anders Schack-Mulligen
9e45f10c5d Dataflow: Remove headUsesContent. 2020-11-13 15:12:39 +01:00
Anders Schack-Mulligen
e0a6a485df Dataflow: Sync. 2020-11-13 15:12:16 +01:00
Rasmus Wriedt Larsen
9f1d8cd1bb Python: Convert indentation to spaces for VS Code snippets 2020-11-13 13:05:23 +01:00
Rasmus Wriedt Larsen
5200af5244 Python: Add code snippets for VS Code
Notice that in this form, the filename doesn't matter, and you need to specify
`scope` to limit the snippet to only trigger for `ql`.
2020-11-13 10:57:17 +01:00
yoff
86fc9e62ef Merge pull request #4650 from RasmusWL/python-set-literal-formatting
Python: Update set literal formatting
2020-11-11 15:35:12 +01:00
Rasmus Wriedt Larsen
611398586d Merge pull request #4649 from yoff/python-dataflow-cfgparameters
Python: Make `ParameterNode` a `CfgNode`
2020-11-11 10:22:12 +01:00
Rasmus Wriedt Larsen
9ed15732ed Python: Update set literal formatting
Now that auto-formatting rules have been updated
2020-11-11 09:38:25 +01:00
Rasmus Lerchedahl Petersen
0710963fc3 Python: update test expectations
EssaNode -> ControlFlowNode
2020-11-10 23:58:55 +01:00
Jonas Jensen
fc764db8e1 Merge pull request #4643 from nickrolfe/getFileBySourceArchiveName
Replace getEncodedFile with shared getFileBySourceArchiveName predicate
2020-11-10 17:36:29 +01:00
Nick Rolfe
ac4a1f1d9b Update comment to be a QLDoc comment 2020-11-10 14:14:27 +00:00
Nick Rolfe
1e1eb7ee33 Replace getEncodedFile with shared getFileBySourceArchiveName predicate
While also making it work with paths for databases created on Windows.
2020-11-10 13:55:27 +00:00
Anders Schack-Mulligen
89ef6ea4eb C++/C#/Java/JavaScript/Python: Autoformat set literals. 2020-11-10 13:32:27 +01:00
Rasmus Lerchedahl Petersen
109d55eb25 Python: Make ParameterNode a CfgNode
Add a step from that `CfgNode` to the corresponding `EssaNode`.
The intended effect is seen in `ImpliesDataflow.expected`.
The efeect seen in other `.expected`-files is that parameter nodes
change type, that the extra steps are seen, and that flow from
`EssaVar`s is mirrored in flow from `CfgNode`s.
There is one surprise, which is the `.0` node in
`coverage/localFlow.expected`.
2020-11-10 11:35:50 +01:00
yoff
26286e534e Merge pull request #4174 from yoff/SharedDataflow_PointsToImpliesDataflow
Python: Dataflow, Test that `pointsTo` implies data flow
merging now, will fix `self` in a later PR
2020-11-10 10:25:29 +01:00
Rasmus Wriedt Larsen
fbe51c51bb Python: Add missing QLDoc 2020-11-09 09:05:08 +01:00
Rasmus Wriedt Larsen
ed0e4f8425 Python: reasoning about => detecting
Co-authored-by: yoff <lerchedahl@gmail.com>
2020-11-09 09:01:04 +01:00
Taus
a9149b7e47 Python: Update python/ql/src/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
Co-authored-by: yoff <lerchedahl@gmail.com>
2020-11-06 17:15:58 +01:00
Taus Brock-Nannestad
5a9cc0861c Merge branch 'main' into python-add-source-nodes 2020-11-06 17:12:41 +01:00
yoff
45317bcec9 Update python/ql/test/library-tests/PointsTo/new/code/w_function_values.py
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2020-11-06 15:03:20 +01:00
Rasmus Wriedt Larsen
9ebe59d393 Python: Move UnsafeDeserialization configuration to own file 2020-11-06 14:27:37 +01:00
Rasmus Wriedt Larsen
d38c48d2c8 Python: Move ReflectedXSS configuration to own file 2020-11-06 14:24:31 +01:00
Rasmus Wriedt Larsen
1897a0d59a Python: Move PathInjection configuration to own file
This one required a bit more thought, but ended up pretty nicely. Had to write
some QLDoc, but I think it turned out OK.
2020-11-06 14:21:23 +01:00
Rasmus Wriedt Larsen
0c6bd8401a Python: Move SqlInjection configuration to own file 2020-11-06 14:09:46 +01:00
Rasmus Wriedt Larsen
6299b73a46 Python: Move CommandInjection configuration to own file 2020-11-06 14:07:06 +01:00
Rasmus Wriedt Larsen
7c04c59456 Python: Move CodeInjection configuration to own file
This makes it easy to extend the sources/sinks of the configuration and re-run
the query from the query console on LGTM.com.

File location in `semmle.<lang>.security.dataflow.<QueryName>.qll` is matching
what we currently do in other languages (JS and C# sampled).

I did not follow the pattern in other languages for wrapping all the code in a
`module CodeInjection`, since I didn't understand the value in doing so -- I
would like confirmation from the other teams if we _should_ actually do that,
before merging.
2020-11-06 13:58:06 +01:00
Rasmus Lerchedahl Petersen
fe186bf854 Python: Add test 2020-11-06 13:30:11 +01:00
Rasmus Lerchedahl Petersen
64b9e9150e Python: only show results in extracted files 2020-11-06 12:01:16 +01:00
Taus Brock-Nannestad
7c58b28e36 Python: Write DataFlow::update more succinctly
This has no impact on performance, but it cleans up the code a bit,
and (hopefully) makes it more readable.
2020-11-05 16:47:41 +01:00
Taus Brock-Nannestad
bae4acabb1 Python: Fix bad join in StrConst::isUnicode
Also fixes a bug ("`B`" was not recognised as a bytestring prefix).

The basic idea behind this fix is that the set of possible prefixes is
fairly small, so it's easier just to precompute them, and then join
them with the entire prefix of the string in question (rather than
look at each string in isolation, get its prefix, and _then_ check
whether it looks like it's a unicode string prefix, which essentially
is what the code did before).
2020-11-05 16:45:27 +01:00
Taus Brock-Nannestad
1251bc57f5 Python: Fix bad join in TObject::literal_instantiation
Here, `context.appliesTo(n)` was being distributed across all of the
disjuncts, which caused poor performance.

The new helper predicate, `literal_node_class` should be fairly small,
since it only applies to a subset of `ControlFlowNode`s, and only
assigns a limited set of `ClassObjectInternal`s to these nodes.
2020-11-05 16:40:29 +01:00
Taus Brock-Nannestad
35a63e2411 Python: Fix bad join in regex::used_as_regex
Since the number of relevant attributes in the `re` module is fairly
small, it made sense to factor this out in a separate predicate, and
the join order also became more sensible.
2020-11-05 16:33:59 +01:00
Taus Brock-Nannestad
035e747ad5 Python: Fix slow use of regexCapture in Builtin::strValue
This is only _really_ expensive when there are a _lot_ of strings in
the database, but for this case, where we're always extracting the
same substring of the string, it's easier -- and faster -- to just
make a substring operation directly.
2020-11-05 16:33:33 +01:00
Taus Brock-Nannestad
83ba8c9bf5 Python: Add LocalSourceNode and flowsTo
This fixes the major performance problem with type tracking on
some (pathological) databases.

The interface could probably be improved a bit. In particular, I'm
thinking that we might want to have `DataFlow::exprNode` return a
`LocalSourceNode` so that a cast isn't necessary in order to use
`flowsTo`.

I have added two `cached` annotations. The one on `flowsTo` is
crucial, as performance regresses without it. The one on
`simpleLocalFlowStep` may not be needed, but Java has a similar
annotation, and to me it makes sense to have this relation cached.
2020-11-05 16:26:03 +01:00
Rasmus Lerchedahl Petersen
6cecd3ba83 Python: Move and rename query 2020-11-05 11:49:39 +01:00