Fixes https://github.com/Semmle/ql/issues/1572
Adjust mock so it's more aligned with what the flask code actually does. Tests
were passing before, even though we didn't handle the case in real code :\
If the legacy configuration is only enabled if there are no other
configurations, defining a configuration in an imported library can lead to
unwanted results. For example, code that uses `any(MyTaintKind t).taints(node)`
would *stop* working, if it did not define its own configuration. (this actually
happened to us)
We performed a dist-compare to ensure there is not a performance deg ration by
doing this. Results at https://git.semmle.com/gist/rasmuswl/a1eca07f3a92f5f65ee78d733e5d260e
Tests that were affected by this:
- RockPaperScissors + Simple: new edges because no configuration was defined for
SqlInjectionTaint or CommandInjectionTaint
- CleartextLogging + CleartextStorage: new edges because no configuration was
defined before, AND duplicate deges.
- TestNode: new edges because no configuration was defined before
- PathInjection: Duplicate edges
- TarSlip: Duplicate edges
- CommandInjection: Duplicate edges
- ReflectedXss: Duplicate edges
- SqlInjection: Duplicate edges
- CodeInjection: Duplicate edges
- StackTraceExposure: Duplicate edges
- UnsafeDeserialization: Duplicate edges
- UrlRedirect: Duplicate edges
We don't use a locked-down version of six, so some internal things probably
changed from the version used last time, and the versoin I have installed.
Long term fix would be to use a specific version of six for tests!
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
These seems like a regression, since the tests state that N is ok, but A and J
should not be allowed.
For now we can accept them, so we don't block all other Python PRs
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
Since we haven't done this in a while, test output has changed a bit. These
changes look perfectly fine.
The `multiple_invocation_paths` predicate had a bad join order where
we (essentially) joined `i1` with `i2` and only then joined `i1` and `i2`
separately to reduce the number of tuples. The join coming from `i1 != i2` had
little impact, but `i1.getFunction() = multi` made a big difference (and
similarly for `i2`). I factored out the code so that these joins would be done
more eagerly. Thus, we went from
```
[2019-11-06 16:53:05] (38s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2ce75a
[2019-11-06 16:53:35] (68s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~9% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<0>, I.<2>
235284431 ~3% {4} r2 = JOIN r1 WITH CallGraph::TInvocation#fff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1>, R.<2>
235149884 ~3% {4} r3 = SELECT r2 ON r2.<3> != r2.<1>
235149884 ~4% {3} r4 = SCAN r3 OUTPUT r3.<1>, r3.<0>, r3.<3>
166753634 ~5% {4} r5 = JOIN r4 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r4.<2>, r4.<1>, r4.<0>
129778 ~0% {4} r6 = JOIN r5 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 2 OUTPUT r5.<0>, r5.<3>, r5.<1>, r5.<2>
return r6
[2019-11-06 16:53:35] (68s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 16:53:35] (68s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
```
to
```
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff/4@586aec
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
88111 ~4% {3} r2 = JOIN r1 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
761305 ~0% {4} r3 = JOIN r2 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 1 OUTPUT r2.<1>, r2.<2>, r2.<0>, R.<1>
673194 ~0% {4} r4 = SELECT r3 ON r3.<3> != r3.<1>
673194 ~0% {4} r5 = SCAN r4 OUTPUT r4.<2>, r4.<1>, r4.<3>, r4.<0>
return r5
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff + [] with content 20edaaecf25nldgp24d9c4et8m3kv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs/4@9e5441
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs:
673194 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths_helper#ffff AS I OUTPUT I.<2>, I.<3>, I.<0>, I.<1>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs + [] with content 2069301e655fi9mcovngg9hetfqas
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2f7c34
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
129778 ~0% {4} r2 = JOIN r1 WITH MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs AS R ON FIRST 2 OUTPUT R.<2>, R.<3>, r1.<0>, r1.<1>
return r2
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs/4@9f9146
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs:
129778 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths#ffff AS I OUTPUT I.<0>, I.<3>, I.<1>, I.<2>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs + [] with content 17c3fe1fcbf6ghhdr7hiukqp41rst
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs with 129778 rows and 4 columns.
```
Execution time on `salt` went from 29.5s to somewhere below 299ms (the predicate
was not listed in the timing report).
As it turns out, there was a further bad join-order in the `global_name_used`
predicate. In this case, there was a common subexpression in the RA that was
being factored out and evaluated separately, producing a large number of tuples.
Before this change, any function that has a parameter that was called
password/credentials would be treated as returning sensitive data of that
kind. `py/clear-text-logging-sensitive-data` would alert if one of these are
logged, which has a LOT of false-positives.
This was brought up on the LGTM.com forums here:
https://discuss.lgtm.com/t/warn-when-always-failing-assert-is-reachable-rather-than-unreachable/2436
Essentially, in a complex chain of `elif` statements, like
```python
if x < 0:
...
elif x >= 0:
...
else:
...
```
the `else` clause is redundant, since the preceding conditions completely
exhaust the possible values for `x` (assuming `x` is an integer). Rather than
promoting the final `elif` clause to an `else` clause, it is common to instead
raise an explicit exception in the `else` clause. During execution, this
exception will never actually be raised, but its presence indicates that the
preceding conditions are intended to cover all possible cases.
I think it's a fair point. This is a clear instance where the alert, even if it
is technically correct, is not useful for the end user.
Also, I decided to make the exclusion fairly restrictive: it only applies if
the unreachable statement is an `assert False, ...` or `raise ...`, and only
if said statement is the first in the `else` block. Any other statements will
still be reported.
This would find instances of `thing = MyThing.objects.get(field=userinput)`, and
what seems to be a query that wants to match on `thing = MyThing();
thing.field=userinput`. Both are not vulnerable to user-input, due to the
build-in escaping by django.
The DjangoModelFieldWrite actually matches on `MyThing.field=userinput` and not
`thing.field=userinput`. I suspect this to be a mistake.
Matching on `thing.field=userinput`, would require this CodeQL:
attr.getObject(_).pointsTo().getClass() = model
This is vulnerable to SQL injection because of the quotes around %s -- added
some code that highlights this in test.py
Since our examples did this in the safe query, I ended up rewriting them
completely, causing a lot of trouble for myself :D
Hopefully it is more clear that you can get multiple results from getABaseType
because of multiple inheritance, and not because we are following the chain of
inheritance