Fixes#1832.
In the taint sink, we add an additional check that the given control-flow node
can indeed point to a value that is mutable. This takes care of the guard on the
type.
If and when we get around to adding configurations for all of the taint
analyses, we may want to implement this as a barrier instead, pruning any steps
that go through a type test where the type is not mutable.
Fixes https://github.com/Semmle/ql/issues/1572
Adjust mock so it's more aligned with what the flask code actually does. Tests
were passing before, even though we didn't handle the case in real code :\
If the legacy configuration is only enabled if there are no other
configurations, defining a configuration in an imported library can lead to
unwanted results. For example, code that uses `any(MyTaintKind t).taints(node)`
would *stop* working, if it did not define its own configuration. (this actually
happened to us)
We performed a dist-compare to ensure there is not a performance deg ration by
doing this. Results at https://git.semmle.com/gist/rasmuswl/a1eca07f3a92f5f65ee78d733e5d260e
Tests that were affected by this:
- RockPaperScissors + Simple: new edges because no configuration was defined for
SqlInjectionTaint or CommandInjectionTaint
- CleartextLogging + CleartextStorage: new edges because no configuration was
defined before, AND duplicate deges.
- TestNode: new edges because no configuration was defined before
- PathInjection: Duplicate edges
- TarSlip: Duplicate edges
- CommandInjection: Duplicate edges
- ReflectedXss: Duplicate edges
- SqlInjection: Duplicate edges
- CodeInjection: Duplicate edges
- StackTraceExposure: Duplicate edges
- UnsafeDeserialization: Duplicate edges
- UrlRedirect: Duplicate edges
We don't use a locked-down version of six, so some internal things probably
changed from the version used last time, and the versoin I have installed.
Long term fix would be to use a specific version of six for tests!
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
These seems like a regression, since the tests state that N is ok, but A and J
should not be allowed.
For now we can accept them, so we don't block all other Python PRs
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
Since we haven't done this in a while, test output has changed a bit. These
changes look perfectly fine.
The `multiple_invocation_paths` predicate had a bad join order where
we (essentially) joined `i1` with `i2` and only then joined `i1` and `i2`
separately to reduce the number of tuples. The join coming from `i1 != i2` had
little impact, but `i1.getFunction() = multi` made a big difference (and
similarly for `i2`). I factored out the code so that these joins would be done
more eagerly. Thus, we went from
```
[2019-11-06 16:53:05] (38s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2ce75a
[2019-11-06 16:53:35] (68s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~9% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<0>, I.<2>
235284431 ~3% {4} r2 = JOIN r1 WITH CallGraph::TInvocation#fff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1>, R.<2>
235149884 ~3% {4} r3 = SELECT r2 ON r2.<3> != r2.<1>
235149884 ~4% {3} r4 = SCAN r3 OUTPUT r3.<1>, r3.<0>, r3.<3>
166753634 ~5% {4} r5 = JOIN r4 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r4.<2>, r4.<1>, r4.<0>
129778 ~0% {4} r6 = JOIN r5 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 2 OUTPUT r5.<0>, r5.<3>, r5.<1>, r5.<2>
return r6
[2019-11-06 16:53:35] (68s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 16:53:35] (68s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
```
to
```
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff/4@586aec
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
88111 ~4% {3} r2 = JOIN r1 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
761305 ~0% {4} r3 = JOIN r2 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 1 OUTPUT r2.<1>, r2.<2>, r2.<0>, R.<1>
673194 ~0% {4} r4 = SELECT r3 ON r3.<3> != r3.<1>
673194 ~0% {4} r5 = SCAN r4 OUTPUT r4.<2>, r4.<1>, r4.<3>, r4.<0>
return r5
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff + [] with content 20edaaecf25nldgp24d9c4et8m3kv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs/4@9e5441
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs:
673194 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths_helper#ffff AS I OUTPUT I.<2>, I.<3>, I.<0>, I.<1>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs + [] with content 2069301e655fi9mcovngg9hetfqas
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2f7c34
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
129778 ~0% {4} r2 = JOIN r1 WITH MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs AS R ON FIRST 2 OUTPUT R.<2>, R.<3>, r1.<0>, r1.<1>
return r2
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs/4@9f9146
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs:
129778 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths#ffff AS I OUTPUT I.<0>, I.<3>, I.<1>, I.<2>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs + [] with content 17c3fe1fcbf6ghhdr7hiukqp41rst
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs with 129778 rows and 4 columns.
```
Execution time on `salt` went from 29.5s to somewhere below 299ms (the predicate
was not listed in the timing report).
As it turns out, there was a further bad join-order in the `global_name_used`
predicate. In this case, there was a common subexpression in the RA that was
being factored out and evaluated separately, producing a large number of tuples.
Before this change, any function that has a parameter that was called
password/credentials would be treated as returning sensitive data of that
kind. `py/clear-text-logging-sensitive-data` would alert if one of these are
logged, which has a LOT of false-positives.
This was brought up on the LGTM.com forums here:
https://discuss.lgtm.com/t/warn-when-always-failing-assert-is-reachable-rather-than-unreachable/2436
Essentially, in a complex chain of `elif` statements, like
```python
if x < 0:
...
elif x >= 0:
...
else:
...
```
the `else` clause is redundant, since the preceding conditions completely
exhaust the possible values for `x` (assuming `x` is an integer). Rather than
promoting the final `elif` clause to an `else` clause, it is common to instead
raise an explicit exception in the `else` clause. During execution, this
exception will never actually be raised, but its presence indicates that the
preceding conditions are intended to cover all possible cases.
I think it's a fair point. This is a clear instance where the alert, even if it
is technically correct, is not useful for the end user.
Also, I decided to make the exclusion fairly restrictive: it only applies if
the unreachable statement is an `assert False, ...` or `raise ...`, and only
if said statement is the first in the `else` block. Any other statements will
still be reported.
This would find instances of `thing = MyThing.objects.get(field=userinput)`, and
what seems to be a query that wants to match on `thing = MyThing();
thing.field=userinput`. Both are not vulnerable to user-input, due to the
build-in escaping by django.
The DjangoModelFieldWrite actually matches on `MyThing.field=userinput` and not
`thing.field=userinput`. I suspect this to be a mistake.
Matching on `thing.field=userinput`, would require this CodeQL:
attr.getObject(_).pointsTo().getClass() = model
This is vulnerable to SQL injection because of the quotes around %s -- added
some code that highlights this in test.py
Since our examples did this in the safe query, I ended up rewriting them
completely, causing a lot of trouble for myself :D
Hopefully it is more clear that you can get multiple results from getABaseType
because of multiple inheritance, and not because we are following the chain of
inheritance