Perhaps unsurprisingly, the join orderer was eager and willing to find
the wrong join order in this predicate as well. Applying a similar
fix to the one used in `TypeTracker::step` fixes the problem.
This is merely making explicit what was implicitly enforced. The move
to change the return type of `step` already meant that `this` and
`result` had to be `LocalSourceNode`. By moving these methods to their
rightful place, we should hopefully avoid a bit of suprising behaviour.
This caused some suprising test changes, where suddenly we had flow from
a `ModuleVariableNode` (as a `RemoteFlowSource`) to a sink. This of
course makes little sense, so instead we simply exclude these nodes as
uses in the first place.
The somewhat convoluted `comes_from_cfgnode` was originally introduced
in order to have local sources for instances of global variables. This
was needed because global variables have an implicit "scope entry" SSA
definition that flows to the first actual use of the variable (and so
would not fit the strict "has no incoming flow" definition of a local
source node).
However, a subsequent change means that we include all global variable
reads anyway, and so the old definition is no longer needed.
(See commit 3fafb47b16 for further
context.)
This commit does a lot of stuff all at once, so here are the main
highlights:
In `TypeTracker.qll`, we change `StepSummary::step` to step only between
source nodes. Because reads and writes of global variables happen in two
different (jump) steps, this requires the intermediate
`ModuleVariableNode` to _also_ be a `LocalSourceNode`, and we therefore
modify the charpred for that class accordingly. (This also means
changing a few of the tests to account for these new source nodes.)
In addition, we change `TypeTracker::step` to likewise step between
local source nodes.
Next, to enable the use of the `track` convenience method on nodes, we
add some pragmas to `TypeTracker::step` that prevent bad joins from
occurring. With this, we can eliminate all of the manual type tracker
join predicates.
Next, we observe that because `StepSummary::step` now uses `flowsTo`, it
automatically encapsulates all local-flow steps. In particular this
means we do not have to use `typePreservingStep` in `smallstep`, but can
use `jumpStep` directly. A similar observation applies to
`TypeTracker::smallstep`.
Having done this, we no longer need `typePreservingStep`, so we get rid
of it.
Previously, in cases like
```python
def foo(x):
x.bar()
x.baz()
x.quux()
```
we would have flow from the first `x` to each use _and_ flow from the
post-update node for each method call to each subsequent use, and all
of these would be `LocalSourceNode`s. For large functions with the above
pattern, this would lead to a quadratic blowup in `hasLocalSource`.
With this commit, only the first of these will count as a
`LocalSourceNode`, and the blowup disappears.
The meat of this PR is described in the new python/ql/test/experimental/meta/InlineTaintTest.qll file:
> Defines a InlineExpectationsTest for checking whether any arguments in
> `ensure_tainted` and `ensure_not_tainted` calls are tainted.
>
> Also defines query predicates to ensure that:
> - if any arguments to `ensure_not_tainted` are tainted, their annotation is marked with `SPURIOUS`.
> - if any arguments to `ensure_tainted` are not tainted, their annotation is marked with `MISSING`.
>
> The functionality of this module is tested in `ql/test/experimental/meta/inline-taint-test-demo`.
In some cases, we were joining the result of `val.getClass()` against
the first argument of `Types::improperSubclass` before filtering out the
vast majority of tuples by the call to `isinstance_call`.
To fix this, we let `isinstance_call` take care of figuring out the
class of the value being tested. As a bonus, this cleans up the only
other place where `isinstance_call` is used, where we _also_ want to
know the class of the value being tested in the `isinstance` call.
The recent change to `appliesTo` lead to a perturbation in the join
order of this predicate, which resulted in a cartesian product between
`call` and `ctx` being created (before being filtered by `appliesTo`).
By splitting the intermediate result into its own helper predicate,
suitably marked to prevent inlining/magic, we prevent this from
happening again.