This commit does a lot of stuff all at once, so here are the main
highlights:
In `TypeTracker.qll`, we change `StepSummary::step` to step only between
source nodes. Because reads and writes of global variables happen in two
different (jump) steps, this requires the intermediate
`ModuleVariableNode` to _also_ be a `LocalSourceNode`, and we therefore
modify the charpred for that class accordingly. (This also means
changing a few of the tests to account for these new source nodes.)
In addition, we change `TypeTracker::step` to likewise step between
local source nodes.
Next, to enable the use of the `track` convenience method on nodes, we
add some pragmas to `TypeTracker::step` that prevent bad joins from
occurring. With this, we can eliminate all of the manual type tracker
join predicates.
Next, we observe that because `StepSummary::step` now uses `flowsTo`, it
automatically encapsulates all local-flow steps. In particular this
means we do not have to use `typePreservingStep` in `smallstep`, but can
use `jumpStep` directly. A similar observation applies to
`TypeTracker::smallstep`.
Having done this, we no longer need `typePreservingStep`, so we get rid
of it.
Previously, in cases like
```python
def foo(x):
x.bar()
x.baz()
x.quux()
```
we would have flow from the first `x` to each use _and_ flow from the
post-update node for each method call to each subsequent use, and all
of these would be `LocalSourceNode`s. For large functions with the above
pattern, this would lead to a quadratic blowup in `hasLocalSource`.
With this commit, only the first of these will count as a
`LocalSourceNode`, and the blowup disappears.
In some cases, we were joining the result of `val.getClass()` against
the first argument of `Types::improperSubclass` before filtering out the
vast majority of tuples by the call to `isinstance_call`.
To fix this, we let `isinstance_call` take care of figuring out the
class of the value being tested. As a bonus, this cleans up the only
other place where `isinstance_call` is used, where we _also_ want to
know the class of the value being tested in the `isinstance` call.
The recent change to `appliesTo` lead to a perturbation in the join
order of this predicate, which resulted in a cartesian product between
`call` and `ctx` being created (before being filtered by `appliesTo`).
By splitting the intermediate result into its own helper predicate,
suitably marked to prevent inlining/magic, we prevent this from
happening again.
The `CallCfgNode` restrictions are familiar and useful.
Restricting `InstanceSource` to extend `LocalSourceNode` is novel, but I
think it makes sense. It will act as a good reminder to anyone extending
`InstanceSource` that the node in question is a `LocalSourceNode`, which
will be enforced by the return type of the internal type tracker anyway.
I wasn't able to roll out API graphs as widely in Tornado as I had
hoped, since we're lacking the "def" part. This means most of the
`InstanceSource` machinery will have to stay.