This caused some suprising test changes, where suddenly we had flow from
a `ModuleVariableNode` (as a `RemoteFlowSource`) to a sink. This of
course makes little sense, so instead we simply exclude these nodes as
uses in the first place.
The somewhat convoluted `comes_from_cfgnode` was originally introduced
in order to have local sources for instances of global variables. This
was needed because global variables have an implicit "scope entry" SSA
definition that flows to the first actual use of the variable (and so
would not fit the strict "has no incoming flow" definition of a local
source node).
However, a subsequent change means that we include all global variable
reads anyway, and so the old definition is no longer needed.
(See commit 3fafb47b16 for further
context.)
This commit does a lot of stuff all at once, so here are the main
highlights:
In `TypeTracker.qll`, we change `StepSummary::step` to step only between
source nodes. Because reads and writes of global variables happen in two
different (jump) steps, this requires the intermediate
`ModuleVariableNode` to _also_ be a `LocalSourceNode`, and we therefore
modify the charpred for that class accordingly. (This also means
changing a few of the tests to account for these new source nodes.)
In addition, we change `TypeTracker::step` to likewise step between
local source nodes.
Next, to enable the use of the `track` convenience method on nodes, we
add some pragmas to `TypeTracker::step` that prevent bad joins from
occurring. With this, we can eliminate all of the manual type tracker
join predicates.
Next, we observe that because `StepSummary::step` now uses `flowsTo`, it
automatically encapsulates all local-flow steps. In particular this
means we do not have to use `typePreservingStep` in `smallstep`, but can
use `jumpStep` directly. A similar observation applies to
`TypeTracker::smallstep`.
Having done this, we no longer need `typePreservingStep`, so we get rid
of it.
Previously, in cases like
```python
def foo(x):
x.bar()
x.baz()
x.quux()
```
we would have flow from the first `x` to each use _and_ flow from the
post-update node for each method call to each subsequent use, and all
of these would be `LocalSourceNode`s. For large functions with the above
pattern, this would lead to a quadratic blowup in `hasLocalSource`.
With this commit, only the first of these will count as a
`LocalSourceNode`, and the blowup disappears.
The `CallCfgNode` restrictions are familiar and useful.
Restricting `InstanceSource` to extend `LocalSourceNode` is novel, but I
think it makes sense. It will act as a good reminder to anyone extending
`InstanceSource` that the node in question is a `LocalSourceNode`, which
will be enforced by the return type of the internal type tracker anyway.
I wasn't able to roll out API graphs as widely in Tornado as I had
hoped, since we're lacking the "def" part. This means most of the
`InstanceSource` machinery will have to stay.
Eliminates _almost_ all of the bespoke type trackers found here. The
ones that remain do not fit easily inside the framework of API graphs
(at least, not yet), and I did not see any easy ways to clean them up.
They have, however, been rewritten to use `LocalSourceNode` internally,
which was the primary goal of this exercise.
I'm sure we could also clean up many of the inner modules given the more
lean presentation we have now, but this can wait for a different PR.
Because the replacement extension point now extends `API::Node`, I
modified the `toString` method of the latter to have an empty body.
The alternative would be to require everyone to provide a `toString`
predicate for their extensions, but seeing as these will usually be
pointing to already existing API graph nodes, this seems silly.
(This may be the reason why the equivalent method in the JS libs has
such an implementation.)