- Removes the version check on the set of built-in names.
- Renames the predicate used to represent said set.
- Documents how these lists of names were obtained.
- Gets rid of a superfluous import.
This was an unwanted interaction between two unrelated tests, so I
switched to a different built-in in the second test. I also added a test
case that shows an unfortunate side effect of this more restricted
handling of built-ins.
This is _slightly_ wrong, since `exec` isn't a built-in function in
Python 2. It should be harmless, however, since `exec` is a keyword,
and so cannot be redefined anyway.
I am very tempted to leave out the constants, or at the very least
`False`, `True`, and `None`, as these have _many_ occurrences in the
average codebase, and are not terribly useful at the API-graph level.
If we really do want to capture "nodes that refer to such and such
constant", then I think a better solution would be to create classes
extending `DataFlow::Node` to facilitate this.
Limits the behaviour of github/codeql#5614 in two ways:
First, we only consider files that are contained in the source archive.
This prevents unnecessary computation involving files in e.g. the
standard library.
Secondly, we ignore any relative imports (e.g. `from .foo import ...`),
as these only work inside packages anyway.
This fixes an observed performance regression on projects that include
`google-cloud-sdk` as part of their source code.
This ensures that changes to `API::Node` does not invalidate the cached
`module Impl`. At present, I don't expect this to have any effect (as
the `Node` class is also fairly static, though not explicitly cached),
but I can imagine us making some of the `Node` methods have
user-extensible behaviour, in which case we definitely do not want this
to result in reevaluation of `API::Impl`.
Perhaps unsurprisingly, the join orderer was eager and willing to find
the wrong join order in this predicate as well. Applying a similar
fix to the one used in `TypeTracker::step` fixes the problem.
This is merely making explicit what was implicitly enforced. The move
to change the return type of `step` already meant that `this` and
`result` had to be `LocalSourceNode`. By moving these methods to their
rightful place, we should hopefully avoid a bit of suprising behaviour.
This caused some suprising test changes, where suddenly we had flow from
a `ModuleVariableNode` (as a `RemoteFlowSource`) to a sink. This of
course makes little sense, so instead we simply exclude these nodes as
uses in the first place.
The somewhat convoluted `comes_from_cfgnode` was originally introduced
in order to have local sources for instances of global variables. This
was needed because global variables have an implicit "scope entry" SSA
definition that flows to the first actual use of the variable (and so
would not fit the strict "has no incoming flow" definition of a local
source node).
However, a subsequent change means that we include all global variable
reads anyway, and so the old definition is no longer needed.
(See commit 3fafb47b16 for further
context.)
This commit does a lot of stuff all at once, so here are the main
highlights:
In `TypeTracker.qll`, we change `StepSummary::step` to step only between
source nodes. Because reads and writes of global variables happen in two
different (jump) steps, this requires the intermediate
`ModuleVariableNode` to _also_ be a `LocalSourceNode`, and we therefore
modify the charpred for that class accordingly. (This also means
changing a few of the tests to account for these new source nodes.)
In addition, we change `TypeTracker::step` to likewise step between
local source nodes.
Next, to enable the use of the `track` convenience method on nodes, we
add some pragmas to `TypeTracker::step` that prevent bad joins from
occurring. With this, we can eliminate all of the manual type tracker
join predicates.
Next, we observe that because `StepSummary::step` now uses `flowsTo`, it
automatically encapsulates all local-flow steps. In particular this
means we do not have to use `typePreservingStep` in `smallstep`, but can
use `jumpStep` directly. A similar observation applies to
`TypeTracker::smallstep`.
Having done this, we no longer need `typePreservingStep`, so we get rid
of it.
Previously, in cases like
```python
def foo(x):
x.bar()
x.baz()
x.quux()
```
we would have flow from the first `x` to each use _and_ flow from the
post-update node for each method call to each subsequent use, and all
of these would be `LocalSourceNode`s. For large functions with the above
pattern, this would lead to a quadratic blowup in `hasLocalSource`.
With this commit, only the first of these will count as a
`LocalSourceNode`, and the blowup disappears.
The meat of this PR is described in the new python/ql/test/experimental/meta/InlineTaintTest.qll file:
> Defines a InlineExpectationsTest for checking whether any arguments in
> `ensure_tainted` and `ensure_not_tainted` calls are tainted.
>
> Also defines query predicates to ensure that:
> - if any arguments to `ensure_not_tainted` are tainted, their annotation is marked with `SPURIOUS`.
> - if any arguments to `ensure_tainted` are not tainted, their annotation is marked with `MISSING`.
>
> The functionality of this module is tested in `ql/test/experimental/meta/inline-taint-test-demo`.