Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.
This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:
P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
P3: Add new shared-CFG-backed control flow graph (#21921).
P4: Add new shared-SSA-backed SSA adapter (#21923).
The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.
GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.
Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.
A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
pre/post-order pairs as distinct nodes.
Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.
Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.
Verification: all 367 lib + src + consistency-queries compile clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a comment explaining why we no longer flag the indirect tuple
example.
Also adds a test case which _would_ be flagged if not for the type
annotation.
As we're no longer tracking tuples across function boundaries, we lose
the result that related to this setup (which, as the preceding commit
explains, lead to a lot of false positives).
Technically we still depend on points-to in that we still mention
`PythonFunctionValue` and `ClassValue` in the query. However, we
immediately move to working with the corresponding `Function` and
`Class` AST nodes, and so we're not really using points-to. (The reason
for doing things this way is that otherwise the `.toString()` for all of
the alerts would change, which would make the diff hard to interpret.
This way, it should be fairly simple to see which changes are actually
relevant.)
We do lose some precision when moving away from points-to, and this is
reflected in the changes in the `.expected` file. In particular we no
longer do complicated tracking of values, but rather look at the
syntactic structure of the classes in question. This causes us to lose
out on some results where a special method is defined elsewhere, and
causes a single FP where a special method initially has the wrong
signature, but is subsequently overwritten with a function with the
correct signature.
We also lose out on results having to do with default values, as these
are now disabled.
Finally, it was necessary to add special handling of methods marked with
the `staticmethod` decorator, as these expect to receive fewer
arguments. This was motivated by a MRVA run, where e.g. sympy showed a
lot of examples along the lines of
```
@staticmethod
def __abs__():
return ...
```
Or, more generally, any copy step, as these presumably do not preserve
object identity.
(Arguably, `copy` could still be susceptible to interior mutability, but
I think that's outside the scope of this query anyway.)
There are two issues with `deepcopy` here. Firstly, the `deepcopy` function itself
has a mutable default value in its parameter `_nil` (set to the empty list by default).
Now, this value is never actually returned from `deepcopy`, as it is only used as a
sentinel, but our analysis is not clever enough to see this. Thus, it thinks that this
mutable default is returned, and hence the result of any call to `deepcopy` is a
potential source.
To remedy this, I opted to simply exclude all sources that originate from within the
standard library. It is very unlikely for any of the sources in the standard library
to be legit.
Secondly, `deepcopy` -- by virtue of being a function that we model as preserving
values -- admits data-flow through its calls, but this is not correct for the mutable
default query, as it is here the _identity_ of the default value in question that is
important. Thus, we get spurious flow through `deepcopy` for this specific query.
Moves the existing tests into the `ModificationOfParameterWithDefault` subdirectory
which already contained a bunch more tests. In the process, I also removed some
duplicated test cases.