Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.
This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:
P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
P3: Add new shared-CFG-backed control flow graph (#21921).
P4: Add new shared-SSA-backed SSA adapter (#21923).
The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.
GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.
Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.
A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
pre/post-order pairs as distinct nodes.
Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.
Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.
Verification: all 367 lib + src + consistency-queries compile clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mostly removing of nodes from the graph.
One result lost:
```
check("submodule.submodule_attr", submodule.submodule_attr, "submodule_attr", globals()) #$ MISSING:prints=submodule_attr
```
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as
```python
x = expr
y = x + 2
```
we used to have flow from `expr` to an SSA variable representing x and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` at line 1 and from there to the control flow node for `x` at line 2.
Specific changes:
- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` taking into account that the module name may be ambigous.
- Some tests have been rewritten to accomodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected` only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
It's nice that it fixes the `InsecureProtocol` test-case (which maybe
should have been a test-case for the import resolution library in the
first place?)
But it's not quite right:
1. it adds spurious flow for `clashing_attr`
2. it runs into huge problems for typetracking_imports/tracked.expected
3. it runs into the problem for
https://github.com/github/codeql/pull/10176 with an `from <pkg>
import *` blocking flow from previously defined variable, that is NOT
overridden. (simplistic_reexport.bar_attr)
Just like the one added for `py/insecure-protocol` in fb425b7, but
instead added in the import-resolution tests, such that we don't have to
remember it's in a completely different directory.
It's not very useful to look at, and it's a mess when you change any
tests to see all the changes lines in the expected output that you
really do not care about!
I guess we could have done this at the very start of introducing this
test in this PR, but I think the last commit was mostly inspired from
looking at all the things that evidently was re-exported from the trace
import, even when I knew they were not available because of the
`__all__` definition.
However, we can see that `from <pkg> import *` and `import pkg` are
handled differently. Would have liked `has_defined_all_indirection` to
behave in the same way no matter how the import was made.
Notice that `has_defined_all_indirection` all have both
`all_defined_bar_copy` and `all_defined_foo_copy` marked as exported,
even though only `all_defined_foo_copy` is available.
For `from <pkg> import <attr>` we would use to treat the `<pkg>`
(ImportExpr) as a definition of the name `<attr>`.
Since this removes bad import-flow, and nothing broke, I'm guessing this
was never intentional.