Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.
This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:
P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
P3: Add new shared-CFG-backed control flow graph (#21921).
P4: Add new shared-SSA-backed SSA adapter (#21923).
The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.
GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.
Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.
A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
pre/post-order pairs as distinct nodes.
Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.
Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.
Verification: all 367 lib + src + consistency-queries compile clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.
Library additions:
- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
implementation built on the new (shared) CFG. Mirrors the Java SSA
adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
an InputSig is defined in terms of positional (BasicBlock, int)
variable references, and the shared
codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
instantiated.
SourceVariable is the AST-level Py::Variable. Variable references
are looked up via the new CFG facade's NameNode.defines/uses/deletes
predicates (added in the preceding PR), which themselves are
one-line bridges to AST-level Name.defines/uses/deletes.
Implicit-entry definitions are inserted for non-local/global/builtin
reads, captured variables, and (when needed) parameters.
Test additions:
- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
representative test corpus and checks expected def/use chains.
- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
legacy ESSA over the same corpus and diffs the results, so any
semantic divergence shows up as a test failure.
Production impact:
None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.
Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.
Library additions:
- semmle.python.controlflow.internal.AstNodeImpl — mediates between
the Python AST and the shared codeql.controlflow.ControlFlowGraph
signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
synthetic kinds of node (BlockStmt for body slots, intermediate
nodes for multi-operand boolean expressions).
- semmle.python.controlflow.internal.Cfg — public facade
re-exposing the same API surface as semmle/python/Flow.qll
(ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
CompareNode, ...), backed by the shared CFG.
- lib/printCfgNew.ql — debug/visualisation query for the new CFG.
- consistency-queries/CfgConsistency.ql — consistency query running
the shared CFG's standard checks against Python.
Shared library:
- shared.controlflow.ControlFlowGraph — adds two defaulted
getWhileElse / getForeachElse predicates to AstSig so Python can
model while-else / for-else (no behavioural change for other
languages).
Test additions:
- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
the new CFG (annassign, compound, comprehension, decorated,
except_handler, imports, match_pattern, parameters, simple,
type_params, walrus_starred, with_stmt, dead_under_no_raise).
- ControlFlow/store-load/* — basic store/load coverage.
- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
OldCfg evaluation-order self-validation suite, run against the
new CFG via NewCfgImpl.qll.
- Minor extensions to existing test_if.py / test_boolean.py +
cosmetic .expected churn on a handful of OldCfg tests.
No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.
Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration.
Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.
The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.
Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This one is potentially a bit iffy -- it checks for a very powerful
property (that implies many of the other queries), but as the test
results show, it can produce false positives when there is in fact no
problem. We may want to get rid of it entirely, if it becomes too noisy.
This looks for nodes annotated with `t[never]` in the test that are
reachable in the CFG. This should not happen (it messes with various
queries, e.g. the "mixed returns" query), but the test shows that in a
few particular cases (involving the `match` statement where all cases
contain `return`s), we _do_ have reachable nodes that shouldn't be.
This one demonstrates a bug in the current CFG. In a dictionary
comprehension `{k: v for k, v in d.items()}`, we evaluate the value
before the key, which is incorrect. (A fix for this bug has been
implemented in a separate PR.)
These use the annotated, self-verifying test files to check various
consistency requirements.
Some of these may be expressing the same thing in different ways, but
it's fairly cheap to keep them around, so I have not attempted to
produce a minimal set of queries for this.
These tests consist of various Python constructions (hopefully a
somewhat comprehensive set) with specific timestamp annotations
scattered throughout. When the tests are run using the Python 3
interpreter, these annotations are checked and compared to the "current
timestamp" to see that they are in agreement. This is what makes the
tests "self-validating".
There are a few different kinds of annotations: the basic `t[4]` style
(meaning this is executed at timestamp 4), the `t[dead(4)]` variant
(meaning this _would_ happen at timestamp 4, but it is in a dead
branch), and `t[never]` (meaning this is never executed at all).
In addition to this, there is a query, MissingAnnotations, which checks
whether we have applied these annotations maximally. Many expression
nodes are not actually annotatable, so there is a sizeable list of
excluded nodes for that query.
We won't be able to run these tests until Python 3.15 is actually out
(and our CI is using it), so it seemed easiest to just put them in their
own test directory.
Adds `hasOverloadDecorator` as a predicate on functions. It looks for
decorators called `overload` or `something.overload` (usually
`typing.overload` or `t.overload`). These are then filtered out in the
predicates that (approximate) resolving methods according to the MRO.
As the test introduced in the previous commit shows, this removes the
spurious resolutions we had before.
The ones that no longer require points-to no longer import
`LegacyPointsTo`. The ones that do use the specific
`...MetricsWithPointsTo` classes that are applicable.
With `ModuleVariableNode`s now appearing for _all_ global variables (not
just the ones that actually seem to be used), some of the tests changed
a bit. Mostly this was in the form of new flow (because of new nodes
that popped into existence). For some inline expectation tests, I opted
to instead exclude these results, as there was no suitable location to
annotate. For the normal tests, I just accepted the output (after having
vetted it carefully, of course).
In hindsight, having a `.getMetrics()` method that just returns `this`
is somewhat weird. It's possible that it predates the existence of the
inline cast, however.
For whatever reason, the CFG node for exceptions and exception groups
was placed with the points-to code. (Probably because a lot of the
predicates depended on points-to.)
However, as it turned out, two of the SSA modules only depended on
non-points-to properties of these nodes, and so it was fairly
straightforward to remove the imports of `LegacyPointsTo` for those
modules.
In the process, I moved the aforementioned CFG node types into
`Flow.qll`, and changed the classes in the `Exceptions` module to the
`...WithPointsTo` form that we introduced elsewhere.
Turns out the `ImportTime` module (despite living in
`semmle.python.types` does not actually depend on points-to, so some of
the `LegacyPointsTo` imports could be replaced or removed.
This frees `Class.qll`, `Exprs.qll`, and `Function.qll` from the
clutches of points-to. For the somewhat complicated setup with
`getLiteralObject` (an abstract method), I opted for a slightly ugly but
workable solution of just defining a predicate on `ImmutableLiteral`
that inlines each predicate body, special-cased to the specific instance
to which it applies.