Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.
This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:
P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
P3: Add new shared-CFG-backed control flow graph (#21921).
P4: Add new shared-SSA-backed SSA adapter (#21923).
The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.
GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.
Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.
A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
pre/post-order pairs as distinct nodes.
Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.
Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.
Verification: all 367 lib + src + consistency-queries compile clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration.
Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.
The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.
Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The internal predicates that identify `@staticmethod`, `@classmethod` and
`@property` decorators previously required the decorator's `NameNode` to
satisfy `isGlobal()` (i.e. no SSA def reaches the decorator's name use).
That filter was correct but unnecessarily indirect: these three names
are builtins, and even when a class body redefines one, the class body
has not started executing at the decorator position, so Python uses the
builtin.
Match the decorator's AST `Name` directly instead, dropping the CFG/SSA
detour. The slight semantic change — `isGlobal()` would have rejected
module-level shadowing of these builtins — is negligible in practice
and explicitly documented in the change note.
`hasContextmanagerDecorator` and `hasOverloadDecorator` keep the
`NameNode.isGlobal()` check because their target names (`contextmanager`,
`overload`) are imported, not builtin, and local shadowing is a real
concern.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
For module-level metaclass declarations, we now also check that the
right hand side in a `__metaclass__ = type` assignment is in fact the
built-in `type`.
These could arguably be moved to `Class` itself, but for now I'm
choosing to limit the changes to the `DuckTyping` module (until we
decide on a proper API).
This module (which for convenience currently resides inside
`DataFlowDispatch`, but this may change later) contains convenience
predicates for bridging the gap between the data-flow layer and the old
points-to analysis.
Adds `hasOverloadDecorator` as a predicate on functions. It looks for
decorators called `overload` or `something.overload` (usually
`typing.overload` or `t.overload`). These are then filtered out in the
predicates that (approximate) resolving methods according to the MRO.
As the test introduced in the previous commit shows, this removes the
spurious resolutions we had before.
Uses the same trick as for `ExtractedArgumentNode`, wherein we postpone
the global restriction on the charpred to instead be in the `argumentOf`
predicate (which is global anyway).
In addition to this, we also converted `CapturedVariablesArgumentNode`
into a proper synthetic node, and added an explicit post-update node for
it. These nodes just act as wrappers for the function part of call
nodes. Thus, to make them work with the variable capture machinery, we
simply map them to the closure node for the corresponding control-flow
or post-update node.
Adds support for tracking instances via type annotations. Also adds a
convenience method to the newly added `Annotation` class,
`getAnnotatedExpression`, that returns the expression that is annotated
with the given type. For return annotations this is any value returned
from the annotated function in question.
Co-authored-by: Napalys Klicius <napalys@github.com>
We used to use the CfgNode for the comprehension itself.
In cases where that is also an argument, say
```python
",".join([x for x in l])
```
that would be an argument to two different calls causing a dataflow consistency violation.
For a comprehension `[x for x in l]
- `l` is now a legal argument (in DataFlowPublic)
- `l` is the argument of the comprehension function (in DataFlowDispatch)
- the parameter of the comprehension function is being read rather than `l` (in IterableUnpacking)
Thus the read that used to cross callable boundaries is now split into a arg-param edge and a read from that param.
A case of bad magic. Rather than evaluating separately whether a class
has a method of some name, the compiler opted to magick in the fact
that this was done as part of the `findFunctionAccordingToMro`
predicate. Hilarity ensued.
However, _we_ know that magic really isn't needed in this case (the
number of results is bounded by `Class.getAMethod` since methods have
only a single name), so by factoring it out into a helper predicate, we
can help the join-orderer along.
Before
```
(377s) Starting to evaluate predicate _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h (iteration 6)
(377s) Tuple counts for _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h after 16ms:
33363 ~0% {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'arg1'
159696 ~4% {3} | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Rhs.2 'arg2'
return r1
(377s) Starting to evaluate predicate _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h (iteration 6)
(382s) Tuple counts for _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h after 4.4s:
1770825904 ~4% {4} r1 = JOIN `_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared` WITH `Function::Function.getName/0#dispred#033700ef_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 'arg0', Rhs.1, Lhs.0 'arg1', Lhs.2 'arg2'
34558 ~3% {3} | JOIN WITH `Class::Class.getAMethod/0#dispred#66416e47` ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
return r1
...
(382s) Starting to evaluate predicate DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh (iteration 6)
(382s) - DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3_delta has 125138 rows (order for disjuncts: delta=<standard>).
(382s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh after 12ms:
33363 ~0% {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0% {3} | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1% {3} | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs`(FIRST 3)
0 ~0% {3} r2 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Lhs.1 'name', Lhs.2 'result', Rhs.1 'cls'
{3} | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#__#antijoin_rhs`(FIRST 3)
0 ~0% {3} | SCAN OUTPUT In.2 'cls', In.0 'name', In.1 'result'
125138 ~1% {3} r3 = r1 UNION r2
125138 ~1% {3} | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
return r3
```
And now
```
(18s) Tuple counts for DataFlowDispatch::class_has_method/2#0d2ae9c0/2@ff66c1lr after 18ms:
202279 ~1% {2} r1 = JOIN `Class::Class.getAMethod/0#dispred#66416e47_10#join_rhs` WITH `Function::Function.getName/0#dispred#033700ef` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name'
return r1
...
(490s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#48b6c1xi after 54ms:
0 ~0% {3} r1 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'result'
0 ~0% {3} | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)
33363 ~0% {2} r2 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0% {3} | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1% {3} | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)
125138 ~1% {3} r3 = r1 UNION r2
125138 ~1% {3} | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
return r3
```
I was surprised to see that this predicate actually gets evaluated 3 times
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@5ac22jwq was evaluated in 30 iterations totaling 108ms (delta sizes total: 32555).
It does however fit with it being used in exactly 3 places: https://github.com/search?q=repo%3Agithub%2Fcodeql+%2FattrReadTracker%5C%28%2F&type=code -- so I assume it's because each use forces a new evaluation. Although that's something we could look into solving, for now I'm just trying to fix the join-order.
Initial
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
7068090 ~0% {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
3901178 ~5% {2} | SCAN OUTPUT In.1, In.1
3901178 ~0% {3} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
13615 ~1% {2} r2 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
94 ~2% {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
18846 ~1% {2} r4 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta_1#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
32555 ~1% {2} r5 = r2 UNION r3 UNION r4
return r5
```
==>
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@f2517jwq was evaluated in 30 iterations totaling 12ms (delta sizes total: 32704).
186719 ~121% {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta` OUTPUT In.1
164342 ~158% {1} r2 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0
96 ~0% {1} r3 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0
351157 ~80% {1} r4 = r1 UNION r2 UNION r3
88074 ~14% {1} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
41789 ~18% {2} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
32883 ~2% {2} | SCAN OUTPUT In.1, In.1
return r4
```
AND
initial
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
17434622 ~0% {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
9483976 ~4% {2} | SCAN OUTPUT In.1, In.1
9483976 ~0% {3} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
19258 ~1% {2} r2 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
1654 ~1% {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
1314 ~4% {2} r4 = JOIN r1 WITH `DataFlowDispatch::clsArgumentTracker/1#47339327#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
94 ~2% {2} r5 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
77217 ~0% {2} r6 = JOIN r1 WITH `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
13632 ~1% {2} r7 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
113169 ~0% {2} r8 = r2 UNION r3 UNION r4 UNION r5 UNION r6 UNION r7
return r8
```
==>
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@d732e6yt was evaluated in 74 iterations totaling 31ms (delta sizes total: 113129).
186719 ~150% {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` OUTPUT In.0
1669 ~0% {1} r2 = SCAN `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` OUTPUT In.0
3425 ~15% {1} r3 = SCAN `DataFlowDispatch::clsArgumentTracker/1#47339327#prev_delta` OUTPUT In.1
96 ~0% {1} r4 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0
123310 ~0% {1} r5 = SCAN `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` OUTPUT In.0
164342 ~581% {1} r6 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0
479561 ~94% {1} r7 = r1 UNION r2 UNION r3 UNION r4 UNION r5 UNION r6
169424 ~2% {1} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
116290 ~0% {2} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
113160 ~0% {2} | SCAN OUTPUT In.1, In.1
return r7
```
problem is:
```
14294 ~33% {1} r23 = r21 UNION r22
13626 ~0% {2} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Rhs.1, Lhs.0
11871493 ~2% {2} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
6810938 ~3% {2} | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_0_3_1_2#prev` ON FIRST 2 OUTPUT Rhs.3, Lhs.1, Lhs.0, Rhs.2
0 ~0% {4} | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.3, Lhs.1, Lhs.0, Lhs.2
0 ~0% {5} | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
that is, it does cartesian product of DataFlowPublic::Node.getEnclosingCallable
After fix
```
14294 ~33% {1} r23 = r21 UNION r22
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_3_0_1_2#prev` ON FIRST 1 OUTPUT Rhs.3, Lhs.0, Rhs.1, Rhs.2
0 ~0% {4} | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.1, Lhs.3, Lhs.0, Lhs.2
0 ~0% {5} | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22 ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3
0 ~0% {5} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2, Lhs.3, Lhs.4
0 ~0% {4} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 2 OUTPUT Lhs.0, Lhs.2, Lhs.3, Lhs.4
0 ~0% {5} | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
Overall stats
(old)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@b30c7vxg was evaluated in 51 iterations totaling 54ms (delta sizes total: 38247).
==>
(new)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@c1559vxu was evaluated in 51 iterations totaling 28ms (delta sizes total: 38247).
for `CapturingClosureArgumentNode`.
Unless we define it for every single `CallNode`, we need a more
sophisticated mutual recursion with the call graph construction.
There is built-in support for that, but we are currently not using it.