Python: add new shared-CFG-backed control flow graph

Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).

- semmle.python.controlflow.internal.Cfg — public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Shared library:

- shared.controlflow.ControlFlowGraph — adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:

- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/store-load/* — basic store/load coverage.

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
Copilot
2026-06-02 14:09:28 +00:00
parent 07c5c91de4
commit e68f2fd717
68 changed files with 3730 additions and 12 deletions

View File

@@ -0,0 +1,2 @@
import semmle.python.controlflow.internal.AstNodeImpl
import ControlFlow::Consistency

View File

@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* A new Python control flow graph implementation has been added under `semmle.python.controlflow.internal.Cfg` (backed by `AstNodeImpl.qll`), built on the shared `codeql.controlflow.ControlFlowGraph` library. It is not yet used by the dataflow library or any production query; the legacy CFG in `semmle/python/Flow.qll` remains the default. The new library is exposed for tests and for upcoming migrations.

View File

@@ -0,0 +1,45 @@
/**
* @name Print CFG (New)
* @description Produces a representation of a file's Control Flow Graph
* using the new shared control flow library.
* This query is used by the VS Code extension.
* @id python/print-cfg
* @kind graph
* @tags ide-contextual-queries/print-cfg
*/
private import python as Py
import semmle.python.controlflow.internal.AstNodeImpl
external string selectedSourceFile();
private predicate selectedSourceFileAlias = selectedSourceFile/0;
external int selectedSourceLine();
private predicate selectedSourceLineAlias = selectedSourceLine/0;
external int selectedSourceColumn();
private predicate selectedSourceColumnAlias = selectedSourceColumn/0;
module ViewCfgQueryInput implements ControlFlow::ViewCfgQueryInputSig<Py::File> {
predicate selectedSourceFile = selectedSourceFileAlias/0;
predicate selectedSourceLine = selectedSourceLineAlias/0;
predicate selectedSourceColumn = selectedSourceColumnAlias/0;
predicate cfgScopeSpan(
Ast::Callable callable, Py::File file, int startLine, int startColumn, int endLine,
int endColumn
) {
exists(Py::Scope scope |
scope = callable.asScope() and
file = scope.getLocation().getFile() and
scope.getLocation().hasLocationInfo(_, startLine, startColumn, endLine, endColumn)
)
}
}
import ControlFlow::ViewCfgQuery<Py::File, ViewCfgQueryInput>

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,4 @@
consistencyOverview
| deadEnd | 1 |
deadEnd
| without_loop.py:7:5:7:9 | Break |

View File

@@ -0,0 +1,32 @@
/**
* Phase -1 of the dataflow CFG migration: verifies that every variable
* binding visible to the AST (`Name.defines(v)`) corresponds to a CFG node
* in the new CFG (`semmle.python.controlflow.internal.AstNodeImpl`).
*
* The expected tag is `cfgdefines=<name>`. Each binding annotation in the
* test sources looks like `# $ cfgdefines=x` for a binding currently
* covered by the new CFG, or `# $ MISSING: cfgdefines=x` for a binding
* that is known to be uncovered (a "red" test case that should be
* green-flipped once the corresponding `cfg-ext-*` extension lands).
*/
import python
import semmle.python.controlflow.internal.AstNodeImpl as CfgImpl
import utils.test.InlineExpectationsTest
module CfgBindingsTest implements TestSig {
string getARelevantTag() { result = "cfgdefines" }
predicate hasActualResult(Location location, string element, string tag, string value) {
exists(Name n, Variable v, CfgImpl::ControlFlowNode cfg |
n.defines(v) and
cfg.getAstNode().asExpr() = n and
location = n.getLocation() and
element = n.toString() and
tag = "cfgdefines" and
value = v.getId()
)
}
}
import MakeTest<CfgBindingsTest>

View File

@@ -0,0 +1,13 @@
# Annotated assignment (PEP 526). Both with and without an initializer.
a: int = 1 # $ cfgdefines=a
b: str = "hi" # $ cfgdefines=b
# Annotation without value: the AST records `c` as defined,
# and the new CFG now visits it via the AnnAssignStmt wrapper.
c: int # $ cfgdefines=c
class K: # $ cfgdefines=K
field: int = 0 # $ cfgdefines=field

View File

@@ -0,0 +1,14 @@
# Compound (tuple/list) assignment targets — actually wired in the new CFG.
a, b = (1, 2) # $ cfgdefines=a cfgdefines=b
[c, d] = [3, 4] # $ cfgdefines=c cfgdefines=d
# Nested unpacking.
(e, (f, g)) = (1, (2, 3)) # $ cfgdefines=e cfgdefines=f cfgdefines=g
# Star unpacking.
h, *i = [1, 2, 3] # $ cfgdefines=h cfgdefines=i
# Chained assignment with compound target.
j = k, l = (5, 6) # $ cfgdefines=j cfgdefines=k cfgdefines=l

View File

@@ -0,0 +1,21 @@
# Comprehension and `for` loop targets — wired in the new CFG.
# Comprehensions are nested function scopes with a synthetic `.0` parameter
# bound to the iterable.
# Bare-name `for` target.
for i in range(3): # $ cfgdefines=i
pass
# Compound `for` target.
for k, v in [(1, 2)]: # $ cfgdefines=k cfgdefines=v
pass
# Comprehension targets.
_ = [x for x in range(3)] # $ cfgdefines=_ cfgdefines=x cfgdefines=.0
_ = {y: z for y, z in []} # $ cfgdefines=_ cfgdefines=y cfgdefines=z cfgdefines=.0
_ = (a for a in []) # $ cfgdefines=_ cfgdefines=a cfgdefines=.0
# Nested comprehensions.
_ = [b for c in [] for b in c] # $ cfgdefines=_ cfgdefines=c cfgdefines=b cfgdefines=.0

View File

@@ -0,0 +1,52 @@
# Dead bindings under the "no expressions raise" CFG abstraction.
#
# The new CFG does not currently model raise edges from arbitrary
# expressions. As a consequence, code that is only reachable through
# exception flow is (correctly) classified as dead and has no CFG node.
# Variable bindings in dead code do not need CFG nodes - SSA / dataflow
# over dead code is moot.
#
# These tests act as a regression guard: the bindings below intentionally
# have no `cfgdefines=` annotations. If raise modelling is later added,
# the BindingsTest infrastructure will surface the new CFG nodes as
# unexpected results, and this file will need to be revisited.
def f(obj): # $ cfgdefines=f cfgdefines=obj
try:
return len(obj)
except TypeError:
pass
# The first try's body always returns; its except handler does not
# raise or otherwise transfer control, so under "no expressions
# raise" the only paths out of the try-statement are dead. Everything
# below is unreachable.
try:
hint = type(obj).__length_hint__
except AttributeError:
return None
return hint
def g(): # $ cfgdefines=g
try:
raise Exception("inner")
except:
raise Exception("outer")
else:
# Unreachable: the inner try body always raises, so the `else:`
# clause never runs.
hit_inner_else = True
def h(cache, key): # $ cfgdefines=h cfgdefines=cache cfgdefines=key
try:
return cache[key]
except KeyError:
pass
# Same pattern as `f`: dead under "no expressions raise".
value = compute(key)
cache[key] = value
return value

View File

@@ -0,0 +1,30 @@
# Decorated `def`/`class` — wired in the new CFG.
def deco(f): # $ cfgdefines=deco cfgdefines=f
return f
@deco
def decorated_func(): # $ cfgdefines=decorated_func
pass
@deco
class DecoratedClass: # $ cfgdefines=DecoratedClass
pass
# Stacked decorators.
@deco
@deco
def doubly(): # $ cfgdefines=doubly
pass
# Inside a class body.
class Outer: # $ cfgdefines=Outer
@staticmethod
def inner(): # $ cfgdefines=inner
pass

View File

@@ -0,0 +1,19 @@
# Exception-handler name bindings. These are already wired in the new
# CFG provided the try body can raise; `raise` statements are reliably
# treated as exception sources.
try:
raise ValueError("oops")
except ValueError as e: # $ cfgdefines=e
pass
try:
raise TypeError("oops")
except (TypeError, KeyError) as err: # $ cfgdefines=err
pass
# Exception groups (Python 3.11+).
try:
raise ValueError("oops")
except* ValueError as eg: # $ cfgdefines=eg
pass

View File

@@ -0,0 +1,14 @@
# Import aliases — all bound names below are now reachable via the new
# CFG's `ImportStmt` wrapper.
import os # $ cfgdefines=os
import os.path # $ cfgdefines=os
import os as o # $ cfgdefines=o
from os import path # $ cfgdefines=path
from os import path as p # $ cfgdefines=p
from os import sep, linesep # $ cfgdefines=sep cfgdefines=linesep
from os import (
getcwd, # $ cfgdefines=getcwd
getcwdb, # $ cfgdefines=getcwdb
)

View File

@@ -0,0 +1,24 @@
# Match-statement pattern bindings — wired in the new CFG.
def f(subject): # $ cfgdefines=f cfgdefines=subject
match subject:
case x: # $ cfgdefines=x
pass
case [a, b]: # $ cfgdefines=a cfgdefines=b
pass
case {"k": v}: # $ cfgdefines=v
pass
case Point(p, q): # $ cfgdefines=p cfgdefines=q
pass
case [_, *rest]: # $ cfgdefines=rest
pass
case (1 | 2) as n: # $ cfgdefines=n
pass
class Point: # $ cfgdefines=Point
__match_args__ = ("x", "y") # $ cfgdefines=__match_args__
x: int # $ cfgdefines=x
y: int # $ cfgdefines=y

View File

@@ -0,0 +1,42 @@
# Function parameters.
def positional(a, b): # $ cfgdefines=positional cfgdefines=a cfgdefines=b
pass
def with_default(x=1, y=2): # $ cfgdefines=with_default cfgdefines=x cfgdefines=y
pass
def with_vararg(*args): # $ cfgdefines=with_vararg cfgdefines=args
pass
def with_kwarg(**kwargs): # $ cfgdefines=with_kwarg cfgdefines=kwargs
pass
def with_kwonly(*, k1, k2=5): # $ cfgdefines=with_kwonly cfgdefines=k1 cfgdefines=k2
pass
def kitchen_sink(a, b=2, *args, k1, k2=5, **kw): # $ cfgdefines=kitchen_sink cfgdefines=a cfgdefines=b cfgdefines=args cfgdefines=k1 cfgdefines=k2 cfgdefines=kw
pass
# Methods get `self` / `cls`.
class C: # $ cfgdefines=C
def method(self, x): # $ cfgdefines=method cfgdefines=self cfgdefines=x
pass
@classmethod
def cmethod(cls, x): # $ cfgdefines=cmethod cfgdefines=cls cfgdefines=x
pass
# Lambda parameter.
_ = lambda p: p + 1 # $ cfgdefines=_ cfgdefines=p
# PEP 570 positional-only.
def pos_only(a, b, /, c): # $ cfgdefines=pos_only cfgdefines=a cfgdefines=b cfgdefines=c
pass

View File

@@ -0,0 +1,14 @@
# Simple bindings that should already work in the new CFG.
# No MISSING annotations expected.
x = 1 # $ cfgdefines=x
y = x + 1 # $ cfgdefines=y
def f(): # $ cfgdefines=f
pass
class C: # $ cfgdefines=C
pass
# Re-assignment.
x = 2 # $ cfgdefines=x

View File

@@ -0,0 +1,21 @@
# PEP 695 type parameters (Python 3.12+).
# PEP 695 type-param names on `def`/`class` bind in an annotation scope
# that nests the function/class body — they have no CFG node in the
# enclosing scope (matching the legacy CFG).
def func[T](x: T) -> T: # $ cfgdefines=func cfgdefines=x
return x
class Box[T]: # $ cfgdefines=Box
item: T # $ cfgdefines=item
# Multi-parameter, with bound and variadics.
def multi[T: int, *Ts, **P](x: T, *args: *Ts, **kwargs: P.kwargs) -> T: # $ cfgdefines=multi cfgdefines=x cfgdefines=args cfgdefines=kwargs
return x
# `type` statement (PEP 695).
type Alias[T] = list[T] # $ cfgdefines=Alias cfgdefines=T

View File

@@ -0,0 +1,14 @@
# Walrus and starred-target edge cases — wired in the new CFG.
# Walrus in expression context.
if (y := 5) > 0: # $ cfgdefines=y
pass
# Walrus in a comprehension. The comprehension introduces a synthetic
# `.0` parameter bound to the iterable.
_ = [w for _ in range(3) if (w := 1)] # $ cfgdefines=_ cfgdefines=w cfgdefines=.0
# Starred target in a Tuple LHS.
*head, tail = [1, 2, 3] # $ cfgdefines=head cfgdefines=tail

View File

@@ -0,0 +1,21 @@
# `with cm() as x:` bindings — wired in the new CFG.
class CM: # $ cfgdefines=CM
def __enter__(self): return self # $ cfgdefines=__enter__ cfgdefines=self
def __exit__(self, *a): pass # $ cfgdefines=__exit__ cfgdefines=self cfgdefines=a
with CM() as x: # $ cfgdefines=x
pass
# Multiple items.
with CM() as a, CM() as b: # $ cfgdefines=a cfgdefines=b
pass
# Parenthesised form (Python 3.10+).
with (CM() as p, CM() as q): # $ cfgdefines=p cfgdefines=q
pass
# Compound target in `with`.
with CM() as (m, n): # $ cfgdefines=m cfgdefines=n
pass

View File

@@ -5,6 +5,8 @@
* have separate CFGs and are excluded from this check.
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -2,6 +2,8 @@
* Checks that every timer annotation has a corresponding CFG node.
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -8,6 +8,8 @@
* edge leaves the basic block and the normal successor may be dead.
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -1,7 +1,7 @@
| test_boolean.py:9:10:9:43 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:9:59:9:59 | IntegerLiteral | timestamp 2 | test_boolean.py:9:19:9:19 | IntegerLiteral | timestamp 0 |
| test_boolean.py:15:10:15:43 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:15:50:15:50 | IntegerLiteral | timestamp 1 | test_boolean.py:15:20:15:20 | IntegerLiteral | timestamp 0 |
| test_boolean.py:21:10:21:42 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:21:49:21:49 | IntegerLiteral | timestamp 1 | test_boolean.py:21:19:21:19 | IntegerLiteral | timestamp 0 |
| test_boolean.py:27:10:27:34 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:27:50:27:50 | IntegerLiteral | timestamp 2 | test_boolean.py:27:20:27:20 | IntegerLiteral | timestamp 0 |
| test_boolean.py:27:10:27:43 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:27:59:27:59 | IntegerLiteral | timestamp 2 | test_boolean.py:27:20:27:20 | IntegerLiteral | timestamp 0 |
| test_boolean.py:40:10:40:61 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:40:86:40:86 | IntegerLiteral | timestamp 3 | test_boolean.py:40:16:40:16 | IntegerLiteral | timestamp 0 |
| test_boolean.py:46:10:46:61 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:46:86:46:86 | IntegerLiteral | timestamp 3 | test_boolean.py:46:16:46:16 | IntegerLiteral | timestamp 0 |
| test_boolean.py:52:10:52:95 | ControlFlowNode for BoolExpr | Basic block ordering: $@ appears before $@ | test_boolean.py:52:120:52:120 | IntegerLiteral | timestamp 4 | test_boolean.py:52:20:52:20 | IntegerLiteral | timestamp 0 |

View File

@@ -3,6 +3,8 @@
* increasing minimum-timestamp order.
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -11,6 +11,8 @@
* lambdas that have annotations in nested scopes).
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -4,6 +4,7 @@
* in at least one annotation (live or dead).
*/
import python
import TimerUtils
from TestFunction f, int missing, int maxTs, TimerAnnotation maxAnn

View File

@@ -4,6 +4,8 @@
* entry (including within the same basic block).
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -0,0 +1,14 @@
/** New-CFG version of AllLiveReachable. */
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerCfgNode a, TestFunction f
where allLiveReachable(a, f)
select a, "Unreachable live annotation; entry of $@ does not reach this node", f, f.getName()

View File

@@ -0,0 +1,18 @@
/**
* New-CFG version of AnnotationHasCfgNode.
*
* Checks that every timer annotation has a corresponding CFG node.
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils::CfgTests
from TimerAnnotation ann
where annotationWithoutCfgNode(ann)
select ann, "Annotation in $@ has no CFG node", ann.getTestFunction(),
ann.getTestFunction().getName()

View File

@@ -0,0 +1,26 @@
/**
* New-CFG version of BasicBlockAnnotationGap.
*
* Original:
* Checks that within a basic block, if a node is annotated then its
* successor is also annotated (or excluded). A gap in annotations
* within a basic block indicates a missing annotation, since there
* are no branches to justify the gap.
*
* Nodes with exceptional successors are excluded, as the exception
* edge leaves the basic block and the normal successor may be dead.
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerCfgNode a, CfgNode succ
where basicBlockAnnotationGap(a, succ)
select a, "Annotated node followed by unannotated $@ in the same basic block", succ,
succ.getNode().toString()

View File

@@ -0,0 +1,21 @@
/**
* New-CFG version of BasicBlockOrdering.
*
* Original:
* Checks that within a single basic block, annotations appear in
* increasing minimum-timestamp order.
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerCfgNode a, TimerCfgNode b, int minA, int minB
where basicBlockOrdering(a, b, minA, minB)
select a, "Basic block ordering: $@ appears before $@", a.getTimestampExpr(minA),
"timestamp " + minA, b.getTimestampExpr(minB), "timestamp " + minB

View File

@@ -0,0 +1,80 @@
/**
* New-CFG version of BranchTimestamps.
*
* Checks that when a node has both a true and false successor, the
* live timestamps on one branch are marked as dead on the other.
* This ensures that boolean branches are fully annotated with dead()
* markers for the paths not taken.
*
* Limitation: the `@ t[ts, ...]` / `dead(ts)` annotation scheme can only
* model branch-dead-ness for plain boolean control flow that reconverges
* linearly after the split — i.e. `if`-with-else and `if`-expression.
* It cannot model:
*
* * loops (`while` / `for`): body timestamps repeat across iterations,
* so the loop-exit annotation can't list them as dead;
* * `match` statements: each `case` body is a syntactically distinct
* sub-tree, and the branches don't reconverge through a common
* annotation point in the timeline;
* * `try` / `with` and `raise` / `assert`: exception edges are modelled
* as true/false but flow to syntactically distinct handlers, with no
* reconvergence in the linear annotation order;
* * short-circuit `and` / `or` (`BoolExpr`): the branches reconverge at
* the BoolExpr's after-node, so timestamps on one branch are live
* downstream of the other rather than dead;
* * `if` without an `else` clause, and `if`/`elif` chains: the false
* branch reconverges with the true branch at the post-if statement
* (no-else) or fans out across multiple elif-test annotations,
* neither of which fit the binary annotation scheme.
*
* Branch nodes inside those constructs are therefore whitelisted out
* below. The check still fires (and is useful) for plain `if`/`else`
* and conditional-expression branching.
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
/**
* Holds if `f` contains a construct whose branches the linear-timestamp
* annotation scheme cannot describe (see file-level comment).
*/
private predicate hasUnmodellableBranching(Function f) {
exists(AstNode bad |
bad.getScope() = f and
(
bad instanceof While
or
bad instanceof For
or
bad instanceof MatchStmt
or
bad instanceof Try
or
bad instanceof With
or
bad instanceof Raise
or
bad instanceof Assert
or
bad instanceof BoolExpr
or
bad instanceof If and
(not exists(bad.(If).getAnOrelse()) or bad.(If).isElif())
)
)
}
from TimerCfgNode node, int ts, string branch
where
missingBranchTimestamp(node, ts, branch) and
not hasUnmodellableBranching(node.getTestFunction())
select node,
"Timestamp " + ts + " on true/false branch is missing a dead() annotation on the " + branch +
" successor in $@", node.getTestFunction(), node.getTestFunction().getName()

View File

@@ -0,0 +1,22 @@
/**
* New-CFG version of ConsecutivePredecessorTimestamps.
*
* Checks that each annotated node (except the minimum timestamp) has
* a predecessor annotation with timestamp `a - 1`. This is the reverse
* of ConsecutiveTimestamps: it catches nodes that are reachable but
* arrived at from the wrong place (skipping an intermediate node).
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerAnnotation ann, int a
where consecutivePredecessorTimestamps(ann, a)
select ann, "$@ in $@ has no consecutive predecessor (expected " + (a - 1) + ")",
ann.getTimestampExpr(a), "Timestamp " + a, ann.getTestFunction(), ann.getTestFunction().getName()

View File

@@ -0,0 +1,29 @@
/**
* New-CFG version of ConsecutiveTimestamps.
*
* Original:
* Checks that consecutive annotated nodes have consecutive timestamps:
* for each annotation with timestamp `a`, some CFG node for that annotation
* must have a next annotation containing `a + 1`.
*
* Handles CFG splitting (e.g., finally blocks duplicated for normal/exceptional
* flow) by checking that at least one split has the required successor.
*
* Only applies to functions where all annotations are in the function's
* own scope (excludes tests with generators, async, comprehensions, or
* lambdas that have annotations in nested scopes).
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerAnnotation ann, int a
where consecutiveTimestamps(ann, a)
select ann, "$@ in $@ has no consecutive successor (expected " + (a + 1) + ")",
ann.getTimestampExpr(a), "Timestamp " + a, ann.getTestFunction(), ann.getTestFunction().getName()

View File

@@ -0,0 +1,101 @@
/**
* Implementation of the evaluation-order CFG signature using the new
* shared control flow graph from AstNodeImpl.
*/
private import python as Py
import TimerUtils
private import semmle.python.controlflow.internal.AstNodeImpl as CfgImpl
private import codeql.controlflow.SuccessorType
private class NewControlFlowNode = CfgImpl::ControlFlowNode;
private class NewBasicBlock = CfgImpl::BasicBlock;
/** New (shared) CFG implementation of the evaluation-order signature. */
module NewCfg implements EvalOrderCfgSig {
class CfgNode instanceof NewControlFlowNode {
// Use the post-order representative for each AST node: the "after" node.
// For simple leaf nodes this is the merged before/after node. For
// post-order expressions this is the TAstNode. For pre-order expressions
// (and/or/not/ternary) this uses an AfterValueNode, which places the
// expression after its operands — matching the timer test expectations.
CfgNode() { NewControlFlowNode.super.isAfter(_) }
string toString() { result = NewControlFlowNode.super.toString() }
Py::Location getLocation() { result = NewControlFlowNode.super.getLocation() }
Py::AstNode getNode() {
result = CfgImpl::astNodeToPyNode(NewControlFlowNode.super.getAstNode())
}
CfgNode getASuccessor() { nextCfgNode(this, result) }
CfgNode getATrueSuccessor() {
NewControlFlowNode.super.isAfterTrue(_) and
// Only where there's also a false branch (true boolean split)
exists(NewControlFlowNode other | other.isAfterFalse(NewControlFlowNode.super.getAstNode())) and
nextCfgNodeFrom(this, result)
}
CfgNode getAFalseSuccessor() {
NewControlFlowNode.super.isAfterFalse(_) and
// Only where there's also a true branch (true boolean split)
exists(NewControlFlowNode other | other.isAfterTrue(NewControlFlowNode.super.getAstNode())) and
nextCfgNodeFrom(this, result)
}
CfgNode getAnExceptionalSuccessor() {
exists(NewControlFlowNode mid |
mid = NewControlFlowNode.super.getAnExceptionSuccessor() and
nextCfgNodeFrom(mid, result)
)
}
Py::Scope getScope() { result = NewControlFlowNode.super.getEnclosingCallable().asScope() }
BasicBlock getBasicBlock() {
exists(NewBasicBlock bb, int i | bb.getNode(i) = this and result = bb)
}
}
/**
* Holds if `next` is the nearest CfgNode reachable from `n` via
* one or more raw CFG successor edges, skipping non-CfgNode intermediaries.
*/
private predicate nextCfgNodeFrom(NewControlFlowNode n, CfgNode next) {
next = n.getASuccessor()
or
exists(NewControlFlowNode mid |
mid = n.getASuccessor() and
not mid instanceof CfgNode and
nextCfgNodeFrom(mid, next)
)
}
/**
* Holds if `next` is the nearest CfgNode successor of `n`,
* skipping synthetic intermediate nodes.
*/
private predicate nextCfgNode(CfgNode n, CfgNode next) { nextCfgNodeFrom(n, next) }
class BasicBlock instanceof NewBasicBlock {
string toString() { result = NewBasicBlock.super.toString() }
CfgNode getNode(int n) { result = NewBasicBlock.super.getNode(n) }
predicate reaches(BasicBlock bb) { this = bb or this.strictlyReaches(bb) }
predicate strictlyReaches(BasicBlock bb) { NewBasicBlock.super.getASuccessor+() = bb }
predicate strictlyDominates(BasicBlock bb) { NewBasicBlock.super.strictlyDominates(bb) }
}
CfgNode scopeGetEntryNode(Py::Scope s) {
exists(CfgImpl::ControlFlow::EntryNode entry |
entry.getEnclosingCallable().asScope() = s and
nextCfgNodeFrom(entry, result)
)
}
}

View File

@@ -0,0 +1,21 @@
/**
* New-CFG version of NeverReachable.
*
* Original:
* Checks that expressions annotated with `t.never` either have no CFG
* node, or if they do, that the node is not reachable from its scope's
* entry (including within the same basic block).
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils::CfgTests
from TimerAnnotation ann
where neverReachable(ann)
select ann, "Node annotated with t.never is reachable in $@", ann.getTestFunction(),
ann.getTestFunction().getName()

View File

@@ -0,0 +1,22 @@
/**
* New-CFG version of NoBackwardFlow.
*
* Original:
* Checks that time never flows backward between consecutive timer annotations
* in the CFG. For each pair of consecutive annotated nodes (A -> B), there must
* exist timestamps a in A and b in B with a < b.
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerCfgNode a, TimerCfgNode b, int minA, int maxB
where noBackwardFlow(a, b, minA, maxB)
select a, "Backward flow: $@ flows to $@ (max timestamp $@)", a.getTimestampExpr(minA),
minA.toString(), b, b.getNode().toString(), b.getTimestampExpr(maxB), maxB.toString()

View File

@@ -0,0 +1,18 @@
/**
* New-CFG version of NoBasicBlock.
*
* Checks that every annotated CFG node belongs to a basic block.
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from CfgNode n, TestFunction f
where noBasicBlock(n, f)
select n, "CFG node in $@ does not belong to any basic block", f, f.getName()

View File

@@ -0,0 +1,21 @@
/**
* New-CFG version of NoSharedReachable.
*
* Original:
* Checks that two annotations sharing a timestamp value are on
* mutually exclusive CFG paths (neither can reach the other).
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerCfgNode a, TimerCfgNode b, int ts
where noSharedReachable(a, b, ts)
select a, "Shared timestamp $@ but this node reaches $@", a.getTimestampExpr(ts), ts.toString(), b,
b.getNode().toString()

View File

@@ -0,0 +1,22 @@
/**
* New-CFG version of StrictForward.
*
* Original:
* Stronger version of NoBackwardFlow: for consecutive annotated nodes
* A -> B that both have a single timestamp (non-loop code) and B does
* NOT dominate A (forward edge), requires max(A) < min(B).
*/
import python
import TimerUtils
import NewCfgImpl
private module Utils = EvalOrderCfgUtils<NewCfg>;
private import Utils
private import Utils::CfgTests
from TimerCfgNode a, TimerCfgNode b, int maxA, int minB
where strictForward(a, b, maxA, minB)
select a, "Strict forward violation: $@ flows to $@", a.getTimestampExpr(maxA), "timestamp " + maxA,
b.getTimestampExpr(minB), "timestamp " + minB

View File

@@ -1,7 +1,7 @@
| test_boolean.py:9:10:9:43 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:9:59:9:59 | IntegerLiteral | 2 | test_boolean.py:9:10:9:13 | ControlFlowNode for True | True | test_boolean.py:9:19:9:19 | IntegerLiteral | 0 |
| test_boolean.py:15:10:15:43 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:15:50:15:50 | IntegerLiteral | 1 | test_boolean.py:15:10:15:14 | ControlFlowNode for False | False | test_boolean.py:15:20:15:20 | IntegerLiteral | 0 |
| test_boolean.py:21:10:21:42 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:21:49:21:49 | IntegerLiteral | 1 | test_boolean.py:21:10:21:13 | ControlFlowNode for True | True | test_boolean.py:21:19:21:19 | IntegerLiteral | 0 |
| test_boolean.py:27:10:27:34 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:27:50:27:50 | IntegerLiteral | 2 | test_boolean.py:27:10:27:14 | ControlFlowNode for False | False | test_boolean.py:27:20:27:20 | IntegerLiteral | 0 |
| test_boolean.py:27:10:27:43 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:27:59:27:59 | IntegerLiteral | 2 | test_boolean.py:27:10:27:14 | ControlFlowNode for False | False | test_boolean.py:27:20:27:20 | IntegerLiteral | 0 |
| test_boolean.py:40:10:40:61 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:40:86:40:86 | IntegerLiteral | 3 | test_boolean.py:40:10:40:10 | ControlFlowNode for IntegerLiteral | IntegerLiteral | test_boolean.py:40:16:40:16 | IntegerLiteral | 0 |
| test_boolean.py:46:10:46:61 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:46:86:46:86 | IntegerLiteral | 3 | test_boolean.py:46:10:46:10 | ControlFlowNode for IntegerLiteral | IntegerLiteral | test_boolean.py:46:16:46:16 | IntegerLiteral | 0 |
| test_boolean.py:52:10:52:95 | ControlFlowNode for BoolExpr | Backward flow: $@ flows to $@ (max timestamp $@) | test_boolean.py:52:120:52:120 | IntegerLiteral | 4 | test_boolean.py:52:11:52:47 | ControlFlowNode for BoolExpr | BoolExpr | test_boolean.py:52:63:52:63 | IntegerLiteral | 2 |

View File

@@ -4,6 +4,8 @@
* exist timestamps a in A and b in B with a < b.
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -2,6 +2,8 @@
* Checks that every annotated CFG node belongs to a basic block.
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -3,6 +3,8 @@
* mutually exclusive CFG paths (neither can reach the other).
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -3,14 +3,14 @@
* Python control flow graph.
*/
private import python as PY
private import python as Py
import TimerUtils
/** Existing Python CFG implementation of the evaluation-order signature. */
module OldCfg implements EvalOrderCfgSig {
class CfgNode = PY::ControlFlowNode;
class CfgNode = Py::ControlFlowNode;
class BasicBlock = PY::BasicBlock;
class BasicBlock = Py::BasicBlock;
CfgNode scopeGetEntryNode(PY::Scope s) { result = s.getEntryNode() }
CfgNode scopeGetEntryNode(Py::Scope s) { result = s.getEntryNode() }
}

View File

@@ -1,7 +1,7 @@
| test_boolean.py:9:10:9:43 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:9:59:9:59 | IntegerLiteral | timestamp 2 | test_boolean.py:9:19:9:19 | IntegerLiteral | timestamp 0 |
| test_boolean.py:15:10:15:43 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:15:50:15:50 | IntegerLiteral | timestamp 1 | test_boolean.py:15:20:15:20 | IntegerLiteral | timestamp 0 |
| test_boolean.py:21:10:21:42 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:21:49:21:49 | IntegerLiteral | timestamp 1 | test_boolean.py:21:19:21:19 | IntegerLiteral | timestamp 0 |
| test_boolean.py:27:10:27:34 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:27:50:27:50 | IntegerLiteral | timestamp 2 | test_boolean.py:27:20:27:20 | IntegerLiteral | timestamp 0 |
| test_boolean.py:27:10:27:43 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:27:59:27:59 | IntegerLiteral | timestamp 2 | test_boolean.py:27:20:27:20 | IntegerLiteral | timestamp 0 |
| test_boolean.py:40:10:40:61 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:40:86:40:86 | IntegerLiteral | timestamp 3 | test_boolean.py:40:16:40:16 | IntegerLiteral | timestamp 0 |
| test_boolean.py:46:10:46:61 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:46:86:46:86 | IntegerLiteral | timestamp 3 | test_boolean.py:46:16:46:16 | IntegerLiteral | timestamp 0 |
| test_boolean.py:52:10:52:95 | ControlFlowNode for BoolExpr | Strict forward violation: $@ flows to $@ | test_boolean.py:52:120:52:120 | IntegerLiteral | timestamp 4 | test_boolean.py:52:63:52:63 | IntegerLiteral | timestamp 2 |

View File

@@ -4,6 +4,8 @@
* NOT dominate A (forward edge), requires max(A) < min(B).
*/
import python
import TimerUtils
import OldCfgImpl
private module Utils = EvalOrderCfgUtils<OldCfg>;

View File

@@ -24,7 +24,7 @@ def test_or_short_circuit(t):
@test
def test_or_both_sides(t):
# False or X — both operands evaluated, result is X
x = (False @ t[0] or 42 @ t[1]) @ t[dead(1), 2]
x = (False @ t[0] or 42 @ t[1, dead(2)]) @ t[dead(1), 2]
@test

View File

@@ -85,7 +85,7 @@ def test_nested_if_else(t):
else:
z = 2 @ t[dead(4)]
else:
z = 3 @ t[dead(4)]
z = 3 @ t[dead(3), dead(4)]
w = 0 @ t[5]

View File

@@ -0,0 +1,41 @@
/**
* Inline-expectations test for the store/load/delete/parameter
* classification predicates on the new-CFG facade.
*
* Each tag fires when the corresponding predicate (`isLoad`,
* `isStore`, `isDelete`, `isParameter`, `isAugLoad`, `isAugStore`)
* holds on the canonical CFG node wrapping a `Py::Name` with the
* given identifier. Subscript and attribute stores are not covered
* by these tags — only the `Name`-typed targets/loads they involve.
*/
import python
import semmle.python.controlflow.internal.Cfg as Cfg
import utils.test.InlineExpectationsTest
module StoreLoadTest implements TestSig {
string getARelevantTag() { result = ["load", "store", "delete", "param", "augload", "augstore"] }
predicate hasActualResult(Location location, string element, string tag, string value) {
exists(Cfg::NameNode n |
location = n.getLocation() and
element = n.toString() and
value = n.getId() and
(
n.isLoad() and not n.isAugLoad() and tag = "load"
or
n.isStore() and not n.isAugStore() and tag = "store"
or
n.isDelete() and tag = "delete"
or
n.isParameter() and tag = "param"
or
n.isAugLoad() and tag = "augload"
or
n.isAugStore() and tag = "augstore"
)
)
}
}
import MakeTest<StoreLoadTest>

View File

@@ -0,0 +1,56 @@
# Store/load/delete/parameter classification on the new-CFG facade.
#
# Each annotated location carries the (sorted, deduplicated) set of
# kinds the CFG facade reports there. Comparing against the legacy
# 'semmle.python.Flow' classification is done by the comparison query
# 'StoreLoadParity.ql' — annotations here are only the positive
# assertions for the new facade.
#
# Tags:
# load=<id> -- isLoad() fires on the Name
# store=<id> -- isStore() fires
# delete=<id> -- isDelete() fires
# param=<id> -- isParameter() fires
# augload=<id> -- isAugLoad() fires (the LHS of x += ... when read)
# augstore=<id> -- isAugStore() fires (the LHS of x += ... when written)
# --- plain load / store / delete ---
x = 1 # $ store=x
y = x + 1 # $ store=y load=x
print(y) # $ load=print load=y
del x # $ delete=x
# --- function definitions (parameters) ---
def f(a, b=2, *args, c, **kwargs): # $ store=f param=a param=b param=args param=c param=kwargs
return a + b + c # $ load=a load=b load=c
# --- augmented assignment splits one Name into load + store halves ---
def aug(): # $ store=aug
n = 0 # $ store=n
n += 1 # $ augload=n augstore=n
return n # $ load=n
# --- subscript / attribute stores ---
class C: # $ store=C
pass
def stores(obj, container, idx): # $ store=stores param=obj param=container param=idx
obj.attr = 1 # $ load=obj
container[idx] = 2 # $ load=container load=idx
return obj # $ load=obj
# --- tuple unpacking ---
def unpack(pair): # $ store=unpack param=pair
a, b = pair # $ store=a store=b load=pair
return a + b # $ load=a load=b