Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library alongside the existing one, without changing any
production behaviour.

Library additions:
- semmle.python.controlflow.internal.AstNodeImpl: mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).
- semmle.python.controlflow.internal.Cfg: public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.
- lib/printCfgNew.ql: debug/visualisation query for the new CFG.
- consistency-queries/CfgConsistency.ql: consistency query running
  the shared CFG's standard checks against Python.

Shared library:
- shared.controlflow.ControlFlowGraph: adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:
- ControlFlow/bindings/*: annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).
- ControlFlow/store-load/*: basic store/load coverage.
- ControlFlow/evaluation-order/NewCfg*.ql: mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.
- Minor extensions to the existing test_if.py / test_boolean.py, plus
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet; that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile cleanly (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This one is potentially a bit iffy -- it checks for a very strong
property (one that implies many of the other queries), but as the test
results show, it can produce false positives. We may want to get rid
of it entirely if it becomes too noisy.

This looks for nodes that are annotated with `t[never]` in the test but
are reachable in the CFG. This should not happen (it interferes with
various queries, e.g. the "mixed returns" query), but the test shows
that in a few particular cases (involving the `match` statement where
all cases contain `return`s), we _do_ have reachable nodes that
shouldn't be.
This one demonstrates a bug in the current CFG. In a dictionary
comprehension `{k: v for k, v in d.items()}`, the CFG evaluates the
value before the key, which is incorrect: since Python 3.8 the key is
evaluated first. (A fix for this bug has been implemented in a
separate PR.)
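
The runtime order can be checked directly. A minimal sketch, assuming
Python 3.8 or later; the `record` helper is made up for this
illustration and is not part of the test suite:

```python
order = []


def record(label, value):
    # Log which sub-expression is being evaluated, then pass the value on.
    order.append(label)
    return value


d = {"a": 1, "b": 2}
result = {record("key", k): record("value", v) for k, v in d.items()}
```

For each item, `order` gains "key" and then "value", which is the order
the CFG should model.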
These use the annotated, self-verifying test files to check various
consistency requirements.

Some of these may be expressing the same thing in different ways, but
it's fairly cheap to keep them around, so I have not attempted to
produce a minimal set of queries for this.

These tests consist of various Python constructs (hopefully a
somewhat comprehensive set) with specific timestamp annotations
scattered throughout. When the tests are run with the Python 3
interpreter, each annotation is checked against the "current
timestamp" to confirm that the two agree. This is what makes the
tests "self-validating".

There are a few different kinds of annotations: the basic `t[4]` style
(meaning this is executed at timestamp 4), the `t[dead(4)]` variant
(meaning this _would_ happen at timestamp 4, but it is in a dead
branch), and `t[never]` (meaning this is never executed at all).
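
A minimal sketch of how such annotations could validate themselves at
runtime. The `Timer` class, the `value @ t[n]` spelling, and the error
list are all assumptions made for illustration; the helper the tests
actually use may differ:

```python
class _Never:
    def __repr__(self):
        return "never"


class _Dead:
    def __init__(self, timestamp):
        self.timestamp = timestamp

    def __repr__(self):
        return f"dead({self.timestamp})"


never = _Never()


def dead(timestamp):
    return _Dead(timestamp)


class _Check:
    def __init__(self, timer, expected):
        self.timer = timer
        self.expected = expected

    def __rmatmul__(self, value):
        # Called for `value @ t[n]` when `value` has no `@` of its own
        # (true for ints), so it runs after `value` has been evaluated.
        timer = self.timer
        if isinstance(self.expected, (_Never, _Dead)):
            timer.errors.append(f"unexpectedly executed: {self.expected!r}")
        elif self.expected != timer.now:
            timer.errors.append(f"expected {self.expected}, got {timer.now}")
        timer.now += 1
        return value


class Timer:
    """Hypothetical timestamp checker, not the real test helper."""

    def __init__(self):
        self.now = 0
        self.errors = []

    def __getitem__(self, expected):
        return _Check(self, expected)


t = Timer()
x = 1 @ t[0]
if x @ t[1]:
    y = 2 @ t[2]
else:
    y = 3 @ t[dead(2)]
```

Running the snippet leaves `t.errors` empty. Changing any timestamp, or
moving the `dead` annotation into the taken branch, records an error
instead.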
In addition to this, there is a query, MissingAnnotations, which checks
whether the annotations have been applied to every node that can carry
one. Many expression nodes cannot be annotated at all, so there is a
sizeable list of excluded nodes for that query.