The new CFG previously only emitted exception edges for explicit `raise`
and `assert` statements. As a result, code that became reachable only
via the exception path of an arbitrary expression (e.g., the body of an
`except` handler following a try-body whose `call()` could raise) was
classified as dead, breaking analyses like StackTraceExposure,
FileNotAlwaysClosed, ExceptionInfo, UseOfExit, and CatchingBaseException.
This commit adds a `mayThrow` predicate over expressions that are known
sources of implicit exceptions in Python (calls, attribute access,
subscripts, arithmetic/comparison operators, imports, await/yield/yield
from) plus `from m import *` at the statement level, and routes them
through the shared CFG's `beginAbruptCompletion(_, _, ExceptionSuccessor,
always=false)` hook.
The set of exception sources is restricted to nodes that are
syntactically inside a `try`/`with` statement in the same scope.
This mirrors Java's `ControlFlowGraph::mayThrow`, which only emits
exception edges where local handling can observe them — outside such
contexts, the edges add CFG complexity (weakening BarrierGuard
precision and breaking SSA continuity around augmented assignments and
subscript stores) without analysis benefit, since exceptions just
propagate to the function exit anyway.
Net effect on the test suite: ~100 alerts restored across the exception-
related query tests (StackTraceExposure +29, ExceptionInfo +17,
FileNotAlwaysClosed +52, UseOfExit +1, CatchingBaseException restored)
with no precision regressions. Affected `.expected` files and the
regression-guard `dead_under_no_raise.py` are updated accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the public facade on top of the AstNodeImpl adapter from the
previous commit. Re-exposes the same API surface as
semmle/python/Flow.qll (ControlFlowNode, CallNode, BasicBlock,
NameNode, DefinitionNode, CompareNode, ...), backed by the shared
codeql.controlflow.ControlFlowGraph library.
- semmle.python.controlflow.internal.Cfg — public facade.
- ControlFlow/store-load/* — basic store/load coverage via the facade.
The new CFG library is added additively: it has zero callers in lib/
and src/, and the legacy CFG in semmle/python/Flow.qll remains the
default. Dataflow, SSA, and production query migration land in
follow-up PRs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration. Adds the
adapter that mediates between the Python AST and the shared
codeql.controlflow.ControlFlowGraph signature, plus the test suites
that validate the new CFG directly against this adapter. The public
facade is added in the following commit.
Library additions:
- semmle.python.controlflow.internal.AstNodeImpl — wraps Python's
Stmt/Expr/Scope/Pattern and adds two synthetic kinds of node
(BlockStmt for body slots, intermediate nodes for multi-operand
boolean expressions) to satisfy the shared CFG signature.
- lib/printCfgNew.ql — debug/visualisation query for the new CFG.
- consistency-queries/CfgConsistency.ql — consistency query running
the shared CFG's standard checks against Python.
Test additions (all driven directly off AstNodeImpl):
- ControlFlow/bindings/* — annotation-driven SSA-binding tests
(annassign, compound, comprehension, decorated, except_handler,
imports, match_pattern, parameters, simple, type_params,
walrus_starred, with_stmt, dead_under_no_raise).
- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
OldCfg evaluation-order self-validation suite, run against the
new CFG via NewCfgImpl.qll.
- Minor extensions to existing test_if.py / test_boolean.py +
cosmetic .expected churn on a handful of OldCfg tests.
No dataflow, SSA, or production query is migrated yet.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration.
Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.
The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.
Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
We now find an alert on this line as we hope to
It is not an alert for _full_ SSRF, though, since that configuration cannot handle multiple substitutions.
- remove `tupleStoreStep` and `dictStoreStep` from `containerStep`
These are imprecise compared to the content being precise.
- add implicit reads to recover taint at sinks
- add implicit read steps for decoders
to supplement the `AdditionalTaintStep`
that now only covers when the full container is tainted.
This one is potentially a bit iffy -- it checks for a very powerful
property (that implies many of the other queries), but as the test
results show, it can produce false positives when there is in fact no
problem. We may want to get rid of it entirely, if it becomes too noisy.
This looks for nodes annotated with `t[never]` in the test that are
reachable in the CFG. This should not happen (it messes with various
queries, e.g. the "mixed returns" query), but the test shows that in a
few particular cases (involving the `match` statement where all cases
contain `return`s), we _do_ have reachable nodes that shouldn't be.
This one demonstrates a bug in the current CFG. In a dictionary
comprehension `{k: v for k, v in d.items()}`, we evaluate the value
before the key, which is incorrect. (A fix for this bug has been
implemented in a separate PR.)
These use the annotated, self-verifying test files to check various
consistency requirements.
Some of these may be expressing the same thing in different ways, but
it's fairly cheap to keep them around, so I have not attempted to
produce a minimal set of queries for this.
These tests consist of various Python constructions (hopefully a
somewhat comprehensive set) with specific timestamp annotations
scattered throughout. When the tests are run using the Python 3
interpreter, these annotations are checked and compared to the "current
timestamp" to see that they are in agreement. This is what makes the
tests "self-validating".
There are a few different kinds of annotations: the basic `t[4]` style
(meaning this is executed at timestamp 4), the `t[dead(4)]` variant
(meaning this _would_ happen at timestamp 4, but it is in a dead
branch), and `t[never]` (meaning this is never executed at all).
In addition to this, there is a query, MissingAnnotations, which checks
whether we have applied these annotations maximally. Many expression
nodes are not actually annotatable, so there is a sizeable list of
excluded nodes for that query.
We won't be able to run these tests until Python 3.15 is actually out
(and our CI is using it), so it seemed easiest to just put them in their
own test directory.