Preparatory refactor for the shared-CFG dataflow migration. Adds the
adapter that mediates between the Python AST and the shared
codeql.controlflow.ControlFlowGraph signature, plus the test suites
that validate the new CFG directly against this adapter. The public
facade is added in the following commit.
Library additions:
- semmle.python.controlflow.internal.AstNodeImpl — wraps Python's
Stmt/Expr/Scope/Pattern and adds two synthetic kinds of node
(BlockStmt for body slots, intermediate nodes for multi-operand
boolean expressions) to satisfy the shared CFG signature.
- lib/printCfgNew.ql — debug/visualisation query for the new CFG.
- consistency-queries/CfgConsistency.ql — consistency query running
the shared CFG's standard checks against Python.
Test additions (all driven directly off AstNodeImpl):
- ControlFlow/bindings/* — annotation-driven SSA-binding tests
(annassign, compound, comprehension, decorated, except_handler,
imports, match_pattern, parameters, simple, type_params,
walrus_starred, with_stmt, dead_under_no_raise).
- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
OldCfg evaluation-order self-validation suite, run against the
new CFG via NewCfgImpl.qll.
- Minor extensions to existing test_if.py / test_boolean.py +
cosmetic .expected churn on a handful of OldCfg tests.
No dataflow, SSA, or production query is migrated yet.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- remove `tupleStoreStep` and `dictStoreStep` from `containerStep`
These are imprecise compared to the content being precise.
- add implicit reads to recover taint at sinks
- add implicit read steps for decoders
to supplement the `AdditionalTaintStep`
that now only covers when the full container is tainted.
Fixes the test failures that arose from making `ExtractedArgumentNode`
local.
For the consistency checks, we now explicitly exclude the
`ExtractedArgumentNode`s (now much more plentiful due to the
overapproximation) that don't have a corresponding `getCallArg` tuple.
For various queries/tests using `instanceof ArgumentNode`, we instead us
`isArgumentNode`, which explicitly filters out the ones for which
`isArgumentOf` doesn't hold (which, again, is the case for most of the
nodes in the overapproximation).
When looking things over a bit more, we could actually exclude the steps
that would never be used instead. A much more involved solution, but
more performance oriented and clear in terms of what is supported (at
least until we start supporting type-tracking with more than depth 1
access-path, if that ever happens)
This would have found the problem in
https://github.com/github/codeql/pull/15755.
As highlighted in the comment in the code, it's not a perfect solution
since we don't have an automatic way to ensure we don't introduce a new
PhaseDependentFlow use with a new step relation and forget to add it to
this consistency check... but I think this consistency check still adds
value!
These are the labmda self references. This is similar to
how `BlockParameterArgumentNode` is excluded for Ruby.
It is important that we restrict `call` in this logic.
Otherwise, we get a cartesian product and the consistency
check runs for a very long time...