Commit Graph

9994 Commits

Author SHA1 Message Date
yoff
ccfaa6ea7f Python: migrate dataflow library to new CFG + shared SSA
Switches the trunk dataflow library and all in-tree consumers
(frameworks, ApiGraphs, Concepts, regexp, security customisations,
test harness) from the legacy Flow.qll/ESSA stack to the new
shared-CFG facade (Cfg.qll) and the ESSA-shaped adapter on the
shared-SSA library (SsaImpl.qll).

Highlights:

  * DataFlowPublic/Private/Dispatch, Attributes, VariableCapture,
    IterableUnpacking, ImportResolution, ImportStar, LocalSources,
    TaintTrackingPrivate, MatchUnpacking, TypeTrackingImpl,
    SsaImpl, Builtins all now qualify CFG/SSA references with
    Cfg:: / SsaImpl:: and stop pulling in semmle.python.essa.*.

  * AstNodeImpl.qll/Cfg.qll: ImportMember exposes its inner
    ImportExpr, DefinitionNode.getValue covers Alias / AnnAssign /
    AugAssign / AssignExpr / For-target / Parameter-default,
    ForNode is treated as an expression node, AnnotatedExitNode is
    canonical, and BoolExprNode.getAnOperand drops the dominance
    constraint that did not hold for short-circuit BBs.

  * SsaImpl.qll: parameters always get a ParameterDefinition (so
    unused parameters still have SSA defs), scope-entry defs for
    module globals require an actual store somewhere, scope-exit
    has a synthetic use so reaching-defs survives to module
    boundary, and the legacy SsaSourceVariable / EssaVariable
    surface (getName, getScope, getAUse, getASourceUse,
    getAnImplicitUse) is reinstated for downstream queries.

  * DataFlowPublic.qll: GuardNode redesigned around the new
    structural outcome nodes (isAfterTrue / isAfterFalse).  The
    legacy ConditionBlock + flipped indirection is gone;
    controlsBlock walks UP through 'not' / '==True' / 'is False'
    etc. via outcomeOfGuard, accumulating polarity cleanly.  Only
    BarrierGuard<...> is preserved as public API.

  * ModuleVariableNode.getAWrite and LocalFlow::definitionFlowStep
    bypass SSA and consult Cfg::NameNode.defines /
    Cfg::DefinitionNode.getValue directly, so that write defs
    pruned by shared SSA (because the variable has no in-scope
    read) still produce dataflow steps.

  * Frameworks + downstream consumers: replace
    EssaVariable.hasDefiningNode, getAReturnValueFlowNode,
    Parameter.getDefault, Scope.getEntryNode / getANormalExit etc.
    with CFG-side bridges through Cfg::ControlFlowNode.

The legacy Flow.qll / Essa.qll stack is untouched and remains
available for queries that import it directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 07:20:44 +00:00
yoff
cccd207ae7 Python: remove getAFlowNode() — bridge AST→CFG only via CFG-side getNode()
Option 2: eliminates the AST→CFG bridge from the AST layer. Previously
'AstNode.getAFlowNode()' returned a 'ControlFlowNode' from the legacy
'Flow.qll' CFG via 'py_flow_bb_node' — this hardcoded the AST to know
about the legacy CFG, preventing files from cleanly switching to the
new shared CFG.

Removes:
  * 'AstNode.getAFlowNode()' from 'AstExtended.qll'
  * Type-narrowing overrides on 'Attribute' / 'Subscript' / 'Call' /
    'IfExp' / 'Name' / 'NameConstant' / 'ImportMember' (in Exprs.qll
    and Import.qll)

Rewrites ~130 call sites across 'python/ql/lib/' and 'python/ql/src/'
to bridge from the CFG side instead:

  Before:  node = expr.getAFlowNode()
  After:   node.getNode() = expr

  Before:  expr.getAFlowNode().(DefinitionNode).getValue()
  After:   exists(DefinitionNode d | d.getNode() = expr | d.getValue())

  Before:  cn.operands(const.getAFlowNode(), op, x)
  After:   exists(ControlFlowNode c | c.getNode() = const | cn.operands(c, op, x))

This is semantically a no-op — both forms are duals of the same predicate.
Verified by passing all library tests:
  * 64 dataflow tests
  * 28 ControlFlow + dataflow-new-ssa tests
  * 1 essa SSA-compute test
  * 93 tests total in the focused suite

Once committed, files that want to switch from the legacy 'Flow' CFG
to the new 'Cfg' facade only need to change their imports — the
bridge sites are CFG-side and respect whichever ControlFlowNode is in
scope.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-21 13:08:37 +00:00
yoff
6814bcfb1b Python: SSA adapter: add MultiAssignmentDefinition, definedBy, useOfDef
Extends the ESSA-shaped adapter on top of the new shared SSA with the
remaining APIs consumed by the dataflow library:

  * MultiAssignmentDefinition: matches the AST pattern 'a, b = ...' where
    the LHS is a Tuple/List and the Name being defined is a sub-element.
    Used by IterableUnpacking.qll to recognise unpacking assignments.

  * EssaNodeDefinition.definedBy(var, defNode): a flatter equivalent of
    'getSourceVariable() = var and getDefiningNode() = defNode', matching
    legacy ESSA's signature. Used by DataFlowPublic.qll's
    ModuleVariableNode to enumerate writes of a global.

  * AdjacentUses::useOfDef(def, use): all reachable uses of a definition
    (firstUse plus transitive use-use adjacency). Used by guards in
    DataFlowPublic.qll.

These complete the API surface enumerated by grep across the dataflow
library. The remaining items (EssaNodeRefinement, EssaImportStep) are
ImportResolution-specific and will need separate treatment, possibly via
a different abstraction since the SSA library does not model heap-state
refinements like 'foo.bar = X'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-19 12:19:28 +00:00
yoff
7521d838c0 Python: SSA: handle closure variables via per-scope entry defs
The new SSA's implicit entry-def predicate previously placed entries in
the variable's defining scope. For closure variables that's the outer
function, so inner functions had no entry def for the captured
variable — reads in the inner scope failed to resolve to any
definition.

Mirrors legacy ESSA's 'NonLocalVariable.getScopeEntryDefinition()':
place an implicit entry def at every reading scope's entry block,
independently of where the variable is *defined*. A closure variable
accessed in two nested functions and the outer one gets three entry
defs (one per reading scope).

Also makes 'ScopeEntryDefinition' extend 'EssaNodeDefinition' (matching
legacy ESSA), with 'getDefiningNode()' returning the scope's entry CFG
node. This requires extending the private 'writeDefNode' helper to
project i=-1 entries to bb.getNode(0).

Updates the new-vs-legacy comparison snapshot: closure-variable reads
('x:32:5'), nested global reads ('GLOBAL:52:1') now resolve. New
'def-only-new' entries appear for unbound names ('sum', 'open',
'compute') — the new SSA uniformly creates scope-entry defs for all
non-local reads, including those that legacy ESSA classifies as
builtin and excludes. This is a more uniform semantic and arguably
cleaner.

Updates the SsaTest 'some_undefined' annotation: previously documented
as a known limitation, now correctly resolves to a scope-entry def.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-19 12:06:15 +00:00
yoff
c4674bca14 Python: extend new SSA with ESSA-shaped adapter + baseline comparison test
Phase 0.5 - Adapter API on top of the shared SSA:

Adds the legacy-ESSA-shaped class hierarchy that the dataflow library
consumes, layered on the shared 'Ssa::Make' instantiation:

  * EssaDefinition / EssaNodeDefinition: the latter exposes
    'getDefiningNode()' (the CFG node at the def's index in its BB)
    and 'getVariable()' / 'getScope()'.
  * AssignmentDefinition: matches Assign, AnnAssign with value,
    AssignExpr and AugAssign target Names. Exposes 'getValue()'
    pointing at the RHS' CFG node.
  * ParameterDefinition: matches when the defining Name is in
    parameter context.
  * WithDefinition: matches 'with ... as x:' bindings.
  * ScopeEntryDefinition: implicit entry defs at synthetic position
    '-1' of the scope's entry basic block (non-local / global /
    builtin / captured reads).
  * PhiFunction (alias for PhiNode).
  * EssaVariable adapter wrapping a 'Ssa::Definition' with 'getAUse()',
    'getDefinition()', 'getAnUltimateDefinition()', and 'getName()'.
  * AdjacentUses module with 'firstUse' and 'adjacentUseUse' predicates
    bridging to 'Ssa::firstUse' / 'Ssa::adjacentUseUse'.

This is the minimum API the new dataflow's internals call into. The
richer legacy ESSA (refinement nodes, attribute refinements, edge
refinements) stays in 'semmle.python.essa.Essa' for legacy code.

Phase 0.6 - Comparison test:

Adds 'dataflow-new-ssa-vs-legacy/CmpTest.ql' that snapshots the
difference between definitions produced by new SSA vs legacy ESSA on
the same Python source. Baseline output records the current
'def-only-old' mismatches, grouped by category:

  * function/class/global definitions with no in-scope read (intentional;
    SSA is liveness-pruned)
  * captured / closure variables (real gap in new SSA - no
    closure-capture handling yet)
  * module variables __name__ / __package__ / $ (legacy ESSA implicit
    bindings)
  * exception 'as' bindings (depend on raise modelling)

Zero 'def-only-new' mismatches: the new SSA never produces a spurious
definition compared to legacy ESSA on this corpus.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 21:13:57 +00:00
yoff
30f0a3f2b9 Python: qualify Flow.qll's AST references with Py:: prefix
Prepares Flow.qll for co-existence with the new CFG facade by switching
'import python' to 'import python as Py' and qualifying every AST-class
reference inside Flow.qll's body. Flow.qll's own CFG types
(ControlFlowNode, BasicBlock, CallNode, NameNode, etc.) keep their
unqualified names.

This change is a no-op semantically:
  * all 24 evaluation-order tests still pass,
  * the bindings + store-load + new-CFG-SSA library tests still pass,
  * compilation produces zero errors.

The change enables a follow-up commit to swap python.qll's
'import semmle.python.Flow' for 'import semmle.python.controlflow.internal.Cfg'
without triggering name-clash errors inside Flow.qll itself. Legacy
modules that still want the legacy CFG (essa/, GuardedControlFlow,
LegacyPointsTo, objects/, pointsto/, types/, dataflow/old/) will need a
similar treatment in subsequent commits.

The qualification was applied mechanically via a script that prefixed
every reference to a known AST class. The list includes the standard
AST node types from semmle.python.{Files, Variables, Stmts, Exprs,
Class, Function, Patterns, Comprehensions} plus 'Location' / 'File' /
'Folder' / 'Container' / 'ConditionBlock' / 'Delete' / 'Load'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 19:19:20 +00:00
yoff
bd9f65478e Python: bring Cfg.qll's facade to API parity with Flow.qll
Adds the methods and type-narrowing overrides needed for Cfg.qll to be
a drop-in replacement for Flow.qll's CFG API surface:

  * 'override getNode()' type narrowing on all AST-shape subclasses
    (CallNode -> Py::Call, AttrNode -> Py::Attribute, ImportExprNode
    -> Py::ImportExpr, etc.). This lets callers chain methods like
    'iexpr.getNode().isRelative()' that previously failed because
    'getNode()' returned the generic AstNode.

  * 'ControlFlowNode.isBranch()' -- true and/or false successor exists.
  * 'ControlFlowNode.getAChild()' -- CFG-level child traversal via the
    AST's getAChildNode, with dominance constraint.
  * 'ControlFlowNode.strictlyReaches(other)' -- node-level reachability.
  * 'NameNode.isSelf()' -- AST-level approximation: uses the 'Variable'
    that is the first parameter of an enclosing method.
  * 'BinaryExprNode.operands(left, op, right)' + 'getAnOperand()'.
  * 'BoolExprNode.getAnOperand()'.
  * 'ForNode.getSequence()' (alias for 'getIter') and
    'ForNode.iterates(target, sequence)'.
  * 'ForNode' / 'RaiseStmtNode' type-narrowing overrides.
  * 'ExceptFlowNode.getName()' / 'ExceptGroupFlowNode.getName()'
    -- the bound 'as'-name CFG node.
  * 'DictNode.getAKey()' (only 'getAValue' was present).

These additions are independent of the dataflow-migration approach
(option 4 vs option 5). They close the API-parity gap identified
during the Option-5 investigation; with them in place, hundreds of
type-resolution errors that previously appeared when swapping Cfg for
Flow at the python.qll level go away.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 18:24:38 +00:00
yoff
87b2e2fb0f Python: fix augstore for the new CFG and add store/load test
In the legacy CFG the same Python 'Name' that is the target of an
augmented assignment has two distinct CFG nodes — a load node (context
3) earlier in the basic block and a store node (context 5) later.
'augstore(load, store)' relates the pair via dominance.

The new (shared) CFG canonicalises each AST expression to a single
CFG node, so 'load' and 'store' collapse to one. The dominance-based
'augstore' from the legacy implementation no longer holds (it would
require 'load.strictlyDominates(load)'), so 'isAugLoad' / 'isAugStore'
never fired and 'isStore' missed the AugAssign target entirely.

Redefines 'augstore' as reflexive on the AugAssign target's canonical
CFG node. With this change:

  * isAugLoad / isAugStore both fire on the single canonical node.
  * isStore fires (via 'or augstore(_, this)') — matching the legacy
    classification that an augmented-assignment target is a store.
  * isLoad does not fire (excluded by 'not augstore(_, this)').

Adds 'python/ql/test/library-tests/ControlFlow/store-load/' covering
plain load/store/delete, parameters, augmented assignment, tuple
unpacking, attribute and subscript stores. The test asserts the
classification directly on the new-CFG facade.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 12:29:37 +00:00
yoff
2bb8e90476 Python: introduce shared-SSA adapter on the new CFG
Adds 'python/ql/lib/semmle/python/dataflow/new/internal/SsaImpl.qll', a
minimal Python SSA implementation built on the shared SSA library
('codeql.ssa.Ssa::Make<Location, Cfg, Input>'). The structure mirrors
Java's adapter at 'java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll'.

Key design choices:

  * 'SourceVariable' wraps 'Py::Variable'. Only variables that are read
    or deleted somewhere are tracked - write-only variables don't
    benefit from SSA construction.

  * Variable references are positional ('BasicBlock', 'int') pairs
    looked up via 'Cfg::NameNode.defines'/'.uses'/'.deletes' (which
    themselves are one-line bridges to AST-level 'Name.defines' etc.).

  * Parameter writes are not synthesised: parameter Name nodes are
    already wired into the CFG (per the earlier C#-style parameter
    extension in 'AstNodeImpl.qll'), so the regular 'variableWrite'
    path handles them at their natural CFG index.

  * Non-local / captured / global / builtin variables read in a scope
    but not written in it receive a synthetic entry definition at
    index '-1' of the scope's entry basic block. This matches Java's
    'hasEntryDef'.

  * 'del x' is modelled as a certain write at the deletion site.

Includes an inline-expectations test under
'python/ql/test/library-tests/dataflow-new-ssa/' covering:
plain parameter pass-through, simple assignment + read, reassignment
with dead-write pruning, if/else with phi insertion at the join, and
an undefined-name read (currently a known limitation - no SSA flow
without an enclosing definition).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 11:03:45 +00:00
yoff
0da0633159 Python: introduce new-CFG facade
Adds 'Cfg.qll' alongside 'AstNodeImpl.qll' in the controlflow internal
package. The facade re-exposes the same API surface as the legacy
'semmle/python/Flow.qll' (ControlFlowNode, BasicBlock, NameNode, CallNode,
AttrNode, ImportExprNode, ImportMemberNode, ImportStarNode, SubscriptNode,
CompareNode, IfExprNode, AssignmentExprNode, BinaryExprNode, BoolExprNode,
UnaryExprNode, DefinitionNode, DeletionNode, ForNode, RaiseStmtNode,
StarredNode, ExceptFlowNode, ExceptGroupFlowNode, TupleNode, ListNode,
SetNode, DictNode, IterableNode, NameConstantNode), but is implemented
on top of the new shared CFG via 'AstNodeImpl.qll'.

The variable-identity predicates ('NameNode.defines', '.uses',
'.deletes', '.isLocal', '.isNonLocal', ...) are one-line bridges to the
underlying AST predicates ('Name.defines', '.uses', '.deletes'),
mirroring the Java pattern.

Re-exports 'EntryBasicBlock' and 'dominatingEdge/2' from the shared
'BB::CfgSig' produced by 'AstNodeImpl.qll', so downstream consumers
(e.g. the SSA adapter) can wire the new CFG into other shared modules
that expect a 'CfgSig' implementation.

This facade is not yet consumed by the dataflow library — that is the
next phase.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 11:03:32 +00:00
yoff
558cd5b00c Python: test dead bindings under no-raise CFG abstraction
Adds 'dead_under_no_raise.py' to the bindings test suite, capturing the
three CPython patterns where bindings legitimately have no CFG node
because the surrounding code is unreachable under the 'no expressions
raise' abstraction:

  1. Statements after a 'try: return X; except: pass' block.
  2. The 'else:' clause of a try whose body always raises.
  3. Cache-lookup pattern 'try: return cache[k]; except: pass' followed
     by computation and store.

These bindings intentionally carry no 'cfgdefines=' annotations. If
raise modelling is later added to the CFG, the BindingsTest will surface
the new CFG nodes as unexpected results and this file will need to be
revisited.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 10:48:43 +00:00
yoff
77149be759 Python: wire PEP 695 type parameters into the shared CFG (green)
Adds CFG coverage for the binding 'Name's introduced by PEP 695
type-parameter syntax on functions, classes, and 'type' aliases:

  def func[T](...): ...
  class Box[T]: ...
  def multi[T: int, *Ts, **P](...): ...
  type Alias[T] = ...

For each parametrised AST node, the type-parameter names (and, for
'type' aliases, the alias name itself) are added as children of the
enclosing CFG node so that 'Name.defines(v)' has a corresponding
position. Bounds and defaults are intentionally not wired (they have
no SSA-relevant semantics for our purposes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 09:58:27 +00:00
Copilot
aa8402d1b7 Python: wire match-pattern bindings into the shared CFG (green)
Adds concrete `Pattern` subclasses in `AstNodeImpl.qll` for every
`MatchPattern` AST kind, with `getChild` overrides that expose
sub-patterns and bound Names. Specifically:

- MatchCapturePattern (`case x:`) -> getVariable()
- MatchAsPattern (`case … as v:`) -> getPattern(), getAlias()
- MatchStarPattern (`case [*rest]:`) -> getTarget()
- MatchSequencePattern (`case [a, b]:`) -> getPattern(i)
- MatchClassPattern (`case Cls(p, q, k=v)`) -> getClass(), positional, keyword
- MatchMappingPattern (`case {k: v}:`) -> getMapping(i)
- MatchKeyValuePattern, MatchKeywordPattern, MatchDoubleStarPattern
- MatchOrPattern, MatchLiteralPattern, MatchValuePattern

Without these, every Name bound by a match pattern lacked a CFG node.
Removes the corresponding MISSING: annotations from match_pattern.py
(all 11 cases).

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 13:11:18 +00:00
Copilot
b8c093eefb Python: wire import-statement bindings into the shared CFG (green)
Adds `ImportStmt` and `ImportStarStmt` wrappers in `AstNodeImpl.qll`.
For each `Alias` in an import statement, both the value (module/member
expression) and the bound `asname` Name become children of the CFG node
for the import statement, in evaluation order.

Without this, every `Name` introduced by `import` / `from .. import ..`
lacked a CFG node, even though `Name.defines(v)` returns true for it on
the AST side. This was the highest-volume gap: 20,332 missing import
aliases across CPython.

Removes the corresponding MISSING: annotations from imports.py.

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:35:47 +00:00
Copilot
0742e1e901 Python: wire parameters into the shared CFG (C# pattern)
Implements `AstSig::Parameter` and `callableGetParameter(c, i)` in
`AstNodeImpl.qll`, following the C# template
(`csharp/.../ControlFlowGraph.qll:147-156`) rather than Java's
`Parameter() { none() }`.

Each Python parameter (positional, *args, keyword-only, **kwargs) now
becomes a CFG node at a stable position in the enclosing callable's
entry sequence. Defaults still evaluate at function-definition time
via `FunctionDefExpr.getDefault` / `LambdaExpr.getDefault`, so
`Parameter::getDefaultValue()` returns `none()` (the shared CFG
library calls this to model the missing-argument fallback, which
Python does not surface at the CFG level).

The bindings test now exercises parameters (the `py_expr_contexts(_, 4, ...)`
exclusion has been removed). A new `parameters.py` test case covers
positional, defaulted, vararg, kwarg, keyword-only, kitchen-sink,
method (self/cls), lambda, and PEP 570 positional-only parameters.
Several other test files were updated to annotate parameters that the
test had previously hidden (synthetic `.0` comprehension parameter,
method `self`, decorator `f`, etc.).

Verified:
- All 24 ControlFlow/evaluation-order tests still pass.
- CFG consistency query (`python/ql/consistency-queries/CfgConsistency.ql`)
  shows zero violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:29:28 +00:00
Copilot
a20af3cf41 Python: wire AnnAssign into the shared CFG (green)
Adds an `AnnAssignStmt` wrapper in `AstNodeImpl.qll` so that PEP 526
annotated assignments (`x: int = 1`, `x: int`) participate in the
control flow graph. Evaluation order follows CPython: annotation,
optional value, target binding.

Without this, `x: int = 1` had no CFG node for `x` even though
`Name.defines(v)` returns true for it on the AST side. SSA built on
the new CFG would therefore miss every annotated-assignment write.

Removes the corresponding MISSING: annotations from the CFG-binding
gap test:
- annassign.py — all four cases now green.
- match_pattern.py — class-body annotated fields (`x: int`, `y: int`).
- type_params.py — `item: T` inside class.

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:22:57 +00:00
Copilot
f077e7c27c Python: add CFG-binding gap tests (red)
Adds inline-expectation tests for the new shared CFG implementation in
python/ql/lib/semmle/python/controlflow/internal/AstNodeImpl.qll,
covering every Python binding construct that introduces a variable.

The test files use MISSING: annotations to record bindings whose
defining Name AST node is *not* currently reachable from the new CFG.
These are the 'red' half of red-green commit pairs: subsequent commits
will extend AstNodeImpl to cover each construct and remove the
corresponding MISSING: marker.

Confirmed-broken categories:
- Import aliases (from x import a)
- Annotated assignment (x: int = 1)
- Exception handler (except E as e)
- Match patterns (case x, case [a,b], case ... as v)
- PEP 695 type params (def f[T], class C[T])

Confirmed-working (no MISSING:):
- Compound targets, with-as, comprehensions, decorated def/class,
  walrus, starred.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:19:34 +00:00
Copilot
e816ba61ec Python: project via as* helpers outside characteristic predicates
Style cleanup: avoid naming newtype branch constructors (TPyStmt,
TPyExpr, TBlockStmt, TPattern, TBoolExprPair, TScope) outside the
char-preds that classify their wrappers. Method bodies and helper
predicates now use the as* projections instead:

  // Before: result = TBlockStmt(ifStmt.getBody())
  // After:  result.asStmtList() = ifStmt.getBody()

  // Before: result = TPyStmt(matchStmt.getCase(index))
  // After:  result.asStmt() = matchStmt.getCase(index)

Adds:

- AstNode.asStmtList() - the inverse of TBlockStmt(_).
- BinaryExpr.getIndex() - exposes the synthetic-pair index, used
  internally by getRightOperand to find the next pair without
  naming TBoolExprPair.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 20:00:55 +00:00
Copilot
3e0ae6ea3c Python: use newtype-branch constructors in characteristic predicates
Style cleanup: when a class's characteristic predicate binds via a
'cast' helper like

  IfStmt() { ifStmt = this.asStmt() }

prefer naming the newtype branch directly:

  IfStmt() { this = TPyStmt(ifStmt) }

This makes the wrapped representation explicit. Apply throughout:
~30 charpreds (every Stmt/Expr leaf wrapper, plus LoopStmt, BreakStmt,
ContinueStmt, BooleanLiteral, UnaryExpr, ArithUnaryExpr, Comprehension).

Method bodies that use asStmt/asExpr to project an underlying
Python AST node (Stmt.toString, BlockStmt.getEnclosingCallable,
UnaryExpr.getOperand, etc.) keep that form - they're projections,
not classifications.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 19:51:38 +00:00
Copilot
7fefacd56f Python: introduce TExpr union via newtype-branch alias
Mirror the TStmt refactor for the Expr hierarchy: rename the TExpr
newtype branch to TPyExpr and add

  private class TExpr = TPyExpr or TBoolExprPair;

This lets the public Expr class use TExpr directly:

  class Expr extends AstNodeImpl, TExpr { ... }

instead of

  class Expr extends AstNodeImpl {
    Expr() { this instanceof TExpr or this instanceof TBoolExprPair }
    ...
  }

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 19:33:33 +00:00
Copilot
b682877968 Python: simplify TBlockStmt char pred via exclusion list
Replace the 14-disjunct allow-list with a 2-conjunct exclusion list.
Of the 17 Py::StmtList getters in AstGenerated.qll, only Try.getHandlers()
and MatchStmt.getCases() should not be wrapped as BlockStmts (they are
iterated individually by the shared library's Try/Switch logic via
getCatch(int) and getCase(int)). All other StmtLists are imperative
block bodies.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 17:13:50 +00:00
Copilot
c9445f74c2 Python: introduce TStmt union via newtype-branch alias
Rename the TStmt newtype branch to TPyStmt, and add a private union
type alias

  private class TStmt = TPyStmt or TBlockStmt;

This lets the public Stmt class use TStmt directly in its extends
clause:

  class Stmt extends AstNodeImpl, TStmt { ... }

instead of the previous

  class Stmt extends AstNodeImpl {
    Stmt() { this instanceof TStmt or this instanceof TBlockStmt }
    ...
  }

The same pattern is used in cpp/.../TInstruction.qll and
rust/.../Synth.qll.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 17:08:10 +00:00
Copilot
ed1709eb4a Python: use private-abstract + final-alias pattern for AstNode
Convert AstNode from a concrete class with empty default predicates into
a private abstract class plus a final alias, matching the pattern used
in cpp/.../EdgeKind.qll and cpp/.../IRVariable.qll:

  abstract private class AstNodeImpl extends TAstNode {
    abstract string toString();
    abstract Py::Location getLocation();
    abstract Callable getEnclosingCallable();
    ...
  }

  final class AstNode = AstNodeImpl;

This makes the compiler enforce that every concrete subclass implements
toString/getLocation/getEnclosingCallable, replacing the brittle
'empty default + per-branch override' arrangement. Sister classes
inside the module now extend AstNodeImpl instead of AstNode (which is
final and cannot be extended).

The empty Parameter stub gains explicit none() overrides for the
three abstract members, since QL requires them statically even when
the class has no instances.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 16:58:43 +00:00
Copilot
00c742f5ff Python: document why Assignment subclasses are empty
Explain that the shared library's Assignment / CompoundAssignment
hierarchy extends BinaryExpr, so it cannot host Python's statement-
level assignment forms (Assign, AugAssign), and that Python has no
short-circuiting compound operators (&&=, ||=, ??=) so all
subclasses remain empty.

No behaviour change; doc comments only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 14:14:16 +00:00
Copilot
06e9bbc3a6 Python: index TBlockStmt by Py::StmtList instead of (parent, slot)
Replace the two-key TBlockStmt(Py::AstNode parent, string slot) newtype
branch with the simpler TBlockStmt(Py::StmtList sl). Each Py::StmtList
that represents an imperative block (function/class/module body, if/
while/for branch, try/except/finally body, case body, except/except*
body) becomes one BlockStmt directly. The slot string disappears;
toString just defers to Py::StmtList.toString() ('StmtList').

The newtype branch keeps an explicit characteristic predicate listing
the slots that count as block bodies. This excludes Try.getHandlers(),
which is a Py::StmtList of ExceptStmt items already iterated by the
shared library's Try logic via getCatch(int) - including it would
produce parallel CFG edges (verified: a permissive
TBlockStmt(Py::StmtList sl) version regressed CPython to 1720
multipleSuccessors and 584 deadEnds before this restriction).

Drops the getBodyStmtList helper. Caller sites now use the StmtList
accessor directly: TBlockStmt(ifStmt.getBody()),
TBlockStmt(tryStmt.getFinalbody()), etc.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 12:54:02 +00:00
Copilot
250b63c216 Python: unify Py::BoolExpr handling via TBoolExprPair
Previously a Py::BoolExpr appeared in two newtype branches: as TExpr(be)
(the outermost pair) and TBoolExprPair(be, i) for inner pairs of 3+
operand expressions. This forced BinaryExpr/LogicalAndExpr/LogicalOrExpr
to disjoin two cases, and the synthetic-pair handling spanned multiple
layers.

Restrict TExpr to non-BoolExpr Py::Expr, and extend TBoolExprPair to
cover every operand pair (index 0..n-2). Now every Py::BoolExpr is
represented uniformly as TBoolExprPair(_, 0) for the whole expression
and TBoolExprPair(_, i) for inner pairs.

Extend AstNode.asExpr() to also recover the underlying Py::BoolExpr
from TBoolExprPair(_, 0). This makes asExpr() the inverse of
construction: every 'result = TExpr(e)' turns into 'result.asExpr() = e',
which works uniformly for BoolExprs and non-BoolExprs alike.

Consequences:

- BinaryExpr now extends TBoolExprPair directly with a single uniform
  rule for left/right operands.
- LogicalAndExpr/LogicalOrExpr are one-line char preds via
  getBoolExpr().
- The private BoolExprPair wrapper class folds into BinaryExpr.
- 60+ leaf wrappers now read 'result.asExpr() = py_expr' instead of
  'result = TExpr(py_expr)'.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 12:04:54 +00:00
Copilot
77e36d3cfa Python: merge T*AstNode wrappers into matching public classes
Five of the six per-newtype-branch wrapper classes had a natural
public class corresponding to that branch:

  TStmtAstNode      -> Stmt        (TStmt subset; BlockStmt overrides for TBlockStmt)
  TExprAstNode      -> Expr        (TExpr subset; BoolExprPair overrides for TBoolExprPair)
  TScopeAstNode     -> Callable    (= TScope exactly)
  TPatternAstNode   -> Pattern     (= TPattern exactly)
  TBlockStmtAstNode -> BlockStmt   (= TBlockStmt exactly)

Move toString/getLocation/getEnclosingCallable onto these classes and
delete the wrappers.

The sixth wrapper (TBoolExprPair) has no exact public counterpart -
BinaryExpr is broader, including TExpr-branch BoolExprs - so it
remains as a small private class, renamed BoolExprPair.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 11:31:07 +00:00
Copilot
f2151fe232 Python: dispatch toString/getLocation/getEnclosingCallable per branch
Replace the three big disjunctive predicates on AstNode with empty
defaults plus per-newtype-branch override classes:

  AstNode.toString()              { none() }
  AstNode.getLocation()           { none() }
  AstNode.getEnclosingCallable()  { none() }

Six private subclasses (one per newtype branch — TStmt, TExpr,
TScope, TPattern, TBoolExprPair, TBlockStmt) override these with
the branch-specific implementation. This mirrors the per-class
dispatch already used for getChild.

No behaviour change: all 24 NewCfg evaluation-order tests pass and
all 11 shared-CFG consistency queries still report 0 violations on
CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 11:06:40 +00:00
Copilot
9781ee8d66 Python: adapt to new shared CFG signature
Main added two new requirements to AstSig:
- A 'Parameter' class with a 'getDefaultValue()' method, plus a
  'callableGetParameter(Callable, int)' predicate.
- A 'CallableContext' class in InputSig1, replacing the previous
  'CallableBodyPartContext'.

Add stub implementations: 'Parameter' is empty (none()) and
'callableGetParameter' returns nothing, mirroring Java's TODO. Rename
'CallableBodyPartContext = Void' to 'CallableContext = Void' in the
Python Input module.

NewCfg evaluation-order tests still pass at the 22/24 baseline; all
11 shared-CFG consistency queries still report 0 violations on
CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:26:20 +00:00
Copilot
03b8e8fdde Python: refactor getChild into per-class OO dispatch
Replace the single ~240-line top-level getChild predicate with one
override per AST class. AstNode declares a default

  AstNode getChild(int index) { none() }

and each subclass with children overrides it (41 classes total).
The top-level predicate becomes a one-line dispatch:

  AstNode getChild(AstNode n, int index) { result = n.getChild(index) }

No behavioral change: NewCfg evaluation-order tests still pass at the
same 22/24 baseline, and all 11 shared-CFG consistency queries still
report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
93112b2b75 Python: include try-else in getChild for completion propagation
The shared CFG library propagates abrupt completions from child to
parent via getChild(parent, _) = child. Python's try.getElse() was
wired into normal step rules but not listed in getChild(TryStmt, ...),
so return/break/continue/raise statements occurring inside a try-else
block had no parent path and ended up as dead-end CFG nodes.

Add the else block at index -2 (alongside finally at -1). This affects
only completion propagation; the normal-flow CFG is unchanged because
TryStmt has explicit step rules.

Verified on a CPython database: all 11 shared-CFG consistency queries
now pass with 0 violations (deadEnd: 244 -> 0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
76724c5391 Shared CFG: support for-else and while-else loops
Add two default predicates to AstSig:

  default AstNode getWhileElse(WhileStmt loop) { none() }
  default AstNode getForeachElse(ForeachStmt loop) { none() }

When defined, the explicit-step rules for While/Do and Foreach
route the loop's normal-completion exits through the else block
before reaching the after-loop node:

  - WhileStmt: after-false condition -> before-else -> after-while
    (instead of directly after-while).
  - ForeachStmt: after-collection [empty] and the LoopHeader exit
    are both routed through before-else -> after-foreach.

Python's Ast module overrides the predicates to return the
synthetic BlockStmt for the orelse slot, replacing the previous
customisations in Input::step. This eliminates parallel direct
successors emitted by the previous Python-side step additions
(verified: multipleSuccessors on a CPython database goes from
1340 to 0).

Java and C# CFG tests are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
edfe91832b Python: compact-renumber FunctionExpr/Lambda defaults
`Args.getDefault(int)` and `Args.getKwDefault(int)` are indexed by
argument position (with gaps for args without defaults), not by
default position. The CFG `getChild` predicate for FunctionDefExpr
and LambdaExpr therefore had gaps at low indices and collisions
where defaults and kwdefaults overlapped, producing parallel
edges before the FunctionExpr.

Use `rank` to compact-renumber `getDefault(n)` and `getKwDefault(n)`
in source order. Verified on a CPython database: removes ~536
`multipleSuccessors` consistency results (1340 -> 804); the rest are
`for/else` and `while/else`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
58cda914db Python: collapse two-layer AstNodeImpl into a single Ast module
Merge the previous `Ast` and `AstSigImpl` modules into a single
`module Ast implements AstSig<Py::Location>`. Classes now use the
signature names (IfStmt, WhileStmt, ForeachStmt, etc.) and signature
predicates (getCondition, getThen, getElse, etc.) directly, with no
intermediate renaming layer.

Drop the TStmtListNode newtype branch entirely. Replace it with a
synthetic TBlockStmt(parent, slot) keyed by a parent AST node and a
slot label string ('body', 'orelse', 'finally'). Py::StmtList no
longer appears in the newtype; the BlockStmt class provides indexed
access to the underlying body items via getStmt(n).

All 22 of 24 evaluation-order tests still pass; the same 2
comprehension-related failures predate this refactor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
yoff
0b4a24884f python: add consistency checks
Co-authored-by: aschackmull <aschackmull@github.com>
2026-05-05 15:21:42 +00:00
yoff
3b0abad701 Python: add pattern nodes
Co-authored-by: Copilot <copilot@github.com>
2026-05-05 15:21:42 +00:00
Taus
68b3d57563 Cleanup, printCFG
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
a33b49a3f3 WIP2 2026-05-05 15:21:42 +00:00
Taus
1af415bec3 WIP 2026-05-05 15:21:42 +00:00
Taus
e3155ea544 Python: Handle dict unpacking in calls
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
04b8c4bc7e Python: Fix exception issue
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
f85b532bb3 Python: Fix match
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
0e1f1d9f09 Python: Support match
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
53da31bd15 Python: More nodes
Not entirely sure about the `else:` blocks.

Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
1f82dbc583 Python: Comprehensions
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
b229066891 Python: Add with
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
0acbb12fb9 Python: More simple statements
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
542efce4a6 Python: assignments
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
2db400aebd Python: Attributes
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
66bbb60614 Python: Function calls
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00