Compare commits

..

70 Commits

Author SHA1 Message Date
yoff
8cab5a20f2 Python: drop legacy essa import from ImportResolution
`ImportResolution.qll` was the last new-dataflow file with a direct
`import semmle.python.essa.SsaDefinitions`, used only for the
`SsaSource::init_module_submodule_defn` helper. Inline the 5-line body
as a local private predicate. No functional change — the inlined
predicate is clause-for-clause equivalent (the `f = init.getEntryNode()`
join only constrained `package = init`, since `Scope.getEntryNode()` is
unique per scope; we now express that constraint directly).

All 70 dataflow + ApiGraphs library-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 13:10:00 +00:00
yoff
635134ffd2 Python: treat augmented-assignment targets as both load and store
The legacy CFG emitted two ControlFlowNodes for `x[i] += 42` (one load,
one store, with `load.strictlyDominates(store)`). The new CFG collapses
them to a single canonical node, mirroring Java's single-`VarAccess`
model where `isVarRead`/`isVarWrite` are non-disjoint on the same
expression. Reconcile two legacy two-node behaviours with the merged
single-node world:

1. `Cfg::ControlFlowNode.isLoad()` no longer excludes augmented
   targets — both `isLoad` and `isStore` hold on the merged canonical
   node, matching Java. `NameNode.defines` drops the now-redundant
   `not isLoad` guard; `Py::Name.defines` already filters by
   `isDefinition` (Store/Param/AugAssign-target ctx).

2. `LocalFlow::definitionFlowStep` is restricted to NameNode targets,
   matching legacy ESSA's `assignment_definition` which required
   `defn.(NameNode).defines(v)`. Subscript and attribute writes
   (`x[i] = 42`, `obj.attr = 42`) no longer emit a local-flow step
   *into* the LHS expression — that flow is handled by the AttrWrite
   and content-flow machinery. This is essential for keeping augmented
   Subscript/Attribute targets classifiable as `LocalSourceNode` on
   the read side, which the API graph requires for emitting Use edges.

`StoreLoadTest.ql` is updated to filter `isAugLoad` out of the regular
`load` tag, mirroring the pre-existing `not isAugStore` filter on the
`store` tag so augmented-assignment expectations remain
`augload=n augstore=n` (not also `load=n store=n`).

Closes the three remaining ApiGraphs library-test failures
(`getSubscript.ql` semantically, plus cosmetic toString updates in
`ModuleImportWithDots.ql` and `test_crosstalk.ql`).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 10:56:21 +00:00
yoff
dd2deacd00 Python: model from X import * as uncertain SSA writes
Add a 4th disjunct to `SsaImplInput::variableWrite` in the shared-SSA
adapter that mirrors legacy ESSA's `ImportStarRefinement`: every
variable whose scope is the import-star's scope, OR which is used in
the import-star's scope, gets an uncertain write at the `import *`
position.

Uncertain writes do not kill prior definitions; shared SSA's
`SsaUncertainWrite` joins the new value with the immediately-preceding
definition via `uncertainWriteDefinitionInput`. This is the equivalent
of legacy ESSA's two-input refinement.

Cannot depend on `ImportStar` / `ImportResolution` (those modules
import `SsaImpl`), so the predicate uses the structural heuristic on
`Cfg::ImportStarNode` directly.

This closes the two remaining failing dataflow library-tests:

- `import-star/global` — `module_export` chains via `from X import *`
  re-exports now resolve: the importing module has an SSA def of every
  re-exported name, so `lastUseVar` finds the read at the use site.
- `typetracking_imports/highlight_problem` — a direct `from .foo import
  foo` immediately followed by `from .other import *` is now correctly
  marked as dead at the direct import.

Two scope-entry-def noise rows in `highlight_problem.expected` are also
dropped — legacy ESSA needed them as refinement inputs, but shared SSA
handles uncertain writes without an explicit prior def. They were
always tagged `no use to normal exit` (dead).

Dataflow library-tests: 62/64 → 64/64 passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 10:09:04 +00:00
yoff
6d930099d4 Python: update dataflow tests for new CFG + shared SSA
Test-side changes accompanying the dataflow migration:

  * Test queries (.ql) and shared test harness (TestSummaries,
    TestTaintLib) qualify CFG / SSA types with Cfg:: / SsaImpl::,
    bridge via AST (Name, Call, ...) instead of legacy NameNode /
    CallNode, and switch GlobalSsaVariable / EssaVariable usages
    to the new adapter API.

  * .expected files updated for legitimate precision and toString
    changes:
      - phi-node def-use edges newly exposed in def_use_counts.
      - scope-exit synthetic use surfaces one extra implicit use
        in use-use-counts.
      - For [empty]/[non-empty] outcome rows added in
        EnclosingCallable.
      - SsaSourceVariable / Global Variable label cosmetics
        normalised throughout.

  * Inline annotations:
      - typetracking/test.py: removed MISSING:tracked on lines
        93/95 (now found), added SPURIOUS:tracked on line 108
        (decorator over-reach).
      - global-flow/test.py: added SPURIOUS writes=g_mod on line
        20 (correctly reports immediately-overwritten write).
      - tainttracking/customSanitizer/test.py: marked
        try/except: ensure_tainted(s) cases as MISSING: tainted
        (no-raise CFG abstraction does not connect try body to
        except body).
      - coverage/test.py: marked
        SINK(return_from_inner_scope([])) as
        MISSING: flow=... pending closer investigation.

  * regression/{dataflow,custom_dataflow}.expected: accept two
    if/else cond-correlation over-reaches (documented limitation;
    same imprecision applies under legacy semantics by design).

After this change the dataflow library-tests stand at 62 of 64
passing; the two remaining failures are tracked under the
ImportStarRefinement workstream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 07:21:06 +00:00
yoff
ccfaa6ea7f Python: migrate dataflow library to new CFG + shared SSA
Switches the trunk dataflow library and all in-tree consumers
(frameworks, ApiGraphs, Concepts, regexp, security customisations,
test harness) from the legacy Flow.qll/ESSA stack to the new
shared-CFG facade (Cfg.qll) and the ESSA-shaped adapter on the
shared-SSA library (SsaImpl.qll).

Highlights:

  * DataFlowPublic/Private/Dispatch, Attributes, VariableCapture,
    IterableUnpacking, ImportResolution, ImportStar, LocalSources,
    TaintTrackingPrivate, MatchUnpacking, TypeTrackingImpl,
    SsaImpl, Builtins all now qualify CFG/SSA references with
    Cfg:: / SsaImpl:: and stop pulling in semmle.python.essa.*.

  * AstNodeImpl.qll/Cfg.qll: ImportMember exposes its inner
    ImportExpr, DefinitionNode.getValue covers Alias / AnnAssign /
    AugAssign / AssignExpr / For-target / Parameter-default,
    ForNode is treated as an expression node, AnnotatedExitNode is
    canonical, and BoolExprNode.getAnOperand drops the dominance
    constraint that did not hold for short-circuit BBs.

  * SsaImpl.qll: parameters always get a ParameterDefinition (so
    unused parameters still have SSA defs), scope-entry defs for
    module globals require an actual store somewhere, scope-exit
    has a synthetic use so reaching-defs survives to module
    boundary, and the legacy SsaSourceVariable / EssaVariable
    surface (getName, getScope, getAUse, getASourceUse,
    getAnImplicitUse) is reinstated for downstream queries.

  * DataFlowPublic.qll: GuardNode redesigned around the new
    structural outcome nodes (isAfterTrue / isAfterFalse).  The
    legacy ConditionBlock + flipped indirection is gone;
    controlsBlock walks UP through 'not' / '==True' / 'is False'
    etc. via outcomeOfGuard, accumulating polarity cleanly.  Only
    BarrierGuard<...> is preserved as public API.

  * ModuleVariableNode.getAWrite and LocalFlow::definitionFlowStep
    bypass SSA and consult Cfg::NameNode.defines /
    Cfg::DefinitionNode.getValue directly, so that write defs
    pruned by shared SSA (because the variable has no in-scope
    read) still produce dataflow steps.

  * Frameworks + downstream consumers: replace
    EssaVariable.hasDefiningNode, getAReturnValueFlowNode,
    Parameter.getDefault, Scope.getEntryNode / getANormalExit etc.
    with CFG-side bridges through Cfg::ControlFlowNode.

The legacy Flow.qll / Essa.qll stack is untouched and remains
available for queries that import it directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 07:20:44 +00:00
yoff
cccd207ae7 Python: remove getAFlowNode() — bridge AST→CFG only via CFG-side getNode()
Option 2: eliminates the AST→CFG bridge from the AST layer. Previously
'AstNode.getAFlowNode()' returned a 'ControlFlowNode' from the legacy
'Flow.qll' CFG via 'py_flow_bb_node' — this hardcoded the AST to know
about the legacy CFG, preventing files from cleanly switching to the
new shared CFG.

Removes:
  * 'AstNode.getAFlowNode()' from 'AstExtended.qll'
  * Type-narrowing overrides on 'Attribute' / 'Subscript' / 'Call' /
    'IfExp' / 'Name' / 'NameConstant' / 'ImportMember' (in Exprs.qll
    and Import.qll)

Rewrites ~130 call sites across 'python/ql/lib/' and 'python/ql/src/'
to bridge from the CFG side instead:

  Before:  node = expr.getAFlowNode()
  After:   node.getNode() = expr

  Before:  expr.getAFlowNode().(DefinitionNode).getValue()
  After:   exists(DefinitionNode d | d.getNode() = expr | d.getValue())

  Before:  cn.operands(const.getAFlowNode(), op, x)
  After:   exists(ControlFlowNode c | c.getNode() = const | cn.operands(c, op, x))

This is semantically a no-op — both forms are duals of the same predicate.
Verified by passing all library tests:
  * 64 dataflow tests
  * 28 ControlFlow + dataflow-new-ssa tests
  * 1 essa SSA-compute test
  * 93 tests total in the focused suite

Once committed, files that want to switch from the legacy 'Flow' CFG
to the new 'Cfg' facade only need to change their imports — the
bridge sites are CFG-side and respect whichever ControlFlowNode is in
scope.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-21 13:08:37 +00:00
yoff
6814bcfb1b Python: SSA adapter: add MultiAssignmentDefinition, definedBy, useOfDef
Extends the ESSA-shaped adapter on top of the new shared SSA with the
remaining APIs consumed by the dataflow library:

  * MultiAssignmentDefinition: matches the AST pattern 'a, b = ...' where
    the LHS is a Tuple/List and the Name being defined is a sub-element.
    Used by IterableUnpacking.qll to recognise unpacking assignments.

  * EssaNodeDefinition.definedBy(var, defNode): a flatter equivalent of
    'getSourceVariable() = var and getDefiningNode() = defNode', matching
    legacy ESSA's signature. Used by DataFlowPublic.qll's
    ModuleVariableNode to enumerate writes of a global.

  * AdjacentUses::useOfDef(def, use): all reachable uses of a definition
    (firstUse plus transitive use-use adjacency). Used by guards in
    DataFlowPublic.qll.

These complete the API surface enumerated by grep across the dataflow
library. The remaining items (EssaNodeRefinement, EssaImportStep) are
ImportResolution-specific and will need separate treatment, possibly via
a different abstraction since the SSA library does not model heap-state
refinements like 'foo.bar = X'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-19 12:19:28 +00:00
yoff
7521d838c0 Python: SSA: handle closure variables via per-scope entry defs
The new SSA's implicit entry-def predicate previously placed entries in
the variable's defining scope. For closure variables that's the outer
function, so inner functions had no entry def for the captured
variable — reads in the inner scope failed to resolve to any
definition.

Mirrors legacy ESSA's 'NonLocalVariable.getScopeEntryDefinition()':
place an implicit entry def at every reading scope's entry block,
independently of where the variable is *defined*. A closure variable
accessed in two nested functions and the outer one gets three entry
defs (one per reading scope).

Also makes 'ScopeEntryDefinition' extend 'EssaNodeDefinition' (matching
legacy ESSA), with 'getDefiningNode()' returning the scope's entry CFG
node. This requires extending the private 'writeDefNode' helper to
project i=-1 entries to bb.getNode(0).

Updates the new-vs-legacy comparison snapshot: closure-variable reads
('x:32:5'), nested global reads ('GLOBAL:52:1') now resolve. New
'def-only-new' entries appear for unbound names ('sum', 'open',
'compute') — the new SSA uniformly creates scope-entry defs for all
non-local reads, including those that legacy ESSA classifies as
builtin and excludes. This is a more uniform semantic and arguably
cleaner.

Updates the SsaTest 'some_undefined' annotation: previously documented
as a known limitation, now correctly resolves to a scope-entry def.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-19 12:06:15 +00:00
yoff
c4674bca14 Python: extend new SSA with ESSA-shaped adapter + baseline comparison test
Phase 0.5 - Adapter API on top of the shared SSA:

Adds the legacy-ESSA-shaped class hierarchy that the dataflow library
consumes, layered on the shared 'Ssa::Make' instantiation:

  * EssaDefinition / EssaNodeDefinition: the latter exposes
    'getDefiningNode()' (the CFG node at the def's index in its BB)
    and 'getVariable()' / 'getScope()'.
  * AssignmentDefinition: matches Assign, AnnAssign with value,
    AssignExpr and AugAssign target Names. Exposes 'getValue()'
    pointing at the RHS' CFG node.
  * ParameterDefinition: matches when the defining Name is in
    parameter context.
  * WithDefinition: matches 'with ... as x:' bindings.
  * ScopeEntryDefinition: implicit entry defs at synthetic position
    '-1' of the scope's entry basic block (non-local / global /
    builtin / captured reads).
  * PhiFunction (alias for PhiNode).
  * EssaVariable adapter wrapping a 'Ssa::Definition' with 'getAUse()',
    'getDefinition()', 'getAnUltimateDefinition()', and 'getName()'.
  * AdjacentUses module with 'firstUse' and 'adjacentUseUse' predicates
    bridging to 'Ssa::firstUse' / 'Ssa::adjacentUseUse'.

This is the minimum API the new dataflow's internals call into. The
richer legacy ESSA (refinement nodes, attribute refinements, edge
refinements) stays in 'semmle.python.essa.Essa' for legacy code.

Phase 0.6 - Comparison test:

Adds 'dataflow-new-ssa-vs-legacy/CmpTest.ql' that snapshots the
difference between definitions produced by new SSA vs legacy ESSA on
the same Python source. Baseline output records the current
'def-only-old' mismatches, grouped by category:

  * function/class/global definitions with no in-scope read (intentional;
    SSA is liveness-pruned)
  * captured / closure variables (real gap in new SSA - no
    closure-capture handling yet)
  * module variables __name__ / __package__ / $ (legacy ESSA implicit
    bindings)
  * exception 'as' bindings (depend on raise modelling)

Zero 'def-only-new' mismatches: the new SSA never produces a spurious
definition compared to legacy ESSA on this corpus.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 21:13:57 +00:00
yoff
30f0a3f2b9 Python: qualify Flow.qll's AST references with Py:: prefix
Prepares Flow.qll for co-existence with the new CFG facade by switching
'import python' to 'import python as Py' and qualifying every AST-class
reference inside Flow.qll's body. Flow.qll's own CFG types
(ControlFlowNode, BasicBlock, CallNode, NameNode, etc.) keep their
unqualified names.

This change is a no-op semantically:
  * all 24 evaluation-order tests still pass,
  * the bindings + store-load + new-CFG-SSA library tests still pass,
  * compilation produces zero errors.

The change enables a follow-up commit to swap python.qll's
'import semmle.python.Flow' for 'import semmle.python.controlflow.internal.Cfg'
without triggering name-clash errors inside Flow.qll itself. Legacy
modules that still want the legacy CFG (essa/, GuardedControlFlow,
LegacyPointsTo, objects/, pointsto/, types/, dataflow/old/) will need a
similar treatment in subsequent commits.

The qualification was applied mechanically via a script that prefixed
every reference to a known AST class. The list includes the standard
AST node types from semmle.python.{Files, Variables, Stmts, Exprs,
Class, Function, Patterns, Comprehensions} plus 'Location' / 'File' /
'Folder' / 'Container' / 'ConditionBlock' / 'Delete' / 'Load'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 19:19:20 +00:00
yoff
bd9f65478e Python: bring Cfg.qll's facade to API parity with Flow.qll
Adds the methods and type-narrowing overrides needed for Cfg.qll to be
a drop-in replacement for Flow.qll's CFG API surface:

  * 'override getNode()' type narrowing on all AST-shape subclasses
    (CallNode -> Py::Call, AttrNode -> Py::Attribute, ImportExprNode
    -> Py::ImportExpr, etc.). This lets callers chain methods like
    'iexpr.getNode().isRelative()' that previously failed because
    'getNode()' returned the generic AstNode.

  * 'ControlFlowNode.isBranch()' -- true and/or false successor exists.
  * 'ControlFlowNode.getAChild()' -- CFG-level child traversal via the
    AST's getAChildNode, with dominance constraint.
  * 'ControlFlowNode.strictlyReaches(other)' -- node-level reachability.
  * 'NameNode.isSelf()' -- AST-level approximation: uses the 'Variable'
    that is the first parameter of an enclosing method.
  * 'BinaryExprNode.operands(left, op, right)' + 'getAnOperand()'.
  * 'BoolExprNode.getAnOperand()'.
  * 'ForNode.getSequence()' (alias for 'getIter') and
    'ForNode.iterates(target, sequence)'.
  * 'ForNode' / 'RaiseStmtNode' type-narrowing overrides.
  * 'ExceptFlowNode.getName()' / 'ExceptGroupFlowNode.getName()'
    -- the bound 'as'-name CFG node.
  * 'DictNode.getAKey()' (only 'getAValue' was present).

These additions are independent of the dataflow-migration approach
(option 4 vs option 5). They close the API-parity gap identified
during the Option-5 investigation; with them in place, hundreds of
type-resolution errors that previously appeared when swapping Cfg for
Flow at the python.qll level go away.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 18:24:38 +00:00
yoff
87b2e2fb0f Python: fix augstore for the new CFG and add store/load test
In the legacy CFG the same Python 'Name' that is the target of an
augmented assignment has two distinct CFG nodes — a load node (context
3) earlier in the basic block and a store node (context 5) later.
'augstore(load, store)' relates the pair via dominance.

The new (shared) CFG canonicalises each AST expression to a single
CFG node, so 'load' and 'store' collapse to one. The dominance-based
'augstore' from the legacy implementation no longer holds (it would
require 'load.strictlyDominates(load)'), so 'isAugLoad' / 'isAugStore'
never fired and 'isStore' missed the AugAssign target entirely.

Redefines 'augstore' as reflexive on the AugAssign target's canonical
CFG node. With this change:

  * isAugLoad / isAugStore both fire on the single canonical node.
  * isStore fires (via 'or augstore(_, this)') — matching the legacy
    classification that an augmented-assignment target is a store.
  * isLoad does not fire (excluded by 'not augstore(_, this)').

Adds 'python/ql/test/library-tests/ControlFlow/store-load/' covering
plain load/store/delete, parameters, augmented assignment, tuple
unpacking, attribute and subscript stores. The test asserts the
classification directly on the new-CFG facade.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 12:29:37 +00:00
yoff
2bb8e90476 Python: introduce shared-SSA adapter on the new CFG
Adds 'python/ql/lib/semmle/python/dataflow/new/internal/SsaImpl.qll', a
minimal Python SSA implementation built on the shared SSA library
('codeql.ssa.Ssa::Make<Location, Cfg, Input>'). The structure mirrors
Java's adapter at 'java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll'.

Key design choices:

  * 'SourceVariable' wraps 'Py::Variable'. Only variables that are read
    or deleted somewhere are tracked - write-only variables don't
    benefit from SSA construction.

  * Variable references are positional ('BasicBlock', 'int') pairs
    looked up via 'Cfg::NameNode.defines'/'.uses'/'.deletes' (which
    themselves are one-line bridges to AST-level 'Name.defines' etc.).

  * Parameter writes are not synthesised: parameter Name nodes are
    already wired into the CFG (per the earlier C#-style parameter
    extension in 'AstNodeImpl.qll'), so the regular 'variableWrite'
    path handles them at their natural CFG index.

  * Non-local / captured / global / builtin variables read in a scope
    but not written in it receive a synthetic entry definition at
    index '-1' of the scope's entry basic block. This matches Java's
    'hasEntryDef'.

  * 'del x' is modelled as a certain write at the deletion site.

Includes an inline-expectations test under
'python/ql/test/library-tests/dataflow-new-ssa/' covering:
plain parameter pass-through, simple assignment + read, reassignment
with dead-write pruning, if/else with phi insertion at the join, and
an undefined-name read (currently a known limitation - no SSA flow
without an enclosing definition).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 11:03:45 +00:00
yoff
0da0633159 Python: introduce new-CFG facade
Adds 'Cfg.qll' alongside 'AstNodeImpl.qll' in the controlflow internal
package. The facade re-exposes the same API surface as the legacy
'semmle/python/Flow.qll' (ControlFlowNode, BasicBlock, NameNode, CallNode,
AttrNode, ImportExprNode, ImportMemberNode, ImportStarNode, SubscriptNode,
CompareNode, IfExprNode, AssignmentExprNode, BinaryExprNode, BoolExprNode,
UnaryExprNode, DefinitionNode, DeletionNode, ForNode, RaiseStmtNode,
StarredNode, ExceptFlowNode, ExceptGroupFlowNode, TupleNode, ListNode,
SetNode, DictNode, IterableNode, NameConstantNode), but is implemented
on top of the new shared CFG via 'AstNodeImpl.qll'.

The variable-identity predicates ('NameNode.defines', '.uses',
'.deletes', '.isLocal', '.isNonLocal', ...) are one-line bridges to the
underlying AST predicates ('Name.defines', '.uses', '.deletes'),
mirroring the Java pattern.

Re-exports 'EntryBasicBlock' and 'dominatingEdge/2' from the shared
'BB::CfgSig' produced by 'AstNodeImpl.qll', so downstream consumers
(e.g. the SSA adapter) can wire the new CFG into other shared modules
that expect a 'CfgSig' implementation.

This facade is not yet consumed by the dataflow library — that is the
next phase.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 11:03:32 +00:00
yoff
558cd5b00c Python: test dead bindings under no-raise CFG abstraction
Adds 'dead_under_no_raise.py' to the bindings test suite, capturing the
three CPython patterns where bindings legitimately have no CFG node
because the surrounding code is unreachable under the 'no expressions
raise' abstraction:

  1. Statements after a 'try: return X; except: pass' block.
  2. The 'else:' clause of a try whose body always raises.
  3. Cache-lookup pattern 'try: return cache[k]; except: pass' followed
     by computation and store.

These bindings intentionally carry no 'cfgdefines=' annotations. If
raise modelling is later added to the CFG, the BindingsTest will surface
the new CFG nodes as unexpected results and this file will need to be
revisited.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 10:48:43 +00:00
yoff
77149be759 Python: wire PEP 695 type parameters into the shared CFG (green)
Adds CFG coverage for the binding 'Name's introduced by PEP 695
type-parameter syntax on functions, classes, and 'type' aliases:

  def func[T](...): ...
  class Box[T]: ...
  def multi[T: int, *Ts, **P](...): ...
  type Alias[T] = ...

For each parametrised AST node, the type-parameter names (and, for
'type' aliases, the alias name itself) are added as children of the
enclosing CFG node so that 'Name.defines(v)' has a corresponding
position. Bounds and defaults are intentionally not wired (they have
no SSA-relevant semantics for our purposes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 09:58:27 +00:00
Copilot
aa8402d1b7 Python: wire match-pattern bindings into the shared CFG (green)
Adds concrete `Pattern` subclasses in `AstNodeImpl.qll` for every
`MatchPattern` AST kind, with `getChild` overrides that expose
sub-patterns and bound Names. Specifically:

- MatchCapturePattern (`case x:`) -> getVariable()
- MatchAsPattern (`case … as v:`) -> getPattern(), getAlias()
- MatchStarPattern (`case [*rest]:`) -> getTarget()
- MatchSequencePattern (`case [a, b]:`) -> getPattern(i)
- MatchClassPattern (`case Cls(p, q, k=v)`) -> getClass(), positional, keyword
- MatchMappingPattern (`case {k: v}:`) -> getMapping(i)
- MatchKeyValuePattern, MatchKeywordPattern, MatchDoubleStarPattern
- MatchOrPattern, MatchLiteralPattern, MatchValuePattern

Without these, every Name bound by a match pattern lacked a CFG node.
Removes the corresponding MISSING: annotations from match_pattern.py
(all 11 cases).

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 13:11:18 +00:00
Copilot
b8c093eefb Python: wire import-statement bindings into the shared CFG (green)
Adds `ImportStmt` and `ImportStarStmt` wrappers in `AstNodeImpl.qll`.
For each `Alias` in an import statement, both the value (module/member
expression) and the bound `asname` Name become children of the CFG node
for the import statement, in evaluation order.

Without this, every `Name` introduced by `import` / `from .. import ..`
lacked a CFG node, even though `Name.defines(v)` returns true for it on
the AST side. This was the highest-volume gap: 20,332 missing import
aliases across CPython.

Removes the corresponding MISSING: annotations from imports.py.

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:35:47 +00:00
Copilot
0742e1e901 Python: wire parameters into the shared CFG (C# pattern)
Implements `AstSig::Parameter` and `callableGetParameter(c, i)` in
`AstNodeImpl.qll`, following the C# template
(`csharp/.../ControlFlowGraph.qll:147-156`) rather than Java's
`Parameter() { none() }`.

Each Python parameter (positional, *args, keyword-only, **kwargs) now
becomes a CFG node at a stable position in the enclosing callable's
entry sequence. Defaults still evaluate at function-definition time
via `FunctionDefExpr.getDefault` / `LambdaExpr.getDefault`, so
`Parameter::getDefaultValue()` returns `none()` (the shared CFG
library calls this to model the missing-argument fallback, which
Python does not surface at the CFG level).

The bindings test now exercises parameters (the `py_expr_contexts(_, 4, ...)`
exclusion has been removed). A new `parameters.py` test case covers
positional, defaulted, vararg, kwarg, keyword-only, kitchen-sink,
method (self/cls), lambda, and PEP 570 positional-only parameters.
Several other test files were updated to annotate parameters that the
test had previously hidden (synthetic `.0` comprehension parameter,
method `self`, decorator `f`, etc.).

Verified:
- All 24 ControlFlow/evaluation-order tests still pass.
- CFG consistency query (`python/ql/consistency-queries/CfgConsistency.ql`)
  shows zero violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:29:28 +00:00
Copilot
a20af3cf41 Python: wire AnnAssign into the shared CFG (green)
Adds an `AnnAssignStmt` wrapper in `AstNodeImpl.qll` so that PEP 526
annotated assignments (`x: int = 1`, `x: int`) participate in the
control flow graph. Evaluation order follows CPython: annotation,
optional value, target binding.

Without this, `x: int = 1` had no CFG node for `x` even though
`Name.defines(v)` returns true for it on the AST side. SSA built on
the new CFG would therefore miss every annotated-assignment write.

Removes the corresponding MISSING: annotations from the CFG-binding
gap test:
- annassign.py — all four cases now green.
- match_pattern.py — class-body annotated fields (`x: int`, `y: int`).
- type_params.py — `item: T` inside class.

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:22:57 +00:00
Copilot
f077e7c27c Python: add CFG-binding gap tests (red)
Adds inline-expectation tests for the new shared CFG implementation in
python/ql/lib/semmle/python/controlflow/internal/AstNodeImpl.qll,
covering every Python binding construct that introduces a variable.

The test files use MISSING: annotations to record bindings whose
defining Name AST node is *not* currently reachable from the new CFG.
These are the 'red' half of red-green commit pairs: subsequent commits
will extend AstNodeImpl to cover each construct and remove the
corresponding MISSING: marker.

Confirmed-broken categories:
- Import aliases (from x import a)
- Annotated assignment (x: int = 1)
- Exception handler (except E as e)
- Match patterns (case x, case [a,b], case ... as v)
- PEP 695 type params (def f[T], class C[T])

Confirmed-working (no MISSING:):
- Compound targets, with-as, comprehensions, decorated def/class,
  walrus, starred.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:19:34 +00:00
Copilot
e816ba61ec Python: project via as* helpers outside characteristic predicates
Style cleanup: avoid naming newtype branch constructors (TPyStmt,
TPyExpr, TBlockStmt, TPattern, TBoolExprPair, TScope) outside the
char-preds that classify their wrappers. Method bodies and helper
predicates now use the as* projections instead:

  // Before: result = TBlockStmt(ifStmt.getBody())
  // After:  result.asStmtList() = ifStmt.getBody()

  // Before: result = TPyStmt(matchStmt.getCase(index))
  // After:  result.asStmt() = matchStmt.getCase(index)

Adds:

- AstNode.asStmtList() - the inverse of TBlockStmt(_).
- BinaryExpr.getIndex() - exposes the synthetic-pair index, used
  internally by getRightOperand to find the next pair without
  naming TBoolExprPair.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 20:00:55 +00:00
Copilot
3e0ae6ea3c Python: use newtype-branch constructors in characteristic predicates
Style cleanup: when a class's characteristic predicate binds via a
'cast' helper like

  IfStmt() { ifStmt = this.asStmt() }

prefer naming the newtype branch directly:

  IfStmt() { this = TPyStmt(ifStmt) }

This makes the wrapped representation explicit. Apply throughout:
~30 charpreds (every Stmt/Expr leaf wrapper, plus LoopStmt, BreakStmt,
ContinueStmt, BooleanLiteral, UnaryExpr, ArithUnaryExpr, Comprehension).

Method bodies that use asStmt/asExpr to project an underlying
Python AST node (Stmt.toString, BlockStmt.getEnclosingCallable,
UnaryExpr.getOperand, etc.) keep that form - they're projections,
not classifications.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 19:51:38 +00:00
Copilot
7fefacd56f Python: introduce TExpr union via newtype-branch alias
Mirror the TStmt refactor for the Expr hierarchy: rename the TExpr
newtype branch to TPyExpr and add

  private class TExpr = TPyExpr or TBoolExprPair;

This lets the public Expr class use TExpr directly:

  class Expr extends AstNodeImpl, TExpr { ... }

instead of

  class Expr extends AstNodeImpl {
    Expr() { this instanceof TExpr or this instanceof TBoolExprPair }
    ...
  }

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 19:33:33 +00:00
Copilot
b682877968 Python: simplify TBlockStmt char pred via exclusion list
Replace the 14-disjunct allow-list with a 2-conjunct exclusion list.
Of the 17 Py::StmtList getters in AstGenerated.qll, only Try.getHandlers()
and MatchStmt.getCases() should not be wrapped as BlockStmts (they are
iterated individually by the shared library's Try/Switch logic via
getCatch(int) and getCase(int)). All other StmtLists are imperative
block bodies.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 17:13:50 +00:00
Copilot
c9445f74c2 Python: introduce TStmt union via newtype-branch alias
Rename the TStmt newtype branch to TPyStmt, and add a private union
type alias

  private class TStmt = TPyStmt or TBlockStmt;

This lets the public Stmt class use TStmt directly in its extends
clause:

  class Stmt extends AstNodeImpl, TStmt { ... }

instead of the previous

  class Stmt extends AstNodeImpl {
    Stmt() { this instanceof TStmt or this instanceof TBlockStmt }
    ...
  }

The same pattern is used in cpp/.../TInstruction.qll and
rust/.../Synth.qll.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 17:08:10 +00:00
Copilot
ed1709eb4a Python: use private-abstract + final-alias pattern for AstNode
Convert AstNode from a concrete class with empty default predicates into
a private abstract class plus a final alias, matching the pattern used
in cpp/.../EdgeKind.qll and cpp/.../IRVariable.qll:

  abstract private class AstNodeImpl extends TAstNode {
    abstract string toString();
    abstract Py::Location getLocation();
    abstract Callable getEnclosingCallable();
    ...
  }

  final class AstNode = AstNodeImpl;

This makes the compiler enforce that every concrete subclass implements
toString/getLocation/getEnclosingCallable, replacing the brittle
'empty default + per-branch override' arrangement. Sister classes
inside the module now extend AstNodeImpl instead of AstNode (which is
final and cannot be extended).

The empty Parameter stub gains explicit none() overrides for the
three abstract members, since QL requires them statically even when
the class has no instances.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 16:58:43 +00:00
Copilot
00c742f5ff Python: document why Assignment subclasses are empty
Explain that the shared library's Assignment / CompoundAssignment
hierarchy extends BinaryExpr, so it cannot host Python's statement-
level assignment forms (Assign, AugAssign), and that Python has no
short-circuiting compound operators (&&=, ||=, ??=) so all
subclasses remain empty.

No behaviour change; doc comments only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 14:14:16 +00:00
Copilot
06e9bbc3a6 Python: index TBlockStmt by Py::StmtList instead of (parent, slot)
Replace the two-key TBlockStmt(Py::AstNode parent, string slot) newtype
branch with the simpler TBlockStmt(Py::StmtList sl). Each Py::StmtList
that represents an imperative block (function/class/module body, if/
while/for branch, try/except/finally body, case body, except/except*
body) becomes one BlockStmt directly. The slot string disappears;
toString just defers to Py::StmtList.toString() ('StmtList').

The newtype branch keeps an explicit characteristic predicate listing
the slots that count as block bodies. This excludes Try.getHandlers(),
which is a Py::StmtList of ExceptStmt items already iterated by the
shared library's Try logic via getCatch(int) - including it would
produce parallel CFG edges (verified: a permissive
TBlockStmt(Py::StmtList sl) version regressed CPython to 1720
multipleSuccessors and 584 deadEnds before this restriction).

Drops the getBodyStmtList helper. Caller sites now use the StmtList
accessor directly: TBlockStmt(ifStmt.getBody()),
TBlockStmt(tryStmt.getFinalbody()), etc.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 12:54:02 +00:00
Copilot
250b63c216 Python: unify Py::BoolExpr handling via TBoolExprPair
Previously a Py::BoolExpr appeared in two newtype branches: as TExpr(be)
(the outermost pair) and TBoolExprPair(be, i) for inner pairs of 3+
operand expressions. This forced BinaryExpr/LogicalAndExpr/LogicalOrExpr
to disjoin two cases, and the synthetic-pair handling spanned multiple
layers.

Restrict TExpr to non-BoolExpr Py::Expr, and extend TBoolExprPair to
cover every operand pair (index 0..n-2). Now every Py::BoolExpr is
represented uniformly as TBoolExprPair(_, 0) for the whole expression
and TBoolExprPair(_, i) for inner pairs.

Extend AstNode.asExpr() to also recover the underlying Py::BoolExpr
from TBoolExprPair(_, 0). This makes asExpr() the inverse of
construction: every 'result = TExpr(e)' turns into 'result.asExpr() = e',
which works uniformly for BoolExprs and non-BoolExprs alike.

Consequences:

- BinaryExpr now extends TBoolExprPair directly with a single uniform
  rule for left/right operands.
- LogicalAndExpr/LogicalOrExpr are one-line char preds via
  getBoolExpr().
- The private BoolExprPair wrapper class folds into BinaryExpr.
- 60+ leaf wrappers now read 'result.asExpr() = py_expr' instead of
  'result = TExpr(py_expr)'.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 12:04:54 +00:00
Copilot
77e36d3cfa Python: merge T*AstNode wrappers into matching public classes
Five of the six per-newtype-branch wrapper classes had a natural
public class corresponding to that branch:

  TStmtAstNode      -> Stmt        (TStmt subset; BlockStmt overrides for TBlockStmt)
  TExprAstNode      -> Expr        (TExpr subset; BoolExprPair overrides for TBoolExprPair)
  TScopeAstNode     -> Callable    (= TScope exactly)
  TPatternAstNode   -> Pattern     (= TPattern exactly)
  TBlockStmtAstNode -> BlockStmt   (= TBlockStmt exactly)

Move toString/getLocation/getEnclosingCallable onto these classes and
delete the wrappers.

The sixth wrapper (TBoolExprPair) has no exact public counterpart -
BinaryExpr is broader, including TExpr-branch BoolExprs - so it
remains as a small private class, renamed BoolExprPair.

No behaviour change: all 24 NewCfg evaluation-order tests pass; all
11 shared-CFG consistency queries report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 11:31:07 +00:00
Copilot
f2151fe232 Python: dispatch toString/getLocation/getEnclosingCallable per branch
Replace the three big disjunctive predicates on AstNode with empty
defaults plus per-newtype-branch override classes:

  AstNode.toString()              { none() }
  AstNode.getLocation()           { none() }
  AstNode.getEnclosingCallable()  { none() }

Six private subclasses (one per newtype branch — TStmt, TExpr,
TScope, TPattern, TBoolExprPair, TBlockStmt) override these with
the branch-specific implementation. This mirrors the per-class
dispatch already used for getChild.

No behaviour change: all 24 NewCfg evaluation-order tests pass and
all 11 shared-CFG consistency queries still report 0 violations on
CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 11:06:40 +00:00
Copilot
9781ee8d66 Python: adapt to new shared CFG signature
Main added two new requirements to AstSig:
- A 'Parameter' class with a 'getDefaultValue()' method, plus a
  'callableGetParameter(Callable, int)' predicate.
- A 'CallableContext' class in InputSig1, replacing the previous
  'CallableBodyPartContext'.

Add stub implementations: 'Parameter' is empty (none()) and
'callableGetParameter' returns nothing, mirroring Java's TODO. Rename
'CallableBodyPartContext = Void' to 'CallableContext = Void' in the
Python Input module.

NewCfg evaluation-order tests still pass at the 22/24 baseline; all
11 shared-CFG consistency queries still report 0 violations on
CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:26:20 +00:00
Copilot
03b8e8fdde Python: refactor getChild into per-class OO dispatch
Replace the single ~240-line top-level getChild predicate with one
override per AST class. AstNode declares a default

  AstNode getChild(int index) { none() }

and each subclass with children overrides it (41 classes total).
The top-level predicate becomes a one-line dispatch:

  AstNode getChild(AstNode n, int index) { result = n.getChild(index) }

No behavioral change: NewCfg evaluation-order tests still pass at the
same 22/24 baseline, and all 11 shared-CFG consistency queries still
report 0 violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
93112b2b75 Python: include try-else in getChild for completion propagation
The shared CFG library propagates abrupt completions from child to
parent via getChild(parent, _) = child. Python's try.getElse() was
wired into normal step rules but not listed in getChild(TryStmt, ...),
so return/break/continue/raise statements occurring inside a try-else
block had no parent path and ended up as dead-end CFG nodes.

Add the else block at index -2 (alongside finally at -1). This affects
only completion propagation; the normal-flow CFG is unchanged because
TryStmt has explicit step rules.

Verified on a CPython database: all 11 shared-CFG consistency queries
now pass with 0 violations (deadEnd: 244 -> 0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
76724c5391 Shared CFG: support for-else and while-else loops
Add two default predicates to AstSig:

  default AstNode getWhileElse(WhileStmt loop) { none() }
  default AstNode getForeachElse(ForeachStmt loop) { none() }

When defined, the explicit-step rules for While/Do and Foreach
route the loop's normal-completion exits through the else block
before reaching the after-loop node:

  - WhileStmt: after-false condition -> before-else -> after-while
    (instead of directly after-while).
  - ForeachStmt: after-collection [empty] and the LoopHeader exit
    are both routed through before-else -> after-foreach.

Python's Ast module overrides the predicates to return the
synthetic BlockStmt for the orelse slot, replacing the previous
customisations in Input::step. This eliminates parallel direct
successors emitted by the previous Python-side step additions
(verified: multipleSuccessors on a CPython database goes from
1340 to 0).

Java and C# CFG tests are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
edfe91832b Python: compact-renumber FunctionExpr/Lambda defaults
`Args.getDefault(int)` and `Args.getKwDefault(int)` are indexed by
argument position (with gaps for args without defaults), not by
default position. The CFG `getChild` predicate for FunctionDefExpr
and LambdaExpr therefore had gaps at low indices and collisions
where defaults and kwdefaults overlapped, producing parallel
edges before the FunctionExpr.

Use `rank` to compact-renumber `getDefault(n)` and `getKwDefault(n)`
in source order. Verified on a CPython database: removes ~536
`multipleSuccessors` consistency results (1340 -> 804); the rest are
`for/else` and `while/else`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Copilot
58cda914db Python: collapse two-layer AstNodeImpl into a single Ast module
Merge the previous `Ast` and `AstSigImpl` modules into a single
`module Ast implements AstSig<Py::Location>`. Classes now use the
signature names (IfStmt, WhileStmt, ForeachStmt, etc.) and signature
predicates (getCondition, getThen, getElse, etc.) directly, with no
intermediate renaming layer.

Drop the TStmtListNode newtype branch entirely. Replace it with a
synthetic TBlockStmt(parent, slot) keyed by a parent AST node and a
slot label string ('body', 'orelse', 'finally'). Py::StmtList no
longer appears in the newtype; the BlockStmt class provides indexed
access to the underlying body items via getStmt(n).

All 22 of 24 evaluation-order tests still pass; the same 2
comprehension-related failures predate this refactor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
yoff
0b4a24884f python: add consistency checks
Co-authored-by: aschackmull <aschackmull@github.com>
2026-05-05 15:21:42 +00:00
yoff
3b0abad701 Python: add pattern nodes
Co-authored-by: Copilot <copilot@github.com>
2026-05-05 15:21:42 +00:00
Taus
68b3d57563 Cleanup, printCFG
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
a33b49a3f3 WIP2 2026-05-05 15:21:42 +00:00
Taus
1af415bec3 WIP 2026-05-05 15:21:42 +00:00
Taus
e3155ea544 Python: Handle dict unpacking in calls
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
04b8c4bc7e Python: Fix exception issue
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
f85b532bb3 Python: Fix match
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
0e1f1d9f09 Python: Support match
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
53da31bd15 Python: More nodes
Not entirely sure about the `else:` blocks.

Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:42 +00:00
Taus
1f82dbc583 Python: Comprehensions
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
b229066891 Python: Add with
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
0acbb12fb9 Python: More simple statements
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
542efce4a6 Python: assignments
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
2db400aebd Python: Attributes
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
66bbb60614 Python: Function calls
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
971beb2d89 Python: Assert statements
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
ea204ac75f Python: Support various literals
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
3be562929a Python: Ignore synthetic CFG nodes
We can only annotate the ones that correspond directly to AST nodes
anyway.

Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
dc0344e2fc Python: More AstNodeImpl improvements
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
2ed75e7ca7 Python: Instantiate CFG tests with new CFG library
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:40 +00:00
Taus
9974584102 Python: Instantiate CFG module fully
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:40 +00:00
Taus
6086b999f6 Python: Use fields everywhere in new AST classes
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:40 +00:00
Taus
d62e116fc2 Python: First stab at shared control-flow 2026-05-05 15:21:40 +00:00
Taus
4582855de1 Python: Make CFG tests parameterised
Currently we only instantiate them with the old CFG library, but in the
future we'll want to do this with the new library as well.

Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:40 +00:00
Taus
ba29e7e34d Python: Add ConsecutiveTimestamps test
This one is potentially a bit iffy -- it checks for a very powerful
propetry (that implies many of the other queries), but as the test
results show, it can produce false positives when there is in fact no
problem. We may want to get rid of it entirely, if it becomes too noisy.
2026-05-05 15:21:40 +00:00
Taus
f97bf38f3b Python: Add NeverReachable test
This looks for nodes annotated with `t.never` in the test that are
reachable in the CFG. This should not happen (it messes with various
queries, e.g. the "mixed returns" query), but the test shows that in a
few particular cases (involving the `match` statement where all cases
contain `return`s), we _do_ have reachable nodes that shouldn't be.
2026-05-05 15:21:40 +00:00
Taus
a8d136d3d6 Python: Add BasicBlockOrdering test
This one demonstrates a bug in the current CFG. In a dictionary
comprehension `{k: v for k, v in d.items()}`, we evaluate the value
before the key, which is incorrect. (A fix for this bug has been
implemented in a separate PR.)
2026-05-05 15:21:40 +00:00
Taus
710a43ac7f Python: Add some CFG-validation queries
These use the annotated, self-verifying test files to check various
consistency requirements.

Some of these may be expressing the same thing in different ways, but
it's fairly cheap to keep them around, so I have not attempted to
produce a minimal set of queries for this.
2026-05-05 15:21:40 +00:00
Taus
3402d0eaeb Python: Add self-validating CFG tests
These tests consist of various Python constructions (hopefully a
somewhat comprehensive set) with specific timestamp annotations
scattered throughout. When the tests are run using the Python 3
interpreter, these annotations are checked and compared to the "current
timestamp" to see that they are in agreement. This is what makes the
tests "self-validating".

There are a few different kinds of annotations: the basic `t[4]` style
(meaning this is executed at timestamp 4), the `t.dead[4]` variant
(meaning this _would_ happen at timestamp 4, but it is in a dead
branch), and `t.never` (meaning this is never executed at all).

In addition to this, there is a query, MissingAnnotations, which checks
whether we have applied these annotations maximally. Many expression
nodes are not actually annotatable, so there is a sizeable list of
excluded nodes for that query.
2026-05-05 15:21:39 +00:00
Tom Hvitved
4c1461ad5b Merge pull request #21786 from hvitved/inline-test-ignore-tags
Inline test expectations: Rename `tagIsOptional` to `tagIsIgnored`
2026-05-05 09:01:58 +02:00
Tom Hvitved
80ccdcc696 Inline test expectations: Rename tagIsOptional to tagIsIgnored 2026-05-04 11:21:33 +02:00
245 changed files with 8998 additions and 1408 deletions

View File

@@ -47,7 +47,7 @@ C/C++
* The "Multiplication result converted to larger type" (:code:`cpp/integer-multiplication-cast-to-long`) query has been upgraded to :code:`high` precision. This query will now run in the default code scanning suite.
* The "Suspicious add with sizeof" (:code:`cpp/suspicious-add-sizeof`) query has been upgraded to :code:`high` precision. This query will now run in the default code scanning suite.
* The "Wrong type of arguments to formatting function" (:code:`cpp/wrong-type-format-argument`) query has been upgraded to :code:`high` precision. This query will now run in the default code scanning suite.
* The "Implicit function declaration" (:code:`cpp/implicit-function-declaration`) query has been upgraded to :code:`high` precision. However, for :code:`build mode: none` databases, it no longer produces any results. The results in this mode were found to be very noisy and fundamentally imprecise.
* The "Implicit function declaration" (:code:`cpp/implicit-function-declaration`) query has been upgraded to :code:`high` precision. However, for :code:`build-mode: none` databases, it no longer produces any results. The results in this mode were found to be very noisy and fundamentally imprecise.
C#
""

View File

@@ -0,0 +1,2 @@
import semmle.python.controlflow.internal.AstNodeImpl
import ControlFlow::Consistency

View File

@@ -213,9 +213,11 @@ class ExprWithPointsTo extends Expr {
* Gets what this expression might "refer-to" in the given `context`.
*/
predicate refersTo(Context context, Object obj, ClassObject cls, AstNode origin) {
this.getAFlowNode()
.(ControlFlowNodeWithPointsTo)
.refersTo(context, obj, cls, origin.getAFlowNode())
exists(ControlFlowNode this_, ControlFlowNode origin_ |
this_.getNode() = this and origin_.getNode() = origin
|
this_.(ControlFlowNodeWithPointsTo).refersTo(context, obj, cls, origin_)
)
}
/**
@@ -226,7 +228,11 @@ class ExprWithPointsTo extends Expr {
*/
pragma[nomagic]
predicate refersTo(Object obj, AstNode origin) {
this.getAFlowNode().(ControlFlowNodeWithPointsTo).refersTo(obj, origin.getAFlowNode())
exists(ControlFlowNode this_, ControlFlowNode origin_ |
this_.getNode() = this and origin_.getNode() = origin
|
this_.(ControlFlowNodeWithPointsTo).refersTo(obj, origin_)
)
}
/**
@@ -240,16 +246,22 @@ class ExprWithPointsTo extends Expr {
* in the given `context`.
*/
predicate pointsTo(Context context, Value value, AstNode origin) {
this.getAFlowNode()
.(ControlFlowNodeWithPointsTo)
.pointsTo(context, value, origin.getAFlowNode())
exists(ControlFlowNode this_, ControlFlowNode origin_ |
this_.getNode() = this and origin_.getNode() = origin
|
this_.(ControlFlowNodeWithPointsTo).pointsTo(context, value, origin_)
)
}
/**
* Holds if this expression might "point-to" to `value` which is from `origin`.
*/
predicate pointsTo(Value value, AstNode origin) {
this.getAFlowNode().(ControlFlowNodeWithPointsTo).pointsTo(value, origin.getAFlowNode())
exists(ControlFlowNode this_, ControlFlowNode origin_ |
this_.getNode() = this and origin_.getNode() = origin
|
this_.(ControlFlowNodeWithPointsTo).pointsTo(value, origin_)
)
}
/**
@@ -475,7 +487,10 @@ class FunctionMetricsWithPointsTo extends FunctionMetrics {
not non_coupling_method(result) and
exists(Call call | call.getScope() = this |
exists(FunctionObject callee | callee.getFunction() = result |
call.getAFlowNode().getFunction().(ControlFlowNodeWithPointsTo).refersTo(callee)
exists(CallNode call_ |
call_.getNode() = call and
call_.getFunction().(ControlFlowNodeWithPointsTo).refersTo(callee)
)
)
or
exists(Attribute a | call.getFunc() = a |

View File

@@ -64,7 +64,7 @@ private predicate jump_to_defn(ControlFlowNode use, Definition defn) {
private predicate preferred_jump_to_defn(Expr use, Definition def) {
not use instanceof ClassExpr and
not use instanceof FunctionExpr and
jump_to_defn(use.getAFlowNode(), def)
exists(ControlFlowNode useNode | useNode.getNode() = use | jump_to_defn(useNode, def))
}
private predicate unique_jump_to_defn(Expr use, Definition def) {
@@ -452,7 +452,7 @@ private predicate self_parameter_jump_to_defn_attribute(
* This exists primarily for testing use `getPreferredDefinition()` instead.
*/
Definition getADefinition(Expr use) {
jump_to_defn(use.getAFlowNode(), result) and
exists(ControlFlowNode useNode | useNode.getNode() = use | jump_to_defn(useNode, result)) and
not use instanceof Call and
not use.isArtificial() and
// Not the use itself

View File

@@ -0,0 +1,45 @@
/**
* @name Print CFG (New)
* @description Produces a representation of a file's Control Flow Graph
* using the new shared control flow library.
* This query is used by the VS Code extension.
* @id python/print-cfg
* @kind graph
* @tags ide-contextual-queries/print-cfg
*/
private import python as Py
import semmle.python.controlflow.internal.AstNodeImpl
external string selectedSourceFile();
private predicate selectedSourceFileAlias = selectedSourceFile/0;
external int selectedSourceLine();
private predicate selectedSourceLineAlias = selectedSourceLine/0;
external int selectedSourceColumn();
private predicate selectedSourceColumnAlias = selectedSourceColumn/0;
module ViewCfgQueryInput implements ControlFlow::ViewCfgQueryInputSig<Py::File> {
predicate selectedSourceFile = selectedSourceFileAlias/0;
predicate selectedSourceLine = selectedSourceLineAlias/0;
predicate selectedSourceColumn = selectedSourceColumnAlias/0;
predicate cfgScopeSpan(
Ast::Callable callable, Py::File file, int startLine, int startColumn, int endLine,
int endColumn
) {
exists(Py::Scope scope |
scope = callable.asScope() and
file = scope.getLocation().getFile() and
scope.getLocation().hasLocationInfo(_, startLine, startColumn, endLine, endColumn)
)
}
}
import ControlFlow::ViewCfgQuery<Py::File, ViewCfgQueryInput>

View File

@@ -7,6 +7,7 @@ library: true
upgrades: upgrades
dependencies:
codeql/concepts: ${workspace}
codeql/controlflow: ${workspace}
codeql/dataflow: ${workspace}
codeql/mad: ${workspace}
codeql/regex: ${workspace}

View File

@@ -6,8 +6,9 @@
* directed and labeled; they specify how the components represented by nodes relate to each other.
*/
// Importing python under the `py` namespace to avoid importing `CallNode` from `Flow.qll` and thereby having a naming conflict with `API::CallNode`.
// Importing python under the `py` namespace to avoid importing `Cfg::CallNode` from `Flow.qll` and thereby having a naming conflict with `API::CallNode`.
private import python as PY
private import semmle.python.controlflow.internal.Cfg as Cfg
import semmle.python.dataflow.new.DataFlow
private import semmle.python.internal.CachedStages
@@ -282,7 +283,7 @@ module API {
index = this.getIndex() and
(
// subscripting
exists(PY::SubscriptNode subscript |
exists(Cfg::SubscriptNode subscript |
subscript.getObject() = this.getAValueReachableFromSource().asCfgNode() and
subscript.getIndex() = index.asSink().asCfgNode()
|
@@ -290,7 +291,7 @@ module API {
subscript = result.asSource().asCfgNode()
or
// writing
subscript.(PY::DefinitionNode).getValue() = result.asSink().asCfgNode()
subscript.(Cfg::DefinitionNode).getValue() = result.asSink().asCfgNode()
)
or
// dictionary literals
@@ -684,7 +685,7 @@ module API {
* Ignores relative imports, such as `from ..foo.bar import baz`.
*/
private predicate imports(DataFlow::CfgNode imp, string name) {
exists(PY::ImportExprNode iexpr |
exists(Cfg::ImportExprNode iexpr |
imp.getNode() = iexpr and
not iexpr.getNode().isRelative() and
name = iexpr.getNode().getImportedModuleName()
@@ -775,7 +776,7 @@ module API {
// list literals, from `x` to `[x]`
// TODO: once convenient, this should be done at a higher level than the AST,
// at least at the CFG layer, to take splitting into account.
// Also consider `SequenceNode for generality.
// Also consider `Cfg::SequenceNode for generality.
exists(PY::List list | list = pred.(DataFlow::ExprNode).getNode().getNode() |
rhs.(DataFlow::ExprNode).getNode().getNode() = list.getAnElt() and
lbl = Label::subscript()
@@ -805,7 +806,7 @@ module API {
subscript = trackUseNode(src).getSubscript(index)
|
// from `x` to a definition of `x[...]`
rhs.asCfgNode() = subscript.asCfgNode().(PY::DefinitionNode).getValue() and
rhs.asCfgNode() = subscript.asCfgNode().(Cfg::DefinitionNode).getValue() and
lbl = Label::subscript()
or
// from `x` to `"key"` in `x["key"]`

View File

@@ -16,17 +16,6 @@ abstract class AstNode extends AstNode_ {
/** Gets the scope that this node occurs in */
abstract Scope getScope();
/**
* Gets a flow node corresponding directly to this node.
* NOTE: For some statements and other purely syntactic elements,
* there may not be a `ControlFlowNode`
*/
cached
ControlFlowNode getAFlowNode() {
Stages::AST::ref() and
py_flow_bb_node(result, this, _, _)
}
/** Gets the location for this AST node */
cached
Location getLocation() { none() }

View File

@@ -5,6 +5,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.internal.DataFlowImplSpecific
private import semmle.python.dataflow.new.RemoteFlowSources
@@ -214,7 +215,7 @@ module Path {
SafeAccessCheck() { this = DataFlow::BarrierGuard<safeAccessCheck/3>::getABarrierNode() }
}
private predicate safeAccessCheck(DataFlow::GuardNode g, ControlFlowNode node, boolean branch) {
private predicate safeAccessCheck(DataFlow::GuardNode g, Cfg::ControlFlowNode node, boolean branch) {
g.(SafeAccessCheck::Range).checks(node, branch)
}
@@ -223,7 +224,7 @@ module Path {
/** A data-flow node that checks that a path is safe to access in some way, for example by having a controlled prefix. */
abstract class Range extends DataFlow::GuardNode {
/** Holds if this guard validates `node` upon evaluating to `branch`. */
abstract predicate checks(ControlFlowNode node, boolean branch);
abstract predicate checks(Cfg::ControlFlowNode node, boolean branch);
}
}
}

View File

@@ -28,7 +28,7 @@ class Expr extends Expr_, AstNode {
/** Whether this expression may have a side effect (as determined purely from its syntax) */
predicate hasSideEffects() {
/* If an exception raised by this expression handled, count that as a side effect */
this.getAFlowNode().getASuccessor().getNode() instanceof ExceptStmt
exists(ControlFlowNode n | n.getNode() = this | n.getASuccessor().getNode() instanceof ExceptStmt)
or
this.getASubExpression().hasSideEffects()
}
@@ -68,8 +68,6 @@ class Attribute extends Attribute_ {
/* syntax: Expr.name */
override Expr getASubExpression() { result = this.getObject() }
override AttrNode getAFlowNode() { result = super.getAFlowNode() }
/** Gets the name of this attribute. That is the `name` in `obj.name` */
string getName() { result = Attribute_.super.getAttr() }
@@ -97,7 +95,6 @@ class Subscript extends Subscript_ {
Expr getObject() { result = Subscript_.super.getValue() }
override SubscriptNode getAFlowNode() { result = super.getAFlowNode() }
}
/** A call expression, such as `func(...)` */
@@ -113,7 +110,6 @@ class Call extends Call_ {
override string toString() { result = this.getFunc().toString() + "()" }
override CallNode getAFlowNode() { result = super.getAFlowNode() }
/** Gets a tuple (*) argument of this call. */
Expr getStarargs() { result = this.getAPositionalArg().(Starred).getValue() }
@@ -201,7 +197,6 @@ class IfExp extends IfExp_ {
result = this.getTest() or result = this.getBody() or result = this.getOrelse()
}
override IfExprNode getAFlowNode() { result = super.getAFlowNode() }
}
/** A starred expression, such as the `*rest` in the assignment `first, *rest = seq` */
@@ -411,7 +406,6 @@ class PlaceHolder extends PlaceHolder_ {
override string toString() { result = "$" + this.getId() }
override NameNode getAFlowNode() { result = super.getAFlowNode() }
}
/** A tuple expression such as `( 1, 3, 5, 7, 9 )` */
@@ -478,7 +472,6 @@ class Name extends Name_ {
override string toString() { result = this.getId() }
override NameNode getAFlowNode() { result = super.getAFlowNode() }
override predicate isArtificial() {
/* Artificial variable names in comprehensions all start with "." */
@@ -585,7 +578,6 @@ abstract class NameConstant extends Name, ImmutableLiteral {
override predicate isConstant() { any() }
override NameConstantNode getAFlowNode() { result = Name.super.getAFlowNode() }
override predicate isArtificial() { none() }
}

File diff suppressed because it is too large Load Diff

View File

@@ -163,7 +163,6 @@ class ImportMember extends ImportMember_ {
result = this.getModule().(ImportExpr).getImportedModuleName() + "." + this.getName()
}
override ImportMemberNode getAFlowNode() { result = super.getAFlowNode() }
}
/** An import statement */

View File

@@ -46,20 +46,23 @@ class SelfAttributeRead extends SelfAttribute {
}
predicate guardedByHasattr() {
exists(Variable var, ControlFlowNode n |
var.getAUse() = this.getObject().getAFlowNode() and
exists(Variable var, ControlFlowNode n, ControlFlowNode this_, ControlFlowNode obj_ |
this_.getNode() = this and obj_.getNode() = this.getObject()
|
var.getAUse() = obj_ and
hasattr(n, var.getAUse(), this.getName()) and
n.strictlyDominates(this.getAFlowNode())
n.strictlyDominates(this_)
)
}
pragma[noinline]
predicate locallyDefined() {
exists(SelfAttributeStore store |
this.getName() = store.getName() and
this.getScope() = store.getScope()
exists(SelfAttributeStore store, ControlFlowNode store_, ControlFlowNode this_ |
store_.getNode() = store and this_.getNode() = this
|
store.getAFlowNode().strictlyDominates(this.getAFlowNode())
this.getName() = store.getName() and
this.getScope() = store.getScope() and
store_.strictlyDominates(this_)
)
}
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,36 +1,43 @@
/** Provides commonly used BarrierGuards. */
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private predicate constCompare(DataFlow::GuardNode g, ControlFlowNode node, boolean branch) {
exists(CompareNode cn | cn = g |
exists(ImmutableLiteral const, Cmpop op |
op = any(Eq eq) and branch = true
or
op = any(NotEq ne) and branch = false
private predicate constCompare(DataFlow::GuardNode g, Cfg::ControlFlowNode node, boolean branch) {
exists(Cfg::CompareNode cn | cn = g |
exists(ImmutableLiteral const, Cmpop op, Cfg::ControlFlowNode c |
c.getNode() = const and
(
op = any(Eq eq) and branch = true
or
op = any(NotEq ne) and branch = false
)
|
cn.operands(const.getAFlowNode(), op, node)
cn.operands(c, op, node)
or
cn.operands(node, op, const.getAFlowNode())
cn.operands(node, op, c)
)
or
exists(NameConstant const, Cmpop op |
op = any(Is is_) and branch = true
or
op = any(IsNot isn) and branch = false
exists(NameConstant const, Cmpop op, Cfg::ControlFlowNode c |
c.getNode() = const and
(
op = any(Is is_) and branch = true
or
op = any(IsNot isn) and branch = false
)
|
cn.operands(const.getAFlowNode(), op, node)
cn.operands(c, op, node)
or
cn.operands(node, op, const.getAFlowNode())
cn.operands(node, op, c)
)
or
exists(IterableNode const_iterable, Cmpop op |
exists(Cfg::IterableNode const_iterable, Cmpop op |
op = any(In in_) and branch = true
or
op = any(NotIn ni) and branch = false
|
forall(ControlFlowNode elem | elem = const_iterable.getAnElement() |
forall(Cfg::ControlFlowNode elem | elem = const_iterable.getAnElement() |
elem.getNode() instanceof ImmutableLiteral
) and
cn.operands(node, op, const_iterable)

View File

@@ -4,6 +4,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
// Need to import `semmle.python.Frameworks` since frameworks can extend `SensitiveDataSource::Range`
private import semmle.python.Frameworks
@@ -105,7 +106,7 @@ private module SensitiveDataModeling {
or
// to cover functions that we don't have the definition for, and where the
// reference to the function has not already been marked as being sensitive
this.getFunction().asCfgNode().(NameNode).getId() = sensitiveString(classification)
this.getFunction().asCfgNode().(Cfg::NameNode).getId() = sensitiveString(classification)
}
override SensitiveDataClassification getClassification() { result = classification }
@@ -251,12 +252,12 @@ private module SensitiveDataModeling {
SensitiveDataClassification classification;
SensitiveVariableAssignment() {
exists(DefinitionNode def |
def.(NameNode).getId() = sensitiveString(classification) and
exists(Cfg::DefinitionNode def |
def.(Cfg::NameNode).getId() = sensitiveString(classification) and
(
this.asCfgNode() = def.getValue()
or
this.asCfgNode() = def.getValue().(ForNode).getSequence()
this.asCfgNode() = def.getValue().(Cfg::ForNode).getSequence()
) and
not this.asExpr() instanceof FunctionExpr and
not this.asExpr() instanceof ClassExpr
@@ -293,7 +294,7 @@ private module SensitiveDataModeling {
SensitiveDataClassification classification;
SensitiveSubscript() {
this.asCfgNode().(SubscriptNode).getIndex() =
this.asCfgNode().(Cfg::SubscriptNode).getIndex() =
sensitiveLookupStringConst(classification).asCfgNode()
}

View File

@@ -3,6 +3,7 @@ overlay[local]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import DataFlowUtil
import DataFlowPublic
private import DataFlowPrivate
@@ -83,9 +84,9 @@ abstract class AttrWrite extends AttrRef {
* ```python
* object.attr = value
* ```
* Also gives access to the `value` being written, by extending `DefinitionNode`.
* Also gives access to the `value` being written, by extending `Cfg::DefinitionNode`.
*/
private class AttributeAssignmentNode extends DefinitionNode, AttrNode { }
private class AttributeAssignmentNode extends Cfg::DefinitionNode, Cfg::AttrNode { }
/** A simple attribute assignment: `object.attr = value`. */
private class AttributeAssignmentAsAttrWrite extends AttrWrite, CfgNode {
@@ -131,13 +132,13 @@ private class GlobalAttributeAssignmentAsAttrWrite extends AttrWrite, CfgNode {
override string getAttributeName() { result = node.getName() }
}
/** Represents `CallNode`s that may refer to calls to built-in functions or classes. */
private class BuiltInCallNode extends CallNode {
/** Represents `Cfg::CallNode`s that may refer to calls to built-in functions or classes. */
private class BuiltInCallNode extends Cfg::CallNode {
string name;
BuiltInCallNode() {
// TODO disallow instances where the name of the built-in may refer to an in-scope variable of that name.
exists(NameNode id |
exists(Cfg::NameNode id |
name = Builtins::getBuiltinName() and
this.getFunction() = id and
id.getId() = name and
@@ -145,7 +146,7 @@ private class BuiltInCallNode extends CallNode {
)
}
/** Gets the name of the built-in function that is called at this `CallNode` */
/** Gets the name of the built-in function that is called at this `Cfg::CallNode` */
string getBuiltinName() { result = name }
}
@@ -157,20 +158,20 @@ private class BuiltinAttrCallNode extends BuiltInCallNode {
BuiltinAttrCallNode() { name in ["setattr", "getattr", "hasattr", "delattr"] }
/** Gets the control flow node for object on which the attribute is accessed. */
ControlFlowNode getObject() { result in [this.getArg(0), this.getArgByName("object")] }
Cfg::ControlFlowNode getObject() { result in [this.getArg(0), this.getArgByName("object")] }
/**
* Gets the control flow node for the value that is being written to the attribute.
* Only relevant for `setattr` calls.
*/
ControlFlowNode getValue() {
Cfg::ControlFlowNode getValue() {
// only valid for `setattr`
name = "setattr" and
result in [this.getArg(2), this.getArgByName("value")]
}
/** Gets the control flow node that defines the name of the attribute being accessed. */
ControlFlowNode getName() { result in [this.getArg(1), this.getArgByName("name")] }
Cfg::ControlFlowNode getName() { result in [this.getArg(1), this.getArgByName("name")] }
}
/** Represents calls to the built-in `setattr`. */
@@ -205,10 +206,10 @@ private class SetAttrCallAsAttrWrite extends AttrWrite, CfgNode {
* attr = value
* ...
* ```
* Instances of this class correspond to the `NameNode` for `attr`, and also gives access to `value` by
* virtue of being a `DefinitionNode`.
* Instances of this class correspond to the `Cfg::NameNode` for `attr`, and also gives access to `value` by
* virtue of being a `Cfg::DefinitionNode`.
*/
private class ClassAttributeAssignmentNode extends DefinitionNode, NameNode {
private class ClassAttributeAssignmentNode extends Cfg::DefinitionNode, Cfg::NameNode {
ClassAttributeAssignmentNode() { this.getScope() = any(ClassExpr c).getInnerScope() }
}
@@ -228,7 +229,7 @@ private class ClassDefinitionAsAttrWrite extends AttrWrite, CfgNode {
override Node getValue() { result.asCfgNode() = node.getValue() }
override Node getObject() { result.asCfgNode() = cls.getAFlowNode() }
override Node getObject() { result.asCfgNode().getNode() = cls }
override ExprNode getAttributeNameExpr() { none() }
@@ -248,7 +249,7 @@ abstract class AttrRead extends AttrRef, Node, LocalSourceNode {
/** A simple attribute read, e.g. `object.attr` */
private class AttributeReadAsAttrRead extends AttrRead, CfgNode {
override AttrNode node;
override Cfg::AttrNode node;
AttributeReadAsAttrRead() { node.isLoad() }
@@ -285,7 +286,7 @@ private class GetAttrCallAsAttrRead extends AttrRead, CfgNode {
* is treated as if it is a read of the attribute `module.attr`, even if `module` is not imported directly.
*/
private class ModuleAttributeImportAsAttrRead extends AttrRead, CfgNode {
override ImportMemberNode node;
override Cfg::ImportMemberNode node;
override Node getObject() { result.asCfgNode() = node.getModule(_) }

View File

@@ -3,6 +3,7 @@ overlay[local]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.internal.ImportStar
@@ -67,7 +68,7 @@ module Builtins {
DataFlow::CfgNode likelyBuiltin(string name) {
exists(Module m |
result.getNode() =
any(NameNode n |
any(Cfg::NameNode n |
possible_builtin_accessed_in_module(n, name, m) and
not possible_builtin_defined_in_module(name, m)
)
@@ -87,7 +88,7 @@ module Builtins {
* Holds if `n` is an access of a global variable called `name` (which is also the name of a
* built-in) inside the module `m`.
*/
private predicate possible_builtin_accessed_in_module(NameNode n, string name, Module m) {
private predicate possible_builtin_accessed_in_module(Cfg::NameNode n, string name, Module m) {
n.isGlobal() and
n.isLoad() and
name = n.getId() and

View File

@@ -25,7 +25,7 @@
* what callable this call might end up targeting.
*
* Specifically this means that we cannot use type-backtrackers from the function of a
* `CallNode`, since there is no `CallNode` to backtrack from for `func` in the example
* `Cfg::CallNode`, since there is no `Cfg::CallNode` to backtrack from for `func` in the example
* above.
*
* Note: This hasn't been 100% realized yet, so we don't currently expose a predicate to
@@ -35,6 +35,7 @@ overlay[local?]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import DataFlowPublic
private import DataFlowPrivate
private import FlowSummaryImpl as FlowSummaryImpl
@@ -162,7 +163,7 @@ newtype TArgumentPosition =
*/
TLambdaSelfArgumentPosition() or
TPositionalArgumentPosition(int index) {
exists(any(CallNode c).getArg(index))
exists(any(Cfg::CallNode c).getArg(index))
or
// since synthetic calls within a summarized callable could use a unique argument
// position, we need to ensure we make these available (these are specified as
@@ -174,7 +175,7 @@ newtype TArgumentPosition =
index = 0
} or
TKeywordArgumentPosition(string name) {
exists(any(CallNode c).getArgByName(name))
exists(any(Cfg::CallNode c).getArgByName(name))
or
// see comment for TPositionalArgumentPosition
FlowSummaryImpl::ParsePositions::isParsedKeywordParameterPosition(_, name)
@@ -256,9 +257,13 @@ predicate parameterMatch(ParameterPosition ppos, ArgumentPosition apos) {
*/
overlay[local]
predicate isStaticmethod(Function func) {
exists(NameNode id | id.getId() = "staticmethod" and id.isGlobal() |
func.getADecorator() = id.getNode()
)
// The decorator is *syntactically* a Name "staticmethod" — we don't
// care which variable it resolves to. Even if a class redefines
// `staticmethod`, the binding hasn't happened yet at the decorator
// position, so Python's runtime semantics is "use the builtin".
// Matches legacy ESSA semantics which used `isGlobal()` (i.e. "no
// SSA def reaches this load") at the decorator's `NameNode`.
func.getADecorator().(Name).getId() = "staticmethod"
}
/**
@@ -268,9 +273,7 @@ predicate isStaticmethod(Function func) {
*/
overlay[local]
predicate isClassmethod(Function func) {
exists(NameNode id | id.getId() = "classmethod" and id.isGlobal() |
func.getADecorator() = id.getNode()
)
func.getADecorator().(Name).getId() = "classmethod"
or
exists(Class cls |
cls.getAMethod() = func and
@@ -285,9 +288,7 @@ predicate isClassmethod(Function func) {
/** Holds if the function `func` has a `property` decorator. */
overlay[local]
predicate hasPropertyDecorator(Function func) {
exists(NameNode id | id.getId() = "property" and id.isGlobal() |
func.getADecorator() = id.getNode()
)
func.getADecorator().(Name).getId() = "property"
}
/**
@@ -295,10 +296,10 @@ predicate hasPropertyDecorator(Function func) {
*/
overlay[local]
predicate hasContextmanagerDecorator(Function func) {
exists(ControlFlowNode contextmanager |
contextmanager.(NameNode).getId() = "contextmanager" and contextmanager.(NameNode).isGlobal()
exists(Cfg::ControlFlowNode contextmanager |
contextmanager.(Cfg::NameNode).getId() = "contextmanager" and contextmanager.(Cfg::NameNode).isGlobal()
or
contextmanager.(AttrNode).getObject("contextmanager").(NameNode).getId() = "contextlib"
contextmanager.(Cfg::AttrNode).getObject("contextmanager").(Cfg::NameNode).getId() = "contextlib"
|
func.getADecorator() = contextmanager.getNode()
)
@@ -314,10 +315,10 @@ predicate hasContextmanagerDecorator(Function func) {
*/
overlay[local]
private predicate hasOverloadDecorator(Function func) {
exists(ControlFlowNode overload |
overload.(NameNode).getId() = "overload" and overload.(NameNode).isGlobal()
exists(Cfg::ControlFlowNode overload |
overload.(Cfg::NameNode).getId() = "overload" and overload.(Cfg::NameNode).isGlobal()
or
overload.(AttrNode).getObject("overload").(NameNode).isGlobal()
overload.(Cfg::AttrNode).getObject("overload").(Cfg::NameNode).isGlobal()
|
func.getADecorator() = overload.getNode()
)
@@ -536,7 +537,7 @@ class LibraryCallableValue extends DataFlowCallable, TLibraryCallable {
// =============================================================================
/** Gets a call to `type`. */
private CallCfgNode getTypeCall() {
exists(NameNode id | id.getId() = "type" and id.isGlobal() |
exists(Cfg::NameNode id | id.getId() = "type" and id.isGlobal() |
result.getFunction().asCfgNode() = id
)
}
@@ -548,7 +549,7 @@ private CallCfgNode getSuperCall() {
// link below), but otherwise only 2 edgecases. Overall it seems ok to ignore this complexity.
//
// https://github.com/python/cpython/blob/18b1782192f85bd26db89f5bc850f8bee4247c1a/Lib/unittest/mock.py#L48-L50
exists(NameNode id | id.getId() = "super" and id.isGlobal() |
exists(Cfg::NameNode id | id.getId() = "super" and id.isGlobal() |
result.getFunction().asCfgNode() = id
)
}
@@ -1034,7 +1035,7 @@ private module MethodCalls {
*/
pragma[nomagic]
private predicate directCall(
CallNode call, Function target, string functionName, Class cls, AttrRead attr, Node self
Cfg::CallNode call, Function target, string functionName, Class cls, AttrRead attr, Node self
) {
target = findFunctionAccordingToMroKnownStartingClass(cls, functionName) and
directCall_join(call, functionName, cls, attr, self)
@@ -1043,7 +1044,7 @@ private module MethodCalls {
/** Extracted to give good join order */
pragma[nomagic]
private predicate directCall_join(
CallNode call, string functionName, Class cls, AttrRead attr, Node self
Cfg::CallNode call, string functionName, Class cls, AttrRead attr, Node self
) {
call.getFunction() = attrReadTracker(attr).asCfgNode() and
attr.accesses(self, functionName) and
@@ -1060,7 +1061,7 @@ private module MethodCalls {
*/
pragma[nomagic]
private predicate callWithinMethodImplicitSelfOrCls(
CallNode call, Function target, string functionName, Class classWithMethod, AttrRead attr,
Cfg::CallNode call, Function target, string functionName, Class classWithMethod, AttrRead attr,
Node self
) {
target = findFunctionAccordingToMro(getADirectSubclass*(classWithMethod), functionName) and
@@ -1070,7 +1071,7 @@ private module MethodCalls {
/** Extracted to give good join order */
pragma[nomagic]
private predicate callWithinMethodImplicitSelfOrCls_join(
CallNode call, string functionName, Class classWithMethod, AttrRead attr, Node self
Cfg::CallNode call, string functionName, Class classWithMethod, AttrRead attr, Node self
) {
call.getFunction() = attrReadTracker(attr).asCfgNode() and
attr.accesses(self, functionName) and
@@ -1082,7 +1083,7 @@ private module MethodCalls {
* resolve the call to a known target (since the only super class might be the
* builtin `object`, so we never have the implementation of `__new__` in the DB).
*/
predicate fromSuperNewCall(CallNode call, Class classUsedInSuper, AttrRead attr, Node self) {
predicate fromSuperNewCall(Cfg::CallNode call, Class classUsedInSuper, AttrRead attr, Node self) {
fromSuper_join(call, "__new__", classUsedInSuper, attr, self) and
self in [classTracker(_), clsArgumentTracker(_)]
}
@@ -1104,7 +1105,7 @@ private module MethodCalls {
*/
pragma[nomagic]
predicate fromSuper(
CallNode call, Function target, string functionName, Class classUsedInSuper, AttrRead attr,
Cfg::CallNode call, Function target, string functionName, Class classUsedInSuper, AttrRead attr,
Node self
) {
target = findFunctionAccordingToMro(getNextClassInMro(classUsedInSuper), functionName) and
@@ -1114,7 +1115,7 @@ private module MethodCalls {
/** Extracted to give good join order */
pragma[nomagic]
private predicate fromSuper_join(
CallNode call, string functionName, Class classUsedInSuper, AttrRead attr, Node self
Cfg::CallNode call, string functionName, Class classUsedInSuper, AttrRead attr, Node self
) {
call.getFunction() = attrReadTracker(attr).asCfgNode() and
(
@@ -1133,7 +1134,7 @@ private module MethodCalls {
)
}
predicate resolveMethodCall(CallNode call, Function target, CallType type, Node self) {
predicate resolveMethodCall(Cfg::CallNode call, Function target, CallType type, Node self) {
(
directCall(call, target, _, _, _, self)
or
@@ -1180,7 +1181,7 @@ import MethodCalls
* NOTE: We have this predicate mostly to be able to compare with old point-to
* call-graph resolution. So it could be removed in the future.
*/
predicate resolveClassCall(CallNode call, Class cls) {
predicate resolveClassCall(Cfg::CallNode call, Class cls) {
call.getFunction() = classTracker(cls).asCfgNode()
or
// `cls()` inside a classmethod (which also contains `type(self)()` inside a method)
@@ -1210,7 +1211,7 @@ Function invokedFunctionFromClassConstruction(Class cls, string funcName) {
*
* See https://docs.python.org/3/reference/datamodel.html#object.__call__
*/
predicate resolveClassInstanceCall(CallNode call, Function target, Node self) {
predicate resolveClassInstanceCall(Cfg::CallNode call, Function target, Node self) {
exists(Class cls |
call.getFunction() = classInstanceTracker(cls).asCfgNode() and
target = findFunctionAccordingToMroKnownStartingClass(cls, "__call__")
@@ -1229,7 +1230,7 @@ predicate resolveClassInstanceCall(CallNode call, Function target, Node self) {
* Holds if `call` is a call to the `target`, with call-type `type`.
*/
cached
predicate resolveCall(CallNode call, Function target, CallType type) {
predicate resolveCall(Cfg::CallNode call, Function target, CallType type) {
Stages::DataFlow::ref() and
(
type instanceof CallTypePlainFunction and
@@ -1254,11 +1255,11 @@ predicate resolveCall(CallNode call, Function target, CallType type) {
// =============================================================================
/**
* Holds if the argument of `call` at position `apos` is `arg`. This is just a helper
* predicate that maps ArgumentPositions to the arguments of the underlying `CallNode`.
* predicate that maps ArgumentPositions to the arguments of the underlying `Cfg::CallNode`.
*/
overlay[local]
cached
predicate normalCallArg(CallNode call, Node arg, ArgumentPosition apos) {
predicate normalCallArg(Cfg::CallNode call, Node arg, ArgumentPosition apos) {
exists(int index |
apos.isPositional(index) and
arg.asCfgNode() = call.getArg(index)
@@ -1273,7 +1274,7 @@ predicate normalCallArg(CallNode call, Node arg, ArgumentPosition apos) {
exists(int index |
apos.isStarArgs(index) and
arg.asCfgNode() = call.getStarArg() and
// since `CallNode.getArg` doesn't include `*args`, we need to drop to the AST level
// since `Cfg::CallNode.getArg` doesn't include `*args`, we need to drop to the AST level
// to get the index. Notice that we only use the AST for getting the index, so we
// don't need to check for dominance in regards to splitting.
call.getStarArg().getNode() = call.getNode().getPositionalArg(index).(Starred).getValue()
@@ -1347,7 +1348,7 @@ predicate normalCallArg(CallNode call, Node arg, ArgumentPosition apos) {
* translated into `l.clear()`, and we can still have use-use flow.
*/
cached
predicate getCallArg(CallNode call, Function target, CallType type, Node arg, ArgumentPosition apos) {
predicate getCallArg(Cfg::CallNode call, Function target, CallType type, Node arg, ArgumentPosition apos) {
Stages::DataFlow::ref() and
resolveCall(call, target, type) and
(
@@ -1440,10 +1441,10 @@ private predicate sameEnclosingCallable(Node node1, Node node2) {
// DataFlowCall
// =============================================================================
newtype TDataFlowCall =
TNormalCall(CallNode call, Function target, CallType type) { resolveCall(call, target, type) } or
TNormalCall(Cfg::CallNode call, Function target, CallType type) { resolveCall(call, target, type) } or
/** A call to the generated function inside a comprehension */
TComprehensionCall(Comp c) or
TPotentialLibraryCall(CallNode call) or
TPotentialLibraryCall(Cfg::CallNode call) or
/** A synthesized call inside a summarized callable */
TSummaryCall(
FlowSummaryImpl::Public::SummarizedCallable c, FlowSummaryImpl::Private::SummaryNode receiver
@@ -1463,7 +1464,7 @@ abstract class DataFlowCall extends TDataFlowCall {
abstract ArgumentNode getArgument(ArgumentPosition apos);
/** Get the control flow node representing this call, if any. */
abstract ControlFlowNode getNode();
abstract Cfg::ControlFlowNode getNode();
/** Gets the enclosing callable of this call. */
DataFlowCallable getEnclosingCallable() { result = getCallableScope(this.getScope()) }
@@ -1494,28 +1495,28 @@ abstract class ExtractedDataFlowCall extends DataFlowCall {
}
/**
* A resolved call in source code with an underlying `CallNode`.
* A resolved call in source code with an underlying `Cfg::CallNode`.
*
* This is considered normal, compared with special calls such as `obj[0]` calling the
* `__getitem__` method on the object. However, this also includes calls that go to the
* `__call__` special method.
*/
class NormalCall extends ExtractedDataFlowCall, TNormalCall {
CallNode call;
Cfg::CallNode call;
Function target;
CallType type;
NormalCall() { this = TNormalCall(call, target, type) }
override string toString() {
// note: if we used toString directly on the CallNode we would get
// `ControlFlowNode for func()`
// but the `ControlFlowNode` part is just clutter, so we go directly to the AST node
// note: if we used toString directly on the Cfg::CallNode we would get
// `Cfg::ControlFlowNode for func()`
// but the `Cfg::ControlFlowNode` part is just clutter, so we go directly to the AST node
// instead.
result = call.getNode().toString()
}
override ControlFlowNode getNode() { result = call }
override Cfg::ControlFlowNode getNode() { result = call }
override Scope getScope() { result = call.getScope() }
@@ -1543,7 +1544,7 @@ class ComprehensionCall extends ExtractedDataFlowCall, TComprehensionCall {
override string toString() { result = "comprehension call" }
override ControlFlowNode getNode() { result.getNode() = c }
override Cfg::ControlFlowNode getNode() { result.getNode() = c }
override Scope getScope() { result = c.getScope() }
@@ -1566,14 +1567,14 @@ class ComprehensionCall extends ExtractedDataFlowCall, TComprehensionCall {
* in this class.
*/
class PotentialLibraryCall extends ExtractedDataFlowCall, TPotentialLibraryCall {
CallNode call;
Cfg::CallNode call;
PotentialLibraryCall() { this = TPotentialLibraryCall(call) }
override string toString() {
// note: if we used toString directly on the CallNode we would get
// `ControlFlowNode for func()`
// but the `ControlFlowNode` part is just clutter, so we go directly to the AST node
// note: if we used toString directly on the Cfg::CallNode we would get
// `Cfg::ControlFlowNode for func()`
// but the `Cfg::ControlFlowNode` part is just clutter, so we go directly to the AST node
// instead.
result = call.getNode().toString()
}
@@ -1590,10 +1591,10 @@ class PotentialLibraryCall extends ExtractedDataFlowCall, TPotentialLibraryCall
// potential self argument, from `foo.bar()` -- note that this could also just be a
// module reference, but we really don't have a good way of knowing :|
apos.isSelf() and
result.asCfgNode() = call.getFunction().(AttrNode).getObject()
result.asCfgNode() = call.getFunction().(Cfg::AttrNode).getObject()
}
override ControlFlowNode getNode() { result = call }
override Cfg::ControlFlowNode getNode() { result = call }
override Scope getScope() { result = call.getScope() }
}
@@ -1625,7 +1626,7 @@ class SummaryCall extends DataFlowCall, TSummaryCall {
override ArgumentNode getArgument(ArgumentPosition apos) { none() }
override ControlFlowNode getNode() { none() }
override Cfg::ControlFlowNode getNode() { none() }
override string toString() { result = "[summary] call to " + receiver + " in " + c }
@@ -1767,12 +1768,12 @@ private class SummaryPostUpdateNode extends FlowSummaryNode, PostUpdateNodeImpl
* This is used for tracking flow through captured variables.
*/
class SynthCapturedVariablesArgumentNode extends Node, TSynthCapturedVariablesArgumentNode {
ControlFlowNode callable;
Cfg::ControlFlowNode callable;
SynthCapturedVariablesArgumentNode() { this = TSynthCapturedVariablesArgumentNode(callable) }
/** Gets the `CallNode` corresponding to this captured variables argument node. */
CallNode getCallNode() { result.getFunction() = callable }
/** Gets the `Cfg::CallNode` corresponding to this captured variables argument node. */
Cfg::CallNode getCallNode() { result.getFunction() = callable }
/** Gets the `CfgNode` that corresponds to this synthetic node. */
CfgNode getUnderlyingNode() { result.asCfgNode() = callable }
@@ -1790,7 +1791,7 @@ class CapturedVariablesArgumentNodeAsArgumentNode extends ArgumentNode,
{
overlay[global]
override predicate argumentOf(DataFlowCall call, ArgumentPosition pos) {
exists(CallNode callNode | callNode = this.getCallNode() |
exists(Cfg::CallNode callNode | callNode = this.getCallNode() |
callNode = call.getNode() and
exists(Function target | resolveCall(callNode, target, _) |
target = any(VariableCapture::CapturedVariable v).getACapturingScope()
@@ -1804,7 +1805,7 @@ class CapturedVariablesArgumentNodeAsArgumentNode extends ArgumentNode,
class SynthCapturedVariablesArgumentPostUpdateNode extends PostUpdateNodeImpl,
TSynthCapturedVariablesArgumentPostUpdateNode
{
ControlFlowNode callable;
Cfg::ControlFlowNode callable;
SynthCapturedVariablesArgumentPostUpdateNode() {
this = TSynthCapturedVariablesArgumentPostUpdateNode(callable)
@@ -1911,8 +1912,8 @@ abstract class ReturnNode extends Node {
class ExtractedReturnNode extends ReturnNode, CfgNode {
// See `TaintTrackingImplementation::returnFlowStep`
ExtractedReturnNode() {
node = any(Return ret).getValue().getAFlowNode() or
node = any(Yield yield).getAFlowNode()
node.getNode() = any(Return ret).getValue() or
node.getNode() = any(Yield yield)
}
override ReturnKind getKind() { any() }
@@ -1930,7 +1931,7 @@ class ExtractedReturnNode extends ReturnNode, CfgNode {
class YieldNodeInContextManagerFunction extends ReturnNode, CfgNode {
YieldNodeInContextManagerFunction() {
hasContextmanagerDecorator(node.getScope()) and
node = any(Yield yield).getValue().getAFlowNode()
node.getNode() = any(Yield yield).getValue()
}
override ReturnKind getKind() { any() }

View File

@@ -2,8 +2,9 @@ overlay[local?]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import DataFlowPublic
private import semmle.python.essa.SsaCompute
private import semmle.python.dataflow.new.internal.SsaImpl as SsaImpl
private import semmle.python.dataflow.new.internal.ImportResolution
private import FlowSummaryImpl as FlowSummaryImpl
private import semmle.python.frameworks.data.ModelsAsData
@@ -43,13 +44,23 @@ predicate isArgumentNode(ArgumentNode arg, DataFlowCall c, ArgumentPosition pos)
// Nodes
//--------
overlay[local]
predicate isExpressionNode(ControlFlowNode node) { node.getNode() instanceof Expr }
predicate isExpressionNode(Cfg::ControlFlowNode node) {
node.getNode() instanceof Expr
or
// `Cfg::ForNode` wraps a `For` statement's iter position, but
// overrides `.getNode()` to return the `Py::For` statement (for
// legacy parity). The underlying AST is still an `Expr` (the iter
// expression); we want a dataflow node here so that for-loop
// content reads (`for y in l`) have a source expression node to
// read content from.
node instanceof Cfg::ForNode
}
// =============================================================================
// SyntheticPreUpdateNode
// =============================================================================
class SyntheticPreUpdateNode extends Node, TSyntheticPreUpdateNode {
CallNode node;
Cfg::CallNode node;
SyntheticPreUpdateNode() { this = TSyntheticPreUpdateNode(node) }
@@ -151,7 +162,7 @@ predicate synthStarArgsElementParameterNodeStoreStep(
* been passed in a `**kwargs` argument.
*/
class SynthDictSplatArgumentNode extends Node, TSynthDictSplatArgumentNode {
CallNode node;
Cfg::CallNode node;
SynthDictSplatArgumentNode() { this = TSynthDictSplatArgumentNode(node) }
@@ -165,7 +176,7 @@ class SynthDictSplatArgumentNode extends Node, TSynthDictSplatArgumentNode {
private predicate synthDictSplatArgumentNodeStoreStep(
ArgumentNode nodeFrom, DictionaryElementContent c, SynthDictSplatArgumentNode nodeTo
) {
exists(string name, CallNode call, ArgumentPosition keywordPos |
exists(string name, Cfg::CallNode call, ArgumentPosition keywordPos |
nodeTo = TSynthDictSplatArgumentNode(call) and
getCallArg(call, _, _, nodeFrom, keywordPos) and
keywordPos.isKeyword(name) and
@@ -185,8 +196,8 @@ private predicate synthDictSplatArgumentNodeStoreStep(
*/
predicate yieldStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(Yield yield |
nodeTo.asCfgNode() = yield.getAFlowNode() and
nodeFrom.asCfgNode() = yield.getValue().getAFlowNode() and
nodeTo.asCfgNode().getNode() = yield and
nodeFrom.asCfgNode().getNode() = yield.getValue() and
// TODO: Consider if this will also need to transfer dictionary content
// once dictionary comprehensions are supported.
c instanceof ListElementContent
@@ -289,7 +300,7 @@ abstract class PostUpdateNodeImpl extends Node {
* Synthetic post-update nodes for synthetic nodes need to be listed one by one.
*/
class SyntheticPostUpdateNode extends PostUpdateNodeImpl, TSyntheticPostUpdateNode {
ControlFlowNode node;
Cfg::ControlFlowNode node;
SyntheticPostUpdateNode() { this = TSyntheticPostUpdateNode(node) }
@@ -333,16 +344,38 @@ module LocalFlow {
// `x = f(42)`
// nodeFrom is `f(42)`
// nodeTo is `x`
exists(AssignmentDefinition def |
//
// We use the CFG-level `DefinitionNode.getValue()` directly rather
// than going through SSA, because the new SSA library prunes write
// definitions that have no subsequent read in the same scope (e.g.
// a module-level `def f():` whose `f` is only read inside other
// functions). The CFG-level link is unconditional.
//
// The Name-target restriction mirrors legacy ESSA's
// `SsaDefinitions::assignment_definition`, which required
// `defn.(NameNode).defines(v)`. Subscript and attribute writes
// (`x[i] = 42`, `obj.attr = 42`) are intentionally excluded — their
// value flow is handled by the content-flow / `AttrWrite` machinery,
// not by a local-flow step *into* the Subscript/Attribute expression.
// Excluding them is essential for keeping augmented-assignment
// targets (`x[i] += 42`) classifiable as `LocalSourceNode` on the
// read side: the single canonical CFG node is both a load and a
// store, and any incoming local-flow step would disqualify it from
// being a local source.
exists(Cfg::DefinitionNode def |
nodeFrom.(CfgNode).getNode() = def.getValue() and
nodeTo.(CfgNode).getNode() = def.getDefiningNode()
nodeTo.(CfgNode).getNode() = def and
def instanceof Cfg::NameNode
)
or
// With definition
// `with f(42) as x:`
// nodeFrom is `f(42)`
// nodeTo is `x`
exists(With with, ControlFlowNode contextManager, WithDefinition withDef, ControlFlowNode var |
exists(
With with, Cfg::ControlFlowNode contextManager, SsaImpl::WithDefinition withDef,
Cfg::ControlFlowNode var
|
var = withDef.getDefiningNode()
|
nodeFrom.(CfgNode).getNode() = contextManager and
@@ -361,13 +394,13 @@ module LocalFlow {
predicate expressionFlowStep(Node nodeFrom, Node nodeTo) {
// If expressions
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(IfExprNode).getAnOperand()
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(Cfg::IfExprNode).getAnOperand()
or
// Assignment expressions
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(AssignmentExprNode).getValue()
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(Cfg::AssignmentExprNode).getValue()
or
// boolean inline expressions such as `x or y` or `x and y`
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(BoolExprNode).getAnOperand()
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(Cfg::BoolExprNode).getAnOperand()
or
// Flow inside an unpacking assignment
iterableUnpackingFlowStep(nodeFrom, nodeTo)
@@ -376,12 +409,12 @@ module LocalFlow {
matchFlowStep(nodeFrom, nodeTo)
}
predicate useToNextUse(NameNode nodeFrom, NameNode nodeTo) {
AdjacentUses::adjacentUseUse(nodeFrom, nodeTo)
predicate useToNextUse(Cfg::NameNode nodeFrom, Cfg::NameNode nodeTo) {
SsaImpl::AdjacentUses::adjacentUseUse(nodeFrom, nodeTo)
}
predicate defToFirstUse(EssaVariable var, NameNode nodeTo) {
AdjacentUses::firstUse(var.getDefinition(), nodeTo)
predicate defToFirstUse(SsaImpl::EssaVariable var, Cfg::NameNode nodeTo) {
SsaImpl::AdjacentUses::firstUse(var.getDefinition(), nodeTo)
}
predicate useUseFlowStep(Node nodeFrom, Node nodeTo) {
@@ -390,12 +423,12 @@ module LocalFlow {
// `x = f(y)`
// nodeFrom is `y` on first line
// nodeTo is `y` on second line
exists(EssaDefinition def |
nodeFrom.(CfgNode).getNode() = def.(EssaNodeDefinition).getDefiningNode()
exists(SsaImpl::EssaDefinition def |
nodeFrom.(CfgNode).getNode() = def.(SsaImpl::EssaNodeDefinition).getDefiningNode()
or
nodeFrom.(ScopeEntryDefinitionNode).getDefinition() = def
|
AdjacentUses::firstUse(def, nodeTo.(CfgNode).getNode())
SsaImpl::AdjacentUses::firstUse(def, nodeTo.(CfgNode).getNode())
)
or
// Next use after use
@@ -557,9 +590,9 @@ predicate runtimeJumpStep(Node nodeFrom, Node nodeTo) {
// a parameter with a default value, since the parameter will be in the scope of the
// function, while the default value itself will be in the scope that _defines_ the
// function.
exists(ParameterDefinition param |
exists(SsaImpl::ParameterDefinition param |
// note: we go to the _control-flow node_ of the parameter, and not the ESSA node of the parameter, since for type-tracking, the ESSA node is not a LocalSourceNode, so we would get in trouble.
nodeFrom.asCfgNode() = param.getDefault() and
nodeFrom.asCfgNode().getNode() = param.getParameter().(Parameter).getDefault() and
nodeTo.asCfgNode() = param.getDefiningNode()
)
or
@@ -663,7 +696,7 @@ predicate neverSkipInPathGraph(Node n) {
// ```
// we would end up saying that the path MUST not skip the x in `y = x`, which is just
// annoying and doesn't help the path explanation become clearer.
n.asCfgNode() = any(EssaNodeDefinition def).getDefiningNode()
n.asCfgNode() = any(SsaImpl::EssaNodeDefinition def).getDefiningNode()
}
/**
@@ -872,7 +905,7 @@ predicate listStoreStep(CfgNode nodeFrom, ListElementContent c, CfgNode nodeTo)
// nodeFrom is `42`, cfg node
// nodeTo is the list, `[..., 42, ...]`, cfg node
// c denotes element of list
nodeTo.getNode().(ListNode).getAnElement() = nodeFrom.getNode() and
nodeTo.getNode().(Cfg::ListNode).getAnElement() = nodeFrom.getNode() and
not nodeTo.getNode() instanceof UnpackingAssignmentSequenceTarget and
// Suppress unused variable warning
c = c
@@ -885,7 +918,7 @@ predicate setStoreStep(CfgNode nodeFrom, SetElementContent c, CfgNode nodeTo) {
// nodeFrom is `42`, cfg node
// nodeTo is the set, `{..., 42, ...}`, cfg node
// c denotes element of list
nodeTo.getNode().(SetNode).getAnElement() = nodeFrom.getNode() and
nodeTo.getNode().(Cfg::SetNode).getAnElement() = nodeFrom.getNode() and
// Suppress unused variable warning
c = c
}
@@ -898,7 +931,7 @@ predicate tupleStoreStep(CfgNode nodeFrom, TupleElementContent c, CfgNode nodeTo
// nodeTo is the tuple, `(..., 42, ...)`, cfg node
// c denotes element of tuple and index of nodeFrom
exists(int n |
nodeTo.getNode().(TupleNode).getElement(n) = nodeFrom.getNode() and
nodeTo.getNode().(Cfg::TupleNode).getElement(n) = nodeFrom.getNode() and
not nodeTo.getNode() instanceof UnpackingAssignmentSequenceTarget and
c.getIndex() = n
)
@@ -912,7 +945,7 @@ predicate dictStoreStep(CfgNode nodeFrom, DictionaryElementContent c, Node nodeT
// nodeTo is the dict, `{..., "key" = 42, ...}`, cfg node
// c denotes element of dictionary and the key `"key"`
exists(KeyValuePair item |
item = nodeTo.asCfgNode().(DictNode).getNode().(Dict).getAnItem() and
item = nodeTo.asCfgNode().(Cfg::DictNode).getNode().(Dict).getAnItem() and
nodeFrom.getNode().getNode() = item.getValue() and
c.getKey() = item.getKey().(StringLiteral).getS()
)
@@ -927,9 +960,9 @@ predicate dictStoreStep(CfgNode nodeFrom, DictionaryElementContent c, Node nodeT
private predicate moreDictStoreSteps(CfgNode nodeFrom, DictionaryElementContent c, Node nodeTo) {
// NOTE: It's important to add logic to the newtype definition of
// DictionaryElementContent if you add new cases here.
exists(SubscriptNode subscript |
exists(Cfg::SubscriptNode subscript |
nodeTo.(PostUpdateNode).getPreUpdateNode().asCfgNode() = subscript.getObject() and
nodeFrom.asCfgNode() = subscript.(DefinitionNode).getValue() and
nodeFrom.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
c.getKey() = subscript.getIndex().getNode().(StringLiteral).getText()
)
or
@@ -942,8 +975,8 @@ private predicate moreDictStoreSteps(CfgNode nodeFrom, DictionaryElementContent
}
predicate dictClearStep(Node node, DictionaryElementContent c) {
exists(SubscriptNode subscript |
subscript instanceof DefinitionNode and
exists(Cfg::SubscriptNode subscript |
subscript instanceof Cfg::DefinitionNode and
node.asCfgNode() = subscript.getObject() and
c.getKey() = subscript.getIndex().getNode().(StringLiteral).getText()
)
@@ -1018,7 +1051,7 @@ predicate subscriptReadStep(CfgNode nodeFrom, Content c, CfgNode nodeTo) {
// nodeFrom is `l`, cfg node
// nodeTo is `l[3]`, cfg node
// c is compatible with 3
nodeFrom.getNode() = nodeTo.getNode().(SubscriptNode).getObject() and
nodeFrom.getNode() = nodeTo.getNode().(Cfg::SubscriptNode).getObject() and
(
c instanceof ListElementContent
or
@@ -1027,10 +1060,10 @@ predicate subscriptReadStep(CfgNode nodeFrom, Content c, CfgNode nodeTo) {
c instanceof DictionaryElementAnyContent
or
c.(TupleElementContent).getIndex() =
nodeTo.getNode().(SubscriptNode).getIndex().getNode().(IntegerLiteral).getValue()
nodeTo.getNode().(Cfg::SubscriptNode).getIndex().getNode().(IntegerLiteral).getValue()
or
c.(DictionaryElementContent).getKey() =
nodeTo.getNode().(SubscriptNode).getIndex().getNode().(StringLiteral).getS()
nodeTo.getNode().(Cfg::SubscriptNode).getIndex().getNode().(StringLiteral).getS()
)
}

View File

@@ -5,11 +5,12 @@ overlay[local]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import DataFlowPrivate
import semmle.python.dataflow.new.TypeTracking
import Attributes
import LocalSources
private import semmle.python.essa.SsaCompute
private import semmle.python.dataflow.new.internal.SsaImpl as SsaImpl
private import semmle.python.dataflow.new.internal.ImportStar
private import semmle.python.frameworks.data.ModelsAsData
private import FlowSummaryImpl as FlowSummaryImpl
@@ -27,7 +28,7 @@ private import semmle.python.frameworks.data.ModelsAsData
overlay[local]
newtype TNode =
/** A node corresponding to a control flow node. */
TCfgNode(ControlFlowNode node) {
TCfgNode(Cfg::ControlFlowNode node) {
isExpressionNode(node)
or
node.getNode() instanceof Pattern
@@ -36,7 +37,7 @@ newtype TNode =
* A node corresponding to a scope entry definition. That is, the value of a variable
* as it enters a scope.
*/
TScopeEntryDefinitionNode(ScopeEntryDefinition def) { not def.getScope() instanceof Module } or
TScopeEntryDefinitionNode(SsaImpl::ScopeEntryDefinition def) { not def.getScope() instanceof Module } or
/**
* A synthetic node representing the value of an object before a state change.
*
@@ -47,13 +48,13 @@ newtype TNode =
// NOTE: since we can't rely on the call graph, but we want to have synthetic
// pre-update nodes for class calls, we end up getting synthetic pre-update nodes for
// ALL calls :|
TSyntheticPreUpdateNode(CallNode call) or
TSyntheticPreUpdateNode(Cfg::CallNode call) or
/**
* A synthetic node representing the value of an object after a state change.
* See QLDoc for `PostUpdateNode`.
*/
TSyntheticPostUpdateNode(ControlFlowNode node) {
exists(CallNode call |
TSyntheticPostUpdateNode(Cfg::ControlFlowNode node) {
exists(Cfg::CallNode call |
node = call.getArg(_)
or
node = call.getArgByName(_)
@@ -62,12 +63,12 @@ newtype TNode =
node = call.getFunction()
)
or
node = any(AttrNode a).getObject()
node = any(Cfg::AttrNode a).getObject()
or
node = any(SubscriptNode s).getObject()
node = any(Cfg::SubscriptNode s).getObject()
or
// self parameter when used implicitly in `super()`
exists(Class cls, Function func, ParameterDefinition def |
exists(Class cls, Function func, SsaImpl::ParameterDefinition def |
func = cls.getAMethod() and
not isStaticmethod(func) and
// this matches what we do in ExtractedParameterNode
@@ -112,7 +113,7 @@ newtype TNode =
exists(ParameterPosition ppos | ppos.isStarArgs(_) | exists(callable.getParameter(ppos)))
} or
/** A synthetic node to capture keyword arguments that are passed to a `**kwargs` parameter. */
TSynthDictSplatArgumentNode(CallNode call) { exists(call.getArgByName(_)) } or
TSynthDictSplatArgumentNode(Cfg::CallNode call) { exists(call.getArgByName(_)) } or
/** A synthetic node to allow flow to keyword parameters from a `**kwargs` argument. */
TSynthDictSplatParameterNode(DataFlowCallable callable) {
exists(ParameterPosition ppos | ppos.isKeyword(_) | exists(callable.getParameter(ppos)))
@@ -128,15 +129,15 @@ newtype TNode =
* A synthetic node representing the values of the variables captured
* by the callable being called.
*/
TSynthCapturedVariablesArgumentNode(ControlFlowNode callable) {
callable = any(CallNode c).getFunction()
TSynthCapturedVariablesArgumentNode(Cfg::ControlFlowNode callable) {
callable = any(Cfg::CallNode c).getFunction()
} or
/**
* A synthetic node representing the values of the variables captured
* by the callable being called, after the output has been computed.
*/
TSynthCapturedVariablesArgumentPostUpdateNode(ControlFlowNode callable) {
callable = any(CallNode c).getFunction()
TSynthCapturedVariablesArgumentPostUpdateNode(Cfg::ControlFlowNode callable) {
callable = any(Cfg::CallNode c).getFunction()
} or
/** A synthetic node representing the values of variables captured by a comprehension. */
TSynthCompCapturedVariablesArgumentNode(Comp comp) {
@@ -194,7 +195,7 @@ class Node extends TNode {
}
/** Gets the control-flow node corresponding to this node, if any. */
ControlFlowNode asCfgNode() { none() }
Cfg::ControlFlowNode asCfgNode() { none() }
/** Gets the expression corresponding to this node, if any. */
Expr asExpr() { none() }
@@ -207,14 +208,14 @@ class Node extends TNode {
/** A data-flow node corresponding to a control-flow node. */
class CfgNode extends Node, TCfgNode {
ControlFlowNode node;
Cfg::ControlFlowNode node;
CfgNode() { this = TCfgNode(node) }
/** Gets the `ControlFlowNode` represented by this data-flow node. */
ControlFlowNode getNode() { result = node }
/** Gets the `Cfg::ControlFlowNode` represented by this data-flow node. */
Cfg::ControlFlowNode getNode() { result = node }
override ControlFlowNode asCfgNode() { result = node }
override Cfg::ControlFlowNode asCfgNode() { result = node }
/** Gets a textual representation of this element. */
override string toString() { result = node.toString() }
@@ -224,9 +225,9 @@ class CfgNode extends Node, TCfgNode {
override Location getLocation() { result = node.getLocation() }
}
/** A data-flow node corresponding to a `CallNode` in the control-flow graph. */
/** A data-flow node corresponding to a `Cfg::CallNode` in the control-flow graph. */
class CallCfgNode extends CfgNode, LocalSourceNode {
override CallNode node;
override Cfg::CallNode node;
/**
* Gets the data-flow node for the function component of the call corresponding to this data-flow
@@ -307,15 +308,15 @@ ExprNode exprNode(DataFlowExpr e) { result.getNode().getNode() = e }
* as it enters a scope.
*/
class ScopeEntryDefinitionNode extends Node, TScopeEntryDefinitionNode {
ScopeEntryDefinition def;
SsaImpl::ScopeEntryDefinition def;
ScopeEntryDefinitionNode() { this = TScopeEntryDefinitionNode(def) }
/** Gets the `ScopeEntryDefinition` associated with this node. */
ScopeEntryDefinition getDefinition() { result = def }
/** Gets the `SsaImpl::ScopeEntryDefinition` associated with this node. */
SsaImpl::ScopeEntryDefinition getDefinition() { result = def }
/** Gets the source variable represented by this node. */
SsaSourceVariable getVariable() { result = def.getSourceVariable() }
SsaImpl::SsaSourceVariable getVariable() { result = def.getSourceVariable() }
override Location getLocation() { result = def.getLocation() }
@@ -337,7 +338,7 @@ class ParameterNode extends Node instanceof ParameterNodeImpl {
/** A parameter node found in the source code (not in a summary). */
class ExtractedParameterNode extends ParameterNodeImpl, CfgNode {
//, LocalSourceNode {
ParameterDefinition def;
SsaImpl::ParameterDefinition def;
ExtractedParameterNode() { node = def.getDefiningNode() }
@@ -368,10 +369,10 @@ Node getCallArgApproximation() {
exists(Class c | result.asExpr() = c.getAMethod().getArg(0))
or
// the object part of an attribute expression (which might be a bound method)
result.asCfgNode() = any(AttrNode a).getObject()
result.asCfgNode() = any(Cfg::AttrNode a).getObject()
or
// the function part of any call
result.asCfgNode() = any(CallNode c).getFunction()
result.asCfgNode() = any(Cfg::CallNode c).getFunction()
}
/** Gets the extracted argument nodes that do not rely on `getCallArg`. */
@@ -380,7 +381,7 @@ private Node implicitArgumentNode() {
normalCallArg(_, result, _)
or
// and self arguments
result.asCfgNode() = any(CallNode c).getFunction().(AttrNode).getObject()
result.asCfgNode() = any(Cfg::CallNode c).getFunction().(Cfg::AttrNode).getObject()
or
// for comprehensions, we allow the synthetic `iterable` argument
result.asExpr() = any(Comp c).getIterable()
@@ -485,21 +486,24 @@ class ModuleVariableNode extends Node, TModuleVariableNode {
/** Gets a node that reads this variable, excluding reads that happen through `from ... import *`. */
Node getALocalRead() {
result.asCfgNode() = var.getALoad().getAFlowNode() and
result.asCfgNode().getNode() = var.getALoad() and
not result.getScope() = mod
}
/** Gets an `EssaNode` that corresponds to an assignment of this global variable. */
/** Gets a CFG node that corresponds to an assignment of this global variable. */
Node getAWrite() {
any(EssaNodeDefinition def).definedBy(var, result.asCfgNode().(DefinitionNode))
exists(Cfg::NameNode n |
n.defines(var) and
result.asCfgNode() = n
)
}
/** Gets the possible values of the variable at the end of import time */
CfgNode getADefiningWrite() {
exists(SsaVariable def |
def = any(SsaVariable ssa_var).getAnUltimateDefinition() and
def.getDefinition() = result.asCfgNode() and
def.getVariable() = var
exists(SsaImpl::EssaVariable def |
def = any(SsaImpl::EssaVariable ssa_var).getAnUltimateDefinition() and
def.getDefinition().(SsaImpl::EssaNodeDefinition).getDefiningNode() = result.asCfgNode() and
def.getSourceVariable().getVariable() = var
)
}
@@ -516,7 +520,7 @@ private ModuleVariableNode import_star_read(Node n) {
overlay[global]
pragma[nomagic]
private predicate resolved_import_star_module(Module m, string name, Node n) {
exists(NameNode nn | nn = n.asCfgNode() |
exists(Cfg::NameNode nn | nn = n.asCfgNode() |
ImportStar::importStarResolvesTo(pragma[only_bind_into](nn), m) and
nn.getId() = name
)
@@ -574,88 +578,106 @@ class StarPatternElementNode extends Node, TStarPatternElementNode {
}
/**
* Gets a node that controls whether other nodes are evaluated.
* A node that participates in a conditional split: a CFG node whose
* evaluation outcome (true/false) is used to choose between two
* successor basic blocks. In the new shared CFG, such nodes appear in
* pairs of `isAfterTrue`/`isAfterFalse` annotated CFG nodes.
*
* In the base case, this is the last node of `conditionBlock`, and `flipped` is `false`.
* This definition accounts for (short circuting) `and`- and `or`-expressions, as the structure
* of basic blocks will reflect their semantics.
* Users typically obtain a `GuardNode` by casting from a more specific
* Cfg type: `g.(Cfg::CallNode)` for a call-based check, etc.
*
* However, in the program
* ```python
* if not is_safe(path):
* return
* ```
* the last node in the `ConditionBlock` is `not is_safe(path)`.
*
* We would like to consider also `is_safe(path)` a guard node, albeit with `flipped` being `true`.
* Thus we recurse through `not`-expressions.
* This replaces the legacy (pre-shared-CFG) `GuardNode`/`flipped`
* machinery: the shared CFG carries outcome information structurally
* (via `isAfterTrue`/`isAfterFalse`), so no separate polarity field
* is required.
*/
ControlFlowNode guardNode(ConditionBlock conditionBlock, boolean flipped) {
// Base case: the last node truly does determine which successor is chosen
result = conditionBlock.getLastNode() and
flipped = false
or
// Recursive cases:
// if a guard node is a `not`-expression,
// the operand is also a guard node, but with inverted polarity.
exists(UnaryExprNode notNode |
result = notNode.getOperand() and
notNode.getNode().getOp() instanceof Not
|
notNode = guardNode(conditionBlock, flipped.booleanNot())
)
or
// if a guard node is compared to a boolean literal,
// the other operand is also a guard node,
// but with polarity depending on the literal (and on the comparison).
exists(CompareNode cmpNode, Cmpop op, ControlFlowNode b, boolean should_flip |
(
cmpNode.operands(result, op, b) or
cmpNode.operands(b, op, result)
) and
not result.getNode() instanceof BooleanLiteral and
(
// comparing to the boolean
(op instanceof Eq or op instanceof Is) and
// we should flip if the value compared against, here the value of `b`, is false
should_flip = b.getNode().(BooleanLiteral).booleanValue().booleanNot()
or
// comparing to the negation of the boolean
(op instanceof NotEq or op instanceof IsNot) and
// again, we should flip if the value compared against, here the value of `not b`, is false.
// That is, if the value of `b` is true.
should_flip = b.getNode().(BooleanLiteral).booleanValue()
class GuardNode extends Cfg::ControlFlowNode {
GuardNode() {
// This is the canonical (post-order) version of an AST node, and
// some `[true]`/`[false]` variant of the same AST exists. We
// include the canonical node because users identify guards by
// their AST (`g.(Cfg::CallNode)` etc.), and the outcome-tagged
// variants are accessed by `outcomeOfGuard` below.
exists(Cfg::ControlFlowNode outcome |
outcome.getNode() = this.getNode() and
(outcome.isAfterTrue(_) or outcome.isAfterFalse(_))
)
|
// we flip `flipped` according to `should_flip` via the formula `flipped xor should_flip`.
flipped in [true, false] and
cmpNode = guardNode(conditionBlock, flipped.booleanXor(should_flip))
)
or
// Or: this IS one of the outcome-tagged variants, supporting
// users who want to query the split point directly.
this.isAfterTrue(_)
or
this.isAfterFalse(_)
}
/** Holds if this guard controls block `b` upon evaluating to `branch`. */
predicate controlsBlock(Cfg::BasicBlock b, boolean branch) {
branch in [true, false] and
exists(Cfg::ControlFlowNode outcomeNode |
outcomeOfGuard(this, outcomeNode, branch) and
outcomeNode.getBasicBlock().dominates(b)
)
}
}
/**
* A node that controls whether other nodes are evaluated.
* Holds if some execution that arrives at `outcomeNode` corresponds
* to `guard` having evaluated to `branch`.
*
* The field `flipped` allows us to match `GuardNode`s underneath
* `not`-expressions and still choose the appropriate branch.
* For a direct guard `if g:`, the outcome node is `g` itself with
* `isAfterTrue`/`isAfterFalse`. For wrapped guards like `not g` or
* `g == True`, the outcome is on the wrapping expression with an
* appropriate polarity transform — we follow those wrappers up the
* AST to find the outermost expression that carries an actual
* `isAfterTrue`/`isAfterFalse` outcome.
*/
class GuardNode extends ControlFlowNode {
ConditionBlock conditionBlock;
boolean flipped;
GuardNode() { this = guardNode(conditionBlock, flipped) }
/** Holds if this guard controls block `b` upon evaluating to `branch`. */
predicate controlsBlock(BasicBlock b, boolean branch) {
branch in [true, false] and
conditionBlock.controls(b, branch.booleanXor(flipped))
}
private predicate outcomeOfGuard(
Cfg::ControlFlowNode guard, Cfg::ControlFlowNode outcomeNode, boolean branch
) {
// Base case: the guard itself splits — the outcome node is the
// first node of an outcome BB, with matching outcome label.
// (The shared CFG also marks inner expressions with outcome flags
// for analysis purposes, but only "splitting" nodes — those that
// actually start an outcome BB — are valid guards on their own.)
outcomeNode.getNode() = guard.getNode() and
outcomeNode = outcomeNode.getBasicBlock().firstNode() and
(
outcomeNode.isAfterTrue(_) and branch = true
or
outcomeNode.isAfterFalse(_) and branch = false
)
or
// Recursive: `not guard` — same outcome split as `guard`, flipped.
exists(Cfg::UnaryExprNode notNode, boolean notBranch |
notNode.getOperand().getNode() = guard.getNode() and
notNode.getNode().getOp() instanceof Not and
outcomeOfGuard(notNode, outcomeNode, notBranch) and
branch = notBranch.booleanNot()
)
or
// Recursive: comparisons against a boolean literal.
exists(Cfg::CompareNode cmpNode, Cmpop op, Cfg::ControlFlowNode otherOperand,
Cfg::ControlFlowNode guardOperand, boolean polarity, boolean cmpBranch
|
guardOperand.getNode() = guard.getNode() and
(cmpNode.operands(guardOperand, op, otherOperand) or cmpNode.operands(otherOperand, op, guardOperand)) and
not guard.getNode() instanceof BooleanLiteral and
(
(op instanceof Eq or op instanceof Is) and
polarity = otherOperand.getNode().(BooleanLiteral).booleanValue()
or
(op instanceof NotEq or op instanceof IsNot) and
polarity = otherOperand.getNode().(BooleanLiteral).booleanValue().booleanNot()
) and
outcomeOfGuard(cmpNode, outcomeNode, cmpBranch) and
branch = cmpBranch.booleanXor(polarity.booleanNot())
)
}
/**
* Holds if the guard `g` validates `node` upon evaluating to `branch`.
*/
signature predicate guardChecksSig(GuardNode g, ControlFlowNode node, boolean branch);
signature predicate guardChecksSig(GuardNode g, Cfg::ControlFlowNode node, boolean branch);
/**
* Provides a set of barrier nodes for a guard that validates a node.
@@ -670,7 +692,7 @@ module BarrierGuard<guardChecksSig/3 guardChecks> {
result = ParameterizedBarrierGuard<Unit, extendedGuardChecks/4>::getABarrierNode(_)
}
private predicate extendedGuardChecks(GuardNode g, ControlFlowNode node, boolean branch, Unit u) {
private predicate extendedGuardChecks(GuardNode g, Cfg::ControlFlowNode node, boolean branch, Unit u) {
guardChecks(g, node, branch) and
u = u
}
@@ -680,7 +702,7 @@ bindingset[this]
private signature class ParamSig;
private module WithParam<ParamSig P> {
signature predicate guardChecksSig(GuardNode g, ControlFlowNode node, boolean branch, P param);
signature predicate guardChecksSig(GuardNode g, Cfg::ControlFlowNode node, boolean branch, P param);
}
/**
@@ -693,10 +715,10 @@ module ParameterizedBarrierGuard<ParamSig P, WithParam<P>::guardChecksSig/4 guar
/** Gets a node that is safely guarded by the given guard check with parameter `param`. */
overlay[global]
ExprNode getABarrierNode(P param) {
exists(GuardNode g, EssaDefinition def, ControlFlowNode node, boolean branch |
AdjacentUses::useOfDef(def, node) and
exists(GuardNode g, SsaImpl::EssaDefinition def, Cfg::ControlFlowNode node, boolean branch |
SsaImpl::AdjacentUses::useOfDef(def, node) and
guardChecks(g, node, branch, param) and
AdjacentUses::useOfDef(def, result.asCfgNode()) and
SsaImpl::AdjacentUses::useOfDef(def, result.asCfgNode()) and
g.controlsBlock(result.asCfgNode().getBasicBlock(), branch)
)
}
@@ -712,7 +734,7 @@ module ExternalBarrierGuard {
private import semmle.python.ApiGraphs
overlay[global]
private predicate guardCheck(GuardNode g, ControlFlowNode node, boolean branch, string kind) {
private predicate guardCheck(GuardNode g, Cfg::ControlFlowNode node, boolean branch, string kind) {
exists(API::CallNode call, API::Node parameter |
parameter = call.getAParameter() and
parameter = ModelOutput::getABarrierGuardNode(kind, branch)
@@ -748,10 +770,10 @@ newtype TContent =
TSetElementContent() or
/** An element of a tuple at a specific index. */
TTupleElementContent(int index) {
exists(any(TupleNode tn).getElement(index))
exists(any(Cfg::TupleNode tn).getElement(index))
or
// Arguments can overflow and end up in the starred parameter tuple.
exists(any(CallNode cn).getArg(index))
exists(any(Cfg::CallNode cn).getArg(index))
or
// since flow summaries might use tuples, we ensure that we at least have valid
// TTupleElementContent for the 0..7 (7 was picked to match `small_tuple` in
@@ -768,10 +790,10 @@ newtype TContent =
or
// d["key"] = ...
key =
any(SubscriptNode sub | sub.isStore() | sub.getIndex().getNode().(StringLiteral).getText())
any(Cfg::SubscriptNode sub | sub.isStore() | sub.getIndex().getNode().(StringLiteral).getText())
or
// d.setdefault("key", ...)
exists(CallNode call | call.getFunction().(AttrNode).getName() = "setdefault" |
exists(Cfg::CallNode call | call.getFunction().(Cfg::AttrNode).getName() = "setdefault" |
key = call.getArg(0).getNode().(StringLiteral).getText()
)
} or

View File

@@ -5,11 +5,24 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.internal.SsaImpl as SsaImpl
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.internal.ImportStar
private import semmle.python.dataflow.new.TypeTracking
private import semmle.python.dataflow.new.internal.DataFlowPrivate
private import semmle.python.essa.SsaDefinitions
/**
* Holds if the name of `var` refers to a submodule of a package and `init` is
* the `__init__` module of that package. Locally inlined replacement for the
* legacy `SsaSource::init_module_submodule_defn` so that this module has no
* direct dependency on `semmle.python.essa.SsaDefinitions`.
*/
private predicate initModuleSubmoduleDefn(GlobalVariable var, Module init) {
init.isPackageInit() and
exists(init.getPackage().getSubModule(var.getId())) and
var.getScope() = init
}
/**
* Python modules and the way imports are resolved are... complicated. Here's a crash course in how
@@ -69,13 +82,13 @@ module ImportResolution {
* Holds if there is an ESSA step from `defFrom` to `defTo`, which should be allowed
* for import resolution.
*/
private predicate allowedEssaImportStep(EssaDefinition defFrom, EssaDefinition defTo) {
private predicate allowedEssaImportStep(
SsaImpl::EssaDefinition defFrom, SsaImpl::EssaDefinition defTo
) {
// to handle definitions guarded by if-then-else
defFrom = defTo.(PhiFunction).getAnInput()
or
// refined variable
// example: https://github.com/nvbn/thefuck/blob/ceeaeab94b5df5a4fe9d94d61e4f6b0bbea96378/thefuck/utils.py#L25-L45
defFrom = defTo.(EssaNodeRefinement).getInput().getDefinition()
defFrom = defTo.(SsaImpl::PhiFunction).getAnInput()
// Note: legacy ESSA refinement-step (e.g. for `foo.bar = X`) is
// not modelled in the new SSA. We rely on phi steps only.
}
/**
@@ -92,30 +105,32 @@ module ImportResolution {
// Definitions made inside `m` itself
//
// for code such as `foo = ...; foo.bar = ...` there will be TWO
// EssaDefinition/EssaVariable. One for `foo = ...` (AssignmentDefinition) and one
// SsaImpl::EssaDefinition/SsaImpl::EssaVariable. One for `foo = ...` (SsaImpl::AssignmentDefinition) and one
// for `foo.bar = ...`. The one for `foo.bar = ...` (EssaNodeRefinement). The
// EssaNodeRefinement is the one that will reach the end of the module (normal
// exit).
//
// However, we cannot just use the EssaNodeRefinement as the `val`, because the
// normal data-flow depends on use-use flow, and use-use flow targets CFG nodes not
// EssaNodes. So we need to go back from the EssaDefinition/EssaVariable that
// EssaNodes. So we need to go back from the SsaImpl::EssaDefinition/SsaImpl::EssaVariable that
// reaches the end of the module, to the first definition of the variable, and then
// track forwards using use-use flow to find a suitable CFG node that has flow into
// it from use-use flow.
exists(EssaVariable lastUseVar, EssaVariable firstDef |
exists(SsaImpl::EssaVariable lastUseVar, SsaImpl::EssaVariable firstDef |
lastUseVar.getName() = name and
// we ignore special variable $ introduced by our analysis (not used for anything)
// we ignore special variable * introduced by `from <pkg> import *` -- TODO: understand why we even have this?
not name in ["$", "*"] and
lastUseVar.getAUse() = m.getANormalExit() and
exists(Cfg::ControlFlowNode exit |
exit.isNormalExit() and exit.getScope() = m and lastUseVar.getAUse() = exit
) and
allowedEssaImportStep*(firstDef, lastUseVar) and
not allowedEssaImportStep(_, firstDef)
|
not LocalFlow::defToFirstUse(firstDef, _) and
val.asCfgNode() = firstDef.getDefinition().(EssaNodeDefinition).getDefiningNode()
val.asCfgNode() = firstDef.getDefinition().(SsaImpl::EssaNodeDefinition).getDefiningNode()
or
exists(ControlFlowNode mid, ControlFlowNode end |
exists(Cfg::ControlFlowNode mid, Cfg::ControlFlowNode end |
LocalFlow::defToFirstUse(firstDef, mid) and
LocalFlow::useToNextUse*(mid, end) and
not LocalFlow::useToNextUse(end, _) and
@@ -143,9 +158,9 @@ module ImportResolution {
* handles simple cases where we can statically tell that this is the case.
*/
private predicate all_mentions_name(Module m, string name) {
exists(DefinitionNode def, SequenceNode n |
exists(Cfg::DefinitionNode def, Cfg::SequenceNode n |
def.getValue() = n and
def.(NameNode).getId() = "__all__" and
def.(Cfg::NameNode).getId() = "__all__" and
def.getScope() = m and
any(StringLiteral s | s.getText() = name) = n.getAnElement().getNode()
)
@@ -158,18 +173,20 @@ module ImportResolution {
*/
private predicate no_or_complicated_all(Module m) {
// No mention of `__all__` in the module
not exists(DefinitionNode def | def.getScope() = m and def.(NameNode).getId() = "__all__")
not exists(Cfg::DefinitionNode def |
def.getScope() = m and def.(Cfg::NameNode).getId() = "__all__"
)
or
// `__all__` is set to a non-sequence value
exists(DefinitionNode def |
def.(NameNode).getId() = "__all__" and
exists(Cfg::DefinitionNode def |
def.(Cfg::NameNode).getId() = "__all__" and
def.getScope() = m and
not def.getValue() instanceof SequenceNode
not def.getValue() instanceof Cfg::SequenceNode
)
or
// `__all__` is used in some way that doesn't involve storing a value in it. This usually means
// it is being mutated through `append` or `extend`, which we don't handle.
exists(NameNode n | n.getId() = "__all__" and n.getScope() = m and n.isLoad())
exists(Cfg::NameNode n | n.getId() = "__all__" and n.getScope() = m and n.isLoad())
}
private predicate potential_module_export(Module m, string name) {
@@ -177,7 +194,7 @@ module ImportResolution {
or
no_or_complicated_all(m) and
(
exists(NameNode n | n.getId() = name and n.getScope() = m and name.charAt(0) != "_")
exists(Cfg::NameNode n | n.getId() = name and n.getScope() = m and name.charAt(0) != "_")
or
exists(Alias a | a.getAsname().(Name).getId() = name and a.getValue().getScope() = m)
)
@@ -207,12 +224,12 @@ module ImportResolution {
/** Gets a module that may have been added to `sys.modules`. */
private Module sys_modules_module_with_name(string name) {
exists(ControlFlowNode n, DataFlow::Node mod |
exists(SubscriptNode sub |
exists(Cfg::ControlFlowNode n, DataFlow::Node mod |
exists(Cfg::SubscriptNode sub |
sub.getObject() = sys_modules_reference().asCfgNode() and
sub.getIndex() = n and
n.getNode().(StringLiteral).getText() = name and
sub.(DefinitionNode).getValue() = mod.asCfgNode() and
sub.(Cfg::DefinitionNode).getValue() = mod.asCfgNode() and
mod = getModuleReference(result)
)
)
@@ -324,11 +341,11 @@ module ImportResolution {
// name as a submodule, we always consider that this attribute _could_ be a
// reference to the submodule, even if we don't know that the submodule has been
// imported yet.
exists(string submodule, Module package, EssaVariable var |
exists(string submodule, Module package, SsaImpl::EssaVariable var |
submodule = var.getName() and
SsaSource::init_module_submodule_defn(var.getSourceVariable(), package.getEntryNode()) and
initModuleSubmoduleDefn(var.getSourceVariable().getVariable(), package) and
m = getModuleFromName(package.getPackageName() + "." + submodule) and
result.asCfgNode() = var.getDefinition().(EssaNodeDefinition).getDefiningNode()
result.asCfgNode() = var.getDefinition().(SsaImpl::EssaNodeDefinition).getDefiningNode()
)
}

View File

@@ -3,6 +3,7 @@ overlay[local]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.internal.Builtins
private import semmle.python.dataflow.new.internal.ImportResolution
private import semmle.python.dataflow.new.DataFlow
@@ -15,7 +16,7 @@ module ImportStar {
*/
overlay[local]
cached
predicate namePossiblyDefinedInImportStar(NameNode n, string name, Scope s) {
predicate namePossiblyDefinedInImportStar(Cfg::NameNode n, string name, Scope s) {
n.isLoad() and
name = n.getId() and
s = n.getScope().getEnclosingScope*() and
@@ -52,7 +53,7 @@ module ImportStar {
/** Holds if a global variable called `name` is assigned a value in the module `m`. */
cached
predicate globalNameDefinedInModule(string name, Module m) {
exists(NameNode n |
exists(Cfg::NameNode n |
not exists(LocalVariable v | n.defines(v)) and
n.isStore() and
name = n.getId() and
@@ -66,7 +67,7 @@ module ImportStar {
*/
overlay[global]
cached
predicate importStarResolvesTo(NameNode n, Module m) {
predicate importStarResolvesTo(Cfg::NameNode n, Module m) {
m = getStarImported+(n.getEnclosingModule()) and
globalNameDefinedInModule(n.getId(), m) and
not isDefinedLocally(n.getNode())
@@ -99,7 +100,7 @@ module ImportStar {
*/
overlay[local]
cached
ControlFlowNode potentialImportStarBase(Scope s) {
result = any(ImportStarNode n | n.getScope() = s).getModule()
Cfg::ControlFlowNode potentialImportStarBase(Scope s) {
result = any(Cfg::ImportStarNode n | n.getScope() = s).getModule()
}
}

View File

@@ -170,6 +170,8 @@ overlay[local]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.internal.SsaImpl as SsaImpl
private import DataFlowPublic
/**
@@ -178,7 +180,7 @@ private import DataFlowPublic
* This class abstracts away the differing representations of comprehensions and
* for statements.
*/
class ForTarget extends ControlFlowNode {
class ForTarget extends Cfg::ControlFlowNode {
Expr source;
ForTarget() {
@@ -198,7 +200,7 @@ class ForTarget extends ControlFlowNode {
}
/** The LHS of an assignment, it also records the assigned value. */
class AssignmentTarget extends ControlFlowNode {
class AssignmentTarget extends Cfg::ControlFlowNode {
Expr value;
AssignmentTarget() {
@@ -209,7 +211,7 @@ class AssignmentTarget extends ControlFlowNode {
}
/** A direct (or top-level) target of an unpacking assignment. */
class UnpackingAssignmentDirectTarget extends ControlFlowNode instanceof SequenceNode {
class UnpackingAssignmentDirectTarget extends Cfg::ControlFlowNode instanceof Cfg::SequenceNode {
Expr value;
UnpackingAssignmentDirectTarget() {
@@ -222,7 +224,7 @@ class UnpackingAssignmentDirectTarget extends ControlFlowNode instanceof Sequenc
}
/** A (possibly recursive) target of an unpacking assignment. */
class UnpackingAssignmentTarget extends ControlFlowNode {
class UnpackingAssignmentTarget extends Cfg::ControlFlowNode {
UnpackingAssignmentTarget() {
this instanceof UnpackingAssignmentDirectTarget
or
@@ -231,10 +233,10 @@ class UnpackingAssignmentTarget extends ControlFlowNode {
}
/** A (possibly recursive) target of an unpacking assignment which is also a sequence. */
class UnpackingAssignmentSequenceTarget extends UnpackingAssignmentTarget instanceof SequenceNode {
ControlFlowNode getElement(int i) { result = super.getElement(i) }
class UnpackingAssignmentSequenceTarget extends UnpackingAssignmentTarget instanceof Cfg::SequenceNode {
Cfg::ControlFlowNode getElement(int i) { result = super.getElement(i) }
ControlFlowNode getAnElement() { result = this.getElement(_) }
Cfg::ControlFlowNode getAnElement() { result = this.getElement(_) }
}
/**
@@ -255,7 +257,7 @@ predicate iterableUnpackingAssignmentFlowStep(Node nodeFrom, Node nodeTo) {
predicate iterableUnpackingForReadStep(CfgNode nodeFrom, Content c, Node nodeTo) {
exists(ForTarget target |
nodeFrom.getNode().getNode() = target.getSource() and
target instanceof SequenceNode and
target instanceof Cfg::SequenceNode and
nodeTo = TIterableSequenceNode(target)
) and
(
@@ -323,11 +325,11 @@ predicate iterableUnpackingConvertingStoreStep(Node nodeFrom, Content c, Node no
*/
predicate iterableUnpackingElementReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(
UnpackingAssignmentSequenceTarget target, int index, ControlFlowNode element, int starIndex
UnpackingAssignmentSequenceTarget target, int index, Cfg::ControlFlowNode element, int starIndex
|
target.getElement(starIndex) instanceof StarredNode
target.getElement(starIndex) instanceof Cfg::StarredNode
or
not exists(target.getAnElement().(StarredNode)) and
not exists(target.getAnElement().(Cfg::StarredNode)) and
starIndex = -1
|
nodeFrom.(CfgNode).getNode() = target and
@@ -342,18 +344,18 @@ predicate iterableUnpackingElementReadStep(Node nodeFrom, Content c, Node nodeTo
else c.(TupleElementContent).getIndex() >= index - 1
) and
(
if element instanceof SequenceNode
if element instanceof Cfg::SequenceNode
then
// Step 5b
nodeTo = TIterableSequenceNode(element)
else
if element instanceof StarredNode
if element instanceof Cfg::StarredNode
then
// Step 5c
nodeTo = TIterableElementNode(element)
else
// Step 5a
exists(MultiAssignmentDefinition mad | element = mad.getDefiningNode() |
exists(SsaImpl::MultiAssignmentDefinition mad | element = mad.getDefiningNode() |
nodeTo.(CfgNode).getNode() = element
)
)
@@ -366,7 +368,7 @@ predicate iterableUnpackingElementReadStep(Node nodeFrom, Content c, Node nodeTo
* content type `ListElementContent`.
*/
predicate iterableUnpackingStarredElementStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(ControlFlowNode starred, MultiAssignmentDefinition mad |
exists(Cfg::ControlFlowNode starred, SsaImpl::MultiAssignmentDefinition mad |
starred.getNode() instanceof Starred and
starred = mad.getDefiningNode()
|

View File

@@ -9,6 +9,7 @@ overlay[local]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import DataFlowPublic
private import DataFlowPrivate
private import semmle.python.internal.CachedStages
@@ -314,7 +315,7 @@ private module Cached {
*/
cached
predicate subscript(LocalSourceNode node, CfgNode subscript, CfgNode index) {
exists(CfgNode seq, SubscriptNode subscriptNode | subscriptNode = subscript.getNode() |
exists(CfgNode seq, Cfg::SubscriptNode subscriptNode | subscriptNode = subscript.getNode() |
node.flowsTo(seq) and
seq.getNode() = subscriptNode.getObject() and
index.getNode() = subscriptNode.getIndex()

View File

@@ -55,6 +55,7 @@ module;
private import python
private import DataFlowPublic
private import semmle.python.controlflow.internal.Cfg as Cfg
/**
* Holds when there is flow from the subject `nodeFrom` to the (top-level) pattern `nodeTo` of a `match` statement.
@@ -91,8 +92,8 @@ predicate matchAsFlowStep(Node nodeFrom, Node nodeTo) {
or
// the interior pattern flows to the alias
nodeFrom.(CfgNode).getNode().getNode() = subject.getPattern() and
exists(PatternAliasDefinition pad | pad.getDefiningNode().getNode() = alias |
nodeTo.(CfgNode).getNode() = pad.getDefiningNode()
exists(Cfg::ControlFlowNode aliasCfg | aliasCfg.getNode() = alias |
nodeTo.(CfgNode).getNode() = aliasCfg
)
)
}
@@ -126,8 +127,8 @@ predicate matchLiteralFlowStep(Node nodeFrom, Node nodeTo) {
predicate matchCaptureFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchCapturePattern capture, Name var | capture.getVariable() = var |
nodeFrom.(CfgNode).getNode().getNode() = capture and
exists(PatternCaptureDefinition pcd | pcd.getDefiningNode().getNode() = var |
nodeTo.(CfgNode).getNode() = pcd.getDefiningNode()
exists(Cfg::ControlFlowNode varCfg | varCfg.getNode() = var |
nodeTo.(CfgNode).getNode() = varCfg
)
)
}

View File

@@ -0,0 +1,547 @@
/**
* Provides the Python SSA implementation built on the new (shared) CFG.
*
* Mirrors the Java SSA adapter at
* `java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll`:
* an `InputSig` is defined in terms of positional `(BasicBlock, int)`
* variable references, and the shared
* `codeql.ssa.Ssa::Make<Location, Cfg, Input>` module is then
* instantiated.
*
* `SourceVariable` is the AST-level `Py::Variable`. Variable references
* are looked up via the CFG facade's `NameNode.defines`/`uses`/`deletes`
* predicates, which themselves are one-line bridges to AST-level
* `Name.defines`/`uses`/`deletes`.
*
* Implicit-entry definitions are inserted for:
* - non-local / global / builtin variables that are read in the scope
* but never assigned (no enclosing CFG node defines them),
* - captured variables (variables defined in an enclosing scope that
* are read inside the scope), and
* - parameters, but only if the corresponding parameter name is *not*
* itself a CFG node. With the C#-style parameter wiring already
* installed in `AstNodeImpl.qll`, parameter names *are* CFG nodes,
* so the regular `variableWrite` path handles them — no `i = -1`
* entry is needed for ordinary parameters.
*/
overlay[local?]
module;
private import python as Py
private import semmle.python.controlflow.internal.AstNodeImpl as CfgImpl
private import semmle.python.controlflow.internal.Cfg as Cfg
private import codeql.ssa.Ssa as SsaImplCommon
private import codeql.controlflow.BasicBlock as BB
/**
* Adapts the Python `Cfg` facade to the shared SSA library's `CfgSig`.
* All members are inherited from `Cfg::ControlFlowNode` and
* `Cfg::BasicBlock`.
*/
private module CfgForSsa implements BB::CfgSig<Py::Location> {
class ControlFlowNode = CfgImpl::ControlFlowNode;
class BasicBlock = CfgImpl::BasicBlock;
class EntryBasicBlock = CfgImpl::Cfg::EntryBasicBlock;
predicate dominatingEdge = CfgImpl::Cfg::dominatingEdge/2;
}
/**
* A source variable for SSA, wrapping a Python AST `Variable`.
*
* We only track variables that are read at least once in their scope —
* tracking write-only variables would be unnecessary work — *except*
* for module-scope globals, where the "read" can be external (e.g.
* `import mymodule; mymodule.x`). Such globals are tracked
* unconditionally so that import-resolution can find their defining
* write.
*/
private newtype TSsaSourceVariable =
TPyVar(Py::Variable v) {
// Has a use somewhere — read-relevant for SSA.
exists(Cfg::NameNode n | n.uses(v))
or
// Or has a deletion (treated as a write that destroys the value).
exists(Cfg::NameNode n | n.deletes(v))
or
// Or is a module-scope global written in this module — must be
// tracked even if never read locally, because importers may read
// it as an attribute on the module object.
v.getScope() instanceof Py::Module and
exists(Cfg::NameNode n | n.defines(v))
or
// Or is a parameter — parameters must always have a
// `ParameterDefinition` for dataflow argument-routing to work,
// even if the parameter is never read in its scope. Mirrors
// legacy ESSA's `ParameterDefinition` (which fired for every
// parameter binding regardless of liveness).
exists(Py::Parameter p | p.asName() = v.getAStore())
}
/**
* A source variable for SSA, wrapping a Python AST `Variable`.
*/
class SsaSourceVariable extends TSsaSourceVariable {
/** Gets the underlying Python AST variable. */
Py::Variable getVariable() { this = TPyVar(result) }
/** Gets the (textual) name of this variable. */
string getName() { result = this.getVariable().getId() }
/** Gets a textual representation of this source variable. */
string toString() { result = this.getVariable().toString() }
/** Gets the location of this source variable. */
Py::Location getLocation() { result = this.getVariable().getScope().getLocation() }
/** Gets the scope in which this variable lives. */
Py::Scope getScope() { result = this.getVariable().getScope() }
/**
* Gets a use of this variable as it appears in the source — a `NameNode`
* that loads or deletes the variable. Mirrors legacy
* `SsaSourceVariable.getASourceUse()`.
*/
Cfg::ControlFlowNode getASourceUse() {
exists(Cfg::NameNode n | result = n |
n.uses(this.getVariable()) or n.deletes(this.getVariable())
)
}
/**
* Gets an implicit use of this variable. The new SSA does not have
* implicit-use refinements, but we keep this for API parity — every
* normal-exit of the variable's scope counts as a sink, ensuring
* variables stay live to scope exit for taint-tracking.
*/
Cfg::ControlFlowNode getAnImplicitUse() {
result.isNormalExit() and result.getScope() = this.getScope()
}
/**
* Gets a use of this variable — either an explicit source use or an
* implicit use at scope exit. Mirrors legacy `SsaSourceVariable.getAUse()`.
*/
Cfg::ControlFlowNode getAUse() {
result = this.getASourceUse() or result = this.getAnImplicitUse()
}
}
/**
* Holds if `v` is a non-local read in scope `s`, in the sense that `s`
* uses `v` but does not write it within `s`. This includes globals,
* builtins, and variables captured from an enclosing function scope.
*
* The `Py::Variable` `v` lives in some defining scope (the module for
* globals, an outer function for closures, etc.); the reading scope
* `s` is the scope where the use of `v` occurs.
*/
private predicate nonLocalReadIn(Py::Variable v, Py::Scope s) {
exists(Cfg::NameNode n |
n.uses(v) and
n.getScope() = s and
not exists(Cfg::NameNode def | def.defines(v) and def.getScope() = s)
) and
// Match legacy ESSA: only create entry defs for variables that have
// at least one defining store somewhere — otherwise the entry def
// represents "nothing reaches here", which is the default anyway and
// introduces no useful flow. (Legacy's `ModuleVariable` required a
// store; this is the closure-aware generalisation.)
exists(Cfg::NameNode store | store.defines(v))
}
/**
* Holds if `bb` is the entry basic block of a scope where `v` should
* have an implicit entry definition. This covers:
* - non-local / global / builtin variables read in `s`, and
* - captured variables (defined in an enclosing scope but read in `s`).
*
* Each reading scope gets its own entry def, so a closure variable can
* have multiple entry defs across all functions/methods that read it.
*
* Parameters are *not* included: their bound `Name` is itself a CFG
* node (per the C#-style parameter wiring), so `variableWrite` fires at
* the parameter's natural CFG index.
*/
private predicate hasEntryDefIn(SsaSourceVariable v, CfgImpl::BasicBlock bb) {
exists(Py::Scope s |
nonLocalReadIn(v.getVariable(), s) and
bb = entryBlock(s)
)
}
/**
* Gets the entry basic block of scope `s`, where implicit entry
* definitions are placed (at synthetic index `-1`).
*/
private CfgImpl::BasicBlock entryBlock(Py::Scope s) {
exists(CfgImpl::ControlFlowNode entry |
entry instanceof CfgImpl::ControlFlow::EntryNode and
entry.getEnclosingCallable().asScope() = s and
result = entry.getBasicBlock()
)
}
/**
* The SSA `InputSig` for Python. References are positional
* `(BasicBlock, int)` pairs into the new CFG.
*/
private module SsaImplInput implements SsaImplCommon::InputSig<Py::Location, CfgImpl::BasicBlock> {
class SourceVariable = SsaSourceVariable;
predicate variableWrite(CfgImpl::BasicBlock bb, int i, SourceVariable v, boolean certain) {
// Explicit binding at a CFG node — includes assignments,
// parameter Names (wired in via the C# pattern), exception-handler
// `as`-bindings, import aliases, and match-pattern captures.
exists(Cfg::NameNode n |
bb.getNode(i) = n and
n.defines(v.getVariable()) and
certain = true
)
or
// `del x` — removes the binding. Modelled as a certain write that
// makes any subsequent read invalid.
exists(Cfg::NameNode n |
bb.getNode(i) = n and
n.deletes(v.getVariable()) and
certain = true
)
or
// Implicit entry definition for non-local / captured / global /
// builtin variables read in some scope. Each reading scope's entry
// block gets one such write, allowing closures: e.g. when `x` is a
// parameter of an outer function and read inside a nested
// function, both scopes get entry defs for `x`.
hasEntryDefIn(v, bb) and
i = -1 and
certain = true
or
// `from X import *` — possibly rebinds every name in the importing
// scope. Modelled as an uncertain write at the import-star's CFG
// position for every variable that lives in (or is referenced
// from) the same scope as the import-star. Mirrors legacy ESSA's
// `ImportStarRefinement` (see `essa/SsaDefinitions.qll`'s
// `import_star_refinement` predicate). The write is uncertain so
// that prior definitions of the variable remain available — the
// shared-SSA `SsaUncertainWrite` merges the new value with the
// immediately preceding definition.
exists(Cfg::ImportStarNode imp |
bb.getNode(i) = imp and
certain = false and
(
v.getVariable().getScope() = imp.getScope()
or
// Variable is defined in some other scope but referenced in
// the same scope as the import-star (matches legacy clause 2:
// `other.uses(v) and def.getScope() = other.getScope()`).
exists(Cfg::NameNode other |
other.uses(v.getVariable()) and
imp.getScope() = other.getScope()
)
)
)
}
predicate variableRead(CfgImpl::BasicBlock bb, int i, SourceVariable v, boolean certain) {
// Explicit source use — a `Name` load or a `del x` of the variable.
exists(Cfg::NameNode n |
bb.getNode(i) = n and
n.uses(v.getVariable()) and
certain = true
)
or
// Synthetic use at the normal exit of the variable's defining scope.
// This keeps every variable live to scope exit so that callers (e.g.
// `module_export` in ImportResolution.qll, or taint-tracking pass-through
// through unread locals) can ask "which definition reaches end of
// scope?". Mirrors legacy ESSA's `SsaSourceVariable.getAUse()` which
// included `getScope().getANormalExit()`.
exists(Cfg::ControlFlowNode exit |
exit.isNormalExit() and
exit.getScope() = v.getVariable().getScope() and
bb.getNode(i) = exit and
certain = true
)
}
}
/**
* The shared SSA instantiation for Python.
*
* Members:
* - `Definition` — the union of explicit, uncertain, and phi definitions
* - `WriteDefinition`, `UncertainWriteDefinition`, `PhiNode`
* - the standard SSA predicates (`getAUse`, `getAnUltimateDefinition`, ...).
*/
module Ssa = SsaImplCommon::Make<Py::Location, CfgForSsa, SsaImplInput>;
final class Definition = Ssa::Definition;
final class WriteDefinition = Ssa::WriteDefinition;
final class UncertainWriteDefinition = Ssa::UncertainWriteDefinition;
final class PhiNode = Ssa::PhiNode;
// ===========================================================================
// ESSA-shaped adapter layer
//
// The dataflow library (`python/ql/lib/semmle/python/dataflow/new/`) and
// related modules (`ApiGraphs.qll`, etc.) consume the legacy ESSA API
// (`EssaVariable`, `EssaDefinition`, `AssignmentDefinition`,
// `ScopeEntryDefinition`, `ParameterDefinition`, `WithDefinition`,
// `PhiFunction`, plus the `AdjacentUses` module). To migrate them off
// the legacy CFG, we expose the same API surface on top of the
// shared SSA built above.
//
// This adapter is intentionally narrow: it covers only the predicates
// that new dataflow consumes. The richer legacy ESSA — refinement
// nodes, attribute refinements, edge refinements — stays available
// via `semmle.python.essa.Essa` for points-to / legacy code.
// ===========================================================================
/**
* Gets the CFG node at which a write definition's binding takes place.
*
* For ordinary writes (assignment, deletion, parameter) this is the
* canonical CFG node of the bound Name. For implicit entry definitions
* (synthesised at position `-1` of a scope's entry BB) this is the
* scope's entry node.
*/
private Cfg::ControlFlowNode writeDefNode(Ssa::WriteDefinition def) {
exists(CfgImpl::BasicBlock bb, int i | def.definesAt(_, bb, i) |
i >= 0 and result = bb.getNode(i)
or
i = -1 and result = bb.getNode(0)
)
}
/**
* A write definition whose binding has a corresponding CFG node — i.e.
* everything that's not a phi node. Mirrors legacy ESSA's
* `EssaNodeDefinition`.
*/
class EssaNodeDefinition extends Ssa::WriteDefinition {
/** Gets the CFG node where this definition's binding takes place. */
Cfg::ControlFlowNode getDefiningNode() { result = writeDefNode(this) }
/** Gets the variable defined here (legacy name). */
SsaSourceVariable getVariable() { result = this.getSourceVariable() }
/** Gets the enclosing scope. */
Py::Scope getScope() {
exists(Cfg::ControlFlowNode n | n = this.getDefiningNode() | result = n.getScope())
}
/**
* Holds if this definition defines source variable `v` at CFG node
* `defNode`. Flatter form of `getSourceVariable()` +
* `getDefiningNode()`, matching legacy ESSA's `definedBy`.
*/
predicate definedBy(SsaSourceVariable v, Cfg::ControlFlowNode defNode) {
v = this.getSourceVariable() and defNode = this.getDefiningNode()
}
}
/**
* An assignment definition: any binding where the value being assigned
* is statically known via `Cfg::DefinitionNode.getValue()`. Includes
* plain assignments, walrus, annotated assignments, augmented
* assignments, import aliases (`import x` / `from m import x [as y]`),
* `with ... as x`, and for-target bindings (where `getValue()` returns
* the iter expression's CFG node). Excludes parameter bindings —
* those are modelled by `ParameterDefinition`.
*/
class AssignmentDefinition extends EssaNodeDefinition {
AssignmentDefinition() {
exists(Cfg::NameNode n | n = this.getDefiningNode() |
exists(n.(Cfg::DefinitionNode).getValue()) and
not n.(Cfg::ControlFlowNode).isParameter()
)
}
/** Gets the CFG node for the value being assigned, if statically known. */
Cfg::ControlFlowNode getValue() {
result = this.getDefiningNode().(Cfg::DefinitionNode).getValue()
}
}
/**
* A parameter definition — the binding of a parameter name in a
* function's scope.
*/
class ParameterDefinition extends EssaNodeDefinition {
ParameterDefinition() { this.getDefiningNode().isParameter() }
/** Gets the AST `Parameter` (a `Py::Name` in param context). */
Py::Name getParameter() { result = this.getDefiningNode().getNode() }
}
/**
* A definition introduced by a `with ... as x:` clause.
*/
class WithDefinition extends EssaNodeDefinition {
WithDefinition() {
exists(Cfg::NameNode n, Py::With w |
n = this.getDefiningNode() and
w.getOptionalVars() = n.getNode()
)
}
}
/**
* An assignment where the LHS is a tuple/list and the RHS is unpacked:
* `a, b = (1, 2)` or `a, *rest = xs`. The SSA def lives at the inner
* `Name` CFG node, but for IterableUnpacking integration we expose
* the enclosing `StarredNode` as the `getDefiningNode()` for `*rest`
* patterns — mirroring legacy ESSA's `multi_assignment_definition`,
* which placed the def at the StarredNode CFG node.
*/
class MultiAssignmentDefinition extends EssaNodeDefinition {
MultiAssignmentDefinition() {
exists(Cfg::NameNode n | n = super.getDefiningNode() |
exists(Py::Assign a, Py::Expr lhs |
a.getATarget() = lhs and
(lhs instanceof Py::Tuple or lhs instanceof Py::List) and
lhs.getASubExpression+() = n.getNode()
)
or
// For-loop with tuple/list target: `for a, b in xs:` —
// tuple-unpacking semantics applies to the for-target.
exists(Py::For f, Py::Expr lhs |
f.getTarget() = lhs and
(lhs instanceof Py::Tuple or lhs instanceof Py::List) and
lhs.getASubExpression+() = n.getNode()
)
)
}
override Cfg::ControlFlowNode getDefiningNode() {
// Default: the underlying `Name` CFG node (where the SSA def lives).
not exists(Cfg::StarredNode s |
s.getNode().(Py::Starred).getValue() = super.getDefiningNode().getNode()
) and
result = super.getDefiningNode()
or
// Exception: for `*rest`, expose the enclosing `Starred` CFG node
// so that `IterableUnpacking::iterableUnpackingStarredElementStoreStep`
// can attach the rest-list to it.
exists(Cfg::StarredNode s |
s.getNode().(Py::Starred).getValue() = super.getDefiningNode().getNode()
|
result = s
)
}
}
/**
* An implicit entry definition for a non-local / captured / global /
* builtin variable read in a scope but not defined there.
*
* Inherits from `EssaNodeDefinition` and exposes the scope's entry node
* as its defining node (matching legacy ESSA semantics).
*/
class ScopeEntryDefinition extends EssaNodeDefinition {
ScopeEntryDefinition() {
exists(CfgImpl::BasicBlock bb |
this.definesAt(_, bb, -1) and
bb instanceof CfgImpl::Cfg::EntryBasicBlock
)
}
/** Gets the enclosing scope (the scope whose entry block this def is in). */
override Py::Scope getScope() {
exists(CfgImpl::BasicBlock bb |
this.definesAt(_, bb, -1) and
result = bb.getNode(0).(Cfg::ControlFlowNode).getScope()
)
}
}
/** A phi node (alias matching legacy naming). */
class PhiFunction extends PhiNode {
/**
* Gets an input to this phi function (a definition that flows into
* the phi from one of its predecessor blocks). Mirrors legacy
* ESSA's `PhiFunction.getAnInput()`.
*/
Ssa::Definition getAnInput() { Ssa::phiHasInputFromBlock(this, result, _) }
}
/** Base class for all ESSA definitions (legacy-shaped). */
class EssaDefinition = Ssa::Definition;
/**
* An adapter representing a single SSA-defined "variable" — wrapping
* one `Ssa::Definition`. Mirrors legacy `EssaVariable` API.
*/
class EssaVariable extends Ssa::Definition {
/** Gets the underlying SSA definition (legacy name). */
Ssa::Definition getDefinition() { result = this }
/**
* Gets a CFG node where this definition is used. Includes regular
* `Name` reads as well as the synthetic scope-exit "use" registered
* via `SsaImplInput::variableRead` — mirrors legacy ESSA's
* `EssaVariable.getAUse()` which inherited the synthetic exit-use
* from `SsaSourceVariable`.
*/
Cfg::ControlFlowNode getAUse() {
exists(CfgImpl::BasicBlock bb, int i |
Ssa::ssaDefReachesRead(this.getSourceVariable(), this, bb, i) and
bb.getNode(i) = result
)
}
/** Gets the (textual) name of the underlying variable. */
string getName() { result = this.getSourceVariable().getVariable().getId() }
/** Gets the scope in which this variable lives. */
Py::Scope getScope() { result = this.getSourceVariable().getVariable().getScope() }
/** Gets an ultimate non-phi ancestor of this definition. */
EssaVariable getAnUltimateDefinition() {
if this instanceof PhiNode
then
exists(Ssa::Definition input |
Ssa::phiHasInputFromBlock(this, input, _) and
result = input.(EssaVariable).getAnUltimateDefinition()
)
else result = this
}
}
/**
* Adjacent use-use and def-use relations exposed by the shared SSA
* library. Provides the same interface as legacy
* `semmle.python.essa.SsaCompute::AdjacentUses`.
*/
module AdjacentUses {
/** Holds if `nodeFrom` and `nodeTo` are adjacent uses of the same SSA variable. */
predicate adjacentUseUse(Cfg::NameNode nodeFrom, Cfg::NameNode nodeTo) {
exists(SsaSourceVariable v, CfgImpl::BasicBlock bb1, int i1, CfgImpl::BasicBlock bb2, int i2 |
Ssa::adjacentUseUse(bb1, i1, bb2, i2, v, _) and
nodeFrom = bb1.getNode(i1) and
nodeTo = bb2.getNode(i2)
)
}
/** Holds if `use` is a first use of definition `def`. */
predicate firstUse(Ssa::Definition def, Cfg::NameNode use) {
exists(CfgImpl::BasicBlock bb, int i |
Ssa::firstUse(def, bb, i, _) and
use = bb.getNode(i)
)
}
/**
* Holds if `use` is any reachable use of definition `def`. Combines
* `firstUse` with transitive use-use adjacency.
*/
predicate useOfDef(Ssa::Definition def, Cfg::NameNode use) {
firstUse(def, use)
or
exists(Cfg::NameNode mid | useOfDef(def, mid) and adjacentUseUse(mid, use))
}
}

View File

@@ -1,4 +1,6 @@
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.internal.SsaImpl as SsaImpl
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.internal.DataFlowPrivate as DataFlowPrivate
private import FlowSummaryImpl as FlowSummaryImpl
@@ -75,7 +77,7 @@ import Cached
* and isn't a big problem in practice.
*/
predicate concatStep(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeTo) {
exists(BinaryExprNode add | add = nodeTo.getNode() |
exists(Cfg::BinaryExprNode add | add = nodeTo.getNode() |
add.getOp() instanceof Add and add.getAnOperand() = nodeFrom.getNode()
)
}
@@ -84,7 +86,7 @@ predicate concatStep(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeTo) {
* Holds if taint can flow from `nodeFrom` to `nodeTo` with a step related to subscripting.
*/
predicate subscriptStep(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeTo) {
nodeTo.getNode().(SubscriptNode).getObject() = nodeFrom.getNode()
nodeTo.getNode().(Cfg::SubscriptNode).getObject() = nodeFrom.getNode()
}
/**
@@ -100,15 +102,15 @@ predicate stringManipulation(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeT
(
call = API::builtin(["str", "bytes", "unicode"]).getACall()
or
call.getFunction().asCfgNode().(NameNode).getId() in ["str", "bytes", "unicode"]
call.getFunction().asCfgNode().(Cfg::NameNode).getId() in ["str", "bytes", "unicode"]
) and
nodeFrom in [call.getArg(0), call.getArgByName("object")]
)
or
// String methods. Note that this doesn't recognize `meth = "foo".upper; meth()`
exists(CallNode call, string method_name, ControlFlowNode object |
exists(Cfg::CallNode call, string method_name, Cfg::ControlFlowNode object |
call = nodeTo.getNode() and
object = call.getFunction().(AttrNode).getObject(method_name)
object = call.getFunction().(Cfg::AttrNode).getObject(method_name)
|
nodeFrom.getNode() = object and
method_name in [
@@ -139,7 +141,7 @@ predicate stringManipulation(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeT
)
or
// % formatting
exists(BinaryExprNode fmt | fmt = nodeTo.getNode() |
exists(Cfg::BinaryExprNode fmt | fmt = nodeTo.getNode() |
fmt.getOp() instanceof Mod and
(
fmt.getLeft() = nodeFrom.getNode()
@@ -149,7 +151,7 @@ predicate stringManipulation(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeT
)
or
// string multiplication -- `"foo" * 10`
exists(BinaryExprNode mult | mult = nodeTo.getNode() |
exists(Cfg::BinaryExprNode mult | mult = nodeTo.getNode() |
mult.getOp() instanceof Mult and
mult.getLeft() = nodeFrom.getNode()
)
@@ -207,8 +209,8 @@ predicate awaitStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
* the variable `f` is tainted if the result of `open("foo")` is tainted.
*/
predicate asyncWithStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
exists(With with, ControlFlowNode contextManager, ControlFlowNode var |
var = any(WithDefinition wd).getDefiningNode()
exists(With with, Cfg::ControlFlowNode contextManager, Cfg::ControlFlowNode var |
var = any(SsaImpl::WithDefinition wd).getDefiningNode()
|
nodeFrom.(DataFlow::CfgNode).getNode() = contextManager and
nodeTo.(DataFlow::CfgNode).getNode() = var and

View File

@@ -2,6 +2,8 @@ import codeql.util.Unit
import codeql.typetracking.TypeTracking as Shared
import codeql.typetracking.internal.TypeTrackingImpl as SharedImpl
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.internal.SsaImpl as SsaImpl
private import semmle.python.internal.CachedStages
private import semmle.python.dataflow.new.internal.DataFlowPublic as DataFlowPublic
private import semmle.python.dataflow.new.internal.DataFlowPrivate as DataFlowPrivate
@@ -94,8 +96,11 @@ private module SummaryTypeTrackerInput implements SummaryTypeTracker::Input {
Node returnOf(Node callable, SummaryComponent return) {
return = FlowSummaryImpl::Private::SummaryComponent::return() and
// `result` should be the return value of a callable expression (lambda or function) referenced by `callable`
result.asCfgNode() =
callable.getALocalSource().asExpr().(CallableExpr).getInnerScope().getAReturnValueFlowNode()
exists(Return ret |
ret.getScope() =
callable.getALocalSource().asExpr().(CallableExpr).getInnerScope() and
result.asCfgNode().getNode() = ret.getValue()
)
}
// Relating callables to nodes
@@ -160,7 +165,7 @@ module TypeTrackingInput implements Shared::TypeTrackingInput<Location> {
// ignore the flow steps from the synthetic sequence node to the real sequence node,
// since we only support one level of content in type-trackers, and the nested
// structure requires two levels at least to be useful.
not exists(SequenceNode outer |
not exists(Cfg::SequenceNode outer |
outer.getAnElement() = nodeTo.asCfgNode() and
IterableUnpacking::iterableUnpackingTupleFlowStep(nodeFrom, nodeTo)
)
@@ -259,7 +264,7 @@ module TypeTrackingInput implements Shared::TypeTrackingInput<Location> {
// Since we only support one level of content in type-trackers we don't actually
// support `(aa, ab), (ba, bb) = ...`. Therefore we exclude the read-step from `(aa,
// ab)` to `aa` (since it is not needed).
not exists(SequenceNode outer |
not exists(Cfg::SequenceNode outer |
outer.getAnElement() = nodeFrom.asCfgNode() and
IterableUnpacking::iterableUnpackingTupleFlowStep(_, nodeFrom)
) and
@@ -269,7 +274,7 @@ module TypeTrackingInput implements Shared::TypeTrackingInput<Location> {
IterableUnpacking::iterableUnpackingForReadStep(_, _, seq) and
IterableUnpacking::iterableUnpackingConvertingReadStep(seq, _, elem) and
IterableUnpacking::iterableUnpackingConvertingStoreStep(elem, _, nodeFrom) and
nodeFrom.asCfgNode() instanceof SequenceNode
nodeFrom.asCfgNode() instanceof Cfg::SequenceNode
)
or
TypeTrackerSummaryFlow::basicLoadStep(nodeFrom, nodeTo, content)
@@ -306,13 +311,13 @@ module TypeTrackingInput implements Shared::TypeTrackingInput<Location> {
//
// nodeFrom is `expr`
// nodeTo is entry node for `f`
exists(ScopeEntryDefinition e, SsaSourceVariable var, DefinitionNode def |
exists(SsaImpl::ScopeEntryDefinition e, SsaImpl::SsaSourceVariable var, Cfg::DefinitionNode def |
e.getSourceVariable() = var and
var.hasDefiningNode(def)
def.getNode() = var.getVariable().getAStore()
|
nodeTo.(DataFlowPublic::ScopeEntryDefinitionNode).getDefinition() = e and
nodeFrom.asCfgNode() = def and
var.getScope().getScope*() = nodeFrom.getScope()
var.getVariable().getScope().getScope*() = nodeFrom.getScope()
)
}

View File

@@ -3,6 +3,9 @@ overlay[local]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.controlflow.internal.AstNodeImpl as CfgImpl
private import semmle.python.dataflow.new.internal.SsaImpl as SsaImpl
private import DataFlowPublic
private import semmle.python.dataflow.new.internal.DataFlowPrivate
private import codeql.dataflow.VariableCapture as Shared
@@ -14,10 +17,10 @@ private import codeql.dataflow.VariableCapture as Shared
// The first is the main implementation, the second is a performance motivated restriction.
// The restriction is to clear any `CapturedVariableContent` before writing a new one
// to avoid long access paths (see the link for a nice explanation).
private module CaptureInput implements Shared::InputSig<Location, Cfg::BasicBlock> {
private module CaptureInput implements Shared::InputSig<Location, CfgImpl::BasicBlock> {
private import python as PY
additional class ExprCfgNode extends ControlFlowNode {
additional class ExprCfgNode extends Cfg::ControlFlowNode {
ExprCfgNode() { isExpressionNode(this) }
}
@@ -25,7 +28,9 @@ private module CaptureInput implements Shared::InputSig<Location, Cfg::BasicBloc
predicate isConstructor() { none() }
}
Callable basicBlockGetEnclosingCallable(Cfg::BasicBlock bb) { result = bb.getScope() }
Callable basicBlockGetEnclosingCallable(CfgImpl::BasicBlock bb) {
result = bb.getEnclosingCallable().asScope()
}
class CapturedVariable extends LocalVariable {
Function f;
@@ -51,27 +56,29 @@ private module CaptureInput implements Shared::InputSig<Location, Cfg::BasicBloc
class CapturedParameter extends CapturedVariable {
CapturedParameter() { this.isParameter() }
ControlFlowNode getCfgNode() { result.getNode().(Parameter) = this.getAnAccess() }
Cfg::ControlFlowNode getCfgNode() { result.getNode().(Parameter) = this.getAnAccess() }
}
class Expr extends ExprCfgNode {
predicate hasCfgNode(Cfg::BasicBlock bb, int i) { this = bb.getNode(i) }
predicate hasCfgNode(CfgImpl::BasicBlock bb, int i) { this = bb.getNode(i) }
}
class VariableWrite extends ControlFlowNode {
class VariableWrite extends Cfg::ControlFlowNode {
CapturedVariable v;
VariableWrite() { this = v.getAStore().getAFlowNode().(DefinitionNode).getValue() }
VariableWrite() {
exists(Cfg::DefinitionNode d | d.getNode() = v.getAStore() | this = d.getValue())
}
CapturedVariable getVariable() { result = v }
predicate hasCfgNode(Cfg::BasicBlock bb, int i) { this = bb.getNode(i) }
predicate hasCfgNode(CfgImpl::BasicBlock bb, int i) { this = bb.getNode(i) }
}
class VariableRead extends Expr {
CapturedVariable v;
VariableRead() { this = v.getALoad().getAFlowNode() }
VariableRead() { this.getNode() = v.getALoad() }
CapturedVariable getVariable() { result = v }
}
@@ -80,9 +87,14 @@ private module CaptureInput implements Shared::InputSig<Location, Cfg::BasicBloc
// TODO: Other languages have an extra case here looking like
// simpleAstFlowStep(nodeFrom, nodeTo)
// we should investigate the potential benefit of adding that.
exists(SsaVariable def |
exists(SsaImpl::EssaVariable def |
def.getAUse() = nodeTo and
def.getAnUltimateDefinition().getDefinition().(DefinitionNode).getValue() = nodeFrom
def.getAnUltimateDefinition()
.getDefinition()
.(SsaImpl::EssaNodeDefinition)
.getDefiningNode()
.(Cfg::DefinitionNode)
.getValue() = nodeFrom
)
}
@@ -107,7 +119,7 @@ class CapturedVariable = CaptureInput::CapturedVariable;
class ClosureExpr = CaptureInput::ClosureExpr;
module Flow = Shared::Flow<Location, Cfg, CaptureInput>;
module Flow = Shared::Flow<Location, Cfg::CfgSigImpl, CaptureInput>;
private Flow::ClosureNode asClosureNode(Node n) {
result = n.(SynthCaptureNode).getSynthesizedCaptureNode()

View File

@@ -448,8 +448,8 @@ class TaintTrackingImplementation extends string instanceof TaintTracking::Confi
context = TNoParam() and
src = TTaintTrackingNode_(retval, TNoParam(), path, kind, this) and
node.asCfgNode() = call and
retval.asCfgNode() =
any(Return ret | ret.getScope() = pyfunc.getScope()).getValue().getAFlowNode()
retval.asCfgNode().getNode() =
any(Return ret | ret.getScope() = pyfunc.getScope()).getValue()
) and
edgeLabel = "return"
}
@@ -471,8 +471,8 @@ class TaintTrackingImplementation extends string instanceof TaintTracking::Confi
this.callContexts(call, src, pyfunc, context, callee) and
retnode = TTaintTrackingNode_(retval, callee, path, kind, this) and
node.asCfgNode() = call and
retval.asCfgNode() =
any(Return ret | ret.getScope() = pyfunc.getScope()).getValue().getAFlowNode()
retval.asCfgNode().getNode() =
any(Return ret | ret.getScope() = pyfunc.getScope()).getValue()
) and
edgeLabel = "call"
}
@@ -716,8 +716,10 @@ private class EssaTaintTracking extends string instanceof TaintTracking::Configu
src = TTaintTrackingNode_(srcnode, context, path, srckind, this) and
path.noAttribute()
|
assign.getValue().getAFlowNode() = srcnode.asCfgNode() and
depth = iterable_unpacking_descent(assign.getATarget().getAFlowNode(), defn.getDefiningNode()) and
srcnode.asCfgNode().getNode() = assign.getValue() and
exists(SequenceNode left_parent | left_parent.getNode() = assign.getATarget() |
depth = iterable_unpacking_descent(left_parent, defn.getDefiningNode())
) and
kind = taint_at_depth(srckind, depth)
)
}
@@ -964,7 +966,7 @@ private TaintKind taint_at_depth(SequenceKind parent_kind, int depth) {
* - with `left_defn` = `*y`, `left_parent` = `((x, *y), ...)`, result = 1
*/
int iterable_unpacking_descent(SequenceNode left_parent, ControlFlowNode left_defn) {
exists(Assign a | a.getATarget().getASubExpression*().getAFlowNode() = left_parent) and
exists(Assign a | left_parent.getNode() = a.getATarget().getASubExpression*()) and
left_parent.getAnElement() = left_defn and
// Handle `a, *b = some_iterable`
if left_defn instanceof StarredNode then result = 0 else result = 1

View File

@@ -56,7 +56,7 @@ module SsaSource {
predicate with_definition(Variable v, ControlFlowNode defn) {
exists(With with, Name var |
with.getOptionalVars() = var and
var.getAFlowNode() = defn
defn.getNode() = var
|
var = v.getAStore()
)
@@ -67,7 +67,7 @@ module SsaSource {
predicate pattern_capture_definition(Variable v, ControlFlowNode defn) {
exists(MatchCapturePattern capture, Name var |
capture.getVariable() = var and
var.getAFlowNode() = defn
defn.getNode() = var
|
var = v.getAStore()
)
@@ -78,7 +78,7 @@ module SsaSource {
predicate pattern_alias_definition(Variable v, ControlFlowNode defn) {
exists(MatchAsPattern pattern, Name var |
pattern.getAlias() = var and
var.getAFlowNode() = defn
defn.getNode() = var
|
var = v.getAStore()
)

View File

@@ -4,6 +4,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
private import semmle.python.dataflow.new.RemoteFlowSources
@@ -59,7 +60,7 @@ module Bottle {
override Parameter getARoutedParameter() { none() }
override Function getARequestHandler() { result.getADecorator().getAFlowNode() = node }
override Function getARequestHandler() { node.getNode() = result.getADecorator() }
}
}
@@ -73,7 +74,9 @@ module Bottle {
/** A response returned by a view callable. */
class BottleReturnResponse extends Http::Server::HttpResponse::Range {
BottleReturnResponse() {
this.asCfgNode() = any(View::ViewCallable vc).getAReturnValueFlowNode()
exists(View::ViewCallable vc, Return ret |
ret.getScope() = vc and this.asCfgNode().getNode() = ret.getValue()
)
}
override DataFlow::Node getBody() { result = this }
@@ -154,9 +157,9 @@ module Bottle {
DataFlow::Node value;
HeaderWriteSubscript() {
exists(SubscriptNode subscript |
exists(Cfg::SubscriptNode subscript |
this.asCfgNode() = subscript and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
name.asCfgNode() = subscript.getIndex() and
subscript.getObject() = headers().asSource().asCfgNode()
)

View File

@@ -4,6 +4,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.TaintTracking
@@ -1305,7 +1306,7 @@ module PrivateDjango {
dict.(DataFlow::MethodCallNode).calls(files, "dict")
)
|
this.asCfgNode().(SubscriptNode).getObject() = dict.asCfgNode()
this.asCfgNode().(Cfg::SubscriptNode).getObject() = dict.asCfgNode()
or
this.(DataFlow::MethodCallNode).calls(dict, "get")
)
@@ -1314,7 +1315,7 @@ module PrivateDjango {
exists(DataFlow::AttrRead files, DataFlow::MethodCallNode getlistCall |
files.accesses(instance(), "FILES") and
getlistCall.calls(files, "getlist") and
this.asCfgNode().(SubscriptNode).getObject() = getlistCall.asCfgNode()
this.asCfgNode().(Cfg::SubscriptNode).getObject() = getlistCall.asCfgNode()
)
}
}
@@ -2216,7 +2217,7 @@ module PrivateDjango {
DataFlow::Node value;
DjangoResponseCookieSubscriptWrite() {
exists(SubscriptNode subscript, DataFlow::AttrRead cookieLookup |
exists(Cfg::SubscriptNode subscript, DataFlow::AttrRead cookieLookup |
// To give `this` a value, we need to choose between either LHS or RHS,
// and just go with the LHS
this.asCfgNode() = subscript
@@ -2228,7 +2229,7 @@ module PrivateDjango {
|
cookieLookup.flowsTo(subscriptObj)
) and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
index.asCfgNode() = subscript.getIndex()
)
}
@@ -2249,7 +2250,7 @@ module PrivateDjango {
DataFlow::Node value;
DjangoResponseHeaderSubscriptWrite() {
exists(SubscriptNode subscript, DataFlow::AttrRead headerLookup |
exists(Cfg::SubscriptNode subscript, DataFlow::AttrRead headerLookup |
// To give `this` a value, we need to choose between either LHS or RHS,
// and just go with the LHS
this.asCfgNode() = subscript
@@ -2261,7 +2262,7 @@ module PrivateDjango {
|
headerLookup.flowsTo(subscriptObj)
) and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
index.asCfgNode() = subscript.getIndex()
)
}
@@ -2284,14 +2285,14 @@ module PrivateDjango {
DataFlow::Node value;
DjangoResponseSubscriptWrite() {
exists(SubscriptNode subscript |
exists(Cfg::SubscriptNode subscript |
// To give `this` a value, we need to choose between either LHS or RHS,
// and just go with the LHS
this.asCfgNode() = subscript
|
subscript.getObject() =
DjangoImpl::DjangoHttp::Response::HttpResponse::instance().asCfgNode() and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
index.asCfgNode() = subscript.getIndex()
)
}
@@ -2426,7 +2427,7 @@ module PrivateDjango {
/** Gets a reference to the result of calling the `as_view` classmethod of this class. */
private DataFlow::TypeTrackingNode asViewResult(DataFlow::TypeTracker t) {
t.start() and
result.asCfgNode().(CallNode).getFunction() = this.asViewRef().asCfgNode()
result.asCfgNode().(Cfg::CallNode).getFunction() = this.asViewRef().asCfgNode()
or
exists(DataFlow::TypeTracker t2 | result = this.asViewResult(t2).track(t2, t))
}
@@ -2872,7 +2873,9 @@ module PrivateDjango {
DataFlow::CfgNode
{
DjangoRedirectViewGetRedirectUrlReturn() {
node = any(GetRedirectUrlFunction f).getAReturnValueFlowNode()
exists(GetRedirectUrlFunction f, Return ret |
ret.getScope() = f and node.getNode() = ret.getValue()
)
}
override DataFlow::Node getRedirectLocation() { result = this }

View File

@@ -4,6 +4,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.TaintTracking
@@ -129,7 +130,7 @@ module FastApi {
result in [this.getArg(0), this.getArgByName("path")]
}
override Function getARequestHandler() { result.getADecorator().getAFlowNode() = node }
override Function getARequestHandler() { node.getNode() = result.getADecorator() }
override string getFramework() { result = "FastAPI" }
@@ -309,7 +310,11 @@ module FastApi {
FastApiRouteSetup routeSetup;
FastApiRequestHandlerReturn() {
node = routeSetup.getARequestHandler().getAReturnValueFlowNode()
exists(Function h, Return ret |
h = routeSetup.getARequestHandler() and
ret.getScope() = h and
node.getNode() = ret.getValue()
)
}
override DataFlow::Node getBody() { result = this }
@@ -438,7 +443,7 @@ module FastApi {
DataFlow::Node value;
HeaderSubscriptWrite() {
exists(SubscriptNode subscript, DataFlow::AttrRead headerLookup |
exists(Cfg::SubscriptNode subscript, DataFlow::AttrRead headerLookup |
// To give `this` a value, we need to choose between either LHS or RHS,
// and just go with the LHS
this.asCfgNode() = subscript
@@ -447,7 +452,7 @@ module FastApi {
exists(DataFlow::Node subscriptObj | subscriptObj.asCfgNode() = subscript.getObject() |
headerLookup.flowsTo(subscriptObj)
) and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
index.asCfgNode() = subscript.getIndex()
)
}

View File

@@ -371,7 +371,7 @@ module Flask {
result in [this.getArg(0), this.getArgByName("rule")]
}
override Function getARequestHandler() { result.getADecorator().getAFlowNode() = node }
override Function getARequestHandler() { node.getNode() = result.getADecorator() }
}
/**
@@ -536,7 +536,7 @@ module Flask {
FlaskRouteHandlerReturn() {
exists(Function routeHandler |
routeHandler = any(FlaskRouteSetup rs).getARequestHandler() and
node = routeHandler.getAReturnValueFlowNode() and
exists(Return ret | ret.getScope() = routeHandler and node.getNode() = ret.getValue()) and
not this instanceof Flask::Response::InstanceSource
)
}

View File

@@ -38,7 +38,7 @@ private module FlaskAdmin {
result in [this.getArg(0), this.getArgByName("url")]
}
override Function getARequestHandler() { result.getADecorator().getAFlowNode() = node }
override Function getARequestHandler() { node.getNode() = result.getADecorator() }
}
/**
@@ -71,7 +71,7 @@ private module FlaskAdmin {
override Function getARequestHandler() {
exists(Flask::FlaskViewClass cls |
cls.getADecorator().getAFlowNode() = node and
node.getNode() = cls.getADecorator() and
result = cls.getARequestHandler()
)
}

View File

@@ -4,6 +4,7 @@
*/
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking
import semmle.python.ApiGraphs
@@ -51,9 +52,9 @@ module Gradio {
// limit only to lists of parameters given to `inputs`.
(
(
call.getKeywordParameter("inputs").asSink().asCfgNode() instanceof ListNode
call.getKeywordParameter("inputs").asSink().asCfgNode() instanceof Cfg::ListNode
or
call.getParameter(1).asSink().asCfgNode() instanceof ListNode
call.getParameter(1).asSink().asCfgNode() instanceof Cfg::ListNode
) and
(
this = call.getKeywordParameter("inputs").getASubscript().getAValueReachingSink()
@@ -75,8 +76,8 @@ module Gradio {
exists(GradioInput call |
this = call.getParameter(0, "fn").getParameter(_).asSource() and
// exclude lists of parameters given to `inputs`
not call.getKeywordParameter("inputs").asSink().asCfgNode() instanceof ListNode and
not call.getParameter(1).asSink().asCfgNode() instanceof ListNode
not call.getKeywordParameter("inputs").asSink().asCfgNode() instanceof Cfg::ListNode and
not call.getParameter(1).asSink().asCfgNode() instanceof Cfg::ListNode
)
}
@@ -105,16 +106,16 @@ module Gradio {
// handle cases where there are multiple arguments passed as a list to `inputs`
(
(
node.getKeywordParameter("inputs").asSink().asCfgNode() instanceof ListNode
node.getKeywordParameter("inputs").asSink().asCfgNode() instanceof Cfg::ListNode
or
node.getParameter(1).asSink().asCfgNode() instanceof ListNode
node.getParameter(1).asSink().asCfgNode() instanceof Cfg::ListNode
) and
exists(int i | nodeTo = node.getParameter(0, "fn").getParameter(i).asSource() |
nodeFrom.asCfgNode() =
node.getKeywordParameter("inputs").asSink().asCfgNode().(ListNode).getElement(i)
node.getKeywordParameter("inputs").asSink().asCfgNode().(Cfg::ListNode).getElement(i)
or
nodeFrom.asCfgNode() =
node.getParameter(1).asSink().asCfgNode().(ListNode).getElement(i)
node.getParameter(1).asSink().asCfgNode().(Cfg::ListNode).getElement(i)
)
)
)

View File

@@ -4,6 +4,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.Concepts
@@ -46,7 +47,7 @@ module MarkupSafeModel {
/** A direct instantiation of `markupsafe.Markup`. */
private class ClassInstantiation extends InstanceSource, DataFlow::CallCfgNode {
override CallNode node;
override Cfg::CallNode node;
ClassInstantiation() { this = classRef().getACall() }
}
@@ -64,7 +65,7 @@ module MarkupSafeModel {
/** A string concatenation with a `markupsafe.Markup` involved. */
class StringConcat extends Markup::InstanceSource, DataFlow::CfgNode {
override BinaryExprNode node;
override Cfg::BinaryExprNode node;
StringConcat() {
node.getOp() instanceof Add and
@@ -79,7 +80,7 @@ module MarkupSafeModel {
/** A %-style string format with `markupsafe.Markup` as the format string. */
class PercentStringFormat extends Markup::InstanceSource, DataFlow::CfgNode {
override BinaryExprNode node;
override Cfg::BinaryExprNode node;
PercentStringFormat() {
node.getOp() instanceof Mod and

View File

@@ -7,6 +7,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
private import semmle.python.frameworks.data.ModelsAsData
@@ -56,7 +57,7 @@ module Pycurl {
{
OutgoingRequestCall() {
this = setopt().getACall() and
this.getArg(0).asCfgNode().(AttrNode).getName() = "URL"
this.getArg(0).asCfgNode().(Cfg::AttrNode).getName() = "URL"
}
override DataFlow::Node getAUrlPart() {
@@ -81,7 +82,7 @@ module Pycurl {
private class CurlSslCall extends Http::Client::Request::Range instanceof DataFlow::CallCfgNode {
CurlSslCall() {
this = setopt().getACall() and
this.getArg(0).asCfgNode().(AttrNode).getName() = ["SSL_VERIFYPEER", "SSL_VERIFYHOST"]
this.getArg(0).asCfgNode().(Cfg::AttrNode).getName() = ["SSL_VERIFYPEER", "SSL_VERIFYHOST"]
}
override DataFlow::Node getAUrlPart() { none() }

View File

@@ -7,6 +7,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.Concepts
@@ -93,7 +94,7 @@ module Pydantic {
// be a Pydantic model. So `model[0]` will be an overapproximation, but should not
// really cause problems (since we don't expect real code to contain such accesses)
nodeFrom = instance() and
nodeTo.asCfgNode().(SubscriptNode).getObject() = nodeFrom.asCfgNode()
nodeTo.asCfgNode().(Cfg::SubscriptNode).getObject() = nodeFrom.asCfgNode()
}
/**

View File

@@ -166,7 +166,9 @@ module Pyramid {
/** A response returned by a view callable. */
private class PyramidReturnResponse extends Http::Server::HttpResponse::Range {
PyramidReturnResponse() {
this.asCfgNode() = any(View::ViewCallable vc).getAReturnValueFlowNode() and
exists(View::ViewCallable vc, Return ret |
ret.getScope() = vc and this.asCfgNode().getNode() = ret.getValue()
) and
not this = instance()
}

View File

@@ -6,6 +6,7 @@ overlay[local?]
module;
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.dataflow.new.RemoteFlowSources
@@ -1246,7 +1247,7 @@ module StdlibPrivate {
/** An additional taint step for calls to `os.path.join` */
private class OsPathJoinCallAdditionalTaintStep extends TaintTracking::AdditionalTaintStep {
override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
exists(CallNode call |
exists(Cfg::CallNode call |
nodeTo.asCfgNode() = call and
call = OS::OsPath::join().getACall().asCfgNode() and
call.getAnArg() = nodeFrom.asCfgNode()
@@ -1317,13 +1318,13 @@ module StdlibPrivate {
// run, so if we're able to, we only mark the first element as the command
// (and not the arguments to the command).
//
result.asCfgNode() = arg_args.asCfgNode().(SequenceNode).getElement(0)
result.asCfgNode() = arg_args.asCfgNode().(Cfg::SequenceNode).getElement(0)
or
// Either the "args" argument is not a sequence (which is valid) or we where
// just not able to figure it out. Simply mark the "args" argument as the
// command.
//
not arg_args.asCfgNode() instanceof SequenceNode and
not arg_args.asCfgNode() instanceof Cfg::SequenceNode and
result = arg_args
)
)
@@ -1542,7 +1543,7 @@ module StdlibPrivate {
* See https://docs.python.org/3/library/functions.html#eval
*/
private class BuiltinsEvalCall extends CodeExecution::Range, DataFlow::CallCfgNode {
override CallNode node;
override Cfg::CallNode node;
BuiltinsEvalCall() { this = API::builtin("eval").getACall() }
@@ -1923,7 +1924,7 @@ module StdlibPrivate {
nodeFrom = instance().getAValueReachableFromSource() and
nodeTo = [getvalueRef(), getfirstRef(), getlistRef()].getAValueReachableFromSource()
or
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(CallNode).getFunction() and
nodeFrom.asCfgNode() = nodeTo.asCfgNode().(Cfg::CallNode).getFunction() and
(
nodeFrom = getvalueRef().getAValueReachableFromSource() and
nodeTo = getvalueResult().asSource()
@@ -1939,7 +1940,7 @@ module StdlibPrivate {
nodeFrom in [
instance().getAValueReachableFromSource(), fieldList().getAValueReachableFromSource()
] and
nodeTo.asCfgNode().(SubscriptNode).getObject() = nodeFrom.asCfgNode()
nodeTo.asCfgNode().(Cfg::SubscriptNode).getObject() = nodeFrom.asCfgNode()
or
// Attributes on Field
nodeFrom = field().getAValueReachableFromSource() and
@@ -2254,8 +2255,8 @@ module StdlibPrivate {
DataFlow::CfgNode
{
WsgirefSimpleServerApplicationReturn() {
exists(WsgirefSimpleServerApplication requestHandler |
node = requestHandler.getAReturnValueFlowNode()
exists(WsgirefSimpleServerApplication requestHandler, Return ret |
ret.getScope() = requestHandler and node.getNode() = ret.getValue()
)
}
@@ -2337,9 +2338,9 @@ module StdlibPrivate {
DataFlow::Node value;
HeaderWriteSubscript() {
exists(SubscriptNode subscript |
exists(Cfg::SubscriptNode subscript |
this.asCfgNode() = subscript and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
name.asCfgNode() = subscript.getIndex() and
subscript.getObject() = instance().asCfgNode()
)
@@ -2681,7 +2682,7 @@ module StdlibPrivate {
or
// Data injection
// Special handling of the `/` operator
exists(BinaryExprNode slash, DataFlow::Node pathOperand, DataFlow::TypeTracker t2 |
exists(Cfg::BinaryExprNode slash, DataFlow::Node pathOperand, DataFlow::TypeTracker t2 |
slash.getOp() instanceof Div and
pathOperand.asCfgNode() = slash.getAnOperand() and
pathlibPath(t2).flowsTo(pathOperand) and
@@ -2806,7 +2807,7 @@ module StdlibPrivate {
pathlibPath().flowsTo(nodeTo) and
(
// Special handling of the `/` operator
exists(BinaryExprNode slash, DataFlow::Node pathOperand |
exists(Cfg::BinaryExprNode slash, DataFlow::Node pathOperand |
slash.getOp() instanceof Div and
pathOperand.asCfgNode() = slash.getAnOperand() and
pathlibPath().flowsTo(pathOperand)
@@ -4605,9 +4606,9 @@ module StdlibPrivate {
}
override predicate propagatesFlow(string input, string output, boolean preservesValue) {
exists(CallNode c, string name, ControlFlowNode n, DataFlow::AttributeContent ac |
c.getFunction().(NameNode).getId() = "replace" or
c.getFunction().(AttrNode).getName() = "replace"
exists(Cfg::CallNode c, string name, Cfg::ControlFlowNode n, DataFlow::AttributeContent ac |
c.getFunction().(Cfg::NameNode).getId() = "replace" or
c.getFunction().(Cfg::AttrNode).getName() = "replace"
|
n = c.getArgByName(name) and
ac.getAttribute() = name and
@@ -5151,10 +5152,10 @@ module StdlibPrivate {
* See https://docs.python.org/3.9/library/stdtypes.html#str.startswith
*/
private class StartswithCall extends Path::SafeAccessCheck::Range {
StartswithCall() { this.(CallNode).getFunction().(AttrNode).getName() = "startswith" }
StartswithCall() { this.(Cfg::CallNode).getFunction().(Cfg::AttrNode).getName() = "startswith" }
override predicate checks(ControlFlowNode node, boolean branch) {
node = this.(CallNode).getFunction().(AttrNode).getObject() and
override predicate checks(Cfg::ControlFlowNode node, boolean branch) {
node = this.(Cfg::CallNode).getFunction().(Cfg::AttrNode).getObject() and
branch = true
}
}

View File

@@ -8,6 +8,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
private import semmle.python.security.dataflow.UrlRedirectCustomizations
@@ -91,7 +92,7 @@ private module Urllib {
* A read of the `netloc` attribute of a parsed URL as returned by `urllib.parse.urlparse`,
* which is being checked in a way that is relevant for URL redirection vulnerabilities.
*/
private predicate netlocCheck(DataFlow::GuardNode g, ControlFlowNode node, boolean branch) {
private predicate netlocCheck(DataFlow::GuardNode g, Cfg::ControlFlowNode node, boolean branch) {
exists(DataFlow::CallCfgNode urlParseCall, DataFlow::AttrRead netlocRead |
urlParseCall = getUrlParseCall() and
netlocRead = urlParseCall.getAnAttributeRead("netloc") and

View File

@@ -4,6 +4,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.TaintTracking
@@ -72,9 +73,9 @@ module Tornado {
DataFlow::Node value;
TornadoHeaderSubscriptWrite() {
exists(SubscriptNode subscript |
exists(Cfg::SubscriptNode subscript |
subscript.getObject() = instance().asCfgNode() and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
index.asCfgNode() = subscript.getIndex() and
this.asCfgNode() = subscript
)
@@ -422,7 +423,7 @@ module Tornado {
// be able to do something more structured for providing modeling of the members
// of a container-object.
exists(DataFlow::AttrRead files | files.accesses(instance(), "cookies") |
this.asCfgNode().(SubscriptNode).getObject() = files.asCfgNode()
this.asCfgNode().(Cfg::SubscriptNode).getObject() = files.asCfgNode()
or
this.(DataFlow::MethodCallNode).calls(files, "get")
)
@@ -479,20 +480,20 @@ module Tornado {
// routing
// ---------------------------------------------------------------------------
/** Gets a sequence that defines a number of route rules */
SequenceNode routeSetupRuleList() {
exists(CallNode call |
Cfg::SequenceNode routeSetupRuleList() {
exists(Cfg::CallNode call |
call = any(TornadoModule::Web::Application::ClassInstantiation c).asCfgNode()
|
result in [call.getArg(0), call.getArgByName("handlers")]
)
or
exists(CallNode call |
exists(Cfg::CallNode call |
call.getFunction() = TornadoModule::Web::Application::add_handlers().asCfgNode()
|
result in [call.getArg(1), call.getArgByName("host_handlers")]
)
or
result = routeSetupRuleList().getElement(_).(TupleNode).getElement(1)
result = routeSetupRuleList().getElement(_).(Cfg::TupleNode).getElement(1)
}
/** A tornado route setup. */
@@ -515,12 +516,12 @@ module Tornado {
/** A route setup using a tuple. */
private class TornadoTupleRouteSetup extends TornadoRouteSetup, DataFlow::CfgNode {
override TupleNode node;
override Cfg::TupleNode node;
TornadoTupleRouteSetup() {
node = routeSetupRuleList().getElement(_) and
count(node.getElement(_)) = 2 and
not node.getElement(1) instanceof SequenceNode
not node.getElement(1) instanceof Cfg::SequenceNode
}
override DataFlow::Node getUrlPatternArg() { result.asCfgNode() = node.getElement(0) }

View File

@@ -182,7 +182,9 @@ private module Twisted {
DataFlow::CfgNode
{
TwistedResourceRenderMethodReturn() {
this.asCfgNode() = any(TwistedResourceRenderMethod meth).getAReturnValueFlowNode()
exists(TwistedResourceRenderMethod meth, Return ret |
ret.getScope() = meth and this.asCfgNode().getNode() = ret.getValue()
)
}
override DataFlow::Node getBody() { result = this }

View File

@@ -6,6 +6,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.ApiGraphs
@@ -221,9 +222,9 @@ module Werkzeug {
DataFlow::Node value;
HeaderWriteSubscript() {
exists(SubscriptNode subscript |
exists(Cfg::SubscriptNode subscript |
this.asCfgNode() = subscript and
value.asCfgNode() = subscript.(DefinitionNode).getValue() and
value.asCfgNode() = subscript.(Cfg::DefinitionNode).getValue() and
name.asCfgNode() = subscript.getIndex() and
subscript.getObject() = instance().asCfgNode()
)

View File

@@ -8,6 +8,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
@@ -28,7 +29,7 @@ private module Yaml {
* See https://pyyaml.org/wiki/PyYAMLDocumentation (you will have to scroll down).
*/
private class YamlLoadCall extends Decoding::Range, DataFlow::CallCfgNode {
override CallNode node;
override Cfg::CallNode node;
string func_name;
YamlLoadCall() {

View File

@@ -4,6 +4,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.Concepts
@@ -111,7 +112,7 @@ module Yarl {
}
private predicate yarlUrlIsAbsoluteCall(
DataFlow::GuardNode g, ControlFlowNode node, boolean branch
DataFlow::GuardNode g, Cfg::ControlFlowNode node, boolean branch
) {
exists(ClassInstantiation instance, DataFlow::MethodCallNode call |
call.calls(instance, "is_absolute") and

View File

@@ -77,7 +77,7 @@ module Stages {
or
exists(any(AstExtended::AstNode n).getParentNode())
or
exists(any(AstExtended::AstNode n).getAFlowNode())
exists(PyFlow::ControlFlowNode cfg, AstExtended::AstNode n | cfg.getNode() = n)
or
exists(any(PyFlow::BasicBlock b).getImmediateDominator())
or

View File

@@ -56,8 +56,9 @@ abstract class CallableObjectInternal extends ObjectInternal {
/** A Python function. */
class PythonFunctionObjectInternal extends CallableObjectInternal, TPythonFunctionObject {
override Function getScope() {
exists(CallableExpr expr |
this = TPythonFunctionObject(expr.getAFlowNode()) and
exists(CallableExpr expr, ControlFlowNode exprCfg |
exprCfg.getNode() = expr and
this = TPythonFunctionObject(exprCfg) and
result = expr.getInnerScope()
)
}
@@ -160,10 +161,11 @@ class PythonFunctionObjectInternal extends CallableObjectInternal, TPythonFuncti
}
private BasicBlock blockReturningNone(Function func) {
exists(Return ret |
exists(Return ret, ControlFlowNode ret_ |
not exists(ret.getValue()) and
ret.getScope() = func and
result = ret.getAFlowNode().getBasicBlock()
ret_.getNode() = ret and
result = ret_.getBasicBlock()
)
}

View File

@@ -113,8 +113,9 @@ abstract class ClassObjectInternal extends ObjectInternal {
class PythonClassObjectInternal extends ClassObjectInternal, TPythonClassObject {
/** Gets the scope for this Python class */
Class getScope() {
exists(ClassExpr expr |
this = TPythonClassObject(expr.getAFlowNode()) and
exists(ClassExpr expr, ControlFlowNode exprCfg |
exprCfg.getNode() = expr and
this = TPythonClassObject(exprCfg) and
result = expr.getInnerScope()
)
}

View File

@@ -387,7 +387,7 @@ private PythonClassObjectInternal abcMetaClassObject() {
private predicate neither_class_nor_static_method(Function f) {
not exists(f.getADecorator())
or
exists(ControlFlowNode deco | deco = f.getADecorator().getAFlowNode() |
exists(ControlFlowNode deco | deco.getNode() = f.getADecorator() |
exists(ObjectInternal o | PointsToInternal::pointsTo(deco, _, o, _) |
o != ObjectInternal::staticMethod() and
o != ObjectInternal::classMethod()

View File

@@ -711,7 +711,7 @@ private module InterModulePointsTo {
ControlFlowNode f, PointsToContext context, ObjectInternal value, ControlFlowNode origin
) {
exists(string name, ImportExpr i |
i.getAFlowNode() = f and
f.getNode() = i and
i.getImportedModuleName() = name and
PointsToInternal::module_imported_as(value, name) and
origin = f and
@@ -2118,8 +2118,9 @@ module Types {
result.getBuiltin() = cls.getBuiltin().getBaseClass() and n = 0
or
exists(Class pycls | pycls = cls.(PythonClassObjectInternal).getScope() |
exists(ObjectInternal base |
PointsToInternal::pointsTo(pycls.getBase(n).getAFlowNode(), _, base, _)
exists(ObjectInternal base, ControlFlowNode baseNode |
baseNode.getNode() = pycls.getBase(n) and
PointsToInternal::pointsTo(baseNode, _, base, _)
|
result = base and base != ObjectInternal::unknown()
or
@@ -2223,7 +2224,10 @@ module Types {
}
private ControlFlowNode decorator_call_callee(PythonClassObjectInternal cls) {
result = cls.getScope().getADecorator().getAFlowNode().(CallNode).getFunction()
exists(CallNode deco |
deco.getNode() = cls.getScope().getADecorator() and
result = deco.getFunction()
)
}
private boolean has_six_add_metaclass(PythonClassObjectInternal cls) {
@@ -2262,7 +2266,7 @@ module Types {
}
private EssaVariable metaclass_var(Class cls) {
result.getASourceUse() = cls.getMetaClass().getAFlowNode()
result.getASourceUse().getNode() = cls.getMetaClass()
or
major_version() = 2 and
not exists(cls.getMetaClass()) and

View File

@@ -3,6 +3,7 @@
*/
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.Concepts as Concepts
private import semmle.python.regex
@@ -78,7 +79,7 @@ private module FindRegexMode {
t.start() and
exists(API::Node flag | flag_name = canonical_name(flag) and result = flag.asSource())
or
exists(BinaryExprNode binop, DataFlow::Node operand |
exists(Cfg::BinaryExprNode binop, DataFlow::Node operand |
operand.getALocalSource() = re_flag_tracker(flag_name, t.continue()) and
operand.asCfgNode() = binop.getAnOperand() and
(binop.getOp() instanceof BitOr or binop.getOp() instanceof Add) and

View File

@@ -5,6 +5,7 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
@@ -111,7 +112,7 @@ module UrlRedirect {
// Url redirection is a problem only if the user controls the prefix of the URL.
// TODO: This is a copy of the taint-sanitizer from the old points-to query, which doesn't
// cover formatting.
exists(BinaryExprNode string_concat | string_concat.getOp() instanceof Add |
exists(Cfg::BinaryExprNode string_concat | string_concat.getOp() instanceof Add |
string_concat.getRight() = this.asCfgNode()
)
}

View File

@@ -181,7 +181,7 @@ class ClassObject extends Object {
)
}
ControlFlowNode declaredMetaClass() { result = this.getPyClass().getMetaClass().getAFlowNode() }
ControlFlowNode declaredMetaClass() { result.getNode() = this.getPyClass().getMetaClass() }
/** Has type inference failed to compute the full class hierarchy for this class for the reason given. */
predicate failedInference(string reason) { Types::failedInference(this.theClass(), reason) }
@@ -195,8 +195,9 @@ class ClassObject extends Object {
* It is guaranteed that getProbableSingletonInstance() returns at most one Object for each ClassObject.
*/
Object getProbableSingletonInstance() {
exists(ControlFlowNodeWithPointsTo use, Expr origin |
use.refersTo(result, this, origin.getAFlowNode())
exists(ControlFlowNodeWithPointsTo use, Expr origin, ControlFlowNode origin_ |
origin_.getNode() = origin and
use.refersTo(result, this, origin_)
|
this.hasStaticallyUniqueInstance() and
/* Ensure that original expression will be executed only one. */

View File

@@ -427,7 +427,7 @@ class ExceptFlowNodeWithPointsTo extends ExceptFlowNode {
}
private ControlFlowNodeWithPointsTo element_from_tuple_objectapi(Object tuple) {
exists(Tuple t | t = tuple.getOrigin() and result = t.getAnElt().getAFlowNode())
exists(Tuple t | t = tuple.getOrigin() and result.getNode() = t.getAnElt())
}
/**

View File

@@ -36,8 +36,8 @@ class RangeIterationVariableFact extends PointsToExtension {
RangeIterationVariableFact() {
exists(For f, ControlFlowNode iterable |
iterable.getBasicBlock().dominates(this.(ControlFlowNode).getBasicBlock()) and
f.getIter().getAFlowNode() = iterable and
f.getTarget().getAFlowNode() = this and
iterable.getNode() = f.getIter() and
this.(ControlFlowNode).getNode() = f.getTarget() and
exists(ObjectInternal range |
PointsTo::pointsTo(iterable, _, range, _) and
range.getClass() = ObjectInternal::builtin("range")

View File

@@ -170,7 +170,7 @@ class PyFunctionObject extends FunctionObject {
predicate unconditionallyReturnsParameter(int n) {
exists(SsaVariable pvar |
exists(Parameter p | p = this.getFunction().getArg(n) |
p.asName().getAFlowNode() = pvar.getDefinition()
pvar.getDefinition().getNode() = p.asName()
) and
exists(NameNode rval |
rval = pvar.getAUse() and

View File

@@ -337,7 +337,7 @@ class TupleObject extends SequenceObject {
or
this instanceof TupleNode
or
exists(Function func | func.getVararg().getAFlowNode() = this)
exists(Function func | this.(ControlFlowNode).getNode() = func.getVararg())
}
}
@@ -352,7 +352,9 @@ module TupleObject {
}
class NonEmptyTupleObject extends TupleObject {
NonEmptyTupleObject() { exists(Function func | func.getVararg().getAFlowNode() = this) }
NonEmptyTupleObject() {
exists(Function func | this.(ControlFlowNode).getNode() = func.getVararg())
}
override boolean booleanValue() { result = true }
}

View File

@@ -1,4 +1,5 @@
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.internal.DataFlowPrivate
import FlowTest
@@ -23,7 +24,7 @@ import MakeTest<MakeTestSig<MaximalFlowTest>>
module MaximalFlowsConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node node) {
exists(node.getLocation().getFile().getRelativePath()) and
not node.asCfgNode() instanceof CallNode and
not node.asCfgNode() instanceof Cfg::CallNode and
not node.asCfgNode().getNode() instanceof Return and
not node instanceof DataFlow::ParameterNode and
not node instanceof DataFlow::PostUpdateNode and
@@ -34,9 +35,9 @@ module MaximalFlowsConfig implements DataFlow::ConfigSig {
predicate isSink(DataFlow::Node node) {
exists(node.getLocation().getFile().getRelativePath()) and
not any(CallNode c).getArg(_) = node.asCfgNode() and
not any(Cfg::CallNode c).getArg(_) = node.asCfgNode() and
not isArgumentNode(node, _, _) and
not node.asCfgNode().(NameNode).getId().matches("SINK%") and
not node.asCfgNode().(Cfg::NameNode).getId().matches("SINK%") and
not DataFlow::localFlowStep(node, _)
}
}

View File

@@ -1,4 +1,5 @@
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import utils.test.dataflow.FlowTest
import utils.test.dataflow.testConfig
private import semmle.python.dataflow.new.internal.PrintNode
@@ -19,7 +20,7 @@ query predicate missingAnnotationOnSink(Location location, string error, string
TestConfig::isSink(sink) and
// note: we only care about `SINK` and not `SINK_F`, so we have to reconstruct manually.
exists(DataFlow::CallCfgNode call |
call.getFunction().asCfgNode().(NameNode).getId() = "SINK" and
call.getFunction().asCfgNode().(Cfg::NameNode).getId() = "SINK" and
(sink = call.getArg(_) or sink = call.getArgByName(_))
) and
location = sink.getLocation() and

View File

@@ -1,4 +1,5 @@
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import utils.test.dataflow.FlowTest
import utils.test.dataflow.testTaintConfig
private import semmle.python.dataflow.new.internal.PrintNode
@@ -18,7 +19,7 @@ query predicate missingAnnotationOnSink(Location location, string error, string
exists(DataFlow::Node sink |
exists(DataFlow::CallCfgNode call |
// note: we only care about `SINK` and not `SINK_F`, so we have to reconstruct manually.
call.getFunction().asCfgNode().(NameNode).getId() = "SINK" and
call.getFunction().asCfgNode().(Cfg::NameNode).getId() = "SINK" and
(sink = call.getArg(_) or sink = call.getArgByName(_))
) and
location = sink.getLocation() and

View File

@@ -1,4 +1,5 @@
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import semmle.python.dataflow.new.DataFlow
import utils.test.InlineExpectationsTest
private import semmle.python.dataflow.new.internal.PrintNode
@@ -49,7 +50,7 @@ private string fromValue(DataFlow::Node fromNode) {
pragma[inline]
private string fromFunc(DataFlow::ArgumentNode fromNode) {
result = fromNode.getCall().getNode().(CallNode).getFunction().getNode().(Name).getId()
result = fromNode.getCall().getNode().(Cfg::CallNode).getFunction().getNode().(Name).getId()
}
pragma[inline]

View File

@@ -1,15 +1,16 @@
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.internal.PrintNode
private import semmle.python.dataflow.new.internal.DataFlowPrivate as DataFlowPrivate
private import semmle.python.ApiGraphs
import utils.test.InlineExpectationsTest
signature module UnresolvedCallExpectationsSig {
predicate unresolvedCall(CallNode call);
predicate unresolvedCall(Cfg::CallNode call);
}
module DefaultUnresolvedCallExpectations implements UnresolvedCallExpectationsSig {
predicate unresolvedCall(CallNode call) {
predicate unresolvedCall(Cfg::CallNode call) {
not exists(DataFlowPrivate::DataFlowCall dfc |
exists(dfc.getCallable()) and dfc.getNode() = call
) and
@@ -24,7 +25,7 @@ module MakeUnresolvedCallExpectations<UnresolvedCallExpectationsSig Impl> {
predicate hasActualResult(Location location, string element, string tag, string value) {
exists(location.getFile().getRelativePath()) and
exists(CallNode call | Impl::unresolvedCall(call) |
exists(Cfg::CallNode call | Impl::unresolvedCall(call) |
location = call.getLocation() and
tag = "unresolved_call" and
value = prettyExpr(call.getNode()) and

View File

@@ -21,11 +21,12 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import semmle.python.dataflow.new.DataFlow
module TestConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node node) {
node.(DataFlow::CfgNode).getNode().(NameNode).getId() = "SOURCE"
node.(DataFlow::CfgNode).getNode().(Cfg::NameNode).getId() = "SOURCE"
or
node.(DataFlow::CfgNode).getNode().getNode().(StringLiteral).getS() = "source"
or
@@ -37,7 +38,7 @@ module TestConfig implements DataFlow::ConfigSig {
predicate isSink(DataFlow::Node node) {
exists(DataFlow::CallCfgNode call |
call.getFunction().asCfgNode().(NameNode).getId() in ["SINK", "SINK_F"] and
call.getFunction().asCfgNode().(Cfg::NameNode).getId() in ["SINK", "SINK_F"] and
(node = call.getArg(_) or node = call.getArgByName(_)) and
not node = call.getArgByName("not_present_at_runtime")
)

View File

@@ -21,12 +21,13 @@
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
module TestConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node node) {
node.(DataFlow::CfgNode).getNode().(NameNode).getId() = "SOURCE"
node.(DataFlow::CfgNode).getNode().(Cfg::NameNode).getId() = "SOURCE"
or
node.(DataFlow::CfgNode).getNode().getNode().(StringLiteral).getS() = "source"
or
@@ -37,8 +38,8 @@ module TestConfig implements DataFlow::ConfigSig {
}
predicate isSink(DataFlow::Node node) {
exists(CallNode call |
call.getFunction().(NameNode).getId() in ["SINK", "SINK_F"] and
exists(Cfg::CallNode call |
call.getFunction().(Cfg::NameNode).getId() in ["SINK", "SINK_F"] and
node.(DataFlow::CfgNode).getNode() = call.getAnArg()
)
}

View File

@@ -48,9 +48,11 @@ class CheckClass extends ClassObject {
self_dict = sub.getObject()
or
/* Indirect assignment via temporary variable */
exists(SsaVariable v |
v.getAUse() = sub.getObject().getAFlowNode() and
v.getDefinition().(DefinitionNode).getValue() = self_dict.getAFlowNode()
exists(SsaVariable v, ControlFlowNode subObjCfg, ControlFlowNode selfDictCfg |
subObjCfg.getNode() = sub.getObject() and selfDictCfg.getNode() = self_dict
|
v.getAUse() = subObjCfg and
v.getDefinition().(DefinitionNode).getValue() = selfDictCfg
)
) and
a.getATarget() = sub and
@@ -62,9 +64,10 @@ class CheckClass extends ClassObject {
pragma[nomagic]
private predicate monkeyPatched(string name) {
exists(Attribute a |
exists(Attribute a, ControlFlowNode objCfg |
objCfg.getNode() = a.getObject() and
a.getCtx() instanceof Store and
PointsTo::points_to(a.getObject().getAFlowNode(), _, this, _, _) and
PointsTo::points_to(objCfg, _, this, _, _) and
a.getName() = name
)
}
@@ -84,9 +87,9 @@ class CheckClass extends ClassObject {
}
predicate interestingUndefined(SelfAttributeRead a) {
exists(string name | name = a.getName() |
exists(string name, ControlFlowNode aCfg | name = a.getName() and aCfg.getNode() = a |
this.interestingContext(a, name) and
not this.definedInBlock(a.getAFlowNode().getBasicBlock(), name)
not this.definedInBlock(aCfg.getBasicBlock(), name)
)
}
@@ -109,8 +112,9 @@ class CheckClass extends ClassObject {
pragma[nomagic]
private predicate definitionInBlock(BasicBlock b, string name) {
exists(SelfAttributeStore sa |
sa.getAFlowNode().getBasicBlock() = b and
exists(SelfAttributeStore sa, ControlFlowNode saCfg |
saCfg.getNode() = sa and
saCfg.getBasicBlock() = b and
sa.getName() = name and
sa.getClass() = this.getPyClass()
)

View File

@@ -15,7 +15,9 @@
import python
import semmle.python.ApiGraphs
predicate doesnt_reraise(ExceptStmt ex) { ex.getAFlowNode().getBasicBlock().reachesExit() }
predicate doesnt_reraise(ExceptStmt ex) {
exists(ControlFlowNode exCfg | exCfg.getNode() = ex | exCfg.getBasicBlock().reachesExit())
}
predicate catches_base_exception(ExceptStmt ex) {
ex.getType() = API::builtin("BaseException").getAValueReachableFromSource().asExpr()

View File

@@ -116,7 +116,9 @@ FunctionValue get_function_or_initializer(Value func_or_cls) {
predicate illegally_named_parameter_objectapi(Call call, Object func, string name) {
not func.isC() and
name = call.getANamedArgumentName() and
call.getAFlowNode() = get_a_call_objectapi(func) and
exists(ControlFlowNode callCfg | callCfg.getNode() = call |
callCfg = get_a_call_objectapi(func)
) and
not get_function_or_initializer_objectapi(func).isLegalArgumentName(name)
}
@@ -124,7 +126,7 @@ predicate illegally_named_parameter_objectapi(Call call, Object func, string nam
predicate illegally_named_parameter(Call call, Value func, string name) {
not func.isBuiltin() and
name = call.getANamedArgumentName() and
call.getAFlowNode() = get_a_call(func) and
exists(ControlFlowNode callCfg | callCfg.getNode() = call | callCfg = get_a_call(func)) and
not get_function_or_initializer(func).isLegalArgumentName(name)
}
@@ -146,7 +148,9 @@ predicate too_few_args_objectapi(Call call, Object callable, int limit) {
call = func.getAMethodCall().getNode() and limit = func.minParameters() - 1
or
callable instanceof ClassObject and
call.getAFlowNode() = get_a_call_objectapi(callable) and
exists(ControlFlowNode callCfg | callCfg.getNode() = call |
callCfg = get_a_call_objectapi(callable)
) and
limit = func.minParameters() - 1
)
}
@@ -172,7 +176,7 @@ predicate too_few_args(Call call, Value callable, int limit) {
call = func.getAMethodCall().getNode() and limit = func.minParameters() - 1
or
callable instanceof ClassValue and
call.getAFlowNode() = get_a_call(callable) and
exists(ControlFlowNode callCfg | callCfg.getNode() = call | callCfg = get_a_call(callable)) and
limit = func.minParameters() - 1
)
}
@@ -191,7 +195,9 @@ predicate too_many_args_objectapi(Call call, Object callable, int limit) {
call = func.getAMethodCall().getNode() and limit = func.maxParameters() - 1
or
callable instanceof ClassObject and
call.getAFlowNode() = get_a_call_objectapi(callable) and
exists(ControlFlowNode callCfg | callCfg.getNode() = call |
callCfg = get_a_call_objectapi(callable)
) and
limit = func.maxParameters() - 1
) and
positional_arg_count_for_call_objectapi(call, callable) > limit
@@ -211,7 +217,7 @@ predicate too_many_args(Call call, Value callable, int limit) {
call = func.getAMethodCall().getNode() and limit = func.maxParameters() - 1
or
callable instanceof ClassValue and
call.getAFlowNode() = get_a_call(callable) and
exists(ControlFlowNode callCfg | callCfg.getNode() = call | callCfg = get_a_call(callable)) and
limit = func.maxParameters() - 1
) and
positional_arg_count_for_call(call, callable) > limit

View File

@@ -36,11 +36,13 @@ where
exists(string s | dict_key(d, k1, s) and dict_key(d, k2, s) and k1 != k2) and
(
exists(BasicBlock b, int i1, int i2 |
k1.getAFlowNode() = b.getNode(i1) and
k2.getAFlowNode() = b.getNode(i2) and
b.getNode(i1).getNode() = k1 and
b.getNode(i2).getNode() = k2 and
i1 < i2
)
or
k1.getAFlowNode().getBasicBlock().strictlyDominates(k2.getAFlowNode().getBasicBlock())
exists(ControlFlowNode k1Cfg, ControlFlowNode k2Cfg | k1Cfg.getNode() = k1 and k2Cfg.getNode() = k2 |
k1Cfg.getBasicBlock().strictlyDominates(k2Cfg.getBasicBlock())
)
)
select k1, "Dictionary key " + repr(k1) + " is subsequently $@.", k2, "overwritten"

View File

@@ -98,16 +98,16 @@ private predicate brace_pair(PossibleAdvancedFormatString fmt, int start, int en
}
private predicate advanced_format_call(Call format_expr, PossibleAdvancedFormatString fmt, int args) {
exists(CallNode call | call = format_expr.getAFlowNode() |
exists(CallNode call, ControlFlowNode fmtCfg | call.getNode() = format_expr and fmtCfg.getNode() = fmt |
call.getFunction().(ControlFlowNodeWithPointsTo).pointsTo(Value::named("format")) and
call.getArg(0).(ControlFlowNodeWithPointsTo).pointsTo(_, fmt.getAFlowNode()) and
call.getArg(0).(ControlFlowNodeWithPointsTo).pointsTo(_, fmtCfg) and
args = count(format_expr.getAnArg()) - 1
or
call.getFunction()
.(AttrNode)
.getObject("format")
.(ControlFlowNodeWithPointsTo)
.pointsTo(_, fmt.getAFlowNode()) and
.pointsTo(_, fmtCfg) and
args = count(format_expr.getAnArg())
)
}

View File

@@ -15,7 +15,7 @@ import python
/** Holds if the comparison `comp` uses `is` or `is not` (represented as `op`) to compare its `left` and `right` arguments. */
predicate comparison_using_is(Compare comp, ControlFlowNode left, Cmpop op, ControlFlowNode right) {
exists(CompareNode fcomp | fcomp = comp.getAFlowNode() |
exists(CompareNode fcomp | fcomp.getNode() = comp |
fcomp.operands(left, op, right) and
(op instanceof Is or op instanceof IsNot)
)

View File

@@ -5,7 +5,7 @@ private import LegacyPointsTo
/** Holds if the comparison `comp` uses `is` or `is not` (represented as `op`) to compare its `left` and `right` arguments. */
predicate comparison_using_is(Compare comp, ControlFlowNode left, Cmpop op, ControlFlowNode right) {
exists(CompareNode fcomp | fcomp = comp.getAFlowNode() |
exists(CompareNode fcomp | fcomp.getNode() = comp |
fcomp.operands(left, op, right) and
(op instanceof Is or op instanceof IsNot)
)

View File

@@ -19,7 +19,7 @@ where
// Only relevant for Python 2, as all later versions implement true division
major_version() = 2 and
exists(BinaryExprNode bin, Value lval, Value rval |
bin = div.getAFlowNode() and
bin.getNode() = div and
bin.getNode().getOp() instanceof Div and
bin.getLeft().(ControlFlowNodeWithPointsTo).pointsTo(lval, left) and
lval.getClass() = ClassValue::int_() and

View File

@@ -19,7 +19,9 @@ where
exists(Function init | init.isInitMethod() and r.getScope() = init) and
r.getValue() = rv and
not rv.pointsTo(Value::none_()) and
not exists(FunctionValue f | f.getACall() = rv.getAFlowNode() | f.neverReturns()) and
not exists(FunctionValue f, ControlFlowNode rvCfg | rvCfg.getNode() = rv |
f.getACall() = rvCfg and f.neverReturns()
) and
// to avoid double reporting, don't trigger if returning result from other __init__ function
not exists(Attribute meth | meth = rv.(Call).getFunc() | meth.getName() = "__init__")
select r, "Explicit return in __init__ method."

View File

@@ -69,7 +69,12 @@ where
returns_meaningful_value(callee) and
not wrapped_in_try_except(call) and
exists(int unused |
unused = count(ExprStmt e | e.getValue().getAFlowNode() = callee.getACall()) and
unused =
count(ExprStmt e |
exists(ControlFlowNode eValCfg | eValCfg.getNode() = e.getValue() |
eValCfg = callee.getACall()
)
) and
total = count(callee.getACall())
|
percentage_used = (100.0 * (total - unused) / total).floor()

View File

@@ -138,12 +138,12 @@ predicate function_opens_file(FunctionValue f) {
f = Value::named("open")
or
exists(EssaVariable v, Return ret | ret.getScope() = f.getScope() |
ret.getValue().getAFlowNode() = v.getAUse() and
v.getNode() = ret.getValue().getAUse() and
var_is_open(v, _)
)
or
exists(Return ret, FunctionValue callee | ret.getScope() = f.getScope() |
ret.getValue().getAFlowNode() = callee.getACall() and
callee.getNode() = ret.getValue().getACall() and
function_opens_file(callee)
)
}

View File

@@ -94,7 +94,7 @@ class CredentialSink extends DataFlow::Node {
this.(DataFlow::ArgumentNode).argumentOf(_, pos)
)
or
exists(Keyword k | k.getArg() = name and k.getValue().getAFlowNode() = this.asCfgNode())
exists(Keyword k | k.getArg() = name and this.getNode() = k.getValue().asCfgNode())
or
exists(CompareNode cmp, NameNode n | n.getId() = name |
cmp.operands(this.asCfgNode(), any(Eq eq), n)

View File

@@ -25,7 +25,7 @@ from
For loop, ControlFlowNodeWithPointsTo iter, Value str, Value seq, ControlFlowNode seq_origin,
ControlFlowNode str_origin
where
loop.getIter().getAFlowNode() = iter and
iter.getNode() = loop.getIter() and
iter.pointsTo(str, str_origin) and
iter.pointsTo(seq, seq_origin) and
has_string_type(str) and

View File

@@ -15,7 +15,7 @@
import python
predicate loop_variable_ssa(For f, Variable v, SsaVariable s) {
f.getTarget().getAFlowNode() = s.getDefinition() and v = s.getVariable()
s.getDefinition().getNode() = f.getTarget() and v = s.getVariable()
}
predicate variableUsedInNestedLoops(For inner, For outer, Variable v, Name n) {

View File

@@ -16,7 +16,7 @@ private import LegacyPointsTo
from For loop, ControlFlowNodeWithPointsTo iter, Value v, ClassValue t, ControlFlowNode origin
where
loop.getIter().getAFlowNode() = iter and
iter.getNode() = loop.getIter() and
iter.pointsTo(_, v, origin) and
v.getClass() = t and
not t.isIterable() and

View File

@@ -24,11 +24,13 @@ predicate func_with_side_effects(Expr e) {
}
predicate call_with_side_effect(Call e) {
e.getAFlowNode() =
API::moduleImport("subprocess")
.getMember(["call", "check_call", "check_output"])
.getACall()
.asCfgNode()
exists(ControlFlowNode eCfg | eCfg.getNode() = e |
eCfg =
API::moduleImport("subprocess")
.getMember(["call", "check_call", "check_output"])
.getACall()
.asCfgNode()
)
}
predicate probable_side_effect(Expr e) {

View File

@@ -133,7 +133,11 @@ class ListComprehensionDeclaration extends ListComp {
major_version() = 2 and
this.getIterationVariable(_).getId() = result.getId() and
result.getScope() = this.getScope() and
this.getAFlowNode().strictlyReaches(result.getAFlowNode()) and
exists(ControlFlowNode thisCfg, ControlFlowNode resultCfg |
thisCfg.getNode() = this and resultCfg.getNode() = result
|
thisCfg.strictlyReaches(resultCfg)
) and
result.isUse()
}

View File

@@ -13,18 +13,20 @@
import python
import Definition
from ListComprehensionDeclaration l, Name use, Name defn
from ListComprehensionDeclaration l, Name use, Name defn, ControlFlowNode lCfg, ControlFlowNode useCfg
where
use = l.getALeakedVariableUse() and
defn = l.getDefinition() and
l.getAFlowNode().strictlyReaches(use.getAFlowNode()) and
lCfg.getNode() = l and
useCfg.getNode() = use and
lCfg.strictlyReaches(useCfg) and
/* Make sure we aren't in a loop, as the variable may be redefined */
not use.getAFlowNode().strictlyReaches(l.getAFlowNode()) and
not useCfg.strictlyReaches(lCfg) and
not l.contains(use) and
not use.deletes(_) and
not exists(SsaVariable v |
v.getAUse() = use.getAFlowNode() and
not v.getDefinition().strictlyDominates(l.getAFlowNode())
v.getAUse() = useCfg and
not v.getDefinition().strictlyDominates(lCfg)
)
select use,
use.getId() + " may have a different value in Python 3, as the $@ will not be in scope.", defn,

View File

@@ -26,8 +26,11 @@ private Stmt loop_probably_defines(Variable v) {
/** Holds if the variable used by `use` is probably defined in a loop */
predicate probably_defined_in_loop(Name use) {
exists(Stmt loop | loop = loop_probably_defines(use.getVariable()) |
loop.getAFlowNode().strictlyReaches(use.getAFlowNode())
exists(Stmt loop, ControlFlowNode loopCfg, ControlFlowNode useCfg |
loop = loop_probably_defines(use.getVariable()) and
loopCfg.getNode() = loop and
useCfg.getNode() = use and
loopCfg.strictlyReaches(useCfg)
)
}

View File

@@ -24,8 +24,8 @@ predicate multiply_defined(AstNode asgn1, AstNode asgn2, Variable v) {
forex(Definition def, Definition redef |
def.getVariable() = v and
def = asgn1.getAFlowNode() and
redef = asgn2.getAFlowNode()
def.getNode() = asgn1 and
redef.getNode() = asgn2
|
def.isUnused() and
def.getARedef() = redef and

View File

@@ -88,7 +88,9 @@ predicate implicit_repeat(For f) {
* E.g. gets `x` from `{ y for y in x }`.
*/
ControlFlowNode get_comp_iterable(For f) {
exists(Comp c | c.getFunction().getStmt(0) = f | c.getAFlowNode().getAPredecessor() = result)
exists(Comp c, ControlFlowNode cCfg |
c.getFunction().getStmt(0) = f and cCfg.getNode() = c and cCfg.getAPredecessor() = result
)
}
from For f, Variable v, string msg

View File

@@ -19,9 +19,10 @@ private predicate loop_entry_variables(EssaVariable pred, EssaVariable succ) {
private predicate loop_entry_edge(BasicBlock pred, BasicBlock loop) {
pred = loop.getAPredecessor() and
pred = loop.getImmediateDominator() and
exists(Stmt s |
exists(Stmt s, ControlFlowNode sCfg |
loop_probably_executes_at_least_once(s) and
s.getAFlowNode().getBasicBlock() = loop
sCfg.getNode() = s and
sCfg.getBasicBlock() = loop
)
}

View File

@@ -27,7 +27,7 @@ predicate guarded_against_name_error(Name u) {
|
globals.getFunc().(Name).getId() = "globals" and
guard.controls(controlled, _) and
controlled.contains(u.getAFlowNode())
exists(ControlFlowNode uCfg | uCfg.getNode() = u | controlled.contains(uCfg))
)
}
@@ -101,18 +101,18 @@ predicate undefined_use(Name u) {
}
private predicate first_use_in_a_block(Name use) {
exists(GlobalVariable v, BasicBlock b, int i |
i = min(int j | b.getNode(j).getNode() = v.getALoad()) and b.getNode(i) = use.getAFlowNode()
exists(GlobalVariable v, BasicBlock b, int i, ControlFlowNode useCfg | useCfg.getNode() = use |
i = min(int j | b.getNode(j).getNode() = v.getALoad()) and b.getNode(i) = useCfg
)
}
predicate first_undefined_use(Name use) {
undefined_use(use) and
exists(GlobalVariable v | v.getALoad() = use |
exists(GlobalVariable v, ControlFlowNode useCfg | v.getALoad() = use and useCfg.getNode() = use |
first_use_in_a_block(use) and
not exists(ControlFlowNode other |
other.getNode() = v.getALoad() and
other.getBasicBlock().strictlyDominates(use.getAFlowNode().getBasicBlock())
other.getBasicBlock().strictlyDominates(useCfg.getBasicBlock())
)
)
}

View File

@@ -18,8 +18,8 @@ private import semmle.python.types.ImportTime
/* Local variable part */
predicate initialized_as_local(PlaceHolder use) {
exists(SsaVariableWithPointsTo l, Function f |
f = use.getScope() and l.getAUse() = use.getAFlowNode()
exists(SsaVariableWithPointsTo l, Function f, ControlFlowNode useCfg |
f = use.getScope() and useCfg.getNode() = use and l.getAUse() = useCfg
|
l.getVariable() instanceof LocalVariable and
not l.maybeUndefined()

View File

@@ -54,7 +54,7 @@ predicate unused_global(Name unused, GlobalVariable v) {
u.uses(v)
|
// That is reachable from this definition, directly
defn.strictlyReaches(u.getAFlowNode())
exists(ControlFlowNode uCfg | uCfg.getNode() = u | defn.strictlyReaches(uCfg))
or
// indirectly
defn.getBasicBlock().reachesExit() and u.getScope() != unused.getScope()

View File

@@ -48,15 +48,17 @@ class Symbol extends TSymbol {
AstNode find() {
this = TModule(result)
or
exists(Symbol s, string name | this = TMember(s, name) |
exists(Symbol s, string name, ControlFlowNode resultCfg |
this = TMember(s, name) and resultCfg.getNode() = result
|
exists(ClassObject cls |
s.resolvesTo() = cls and
cls.attributeRefersTo(name, _, result.getAFlowNode())
cls.attributeRefersTo(name, _, resultCfg)
)
or
exists(ModuleObject m |
s.resolvesTo() = m and
m.attributeRefersTo(name, _, result.getAFlowNode())
m.attributeRefersTo(name, _, resultCfg)
)
)
}

View File

@@ -80,10 +80,11 @@ class VersionGuard extends ConditionBlock {
VersionGuard() { this.getLastNode() instanceof VersionTest }
}
from ImportExpr ie
from ImportExpr ie, ControlFlowNode ieCfg
where
ieCfg.getNode() = ie and
not ie.(ExprWithPointsTo).refersTo(_) and
exists(Context c | c.appliesTo(ie.getAFlowNode())) and
exists(Context c | c.appliesTo(ieCfg)) and
not ok_to_fail(ie) and
not exists(VersionGuard guard | guard.controls(ie.getAFlowNode().getBasicBlock(), _))
not exists(VersionGuard guard | guard.controls(ieCfg.getBasicBlock(), _))
select ie, "Unable to resolve import of '" + ie.getImportedModuleName() + "'."

View File

@@ -11,13 +11,13 @@ import python
import semmle.python.pointsto.PointsTo
predicate points_to_failure(Expr e) {
exists(ControlFlowNode f | f = e.getAFlowNode() | not PointsTo::pointsTo(f, _, _, _))
exists(ControlFlowNode f | f.getNode() = e | not PointsTo::pointsTo(f, _, _, _))
}
predicate key_points_to_failure(Expr e) {
points_to_failure(e) and
not points_to_failure(e.getASubExpression()) and
not exists(SsaVariable ssa | ssa.getAUse() = e.getAFlowNode() |
not exists(SsaVariable ssa, ControlFlowNode eCfg | eCfg.getNode() = e and ssa.getAUse() = eCfg |
points_to_failure(ssa.getAnUltimateDefinition().getDefinition().getNode())
) and
not exists(Assign a | a.getATarget() = e)

View File

@@ -12,5 +12,5 @@ import python
private import LegacyPointsTo
from Expr e
where exists(ControlFlowNodeWithPointsTo f | f = e.getAFlowNode() | not f.refersTo(_))
where exists(ControlFlowNodeWithPointsTo f | f.getNode() = e | not f.refersTo(_))
select e, "Expression does not 'point-to' any object."

View File

@@ -131,7 +131,7 @@ module ModificationOfParameterWithDefault {
exists(DeletionNode d | d.getTarget().(SubscriptNode).getObject() = this.asCfgNode())
or
// augmented assignment to the value
exists(AugAssign a | a.getTarget().getAFlowNode() = this.asCfgNode())
exists(AugAssign a | this.asCfgNode().getNode() = a.getTarget())
or
// modifying function call
exists(DataFlow::CallCfgNode c, DataFlow::AttrRead a | c.getFunction() = a |

View File

@@ -10,6 +10,7 @@
*/
import python
private import semmle.python.controlflow.internal.Cfg as Cfg
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
@@ -19,14 +20,14 @@ private import semmle.python.Concepts
DataFlow::Node shouldBeTainted() {
exists(DataFlow::CallCfgNode call |
call.getFunction().asCfgNode().(NameNode).getId() = "ensure_tainted" and
call.getFunction().asCfgNode().(Cfg::NameNode).getId() = "ensure_tainted" and
result in [call.getArg(_), call.getArgByName(_)]
)
}
DataFlow::Node shouldNotBeTainted() {
exists(DataFlow::CallCfgNode call |
call.getFunction().asCfgNode().(NameNode).getId() = "ensure_not_tainted" and
call.getFunction().asCfgNode().(Cfg::NameNode).getId() = "ensure_not_tainted" and
result in [call.getArg(_), call.getArgByName(_)]
)
}
@@ -36,13 +37,13 @@ DataFlow::Node shouldNotBeTainted() {
module Conf {
module TestTaintTrackingConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source.asCfgNode().(NameNode).getId() in [
source.asCfgNode().(Cfg::NameNode).getId() in [
"TAINTED_STRING", "TAINTED_BYTES", "TAINTED_LIST", "TAINTED_DICT"
]
or
// User defined sources
exists(CallNode call |
call.getFunction().(NameNode).getId() = "taint" and
exists(Cfg::CallNode call |
call.getFunction().(Cfg::NameNode).getId() = "taint" and
source.(DataFlow::CfgNode).getNode() = call.getAnArg()
)
or

Some files were not shown because too many files have changed in this diff Show More