Commit Graph

4169 Commits

Author SHA1 Message Date
yoff
635134ffd2 Python: treat augmented-assignment targets as both load and store
The legacy CFG emitted two ControlFlowNodes for `x[i] += 42` (one load,
one store, with `load.strictlyDominates(store)`). The new CFG collapses
them to a single canonical node, mirroring Java's single-`VarAccess`
model where `isVarRead`/`isVarWrite` are non-disjoint on the same
expression. Reconcile two legacy two-node behaviours with the merged
single-node world:

1. `Cfg::ControlFlowNode.isLoad()` no longer excludes augmented
   targets — both `isLoad` and `isStore` hold on the merged canonical
   node, matching Java. `NameNode.defines` drops the now-redundant
   `not isLoad` guard; `Py::Name.defines` already filters by
   `isDefinition` (Store/Param/AugAssign-target ctx).

2. `LocalFlow::definitionFlowStep` is restricted to NameNode targets,
   matching legacy ESSA's `assignment_definition` which required
   `defn.(NameNode).defines(v)`. Subscript and attribute writes
   (`x[i] = 42`, `obj.attr = 42`) no longer emit a local-flow step
   *into* the LHS expression — that flow is handled by the AttrWrite
   and content-flow machinery. This is essential for keeping augmented
   Subscript/Attribute targets classifiable as `LocalSourceNode` on
   the read side, which the API graph requires for emitting Use edges.

`StoreLoadTest.ql` is updated to filter `isAugLoad` out of the regular
`load` tag, mirroring the pre-existing `not isAugStore` filter on the
`store` tag so augmented-assignment expectations remain
`augload=n augstore=n` (not also `load=n store=n`).

Closes the three remaining ApiGraphs library-test failures
(`getSubscript.ql` semantically, plus cosmetic toString updates in
`ModuleImportWithDots.ql` and `test_crosstalk.ql`).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 10:56:21 +00:00
yoff
dd2deacd00 Python: model from X import * as uncertain SSA writes
Add a 4th disjunct to `SsaImplInput::variableWrite` in the shared-SSA
adapter that mirrors legacy ESSA's `ImportStarRefinement`: every
variable whose scope is the import-star's scope, OR which is used in
the import-star's scope, gets an uncertain write at the `import *`
position.

Uncertain writes do not kill prior definitions; shared SSA's
`SsaUncertainWrite` joins the new value with the immediately-preceding
definition via `uncertainWriteDefinitionInput`. This is the equivalent
of legacy ESSA's two-input refinement.

Cannot depend on `ImportStar` / `ImportResolution` (those modules
import `SsaImpl`), so the predicate uses the structural heuristic on
`Cfg::ImportStarNode` directly.

This closes the two remaining failing dataflow library-tests:

- `import-star/global` — `module_export` chains via `from X import *`
  re-exports now resolve: the importing module has an SSA def of every
  re-exported name, so `lastUseVar` finds the read at the use site.
- `typetracking_imports/highlight_problem` — a direct `from .foo import
  foo` immediately followed by `from .other import *` is now correctly
  marked as dead at the direct import.

Two scope-entry-def noise rows in `highlight_problem.expected` are also
dropped — legacy ESSA needed them as refinement inputs, but shared SSA
handles uncertain writes without an explicit prior def. They were
always tagged `no use to normal exit` (dead).

Dataflow library-tests: 62/64 → 64/64 passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 10:09:04 +00:00
yoff
6d930099d4 Python: update dataflow tests for new CFG + shared SSA
Test-side changes accompanying the dataflow migration:

  * Test queries (.ql) and shared test harness (TestSummaries,
    TestTaintLib) qualify CFG / SSA types with Cfg:: / SsaImpl::,
    bridge via AST (Name, Call, ...) instead of legacy NameNode /
    CallNode, and switch GlobalSsaVariable / EssaVariable usages
    to the new adapter API.

  * .expected files updated for legitimate precision and toString
    changes:
      - phi-node def-use edges newly exposed in def_use_counts.
      - scope-exit synthetic use surfaces one extra implicit use
        in use-use-counts.
      - For [empty]/[non-empty] outcome rows added in
        EnclosingCallable.
      - SsaSourceVariable / Global Variable label cosmetics
        normalised throughout.

  * Inline annotations:
      - typetracking/test.py: removed MISSING:tracked on lines
        93/95 (now found), added SPURIOUS:tracked on line 108
        (decorator over-reach).
      - global-flow/test.py: added SPURIOUS writes=g_mod on line
        20 (correctly reports immediately-overwritten write).
      - tainttracking/customSanitizer/test.py: marked
        try/except: ensure_tainted(s) cases as MISSING: tainted
        (no-raise CFG abstraction does not connect try body to
        except body).
      - coverage/test.py: marked
        SINK(return_from_inner_scope([])) as
        MISSING: flow=... pending closer investigation.

  * regression/{dataflow,custom_dataflow}.expected: accept two
    if/else cond-correlation over-reaches (documented limitation;
    same imprecision applies under legacy semantics by design).

After this change the dataflow library-tests stand at 62 of 64
passing; the two remaining failures are tracked under the
ImportStarRefinement workstream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 07:21:06 +00:00
yoff
ccfaa6ea7f Python: migrate dataflow library to new CFG + shared SSA
Switches the trunk dataflow library and all in-tree consumers
(frameworks, ApiGraphs, Concepts, regexp, security customisations,
test harness) from the legacy Flow.qll/ESSA stack to the new
shared-CFG facade (Cfg.qll) and the ESSA-shaped adapter on the
shared-SSA library (SsaImpl.qll).

Highlights:

  * DataFlowPublic/Private/Dispatch, Attributes, VariableCapture,
    IterableUnpacking, ImportResolution, ImportStar, LocalSources,
    TaintTrackingPrivate, MatchUnpacking, TypeTrackingImpl,
    SsaImpl, Builtins all now qualify CFG/SSA references with
    Cfg:: / SsaImpl:: and stop pulling in semmle.python.essa.*.

  * AstNodeImpl.qll/Cfg.qll: ImportMember exposes its inner
    ImportExpr, DefinitionNode.getValue covers Alias / AnnAssign /
    AugAssign / AssignExpr / For-target / Parameter-default,
    ForNode is treated as an expression node, AnnotatedExitNode is
    canonical, and BoolExprNode.getAnOperand drops the dominance
    constraint that did not hold for short-circuit BBs.

  * SsaImpl.qll: parameters always get a ParameterDefinition (so
    unused parameters still have SSA defs), scope-entry defs for
    module globals require an actual store somewhere, scope-exit
    has a synthetic use so reaching-defs survives to module
    boundary, and the legacy SsaSourceVariable / EssaVariable
    surface (getName, getScope, getAUse, getASourceUse,
    getAnImplicitUse) is reinstated for downstream queries.

  * DataFlowPublic.qll: GuardNode redesigned around the new
    structural outcome nodes (isAfterTrue / isAfterFalse).  The
    legacy ConditionBlock + flipped indirection is gone;
    controlsBlock walks UP through 'not' / '==True' / 'is False'
    etc. via outcomeOfGuard, accumulating polarity cleanly.  Only
    BarrierGuard<...> is preserved as public API.

  * ModuleVariableNode.getAWrite and LocalFlow::definitionFlowStep
    bypass SSA and consult Cfg::NameNode.defines /
    Cfg::DefinitionNode.getValue directly, so that write defs
    pruned by shared SSA (because the variable has no in-scope
    read) still produce dataflow steps.

  * Frameworks + downstream consumers: replace
    EssaVariable.hasDefiningNode, getAReturnValueFlowNode,
    Parameter.getDefault, Scope.getEntryNode / getANormalExit etc.
    with CFG-side bridges through Cfg::ControlFlowNode.

The legacy Flow.qll / Essa.qll stack is untouched and remains
available for queries that import it directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 07:20:44 +00:00
yoff
7521d838c0 Python: SSA: handle closure variables via per-scope entry defs
The new SSA's implicit entry-def predicate previously placed entries in
the variable's defining scope. For closure variables that's the outer
function, so inner functions had no entry def for the captured
variable — reads in the inner scope failed to resolve to any
definition.

Mirrors legacy ESSA's 'NonLocalVariable.getScopeEntryDefinition()':
place an implicit entry def at every reading scope's entry block,
independently of where the variable is *defined*. A closure variable
accessed in two nested functions and the outer one gets three entry
defs (one per reading scope).

Also makes 'ScopeEntryDefinition' extend 'EssaNodeDefinition' (matching
legacy ESSA), with 'getDefiningNode()' returning the scope's entry CFG
node. This requires extending the private 'writeDefNode' helper to
project i=-1 entries to bb.getNode(0).

Updates the new-vs-legacy comparison snapshot: closure-variable reads
('x:32:5'), nested global reads ('GLOBAL:52:1') now resolve. New
'def-only-new' entries appear for unbound names ('sum', 'open',
'compute') — the new SSA uniformly creates scope-entry defs for all
non-local reads, including those that legacy ESSA classifies as
builtin and excludes. This is a more uniform semantic and arguably
cleaner.

Updates the SsaTest 'some_undefined' annotation: previously documented
as a known limitation, now correctly resolves to a scope-entry def.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-19 12:06:15 +00:00
yoff
c4674bca14 Python: extend new SSA with ESSA-shaped adapter + baseline comparison test
Phase 0.5 - Adapter API on top of the shared SSA:

Adds the legacy-ESSA-shaped class hierarchy that the dataflow library
consumes, layered on the shared 'Ssa::Make' instantiation:

  * EssaDefinition / EssaNodeDefinition: the latter exposes
    'getDefiningNode()' (the CFG node at the def's index in its BB)
    and 'getVariable()' / 'getScope()'.
  * AssignmentDefinition: matches Assign, AnnAssign with value,
    AssignExpr and AugAssign target Names. Exposes 'getValue()'
    pointing at the RHS' CFG node.
  * ParameterDefinition: matches when the defining Name is in
    parameter context.
  * WithDefinition: matches 'with ... as x:' bindings.
  * ScopeEntryDefinition: implicit entry defs at synthetic position
    '-1' of the scope's entry basic block (non-local / global /
    builtin / captured reads).
  * PhiFunction (alias for PhiNode).
  * EssaVariable adapter wrapping a 'Ssa::Definition' with 'getAUse()',
    'getDefinition()', 'getAnUltimateDefinition()', and 'getName()'.
  * AdjacentUses module with 'firstUse' and 'adjacentUseUse' predicates
    bridging to 'Ssa::firstUse' / 'Ssa::adjacentUseUse'.

This is the minimum API the new dataflow's internals call into. The
richer legacy ESSA (refinement nodes, attribute refinements, edge
refinements) stays in 'semmle.python.essa.Essa' for legacy code.

Phase 0.6 - Comparison test:

Adds 'dataflow-new-ssa-vs-legacy/CmpTest.ql' that snapshots the
difference between definitions produced by new SSA vs legacy ESSA on
the same Python source. Baseline output records the current
'def-only-old' mismatches, grouped by category:

  * function/class/global definitions with no in-scope read (intentional;
    SSA is liveness-pruned)
  * captured / closure variables (real gap in new SSA - no
    closure-capture handling yet)
  * module variables __name__ / __package__ / $ (legacy ESSA implicit
    bindings)
  * exception 'as' bindings (depend on raise modelling)

Zero 'def-only-new' mismatches: the new SSA never produces a spurious
definition compared to legacy ESSA on this corpus.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 21:13:57 +00:00
yoff
87b2e2fb0f Python: fix augstore for the new CFG and add store/load test
In the legacy CFG the same Python 'Name' that is the target of an
augmented assignment has two distinct CFG nodes — a load node (context
3) earlier in the basic block and a store node (context 5) later.
'augstore(load, store)' relates the pair via dominance.

The new (shared) CFG canonicalises each AST expression to a single
CFG node, so 'load' and 'store' collapse to one. The dominance-based
'augstore' from the legacy implementation no longer holds (it would
require 'load.strictlyDominates(load)'), so 'isAugLoad' / 'isAugStore'
never fired and 'isStore' missed the AugAssign target entirely.

Redefines 'augstore' as reflexive on the AugAssign target's canonical
CFG node. With this change:

  * isAugLoad / isAugStore both fire on the single canonical node.
  * isStore fires (via 'or augstore(_, this)') — matching the legacy
    classification that an augmented-assignment target is a store.
  * isLoad does not fire (excluded by 'not augstore(_, this)').

Adds 'python/ql/test/library-tests/ControlFlow/store-load/' covering
plain load/store/delete, parameters, augmented assignment, tuple
unpacking, attribute and subscript stores. The test asserts the
classification directly on the new-CFG facade.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 12:29:37 +00:00
yoff
2bb8e90476 Python: introduce shared-SSA adapter on the new CFG
Adds 'python/ql/lib/semmle/python/dataflow/new/internal/SsaImpl.qll', a
minimal Python SSA implementation built on the shared SSA library
('codeql.ssa.Ssa::Make<Location, Cfg, Input>'). The structure mirrors
Java's adapter at 'java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll'.

Key design choices:

  * 'SourceVariable' wraps 'Py::Variable'. Only variables that are read
    or deleted somewhere are tracked - write-only variables don't
    benefit from SSA construction.

  * Variable references are positional ('BasicBlock', 'int') pairs
    looked up via 'Cfg::NameNode.defines'/'.uses'/'.deletes' (which
    themselves are one-line bridges to AST-level 'Name.defines' etc.).

  * Parameter writes are not synthesised: parameter Name nodes are
    already wired into the CFG (per the earlier C#-style parameter
    extension in 'AstNodeImpl.qll'), so the regular 'variableWrite'
    path handles them at their natural CFG index.

  * Non-local / captured / global / builtin variables read in a scope
    but not written in it receive a synthetic entry definition at
    index '-1' of the scope's entry basic block. This matches Java's
    'hasEntryDef'.

  * 'del x' is modelled as a certain write at the deletion site.

Includes an inline-expectations test under
'python/ql/test/library-tests/dataflow-new-ssa/' covering:
plain parameter pass-through, simple assignment + read, reassignment
with dead-write pruning, if/else with phi insertion at the join, and
an undefined-name read (currently a known limitation - no SSA flow
without an enclosing definition).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 11:03:45 +00:00
yoff
558cd5b00c Python: test dead bindings under no-raise CFG abstraction
Adds 'dead_under_no_raise.py' to the bindings test suite, capturing the
three CPython patterns where bindings legitimately have no CFG node
because the surrounding code is unreachable under the 'no expressions
raise' abstraction:

  1. Statements after a 'try: return X; except: pass' block.
  2. The 'else:' clause of a try whose body always raises.
  3. Cache-lookup pattern 'try: return cache[k]; except: pass' followed
     by computation and store.

These bindings intentionally carry no 'cfgdefines=' annotations. If
raise modelling is later added to the CFG, the BindingsTest will surface
the new CFG nodes as unexpected results and this file will need to be
revisited.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 10:48:43 +00:00
yoff
77149be759 Python: wire PEP 695 type parameters into the shared CFG (green)
Adds CFG coverage for the binding 'Name's introduced by PEP 695
type-parameter syntax on functions, classes, and 'type' aliases:

  def func[T](...): ...
  class Box[T]: ...
  def multi[T: int, *Ts, **P](...): ...
  type Alias[T] = ...

For each parametrised AST node, the type-parameter names (and, for
'type' aliases, the alias name itself) are added as children of the
enclosing CFG node so that 'Name.defines(v)' has a corresponding
position. Bounds and defaults are intentionally not wired (they have
no SSA-relevant semantics for our purposes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 09:58:27 +00:00
Copilot
aa8402d1b7 Python: wire match-pattern bindings into the shared CFG (green)
Adds concrete `Pattern` subclasses in `AstNodeImpl.qll` for every
`MatchPattern` AST kind, with `getChild` overrides that expose
sub-patterns and bound Names. Specifically:

- MatchCapturePattern (`case x:`) -> getVariable()
- MatchAsPattern (`case … as v:`) -> getPattern(), getAlias()
- MatchStarPattern (`case [*rest]:`) -> getTarget()
- MatchSequencePattern (`case [a, b]:`) -> getPattern(i)
- MatchClassPattern (`case Cls(p, q, k=v)`) -> getClass(), positional, keyword
- MatchMappingPattern (`case {k: v}:`) -> getMapping(i)
- MatchKeyValuePattern, MatchKeywordPattern, MatchDoubleStarPattern
- MatchOrPattern, MatchLiteralPattern, MatchValuePattern

Without these, every Name bound by a match pattern lacked a CFG node.
Removes the corresponding MISSING: annotations from match_pattern.py
(all 11 cases).

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 13:11:18 +00:00
Copilot
b8c093eefb Python: wire import-statement bindings into the shared CFG (green)
Adds `ImportStmt` and `ImportStarStmt` wrappers in `AstNodeImpl.qll`.
For each `Alias` in an import statement, both the value (module/member
expression) and the bound `asname` Name become children of the CFG node
for the import statement, in evaluation order.

Without this, every `Name` introduced by `import` / `from .. import ..`
lacked a CFG node, even though `Name.defines(v)` returns true for it on
the AST side. This was the highest-volume gap: 20,332 missing import
aliases across CPython.

Removes the corresponding MISSING: annotations from imports.py.

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:35:47 +00:00
Copilot
0742e1e901 Python: wire parameters into the shared CFG (C# pattern)
Implements `AstSig::Parameter` and `callableGetParameter(c, i)` in
`AstNodeImpl.qll`, following the C# template
(`csharp/.../ControlFlowGraph.qll:147-156`) rather than Java's
`Parameter() { none() }`.

Each Python parameter (positional, *args, keyword-only, **kwargs) now
becomes a CFG node at a stable position in the enclosing callable's
entry sequence. Defaults still evaluate at function-definition time
via `FunctionDefExpr.getDefault` / `LambdaExpr.getDefault`, so
`Parameter::getDefaultValue()` returns `none()` (the shared CFG
library calls this to model the missing-argument fallback, which
Python does not surface at the CFG level).

The bindings test now exercises parameters (the `py_expr_contexts(_, 4, ...)`
exclusion has been removed). A new `parameters.py` test case covers
positional, defaulted, vararg, kwarg, keyword-only, kitchen-sink,
method (self/cls), lambda, and PEP 570 positional-only parameters.
Several other test files were updated to annotate parameters that the
test had previously hidden (synthetic `.0` comprehension parameter,
method `self`, decorator `f`, etc.).

Verified:
- All 24 ControlFlow/evaluation-order tests still pass.
- CFG consistency query (`python/ql/consistency-queries/CfgConsistency.ql`)
  shows zero violations on CPython.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:29:28 +00:00
Copilot
a20af3cf41 Python: wire AnnAssign into the shared CFG (green)
Adds an `AnnAssignStmt` wrapper in `AstNodeImpl.qll` so that PEP 526
annotated assignments (`x: int = 1`, `x: int`) participate in the
control flow graph. Evaluation order follows CPython: annotation,
optional value, target binding.

Without this, `x: int = 1` had no CFG node for `x` even though
`Name.defines(v)` returns true for it on the AST side. SSA built on
the new CFG would therefore miss every annotated-assignment write.

Removes the corresponding MISSING: annotations from the CFG-binding
gap test:
- annassign.py — all four cases now green.
- match_pattern.py — class-body annotated fields (`x: int`, `y: int`).
- type_params.py — `item: T` inside class.

Verified: all 24 ControlFlow/evaluation-order tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:22:57 +00:00
Copilot
f077e7c27c Python: add CFG-binding gap tests (red)
Adds inline-expectation tests for the new shared CFG implementation in
python/ql/lib/semmle/python/controlflow/internal/AstNodeImpl.qll,
covering every Python binding construct that introduces a variable.

The test files use MISSING: annotations to record bindings whose
defining Name AST node is *not* currently reachable from the new CFG.
These are the 'red' half of red-green commit pairs: subsequent commits
will extend AstNodeImpl to cover each construct and remove the
corresponding MISSING: marker.

Confirmed-broken categories:
- Import aliases (from x import a)
- Annotated assignment (x: int = 1)
- Exception handler (except E as e)
- Match patterns (case x, case [a,b], case ... as v)
- PEP 695 type params (def f[T], class C[T])

Confirmed-working (no MISSING:):
- Compound targets, with-as, comprehensions, decorated def/class,
  walrus, starred.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 12:19:34 +00:00
Copilot
76724c5391 Shared CFG: support for-else and while-else loops
Add two default predicates to AstSig:

  default AstNode getWhileElse(WhileStmt loop) { none() }
  default AstNode getForeachElse(ForeachStmt loop) { none() }

When defined, the explicit-step rules for While/Do and Foreach
route the loop's normal-completion exits through the else block
before reaching the after-loop node:

  - WhileStmt: after-false condition -> before-else -> after-while
    (instead of directly after-while).
  - ForeachStmt: after-collection [empty] and the LoopHeader exit
    are both routed through before-else -> after-foreach.

Python's Ast module overrides the predicates to return the
synthetic BlockStmt for the orelse slot, replacing the previous
customisations in Input::step. This eliminates parallel direct
successors emitted by the previous Python-side step additions
(verified: multipleSuccessors on a CPython database goes from
1340 to 0).

Java and C# CFG tests are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 15:21:43 +00:00
Taus
a33b49a3f3 WIP2 2026-05-05 15:21:42 +00:00
Taus
1af415bec3 WIP 2026-05-05 15:21:42 +00:00
Taus
3be562929a Python: Ignore synthetic CFG nodes
We can only annotate the ones that correspond directly to AST nodes
anyway.

Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:41 +00:00
Taus
2ed75e7ca7 Python: Instantiate CFG tests with new CFG library
Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:40 +00:00
Taus
4582855de1 Python: Make CFG tests parameterised
Currently we only instantiate them with the old CFG library, but in the
future we'll want to do this with the new library as well.

Co-authored-by: yoff <yoff@github.com>
2026-05-05 15:21:40 +00:00
Taus
ba29e7e34d Python: Add ConsecutiveTimestamps test
This one is potentially a bit iffy -- it checks for a very powerful
propetry (that implies many of the other queries), but as the test
results show, it can produce false positives when there is in fact no
problem. We may want to get rid of it entirely, if it becomes too noisy.
2026-05-05 15:21:40 +00:00
Taus
f97bf38f3b Python: Add NeverReachable test
This looks for nodes annotated with `t.never` in the test that are
reachable in the CFG. This should not happen (it messes with various
queries, e.g. the "mixed returns" query), but the test shows that in a
few particular cases (involving the `match` statement where all cases
contain `return`s), we _do_ have reachable nodes that shouldn't be.
2026-05-05 15:21:40 +00:00
Taus
a8d136d3d6 Python: Add BasicBlockOrdering test
This one demonstrates a bug in the current CFG. In a dictionary
comprehension `{k: v for k, v in d.items()}`, we evaluate the value
before the key, which is incorrect. (A fix for this bug has been
implemented in a separate PR.)
2026-05-05 15:21:40 +00:00
Taus
710a43ac7f Python: Add some CFG-validation queries
These use the annotated, self-verifying test files to check various
consistency requirements.

Some of these may be expressing the same thing in different ways, but
it's fairly cheap to keep them around, so I have not attempted to
produce a minimal set of queries for this.
2026-05-05 15:21:40 +00:00
Taus
3402d0eaeb Python: Add self-validating CFG tests
These tests consist of various Python constructions (hopefully a
somewhat comprehensive set) with specific timestamp annotations
scattered throughout. When the tests are run using the Python 3
interpreter, these annotations are checked and compared to the "current
timestamp" to see that they are in agreement. This is what makes the
tests "self-validating".

There are a few different kinds of annotations: the basic `t[4]` style
(meaning this is executed at timestamp 4), the `t.dead[4]` variant
(meaning this _would_ happen at timestamp 4, but it is in a dead
branch), and `t.never` (meaning this is never executed at all).

In addition to this, there is a query, MissingAnnotations, which checks
whether we have applied these annotations maximally. Many expression
nodes are not actually annotatable, so there is a sizeable list of
excluded nodes for that query.
2026-05-05 15:21:39 +00:00
Josef Svenningsson
68be006a29 Merge pull request #21641 from github/josefs/promptInjectionImprovements
Improve prompt inject for Python
2026-04-29 11:23:52 +01:00
Josef Svenningsson
25a8aa97b2 Fix openai prompt injection tests 2026-04-28 18:24:26 +01:00
Josef Svenningsson
a05e191518 Add tests for anthropic prompt injection models 2026-04-28 18:24:22 +01:00
Josef Svenningsson
e069c9c2ee Fix tests 2026-04-28 18:24:19 +01:00
Taus
ac23e16786 Python: Move Python 3.15 data-flow tests to a separate file
We won't be able to run these tests until Python 3.15 is actually out
(and our CI is using it), so it seemed easiest to just put them in their
own test directory.
2026-04-17 13:16:46 +00:00
Taus
dc36609743 Python: Add data-flow tests
Alas, all these demonstrate is that we already don't fully support the
desugared `yield from` form.
2026-04-17 12:15:04 +00:00
Taus
8b1ecf05c9 Python: Update test output
This change reflects the `(value, key)` to `(key, value)` fix in an
earlier commit.
2026-04-14 13:27:31 +02:00
Taus
de900fc3b5 Python: Add QL test for comprehensions with unpacking 2026-04-14 13:27:31 +02:00
Taus
c748fdf8ee Merge pull request #21694 from github/tausbn/python-add-support-for-pep-810
Python: Add support for PEP 810
2026-04-14 13:27:08 +02:00
Taus
2eeb31b472 Python: Add tests for lazy from ... import * as well 2026-04-13 11:49:06 +00:00
Taus
6b7d47ee7d Python: Add QL test for the new syntax 2026-04-10 14:39:13 +00:00
Taus
e3688444d7 Python: Also exclude class scope
Changing the `locals()` dictionary actually _does_ change the attributes
of the class being defined, so we shouldn't alert in this case.
2026-04-07 23:46:03 +02:00
Taus
16683aee0e Merge pull request #21590 from github/tausbn/python-improve-bind-all-interfaces-query
Python: Improve "bind all interfaces" query
2026-04-07 17:59:48 +02:00
Taus
187f7c7bcf Python: Move isNetworkBind check into isSink 2026-03-27 22:45:26 +00:00
Taus
4f74d421b9 Python: Exclude AF_UNIX sockets from BindToAllInterfaces
Looking at the results of the the previous DCA run, there was a bunch of
false positives where `bind` was being used with a `AF_UNIX` socket (a
filesystem path encoded as a string), not a `(host, port)` tuple. These
results should be excluded from the query, as they are not vulnerable.

Ideally, we would just add `.TupleElement[0]` to the MaD sink, except we
don't actually support this in Python MaD...

So, instead I opted for a more low-tech solution: check that the
argument in question flows from a tuple in the local scope.

This eliminates a bunch of false positives on `python/cpython` leaving
behind four true positive results.
2026-03-27 16:55:10 +00:00
Taus
47d24632e6 Python: Port ShouldUseWithStatement.ql
Only trivial test changes.
2026-03-27 12:34:20 +00:00
Taus
c9832c330a Python: Convert BindToAllInterfaces to path-problem
Now that we're using global data-flow, we might as well make use of the
fact that we know where the source is.
2026-03-26 21:10:43 +00:00
Taus
c439fc5d45 Python: Replace type tracking with global data-flow
This takes care of most of the false negatives from the preceding
commit.

Additionally, we add models for some known wrappers of `socket.socket`
from the `gevent` and `eventlet` packages.
2026-03-26 15:35:33 +00:00
Taus
1ecd9e83b8 Python: Add test cases for BindToAllInterfaces FNs
Adds test cases from github/codeql#21582 demonstrating false negatives:
- Address stored in class attribute (`self.bind_addr`)
- `os.environ.get` with insecure default value
- `gevent.socket` (alternative socket module)
2026-03-26 14:57:24 +00:00
Taus
824d004a27 Python: Convert BindToAllInterfaces test to inline expectations 2026-03-26 14:56:57 +00:00
Taus
1ffcdc9293 Python: Select property instead of function
in PropertyInOldStyleClass. This matches the previous behaviour more
closely.
2026-03-23 14:55:28 +00:00
Taus
3584ad1905 Python: Port DeprecatedSliceMethod.ql
Only trivial test changes.
2026-03-20 13:30:29 +00:00
Taus
283231bdbc Python: Port ShouldBeContextManager.ql
Only trivial test changes.
2026-03-20 13:28:45 +00:00
Taus
8cfdea2001 Python: Port PropertyInOldStyleClass.ql
Only trivial test changes.
2026-03-20 13:28:45 +00:00