Commit Graph

9688 Commits

Author SHA1 Message Date
Copilot
e51acdc541 Python: add new shared-SSA-backed SSA adapter
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.

Library additions:

- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
  implementation built on the new (shared) CFG. Mirrors the Java SSA
  adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
  an InputSig is defined in terms of positional (BasicBlock, int)
  variable references, and the shared
  codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
  instantiated.

  SourceVariable is the AST-level Py::Variable. Variable references
  are looked up via the new CFG facade's NameNode.defines/uses/deletes
  predicates (added in the preceding PR), which themselves are
  one-line bridges to AST-level Name.defines/uses/deletes.

  Implicit-entry definitions are inserted for non-local/global/builtin
  reads, captured variables, and (when needed) parameters.

Test additions:

- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
  representative test corpus and checks expected def/use chains.

- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
  legacy ESSA over the same corpus and diffs the results, so any
  semantic divergence shows up as a test failure.

Production impact:

None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.

Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
  library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 08:40:16 +00:00
yoff
c95c3fe638 Python: model exception edges for raise-prone expressions inside try/with
The new CFG previously only emitted exception edges for explicit `raise`
and `assert` statements. As a result, code that became reachable only
via the exception path of an arbitrary expression (e.g., the body of an
`except` handler following a try-body whose `call()` could raise) was
classified as dead, breaking analyses like StackTraceExposure,
FileNotAlwaysClosed, ExceptionInfo, UseOfExit, and CatchingBaseException.

This commit adds a `mayThrow` predicate over expressions that are known
sources of implicit exceptions in Python (calls, attribute access,
subscripts, arithmetic/comparison operators, imports, await/yield/yield
from) plus `from m import *` at the statement level, and routes them
through the shared CFG's `beginAbruptCompletion(_, _, ExceptionSuccessor,
always=false)` hook.

The set of exception sources is restricted to nodes that are
syntactically inside a `try`/`with` statement in the same scope.
This mirrors Java's `ControlFlowGraph::mayThrow`, which only emits
exception edges where local handling can observe them — outside such
contexts, the edges add CFG complexity (weakening BarrierGuard
precision and breaking SSA continuity around augmented assignments and
subscript stores) without analysis benefit, since exceptions just
propagate to the function exit anyway.

Net effect on the test suite: ~100 alerts restored across the exception-
related query tests (StackTraceExposure +29, ExceptionInfo +17,
FileNotAlwaysClosed +52, UseOfExit +1, CatchingBaseException restored)
with no precision regressions. Affected `.expected` files and the
regression-guard `dead_under_no_raise.py` are updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 08:15:45 +00:00
yoff
6c03194092 Python: add new shared-CFG-backed control flow graph facade (Cfg)
Adds the public facade on top of the AstNodeImpl adapter from the
previous commit. Re-exposes the same API surface as
semmle/python/Flow.qll (ControlFlowNode, CallNode, BasicBlock,
NameNode, DefinitionNode, CompareNode, ...), backed by the shared
codeql.controlflow.ControlFlowGraph library.

- semmle.python.controlflow.internal.Cfg — public facade.
- ControlFlow/store-load/* — basic store/load coverage via the facade.

The new CFG library is added additively: it has zero callers in lib/
and src/, and the legacy CFG in semmle/python/Flow.qll remains the
default. Dataflow, SSA, and production query migration land in
follow-up PRs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 08:15:45 +00:00
yoff
39e6bfc894 Python: add shared-CFG AstSig adapter (AstNodeImpl)
Preparatory refactor for the shared-CFG dataflow migration. Adds the
adapter that mediates between the Python AST and the shared
codeql.controlflow.ControlFlowGraph signature, plus the test suites
that validate the new CFG directly against this adapter. The public
facade is added in the following commit.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — wraps Python's
  Stmt/Expr/Scope/Pattern and adds two synthetic kinds of node
  (BlockStmt for body slots, intermediate nodes for multi-operand
  boolean expressions) to satisfy the shared CFG signature.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Test additions (all driven directly off AstNodeImpl):

- ControlFlow/bindings/* — annotation-driven SSA-binding tests
  (annassign, compound, comprehension, decorated, except_handler,
  imports, match_pattern, parameters, simple, type_params,
  walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 08:15:33 +00:00
yoff
7d95024487 Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-23 12:36:53 +02:00
Copilot
06fa46f664 Python: qualify Flow.qll's AST references with Py:: prefix
Preparatory refactor for the shared-CFG dataflow migration. Switches
'import python' to 'import python as Py' inside Flow.qll, and qualifies
every AST-class reference (Expr, Bytes, Dict, AssignExpr, Compare,
Module, Scope, Call, Attribute, SsaVariable, AugAssign, etc.) with the
Py:: prefix.

Flow.qll's own CFG types (ControlFlowNode, BasicBlock, CallNode,
NameNode, DefinitionNode, CompareNode, ...) keep their unqualified
names — they remain the public CFG API exported from this file.

This is a semantic noop: the qualification was applied mechanically by
script and no name resolution changes. Verified by:
- All 361 lib/ + src/ queries compile clean.
- All 186 ControlFlow + PointsTo + dataflow library-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 12:36:53 +02:00
yoff
1a9bb2416a Python: deprecate Function.getAReturnValueFlowNode() and rewrite internal callers
Follow-up to the getAFlowNode deprecation in the same PR: same AST→legacy-CFG
bridge pattern. The 11 internal call sites (across objects/, types/,
frameworks/, and TypeTrackingImpl) are rewritten to bind a `Return ret`
explicitly, then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`.

The predicate itself is preserved with a deprecation note so external
users do not experience churn.

Semantic noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
Copilot
717ff62d70 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
yoff
8179bffe64 Merge pull request #21930 from github/yoff/python-dataflow-noop-simplifications
Python: inline init_module_submodule_defn into ImportResolution
2026-06-22 14:50:39 +02:00
Owen Mansel-Chan
c9d45217d2 Fix order of comments in test 2026-06-19 13:23:52 +01:00
Owen Mansel-Chan
451fc2e4e7 Undo conversion for queries that import LegacyPointsTo 2026-06-19 12:22:42 +01:00
Owen Mansel-Chan
5497f2c5fe Convert Python qlref tests to inline expectations 2026-06-19 12:22:40 +01:00
Owen Mansel-Chan
c7c1eca415 Merge branch 'main' into copilot/investigate-missing-alerts 2026-06-17 22:54:22 +01:00
Sotiris Dragonas
57f20064ba Merge branch 'main' into bazookamusic/cwe-1427 2026-06-17 17:12:20 +03:00
Owen Mansel-Chan
1f9899d7db Extend added type tracking step to related types 2026-06-17 15:04:53 +01:00
Owen Mansel-Chan
dd61dd2d74 Fix FP for py/modification-of-locals 2026-06-17 14:24:18 +01:00
Owen Mansel-Chan
47c2c9e763 Add test for FP for py/modification-of-locals 2026-06-17 14:22:42 +01:00
Owen Mansel-Chan
ea7510bf72 Refactor ReExposedInstance logic into one place 2026-06-17 13:10:47 +01:00
Owen Mansel-Chan
415857cacb Fix FP for py/should-use-with 2026-06-17 13:01:36 +01:00
Owen Mansel-Chan
d72144646a Add test for FP for py/should-use-with 2026-06-17 12:55:17 +01:00
Owen Mansel-Chan
199fd864ad Fix FP for py/file-not-closed 2026-06-17 12:36:04 +01:00
Owen Mansel-Chan
890969433f Add test for FP for py/file-not-closed 2026-06-17 12:19:03 +01:00
Sotiris Dragonas
274f014d31 Merge branch 'main' into bazookamusic/cwe-1427 2026-06-17 12:53:03 +03:00
Mathias Vorreiter Pedersen
c12cf88c52 Merge branch 'main' into add-yaml-comments 2026-06-17 10:17:06 +01:00
Owen Mansel-Chan
9c65082189 Fix MISSING alert 2026-06-15 00:14:52 +01:00
Owen Mansel-Chan
434a99447e Add thorough tests, including one MISSING alert 2026-06-12 13:45:02 +01:00
Owen Mansel-Chan
d389ea4039 Convert sql-injection test to inline expectations 2026-06-12 13:44:56 +01:00
copilot-swe-agent[bot]
838d06c53f Fix changelog copy errors in change-notes and CHANGELOG.md files (codeql-cli-2.25.6) 2026-06-11 22:45:33 +02:00
Owen Mansel-Chan
913dcb1190 Add change note 2026-06-11 22:20:52 +02:00
Owen Mansel-Chan
befb557bfd Accept fixed MISSING tests 2026-06-11 15:44:20 +02:00
copilot-swe-agent[bot]
73bc2d70ae Model instance-attribute type flow
Use a field level step like JS and Ruby.
2026-06-11 14:48:55 +02:00
Sotiris Dragonas
17dbf03c6d Merge branch 'main' into bazookamusic/cwe-1427 2026-06-11 12:05:57 +02:00
copilot-swe-agent[bot]
a4585d8d94 Add test documenting missing PEP249 alerts for connection stored in self attribute 2026-06-11 05:48:40 +00:00
Tom Hvitved
f5919875b7 Merge pull request #21941 from hvitved/python/content-approx
Python: Implement `ContentApprox`
2026-06-09 15:46:04 +02:00
yoff
0cea01c22f Merge pull request #21926 from github/yoff/python-simplify-decorator-predicates
Python: simplify decorator-detection predicates to pure AST match
2026-06-08 22:04:33 +02:00
BazookaMusic
2cb0851900 1. Rename AgentSDK -> AgentSdk
2. Remove redundant constant comparison barriers. This is already happening by default by the taint tracking library.
2026-06-08 12:55:52 +02:00
Tom Hvitved
cc1ea25856 Python: Implement ContentApprox 2026-06-08 08:41:28 +02:00
Owen Mansel-Chan
1f91f915c7 Merge pull request #21888 from owen-mc/py/remove-imprecise-container-steps
Python: Remove imprecise container steps #2
2026-06-04 22:16:24 +01:00
Mathias Vorreiter Pedersen
b877943b42 Python: Add upgrade and downgrade scripts. 2026-06-04 17:54:53 +01:00
Mathias Vorreiter Pedersen
0aa1abe432 Python: Add support for YAML comments. 2026-06-04 17:54:48 +01:00
Owen Mansel-Chan
da999ee440 Address review comments 2026-06-03 21:24:16 +01:00
Owen Mansel-Chan
6f2cc43f32 Remove imprecise model for tuple() 2026-06-02 21:59:48 +01:00
Owen Mansel-Chan
5042fdee84 Remove imprecise model for list() 2026-06-02 21:59:46 +01:00
Owen Mansel-Chan
04341c47bd Tweak model for str.join 2026-06-02 21:59:44 +01:00
Owen Mansel-Chan
b27d08ee32 Update edges in expected test output 2026-06-02 18:29:56 +01:00
Owen Mansel-Chan
20ce679d61 Accept changed edges in test output
No changes to alerts
2026-06-02 16:15:08 +01:00
Owen Mansel-Chan
f62ebef9e0 Adjust expected test output 2026-06-02 16:15:06 +01:00
Owen Mansel-Chan
c3ef1ddd64 Add MaD models for lxml and xml etree.fromstringlist 2026-06-02 16:15:01 +01:00
Owen Mansel-Chan
dede5bc49b Track flow through tuple() with list with tainted elements 2026-06-02 16:14:59 +01:00
Owen Mansel-Chan
ad97b6dd64 Use access path for str.join model 2026-06-02 16:14:56 +01:00