Commit Graph

9687 Commits

Author SHA1 Message Date
yoff
408ba6218f Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:46:43 +00:00
Copilot
b7f79f2d34 Python: add new shared-SSA-backed SSA adapter
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.

Library additions:

- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
  implementation built on the new (shared) CFG. Mirrors the Java SSA
  adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
  an InputSig is defined in terms of positional (BasicBlock, int)
  variable references, and the shared
  codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
  instantiated.

  SourceVariable is the AST-level Py::Variable. Variable references
  are looked up via the new CFG facade's NameNode.defines/uses/deletes
  predicates (added in the preceding PR), which themselves are
  one-line bridges to AST-level Name.defines/uses/deletes.

  Implicit-entry definitions are inserted for non-local/global/builtin
  reads, captured variables, and (when needed) parameters.

Test additions:

- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
  representative test corpus and checks expected def/use chains.

- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
  legacy ESSA over the same corpus and diffs the results, so any
  semantic divergence shows up as a test failure.

Production impact:

None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.

Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
  library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:38:40 +00:00
Copilot
9cdeb8e7ee Python: add new shared-CFG-backed control flow graph
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).

- semmle.python.controlflow.internal.Cfg — public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Shared library:

- shared.controlflow.ControlFlowGraph — adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:

- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/store-load/* — basic store/load coverage.

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:38:30 +00:00
yoff
9c41238eee Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-22 13:37:46 +00:00
Copilot
d7c0ef7e4d Python: qualify Flow.qll's AST references with Py:: prefix
Preparatory refactor for the shared-CFG dataflow migration. Switches
'import python' to 'import python as Py' inside Flow.qll, and qualifies
every AST-class reference (Expr, Bytes, Dict, AssignExpr, Compare,
Module, Scope, Call, Attribute, SsaVariable, AugAssign, etc.) with the
Py:: prefix.

Flow.qll's own CFG types (ControlFlowNode, BasicBlock, CallNode,
NameNode, DefinitionNode, CompareNode, ...) keep their unqualified
names — they remain the public CFG API exported from this file.

This is a semantic noop: the qualification was applied mechanically by
script and no name resolution changes. Verified by:
- All 361 lib/ + src/ queries compile clean.
- All 186 ControlFlow + PointsTo + dataflow library-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:37:45 +00:00
yoff
1a9bb2416a Python: deprecate Function.getAReturnValueFlowNode() and rewrite internal callers
Follow-up to the getAFlowNode deprecation in the same PR: same AST→legacy-CFG
bridge pattern. The 11 internal call sites (across objects/, types/,
frameworks/, and TypeTrackingImpl) are rewritten to bind a `Return ret`
explicitly, then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`.

The predicate itself is preserved with a deprecation note so external
users do not experience churn.

Semantic noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
Copilot
717ff62d70 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
yoff
8179bffe64 Merge pull request #21930 from github/yoff/python-dataflow-noop-simplifications
Python: inline init_module_submodule_defn into ImportResolution
2026-06-22 14:50:39 +02:00
Owen Mansel-Chan
c9d45217d2 Fix order of comments in test 2026-06-19 13:23:52 +01:00
Owen Mansel-Chan
451fc2e4e7 Undo conversion for queries that import LegacyPointsTo 2026-06-19 12:22:42 +01:00
Owen Mansel-Chan
5497f2c5fe Convert Python qlref tests to inline expectations 2026-06-19 12:22:40 +01:00
Owen Mansel-Chan
c7c1eca415 Merge branch 'main' into copilot/investigate-missing-alerts 2026-06-17 22:54:22 +01:00
Sotiris Dragonas
57f20064ba Merge branch 'main' into bazookamusic/cwe-1427 2026-06-17 17:12:20 +03:00
Owen Mansel-Chan
1f9899d7db Extend added type tracking step to related types 2026-06-17 15:04:53 +01:00
Owen Mansel-Chan
dd61dd2d74 Fix FP for py/modification-of-locals 2026-06-17 14:24:18 +01:00
Owen Mansel-Chan
47c2c9e763 Add test for FP for py/modification-of-locals 2026-06-17 14:22:42 +01:00
Owen Mansel-Chan
ea7510bf72 Refactor ReExposedInstance logic into one place 2026-06-17 13:10:47 +01:00
Owen Mansel-Chan
415857cacb Fix FP for py/should-use-with 2026-06-17 13:01:36 +01:00
Owen Mansel-Chan
d72144646a Add test for FP for py/should-use-with 2026-06-17 12:55:17 +01:00
Owen Mansel-Chan
199fd864ad Fix FP for py/file-not-closed 2026-06-17 12:36:04 +01:00
Owen Mansel-Chan
890969433f Add test for FP for py/file-not-closed 2026-06-17 12:19:03 +01:00
Sotiris Dragonas
274f014d31 Merge branch 'main' into bazookamusic/cwe-1427 2026-06-17 12:53:03 +03:00
Mathias Vorreiter Pedersen
c12cf88c52 Merge branch 'main' into add-yaml-comments 2026-06-17 10:17:06 +01:00
Owen Mansel-Chan
9c65082189 Fix MISSING alert 2026-06-15 00:14:52 +01:00
Owen Mansel-Chan
434a99447e Add thorough tests, including one MISSING alert 2026-06-12 13:45:02 +01:00
Owen Mansel-Chan
d389ea4039 Convert sql-injection test to inline expectations 2026-06-12 13:44:56 +01:00
copilot-swe-agent[bot]
838d06c53f Fix changelog copy errors in change-notes and CHANGELOG.md files (codeql-cli-2.25.6) 2026-06-11 22:45:33 +02:00
Owen Mansel-Chan
913dcb1190 Add change note 2026-06-11 22:20:52 +02:00
Owen Mansel-Chan
befb557bfd Accept fixed MISSING tests 2026-06-11 15:44:20 +02:00
copilot-swe-agent[bot]
73bc2d70ae Model instance-attribute type flow
Use a field level step like JS and Ruby.
2026-06-11 14:48:55 +02:00
Sotiris Dragonas
17dbf03c6d Merge branch 'main' into bazookamusic/cwe-1427 2026-06-11 12:05:57 +02:00
copilot-swe-agent[bot]
a4585d8d94 Add test documenting missing PEP249 alerts for connection stored in self attribute 2026-06-11 05:48:40 +00:00
Tom Hvitved
f5919875b7 Merge pull request #21941 from hvitved/python/content-approx
Python: Implement `ContentApprox`
2026-06-09 15:46:04 +02:00
yoff
0cea01c22f Merge pull request #21926 from github/yoff/python-simplify-decorator-predicates
Python: simplify decorator-detection predicates to pure AST match
2026-06-08 22:04:33 +02:00
BazookaMusic
2cb0851900 1. Rename AgentSDK -> AgentSdk
2. Remove redundant constant comparison barriers. This is already happening by default by the taint tracking library.
2026-06-08 12:55:52 +02:00
Tom Hvitved
cc1ea25856 Python: Implement ContentApprox 2026-06-08 08:41:28 +02:00
Owen Mansel-Chan
1f91f915c7 Merge pull request #21888 from owen-mc/py/remove-imprecise-container-steps
Python: Remove imprecise container steps #2
2026-06-04 22:16:24 +01:00
Mathias Vorreiter Pedersen
b877943b42 Python: Add upgrade and downgrade scripts. 2026-06-04 17:54:53 +01:00
Mathias Vorreiter Pedersen
0aa1abe432 Python: Add support for YAML comments. 2026-06-04 17:54:48 +01:00
Owen Mansel-Chan
da999ee440 Address review comments 2026-06-03 21:24:16 +01:00
Owen Mansel-Chan
6f2cc43f32 Remove imprecise model for tuple() 2026-06-02 21:59:48 +01:00
Owen Mansel-Chan
5042fdee84 Remove imprecise model for list() 2026-06-02 21:59:46 +01:00
Owen Mansel-Chan
04341c47bd Tweak model for str.join 2026-06-02 21:59:44 +01:00
Owen Mansel-Chan
b27d08ee32 Update edges in expected test output 2026-06-02 18:29:56 +01:00
Owen Mansel-Chan
20ce679d61 Accept changed edges in test output
No changes to alerts
2026-06-02 16:15:08 +01:00
Owen Mansel-Chan
f62ebef9e0 Adjust expected test output 2026-06-02 16:15:06 +01:00
Owen Mansel-Chan
c3ef1ddd64 Add MaD models for lxml and xml etree.fromstringlist 2026-06-02 16:15:01 +01:00
Owen Mansel-Chan
dede5bc49b Track flow through tuple() with list with tainted elements 2026-06-02 16:14:59 +01:00
Owen Mansel-Chan
ad97b6dd64 Use access path for str.join model 2026-06-02 16:14:56 +01:00
yoff
ac5fa629ef Python: inline init_module_submodule_defn into ImportResolution
The new-dataflow ImportResolution module only used
semmle.python.essa.SsaDefinitions for the 5-line helper predicate
SsaSource::init_module_submodule_defn. Inline it locally and drop the
dependency on legacy SsaDefinitions. This is the only remaining direct
import of semmle.python.essa.* in the new dataflow stack, so dropping
it makes the layering cleaner.

Semantic noop on the current SSA: SsaSourceVariable.getName() and
GlobalVariable.getId() both project the same DB column
(variable(_,_,result)), and the old call's 'init.getEntryNode() = f'
join was just constraining init = package via Scope.getEntryNode()'s
functional uniqueness. RA dump of accesses.ql confirms only the
expected predicate-rename shuffle; all 70 dataflow + ApiGraphs library
tests pass.

This factors out commit 8cab5a20f2 from the larger shared-CFG
migration #21925.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-02 08:24:17 +00:00