Commit Graph

10030 Commits

Author SHA1 Message Date
yoff
893301098c Python: visit function parameter and return annotations in new CFG
The new (shared-CFG-based) Python control flow graph in
`semmle.python.controlflow.internal.Cfg` previously did not emit CFG
nodes for parameter type annotations (`def f(x: T): ...`) or for the
return type annotation (`-> T`). The legacy CFG emitted both, and a
small number of framework models rely on this: `LocalSources.qll`'s
`annotatedInstance` walks the parameter annotation expression by way
of its CFG node to track that a parameter receives an instance of the
annotated class.

After the dataflow flip to the new CFG/SSA this regression manifested
as lost flows in any test exercising annotation-based parameter
tracking: FastAPI `Depends()` receivers, Pydantic request bodies,
Starlette `WebSocket`, the call-graph type-annotation test, and so on.
Extend `FunctionDefExpr` to visit each annotation as a child of the
function-def expression, in CPython evaluation order: positional
parameter annotations, `*args` annotation, keyword-only parameter
annotations, `**kwargs` annotation, then the return annotation. (Lambda
expressions have no annotations in Python syntax, so `LambdaExpr` is
unchanged.) PEP 695 type parameters remain out of scope; they belong
to the inner annotation scope, not the enclosing CFG.

Restored test results across `framework/aiohttp`, `framework/fastapi`,
`framework/lxml`, the `CallGraph-type-annotations` test, and
`CWE-022-PathInjection`. Two FastAPI list-comprehension MISSING markers
become positive (`taint_test.py:41,55`). CPython CFG consistency
remains clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:47:04 +00:00
yoff
92e03318ea Python: model exception edges for raise-prone expressions inside try/with
The new CFG previously only emitted exception edges for explicit `raise`
and `assert` statements. As a result, code that became reachable only
via the exception path of an arbitrary expression (e.g., the body of an
`except` handler following a try-body whose `call()` could raise) was
classified as dead, breaking analyses like StackTraceExposure,
FileNotAlwaysClosed, ExceptionInfo, UseOfExit, and CatchingBaseException.

This commit adds a `mayThrow` predicate over expressions that are known
sources of implicit exceptions in Python (calls, attribute access,
subscripts, arithmetic/comparison operators, imports, await/yield/yield
from) plus `from m import *` at the statement level, and routes them
through the shared CFG's `beginAbruptCompletion(_, _, ExceptionSuccessor,
always=false)` hook.

The set of exception sources is restricted to nodes that are
syntactically inside a `try`/`with` statement in the same scope.
This mirrors Java's `ControlFlowGraph::mayThrow`, which only emits
exception edges where local handling can observe them — outside such
contexts, the edges add CFG complexity (weakening BarrierGuard
precision and breaking SSA continuity around augmented assignments and
subscript stores) without analysis benefit, since exceptions just
propagate to the function exit anyway.

Net effect on the test suite: ~100 alerts restored across the exception-
related query tests (StackTraceExposure +29, ExceptionInfo +17,
FileNotAlwaysClosed +52, UseOfExit +1, CatchingBaseException restored)
with no precision regressions. Affected `.expected` files and the
regression-guard `dead_under_no_raise.py` are updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:46:51 +00:00
yoff
408ba6218f Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:46:43 +00:00
Copilot
b7f79f2d34 Python: add new shared-SSA-backed SSA adapter
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.

Library additions:

- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
  implementation built on the new (shared) CFG. Mirrors the Java SSA
  adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
  an InputSig is defined in terms of positional (BasicBlock, int)
  variable references, and the shared
  codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
  instantiated.

  SourceVariable is the AST-level Py::Variable. Variable references
  are looked up via the new CFG facade's NameNode.defines/uses/deletes
  predicates (added in the preceding PR), which themselves are
  one-line bridges to AST-level Name.defines/uses/deletes.

  Implicit-entry definitions are inserted for non-local/global/builtin
  reads, captured variables, and (when needed) parameters.

Test additions:

- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
  representative test corpus and checks expected def/use chains.

- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
  legacy ESSA over the same corpus and diffs the results, so any
  semantic divergence shows up as a test failure.

Production impact:

None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.

Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
  library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:38:40 +00:00
Copilot
9cdeb8e7ee Python: add new shared-CFG-backed control flow graph
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).

- semmle.python.controlflow.internal.Cfg — public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Shared library:

- shared.controlflow.ControlFlowGraph — adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:

- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/store-load/* — basic store/load coverage.

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:38:30 +00:00
yoff
9c41238eee Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-22 13:37:46 +00:00
Copilot
d7c0ef7e4d Python: qualify Flow.qll's AST references with Py:: prefix
Preparatory refactor for the shared-CFG dataflow migration. Switches
'import python' to 'import python as Py' inside Flow.qll, and qualifies
every AST-class reference (Expr, Bytes, Dict, AssignExpr, Compare,
Module, Scope, Call, Attribute, SsaVariable, AugAssign, etc.) with the
Py:: prefix.

Flow.qll's own CFG types (ControlFlowNode, BasicBlock, CallNode,
NameNode, DefinitionNode, CompareNode, ...) keep their unqualified
names — they remain the public CFG API exported from this file.

This is a semantic noop: the qualification was applied mechanically by
script and no name resolution changes. Verified by:
- All 361 lib/ + src/ queries compile clean.
- All 186 ControlFlow + PointsTo + dataflow library-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:37:45 +00:00
yoff
1a9bb2416a Python: deprecate Function.getAReturnValueFlowNode() and rewrite internal callers
Follow-up to the getAFlowNode deprecation in the same PR: same AST→legacy-CFG
bridge pattern. The 11 internal call sites (across objects/, types/,
frameworks/, and TypeTrackingImpl) are rewritten to bind a `Return ret`
explicitly, then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`.

The predicate itself is preserved with a deprecation note so external
users do not experience churn.

Semantic noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
Copilot
717ff62d70 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
yoff
8179bffe64 Merge pull request #21930 from github/yoff/python-dataflow-noop-simplifications
Python: inline init_module_submodule_defn into ImportResolution
2026-06-22 14:50:39 +02:00
Owen Mansel-Chan
c9d45217d2 Fix order of comments in test 2026-06-19 13:23:52 +01:00
Owen Mansel-Chan
451fc2e4e7 Undo conversion for queries that import LegacyPointsTo 2026-06-19 12:22:42 +01:00
Owen Mansel-Chan
5497f2c5fe Convert Python qlref tests to inline expectations 2026-06-19 12:22:40 +01:00
Owen Mansel-Chan
c7c1eca415 Merge branch 'main' into copilot/investigate-missing-alerts 2026-06-17 22:54:22 +01:00
Mathias Vorreiter Pedersen
004a5b4645 Python: Ensure that YAML comment extraction is properly reflected in the dbscheme template. 2026-06-17 17:04:43 +01:00
Sotiris Dragonas
57f20064ba Merge branch 'main' into bazookamusic/cwe-1427 2026-06-17 17:12:20 +03:00
Owen Mansel-Chan
1f9899d7db Extend added type tracking step to related types 2026-06-17 15:04:53 +01:00
Owen Mansel-Chan
dd61dd2d74 Fix FP for py/modification-of-locals 2026-06-17 14:24:18 +01:00
Owen Mansel-Chan
47c2c9e763 Add test for FP for py/modification-of-locals 2026-06-17 14:22:42 +01:00
Owen Mansel-Chan
ea7510bf72 Refactor ReExposedInstance logic into one place 2026-06-17 13:10:47 +01:00
Owen Mansel-Chan
415857cacb Fix FP for py/should-use-with 2026-06-17 13:01:36 +01:00
Owen Mansel-Chan
d72144646a Add test for FP for py/should-use-with 2026-06-17 12:55:17 +01:00
Owen Mansel-Chan
199fd864ad Fix FP for py/file-not-closed 2026-06-17 12:36:04 +01:00
Owen Mansel-Chan
890969433f Add test for FP for py/file-not-closed 2026-06-17 12:19:03 +01:00
Sotiris Dragonas
274f014d31 Merge branch 'main' into bazookamusic/cwe-1427 2026-06-17 12:53:03 +03:00
Mathias Vorreiter Pedersen
c12cf88c52 Merge branch 'main' into add-yaml-comments 2026-06-17 10:17:06 +01:00
Owen Mansel-Chan
9c65082189 Fix MISSING alert 2026-06-15 00:14:52 +01:00
Owen Mansel-Chan
434a99447e Add thorough tests, including one MISSING alert 2026-06-12 13:45:02 +01:00
Owen Mansel-Chan
d389ea4039 Convert sql-injection test to inline expectations 2026-06-12 13:44:56 +01:00
copilot-swe-agent[bot]
838d06c53f Fix changelog copy errors in change-notes and CHANGELOG.md files (codeql-cli-2.25.6) 2026-06-11 22:45:33 +02:00
Owen Mansel-Chan
913dcb1190 Add change note 2026-06-11 22:20:52 +02:00
Owen Mansel-Chan
befb557bfd Accept fixed MISSING tests 2026-06-11 15:44:20 +02:00
copilot-swe-agent[bot]
73bc2d70ae Model instance-attribute type flow
Use a field level step like JS and Ruby.
2026-06-11 14:48:55 +02:00
Sotiris Dragonas
17dbf03c6d Merge branch 'main' into bazookamusic/cwe-1427 2026-06-11 12:05:57 +02:00
copilot-swe-agent[bot]
a4585d8d94 Add test documenting missing PEP249 alerts for connection stored in self attribute 2026-06-11 05:48:40 +00:00
Tom Hvitved
f5919875b7 Merge pull request #21941 from hvitved/python/content-approx
Python: Implement `ContentApprox`
2026-06-09 15:46:04 +02:00
yoff
0cea01c22f Merge pull request #21926 from github/yoff/python-simplify-decorator-predicates
Python: simplify decorator-detection predicates to pure AST match
2026-06-08 22:04:33 +02:00
BazookaMusic
2cb0851900 1. Rename AgentSDK -> AgentSdk
2. Remove redundant constant comparison barriers. This is already happening by default by the taint tracking library.
2026-06-08 12:55:52 +02:00
Tom Hvitved
cc1ea25856 Python: Implement ContentApprox 2026-06-08 08:41:28 +02:00
Owen Mansel-Chan
1f91f915c7 Merge pull request #21888 from owen-mc/py/remove-imprecise-container-steps
Python: Remove imprecise container steps #2
2026-06-04 22:16:24 +01:00
Mathias Vorreiter Pedersen
b877943b42 Python: Add upgrade and downgrade scripts. 2026-06-04 17:54:53 +01:00
Mathias Vorreiter Pedersen
0aa1abe432 Python: Add support for YAML comments. 2026-06-04 17:54:48 +01:00
Owen Mansel-Chan
da999ee440 Address review comments 2026-06-03 21:24:16 +01:00
Owen Mansel-Chan
6f2cc43f32 Remove imprecise model for tuple() 2026-06-02 21:59:48 +01:00
Owen Mansel-Chan
5042fdee84 Remove imprecise model for list() 2026-06-02 21:59:46 +01:00
Owen Mansel-Chan
04341c47bd Tweak model for str.join 2026-06-02 21:59:44 +01:00
Owen Mansel-Chan
b27d08ee32 Update edges in expected test output 2026-06-02 18:29:56 +01:00
Owen Mansel-Chan
20ce679d61 Accept changed edges in test output
No changes to alerts
2026-06-02 16:15:08 +01:00
Owen Mansel-Chan
f62ebef9e0 Adjust expected test output 2026-06-02 16:15:06 +01:00
Owen Mansel-Chan
c3ef1ddd64 Add MaD models for lxml and xml etree.fromstringlist 2026-06-02 16:15:01 +01:00