Commit Graph

3279 Commits

Author SHA1 Message Date
yoff
c95c3fe638 Python: model exception edges for raise-prone expressions inside try/with
The new CFG previously only emitted exception edges for explicit `raise`
and `assert` statements. As a result, code that became reachable only
via the exception path of an arbitrary expression (e.g., the body of an
`except` handler following a try-body whose `call()` could raise) was
classified as dead, breaking analyses like StackTraceExposure,
FileNotAlwaysClosed, ExceptionInfo, UseOfExit, and CatchingBaseException.

This commit adds a `mayThrow` predicate over expressions that are known
sources of implicit exceptions in Python (calls, attribute access,
subscripts, arithmetic/comparison operators, imports, await/yield/yield
from) plus `from m import *` at the statement level, and routes them
through the shared CFG's `beginAbruptCompletion(_, _, ExceptionSuccessor,
always=false)` hook.

The set of exception sources is restricted to nodes that are
syntactically inside a `try`/`with` statement in the same scope.
This mirrors Java's `ControlFlowGraph::mayThrow`, which only emits
exception edges where local handling can observe them — outside such
contexts, the edges add CFG complexity (weakening BarrierGuard
precision and breaking SSA continuity around augmented assignments and
subscript stores) without analysis benefit, since exceptions just
propagate to the function exit anyway.

Net effect on the test suite: ~100 alerts restored across the exception-
related query tests (StackTraceExposure +29, ExceptionInfo +17,
FileNotAlwaysClosed +52, UseOfExit +1, CatchingBaseException restored)
with no precision regressions. Affected `.expected` files and the
regression-guard `dead_under_no_raise.py` are updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 08:15:45 +00:00
yoff
6c03194092 Python: add new shared-CFG-backed control flow graph facade (Cfg)
Adds the public facade on top of the AstNodeImpl adapter from the
previous commit. Re-exposes the same API surface as
semmle/python/Flow.qll (ControlFlowNode, CallNode, BasicBlock,
NameNode, DefinitionNode, CompareNode, ...), backed by the shared
codeql.controlflow.ControlFlowGraph library.

- semmle.python.controlflow.internal.Cfg — public facade.
- ControlFlow/store-load/* — basic store/load coverage via the facade.

The new CFG library is added additively: it has zero callers in lib/
and src/, and the legacy CFG in semmle/python/Flow.qll remains the
default. Dataflow, SSA, and production query migration land in
follow-up PRs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 08:15:45 +00:00
yoff
39e6bfc894 Python: add shared-CFG AstSig adapter (AstNodeImpl)
Preparatory refactor for the shared-CFG dataflow migration. Adds the
adapter that mediates between the Python AST and the shared
codeql.controlflow.ControlFlowGraph signature, plus the test suites
that validate the new CFG directly against this adapter. The public
facade is added in the following commit.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — wraps Python's
  Stmt/Expr/Scope/Pattern and adds two synthetic kinds of node
  (BlockStmt for body slots, intermediate nodes for multi-operand
  boolean expressions) to satisfy the shared CFG signature.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Test additions (all driven directly off AstNodeImpl):

- ControlFlow/bindings/* — annotation-driven SSA-binding tests
  (annassign, compound, comprehension, decorated, except_handler,
  imports, match_pattern, parameters, simple, type_params,
  walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 08:15:33 +00:00
yoff
7d95024487 Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-23 12:36:53 +02:00
Copilot
06fa46f664 Python: qualify Flow.qll's AST references with Py:: prefix
Preparatory refactor for the shared-CFG dataflow migration. Switches
'import python' to 'import python as Py' inside Flow.qll, and qualifies
every AST-class reference (Expr, Bytes, Dict, AssignExpr, Compare,
Module, Scope, Call, Attribute, SsaVariable, AugAssign, etc.) with the
Py:: prefix.

Flow.qll's own CFG types (ControlFlowNode, BasicBlock, CallNode,
NameNode, DefinitionNode, CompareNode, ...) keep their unqualified
names — they remain the public CFG API exported from this file.

This is a semantic noop: the qualification was applied mechanically by
script and no name resolution changes. Verified by:
- All 361 lib/ + src/ queries compile clean.
- All 186 ControlFlow + PointsTo + dataflow library-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 12:36:53 +02:00
yoff
1a9bb2416a Python: deprecate Function.getAReturnValueFlowNode() and rewrite internal callers
Follow-up to the getAFlowNode deprecation in the same PR: same AST→legacy-CFG
bridge pattern. The 11 internal call sites (across objects/, types/,
frameworks/, and TypeTrackingImpl) are rewritten to bind a `Return ret`
explicitly, then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`.

The predicate itself is preserved with a deprecation note so external
users do not experience churn.

Semantic noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
Copilot
717ff62d70 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
yoff
8179bffe64 Merge pull request #21930 from github/yoff/python-dataflow-noop-simplifications
Python: inline init_module_submodule_defn into ImportResolution
2026-06-22 14:50:39 +02:00
Owen Mansel-Chan
c7c1eca415 Merge branch 'main' into copilot/investigate-missing-alerts 2026-06-17 22:54:22 +01:00
Owen Mansel-Chan
1f9899d7db Extend added type tracking step to related types 2026-06-17 15:04:53 +01:00
Owen Mansel-Chan
ea7510bf72 Refactor ReExposedInstance logic into one place 2026-06-17 13:10:47 +01:00
Mathias Vorreiter Pedersen
c12cf88c52 Merge branch 'main' into add-yaml-comments 2026-06-17 10:17:06 +01:00
Owen Mansel-Chan
9c65082189 Fix MISSING alert 2026-06-15 00:14:52 +01:00
copilot-swe-agent[bot]
838d06c53f Fix changelog copy errors in change-notes and CHANGELOG.md files (codeql-cli-2.25.6) 2026-06-11 22:45:33 +02:00
Owen Mansel-Chan
913dcb1190 Add change note 2026-06-11 22:20:52 +02:00
copilot-swe-agent[bot]
73bc2d70ae Model instance-attribute type flow
Use a field level step like JS and Ruby.
2026-06-11 14:48:55 +02:00
Tom Hvitved
f5919875b7 Merge pull request #21941 from hvitved/python/content-approx
Python: Implement `ContentApprox`
2026-06-09 15:46:04 +02:00
yoff
0cea01c22f Merge pull request #21926 from github/yoff/python-simplify-decorator-predicates
Python: simplify decorator-detection predicates to pure AST match
2026-06-08 22:04:33 +02:00
Tom Hvitved
cc1ea25856 Python: Implement ContentApprox 2026-06-08 08:41:28 +02:00
Owen Mansel-Chan
1f91f915c7 Merge pull request #21888 from owen-mc/py/remove-imprecise-container-steps
Python: Remove imprecise container steps #2
2026-06-04 22:16:24 +01:00
Mathias Vorreiter Pedersen
b877943b42 Python: Add upgrade and downgrade scripts. 2026-06-04 17:54:53 +01:00
Mathias Vorreiter Pedersen
0aa1abe432 Python: Add support for YAML comments. 2026-06-04 17:54:48 +01:00
Owen Mansel-Chan
da999ee440 Address review comments 2026-06-03 21:24:16 +01:00
Owen Mansel-Chan
6f2cc43f32 Remove imprecise model for tuple() 2026-06-02 21:59:48 +01:00
Owen Mansel-Chan
5042fdee84 Remove imprecise model for list() 2026-06-02 21:59:46 +01:00
Owen Mansel-Chan
04341c47bd Tweak model for str.join 2026-06-02 21:59:44 +01:00
Owen Mansel-Chan
c3ef1ddd64 Add MaD models for lxml and xml etree.fromstringlist 2026-06-02 16:15:01 +01:00
Owen Mansel-Chan
dede5bc49b Track flow through tuple() with list with tainted elements 2026-06-02 16:14:59 +01:00
Owen Mansel-Chan
ad97b6dd64 Use access path for str.join model 2026-06-02 16:14:56 +01:00
yoff
ac5fa629ef Python: inline init_module_submodule_defn into ImportResolution
The new-dataflow ImportResolution module only used
semmle.python.essa.SsaDefinitions for the 5-line helper predicate
SsaSource::init_module_submodule_defn. Inline it locally and drop the
dependency on legacy SsaDefinitions. This is the only remaining direct
import of semmle.python.essa.* in the new dataflow stack, so dropping
it makes the layering cleaner.

Semantic noop on the current SSA: SsaSourceVariable.getName() and
GlobalVariable.getId() both project the same DB column
(variable(_,_,result)), and the old call's 'init.getEntryNode() = f'
join was just constraining init = package via Scope.getEntryNode()'s
functional uniqueness. RA dump of accesses.ql confirms only the
expected predicate-rename shuffle; all 70 dataflow + ApiGraphs library
tests pass.

This factors out commit 8cab5a20f2 from the larger shared-CFG
migration #21925.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-02 08:24:17 +00:00
yoff
5fb75ac987 Python: simplify decorator-detection predicates to pure AST match
The internal predicates that identify `@staticmethod`, `@classmethod` and
`@property` decorators previously required the decorator's `NameNode` to
satisfy `isGlobal()` (i.e. no SSA def reaches the decorator's name use).
That filter was correct but unnecessarily indirect: these three names
are builtins, and even when a class body redefines one, the class body
has not started executing at the decorator position, so Python uses the
builtin.

Match the decorator's AST `Name` directly instead, dropping the CFG/SSA
detour. The slight semantic change — `isGlobal()` would have rejected
module-level shadowing of these builtins — is negligible in practice
and explicitly documented in the change note.

`hasContextmanagerDecorator` and `hasOverloadDecorator` keep the
`NameNode.isGlobal()` check because their target names (`contextmanager`,
`overload`) are imported, not builtin, and local shadowing is a real
concern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-01 14:04:43 +00:00
Owen Mansel-Chan
b38440490a Address review comment 2026-05-31 21:47:44 +01:00
github-actions[bot]
cfb18c2477 Post-release preparation for codeql-cli-2.25.6 2026-05-29 12:04:35 +00:00
github-actions[bot]
8b6f969cdb Release preparation for version 2.25.6 2026-05-29 11:27:54 +00:00
Henry Mercer
9bc0c1b1ab Revert "Release preparation for version 2.25.6" 2026-05-29 12:13:50 +01:00
Owen Mansel-Chan
aee33a0cc9 Add missing code for TAnyTupleOrDictionaryElement 2026-05-29 10:26:24 +01:00
Owen Mansel-Chan
df15a719cb Add a ContentSet for any tuple or dictionary element 2026-05-28 16:48:23 +01:00
Owen Mansel-Chan
812e8e6b34 Add change note 2026-05-28 11:37:54 +01:00
Owen Mansel-Chan
80c6f082d1 Fix TODO in containerStep 2026-05-28 11:34:02 +01:00
Owen Mansel-Chan
ec13e1bcd3 Add wildcard ContentSets to avoid performance problems 2026-05-27 15:28:07 +01:00
github-actions[bot]
44a914e40f Release preparation for version 2.25.6 2026-05-25 10:23:26 +00:00
Óscar San José
996e79131e Merge branch 'main' into post-release-prep/codeql-cli-2.25.5 2026-05-22 16:32:30 +02:00
Rasmus Lerchedahl Petersen
0ecca91dea Python: typo 2026-05-21 16:59:16 +01:00
Rasmus Lerchedahl Petersen
f669a4f3bf Python: Make sure all imprecise taint bubbles up 2026-05-21 16:59:14 +01:00
Rasmus Lerchedahl Petersen
9a180036a5 Python: conversion step for format_map
and adjust collection test
2026-05-21 16:59:08 +01:00
Rasmus Lerchedahl Petersen
facb3b681d Python: recover taint for % format strings 2026-05-21 16:57:50 +01:00
Rasmus Lerchedahl Petersen
b67694b2ab Python: Remove imprecise container steps
- remove `tupleStoreStep` and `dictStoreStep` from `containerStep`
   These are imprecise compared to the content being precise.
- add implicit reads to recover taint at sinks
- add implicit read steps for decoders
  to supplement the `AdditionalTaintStep`
  that now only covers when the full container is tainted.
2026-05-21 16:57:44 +01:00
github-actions[bot]
9f64000962 Post-release preparation for codeql-cli-2.25.5 2026-05-18 15:20:31 +00:00
github-actions[bot]
e38616a2ef Release preparation for version 2.25.5 2026-05-18 12:05:32 +00:00
Geoffrey White
a4b2c0f6fd Update change notes (Copilot's suggestions). 2026-05-15 09:24:29 +01:00