Commit Graph

4187 Commits

Author SHA1 Message Date
yoff
92e03318ea Python: model exception edges for raise-prone expressions inside try/with
The new CFG previously only emitted exception edges for explicit `raise`
and `assert` statements. As a result, code that became reachable only
via the exception path of an arbitrary expression (e.g., the body of an
`except` handler following a try-body whose `call()` could raise) was
classified as dead, breaking analyses like StackTraceExposure,
FileNotAlwaysClosed, ExceptionInfo, UseOfExit, and CatchingBaseException.

This commit adds a `mayThrow` predicate over expressions that are known
sources of implicit exceptions in Python (calls, attribute access,
subscripts, arithmetic/comparison operators, imports, await/yield/yield
from) plus `from m import *` at the statement level, and routes them
through the shared CFG's `beginAbruptCompletion(_, _, ExceptionSuccessor,
always=false)` hook.

The set of exception sources is restricted to nodes that are
syntactically inside a `try`/`with` statement in the same scope.
This mirrors Java's `ControlFlowGraph::mayThrow`, which only emits
exception edges where local handling can observe them — outside such
contexts, the edges add CFG complexity (weakening BarrierGuard
precision and breaking SSA continuity around augmented assignments and
subscript stores) without analysis benefit, since exceptions just
propagate to the function exit anyway.

Net effect on the test suite: ~100 alerts restored across the exception-
related query tests (StackTraceExposure +29, ExceptionInfo +17,
FileNotAlwaysClosed +52, UseOfExit +1, CatchingBaseException restored)
with no precision regressions. Affected `.expected` files and the
regression-guard `dead_under_no_raise.py` are updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:46:51 +00:00
yoff
408ba6218f Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:46:43 +00:00
Copilot
b7f79f2d34 Python: add new shared-SSA-backed SSA adapter
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.

Library additions:

- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
  implementation built on the new (shared) CFG. Mirrors the Java SSA
  adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
  an InputSig is defined in terms of positional (BasicBlock, int)
  variable references, and the shared
  codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
  instantiated.

  SourceVariable is the AST-level Py::Variable. Variable references
  are looked up via the new CFG facade's NameNode.defines/uses/deletes
  predicates (added in the preceding PR), which themselves are
  one-line bridges to AST-level Name.defines/uses/deletes.

  Implicit-entry definitions are inserted for non-local/global/builtin
  reads, captured variables, and (when needed) parameters.

Test additions:

- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
  representative test corpus and checks expected def/use chains.

- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
  legacy ESSA over the same corpus and diffs the results, so any
  semantic divergence shows up as a test failure.

Production impact:

None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.

Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
  library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:38:40 +00:00
Copilot
9cdeb8e7ee Python: add new shared-CFG-backed control flow graph
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).

- semmle.python.controlflow.internal.Cfg — public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Shared library:

- shared.controlflow.ControlFlowGraph — adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:

- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/store-load/* — basic store/load coverage.

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 13:38:30 +00:00
Copilot
717ff62d70 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
Owen Mansel-Chan
c9d45217d2 Fix order of comments in test 2026-06-19 13:23:52 +01:00
Owen Mansel-Chan
451fc2e4e7 Undo conversion for queries that import LegacyPointsTo 2026-06-19 12:22:42 +01:00
Owen Mansel-Chan
5497f2c5fe Convert Python qlref tests to inline expectations 2026-06-19 12:22:40 +01:00
Owen Mansel-Chan
1f9899d7db Extend added type tracking step to related types 2026-06-17 15:04:53 +01:00
Owen Mansel-Chan
dd61dd2d74 Fix FP for py/modification-of-locals 2026-06-17 14:24:18 +01:00
Owen Mansel-Chan
47c2c9e763 Add test for FP for py/modification-of-locals 2026-06-17 14:22:42 +01:00
Owen Mansel-Chan
415857cacb Fix FP for py/should-use-with 2026-06-17 13:01:36 +01:00
Owen Mansel-Chan
d72144646a Add test for FP for py/should-use-with 2026-06-17 12:55:17 +01:00
Owen Mansel-Chan
199fd864ad Fix FP for py/file-not-closed 2026-06-17 12:36:04 +01:00
Owen Mansel-Chan
890969433f Add test for FP for py/file-not-closed 2026-06-17 12:19:03 +01:00
Owen Mansel-Chan
9c65082189 Fix MISSING alert 2026-06-15 00:14:52 +01:00
Owen Mansel-Chan
434a99447e Add thorough tests, including one MISSING alert 2026-06-12 13:45:02 +01:00
Owen Mansel-Chan
d389ea4039 Convert sql-injection test to inline expectations 2026-06-12 13:44:56 +01:00
Owen Mansel-Chan
befb557bfd Accept fixed MISSING tests 2026-06-11 15:44:20 +02:00
copilot-swe-agent[bot]
73bc2d70ae Model instance-attribute type flow
Use a field level step like JS and Ruby.
2026-06-11 14:48:55 +02:00
copilot-swe-agent[bot]
a4585d8d94 Add test documenting missing PEP249 alerts for connection stored in self attribute 2026-06-11 05:48:40 +00:00
Owen Mansel-Chan
1f91f915c7 Merge pull request #21888 from owen-mc/py/remove-imprecise-container-steps
Python: Remove imprecise container steps #2
2026-06-04 22:16:24 +01:00
Owen Mansel-Chan
b27d08ee32 Update edges in expected test output 2026-06-02 18:29:56 +01:00
Owen Mansel-Chan
20ce679d61 Accept changed edges in test output
No changes to alerts
2026-06-02 16:15:08 +01:00
Owen Mansel-Chan
f62ebef9e0 Adjust expected test output 2026-06-02 16:15:06 +01:00
Taus
6165623cbf Merge pull request #21724 from github/tausbn/python-add-self-validating-cfg-tests 2026-05-28 22:07:55 +02:00
Taus
35faec3db1 Python: Address review comments
- Get rid of unnecessary parentheses
- Use call syntax in the relevant test
- Get rid of `dead(2)` annotation
2026-05-27 15:27:19 +00:00
Owen Mansel-Chan
ec13e1bcd3 Add wildcard ContentSets to avoid performance problems 2026-05-27 15:28:07 +01:00
Owen Mansel-Chan
e8779295ee Update test results 2026-05-22 11:43:18 +01:00
Rasmus Lerchedahl Petersen
fa758d6bf5 python: fix test 2026-05-21 16:59:19 +01:00
Rasmus Lerchedahl Petersen
fa9426c749 Python: extra tests for comprehension 2026-05-21 16:59:18 +01:00
Rasmus Lerchedahl Petersen
f669a4f3bf Python: Make sure all imprecise taint bubbles up 2026-05-21 16:59:14 +01:00
Rasmus Lerchedahl Petersen
3275c814bd Python: reset test expectations 2026-05-21 16:59:11 +01:00
Rasmus Lerchedahl Petersen
9a180036a5 Python: conversion step for format_map
and adjust collection test
2026-05-21 16:59:08 +01:00
Rasmus Lerchedahl Petersen
93e7ab52b7 Python: adjust test expectations
We now find an alert on this line as we hope to
It is not an alert for _full_ SSRF, though, since that configuration cannot handle multiple substitutions.
2026-05-21 16:58:51 +01:00
Rasmus Lerchedahl Petersen
facb3b681d Python: recover taint for % format strings 2026-05-21 16:57:50 +01:00
Rasmus Lerchedahl Petersen
b67694b2ab Python: Remove imprecise container steps
- remove `tupleStoreStep` and `dictStoreStep` from `containerStep`
   These are imprecise compared to the content being precise.
- add implicit reads to recover taint at sinks
- add implicit read steps for decoders
  to supplement the `AdditionalTaintStep`
  that now only covers when the full container is tainted.
2026-05-21 16:57:44 +01:00
Taus
1ef557c972 Python: Address Copilot's comments 2026-05-12 15:27:14 +00:00
Taus
f5c3b63a4a Python: Add ConsecutiveTimestamps test
This one is potentially a bit iffy -- it checks for a very powerful
property (that implies many of the other queries), but as the test
results show, it can produce false positives when there is in fact no
problem. We may want to get rid of it entirely, if it becomes too noisy.
2026-05-12 12:54:26 +00:00
Taus
c30d6ae3aa Python: Add NeverReachable test
This looks for nodes annotated with `t[never]` in the test that are
reachable in the CFG. This should not happen (it messes with various
queries, e.g. the "mixed returns" query), but the test shows that in a
few particular cases (involving the `match` statement where all cases
contain `return`s), we _do_ have reachable nodes that shouldn't be.
2026-05-12 12:54:26 +00:00
Taus
fc2bc26f36 Python: Add BasicBlockOrdering test
This one demonstrates a bug in the current CFG. In a dictionary
comprehension `{k: v for k, v in d.items()}`, we evaluate the value
before the key, which is incorrect. (A fix for this bug has been
implemented in a separate PR.)
2026-05-12 12:54:25 +00:00
Taus
3a979ac2f8 Python: Add some CFG-validation queries
These use the annotated, self-verifying test files to check various
consistency requirements.

Some of these may be expressing the same thing in different ways, but
it's fairly cheap to keep them around, so I have not attempted to
produce a minimal set of queries for this.
2026-05-12 12:54:25 +00:00
Taus
71cd5be513 Python: Add self-validating CFG tests
These tests consist of various Python constructions (hopefully a
somewhat comprehensive set) with specific timestamp annotations
scattered throughout. When the tests are run using the Python 3
interpreter, these annotations are checked and compared to the "current
timestamp" to see that they are in agreement. This is what makes the
tests "self-validating".

There are a few different kinds of annotations: the basic `t[4]` style
(meaning this is executed at timestamp 4), the `t[dead(4)]` variant
(meaning this _would_ happen at timestamp 4, but it is in a dead
branch), and `t[never]` (meaning this is never executed at all).

In addition to this, there is a query, MissingAnnotations, which checks
whether we have applied these annotations maximally. Many expression
nodes are not actually annotatable, so there is a sizeable list of
excluded nodes for that query.
2026-05-12 12:42:29 +00:00
Geoffrey White
1c704a0912 Python: Accept test changes (improvement). 2026-05-07 10:28:19 +01:00
Josef Svenningsson
68be006a29 Merge pull request #21641 from github/josefs/promptInjectionImprovements
Improve prompt inject for Python
2026-04-29 11:23:52 +01:00
Josef Svenningsson
25a8aa97b2 Fix openai prompt injection tests 2026-04-28 18:24:26 +01:00
Josef Svenningsson
a05e191518 Add tests for anthropic prompt injection models 2026-04-28 18:24:22 +01:00
Josef Svenningsson
e069c9c2ee Fix tests 2026-04-28 18:24:19 +01:00
Taus
ac23e16786 Python: Move Python 3.15 data-flow tests to a separate file
We won't be able to run these tests until Python 3.15 is actually out
(and our CI is using it), so it seemed easiest to just put them in their
own test directory.
2026-04-17 13:16:46 +00:00
Taus
dc36609743 Python: Add data-flow tests
Alas, all these demonstrate is that we already don't fully support the
desugared `yield from` form.
2026-04-17 12:15:04 +00:00