Commit Graph

870 Commits

Author SHA1 Message Date
yoff
9ce906bbed Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-29 13:17:49 +00:00
yoff
5fcaac7cb2 Merge pull request #21869 from yoff/python/support-flask-subclasses
Python: Support Flask subclasses
2026-06-25 23:42:21 +02:00
yoff
1a9bb2416a Python: deprecate Function.getAReturnValueFlowNode() and rewrite internal callers
Follow-up to the getAFlowNode deprecation in the same PR: same AST→legacy-CFG
bridge pattern. The 11 internal call sites (across objects/, types/,
frameworks/, and TypeTrackingImpl) are rewritten to bind a `Return ret`
explicitly, then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`.

The predicate itself is preserved with a deprecation note so external
users do not experience churn.

Semantic noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
Copilot
717ff62d70 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
Owen Mansel-Chan
da999ee440 Address review comments 2026-06-03 21:24:16 +01:00
Owen Mansel-Chan
6f2cc43f32 Remove imprecise model for tuple() 2026-06-02 21:59:48 +01:00
Owen Mansel-Chan
5042fdee84 Remove imprecise model for list() 2026-06-02 21:59:46 +01:00
Owen Mansel-Chan
04341c47bd Tweak model for str.join 2026-06-02 21:59:44 +01:00
Owen Mansel-Chan
c3ef1ddd64 Add MaD models for lxml and xml etree.fromstringlist 2026-06-02 16:15:01 +01:00
Owen Mansel-Chan
dede5bc49b Track flow through tuple() with list with tainted elements 2026-06-02 16:14:59 +01:00
Owen Mansel-Chan
ad97b6dd64 Use access path for str.join model 2026-06-02 16:14:56 +01:00
yoff
f6ed5c19be Python: fix sub class test 2026-06-02 13:50:31 +02:00
Rasmus Lerchedahl Petersen
0ecca91dea Python: typo 2026-05-21 16:59:16 +01:00
Rasmus Lerchedahl Petersen
f669a4f3bf Python: Make sure all imprecise taint bubbles up 2026-05-21 16:59:14 +01:00
Josef Svenningsson
68be006a29 Merge pull request #21641 from github/josefs/promptInjectionImprovements
Improve prompt inject for Python
2026-04-29 11:23:52 +01:00
Josef Svenningsson
bb18bb084c Improve prompt inject for Python 2026-04-28 18:24:16 +01:00
Owen Mansel-Chan
7458674470 Merge pull request #21584 from owen-mc/shared/update-mad-comments
Shared: update code comments explaining models-as-data format to include barriers and barrier guards
2026-04-14 09:30:28 +01:00
Owen Mansel-Chan
37aac05964 Replace branch with acceptingValue 2026-03-27 22:39:10 +00:00
Owen Mansel-Chan
10fddc7b96 Add barriers and barrier guards to MaD format explanations 2026-03-27 09:47:24 +00:00
Taus
c439fc5d45 Python: Replace type tracking with global data-flow
This takes care of most of the false negatives from the preceding
commit.

Additionally, we add models for some known wrappers of `socket.socket`
from the `gevent` and `eventlet` packages.
2026-03-26 15:35:33 +00:00
yoff
cfbae50845 Python: convert barrier guard to MaD 2026-02-26 13:12:34 +01:00
Taus
987b10ab3e Python: Fix bad join in OutgoingRequestCall
On `keras-team/keras`, this was producing ~200 million intermediate
tuples in order to produce a total of ... 2 tuples.

After the refactor, max intermediate tuple count is ~80k for the
charpred (and 4 for the new helper predicate).
2026-02-16 13:48:33 +00:00
Taus
df0f2f8ce4 Python: Simple dataflow annotations
None of these required any changes to the dataflow libraries, so it
seemed easiest to put them in their own commit.
2026-02-16 13:48:32 +00:00
REDMOND\brodes
8459eec239 Moving the SsrfSink concept into Concepts.qll, and renaming to HttpClientRequestFromModel as suggested in PR review. 2026-02-06 09:26:49 -05:00
REDMOND\brodes
0a88425170 Python: Altering SSRF MaD to use 'request-forgery' tag. Update to test cases expected results, off by one line. Changed to using ModelOutput::sinkNode. 2026-02-04 09:04:22 -05:00
Ben Rodes
7ddfa80399 Merge branch 'main' into azure_python_sdk_url_summary_upstream 2026-02-02 09:00:35 -05:00
Owen Mansel-Chan
5204255615 Merge pull request #21234 from owen-mc/python/convert-sanitizers-to-mad
Python: Allow models-as-data sanitizers
2026-01-30 14:28:39 +00:00
Owen Mansel-Chan
0222159df5 Specify vulnerable args instead of safe ones 2026-01-30 14:10:03 +00:00
Owen Mansel-Chan
a3885cd8b2 Replace sanitizer by exclusion from sink definition 2026-01-30 09:28:02 +00:00
yoff
e7a0fc7140 python: Add query for prompt injection
This pull request introduces a new CodeQL query for detecting prompt injection vulnerabilities in Python code targeting AI prompting APIs such as agents and openai. The changes includes a new experimental query, new taint flow and type models, a customizable dataflow configuration, documentation, and comprehensive test coverage.
2026-01-29 23:47:52 +01:00
Taus
34800d1519 Merge pull request #20945 from joefarebrother/python-websockets
Python: Model remote flow sources for the `websockets` library
2026-01-29 15:47:46 +01:00
Tom Hvitved
b974a84bef Merge pull request #21051 from hvitved/shared/flow-summary-provenance-filtering
Shared: Provenance-based filtering of flow summaries
2026-01-26 17:24:34 +01:00
Tom Hvitved
0adece7cde Python: Adapt to changes in FlowSummaryImpl 2026-01-26 12:40:19 +01:00
yoff
d05901ad3f python/javascript/ruby: mark internal predicates 2026-01-22 17:30:24 +01:00
yoff
3dbfb9fa4b python: add machinery for MaD barriers
and reinstate previously removed barrier
now as a MaD row
2026-01-22 17:30:24 +01:00
yoff
699ed50432 python: remove barrier that can be expressed in MaD 2026-01-22 17:30:24 +01:00
Taus
5414bd2716 Merge pull request #21134 from yoff/python/support-ListElement-in-MaD
Python support `ListElement` in MaD
2026-01-20 23:38:02 +01:00
yoff
1ac3706e75 Python support ListElement in MaD 2026-01-09 13:08:06 +01:00
Asger F
869efb8a48 JS: Sync ApiGraphModels.qll 2026-01-07 11:05:41 +01:00
yoff
5c6d83ed65 Merge pull request #20877 from joefarebrother/python-tornado-websocket
Python: Add models for websocket handlers for Tornado
2025-12-09 10:08:59 +01:00
Joe Farebrother
d70c596c86 Merge pull request #20914 from joefarebrother/python-socketio
Python: Add models for socketio
2025-12-04 23:14:58 +00:00
Joe Farebrother
ac55cf9544 Update test and qldoc 2025-12-01 20:41:59 +00:00
Joe Farebrother
6fbae45d49 Update qldoc 2025-12-01 20:14:36 +00:00
Joe Farebrother
384e17a4ef Implement websockets models 2025-12-01 16:24:59 +00:00
yoff
ebe29dd143 python: model urllib.ParseResult 2025-11-26 13:36:05 +01:00
yoff
a878bc61e1 python: add model for urllib.urlparse 2025-11-26 13:32:54 +01:00
Joe Farebrother
8d313ff85b qldoc fixes 2025-11-26 11:23:04 +00:00
Joe Farebrother
6207137ef0 Add changenote 2025-11-26 11:21:05 +00:00
Joe Farebrother
eb7fe71557 Fix namespace instances and update tests 2025-11-26 10:51:16 +00:00
Joe Farebrother
83eadbad60 Add namespace models 2025-11-25 16:56:36 +00:00