Commit Graph

26 Commits

Author SHA1 Message Date
Copilot
4ed5722e3e Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-02 14:09:45 +00:00
REDMOND\brodes
4d4e7a1b5c Pretty print for tests. 2026-02-12 08:28:08 -05:00
REDMOND\brodes
9f9c353806 Update expected files. Copilot suggestions broke unit test expected results (column numbers). 2026-02-10 11:47:23 -05:00
REDMOND\brodes
a91cf6b7cb Applying copilot PR suggestions. 2026-02-10 11:37:11 -05:00
REDMOND\brodes
97ddab0724 Added support for new URIValidator in AntiSSRF library. Updated test caes to use postprocessing results. Currently results for partial ssrf still need work, it is flagging cases where the URL is fully controlled, but is sanitized. I'm not sure if this should be flagged yet. 2026-02-06 11:20:11 -05:00
REDMOND\brodes
9912aaaf1a Adding azure sdk test cases and updated test expected file. 2026-02-06 11:18:16 -05:00
REDMOND\brodes
0a88425170 Python: Altering SSRF MaD to use 'request-forgery' tag. Update to test cases expected results, off by one line. Changed to using ModelOutput::sinkNode. 2026-02-04 09:04:22 -05:00
REDMOND\brodes
704e2966cb Adding azure sdk test cases and updated test expected file. 2025-09-30 13:32:56 -04:00
Anders Schack-Mulligen
bfcfedab8c Python: Update expected output (uninteresting). 2024-04-12 09:20:30 +02:00
Anders Schack-Mulligen
088a0a54ba Python: Add empty provenance column to expected files. 2024-02-09 11:32:08 +01:00
Rasmus Lerchedahl Petersen
11c71fdd18 Python: remove EssaNodes
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as
```python
  x = expr
  y = x + 2
```
we used to have flow from `expr` to an SSA variable representing x and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` at line 1 and from there to the control flow node for `x` at line 2.

Specific changes:
- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` taking into account that the module name may be ambigous.
- Some tests have been rewritten to accomodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected` only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
2023-11-20 21:35:32 +01:00
Rasmus Wriedt Larsen
657b1997cc Python: Move FullServerSideRequestForgery and PartialServerSideRequestForgery to new dataflow API 2023-08-28 15:27:50 +02:00
Rasmus Wriedt Larsen
ca93f4d223 Python: Accept .expected changes 2023-08-11 10:36:05 +02:00
Rasmus Wriedt Larsen
d73289ac4e Python: Accept .expected changes 2023-04-27 11:54:39 +02:00
erik-krogh
4da0508dae Merge branch 'main' into py-last-msg 2022-10-11 10:49:19 +02:00
erik-krogh
944ca4a0da fix some more style-guide violations in the alert-messages 2022-10-07 11:23:34 +02:00
Rasmus Wriedt Larsen
b01a0ae696 Python: Adjust .expected after flask source change
It's really hard to audit that this is all good.. I tried my best with
`icdiff` though -- and there is a problem with
ql/src/experimental/Security/CWE-348/ClientSuppliedIpUsedInSecurityCheck.ql
that needs to be fixed in the next commit
2022-10-03 20:35:49 +02:00
Tom Hvitved
57f2a74636 Python: Implement ContentSet 2022-04-04 13:51:44 +02:00
Rasmus Wriedt Larsen
83f87f0272 Python: Adjust .expected based on new comment
That was changed in 9866214
2021-12-17 15:29:41 +01:00
Rasmus Wriedt Larsen
1d00730753 Python: Allow http[s]:// prefix for SSRF 2021-12-17 00:27:18 +01:00
Rasmus Wriedt Larsen
8d9a797b75 Python: Add tricky .format SSRF tests 2021-12-17 00:24:51 +01:00
Rasmus Wriedt Larsen
6f297f4e9c Python: Fix SSRF sanitizer tests
They were very misleading before, because a sanitizer that happened
early, would remove taint from the rest of the cases by use-use flow :|
2021-12-16 23:24:08 +01:00
Rasmus Wriedt Larsen
4b5599fe17 Python: Improve full/partial SSRF split
Now full-ssrf will only alert if **all** URL parts are fully
user-controlled.
2021-12-16 22:48:51 +01:00
Rasmus Wriedt Larsen
cb934e17b1 Python: Adjust SSRF location to request call
Since that might not be the same place where the vulnerable URL part is.
2021-12-16 22:48:51 +01:00
Rasmus Wriedt Larsen
b1bca85162 Python: Add interesting test-case 2021-12-16 22:48:51 +01:00
Rasmus Wriedt Larsen
1cc5e54357 Python: Add SSRF queries
I've added 2 queries:

- one that detects full SSRF, where an attacker can control the full URL,
  which is always bad
- and one for partial SSRF, where an attacker can control parts of an
  URL (such as the path, query parameters, or fragment), which is not a
  big problem in many cases (but might still be exploitable)

full SSRF should run by default, and partial SSRF should not (but makes
it easy to see the other results).

Some elements of the full SSRF queries needs a bit more polishing, like
being able to detect `"https://" + user_input` is in fact controlling
the full URL.
2021-12-16 01:48:34 +01:00