Commit Graph

14 Commits

Author SHA1 Message Date
Copilot
4ed5722e3e Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-02 14:09:45 +00:00
Anders Schack-Mulligen
bfcfedab8c Python: Update expected output (uninteresting). 2024-04-12 09:20:30 +02:00
Anders Schack-Mulligen
088a0a54ba Python: Add empty provenance column to expected files. 2024-02-09 11:32:08 +01:00
Rasmus Lerchedahl Petersen
11c71fdd18 Python: remove EssaNodes
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as
```python
  x = expr
  y = x + 2
```
we used to have flow from `expr` to an SSA variable representing x and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` at line 1 and from there to the control flow node for `x` at line 2.

Specific changes:
- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` taking into account that the module name may be ambigous.
- Some tests have been rewritten to accomodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected` only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
2023-11-20 21:35:32 +01:00
Rasmus Wriedt Larsen
ca93f4d223 Python: Accept .expected changes 2023-08-11 10:36:05 +02:00
Rasmus Lerchedahl Petersen
9cb83fcdc9 python: add summaries for
copy, pop, get, getitem, setdefault

Also add read steps to taint tracking.

Reading from a tainted collection can be done in two situations:
1. There is an acces path
    In this case a read step (possibly from a flow summary)
    gives rise to a taint step.
2. There is no access path
    In this case an explicit taint step (possibly via a flow
    summary) should exist.
2023-05-26 14:04:15 +02:00
Rasmus Lerchedahl Petersen
8d4f9447b1 python: remove explicit steps
copy, pop, get, popitem
2023-05-26 13:22:54 +02:00
Rasmus Wriedt Larsen
d73289ac4e Python: Accept .expected changes 2023-04-27 11:54:39 +02:00
erik-krogh
4da0508dae Merge branch 'main' into py-last-msg 2022-10-11 10:49:19 +02:00
erik-krogh
944ca4a0da fix some more style-guide violations in the alert-messages 2022-10-07 11:23:34 +02:00
Rasmus Wriedt Larsen
b01a0ae696 Python: Adjust .expected after flask source change
It's really hard to audit that this is all good.. I tried my best with
`icdiff` though -- and there is a problem with
ql/src/experimental/Security/CWE-348/ClientSuppliedIpUsedInSecurityCheck.ql
that needs to be fixed in the next commit
2022-10-03 20:35:49 +02:00
erik-krogh
089ce5a8a4 change alert messages of path queries to use the same template 2022-09-02 14:45:40 +02:00
Anders Schack-Mulligen
f30dad7705 Dataflow: Update test expected outputs. 2021-09-07 13:02:20 +02:00
Rasmus Wriedt Larsen
77021ae119 Python: Restructure security tests to contain query name
We were mixing between things, so this is just to keep things
consistent. Even though it's not strictly needed for all queries,
it does look nice I think
2021-07-19 16:54:34 +02:00