Commit Graph

15 Commits

Author SHA1 Message Date
Copilot
4ed5722e3e Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-02 14:09:45 +00:00
Rasmus Lerchedahl Petersen
263c0aade7 Python: adjust test expectations
mostly removing of nodes from the graph.
One result lost:
```
check("submodule.submodule_attr", submodule.submodule_attr, "submodule_attr", globals()) #$ MISSING:prints=submodule_attr
```
2023-12-06 23:00:51 +01:00
Rasmus Lerchedahl Petersen
11c71fdd18 Python: remove EssaNodes
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as
```python
  x = expr
  y = x + 2
```
we used to have flow from `expr` to an SSA variable representing x and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` at line 1 and from there to the control flow node for `x` at line 2.

Specific changes:
- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` taking into account that the module name may be ambigous.
- Some tests have been rewritten to accomodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected` only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
2023-11-20 21:35:32 +01:00
Rasmus Wriedt Larsen
11000fd123 Python: Fix ModuleExport.ql test for Python 2 2023-02-27 17:00:17 +01:00
Rasmus Wriedt Larsen
b7bdc551d5 Python: Show import resolution is a bit generous with exported value 2023-02-23 00:55:58 +01:00
Rasmus Wriedt Larsen
373907265b Python: Fixed most problems from last commit
That one line was an afterthought, and certainly did not work as
intended.
2023-02-23 00:39:45 +01:00
Rasmus Wriedt Larsen
97fefd2545 Python: Attempt to fix import flow
It's nice that it fixes the `InsecureProtocol` test-case (which maybe
should have been a test-case for the import resolution library in the
first place?)

But it's not quite right:

1. it adds spurious flow for `clashing_attr`
2. it runs into huge problems for typetracking_imports/tracked.expected
3. it runs into the problem for
   https://github.com/github/codeql/pull/10176 with an `from <pkg>
   import *` blocking flow from previously defined variable, that is NOT
   overridden. (simplistic_reexport.bar_attr)
2023-02-23 00:36:30 +01:00
Rasmus Wriedt Larsen
bea0acb497 Python: Add barrier test to import resolution
Just like the one added for `py/insecure-protocol` in fb425b7, but
instead added in the import-resolution tests, such that we don't have to
remember it's in a completely different directory.
2023-02-23 00:33:12 +01:00
Rasmus Wriedt Larsen
321a4b4ef2 Python: ModuleExport.ql test: ignore main.py
It's not very useful to look at, and it's a mess when you change any
tests to see all the changes lines in the expected output that you
really do not care about!
2023-02-23 00:31:05 +01:00
Rasmus Wriedt Larsen
8eaaf8e3e5 Python: Ignore trace.py in ModuleExport.ql test
I guess we could have done this at the very start of introducing this
test in this PR, but I think the last commit was mostly inspired from
looking at all the things that evidently was re-exported from the trace
import, even when I knew they were not available because of the
`__all__` definition.
2023-02-22 15:42:28 +01:00
Rasmus Wriedt Larsen
c8a76246d8 Python: Take __all__ into consideration for re-export of from <pkg> import *
However, we can see that `from <pkg> import *` and `import pkg` are
handled differently. Would have liked `has_defined_all_indirection` to
behave in the same way no matter how the import was made.
2023-02-22 15:39:57 +01:00
Rasmus Wriedt Larsen
be5812cf91 Python: from <pkg> import * ignores __all__ regression
Notice that `has_defined_all_indirection` all have both
`all_defined_bar_copy` and `all_defined_foo_copy` marked as exported,
even though only `all_defined_foo_copy` is available.
2023-02-22 15:38:24 +01:00
Rasmus Wriedt Larsen
4df7dfbff6 Python: Don't import module as module_attr
For `from <pkg> import <attr>` we would use to treat the `<pkg>`
(ImportExpr) as a definition of the name `<attr>`.

Since this removes bad import-flow, and nothing broke, I'm guessing this
was never intentional.
2023-02-22 14:52:35 +01:00
Rasmus Wriedt Larsen
6ba39d5fb3 Python: Add import regression for re-exported things 2023-02-22 14:50:42 +01:00
Rasmus Wriedt Larsen
6a5eebe891 Python: Add test of module_export 2023-02-22 12:26:01 +01:00