Commit Graph

43 Commits

Author SHA1 Message Date
yoff
3c9b0f770b Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-18 15:17:35 +00:00
Copilot
6248d3ad9d Python: add new shared-CFG-backed control flow graph
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).

- semmle.python.controlflow.internal.Cfg — public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Shared library:

- shared.controlflow.ControlFlowGraph — adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:

- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/store-load/* — basic store/load coverage.

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-18 15:17:34 +00:00
Taus
fb6175d10b Python: Fix consistency test failures
As we now have many more capturing closure arguments, we must once again
exclude the ones that don't actually have `argumentOf` defined.
2026-01-30 12:50:25 +00:00
Taus
6113d4be9e Python: Fix test issues
Fixes the test failures that arose from making `ExtractedArgumentNode`
local.

For the consistency checks, we now explicitly exclude the
`ExtractedArgumentNode`s (now much more plentiful due to the
overapproximation) that don't have a corresponding `getCallArg` tuple.

For various queries/tests using `instanceof ArgumentNode`, we instead us
`isArgumentNode`, which explicitly filters out the ones for which
`isArgumentOf` doesn't hold (which, again, is the case for most of the
nodes in the overapproximation).
2026-01-30 12:50:25 +00:00
Nora Dimitrijević
20d4e429ca Add consistency query (exactly one path for every entity) 2025-10-06 11:47:56 +02:00
yoff
1048cf7c5e Merge pull request #15711 from RasmusWL/tt-content
Python: Add type tracking for content
2024-04-09 10:37:43 +02:00
Rasmus Wriedt Larsen
a22b9947c0 Python: Revert IterableSequenceNode as LocalSourceNode
When looking things over a bit more, we could actually exclude the steps
that would never be used instead. A much more involved solution, but
more performance oriented and clear in terms of what is supported (at
least until we start supporting type-tracking with more than depth 1
access-path, if that ever happens)
2024-04-02 16:51:00 +02:00
Tom Hvitved
fc55567d90 Merge pull request #15853 from hvitved/dataflow/get-location
Data flow: Replace `hasLocationInfo` with `getLocation`
2024-03-18 20:21:46 +01:00
Rasmus Wriedt Larsen
4d78762ba8 Python: Ignore consistency failure 2024-03-14 10:43:28 +01:00
Tom Hvitved
6c0ed28e6b Python: Implement new data flow interface 2024-03-13 14:41:57 +01:00
Rasmus Wriedt Larsen
800351c7b7 Merge branch 'main' into tt-consistency 2024-03-11 14:12:09 +01:00
Rasmus Wriedt Larsen
adf5a4b1e4 Python: Fix internal consistency failures 2024-03-08 14:13:47 +01:00
Rasmus Wriedt Larsen
87b6592dbc Python: Accept inconsistency for missing use-use flow
At least until we have a proper fix
2024-03-08 13:34:26 +01:00
Rasmus Wriedt Larsen
85a45b0155 Python: Fix comment
Co-authored-by: yoff <lerchedahl@gmail.com>
2024-03-04 11:40:17 +01:00
Rasmus Wriedt Larsen
7c60562132 Python: Ignore IterableSequenceNode inconsistencies 2024-03-01 14:22:18 +01:00
Rasmus Wriedt Larsen
bcd5c08ebd Python: Ignore match-related inconsistencies 2024-03-01 14:15:32 +01:00
Rasmus Wriedt Larsen
1658a1cb80 Python: Ignore SynthDictSplatArgumentNode failures 2024-03-01 14:00:06 +01:00
Rasmus Wriedt Larsen
ff5f794750 Python: Exclude synth preupdate nodes from tt-consistency
... and that should be it 👍 (so that's why I'm allowing the tests to
run on all data-flow nodes again)
2024-03-01 10:27:29 +01:00
Rasmus Wriedt Larsen
bbe8c6dcaa Python: Remove synth postupdate nodes from tt-consistency 2024-03-01 10:23:50 +01:00
Rasmus Wriedt Larsen
9f01ea68f7 Python: Add type-tracking consistency query
For now I'm only ignoring stdlib nodes, so it's easy for reviewer to see
why we need to have more excludes :)
2024-03-01 10:19:49 +01:00
Rasmus Wriedt Larsen
d182eae868 Python: Add consistency check for PhaseDependentFlow
This would have found the problem in
https://github.com/github/codeql/pull/15755.

As highlighted in the comment in the code, it's not a perfect solution
since we don't have an automatic way to ensure we don't introduce a new
PhaseDependentFlow use with a new step relation and forget to add it to
this consistency check... but I think this consistency check still adds
value!
2024-03-01 10:01:08 +01:00
Rasmus Lerchedahl Petersen
da4aef80e9 Revert "Python: make it a real consistency check"
This reverts commit 45411f4a93.
2023-12-20 16:15:17 +01:00
Rasmus Lerchedahl Petersen
45411f4a93 Python: make it a real consistency check 2023-12-20 14:53:37 +01:00
Rasmus Lerchedahl Petersen
d6544cc550 Python: remove consistency exclusion 2023-12-18 15:24:49 +01:00
Rasmus Lerchedahl Petersen
5de1725648 Python: update class name 2023-12-15 23:50:29 +01:00
Rasmus Lerchedahl Petersen
a311582285 Python: Bring back (now simplified) exclusion 2023-12-15 13:28:16 +01:00
Rasmus Lerchedahl Petersen
5b6ea15028 Python: remove unneeded consistency exclusion 2023-12-15 11:09:37 +01:00
Rasmus Lerchedahl Petersen
262d43abcf Python: Make compile and add comment 2023-12-15 10:28:51 +01:00
Rasmus Lerchedahl Petersen
38e03216f6 Python: allow CaptureArgumentNodes as multiple arguemnts
These are the labmda self references. This is similar to
how `BlockParameterArgumentNode` is excluded for Ruby.

It is important that we restrict `call` in this logic.
Otherwise, we get a cartesian product and the consistency
check runs for a very long time...
2023-12-14 10:32:29 +01:00
Rasmus Lerchedahl Petersen
f32d5e422d Python: typo 2023-12-14 10:28:26 +01:00
Rasmus Lerchedahl Petersen
5471c92e9f Python: exclusion for summary nodes
as in Ruby
2023-12-14 10:28:26 +01:00
Rasmus Lerchedahl Petersen
b513871b9b Python: add consistency exclusions 2023-12-14 10:27:15 +01:00
Rasmus Wriedt Larsen
2c10160ad4 Python: Highlight we actually want post-update nodes for *args and **kwargs arguments 2023-11-28 14:07:03 +01:00
Rasmus Wriedt Larsen
02f2031239 Python: Ensure other call for super().foo 2023-11-28 14:04:51 +01:00
Rasmus Wriedt Larsen
4a98ed903e Python: Fix consistency for bound-methods used in list-comp 2023-11-22 14:07:40 +01:00
Rasmus Wriedt Larsen
67b1414177 Python: Highlight even more cases for multipleArgumentCallExclude 2023-11-22 11:25:23 +01:00
Rasmus Wriedt Larsen
f9d7becd04 Python: Make multipleArgumentCallExclude more specific 2023-11-21 15:57:12 +01:00
Rasmus Wriedt Larsen
df9fb141b8 Python: Remove old manual consistency query tests 2023-11-21 11:50:23 +01:00
Rasmus Wriedt Larsen
b6df6b7c99 Python: Add dataflow consistency query 2023-11-21 11:33:28 +01:00
Kasper Svendsen
f41276cb7f Python: Enable implicit this warnings for remaining packs 2023-06-27 12:00:13 +02:00
Dave Bartolomeo
9d5e5e3ee7 ${workspace} all the things 2022-11-01 13:29:05 -04:00
Rasmus Wriedt Larsen
32cd7d6fa7 Add groups to all consistency-queries/qlpack.yml
as discussed in PR review
2022-02-07 11:15:48 +01:00
Rasmus Wriedt Larsen
c817ba5718 Python: Add consistency-queries/qlpack.yml
But no queries yet
2022-02-04 12:08:54 +01:00