codeql

mirror of https://github.com/github/codeql.git synced 2026-06-03 12:50:16 +02:00

Files

Copilot 4ed5722e3e Python: switch dataflow library to new (shared) CFG + SSA

Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.
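
The first two tweaks mirror ordinary Python semantics. The following is a minimal sketch in plain Python, not part of the commit; `bump`, `outcome`, `namespace`, and `star_imported` are names invented for illustration:

```python
def bump():
    # `total += 1` stores to `total` (which makes it a local of `bump`)
    # and also loads it first; the load fails because nothing was stored yet.
    total += 1
    return total


try:
    bump()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"

# A star import binds names that cannot be listed without inspecting the
# imported module, which is why a static analysis has to treat it as a
# possible write to otherwise unknown names.
namespace = {}
exec("from math import *", namespace)
star_imported = "pi" in namespace and "sqrt" in namespace
```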

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-06-02 14:09:45 +00:00

NaiveModel.expected    Python: switch dataflow library to new (shared) CFG + SSA    2026-06-02 14:09:45 +00:00
NaiveModel.ql          Python: switch dataflow library to new (shared) CFG + SSA    2026-06-02 14:09:45 +00:00
ProperModel.expected   Python: switch dataflow library to new (shared) CFG + SSA    2026-06-02 14:09:45 +00:00
ProperModel.ql         Python: switch dataflow library to new (shared) CFG + SSA    2026-06-02 14:09:45 +00:00
README.md              Python: Move framework tests out of experimental             2021-03-19 15:51:54 +01:00
SharedCode.qll         Python: switch dataflow library to new (shared) CFG + SSA    2026-06-02 14:09:45 +00:00
test.py                spelling: across                                             2022-10-11 00:23:35 -04:00

README.md

This test illustrates that you need to be very careful when adding additional taint-steps or dataflow steps using TypeTracker.

The basic setup is that we're modeling the behavior of a (fictitious) external library class MyClass and a (fictitious) source of such an instance (the source function).

class MyClass:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

We want to extend our analysis so that obj.get_value() is also tainted if obj is a tainted instance of MyClass.
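
As a rough illustration of the flows this is meant to cover, here is a self-contained sketch. It is not the contents of test.py: `sink`, `SINK_LOG`, `direct_use`, `call_later`, and `indirect_use` are hypothetical names, and `source` is only a stand-in for the fictitious source function.

```python
class MyClass:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value


def source():
    # Stand-in for the fictitious source: returns a "tainted" instance.
    return MyClass("tainted")


SINK_LOG = []


def sink(value):
    # Hypothetical sink that records whatever reaches it.
    SINK_LOG.append(value)


def direct_use():
    obj = source()
    sink(obj.get_value())  # attribute access and call in the same place


def call_later(method):
    return method()  # the call happens in a different function


def indirect_use():
    obj = source()
    bound = obj.get_value    # the attribute is accessed here ...
    sink(call_later(bound))  # ... but the call is made inside call_later


direct_use()
indirect_use()
```

The second case is the interesting one: the bound method travels through an argument before it is called, so the attribute access and the call sit in different functions.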

The actual type-tracking is done in SharedCode.qll, but it's the way we use it that matters.

In NaiveModel.ql we add an additional taint step from an instance of MyClass to calls of the bound method get_value (that we have tracked). This gives the correct results, but the path explanations are not very useful, since a single step can now cross function boundaries.

In ProperModel.ql we split the additional taint step in two:

- From a tracked obj that is an instance of MyClass to obj.get_value, but only exactly where the attribute is accessed (by an AttrNode). This is important: if we allowed <any tracked qualifier>.get_value, we would again be able to cross functions in one step.
- From a tracked get_value bound method to calls of it, but only exactly where the call is made (by a CallNode), for the same reason as above.
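
The difference between the two models can be sketched with a toy graph in plain Python. This is not the CodeQL API and not what the .ql files contain: `Node`, `attr_reads`, `calls`, and the two `tracked_*` sets are hypothetical stand-ins for CFG nodes and type-tracking results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    function: str  # name of the enclosing function


# Toy program: `obj.get_value` is read in `caller`, passed as an argument,
# and called as `method()` inside `helper`.
obj = Node("obj", "caller")
attr = Node("obj.get_value", "caller")
param = Node("method", "helper")
call = Node("method()", "helper")

attr_reads = {attr: obj}  # attribute-read node -> its qualifier
calls = {call: param}     # call node -> the node being called

# Stand-ins for what type tracking would report.
tracked_instances = {obj}
tracked_bound_methods = {attr, param}


def naive_steps():
    """One step: tracked instance -> any call of a tracked bound method."""
    return {
        (inst, c)
        for c, callee in calls.items()
        if callee in tracked_bound_methods
        for inst in tracked_instances
    }


def proper_steps():
    """Two local steps, each anchored at one syntactic node."""
    read_steps = {
        (q, a) for a, q in attr_reads.items() if q in tracked_instances
    }
    call_steps = {
        (callee, c)
        for c, callee in calls.items()
        if callee in tracked_bound_methods
    }
    return read_steps | call_steps


def crosses_function(step):
    pred, succ = step
    return pred.function != succ.function


naive_crossing = [s for s in naive_steps() if crosses_function(s)]
proper_crossing = [s for s in proper_steps() if crosses_function(s)]
```

In this toy setup the naive model produces the step `obj` to `method()`, which jumps from `caller` into `helper`, while both proper steps stay inside one function. The remaining hop from `obj.get_value` to `method` is left to ordinary data flow (argument passing), which is what keeps the path explanation readable.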

Try running the queries in VS Code to see the difference.

Possible improvements

Using AttrNode directly in the code here means there is no easy way to add getattr support to all such predicates. Not really sure how to handle this in a generalized way though :|
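
For context, this is the kind of Python code that a step anchored purely on attribute-access syntax would miss. A minimal sketch; the variable names are made up for illustration:

```python
class MyClass:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value


obj = MyClass("tainted")

# Plain attribute access: the method is looked up with `obj.get_value`.
via_attribute = obj.get_value()

# Same bound method, but looked up through the getattr builtin, so there is
# no attribute-access expression naming `get_value` in the source.
via_getattr = getattr(obj, "get_value")()
```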