The shared CFG creates multiple ControlFlowNodes per AST node in
conditional contexts (e.g. afterTrue/afterFalse for boolean conditions,
empty/non-empty for for-loops, matched/unmatched for match cases).
These splits matter for control-flow analysis, but for dataflow — where
we ask 'what is the value of this expression?' — we need exactly one
representative per AST or we double-count calls, arguments, and store
steps.
This adds Cfg::isCanonicalAstNodeRepresentative as a purely structural
pick: for split ASTs it selects the 'positive' outcome variant; for
non-split ASTs it selects the unique variant. The picker is implemented
via genuine-outcome helpers that work around the shared CFG's
cross-kind isAfterValue fallback (ControlFlowGraph.qll:870-892), see
the doc on isGenuineAfterTrue for details.
The TCfgNode-family newtypes in DataFlowPublic, TNormalCall and
TPotentialLibraryCall in DataFlowDispatch, and the SSA-projected
use-use/def-use steps in DataFlowPrivate are all routed through the
canonical filter. DataFlowConsistency and the test UnresolvedCalls
helper qualify their CallNode casts with Cfg:: to keep working.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Switches the trunk dataflow library and all in-tree consumers
(frameworks, ApiGraphs, Concepts, regexp, security customisations,
test harness) from the legacy Flow.qll/ESSA stack to the new
shared-CFG facade (Cfg.qll) and the ESSA-shaped adapter on the
shared-SSA library (SsaImpl.qll).
Highlights:
* DataFlowPublic/Private/Dispatch, Attributes, VariableCapture,
IterableUnpacking, ImportResolution, ImportStar, LocalSources,
TaintTrackingPrivate, MatchUnpacking, TypeTrackingImpl,
SsaImpl, Builtins all now qualify CFG/SSA references with
Cfg:: / SsaImpl:: and stop pulling in semmle.python.essa.*.
* AstNodeImpl.qll/Cfg.qll: ImportMember exposes its inner
ImportExpr, DefinitionNode.getValue covers Alias / AnnAssign /
AugAssign / AssignExpr / For-target / Parameter-default,
ForNode is treated as an expression node, AnnotatedExitNode is
canonical, and BoolExprNode.getAnOperand drops the dominance
constraint that did not hold for short-circuit BBs.
* SsaImpl.qll: parameters always get a ParameterDefinition (so
unused parameters still have SSA defs), scope-entry defs for
module globals require an actual store somewhere, scope-exit
has a synthetic use so reaching-defs survives to module
boundary, and the legacy SsaSourceVariable / EssaVariable
surface (getName, getScope, getAUse, getASourceUse,
getAnImplicitUse) is reinstated for downstream queries.
* DataFlowPublic.qll: GuardNode redesigned around the new
structural outcome nodes (isAfterTrue / isAfterFalse). The
legacy ConditionBlock + flipped indirection is gone;
controlsBlock walks UP through 'not' / '==True' / 'is False'
etc. via outcomeOfGuard, accumulating polarity cleanly. Only
BarrierGuard<...> is preserved as public API.
* ModuleVariableNode.getAWrite and LocalFlow::definitionFlowStep
bypass SSA and consult Cfg::NameNode.defines /
Cfg::DefinitionNode.getValue directly, so that write defs
pruned by shared SSA (because the variable has no in-scope
read) still produce dataflow steps.
* Frameworks + downstream consumers: replace
EssaVariable.hasDefiningNode, getAReturnValueFlowNode,
Parameter.getDefault, Scope.getEntryNode / getANormalExit etc.
with CFG-side bridges through Cfg::ControlFlowNode.
The legacy Flow.qll / Essa.qll stack is untouched and remains
available for queries that import it directly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes the test failures that arose from making `ExtractedArgumentNode`
local.
For the consistency checks, we now explicitly exclude the
`ExtractedArgumentNode`s (now much more plentiful due to the
overapproximation) that don't have a corresponding `getCallArg` tuple.
For various queries/tests using `instanceof ArgumentNode`, we instead us
`isArgumentNode`, which explicitly filters out the ones for which
`isArgumentOf` doesn't hold (which, again, is the case for most of the
nodes in the overapproximation).
With `ModuleVariableNode`s now appearing for _all_ global variables (not
just the ones that actually seem to be used), some of the tests changed
a bit. Mostly this was in the form of new flow (because of new nodes
that popped into existence). For some inline expectation tests, I opted
to instead exclude these results, as there was no suitable location to
annotate. For the normal tests, I just accepted the output (after having
vetted it carefully, of course).