Option 2: eliminates the AST→CFG bridge from the AST layer. Previously
'AstNode.getAFlowNode()' returned a 'ControlFlowNode' from the legacy
'Flow.qll' CFG via 'py_flow_bb_node' — this hardcoded the AST to know
about the legacy CFG, preventing files from cleanly switching to the
new shared CFG.
Removes:
* 'AstNode.getAFlowNode()' from 'AstExtended.qll'
* Type-narrowing overrides on 'Attribute' / 'Subscript' / 'Call' /
'IfExp' / 'Name' / 'NameConstant' / 'ImportMember' (in Exprs.qll
and Import.qll)
Rewrites ~130 call sites across 'python/ql/lib/' and 'python/ql/src/'
to bridge from the CFG side instead:
Before: node = expr.getAFlowNode()
After: node.getNode() = expr
Before: expr.getAFlowNode().(DefinitionNode).getValue()
After: exists(DefinitionNode d | d.getNode() = expr | d.getValue())
Before: cn.operands(const.getAFlowNode(), op, x)
After: exists(ControlFlowNode c | c.getNode() = const | cn.operands(c, op, x))
This is semantically a no-op — both forms are duals of the same predicate.
Verified by passing all library tests:
* 64 dataflow tests
* 28 ControlFlow + dataflow-new-ssa tests
* 1 essa SSA-compute test
* 93 tests total in the focused suite
Once committed, files that want to switch from the legacy 'Flow' CFG
to the new 'Cfg' facade only need to change their imports — the
bridge sites are CFG-side and respect whichever ControlFlowNode is in
scope.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Moves the classes/predicates that _actually_ depend on points-to to the
`LegacyPointsTo` module, leaving behind a module that contains all of
the metrics-related stuff (line counts, nesting depth, etc.) that don't
need points-to to be evaluated.
Consequently, `Metrics` is now no longer a private import in
`python.qll`.
Gets rid of the `getMetrics` methods on the `Function`, `Class`, and
`Module` classes. To access the metrics, one must first import the
`LegacyPointsTo` module, and then either change the type to
`{Function,Class,Module}Metrics` or cast to the appropriate type.
Turns out the `ImportTime` module (despite living in
`semmle.python.types` does not actually depend on points-to, so some of
the `LegacyPointsTo` imports could be replaced or removed.
Gets rid of a bunch of predicates relating to reachability (which
depended on the modelling of exceptions, which uses points-to), moving
them to `LegacyPointsTo`. In the process, we gained a new class
`BasicBlockWithPointsTo`.
This frees `Class.qll`, `Exprs.qll`, and `Function.qll` from the
clutches of points-to. For the somewhat complicated setup with
`getLiteralObject` (an abstract method), I opted for a slightly ugly but
workable solution of just defining a predicate on `ImmutableLiteral`
that inlines each predicate body, special-cased to the specific instance
to which it applies.
For now, these have just been made into `private` imports. After doing
this, I went through all of the (now not compiling) files and added in
private imports to the modules that they actually depended on.
I also added an explicit import of `LegacyPointsTo` (even though it may
be unnecessary) in cases where the points-to dependency was somewhat
surprising (and one we want to get rid of). This was primarily inside
the various SSA layers.
For modules inside `semmle.python.{types, objects, pointsto}` I did not
bother, as these are fairly clearly related to points-to.
Moves the existing points-to predicates to the newly added class
`ControlFlowNodeWithPointsTo` which resides in the `LegacyPointsTo`
module.
(Existing code that uses these predicates should import this module, and
references to `ControlFlowNode` should be changed to
`ControlFlowNodeWithPointsTo`.)
Also updates all existing points-to based code to do just this.