Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.
This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:
P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
P3: Add new shared-CFG-backed control flow graph (#21921).
P4: Add new shared-SSA-backed SSA adapter (#21923).
The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.
GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.
Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.
A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
pre/post-order pairs as distinct nodes.
Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.
Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.
Verification: all 367 lib + src + consistency-queries compile clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extends the mechanism introduced in
https://github.com/github/codeql/pull/18030
to behave the same for _all_ `MatchLiteralPattern`s, not just the ones
that happen to be the constant `True` or `False`.
Co-authored-by: yoff <yoff@github.com>
Adds a test demostrating the false positive observed by andersfugmann.
Note that this does not change the `.expected` file, and so the tests
will fail. This is expected.
This is a temporary fix!
Added minimal working example (MWE) as a regression, so it's easier to fix the
real problem.
only Python 3 is facing the problem -- and without --max-import-depth=1 the test
times out at 10 minutes :O
Should fix#1833, #2137, and #2187.
Internally, comprehensions are (at present) elaborated into local functions and
iterators as described in [PEP-289](https://www.python.org/dev/peps/pep-0289/).
That is, something like:
```
g = (x**2 for x in range(10))
```
becomes something akin to
```
def __gen(exp):
for x in exp:
yield x**2
g = __gen(iter(range(10)))
```
In the context of the top-level of a class, this means `__gen` looks as if it is
a method of the class, and in particular `exp` looks like it's the `self`
argument of this method, which leads the points-to analysis to think that `exp`
is an instance of the surrounding class itself.
The fix in this case is pretty simple: we look for occurrences of `exp` (in fact
called `.0` internally -- carefully chosen to _not_ be a valid Python
identifier) and explicitly exclude this parameter from being classified as a
`self` parameter.
This was brought up on the LGTM.com forums here:
https://discuss.lgtm.com/t/warn-when-always-failing-assert-is-reachable-rather-than-unreachable/2436
Essentially, in a complex chain of `elif` statements, like
```python
if x < 0:
...
elif x >= 0:
...
else:
...
```
the `else` clause is redundant, since the preceding conditions completely
exhaust the possible values for `x` (assuming `x` is an integer). Rather than
promoting the final `elif` clause to an `else` clause, it is common to instead
raise an explicit exception in the `else` clause. During execution, this
exception will never actually be raised, but its presence indicates that the
preceding conditions are intended to cover all possible cases.
I think it's a fair point. This is a clear instance where the alert, even if it
is technically correct, is not useful for the end user.
Also, I decided to make the exclusion fairly restrictive: it only applies if
the unreachable statement is an `assert False, ...` or `raise ...`, and only
if said statement is the first in the `else` block. Any other statements will
still be reported.