Commit Graph

119 Commits

Author SHA1 Message Date
yoff
bf8ff487dc Python: switch dataflow library to new (shared) CFG + SSA
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-29 19:37:37 +00:00
Copilot
717ff62d70 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 14:55:19 +02:00
yoff
5fb75ac987 Python: simplify decorator-detection predicates to pure AST match
The internal predicates that identify `@staticmethod`, `@classmethod` and
`@property` decorators previously required the decorator's `NameNode` to
satisfy `isGlobal()` (i.e. no SSA def reaches the decorator's name use).
That filter was correct but unnecessarily indirect: these three names
are builtins, and even when a class body redefines one, the class body
has not started executing at the decorator position, so Python uses the
builtin.

Match the decorator's AST `Name` directly instead, dropping the CFG/SSA
detour. The slight semantic change — `isGlobal()` would have rejected
module-level shadowing of these builtins — is negligible in practice
and explicitly documented in the change note.

`hasContextmanagerDecorator` and `hasOverloadDecorator` keep the
`NameNode.isGlobal()` check because their target names (`contextmanager`,
`overload`) are imported, not builtin, and local shadowing is a real
concern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-01 14:04:43 +00:00
Taus
ac48eca916 Python: Use cls.getMethod instead of getName 2026-03-23 15:26:00 +00:00
Taus
93e35661e6 Python: Make isNewType more precise
For module-level metaclass declarations, we now also check that the
right hand side in a `__metaclass__ = type` assignment is in fact the
built-in `type`.
2026-03-23 15:22:24 +00:00
Taus
a276f721f7 Python: Add ternary overridesMethod
This one also allows easy access to the method being overridden and the
class on which it resides. This let's us simplify DocStrings.ql
accordingly.
2026-03-23 15:21:27 +00:00
Taus
56c83e250e Python: Make comment more precise
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-23 15:09:27 +01:00
Taus
50b3b7ee1f Python: Add DuckTyping::hasUnreliableMro
Primarily used to filter out false positives in cases where our MRO
approximation may be wrong.
2026-03-20 13:30:29 +00:00
Taus
c04b615a07 Python: Extend DuckTyping module
Adds `overridesMethod` and `isPropertyAccessor`.
2026-03-20 13:28:45 +00:00
Taus
b57e92164c Python: Add declares/getAttribute API
These could arguably be moved to `Class` itself, but for now I'm
choosing to limit the changes to the `DuckTyping` module (until we
decide on a proper API).
2026-03-20 13:28:45 +00:00
Taus
cd92162920 Python: Add DuckTyping::isNewStyle
Approximates the behaviour of `Types::isNewStyle` but without depending
on points-to
2026-03-20 13:28:45 +00:00
Taus
33ed6034f6 Python: Introduce DuckTyping module
This module (which for convenience currently resides inside
`DataFlowDispatch`, but this may change later) contains convenience
predicates for bridging the gap between the data-flow layer and the old
points-to analysis.
2026-03-20 13:28:44 +00:00
Taus
f2bad1e6e1 Python: Improve docstring and make predicate private 2026-03-09 13:41:38 +00:00
Taus
fa61f6f3df Python: Model @typing.overload in method resolution
Adds `hasOverloadDecorator` as a predicate on functions. It looks for
decorators called `overload` or `something.overload` (usually
`typing.overload` or `t.overload`). These are then filtered out in the
predicates that (approximate) resolving methods according to the MRO.

As the test introduced in the previous commit shows, this removes the
spurious resolutions we had before.
2026-03-05 22:20:03 +00:00
Taus
306d7d1b5d Python: DataFlowDispatch.qll annotations 2026-02-16 13:48:32 +00:00
Taus
3f718123a6 Python: Make capturing closure arguments synthetic and non-global
Uses the same trick as for `ExtractedArgumentNode`, wherein we postpone
the global restriction on the charpred to instead be in the `argumentOf`
predicate (which is global anyway).

In addition to this, we also converted `CapturedVariablesArgumentNode`
into a proper synthetic node, and added an explicit post-update node for
it. These nodes just act as wrappers for the function part of call
nodes. Thus, to make them work with the variable capture machinery, we
simply map them to the closure node for the corresponding control-flow
or post-update node.
2026-01-30 12:50:25 +00:00
Joe Farebrother
d0daacd17e Modernize multple calls to init/del 2025-09-01 14:10:22 +01:00
Taus
d1cf7f0624 Python: Support type annotations in call graph
Adds support for tracking instances via type annotations. Also adds a
convenience method to the newly added `Annotation` class,
`getAnnotatedExpression`, that returns the expression that is annotated
with the given type. For return annotations this is any value returned
from the annotated function in question.

Co-authored-by: Napalys Klicius <napalys@github.com>
2025-07-11 12:03:14 +00:00
Rasmus Lerchedahl Petersen
a4c1a622b7 Merge branch 'main' of https://github.com/github/codeql into python/add-comprehension-capture-flow 2024-10-04 14:53:03 +02:00
Rasmus Lerchedahl Petersen
201c4aad13 Python: add comment 2024-10-04 14:09:33 +02:00
yoff
2b6aab108d Update python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll
Co-authored-by: Taus <tausbn@github.com>
2024-10-01 12:36:20 +02:00
Rasmus Lerchedahl Petersen
9357762e06 Python: remove superflous code
This is handled by parameter-argument matching
2024-10-01 00:03:04 +02:00
Rasmus Lerchedahl Petersen
dacc0ab8fe Python: docs and a simplification 2024-09-30 16:06:30 +02:00
Rasmus Lerchedahl Petersen
ded39749a7 Python: allow comp arg as argumentnode 2024-09-30 13:02:20 +02:00
Rasmus Lerchedahl Petersen
3ef05a628f Python: add location to node 2024-09-30 11:56:36 +02:00
Rasmus Lerchedahl Petersen
310819d392 Python: fix dataflow inconsistencies
- adjust scope of argument, the argument is outside the called function
- add missing post-update nodes for the new arguments
2024-09-30 10:31:36 +02:00
Rasmus Lerchedahl Petersen
d4ea62edec Python: flow through yield
- add yield as a dataflow return
- replace comprehension store step
   with a store step to the yield
2024-09-30 09:01:29 +02:00
Rasmus Lerchedahl Petersen
72530a8312 Python: use synthetic node for comprehension capture argument
We used to use the CfgNode for the comprehension itself.
In cases where that is also an argument, say
```python
",".join([x for x in l])
```
that would be an argument to two different calls causing a dataflow consistency violation.
2024-09-27 12:15:03 +02:00
Rasmus Lerchedahl Petersen
294092b671 Python: use comprehension function argument
For a comprehension `[x for x in l]
- `l` is now a legal argument (in DataFlowPublic)
- `l` is the argument of the comprehension function (in DataFlowDispatch)
- the parameter of the comprehension function is being read rather than `l` (in IterableUnpacking)
Thus the read that used to cross callable boundaries is now split into a arg-param edge and a read from that param.
2024-09-27 09:44:39 +02:00
Rasmus Lerchedahl Petersen
fc2dc28f87 python: capture flow through comprehensions
- add comprehension functions as `DataFlowCallable`s
- add comprehension call as `DataFlowCall`
- create capture argument node for comprehension calls
2024-09-25 10:02:31 +02:00
Anders Schack-Mulligen
da5abc8321 Dataflow: Replace MakeSets with QlBuiltins::InternSets. 2024-07-15 13:35:57 +02:00
Taus
6db7e72fb8 Python: Fix bad join in DataFlowDispatch
A case of bad magic. Rather than evaluating separately whether a class
has a method of some name, the compiler opted to magick in the fact
that this was done as part of the `findFunctionAccordingToMro`
predicate. Hilarity ensued.

However, _we_ know that magic really isn't needed in this case (the
number of results is bounded by `Class.getAMethod` since methods have
only a single name), so by factoring it out into a helper predicate, we
can help the join-orderer along.

Before
```
(377s) Starting to evaluate predicate _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h (iteration 6)
(377s) Tuple counts for _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h after 16ms:
33363  ~0%     {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'arg1'
159696 ~4%     {3}    | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Rhs.2 'arg2'
               return r1
(377s) Starting to evaluate predicate _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h (iteration 6)
(382s) Tuple counts for _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h after 4.4s:
1770825904 ~4%     {4} r1 = JOIN `_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared` WITH `Function::Function.getName/0#dispred#033700ef_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 'arg0', Rhs.1, Lhs.0 'arg1', Lhs.2 'arg2'
34558      ~3%     {3}    | JOIN WITH `Class::Class.getAMethod/0#dispred#66416e47` ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
                   return r1
...
(382s) Starting to evaluate predicate DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh (iteration 6)
(382s)                     - DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3_delta has 125138 rows (order for disjuncts: delta=<standard>).
(382s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh after 12ms:
33363  ~0%     {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0%     {3}    | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1%     {3}    | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs`(FIRST 3)

0      ~0%     {3} r2 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Lhs.1 'name', Lhs.2 'result', Rhs.1 'cls'
               {3}    | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#__#antijoin_rhs`(FIRST 3)
0      ~0%     {3}    | SCAN OUTPUT In.2 'cls', In.0 'name', In.1 'result'

125138 ~1%     {3} r3 = r1 UNION r2
125138 ~1%     {3}    | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
               return r3
```

And now
```
(18s) Tuple counts for DataFlowDispatch::class_has_method/2#0d2ae9c0/2@ff66c1lr after 18ms:
202279 ~1%     {2} r1 = JOIN `Class::Class.getAMethod/0#dispred#66416e47_10#join_rhs` WITH `Function::Function.getName/0#dispred#033700ef` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name'
               return r1
...
(490s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#48b6c1xi after 54ms:
0      ~0%     {3} r1 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'result'
0      ~0%     {3}    | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)

33363  ~0%     {2} r2 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0%     {3}    | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1%     {3}    | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)

125138 ~1%     {3} r3 = r1 UNION r2
125138 ~1%     {3}    | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
               return r3
```
2024-06-21 12:16:27 +00:00
Anders Schack-Mulligen
1432519cc2 Dataflow: Add totalorder predicates to all languages. 2024-05-27 11:01:52 +02:00
Rasmus Wriedt Larsen
93f940aa9c Python: Join-order improvement for DataFlowDispatch::TrackAttrReadInput
I was surprised to see that this predicate actually gets evaluated 3 times

- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@5ac22jwq was evaluated in 30 iterations totaling 108ms (delta sizes total: 32555).

It does however fit with it being used in exactly 3 places: https://github.com/search?q=repo%3Agithub%2Fcodeql+%2FattrReadTracker%5C%28%2F&type=code -- so I assume it's because each use forces a new evaluation. Although that's something we could look into solving, for now I'm just trying to fix the join-order.

Initial

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
        7068090   ~0%    {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
                         {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
        3901178   ~5%    {2}    | SCAN OUTPUT In.1, In.1
        3901178   ~0%    {3}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1

          13615   ~1%    {2} r2 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

             94   ~2%    {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

          18846   ~1%    {2} r4 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta_1#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

          32555   ~1%    {2} r5 = r2 UNION r3 UNION r4
                         return r5
```

==>

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@f2517jwq was evaluated in 30 iterations totaling 12ms (delta sizes total: 32704).
        186719  ~121%    {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta` OUTPUT In.1

        164342  ~158%    {1} r2 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0

            96    ~0%    {1} r3 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0

        351157   ~80%    {1} r4 = r1 UNION r2 UNION r3
         88074   ~14%    {1}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
         41789   ~18%    {2}    | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
                         {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
         32883    ~2%    {2}    | SCAN OUTPUT In.1, In.1
                         return r4
```

AND

initial

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
        17434622   ~0%    {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
                          {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
         9483976   ~4%    {2}    | SCAN OUTPUT In.1, In.1
         9483976   ~0%    {3}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1

           19258   ~1%    {2} r2 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

            1654   ~1%    {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

            1314   ~4%    {2} r4 = JOIN r1 WITH `DataFlowDispatch::clsArgumentTracker/1#47339327#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

              94   ~2%    {2} r5 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

           77217   ~0%    {2} r6 = JOIN r1 WITH `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

           13632   ~1%    {2} r7 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

          113169   ~0%    {2} r8 = r2 UNION r3 UNION r4 UNION r5 UNION r6 UNION r7
                          return r8
```
==>

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@d732e6yt was evaluated in 74 iterations totaling 31ms (delta sizes total: 113129).
        186719  ~150%    {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` OUTPUT In.0

          1669    ~0%    {1} r2 = SCAN `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` OUTPUT In.0

          3425   ~15%    {1} r3 = SCAN `DataFlowDispatch::clsArgumentTracker/1#47339327#prev_delta` OUTPUT In.1

            96    ~0%    {1} r4 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0

        123310    ~0%    {1} r5 = SCAN `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` OUTPUT In.0

        164342  ~581%    {1} r6 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0

        479561   ~94%    {1} r7 = r1 UNION r2 UNION r3 UNION r4 UNION r5 UNION r6
        169424    ~2%    {1}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
        116290    ~0%    {2}    | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
                         {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
        113160    ~0%    {2}    | SCAN OUTPUT In.1, In.1
                         return r7
```
2024-03-21 15:55:58 +01:00
Rasmus Wriedt Larsen
cff63ad5d5 Python: Fix small join-order problem for call-graph
problem is:
```
           14294  ~33%    {1} r23 = r21 UNION r22
           13626   ~0%    {2}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Rhs.1, Lhs.0
        11871493   ~2%    {2}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
         6810938   ~3%    {2}    | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
               0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_0_3_1_2#prev` ON FIRST 2 OUTPUT Rhs.3, Lhs.1, Lhs.0, Rhs.2
               0   ~0%    {4}    | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
               0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.3, Lhs.1, Lhs.0, Lhs.2
               0   ~0%    {5}    | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
that is, it does cartesian product of DataFlowPublic::Node.getEnclosingCallable

After fix

```
        14294  ~33%    {1} r23 = r21 UNION r22
            0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_3_0_1_2#prev` ON FIRST 1 OUTPUT Rhs.3, Lhs.0, Rhs.1, Rhs.2
            0   ~0%    {4}    | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
            0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.1, Lhs.3, Lhs.0, Lhs.2
            0   ~0%    {5}    | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22 ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3
            0   ~0%    {5}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2, Lhs.3, Lhs.4
            0   ~0%    {4}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 2 OUTPUT Lhs.0, Lhs.2, Lhs.3, Lhs.4
            0   ~0%    {5}    | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```

Overall stats

(old)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@b30c7vxg was evaluated in 51 iterations totaling 54ms (delta sizes total: 38247).

==>

(new)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@c1559vxu was evaluated in 51 iterations totaling 28ms (delta sizes total: 38247).
2024-03-21 12:31:58 +01:00
Tom Hvitved
fc55567d90 Merge pull request #15853 from hvitved/dataflow/get-location
Data flow: Replace `hasLocationInfo` with `getLocation`
2024-03-18 20:21:46 +01:00
Tom Hvitved
6c0ed28e6b Python: Implement new data flow interface 2024-03-13 14:41:57 +01:00
Rasmus Lerchedahl Petersen
580e68d5de python: add support for lower bound position 2024-02-09 13:51:16 +01:00
Rasmus Lerchedahl Petersen
215b146f06 Python: remove unused member predicate 2023-12-20 14:45:00 +01:00
Rasmus Lerchedahl Petersen
3cea46fe7b Python: fix typos 2023-12-20 14:35:10 +01:00
yoff
a60c52b8b7 Merge branch 'main' into python/captured-variables-basic 2023-12-18 23:44:46 +01:00
Rasmus Lerchedahl Petersen
6e4011d2ae Python: rename sythetic nodes
Avoid the term "closure" as it is somewhat academic.
2023-12-18 23:16:51 +01:00
Rasmus Lerchedahl Petersen
25c83dc70d Python: adjust comment 2023-12-18 22:15:37 +01:00
Rasmus Lerchedahl Petersen
c88d686ce4 Python: move SynthCapturePostUpdateNode
next to `SynthCaptureNode`
2023-12-18 21:37:52 +01:00
Rasmus Lerchedahl Petersen
661ba1ca7b Python: move restriction into branch predicate
Otherwise we get loads of nodes with missing locations
from the brnach nodes that are not matched.
2023-12-16 00:33:11 +01:00
Rasmus Lerchedahl Petersen
4a1fcde649 Python: abandon synthetic node
for `CapturingClosureArgumentNode`.

Unless we define it for every single `CallNode`, we need a more
sophisticated mutual recursion with the call graph construction.
There is built-in support for that, but we are currently not using it.
2023-12-15 23:42:29 +01:00
Rasmus Lerchedahl Petersen
1ee11ae7af Merge branch 'main' of https://github.com/github/codeql into python/captured-variables-basic 2023-12-15 14:31:57 +01:00
yoff
4b89a412c6 Update python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2023-12-15 12:59:01 +01:00
Rasmus Lerchedahl Petersen
d3b237bf7e Python: rename synthetic lambda nodes 2023-12-15 12:55:26 +01:00
Rasmus Lerchedahl Petersen
f96c52ed3b Python: make compile again
also improve comment
2023-12-15 10:25:49 +01:00