Uses the same trick as for `ExtractedArgumentNode`, wherein we postpone
the global restriction on the charpred to instead be in the `argumentOf`
predicate (which is global anyway).
In addition to this, we also converted `CapturedVariablesArgumentNode`
into a proper synthetic node, and added an explicit post-update node for
it. These nodes just act as wrappers for the function part of call
nodes. Thus, to make them work with the variable capture machinery, we
simply map them to the closure node for the corresponding control-flow
or post-update node.
Adds support for tracking instances via type annotations. Also adds a
convenience method to the newly added `Annotation` class,
`getAnnotatedExpression`, that returns the expression that is annotated
with the given type. For return annotations this is any value returned
from the annotated function in question.
Co-authored-by: Napalys Klicius <napalys@github.com>
We used to use the CfgNode for the comprehension itself.
In cases where that is also an argument, say
```python
",".join([x for x in l])
```
that would be an argument to two different calls causing a dataflow consistency violation.
For a comprehension `[x for x in l]
- `l` is now a legal argument (in DataFlowPublic)
- `l` is the argument of the comprehension function (in DataFlowDispatch)
- the parameter of the comprehension function is being read rather than `l` (in IterableUnpacking)
Thus the read that used to cross callable boundaries is now split into a arg-param edge and a read from that param.
A case of bad magic. Rather than evaluating separately whether a class
has a method of some name, the compiler opted to magick in the fact
that this was done as part of the `findFunctionAccordingToMro`
predicate. Hilarity ensued.
However, _we_ know that magic really isn't needed in this case (the
number of results is bounded by `Class.getAMethod` since methods have
only a single name), so by factoring it out into a helper predicate, we
can help the join-orderer along.
Before
```
(377s) Starting to evaluate predicate _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h (iteration 6)
(377s) Tuple counts for _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h after 16ms:
33363 ~0% {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'arg1'
159696 ~4% {3} | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Rhs.2 'arg2'
return r1
(377s) Starting to evaluate predicate _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h (iteration 6)
(382s) Tuple counts for _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h after 4.4s:
1770825904 ~4% {4} r1 = JOIN `_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared` WITH `Function::Function.getName/0#dispred#033700ef_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 'arg0', Rhs.1, Lhs.0 'arg1', Lhs.2 'arg2'
34558 ~3% {3} | JOIN WITH `Class::Class.getAMethod/0#dispred#66416e47` ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
return r1
...
(382s) Starting to evaluate predicate DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh (iteration 6)
(382s) - DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3_delta has 125138 rows (order for disjuncts: delta=<standard>).
(382s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh after 12ms:
33363 ~0% {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0% {3} | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1% {3} | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs`(FIRST 3)
0 ~0% {3} r2 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Lhs.1 'name', Lhs.2 'result', Rhs.1 'cls'
{3} | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#__#antijoin_rhs`(FIRST 3)
0 ~0% {3} | SCAN OUTPUT In.2 'cls', In.0 'name', In.1 'result'
125138 ~1% {3} r3 = r1 UNION r2
125138 ~1% {3} | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
return r3
```
And now
```
(18s) Tuple counts for DataFlowDispatch::class_has_method/2#0d2ae9c0/2@ff66c1lr after 18ms:
202279 ~1% {2} r1 = JOIN `Class::Class.getAMethod/0#dispred#66416e47_10#join_rhs` WITH `Function::Function.getName/0#dispred#033700ef` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name'
return r1
...
(490s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#48b6c1xi after 54ms:
0 ~0% {3} r1 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'result'
0 ~0% {3} | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)
33363 ~0% {2} r2 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0% {3} | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1% {3} | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)
125138 ~1% {3} r3 = r1 UNION r2
125138 ~1% {3} | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
return r3
```
I was surprised to see that this predicate actually gets evaluated 3 times
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@5ac22jwq was evaluated in 30 iterations totaling 108ms (delta sizes total: 32555).
It does however fit with it being used in exactly 3 places: https://github.com/search?q=repo%3Agithub%2Fcodeql+%2FattrReadTracker%5C%28%2F&type=code -- so I assume it's because each use forces a new evaluation. Although that's something we could look into solving, for now I'm just trying to fix the join-order.
Initial
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
7068090 ~0% {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
3901178 ~5% {2} | SCAN OUTPUT In.1, In.1
3901178 ~0% {3} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
13615 ~1% {2} r2 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
94 ~2% {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
18846 ~1% {2} r4 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta_1#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
32555 ~1% {2} r5 = r2 UNION r3 UNION r4
return r5
```
==>
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@f2517jwq was evaluated in 30 iterations totaling 12ms (delta sizes total: 32704).
186719 ~121% {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta` OUTPUT In.1
164342 ~158% {1} r2 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0
96 ~0% {1} r3 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0
351157 ~80% {1} r4 = r1 UNION r2 UNION r3
88074 ~14% {1} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
41789 ~18% {2} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
32883 ~2% {2} | SCAN OUTPUT In.1, In.1
return r4
```
AND
initial
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
17434622 ~0% {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
9483976 ~4% {2} | SCAN OUTPUT In.1, In.1
9483976 ~0% {3} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
19258 ~1% {2} r2 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
1654 ~1% {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
1314 ~4% {2} r4 = JOIN r1 WITH `DataFlowDispatch::clsArgumentTracker/1#47339327#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
94 ~2% {2} r5 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
77217 ~0% {2} r6 = JOIN r1 WITH `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
13632 ~1% {2} r7 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
113169 ~0% {2} r8 = r2 UNION r3 UNION r4 UNION r5 UNION r6 UNION r7
return r8
```
==>
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@d732e6yt was evaluated in 74 iterations totaling 31ms (delta sizes total: 113129).
186719 ~150% {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` OUTPUT In.0
1669 ~0% {1} r2 = SCAN `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` OUTPUT In.0
3425 ~15% {1} r3 = SCAN `DataFlowDispatch::clsArgumentTracker/1#47339327#prev_delta` OUTPUT In.1
96 ~0% {1} r4 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0
123310 ~0% {1} r5 = SCAN `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` OUTPUT In.0
164342 ~581% {1} r6 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0
479561 ~94% {1} r7 = r1 UNION r2 UNION r3 UNION r4 UNION r5 UNION r6
169424 ~2% {1} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
116290 ~0% {2} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
113160 ~0% {2} | SCAN OUTPUT In.1, In.1
return r7
```
problem is:
```
14294 ~33% {1} r23 = r21 UNION r22
13626 ~0% {2} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Rhs.1, Lhs.0
11871493 ~2% {2} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
6810938 ~3% {2} | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_0_3_1_2#prev` ON FIRST 2 OUTPUT Rhs.3, Lhs.1, Lhs.0, Rhs.2
0 ~0% {4} | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.3, Lhs.1, Lhs.0, Lhs.2
0 ~0% {5} | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
that is, it does cartesian product of DataFlowPublic::Node.getEnclosingCallable
After fix
```
14294 ~33% {1} r23 = r21 UNION r22
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_3_0_1_2#prev` ON FIRST 1 OUTPUT Rhs.3, Lhs.0, Rhs.1, Rhs.2
0 ~0% {4} | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.1, Lhs.3, Lhs.0, Lhs.2
0 ~0% {5} | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22 ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3
0 ~0% {5} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2, Lhs.3, Lhs.4
0 ~0% {4} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 2 OUTPUT Lhs.0, Lhs.2, Lhs.3, Lhs.4
0 ~0% {5} | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
Overall stats
(old)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@b30c7vxg was evaluated in 51 iterations totaling 54ms (delta sizes total: 38247).
==>
(new)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@c1559vxu was evaluated in 51 iterations totaling 28ms (delta sizes total: 38247).
for `CapturingClosureArgumentNode`.
Unless we define it for every single `CallNode`, we need a more
sophisticated mutual recursion with the call graph construction.
There is built-in support for that, but we are currently not using it.
This provides variable capture in standard situations:
- nested functions
- lambdas
There are some deficiencies:
- we do not yet handle objects capturing variables.
- we do not handle variables captured via the `nonlocal` keyword.
This should be solved at the AST level, though, and then it
should "just work".
There are still inconsistencies in the case where
a `SynthesizedCaptureNode` has a comprehensions
as its enclosing callable. In this case,
`TFunction(cn.getEnclosingCallable())` is not
defined and so getEnclosingCallable does not exist
for the `CaptureNode`.