I was surprised to see that this predicate actually gets evaluated 3 times
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@5ac22jwq was evaluated in 30 iterations totaling 108ms (delta sizes total: 32555).
It does however fit with it being used in exactly 3 places: https://github.com/search?q=repo%3Agithub%2Fcodeql+%2FattrReadTracker%5C%28%2F&type=code -- so I assume it's because each use forces a new evaluation. Although that's something we could look into solving, for now I'm just trying to fix the join-order.
Initial
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
7068090 ~0% {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
3901178 ~5% {2} | SCAN OUTPUT In.1, In.1
3901178 ~0% {3} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
13615 ~1% {2} r2 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
94 ~2% {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
18846 ~1% {2} r4 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta_1#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
32555 ~1% {2} r5 = r2 UNION r3 UNION r4
return r5
```
==>
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@f2517jwq was evaluated in 30 iterations totaling 12ms (delta sizes total: 32704).
186719 ~121% {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta` OUTPUT In.1
164342 ~158% {1} r2 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0
96 ~0% {1} r3 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0
351157 ~80% {1} r4 = r1 UNION r2 UNION r3
88074 ~14% {1} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
41789 ~18% {2} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
32883 ~2% {2} | SCAN OUTPUT In.1, In.1
return r4
```
AND
initial
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
17434622 ~0% {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
9483976 ~4% {2} | SCAN OUTPUT In.1, In.1
9483976 ~0% {3} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
19258 ~1% {2} r2 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
1654 ~1% {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
1314 ~4% {2} r4 = JOIN r1 WITH `DataFlowDispatch::clsArgumentTracker/1#47339327#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
94 ~2% {2} r5 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
77217 ~0% {2} r6 = JOIN r1 WITH `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
13632 ~1% {2} r7 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
113169 ~0% {2} r8 = r2 UNION r3 UNION r4 UNION r5 UNION r6 UNION r7
return r8
```
==>
```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@d732e6yt was evaluated in 74 iterations totaling 31ms (delta sizes total: 113129).
186719 ~150% {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` OUTPUT In.0
1669 ~0% {1} r2 = SCAN `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` OUTPUT In.0
3425 ~15% {1} r3 = SCAN `DataFlowDispatch::clsArgumentTracker/1#47339327#prev_delta` OUTPUT In.1
96 ~0% {1} r4 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0
123310 ~0% {1} r5 = SCAN `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` OUTPUT In.0
164342 ~581% {1} r6 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0
479561 ~94% {1} r7 = r1 UNION r2 UNION r3 UNION r4 UNION r5 UNION r6
169424 ~2% {1} | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
116290 ~0% {2} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
{2} | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
113160 ~0% {2} | SCAN OUTPUT In.1, In.1
return r7
```
problem is:
```
14294 ~33% {1} r23 = r21 UNION r22
13626 ~0% {2} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Rhs.1, Lhs.0
11871493 ~2% {2} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
6810938 ~3% {2} | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_0_3_1_2#prev` ON FIRST 2 OUTPUT Rhs.3, Lhs.1, Lhs.0, Rhs.2
0 ~0% {4} | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.3, Lhs.1, Lhs.0, Lhs.2
0 ~0% {5} | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
that is, it does cartesian product of DataFlowPublic::Node.getEnclosingCallable
After fix
```
14294 ~33% {1} r23 = r21 UNION r22
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_3_0_1_2#prev` ON FIRST 1 OUTPUT Rhs.3, Lhs.0, Rhs.1, Rhs.2
0 ~0% {4} | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
0 ~0% {4} | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.1, Lhs.3, Lhs.0, Lhs.2
0 ~0% {5} | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22 ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3
0 ~0% {5} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2, Lhs.3, Lhs.4
0 ~0% {4} | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 2 OUTPUT Lhs.0, Lhs.2, Lhs.3, Lhs.4
0 ~0% {5} | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
Overall stats
(old)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@b30c7vxg was evaluated in 51 iterations totaling 54ms (delta sizes total: 38247).
==>
(new)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@c1559vxu was evaluated in 51 iterations totaling 28ms (delta sizes total: 38247).
No major performance impact, more of a learning example for myself (had +3000 join order badness).
Initial tuple counts
```
Evaluated recursive predicate SqlAlchemy::SqlAlchemy::Connection::ConnectionConstruction#45e716e0@594cfx2g in 1ms on iteration 1 (delta size: 4).
Evaluated relational algebra for predicate SqlAlchemy::SqlAlchemy::Connection::ConnectionConstruction#45e716e0@594cfx2g on iteration 1 running pipeline base with tuple counts:
37793 ~0% {3} r1 = JOIN `ApiGraphs::API::Node.getACall/0#dispred#312deb92_10#join_rhs` WITH DataFlowPublic::CallCfgNode#b8ddbf81 ON FIRST 1 OUTPUT Lhs.1, Lhs.0, Rhs.1
0 ~0% {2} | JOIN WITH `SqlAlchemy::SqlAlchemy::Connection::classRef/0#565fc3ad` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
30 ~0% {5} r2 = JOIN DataFlowPublic::CallCfgNode#b8ddbf81 WITH `DataFlowPublic::MethodCallNode.calls/2#dispred#1dd1e0f4#ffb` ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Rhs.1, Rhs.2, _
{4} | REWRITE WITH NOT [NOT [Tmp.4 := "begin", TEST InOut.3 = Tmp.4], NOT [Tmp.4 := "connect", TEST InOut.3 = Tmp.4]] KEEPING 4
21 ~0% {3} | SCAN OUTPUT In.2, In.0, In.1
4 ~0% {2} | JOIN WITH `SqlAlchemy::SqlAlchemy::Engine::instance/0#1828baef` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
4 ~0% {2} r3 = r1 UNION r2
return r3
```
which is fixed by the only_bind_out
```
Evaluated recursive predicate SqlAlchemy::SqlAlchemy::Connection::ConnectionConstruction#45e716e0@49effxtg in 0ms on iteration 1 (delta size: 0).
Evaluated relational algebra for predicate SqlAlchemy::SqlAlchemy::Connection::ConnectionConstruction#45e716e0@49effxtg on iteration 1 running pipeline base with tuple counts:
0 ~0% {1} r1 = JOIN `SqlAlchemy::SqlAlchemy::Connection::classRef/0#565fc3ad` WITH `ApiGraphs::API::Node.getACall/0#dispred#312deb92` ON FIRST 1 OUTPUT Rhs.1
0 ~0% {2} | JOIN WITH DataFlowPublic::CallCfgNode#b8ddbf81 ON FIRST 1 OUTPUT Lhs.0, Rhs.1
return r1
```
We also had this initial problem
```
Evaluated recursive predicate SqlAlchemy::SqlAlchemy::Connection::ConnectionConstruction#45e716e0@594cfx2g in 1ms on iteration 4 (delta size: 0).
Evaluated relational algebra for predicate SqlAlchemy::SqlAlchemy::Connection::ConnectionConstruction#45e716e0@594cfx2g on iteration 4 running pipeline standard with tuple counts:
48722 ~6% {2} r1 = DataFlowPublic::CallCfgNode#b8ddbf81 AND NOT SqlAlchemy::SqlAlchemy::Connection::ConnectionConstruction#45e716e0#prev(FIRST 2)
48722 ~3% {3} r2 = SCAN r1 OUTPUT In.0, _, In.1
48722 ~1% {3} | REWRITE WITH Out.1 := "connect"
16 ~0% {3} | JOIN WITH `DataFlowPublic::MethodCallNode.calls/2#dispred#1dd1e0f4#ffb_021#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.0, Lhs.2
0 ~0% {2} | JOIN WITH `SqlAlchemy::SqlAlchemy::Connection::instance/0#5ed87c17#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
48722 ~3% {3} r3 = SCAN r1 OUTPUT In.0, _, In.1
48722 ~2% {3} | REWRITE WITH Out.1 := "execution_options"
9 ~0% {3} | JOIN WITH `DataFlowPublic::MethodCallNode.calls/2#dispred#1dd1e0f4#ffb_021#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.0, Lhs.2
0 ~0% {2} | JOIN WITH `SqlAlchemy::SqlAlchemy::Connection::instance/0#5ed87c17#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2
0 ~0% {2} r4 = r2 UNION r3
return r4
```
which is fixed by `connectionConstruction_helper`
```
Evaluated recursive predicate SqlAlchemy::SqlAlchemy::Connection::helper/0#62cfc178#b@4f295yef in 1ms on iteration 4 (delta size: 0).
Evaluated relational algebra for predicate SqlAlchemy::SqlAlchemy::Connection::helper/0#62cfc178#b@4f295yef on iteration 4 running pipeline standard with tuple counts:
4 ~0% {1} r1 = JOIN `SqlAlchemy::SqlAlchemy::Connection::instance/1#029b4c87#prev_delta` WITH `TypeTrackingImpl::TypeTracker::end/0#2ac2cfd4` ON FIRST 1 OUTPUT Lhs.1
16 ~0% {1} | JOIN WITH `LocalSources::Cached::hasLocalSource/2#8b3ee0ec_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
0 ~0% {3} | JOIN WITH `DataFlowPublic::MethodCallNode.calls/2#dispred#1dd1e0f4#ffb_102#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Rhs.2, _
0 ~0% {2} | REWRITE WITH NOT [NOT [Tmp.2 := "connect", TEST InOut.1 = Tmp.2], NOT [Tmp.2 := "execution_options", TEST InOut.1 = Tmp.2]] KEEPING 2
0 ~0% {1} | JOIN WITH DataFlowPublic::CallCfgNode#b8ddbf81 ON FIRST 1 OUTPUT Lhs.0
0 ~0% {1} | AND NOT `SqlAlchemy::SqlAlchemy::Connection::helper/0#62cfc178#b#prev`(FIRST 1)
return r1
```
Two issues:
- Tests relying on existing query machinery (i.e. `import python`) were not resolving
correctly due to a bad `qlpack.yml` file.
- The diagnostics output tests needed an updated import to account for their new location.
This is essentially the contents of `language-packs/python/tools` with some minor
modifications to account for the changed location.
Of note: we explicitly exclude the `recorded-call-graph-metrics` director that
was already present in `python/tools`. When we revisit this directory for some
cleanup (e.g. to get rid of the `lgtm` references), we'll probably want to switch
to an explicit list of sources to include.
We noticed that when
- a function has more than one summary (with different charpred)
- one summary is subsumed by a subpath (or something happens around the function being extracted)
- the function is called multiple times(we needed at least three)
one of the summaries would no longer lead to flow.
We do this to remove the inconsistencies, and to be ready for a future
where type-tracking support content tracker of depth > 1.
It works because targets of loadSteps needs to be LocalSourceNodes
predicate loadStep(Node nodeFrom, LocalSourceNode nodeTo, Content content) {