Before:
```
Tuple counts for Essa::EssaEdgeRefinement::getInput#dispred#f0820431#ff/2@b84afc77 after 20.3s:
873421 ~0% {3} r1 = JOIN Essa::TEssaEdgeDefinition#24e22a14#ffff_31#join_rhs WITH Essa::TEssaEdgeDefinition#24e22a14#ffff_30#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0 'this'
181627951 ~0% {3} r2 = JOIN r1 WITH Essa::EssaDefinition::getSourceVariable#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1, Lhs.2 'this'
873418 ~0% {2} r3 = JOIN r2 WITH Essa::EssaDefinition::reachesEndOfBlock#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.0 'result'
return r3
```
It's perhaps not immediately obvious what's going on here (because of
the `...join_rhs` indirection), but basically we're joining together
`this` and `def` and their `getSourceVariable`, and only then actually
relating `this` and `def` through `reachesEndOfBlock`.
By unbinding `var`, we prevent this early join, which now encourages the
`reachesEndOfBlock` join to happen earlier:
```
Tuple counts for Essa::EssaEdgeRefinement::getInput#dispred#f0820431#ff/2@2d63e5lb after 2s
873421 ~0% {2} r1 = SCAN Essa::TEssaEdgeDefinition#24e22a14#ffff OUTPUT In.3 'this', In.1
873421 ~0% {3} r2 = JOIN r1 WITH Essa::TEssaEdgeDefinition#24e22a14#ffff_30#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0 'this'
873421 ~0% {3} r3 = JOIN r2 WITH Definitions::SsaSourceVariable#class#486534ab#f ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'this', Lhs.0
8758877 ~0% {3} r4 = JOIN r3 WITH Essa::EssaDefinition::reachesEndOfBlock#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.2, Lhs.1 'this'
873418 ~0% {2} r5 = JOIN r4 WITH Essa::EssaDefinition::getSourceVariable#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.0 'result'
return r5
```
On certain databases, the evaluation of this predicate was running out
of memory due to the way the `count` aggregate was being used. Here's
an example of the tuple counts involved:
```
Tuple counts for PointsToContext::syntactic_call_count#cf3039a0#ff#antijoin_rhs/1@d2199bb8 after 1m27s:
595518502 ~521250% {1} r1 = JOIN PointsToContext::syntactic_call_count#cf3039a0#ff#shared#3 WITH Flow::CallNode::getFunction#dispred#f0820431#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0'
26518709 ~111513% {1} r2 = JOIN PointsToContext::syntactic_call_count#cf3039a0#ff#shared#2 WITH Flow::CallNode::getFunction#dispred#f0820431#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0'
622037211 ~498045% {1} r3 = r1 UNION r2
return r3
```
and a timing report that looked like this:
```
time | evals | max @ iter | predicate
------|-------|--------------|----------
5m8s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#shared#2@6d98d1nd
4m38s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#count_range@f5df1do4
3m51s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#shared#3@da3b4abf
1m58s | 7613 | 37ms @ 4609 | MRO::ClassListList::removedClassParts#f0820431#fffff#reorder_2_3_4_0_1@8155axyi
1m37s | 7613 | 33ms @ 3904 | MRO::ClassListList::bestMergeCandidate#f0820431#2#fff@8155a83w
1m27s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#antijoin_rhs@d2199bb8
1m8s | 1825 | 63ms @ 404 | PointsTo::Expressions::equalityEvaluatesTo#741b54e2#fffff@8155aw7w
37.6s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs@e348fc1p
...
```
To make optimising this easier for the compiler, I moved the bodies of
the `count` aggregate into their own helper predicates (with size
linear in the number of `CallNode`s), and also factored out the many
calls to `f.getName()`.
The astute reader will notice that in writing this as a sum of `count`s
rather than a count of a disjunction, the intersection (if it exists)
will be counted twice, and so the semantics may be different. However,
since `method_call` and `function_call` require `AttrNode` and
`NameNode` functions respectively, and as these two types are disjoint,
there is no intersection, and so the semantics should be preserved.
After the change, the evaluation of `syntactic_call_count` now looks as
follows:
```
Tuple counts for PointsToContext::syntactic_call_count#cf3039a0#ff/2@662dd8s0 after 216ms:
23960 ~0% {1} r1 = @py_scope#f AND NOT py_Functions_0#antijoin_rhs(Lhs.0 's')
23960 ~0% {2} r2 = SCAN r1 OUTPUT In.0 's', 0
276309 ~7% {2} r3 = SCAN @py_scope#f OUTPUT In.0 's', "__init__"
11763 ~0% {2} r4 = JOIN r3 WITH Scope::Scope::getName#dispred#f0820431#fb ON FIRST 2 OUTPUT Lhs.0 's', 1
35723 ~0% {2} r5 = r2 UNION r4
252349 ~0% {2} r6 = JOIN @py_scope#f WITH Function::Function::getName#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.0 's', Rhs.1
240586 ~0% {2} r7 = SELECT r6 ON In.1 != "__init__"
131727 ~4% {2} r8 = r7 AND NOT project#PointsToContext::method_call#cf3039a0#ff(Lhs.1)
131727 ~0% {3} r9 = SCAN r8 OUTPUT In.1, In.0 's', 0
240586 ~0% {2} r10 = SCAN r7 OUTPUT In.1, In.0 's'
108859 ~0% {3} r11 = JOIN r10 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1 's', Rhs.1
240586 ~0% {3} r12 = r9 UNION r11
24100 ~0% {2} r13 = JOIN r12 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs#1 ON FIRST 1 OUTPUT Lhs.1 's', (Rhs.1 + Lhs.2)
240586 ~0% {2} r14 = SELECT r6 ON In.1 != "__init__"
131727 ~4% {2} r15 = r14 AND NOT project#PointsToContext::method_call#cf3039a0#ff(Lhs.1)
131727 ~0% {3} r16 = SCAN r15 OUTPUT In.0 's', In.1, 0
108859 ~4% {3} r17 = JOIN r10 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs ON FIRST 1 OUTPUT Lhs.1 's', Lhs.0, Rhs.1
240586 ~4% {3} r18 = r16 UNION r17
216486 ~2% {3} r19 = r18 AND NOT project#PointsToContext::function_call#cf3039a0#ff(Lhs.1)
216486 ~0% {2} r20 = SCAN r19 OUTPUT In.0 's', (0 + In.2)
240586 ~0% {2} r21 = r13 UNION r20
276309 ~0% {2} r22 = r5 UNION r21
return r22
```
Although this wasn't (as far as I know) causing any performance issues,
it was making the join-order badness report quite noisy, and so I
figured it was worth fixing.
Before:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@d3441d0b after 84ms:
1501195 ~3% {2} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'result'
149 ~0% {5} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Rhs.3, Lhs.1 'result', Lhs.0 'this'
149 ~1% {3} r3 = JOIN r2 WITH regex::RegexString::numbered_backreference#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.3 'result', Rhs.3, Lhs.4 'this'
4 ~0% {2} r4 = JOIN r3 WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.0 'result'
1501195 ~3% {2} r5 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'result'
42526 ~0% {5} r6 = JOIN r5 WITH RegexTreeView::RegExpGroup#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result', Rhs.1, Rhs.2, Rhs.3
22 ~0% {8} r7 = JOIN r6 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.4, Lhs.1 'result', Lhs.0 'this', Rhs.1, Rhs.2, Rhs.3
0 ~0% {6} r8 = JOIN r7 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.5, Lhs.6, Lhs.7, Rhs.3, Lhs.3 'result', Lhs.4 'this'
0 ~0% {2} r9 = JOIN r8 WITH regex::RegexString::named_backreference#dispred#f0820431#ffff ON FIRST 4 OUTPUT Lhs.5 'this', Lhs.4 'result'
4 ~0% {2} r10 = r4 UNION r9
return r10
```
In this case I opted for a classical solution: tying together the
literal and number (or name) part of the backreference in order to
encourage a two-column join.
After:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@b0cc4d5n after 0ms:
898 ~1% {3} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0 'result'
4 ~0% {2} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef::hasLiteralAndNumber#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
1110 ~0% {5} r3 = JOIN RegexTreeView::RegExpGroup#31aac2a7#ffff WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0 'result', Rhs.1
146 ~0% {3} r4 = JOIN r3 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.4, Rhs.3, Lhs.3 'result'
0 ~0% {2} r5 = JOIN r4 WITH RegexTreeView::RegExpBackRef::hasLiteralAndName#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
4 ~0% {2} r6 = r2 UNION r5
return r6
```
This bad join was identified by the join-order-badness report, which
showed that:
py/use-of-input:MRO::flatten_list#f4eaf05f#fff#9c5fe54whnlqffdgu65vhb8uhpg# (order_500000)
calculated a whopping 212,820,108 tuples in order to produce an output of
size 55516, roughly 3833 times more effort than needed.
Here's a snippet of the slowest iteration of that predicate:
```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i1839#0265eb3w after 14ms:
0 ~0% {3} r1 = JOIN MRO::need_flattening#f4eaf05f#f#prev_delta WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'list', Rhs.2
0 ~0% {3} r2 = JOIN r1 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r3 = JOIN r2 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0 ~0% {3} r4 = SCAN MRO::ConsList#f4eaf05f#fff#prev_delta OUTPUT In.2 'list', In.0, In.1
0 ~0% {3} r5 = JOIN r4 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0 ~0% {3} r6 = JOIN r5 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0 ~0% {3} r7 = JOIN r6 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0 ~0% {3} r8 = r3 UNION r7
26355 ~2% {3} r9 = SCAN MRO::ConsList#f4eaf05f#fff#prev OUTPUT In.2 'list', In.0, In.1
0 ~0% {3} r10 = JOIN r9 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0 ~0% {3} r11 = JOIN r10 WITH MRO::ClassList::length#f0820431#ff#prev_delta ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0 ~0% {3} r12 = JOIN r11 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and a bunch more lines. The same construction appears several times,
but the join order is the same each time.)
Clearly it would be better to start with whatever is in `need_flattening`,
and then do the other joins. This is what the present fix does (by
unbinding `list` in all but the `needs_flattening` call).
After the fix, the slowest iteration is as follows:
```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i2617#8155ab3w after 9ms:
0 ~0% {2} r1 = SCAN MRO::need_flattening#f4eaf05f#f#prev_delta OUTPUT In.0 'list', In.0 'list'
0 ~0% {3} r2 = JOIN r1 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0% {3} r3 = JOIN r2 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r4 = JOIN r3 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
1 ~0% {2} r5 = SCAN MRO::need_flattening#f4eaf05f#f#prev OUTPUT In.0 'list', In.0 'list'
0 ~0% {3} r6 = JOIN r5 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev_delta ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0% {3} r7 = JOIN r6 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r8 = JOIN r7 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and so on. The remainder is 0 tuples all the way.)
In total, we went from
```
40.6s | 7614 | 15ms @ 1839 | MRO::flatten_list#f4eaf05f#fff@0265eb3w
```
to
```
7.8s | 7614 | 11ms @ 2617 | MRO::flatten_list#f4eaf05f#fff@8155ab3w
```
in the program
```python
if not is_safe(path):
return
```
the last node in the `ConditionBlock` is `not is_safe(path)`,
so it would never match "a call to is_safe".
Thus, guards inside `not` would not be part of `GuardNode`
(nor `BarrierGuard`). Now they can.
Fixes a bad join in `list_of_linearization_of_bases_plus_bases`.
Previvously, we joined together `ConsList` and `getBase` before filtering
these out using the recursive call. Now we do the recursion first.
Co-authored-by: yoff <yoff@github.com>