On certain databases, the evaluation of the `syntactic_call_count` predicate was
running out of memory due to the way the `count` aggregate was being used.
Here's an example of the tuple counts involved:
```
Tuple counts for PointsToContext::syntactic_call_count#cf3039a0#ff#antijoin_rhs/1@d2199bb8 after 1m27s:
595518502 ~521250% {1} r1 = JOIN PointsToContext::syntactic_call_count#cf3039a0#ff#shared#3 WITH Flow::CallNode::getFunction#dispred#f0820431#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0'
26518709 ~111513% {1} r2 = JOIN PointsToContext::syntactic_call_count#cf3039a0#ff#shared#2 WITH Flow::CallNode::getFunction#dispred#f0820431#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0'
622037211 ~498045% {1} r3 = r1 UNION r2
return r3
```
and a timing report that looked like this:
```
time | evals | max @ iter | predicate
------|-------|--------------|----------
5m8s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#shared#2@6d98d1nd
4m38s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#count_range@f5df1do4
3m51s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#shared#3@da3b4abf
1m58s | 7613 | 37ms @ 4609 | MRO::ClassListList::removedClassParts#f0820431#fffff#reorder_2_3_4_0_1@8155axyi
1m37s | 7613 | 33ms @ 3904 | MRO::ClassListList::bestMergeCandidate#f0820431#2#fff@8155a83w
1m27s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#antijoin_rhs@d2199bb8
1m8s | 1825 | 63ms @ 404 | PointsTo::Expressions::equalityEvaluatesTo#741b54e2#fffff@8155aw7w
37.6s | | | PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs@e348fc1p
...
```
To make this easier for the compiler to optimise, I moved the bodies of
the `count` aggregates into their own helper predicates (each of size
linear in the number of `CallNode`s), and also factored out the many
calls to `f.getName()`.
The astute reader will notice that, by writing this as a sum of `count`s
rather than a count of a disjunction, the intersection (if it exists)
would be counted twice, and so the semantics may differ. However,
since `method_call` and `function_call` require an `AttrNode` and a
`NameNode` function respectively, and these two types are disjoint,
there is no intersection, and so the semantics should be preserved.
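For illustration, here is a minimal sketch of the shape of the rewrite. The names and bodies below are simplified stand-ins (the real `method_call` and `function_call` live in `PointsToContext` and differ in detail); only the structure matters: two helper predicates plus a sum of two `count`s sharing a single `f.getName()` call.
```ql
import python

// Simplified stand-in for PointsToContext::method_call:
// holds if `call` is a method call `x.name(...)`.
private predicate methodCallTo(CallNode call, string name) {
  call.getFunction().(AttrNode).getName() = name
}

// Simplified stand-in for PointsToContext::function_call:
// holds if `call` is a plain call `name(...)`.
private predicate functionCallTo(CallNode call, string name) {
  call.getFunction().(NameNode).getId() = name
}

// Sum of two counts over disjoint relations: a call's function node is
// either an AttrNode or a NameNode, never both, so nothing is counted twice.
// `f.getName()` is factored out and evaluated only once.
private int syntacticCallCount(Function f) {
  exists(string name | name = f.getName() |
    result =
      count(CallNode call | methodCallTo(call, name)) +
      count(CallNode call | functionCallTo(call, name))
  )
}
```
The two helpers are non-recursive and linear in the number of `CallNode`s, so the evaluator can compute and cache them once rather than materialise the huge intermediate relations seen above.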
After the change, the evaluation of `syntactic_call_count` looks as
follows:
```
Tuple counts for PointsToContext::syntactic_call_count#cf3039a0#ff/2@662dd8s0 after 216ms:
23960 ~0% {1} r1 = @py_scope#f AND NOT py_Functions_0#antijoin_rhs(Lhs.0 's')
23960 ~0% {2} r2 = SCAN r1 OUTPUT In.0 's', 0
276309 ~7% {2} r3 = SCAN @py_scope#f OUTPUT In.0 's', "__init__"
11763 ~0% {2} r4 = JOIN r3 WITH Scope::Scope::getName#dispred#f0820431#fb ON FIRST 2 OUTPUT Lhs.0 's', 1
35723 ~0% {2} r5 = r2 UNION r4
252349 ~0% {2} r6 = JOIN @py_scope#f WITH Function::Function::getName#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.0 's', Rhs.1
240586 ~0% {2} r7 = SELECT r6 ON In.1 != "__init__"
131727 ~4% {2} r8 = r7 AND NOT project#PointsToContext::method_call#cf3039a0#ff(Lhs.1)
131727 ~0% {3} r9 = SCAN r8 OUTPUT In.1, In.0 's', 0
240586 ~0% {2} r10 = SCAN r7 OUTPUT In.1, In.0 's'
108859 ~0% {3} r11 = JOIN r10 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1 's', Rhs.1
240586 ~0% {3} r12 = r9 UNION r11
24100 ~0% {2} r13 = JOIN r12 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs#1 ON FIRST 1 OUTPUT Lhs.1 's', (Rhs.1 + Lhs.2)
240586 ~0% {2} r14 = SELECT r6 ON In.1 != "__init__"
131727 ~4% {2} r15 = r14 AND NOT project#PointsToContext::method_call#cf3039a0#ff(Lhs.1)
131727 ~0% {3} r16 = SCAN r15 OUTPUT In.0 's', In.1, 0
108859 ~4% {3} r17 = JOIN r10 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs ON FIRST 1 OUTPUT Lhs.1 's', Lhs.0, Rhs.1
240586 ~4% {3} r18 = r16 UNION r17
216486 ~2% {3} r19 = r18 AND NOT project#PointsToContext::function_call#cf3039a0#ff(Lhs.1)
216486 ~0% {2} r20 = SCAN r19 OUTPUT In.0 's', (0 + In.2)
240586 ~0% {2} r21 = r13 UNION r20
276309 ~0% {2} r22 = r5 UNION r21
return r22
```
- "Normal" instead of "NonSpecial"
- "NonLibrary" instead of "2"
I could not find a good replacement for "NonLibrary", nor for "Source",
but I added QLDoc comments in a few places to aid readability.
With this change, users can now run the "View AST" command in VS Code
workspaces that do not include the core libraries. The relevant core
library only needs to be installed in the package cache.
Although this wasn't (as far as I know) causing any performance issues,
it was making the join-order badness report quite noisy, and so I
figured it was worth fixing.
Before:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@d3441d0b after 84ms:
1501195 ~3% {2} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'result'
149 ~0% {5} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Rhs.3, Lhs.1 'result', Lhs.0 'this'
149 ~1% {3} r3 = JOIN r2 WITH regex::RegexString::numbered_backreference#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.3 'result', Rhs.3, Lhs.4 'this'
4 ~0% {2} r4 = JOIN r3 WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.0 'result'
1501195 ~3% {2} r5 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'result'
42526 ~0% {5} r6 = JOIN r5 WITH RegexTreeView::RegExpGroup#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result', Rhs.1, Rhs.2, Rhs.3
22 ~0% {8} r7 = JOIN r6 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.4, Lhs.1 'result', Lhs.0 'this', Rhs.1, Rhs.2, Rhs.3
0 ~0% {6} r8 = JOIN r7 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.5, Lhs.6, Lhs.7, Rhs.3, Lhs.3 'result', Lhs.4 'this'
0 ~0% {2} r9 = JOIN r8 WITH regex::RegexString::named_backreference#dispred#f0820431#ffff ON FIRST 4 OUTPUT Lhs.5 'this', Lhs.4 'result'
4 ~0% {2} r10 = r4 UNION r9
return r10
```
In this case I opted for a classical solution: tying together the
literal and number (or name) part of the backreference in order to
encourage a two-column join.
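Judging from the tuple counts below, the fix introduces member predicates along the lines of `hasLiteralAndNumber` and `hasLiteralAndName`. Here is a hedged sketch of the idiom for the numbered case; `RegExpLiteral` as the literal type, `getLiteral()`, and `getNumber()` on the backreference are assumptions standing in for the real RegexTreeView internals:
```ql
// Sketch only; not the actual RegexTreeView code. Assumes the RegexTreeView
// classes (RegExpBackRef, RegExpGroup, RegExpLiteral) are in scope.

// Tie the backreference's literal and group number together in one relation,
// so the lookup can join on the (literal, number) pair in a single step
// instead of first pairing up every two terms that share a literal.
private predicate backRefLiteralAndNumber(RegExpBackRef back, RegExpLiteral lit, int num) {
  lit = back.getLiteral() and
  num = back.getNumber()
}

// The group a numbered backreference refers to: a two-column join on
// (literal, number). The named-backreference case is analogous, keyed on
// (literal, name) instead.
private RegExpGroup groupOfBackRef(RegExpBackRef back) {
  exists(RegExpLiteral lit, int num |
    backRefLiteralAndNumber(back, lit, num) and
    result.getLiteral() = lit and
    result.getNumber() = num
  )
}
```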
After:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@b0cc4d5n after 0ms:
898 ~1% {3} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0 'result'
4 ~0% {2} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef::hasLiteralAndNumber#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
1110 ~0% {5} r3 = JOIN RegexTreeView::RegExpGroup#31aac2a7#ffff WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0 'result', Rhs.1
146 ~0% {3} r4 = JOIN r3 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.4, Rhs.3, Lhs.3 'result'
0 ~0% {2} r5 = JOIN r4 WITH RegexTreeView::RegExpBackRef::hasLiteralAndName#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
4 ~0% {2} r6 = r2 UNION r5
return r6
```
This bad join was identified by the join-order-badness report, which showed that
`py/use-of-input:MRO::flatten_list#f4eaf05f#fff#9c5fe54whnlqffdgu65vhb8uhpg#` (order_500000)
calculated a whopping 212,820,108 tuples to produce an output of size 55,516,
roughly 3,833 times more effort than needed.
Here's a snippet of the slowest iteration of that predicate:
```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i1839#0265eb3w after 14ms:
0 ~0% {3} r1 = JOIN MRO::need_flattening#f4eaf05f#f#prev_delta WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'list', Rhs.2
0 ~0% {3} r2 = JOIN r1 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r3 = JOIN r2 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0 ~0% {3} r4 = SCAN MRO::ConsList#f4eaf05f#fff#prev_delta OUTPUT In.2 'list', In.0, In.1
0 ~0% {3} r5 = JOIN r4 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0 ~0% {3} r6 = JOIN r5 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0 ~0% {3} r7 = JOIN r6 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0 ~0% {3} r8 = r3 UNION r7
26355 ~2% {3} r9 = SCAN MRO::ConsList#f4eaf05f#fff#prev OUTPUT In.2 'list', In.0, In.1
0 ~0% {3} r10 = JOIN r9 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0 ~0% {3} r11 = JOIN r10 WITH MRO::ClassList::length#f0820431#ff#prev_delta ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0 ~0% {3} r12 = JOIN r11 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and a bunch more lines. The same construction appears several times,
but the join order is the same each time.)
Clearly it would be better to start with whatever is in `need_flattening`,
and only then do the other joins. This is what the present fix does (by
unbinding `list` in all but the `need_flattening` call).
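For reference, one well-known way to achieve this kind of unbinding in QL is to route the variable through a separate, non-inlined identity predicate, which hides the equality from the join orderer. A minimal sketch, assuming the `ClassListList` class from MRO.qll is in scope (this shows the general idiom, not necessarily the exact form used in the fix):
```ql
// "Unbind" idiom (sketch): a non-inlined identity predicate hides the
// equality between argument and result from the join orderer, so conjuncts
// written against `unbound(list)` no longer look like a cheap way to bind
// `list`, and the evaluator starts from `need_flattening(list)` instead.
pragma[noinline]
private ClassListList unbound(ClassListList list) { result = list }
```
With such a helper, the remaining conjuncts of `flatten_list` mention only the unbound copy of `list`, which matches the trace below: each iteration now begins with a scan of `need_flattening`.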
After the fix, the slowest iteration is as follows:
```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i2617#8155ab3w after 9ms:
0 ~0% {2} r1 = SCAN MRO::need_flattening#f4eaf05f#f#prev_delta OUTPUT In.0 'list', In.0 'list'
0 ~0% {3} r2 = JOIN r1 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0% {3} r3 = JOIN r2 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r4 = JOIN r3 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
1 ~0% {2} r5 = SCAN MRO::need_flattening#f4eaf05f#f#prev OUTPUT In.0 'list', In.0 'list'
0 ~0% {3} r6 = JOIN r5 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev_delta ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0% {3} r7 = JOIN r6 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r8 = JOIN r7 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and so on. The remainder is 0 tuples all the way.)
In total, we went from
```
40.6s | 7614 | 15ms @ 1839 | MRO::flatten_list#f4eaf05f#fff@0265eb3w
```
to
```
7.8s | 7614 | 11ms @ 2617 | MRO::flatten_list#f4eaf05f#fff@8155ab3w
```
In the program
```python
if not is_safe(path):
return
```
the last node in the `ConditionBlock` is `not is_safe(path)`, so it would
never match "a call to `is_safe`". Thus, guards inside a `not` could not
act as a `GuardNode` (nor as a `BarrierGuard`). Now they can.
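To make concrete what "a call to `is_safe`" means here, a barrier-guard check could look roughly as follows. This is a hypothetical sketch, not library code; the predicate name and shape are assumptions:
```ql
import python

// Hypothetical guard check: `guard` is a call to a function named `is_safe`,
// `arg` is the argument being checked, and the check succeeds on the true
// branch. Before the fix, the last node of the ConditionBlock for
// `if not is_safe(path):` was the `not` expression, so a check like this
// never matched; now the call inside the `not` also counts as a guard.
predicate isSafeCheck(ControlFlowNode guard, ControlFlowNode arg, boolean branch) {
  exists(CallNode call | call = guard |
    call.getFunction().(NameNode).getId() = "is_safe" and
    arg = call.getArg(0) and
    branch = true
  )
}
```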
Fixes a bad join in `list_of_linearization_of_bases_plus_bases`.
Previously, we joined together `ConsList` and `getBase` before filtering
the result using the recursive call. Now we do the recursion first.
Co-authored-by: yoff <yoff@github.com>