Currently, we have a number of assertions in the codebase and certain
assumptions about the AST. These don't always hold, sometimes leading to
a crash in the extractor.
The crashes leave incomplete TRAP files that cannot be imported into the
database.
With this change, we still get those incomplete TRAP files, but we also
get a database in the end (even thoough it is also incomplete as we
cannot import everything).
Although this wasn't (as far as I know) causing any performance issues,
it was making the join-order badness report quite noisy, and so I
figured it was worth fixing.
Before:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@d3441d0b after 84ms:
1501195 ~3% {2} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'result'
149 ~0% {5} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Rhs.3, Lhs.1 'result', Lhs.0 'this'
149 ~1% {3} r3 = JOIN r2 WITH regex::RegexString::numbered_backreference#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.3 'result', Rhs.3, Lhs.4 'this'
4 ~0% {2} r4 = JOIN r3 WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.0 'result'
1501195 ~3% {2} r5 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'result'
42526 ~0% {5} r6 = JOIN r5 WITH RegexTreeView::RegExpGroup#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result', Rhs.1, Rhs.2, Rhs.3
22 ~0% {8} r7 = JOIN r6 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.4, Lhs.1 'result', Lhs.0 'this', Rhs.1, Rhs.2, Rhs.3
0 ~0% {6} r8 = JOIN r7 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.5, Lhs.6, Lhs.7, Rhs.3, Lhs.3 'result', Lhs.4 'this'
0 ~0% {2} r9 = JOIN r8 WITH regex::RegexString::named_backreference#dispred#f0820431#ffff ON FIRST 4 OUTPUT Lhs.5 'this', Lhs.4 'result'
4 ~0% {2} r10 = r4 UNION r9
return r10
```
In this case I opted for a classical solution: tying together the
literal and number (or name) part of the backreference in order to
encourage a two-column join.
After:
```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@b0cc4d5n after 0ms:
898 ~1% {3} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0 'result'
4 ~0% {2} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef::hasLiteralAndNumber#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
1110 ~0% {5} r3 = JOIN RegexTreeView::RegExpGroup#31aac2a7#ffff WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0 'result', Rhs.1
146 ~0% {3} r4 = JOIN r3 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.4, Rhs.3, Lhs.3 'result'
0 ~0% {2} r5 = JOIN r4 WITH RegexTreeView::RegExpBackRef::hasLiteralAndName#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
4 ~0% {2} r6 = r2 UNION r5
return r6
```
The `getName` in `Type.qll` was issuing a warning in other generated
classes having a `getName` from a `name` property in `schema.yml`.
To fix the possible inconsistency, `diagnostic_name` is being renamed to
`name` in the schema. Despite the scary doc comment on
`swift::Type::getString` (namely `for use in diagnostics only`), that
seems to be the right generic naming mechanism for types, and it
coincides with the name we were extracting on types with an explicit
`name` property.
In case we find a case where `Type::getString` gives something wrong,
we can probably just patch it on that specific type class.
This bad join was identified by the join-order-badness report, which
showed that:
py/use-of-input:MRO::flatten_list#f4eaf05f#fff#9c5fe54whnlqffdgu65vhb8uhpg# (order_500000)
calculated a whopping 212,820,108 tuples in order to produce an output of
size 55516, roughly 3833 times more effort than needed.
Here's a snippet of the slowest iteration of that predicate:
```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i1839#0265eb3w after 14ms:
0 ~0% {3} r1 = JOIN MRO::need_flattening#f4eaf05f#f#prev_delta WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'list', Rhs.2
0 ~0% {3} r2 = JOIN r1 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r3 = JOIN r2 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0 ~0% {3} r4 = SCAN MRO::ConsList#f4eaf05f#fff#prev_delta OUTPUT In.2 'list', In.0, In.1
0 ~0% {3} r5 = JOIN r4 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0 ~0% {3} r6 = JOIN r5 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0 ~0% {3} r7 = JOIN r6 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0 ~0% {3} r8 = r3 UNION r7
26355 ~2% {3} r9 = SCAN MRO::ConsList#f4eaf05f#fff#prev OUTPUT In.2 'list', In.0, In.1
0 ~0% {3} r10 = JOIN r9 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0 ~0% {3} r11 = JOIN r10 WITH MRO::ClassList::length#f0820431#ff#prev_delta ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0 ~0% {3} r12 = JOIN r11 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and a bunch more lines. The same construction appears several times,
but the join order is the same each time.)
Clearly it would be better to start with whatever is in `need_flattening`,
and then do the other joins. This is what the present fix does (by
unbinding `list` in all but the `needs_flattening` call).
After the fix, the slowest iteration is as follows:
```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i2617#8155ab3w after 9ms:
0 ~0% {2} r1 = SCAN MRO::need_flattening#f4eaf05f#f#prev_delta OUTPUT In.0 'list', In.0 'list'
0 ~0% {3} r2 = JOIN r1 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0% {3} r3 = JOIN r2 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r4 = JOIN r3 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
1 ~0% {2} r5 = SCAN MRO::need_flattening#f4eaf05f#f#prev OUTPUT In.0 'list', In.0 'list'
0 ~0% {3} r6 = JOIN r5 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev_delta ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0% {3} r7 = JOIN r6 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0% {3} r8 = JOIN r7 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and so on. The remainder is 0 tuples all the way.)
In total, we went from
```
40.6s | 7614 | 15ms @ 1839 | MRO::flatten_list#f4eaf05f#fff@0265eb3w
```
to
```
7.8s | 7614 | 11ms @ 2617 | MRO::flatten_list#f4eaf05f#fff@8155ab3w
```
Depending on the extraction order, before this change there might be multiple
`GlobalVariable`s per declared global variable. See the tests in
`cpp/ql/test/library-tests/variables/global`. This change ensures that only one
of those `GlobalVariable`s is visible to the user if we can locate a unique
definition. If not, the old situation persists.
Note that an exception needs to be made for templated variables. Here, the
definition refers to the non-instantiated template, while a declaration that
is not a definition refers to an instantiation. In case the instantiation refers
to a template parameter, the mangled names of the template and the instantiation
will be identical. This happens for example in the following case:
```
template <typename T>
T x = T(42); // Uninstantiated templated variable
template <typename T>
class C {
T y = x<T>; // Instantiation using a template parameter
};
```
Since the uninstantiated template and the instantiation are two different
entities, we do not unify them as described above.