Here, `context.appliesTo(n)` was being distributed across all of the
disjuncts, which caused poor performance.
The new helper predicate, `literal_node_class` should be fairly small,
since it only applies to a subset of `ControlFlowNode`s, and only
assigns a limited set of `ClassObjectInternal`s to these nodes.
Since the number of relevant attributes in the `re` module is fairly
small, it made sense to factor this out in a separate predicate, and
the join order also became more sensible.
This is only _really_ expensive when there are a _lot_ of strings in
the database, but for this case, where we're always extracting the
same substring of the string, it's easier -- and faster -- to just
make a substring operation directly.
This fixes the major performance problem with type tracking on
some (pathological) databases.
The interface could probably be improved a bit. In particular, I'm
thinking that we might want to have `DataFlow::exprNode` return a
`LocalSourceNode` so that a cast isn't necessary in order to use
`flowsTo`.
I have added two `cached` annotations. The one on `flowsTo` is
crucial, as performance regresses without it. The one on
`simpleLocalFlowStep` may not be needed, but Java has a similar
annotation, and to me it makes sense to have this relation cached.
A slightly more complicated version of the situation in
https://github.com/github/codeql/pull/2507 could cause the `toString`
calculation to diverge. Although the previous PR took tuples nested
inside tuples into account (and subscripted types cannot be nested
inside each other in our modelling), it did not account for having
this nesting be interleaved, and this is what caused the divergence.
I have not done the usual "test case first to show the problem
exists", since this would also diverge and take forever to fail. The
instance observed in `scipy` was likely caused by something akin to
```python
x = ()
while True:
x = x[(x,)]
```
Finally, to prevent this from happening with other types, I went
through and checked each instance where the string representation of
an `ObjectInternal` might potentially contain a reference to
itself (and thus explode). I encapsulated this in a
`bounded_toString` helper predicate, and used this in all the cases
where I was able to determine that the above _could_ happen.