Use a `strictcount` to identify whether there is exactly one feature or not.
If so, we use it. If not, we use the empty string.
Add context to ensure we filter the set of data flow nodes down to only
the set of endpoint nodes.
This performance optimisation avoids calculating the Cartesian product
of data flow nodes and feature names, but it does not avoid calculating
the (slightly smaller) Cartesian product of endpoint nodes and feature names.
Product size = number of endpoint nodes * number of feature names.
At time of writing there are 8 feature names.
Change the cutoff logic from `count` to `strictcount`, since we know it only applies
to a non-empty set of results.
Use a single `strictconcat` aggregate to combine tokens in order of location,
instead of computing a `rank` followed by a `concat`.
Strictness introduces a slight change of behaviour because missing tokens will now result
in no results from the predicate rather than an empty feature string.
The `matchMarkerComment` predicate performs badly on any codebase with
a moderately large number of comments, because the current implementation
has to first compute the Cartesian product between the set of comments
and the set of framework library comment regexes.
Instead, match first against a single regex:
the union of all framework library comment regexes.
This computes a more benign Cartesian product, the same size as the set of comments.
See inline comments for more details.
The easiest way to implement this was to change the definition of
`module_export` to account for chains of `import *`. We reuse the
machinery from `ImportStar.qll` for this, naturally.
TL;DR: We were missing out on flow in the following situation:
`mod1.py`:
```python
foo = SOURCE
```
`mod2.py`:
```python
from mod1 import *
```
`test.py`:
```python
from mod2 import foo
SINK(foo)
```
This is because there's no node at which a read of `foo` takes place
within `test.py`, and so the added reads make no difference.
Unfortunately, this means the previous test was a bit too simplistic,
since it only looks for module variable reads and writes. Because of
this, we change the test to be a more traditional "all flow" style
(though restricted to `CfgNode`s).