Unrolling the transitive closure had slightly better performance here.
Also, we exclude names of builtins, since those will be handled by a
separate case of `isDefinedLocally`.
A few changes, all bundled together:
- We were getting a lot of magic applied to the predicates in the
`ImportStar` module, and this was causing needless re-evaluation.
To address this, the easiest solution was to simply cache the entire
module.
- In order to separate this from the dataflow analysis and make it
dependent only on control flow, `potentialImportStarBase` was changed
to return a `ControlFlowNode`.
- `isDefinedLocally` was defined on control flow nodes, which meant we
were duplicating a lot of tuples due to control flow splitting, to no
actual benefit.
Finally, there was a really bad join in `isDefinedLocally` that was
fixed by separating out a helper predicate. This is a case where we
could use a three-way join, since the join between the `Scope`, the
`name` string and the `Name` is big no matter what.
If we join `scope_defines_name` with `n.getId()`, we'll get `Name`s
belonging to irrelevant scopes.
If we join `scope_defines_name` with the enclosing scope of the `Name`
`n`, then we'll get this also for `Name`s that don't share their `getId`
with the local variable defined in the scope.
If we join `n.getId()` with `n.getScope()...` then we'll get all
enclosing scopes for each `Name`.
The last of these is what we currently have. It's not terrible, but not
great either. (Though thankfully it's rare to have lots of enclosing
scopes.)
The easiest way to implement this was to change the definition of
`module_export` to account for chains of `import *`. We reuse the
machinery from `ImportStar.qll` for this, naturally.
TL;DR: We were missing out on flow in the following situation:
`mod1.py`:
```python
foo = SOURCE
```
`mod2.py`:
```python
from mod1 import *
```
`test.py`:
```python
from mod2 import foo
SINK(foo)
```
This is because there's no node at which a read of `foo` takes place
within `test.py`, and so the added reads make no difference.
Unfortunately, this means the previous test was a bit too simplistic,
since it only looks for module variable reads and writes. Because of
this, we change the test to be a more traditional "all flow" style
(though restricted to `CfgNode`s).
Adds result for `ModuleVariableNode::getARead` corresponding to reads
that go through (chains of) `import *`.
This required a bit of a change to _which_ module variables we define.
Previously, we only included variables that were accessed elsewhere in
the same file, but now we must ensure to also include variables that may
be accessed through `import *`.
Thanks @yoff, this leaves us with the following evaluation, which looks
very close to the one in the other fix (but with cleaner implementation)
-- both at 688k max tuples (although numbers are not exactly the same).
```
[2021-11-24 13:48:40] (14s) Tuple counts for PoorMansFunctionResolution::getSimpleMethodReferenceWithinClass#ff/2@e5f05asv after 74ms:
47493 ~3% {3} r1 = JOIN Class::Class::getAMethod_dispred#ff WITH py_Classes ON FIRST 1 OUTPUT Lhs.1, 0, Lhs.0
47335 ~0% {2} r2 = JOIN r1 WITH AstGenerated::Function_::getArg_dispred#fff ON FIRST 2 OUTPUT Rhs.2, Lhs.2
46683 ~0% {2} r3 = JOIN r2 WITH DataFlowPublic::ParameterNode::getParameter_dispred#fb_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
259968 ~4% {2} r4 = JOIN r3 WITH LocalSources::Cached::hasLocalSource#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
161985 ~0% {3} r5 = JOIN r4 WITH Attributes::AttrRef::accesses_dispred#bff_102#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1, Rhs.2
161985 ~2% {3} r6 = JOIN r5 WITH Attributes::AttrRead#class#f ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.0 'result'
688766 ~0% {3} r7 = JOIN r6 WITH Function::Function::getName_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1 'func', Lhs.2 'result'
20928 ~0% {2} r8 = JOIN r7 WITH Class::Class::getAMethod_dispred#ff ON FIRST 2 OUTPUT Lhs.1 'func', Lhs.2 'result'
return r8
```