Unsurprisingly, the only thing affected by this was the `import-helper`
tests. These have lost all of the results relating to `ImportMember`s,
but apart from that the underlying behaviour should be the same.
I also limited the test to only `CfgNode`s, as a bunch of `EssaNode`s
suddenly appeared when I switched to API graphs.
Finally, I used `API::moduleImport` with a dotted name in the type
tracking tests. This goes against the API graphs interface, but I think
it's more correct for this use case, as these type trackers are doing
the "module attribute lookup" bit manually.
We were not supporting `except` statements handling multiple exception
types (specified as a tuple) correctly, instead just returning the
tuple itself as the "type" (which makes little sense).
To fix this, we explicitly extract the elements of this node, in the
case where it _is_ a tuple.
This is a change that can potentially affect many queries (as `getType`
is used in quite a few places), so some care should be taken to
ensure that this does not adversely affect performance.
Moves the current test out of `test.py`, as otherwise any unknown global
(like, say, `sink`) would _also_ be considered to be something
potentially defined in `unknown`.
Not the prettiest of solutions, but it does the job. Basically, we were
calculating (and re-calculating) the same big relation between strings
and regexes and then checking whether the latter matched the former.
This resulted in tuple counts like the following:
```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0% {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654 ~0% {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
return r2
```
(The above being just the bit that handles `DefinitionNode` in
`SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)
By applying a bit of manual inlining and magic, this becomes somewhat
more manageable:
```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671 ~2% {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
334012 ~2% {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
361683 ~11% {3} r3 = r1 UNION r2
154644 ~0% {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
149198 ~1% {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
124257 ~5% {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
273455 ~21% {3} r7 = r5 UNION r6
428099 ~30% {3} r8 = r4 UNION r7
789782 ~78% {3} r9 = r3 UNION r8
1121 ~77% {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121 ~70% {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
return r11
```
(The above being the total for all the sensitive names we care about,
taking only 1.2 seconds to evaluate.)
Incidentally, you may wonder why this has _fewer_ results than before.
The answer is control flow splitting -- every sensitively-named
`DefinitionNode` would have been matched in isolation previously. By
pre-matching on just the names of these, we can subsequently join
against those names that are known to be sensitive, which is a much
faster operation.
(We also get the benefit of deduplicating the strings that are matched,
before actually performing the match, so if, say, an attribute name and
a variable name are identical, then we'll only match them once.)
We also exclude all docstrings as relevant string constants, as these
presumably don't actually flow anywhere.