Notice the strange thing with treating `mypkg.foo(42)` as a ClassCall,
but completely ignoring `mypkg.subpkg.bar(43)` -- due to having the two
`ClassValue`s:
- `Missing module attribute mypkg.foo`
- `Missing module attribute mypkg.subpkg`
But not `Missing module attribute mypkg.subpkg` with the current import
structure.
Since this caused bad performance (as we had to evaluate points-to).
Fixes https://github.com/github/codeql/issues/6964
This approach was motivated by the comment on the issue from @tausbn:
> We discussed this internally in the CodeQL Python team, and have
> agreed that the best approach for now is to disable the printing of
> regex ASTs.
I tried to keep our RegExpTerm logic, but doing the fix below did not
work, and still evaluated RegExpTerm :| I guess we will just have to
revert this PR if we want it back
```diff
TRegExpTermNode(RegExpTerm term) {
+ none() and
exists(StrConst str | term.getRootTerm() = getParsedRegExp(str) and shouldPrint(str, _))
}
```
I had to rewrite the SINK1-SINK7 definitions, since this new requirement
complained that we had to add this `MISSING: flow` annotation :D
Doing this implementation also revealed that there was a bug, since I
did not compare files when checking for these `MISSING:` annotations. So
fixed that up in the implementation for inline taint tests as well.
(extra whitespace in argumentPassing.py to avoid changing line numbers
for other tests)
I went with NormalDataflowTest to signify that if you don't know what
you're looking for, this is probably the one. I did not want to just
call it DataflowTest, since that becomes a big vague when there are also
`FlowTest.qll` and `MaximalFlowTest.qll` -- I'm open to renaming this
though 👍
TL;DR: Something introduced the following bad join order:
```
(227s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i2#8f58670w after 3m46s:
25000 ~0% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
24000 ~1% {2} r2 = JOIN r1 WITH @py_scope#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.0
1076876712 ~6% {3} r3 = JOIN r2 WITH Flow::TupleNode#class#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'context', Lhs.1
870129666 ~0% {3} r4 = JOIN r3 WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.2, Lhs.0 'origin'
870129000 ~0% {3} r5 = r4 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.2 'origin', Lhs.0 'context')
870129000 ~1% {3} r6 = SCAN r5 OUTPUT In.2 'origin', In.1, In.0 'context'
9000 ~0% {2} r7 = JOIN r6 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.0 'origin', Lhs.2 'context'
return r7
```
(...the above being the tuple counts _at the point when I cancelled the
query_!)
Rewriting the code to force a join between `TupleNode#class` and
`getScope` results in the following join orders:
```
(0s) Tuple counts for TObject::scope_loads_tuplenode#ff/2@b3cf0bo5 after 13ms:
37369 ~3% {1} r1 = JOIN Flow::TupleNode#class#f WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.0 'origin'
37369 ~3% {2} r2 = JOIN r1 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 1 OUTPUT Rhs.1 's', Lhs.0 'origin'
return r2
```
and
```
(78s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i53#121c440w after 6ms:
34736 ~3% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
7370 ~5% {2} r2 = JOIN r1 WITH TObject::scope_loads_tuplenode#ff ON FIRST 1 OUTPUT Lhs.1 'context', Rhs.1 'origin'
7370 ~5% {2} r3 = r2 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.1 'origin', Lhs.0 'context')
7370 ~1% {2} r4 = SCAN r3 OUTPUT In.1 'origin', In.0 'context'
return r4
```
the latter being the largest iteration of `dom#TPythonTuple` throughout
the log.
No other major performance issues were observed.