Taken from Ruby, except that `getURL` member predicate was changed to
`getUrl` to keep consistency with the rest of our concepts, and stick
to our naming convention.
Unrolling the transitive closure had slightly better performance here.
Also, we exclude names of builtins, since those will be handled by a
separate case of `isDefinedLocally`.
A few changes, all bundled together:
- We were getting a lot of magic applied to the predicates in the
`ImportStar` module, and this was causing needless re-evaluation.
To address this, the easiest solution was to simply cache the entire
module.
- In order to separate this from the dataflow analysis and make it
dependent only on control flow, `potentialImportStarBase` was changed
to return a `ControlFlowNode`.
- `isDefinedLocally` was defined on control flow nodes, which meant we
were duplicating a lot of tuples due to control flow splitting, to no
actual benefit.
Finally, there was a really bad join in `isDefinedLocally` that was
fixed by separating out a helper predicate. This is a case where we
could use a three-way join, since the join between the `Scope`, the
`name` string and the `Name` is big no matter what.
If we join `scope_defines_name` with `n.getId()`, we'll get `Name`s
belonging to irrelevant scopes.
If we join `scope_defines_name` with the enclosing scope of the `Name`
`n`, then we'll get this also for `Name`s that don't share their `getId`
with the local variable defined in the scope.
If we join `n.getId()` with `n.getScope()...` then we'll get all
enclosing scopes for each `Name`.
The last of these is what we currently have. It's not terrible, but not
great either. (Though thankfully it's rare to have lots of enclosing
scopes.)
From `pritomrajkhowa/LoopBound`:
```
Definitions.ql-7:PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2 ........... 55.1s
```
specifically
```
(443s) Tuple counts for PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2/3@74a7cart after 55.1s:
184070 ~0% {3} r1 = JOIN PointsTo::PointsToInternal::scope_entry_points_to#ffff#shared#1 WITH Variables::GlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg2', Lhs.1 'arg0', Lhs.2 'arg1'
184070 ~0% {3} r2 = STREAM DEDUP r1
919966523 ~2% {4} r3 = JOIN r2 WITH Essa::EssaDefinition::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.0 'arg2'
4281779 ~2293% {3} r4 = JOIN r3 WITH Essa::EssaVariable::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
return r4
```
First, this is an `antijoin`, so there's likely some negation involved.
Also, there's mention of `GlobalVariable`, `getScope`, and
`getSourceVariable`, none of which appear in `scope_entry_points_to`, so
it's likely that something got inlined.
Taking a closer look at the predicates mentioned in the body, we spot
`undefined_variable` as a likely culprit.
Evaluating this predicate in isolation reveals that it's not terribly
big, so we could try just marking it with `pragma[noinline]` (I opted
for the slightly more solid `nomagic`) and see how that fares. I also
checked that `builtin_not_in_outer_scope` was similarly small, and
made that one un-inlineable as well.
The result? Well, I can't even show you. Both `scope_entry_points_to`
and `undefined_variable` are so fast that they don't appear in the
clause timing report (so they can at most take 3.5s each to evaluate, as
that is the smallest timing in the list).
From `pritomrajkhowa/LoopBound`:
```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3 ........... 5m53s
```
specifically
```
(767s) Tuple counts for PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3/3@f8f86764 after 5m53s:
832806293 ~0% {4} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1 WITH PointsTo::InterProceduralPointsTo::var_at_exit#fff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'arg1', Rhs.1 'arg2', Rhs.2 'arg0'
832806293 ~0% {3} r2 = JOIN r1 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.3 'arg0', Lhs.1 'arg1', Lhs.2 'arg2'
return r2
```
This one is a bit tricky to unpack. Where is this `shared#1` defined?
```
EVALUATE NONRECURSIVE RELATION:
SYNTHETIC PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1(int arg0, numbered_tuple arg1) :-
SENTINEL PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared
SENTINEL Definitions::EscapingAssignmentGlobalVariable#class#f
SENTINEL Essa::TEssaNodeRefinement#ffff_03#join_rhs
{2} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared WITH Definitions::EscapingAssignmentGlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
{2} r2 = STREAM DEDUP r1
{2} r3 = JOIN r2 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
{2} r4 = STREAM DEDUP r3
return r4
```
Looking at `callsite_points_to`, we see a likely candidate in `srcvar`.
It is guarded with an `instanceof` check for
`EscapingAssignmentGlobalVariable` (which lines up nicely with the
sentinel on its charpred) and `getSourceVariable` is just a projection
of `TEssaNodeRefinement`.
So let's try unbinding `srcvar` to prevent an early join.
The timing is now:
```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff ...................... 31.3s (2554 evaluations with max 101ms in PointsTo::InterProceduralPointsTo::callsite_points_to#ffff/4@i516#581fap5w)
```
(Showing the tuple counts doesn't make sense here, since all of the
`shared` and `join_rhs` predicates have been smooshed around.)
On `pritomrajkhowa/LoopBound`:
```
Definitions.ql-3:SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff ................. 4m35s
```
specifically
```
(376s) Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff/2@be04e9kp after 4m58s:
388843 ~0% {4} r1 = JOIN Essa::TPhiFunction#fff_2#join_rhs WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#ffff ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Rhs.2, Rhs.3
3629812090 ~1% {7} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::variableUse#ffff ON FIRST 1 OUTPUT Lhs.0, Rhs.2, Rhs.3, Lhs.2, Lhs.3, Lhs.1, Rhs.1 'use1'
0 ~0% {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#fffff ON FIRST 5 OUTPUT Lhs.5, Lhs.6 'use1'
0 ~0% {2} r4 = JOIN r3 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#ff ON FIRST 1 OUTPUT Lhs.1 'use1', Rhs.1 'use2'
897141 ~0% {2} r5 = SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUseSameVar#ff UNION r4
return r5
```
Clearly we do not want to join on the variable so soon. So we unbind it
and get
```
(78s) Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff/2@40e0e6uv after 434ms:
3377959 ~2% {4} r1 = SCAN SsaCompute::SsaComputeImpl::variableUse#ffff OUTPUT In.0, In.2, In.3, In.1 'use1'
1026855 ~2% {4} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#fffff ON FIRST 3 OUTPUT Lhs.0, Rhs.3, Rhs.4, Lhs.3 'use1'
129484 ~0% {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#ffff_1230#join_rhs ON FIRST 3 OUTPUT Rhs.3, Lhs.3 'use1'
0 ~0% {2} r4 = JOIN r3 WITH Essa::TPhiFunction#fff_2#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'use1'
0 ~0% {2} r5 = JOIN r4 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#ff ON FIRST 1 OUTPUT Lhs.1 'use1', Rhs.1 'use2'
897141 ~0% {2} r6 = SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUseSameVar#ff UNION r5
return r6
```
The easiest way to implement this was to change the definition of
`module_export` to account for chains of `import *`. We reuse the
machinery from `ImportStar.qll` for this, naturally.
These operate on file descriptors, and not on paths. file descriptors
doesn't fit into the rest of our modeling, so I would rather remove them
than to make it look like it's properly handled.
I also did not include any of the functions that work on file
descriptors when looking through all of `os`. So this keeps everything
consistent at least ;)
Adds result for `ModuleVariableNode::getARead` corresponding to reads
that go through (chains of) `import *`.
This required a bit of a change to _which_ module variables we define.
Previously, we only included variables that were accessed elsewhere in
the same file, but now we must ensure to also include variables that may
be accessed through `import *`.
Thanks @yoff, this leaves us with the following evaluation, which looks
very close to the one in the other fix (but with cleaner implementation)
-- both at 688k max tuples (although numbers are not exactly the same).
```
[2021-11-24 13:48:40] (14s) Tuple counts for PoorMansFunctionResolution::getSimpleMethodReferenceWithinClass#ff/2@e5f05asv after 74ms:
47493 ~3% {3} r1 = JOIN Class::Class::getAMethod_dispred#ff WITH py_Classes ON FIRST 1 OUTPUT Lhs.1, 0, Lhs.0
47335 ~0% {2} r2 = JOIN r1 WITH AstGenerated::Function_::getArg_dispred#fff ON FIRST 2 OUTPUT Rhs.2, Lhs.2
46683 ~0% {2} r3 = JOIN r2 WITH DataFlowPublic::ParameterNode::getParameter_dispred#fb_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
259968 ~4% {2} r4 = JOIN r3 WITH LocalSources::Cached::hasLocalSource#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
161985 ~0% {3} r5 = JOIN r4 WITH Attributes::AttrRef::accesses_dispred#bff_102#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1, Rhs.2
161985 ~2% {3} r6 = JOIN r5 WITH Attributes::AttrRead#class#f ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.0 'result'
688766 ~0% {3} r7 = JOIN r6 WITH Function::Function::getName_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1 'func', Lhs.2 'result'
20928 ~0% {2} r8 = JOIN r7 WITH Class::Class::getAMethod_dispred#ff ON FIRST 2 OUTPUT Lhs.1 'func', Lhs.2 'result'
return r8
```