Changes the `forex` range to join on both `this` (the current `ExprEvaluator`) and `ret` (the expected function return value),
so that we look at the relevant return values rather than all interesting functions.
This predicate looked like a join of two already-computed predicates,
but it was a bit more complicated because the `*` operator expands into
two cases: the reflexive case and the transitive case. The join order
for the transitive case placed the `PrimitiveBasicBlock` charpred call
_after_ the `member_step+` call, which meant that all the tuples of
`member_step+` passed through the pipeline.
This commit changes the implementation by fully writing out the
expansion of `*` into two cases, where the base case is manually
specialised to make sure the join orderer doesn't get tempted into
reusing the same strategy for both cases. This speeds up the predicate
from 2m38s to 1s on a snapshot of our own C/C++ code.
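Schematically, the rewrite looks like the sketch below. The step relation, the entry predicate, and all names are illustrative stand-ins, not the real `member_step` or `PrimitiveBasicBlock` predicates; only the shape of the manual `*` expansion is the point.

```ql
import cpp

// Illustrative one-step relation and entry marker (stand-ins only).
predicate step(ControlFlowNode a, ControlFlowNode b) { b = a.getASuccessor() }

predicate isEntry(ControlFlowNode n) { not step(_, n) }

// Before: `step*` implicitly expands into a reflexive and a transitive case,
// and the join orderer may evaluate `step+` before restricting to entries.
predicate reachableFromEntry(ControlFlowNode entry, ControlFlowNode n) {
  isEntry(entry) and step*(entry, n)
}

// After: the expansion is written out by hand, with the base case specialised
// so each disjunct gets its own join strategy.
predicate reachableFromEntryExpanded(ControlFlowNode entry, ControlFlowNode n) {
  isEntry(entry) and n = entry
  or
  isEntry(entry) and step+(entry, n)
}
```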
The existing implementation of `primitive_basic_block_entry_node` was
"cleverly" computing two properties about `node` with a single
`strictcount`: whether `node` had multiple predecessors and whether any
of those predecessors had more than one successor. This was fast enough
on most snapshots, but on the snapshot of our own code it took 37
seconds to compute `primitive_basic_block_entry_node` and its auxiliary
predicates. This is likely to have affected other large snapshots too.
With this change, the property is computed the same way as in our other
languages, which brings the run time down to 4 seconds.
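In rough outline, and with illustrative names (the real predicate and its auxiliaries differ, and the full entry condition has more cases), the two properties now read something like:

```ql
import cpp

// `node` has more than one predecessor.
predicate hasMultiplePredecessors(ControlFlowNode node) {
  strictcount(node.getAPredecessor()) > 1
}

// Some predecessor of `node` has more than one successor.
predicate hasBranchingPredecessor(ControlFlowNode node) {
  exists(ControlFlowNode pred |
    pred = node.getAPredecessor() and
    strictcount(pred.getASuccessor()) > 1
  )
}

// Sketch only: a node that satisfies either property starts a new block.
predicate entryNodeCandidate(ControlFlowNode node) {
  hasMultiplePredecessors(node) or hasBranchingPredecessor(node)
}
```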
After the recent inlining of `unresolveElement`, the join order in
`CommentedOutCode` became a problem. The join orderer was tempted to
join the two `hasLocationInfo` calls first because they had one column
in common. With this commit, they have no columns in common. It follows
from the other predicates in the same file that the two values would be
equal anyway, so there is no need to assert that equality in this
predicate and risk that the join orderer uses that information.
On Wireshark, the `CommentBlock::hasLocationInfo` predicate goes from
taking 2m2s to taking 180ms. The query produces the same 7,448 alerts.
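The shape of the problem and the fix, in a purely illustrative predicate (the real `CommentedOutCode` code differs):

```ql
import cpp

// Illustrative only: if both `hasLocationInfo` calls bound the same `file`
// variable, that shared column would tempt the join orderer to join the two
// calls first. Here the file columns are left as `_`; the comment and the
// statement are already tied together by `getCommentedElement`, so their
// files are not compared in this predicate at all.
predicate commentDirectlyAbove(Comment c, Stmt s) {
  exists(int cEndLine, int sStartLine |
    c.getCommentedElement() = s and
    c.getLocation().hasLocationInfo(_, _, _, cEndLine, _) and
    s.getLocation().hasLocationInfo(_, sStartLine, _, _, _) and
    sStartLine = cEndLine + 1
  )
}
```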
The change to make `unresolveElement` a member predicate was helpful for
the optimiser when it dispatched on `this`, but now that it "dispatches"
on `result` it's just an unnecessary pollution of the `ElementBase`
namespace.
This change means that there are no results for `unresolveElement(t)`
where `t` is a "junk type" -- a class definition that is not in the
image of `resolveClass`. These "junk types" still exist as `Element`s,
but they will never be returned by any predicate that goes through
`unresolveElement` to query the db.
We get a small reduction in DIL size and a significant speed
improvement. The DIL for `NewArrayDeleteMismatch.ql` is reduced from
27,630 lines to 27,507 lines, and the total analysis time for the LGTM
suite on jdk8u is reduced from 1158s to 984s.
Raw classes from the database that are incomplete and should be
represented by their complete twin are now allowed to be `Element`s for
performance reasons, but this commit prevents them from being `Type`s.
Having them as `Type`s was causing confusion in test results and might
also cause confusion in queries.
Also remove the charpred of `ElementBase`. This gets rid of many redundant
charpred checks. It means that incomplete classes from the db are now
`Element`s, which is maybe noisy but should not be harmful.
Together, these changes give a great reduction in DIL and should help
the optimiser. They bring the DIL of `UncontrolledFormatString.ql` down
from 43,908 lines to 35,400 lines.
We currently erroneously keep mentions of class instantiations, which
can lead to bad performance on template-heavy code bases. We never
want to link those anyway, so we can simply suppress them.
Also exclude templates as their names are not canonical.
The test changes in `isfromtemplateinstantiation/` are the inverses of
what we got in 34c9892f7, which should be a good thing.
These tests exercise the problematic cases where a variable can appear
to have multiple types because we fail to account for qualified names
when comparing type names.
With the new formulation, we can join on function and index at the
same time, leading to significant performance gains on large code
bases that use templates extensively.
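A purely hypothetical sketch of this kind of reformulation (the actual declaration-matching predicates are different; the names, the use of parameter indices, and the type-name comparison are all assumptions made for illustration):

```ql
import cpp

// Hypothetical: exposing the function and the parameter index together in one
// predicate lets the evaluator join on both columns in a single step, instead
// of matching candidate functions first and checking each index afterwards.
predicate parameterTypeName(Function f, int i, string typeName) {
  typeName = f.getParameter(i).getType().getName()
}

// Two declarations agree at position `i` through one join on
// (name, index, type name).
predicate matchingParameter(Function f, Function g, int i) {
  f.getName() = g.getName() and
  exists(string t | parameterTypeName(f, i, t) and parameterTypeName(g, i, t))
}
```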
This query gets optimized badly, and it has started timing out when we
run it on our own code base. Most of the evaluation time is spent in an
RA predicate named `#select#cpe#1#f#antijoin_rhs#1`, which takes 1m36s on
a Wireshark snapshot.
This restructuring of the code makes the problematic RA predicate go
away.
On a snapshot of Postgres, evaluation of
`getNextExplicitlyInitializedElementAfter#fff#antijoin_rhs#1` took
forever, preventing the computation of the IR. I haven't been able to
reproduce it with a small test case, but the implementation of
`getNextExplicitlyInitializedElementAfter` was fragile because it called
the inline predicate `ArrayAggregateLiteral.isInitialized`. It also
seemed inefficient that `getNextExplicitlyInitializedElementAfter` was
computed for many values of its parameters that were never needed by the
caller.
This commit replaces `getNextExplicitlyInitializedElementAfter` with a
new predicate named `getEndOfValueInitializedRange`, which should have
the same behavior but a more efficient implementation. It uses a helper
predicate `getNextExplicitlyInitializedElementAfter`, which shares its
name with the now-deleted predicate but has behavior that I think
matches the name.
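Going by its name, the new helper amounts to something like the sketch below, where `explicitlyInitialized` is an illustrative stand-in (the real IR-construction predicates, and the `isInitialized` predicate they used before, are different):

```ql
import cpp

// Illustrative stand-in: treat an element as explicitly initialized when the
// aggregate has a child expression at that index.
predicate explicitlyInitialized(ArrayAggregateLiteral agg, int i) {
  exists(agg.getChild(i))
}

// Sketch of the helper: the next explicitly initialized element after index
// `last`, which bounds the value-initialized range that starts at `last + 1`.
int nextExplicitlyInitializedElementAfter(ArrayAggregateLiteral agg, int last) {
  result = min(int i | explicitlyInitialized(agg, i) and i > last | i)
}
```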
The overrides of `Instruction.getOperandMemoryAccess` did not relate
`this` to any of their parameters, which made the evaluator attempt to
compute the Cartesian product of `Instruction` and `TPhiOperand`. This
happened only during computation of aliased SSA; perhaps the optimizer
was able to eliminate the Cartesian product for the non-aliased SSA
computation.
With this change, I'm able to compute aliased SSA for medium-sized
snapshots.
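The same bug pattern in a self-contained, purely illustrative form: a member predicate whose body never mentions `this` leaves the evaluator no choice but to build the full Cartesian product of the class and the parameter type.

```ql
import cpp

class LoggingFunction extends Function {
  LoggingFunction() { this.getName().matches("%log%") }

  // Bad: `g` is never related to `this`, so evaluating this predicate pairs
  // every LoggingFunction with every matching global variable.
  predicate badUsesGlobal(GlobalVariable g) { g.getName().matches("%buf%") }

  // Better: relating `g` to `this` lets the join start from this function's
  // own accesses instead of the full product.
  predicate usesGlobal(GlobalVariable g) {
    g.getName().matches("%buf%") and
    g.getAnAccess().getEnclosingFunction() = this
  }
}
```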
Instead of computing these two things in one predicate, they are
computed in separate predicates and then joined. This splits the
predicate `getInstruction`, which took 81s before, into predicates that
together take 20s on a medium-sized db.
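In miniature, and with unrelated illustrative properties, the shape of the change is: compute each property in its own predicate and join the results, rather than computing both in a single body.

```ql
import cpp

// Illustrative only: two independent properties, each in its own predicate...
predicate hasLoop(Function f) { exists(Loop l | l.getEnclosingFunction() = f) }

predicate hasThrow(Function f) {
  exists(ThrowExpr t | t.getEnclosingFunction() = f)
}

// ...joined afterwards, instead of one predicate that computes both at once.
predicate loopsAndThrows(Function f) { hasLoop(f) and hasThrow(f) }
```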
Moved the IR flavors into "implementation", with internal files under
"implementation/internal". Made `IRBlockConstruction` a nested module of
`IRConstruction`/`SSAConstruction`, so that it is picked up through the
`Construction` parameter of the `IR` module rather than merely by being
in the same directory as `IRBlock`.