This predicate got an unfortunate join order, leading to these tuple
counts on ElektraInitiative/libelektra:
(290s) Tuple counts for AliasedSSA::VariableMemoryLocation::coversEntireVariable_dispred#f:
57117 ~0% {3} r1 = SCAN IRType::IRType::getByteSize_dispred#ff AS I OUTPUT 0, (I.<1> * 8), I.<0>
421445272 ~0% {3} r2 = JOIN r1 WITH AliasedSSA::VariableMemoryLocation#fffffff_5601#join_rhs AS R ON FIRST 2 OUTPUT R.<3>, r1.<2>, R.<2>
103282 ~2% {1} r3 = JOIN r2 WITH AliasConfiguration::Allocation::getIRType_dispred#ff AS R ON FIRST 2 OUTPUT r2.<2>
return r3
With this commit, we get these tuple counts instead:
(0s) Tuple counts for AliasedSSA::VariableMemoryLocation::varIRTypeHasBitRange#bff:
361874 ~0% {3} r1 = SCAN AliasedSSA::VariableMemoryLocation#fffffff AS I OUTPUT I.<1>, 0, I.<0>
361874 ~0% {3} r2 = JOIN r1 WITH AliasConfiguration::Allocation::getIRType_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, 0, r1.<2>
361874 ~1% {3} r3 = JOIN r2 WITH IRType::IRType::getByteSize_dispred#ff AS R ON FIRST 1 OUTPUT r2.<2>, 0, (R.<1> * 8)
return r3
(0s) Tuple counts for AliasedSSA::VariableMemoryLocation::coversEntireVariable_dispred#f:
103282 ~2% {1} r1 = JOIN AliasedSSA::VariableMemoryLocation#fffffff_056#join_rhs AS L WITH AliasedSSA::VariableMemoryLocation::varIRTypeHasBitRange#bff AS R ON FIRST 3 OUTPUT L.<0>
103282 ~2% {1} r2 = STREAM DEDUP r1
return r2
Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple existing test functions in `ssa.cpp`.
It turns out that the evaluator will evaluate the GVN stage even when no
predicate from it is needed after optimization of the subsequent stages.
The GVN library is expensive to evaluate, and it'll become even more
expensive when we switch its implementation to IR.
This PR disables the use of GVN in `DataFlow::BarrierNode` for the AST
data-flow library, which should improve performance when evaluating a
single data-flow query on a snapshot with no cache. Precision decreases
slightly, leading to a new FP in the qltests.
There is no corresponding change for the IR data-flow library since IR
GVN is not very expensive.
We were hitting a combinatorial explosion in `hasDefinitionAtRank` for functions that contain a large number of string literals. The problem was that every `Chi` instruction for `AliasedVirtualVariable` was treated as a definition of every string literal. We already mark string literals as `isReadOnly()`, but we were allowing `AliasedVirtualVariable` to define read-only locations so that the `AliasedDefinition` instruction would provide the initial definition for all string literals.
To fix this, I've introduced the new `InitializeNonLocal` instruction, which is inserted in the prologue of every function right after `AliasedDefinition`. It provides the initial definition for every non-stack memory location, including read-only locations, but is never written to anywhere else. It is the conterpart of the `AliasedUse` instruction in the function epilogue, which represents the use of all non-stack memory after the function returns. I considered renaming `AliasedUse` to `ReturnNonLocal`, to match the `InitializeXXX`/`ReturnXXX` pattern we already use for parameters and indirections, but held off to avoid unnecessary churn. Any thoughts on whether I should make this name change?
This change has a significant speedup in evaluation time for a few of our troublesome databases:
`attnam/ivan`: 13%
`awslabs/s2n`: 26%
`SinaMostafanejad/OpenRDM`: 7%
`zcoinofficial/zcoin`: 8%