Some of our IR consistency failure query predicates already produced results in the schema as an `@kind problem` query, including `$@` replacements for the enclosing `IRFunction` to make it easier to figure out which function to dump when debugging. This change moves the rest of the query predicates in `IRConsistency.qll` to do the same. In addition, it wraps each call to `getEnclosingIRFunction()` to return an `OptionalIRFunction`, which can be either a real `IRFunction` or a placeholder in case `getEnclosingIRFunction()` returned no results. This exposes a couple new consistency failures in `syntax-zoo`, which will be fixed in a subsequent commit.
This change also deals with consistency failures when the enclosing `IRFunction` has more than one `Function` or `Location`. For multiple `Function`s, we concatenate the function names. For multiple `Location`s, we pick the first one in lexicographical order. This changes the number of results produced in the existing tests, but does't change the actual number of problems.
While investigating a bug with `TInstruction` sharing, I discovered that we had a case where alias analysis could create two `VirtualVariable`s for the same `Allocation`. For an indirect parameter allocation, we were using the type of the pointer variable as the type of the indirect allocation, instead of just `Unknown`. If the `IRType` of the pointer variable was the same type as the type of at least one access to the indirect allocation, we'd create both an `EntireAllocationVirtualVariable` and a `VariableVirtualVariable` for the allocation.
I added a new consistency test to guard against this in the future. This also turned out to be the root cause of the one existing known consistency failure in the IR tests.
The IR generation for `InitializeIndirection` currently connects its load operand to the result of the corresponding `InitializeParameter` instruction. This isn't exactly wrong, but it doesn't fit the IR invariant of "All unmodeled uses consume `UnmodeledDefinition`". Our current code doesn't care, because we just throw away all of the existing def-use information, modeled or otherwise, when we build unaliased SSA. However, some upcoming SSA changes don't work correctly if this invariant is broken.
I've added the trivial IR generation change, along with a new sanity query.
`Instruction.getDefinitionOverlap()` depends on `SSAConstruction::getMemoryOperandDefinition()`, which in turn depends on `SSAConstruction::hasMemoryOperandDefinition()`. When the definition in question came from a `Chi` instruction, `hasMemoryOperandDefinition()` incorrectly bound `overlap` to the overlap relationship between the original (non-`Chi`) instruction and the use. The fix is to make use of the `actualDefLocation` parameter to `getDefinitionOrChiInstruction()`, which specifies the location for the result of the `Chi` in that case.
The result of `getDefinitionOverlap()` should never be `MayPartiallyOverlap`, because if that were the case, we should have inserted as `Chi` instruction and hooked the definition up to that instead.
There are quite a few existing failures.
This predicate replaces `isChiForAllAliasedMemory`, which was always
intended to be temporary. A test is added to `IRSanity.qll` to verify
that the new predicate corresponds exactly with (a fixed version of) the
old one.
The implementation of the new predicate,
`Cached::hasConflatedMemoryResult` in `SSAConstruction.qll`, is faster
to compute than the old `isChiForAllAliasedMemory` because it uses
information that's readily available during SSA construction.
Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple existing test functions in `ssa.cpp`.
We were hitting a combinatorial explosion in `hasDefinitionAtRank` for functions that contain a large number of string literals. The problem was that every `Chi` instruction for `AliasedVirtualVariable` was treated as a definition of every string literal. We already mark string literals as `isReadOnly()`, but we were allowing `AliasedVirtualVariable` to define read-only locations so that the `AliasedDefinition` instruction would provide the initial definition for all string literals.
To fix this, I've introduced the new `InitializeNonLocal` instruction, which is inserted in the prologue of every function right after `AliasedDefinition`. It provides the initial definition for every non-stack memory location, including read-only locations, but is never written to anywhere else. It is the conterpart of the `AliasedUse` instruction in the function epilogue, which represents the use of all non-stack memory after the function returns. I considered renaming `AliasedUse` to `ReturnNonLocal`, to match the `InitializeXXX`/`ReturnXXX` pattern we already use for parameters and indirections, but held off to avoid unnecessary churn. Any thoughts on whether I should make this name change?
This change has a significant speedup in evaluation time for a few of our troublesome databases:
`attnam/ivan`: 13%
`awslabs/s2n`: 26%
`SinaMostafanejad/OpenRDM`: 7%
`zcoinofficial/zcoin`: 8%