The IR avoids having non-trivially-copyable and non-trivially-assignable types in register results, because objects of those types need to exist at a particular memory location. The `InitializeParameter` and `Uninitialized` instructions were violating this restriction because they returned register results, which were then stored into the destination location via a `Store`.
This change makes those two instructions take the destination address as an operand, and return a memory result representing the (un-)initialized memory, removing the need for a separate `Store` instruction.
For example, if you have 3 types called T, where t1 and t2 are defined
but t3 isn't, then you will have
unspecifiedtype(t1, t1)
unspecifiedtype(t2, t2)
unspecifiedtype(t3, t3)
t1 = resolve(t1)
t1 = resolve(t3)
t2 = resolve(t2)
t2 = resolve(t3)
so given
Type getUnspecifiedType() {
unspecifiedtype(unresolve(this), unresolve(result))
}
you get t1.getUnspecifiedType() = t2.
I think that in general the best thing to do is to not unresolve 'this',
but to just take the underlying value.
Casts to `void` did not have a semantic conversion type in the AST, so they also weren't getting generated correctly in the IR. I've added a `VoidConversion` class to the AST, along with tests. I've also added IR translation for such conversions, using a new `ConvertToVoid` opcode. I'm not sure if it's really necessary to generate an instruction to represent this, but it may be useful for detecting values that are explicitly unused (e.g. return value from a call).
I added two new sanity queries for the IR to detect the following:
- IR blocks with no successors, which usually indicates bad IR translation
- Phi instruction without an operand for one of the predecessor blocks.
These sanity queries found another subtle IR translation bug. If an expression that is normally translated as a condition (e.g. `&&`, `||`, or parens in certain contexts) has a constant value, we were not creating a `TranslatedExpr` for the expression at all. I changed it to always treat a constant condition as a non-condition expression.
All three `IRBlock.qll` files are now identical again, and they are just
a thin object-oriented layer on top of the three
`IRBlockConstruction.qll` files, two of which are identical.
As the queries live here, it makes sense for the suites to be versioned
together with them. The LGTM suite has already been moved. This commit
moves the actively-maintained non-LGTM suites.
Previously, we would try to find an element enclosing each macro
access. This is not in general well-defined, especially in the
context of template instantiations -- macros are a lexing-time
concept, and don't map cleanly onto AST elements.
`IRBlock` contains a few expensive predicates, mostly `getInstruction`
and `immediatelyDominates`. These were previously recomputed for each of
the three SSA layers even though they essentially produce the same
result in each layer. The only difference between the three types of
`IRBlock` is the phi nodes.
This commit changes the representation of `IRBlock` for `ssa` and
`aliased_ssa` so they become just wrappers around the `IRBlock` of their
previous layer. Most predicates in later layers are then computed from
the corresponding predicate of the preceding layer.
The `SSAConstruction::Cached::getInstructionOperand` predicate took
1m27s on a postgres snapshot before this change and was the slowest
predicate in SSAIR. It now takes 4.5s.
The slowdown was caused by its use of
`getUnmodeledDefinitionInstruction`, which got inlined into a place
where join orderer had little choice but to join the `MkInstruction`
relation with itself, creating a large intermediate relation.
I've added `pragma[noinline]` to `getUnmodeledDefinitionInstruction` and
also to similar predicates that are likely to cause the same problem in
the future.
Before this PR, the caching and computation of `IRBlock` spanned three
cache stages and was also separate from `SSAConstruction` even though it
shared some computations with it. They are now all cached together, so
the number of stages is reduced by 2 for each layer of IR.
I made the choice of what to cache be similar to what we do for
`PrimitiveBasicBlock` as I've recently benchmarked this and found it to
be a good choice.
The check that an instruction is in the same function as its operands is
hopefully redundant and can be removed. Just to be sure, I've added the
check to a sanity query.
This check turned out to cause bad performance in the alias analysis
because it got inlined into `AliasAnalysis::resultEscapes` and then
pulled out to a loop-invariant predicate that got a bad join order. With
this check removed, the `ssa/AliasAnalysis.qll` file is orders of
magnitude faster.
This follows what's been done for JavaScript. The `cpp-alerts-lgtm`
suite is now empty and will be auto-generated when building a dist.
This commit has no effect in itself, but these files need to be in place
when the corresponding changes are made in Semmle/code.