This changes the test output because `VariableDeclarationGroup.toString`
changes to be the one inherited from VariableDeclarationEntry. This
should not affect the output as shown by any front end because
the string to be displayed to the user for a `$@` interpolation comes
from the following column instead.
This avoids using a raw db type.
It is possible for `SuppressionComment` and `SuppressionScope` to have
different locations because `SuppressionScope` defines `hasLocationInfo`
as a new rootdef whereas `SuppressionComment` only responds to
`getLocation` that it inherited. In interpretation of query results, a
`hasLocationInfo` predicate is preferred over `getLocation` if it
exists.
Many classes have been declared with `extends @cfgnode` because they
should be implemented internally as a control-flow node but should not
expose the member predicates of `ControlFlowNode` to their users. After
the transition in a1e44041e it became mandatory to convert explicitly
between the `Element`-derived `ControlFlowNode` and the raw dbtype
`@cfgnode`, and that commit inserted numerous such conversions as a
result of having all those classes that did not derive from `Element` in
the standard library.
It was also confusing and error-prone that the libraries implementing
`ControlFlowNode` referred to `ControlFlowNode`. This seemingly cyclic
reference worked out because the libraries did not call the predicates
on `ControlFlowNode` whose implementation they were part of.
Both these problems are now solved by adding a new class
`ControlFlowNodeBase extends Element` that should be used in preference
to `@cfgnode` everywhere. This class is for exactly those use cases
where `@cfgnode` should be seen as an `Element` without having too many
member predicates on it.
The classes that move from extending `@cfgnode` to extending
`ControlFlowNodeBase` are: `BasicBlock`, `AdditionalControlFlowEdge`,
`DefOrUse`, `SsaDefinition`, `SubBasicBlock` and `RangeSsaDefinition`.
These previously had to define their own `toString` rootdef, which
typically had some dummy string as result (like `"BasicBlock"`), but now
their `toString` is part of the `Element` rootdef and should not be
overridden otherwise `Element.toString` will sometimes have multiple
results. Removing these dummy `toString` predicates had some effects on
the tests that are included in this commit.
The `getLocation` family of predicates is affected like `toString`, but
the situation is slightly different. Some of these classes had genuinely
useful alternative definitions of locations. Fortunately, they all used
`hasLocationInfo`, which is preferred over `getLocation` by the QL
engine. Because `Element` does not define `getLocationInfo`, each class
can create its own rootdef of this predicate like before.
The presence of abstract predicates on this class made it hard to
specialize it, and this is one of the reasons why the raw db-type
`@cfgnode` is often used in preference to `ControlFlowNode`.
For example, if you have 3 types called T, where t1 and t2 are defined
but t3 isn't, then you will have
unspecifiedtype(t1, t1)
unspecifiedtype(t2, t2)
unspecifiedtype(t3, t3)
t1 = resolve(t1)
t1 = resolve(t3)
t2 = resolve(t2)
t2 = resolve(t3)
so given
Type getUnspecifiedType() {
unspecifiedtype(unresolve(this), unresolve(result))
}
you get t1.getUnspecifiedType() = t2.
I think that in general the best thing to do is to not unresolve 'this',
but to just take the underlying value.
Casts to `void` did not have a semantic conversion type in the AST, so they also weren't getting generated correctly in the IR. I've added a `VoidConversion` class to the AST, along with tests. I've also added IR translation for such conversions, using a new `ConvertToVoid` opcode. I'm not sure if it's really necessary to generate an instruction to represent this, but it may be useful for detecting values that are explicitly unused (e.g. return value from a call).
I added two new sanity queries for the IR to detect the following:
- IR blocks with no successors, which usually indicates bad IR translation
- Phi instruction without an operand for one of the predecessor blocks.
These sanity queries found another subtle IR translation bug. If an expression that is normally translated as a condition (e.g. `&&`, `||`, or parens in certain contexts) has a constant value, we were not creating a `TranslatedExpr` for the expression at all. I changed it to always treat a constant condition as a non-condition expression.
All three `IRBlock.qll` files are now identical again, and they are just
a thin object-oriented layer on top of the three
`IRBlockConstruction.qll` files, two of which are identical.
As the queries live here, it makes sense for the suites to be versioned
together with them. The LGTM suite has already been moved. This commit
moves the actively-maintained non-LGTM suites.
Previously, we would try to find an element enclosing each macro
access. This is not in general well-defined, especially in the
context of template instantiations -- macros are a lexing-time
concept, and don't map cleanly onto AST elements.
`IRBlock` contains a few expensive predicates, mostly `getInstruction`
and `immediatelyDominates`. These were previously recomputed for each of
the three SSA layers even though they essentially produce the same
result in each layer. The only difference between the three types of
`IRBlock` is the phi nodes.
This commit changes the representation of `IRBlock` for `ssa` and
`aliased_ssa` so they become just wrappers around the `IRBlock` of their
previous layer. Most predicates in later layers are then computed from
the corresponding predicate of the preceding layer.
The `SSAConstruction::Cached::getInstructionOperand` predicate took
1m27s on a postgres snapshot before this change and was the slowest
predicate in SSAIR. It now takes 4.5s.
The slowdown was caused by its use of
`getUnmodeledDefinitionInstruction`, which got inlined into a place
where join orderer had little choice but to join the `MkInstruction`
relation with itself, creating a large intermediate relation.
I've added `pragma[noinline]` to `getUnmodeledDefinitionInstruction` and
also to similar predicates that are likely to cause the same problem in
the future.
Before this PR, the caching and computation of `IRBlock` spanned three
cache stages and was also separate from `SSAConstruction` even though it
shared some computations with it. They are now all cached together, so
the number of stages is reduced by 2 for each layer of IR.
I made the choice of what to cache be similar to what we do for
`PrimitiveBasicBlock` as I've recently benchmarked this and found it to
be a good choice.
The check that an instruction is in the same function as its operands is
hopefully redundant and can be removed. Just to be sure, I've added the
check to a sanity query.
This check turned out to cause bad performance in the alias analysis
because it got inlined into `AliasAnalysis::resultEscapes` and then
pulled out to a loop-invariant predicate that got a bad join order. With
this check removed, the `ssa/AliasAnalysis.qll` file is orders of
magnitude faster.
This follows what's been done for JavaScript. The `cpp-alerts-lgtm`
suite is now empty and will be auto-generated when building a dist.
This commit has no effect in itself, but these files need to be in place
when the corresponding changes are made in Semmle/code.