This PR adds new predicates to `Declaration` and `Type` to get a fully-qualified canonical name for the element, suitable for debugging and dumps. It includes template parameters, cv qualifiers, function parameter and return types, and fully-qualified names for all symbols. These strings are too large to compute in productions queries, so they should be used only for dumps and debugging. Feel free to suggest better names for these predicates.
I've updated PrintAST and PrintIR to use these instead of `Function.getFullSignature()`. The biggest advantage of the new predicates is that they handle lambdas and local classes, which `getQualifiedName` and `getFullSignature` do not. This makes IR and AST dumps much more usable for real-world snapshots.
Along the way, I cleaned up some of our handling of `IntegralType` to use a single table for tracking the signed, unsigned, and canonical versions of each type. The canonical part is new, and was necessary for `getTypeIdentityString` so that `signed int` and `int` both appear as `int`.
These predicates currently take a pair of `IRBlock`s - as it stands, at
most one edge can exist from one `IRBlock` to a given other `IRBlock`.
We may need to revisit that assumption and create an `IREdge` IPA type
at some future date
There are a few IR APIs that we've found to be confusingly named. This PR renames them to be more consistent within the IR and with the AST API:
`Instruction.getFunction` -> `Instruction.getEnclosingFunction`: This was especially confusing when you'd call `FunctionAddressInstruction.getFunction` to get the function whose address was taken, and wound up with the enclosing function instead.
`Instruction.getXXXOperand` -> `Instruction.getXXX`. Now that `Operand` is an exposed type, we want a way to get a specific `Operand` of an `Instruction`, but more often we want to get the definition instruction of that operand. Now, the pattern is that `getXXXOperand` returns the `Operand`, and `getXXX` is equivalent to `getXXXOperand().getDefinitionInstruction()`.
`Operand.getInstruction` -> `Operand.getUseInstruction`: More consistent with the existing `Operand.getDefinitionInstruction` predicate.
This adds `IntegerPartial.qll`, which is similar to
`IntegerConstant.qll` except that it contains partial functions on
integers instead of total functions on optional integers. This speeds up
the constant analysis so it takes 6.5s instead of 10.3s on comdb2.
This does to `SSAConstruction` what the previous commit did to
`IRConstruction`. An instruction in `SSAConstruction` is now defined in
terms of how it was created rather than what it can be queried for.
Effectively, this defines `TInstruction` as `TInstructionTag` was
defined before and then removes `TInstructionTag` from
`SSAConstruction`. This also has the benefit of removing the concept of
an instruction tag from the public predicates on `Instruction`.
This definition was denormalized to the extent that an instruction was
defined in terms of the six main attributes it could be queried for.
This made it possible to do multi-column joins on those six attributes,
but it doesn't appear that this feature was useful in practice. The main
multi-column join that was in use was on the pair of
(`TranslatedElement, InstructionTag`), but the `TranslatedElement` was
not part of the `TInstruction`.
This commit changes `TInstruction` to be defined in terms of what it's
_built from_ (`TranslatedElement, InstructionTag`) instead. This makes
it possible to do multi-column joins on those two components, and then
there are separate predicates (usually with two columns) to query
instruction attributes, replacing the many uncached projections from
`MkInstruction` that were generated before.
An immediate advantage is that an `Expr` with multiple types will no
longer give rise to multiple `Instruction`s, fixing most of the errors
from the sanity query `ambiguousSuccessors`. The code inside
`IRConstruction.qll` becomes simpler and hopefully faster as there is no
longer a translation from `TranslatedElement` to `Locatable` and back
again.
The previous commit had the side effect that `IRVariable`s were created
for all `Functions`, including those that did not have IR. This commit
restricts all `TIRVariable` constructors to functions that have IR.
This doesn't make it much faster, but it reduces the debug output
volume. It also simplifies the code.
I've found this change necessary when I compute the full IR on a
Wireshark snapshot in QL4E. Without it, Eclipse runs out of memory
because the console log is too large.
The new predicate `isOrphan` gets inlined into
`ignoreExprAndDescendants`, whose performance improves from
TranslatedElement::ignoreExprAndDescendants#f .. 23.4s (executed 9 times)
to
TranslatedElement::ignoreExprAndDescendants#f ... 4.3s (executed 9 times)
This dramatic improvement is not only due to eliminating a type check in
the recursive case. Removing the type check from the other base cases
also enabled them to get better join orders.
The previous reccomentation changed the behaviour of the code.
A user following the advice might have broken her/his code:
With call-by-value, the original parameter is not changed.
With a call-by-reference, however, it may be changed. To be sure,
nothing breaks by blindly following the advice, suggest to pass a
const reference.
The `SSAConstruction.getNewIRVariable` was very slow on Wireshark. This
was probably because it couldn't join on multiple columns
simultaneously. Instead of improving the join, I observed that the
`TIRVariable` type was the same between all three IR stages except for a
few occurrences of `FunctionIR` that could easily be changed to
`Function`. By sharing `TIRVariable` between all the stages, we avoid
recomputing it and translating it between every stage, turning the slow
`getNewIRVariable` predicate into a no-op.
This change means that later stages of the IR can't introduce new
variables, but that was already the case because
`config/identical-files.json` forced all three `IRVariable.qll` files to
be identical.
This predicate computed a local CP between all defs and uses of the same
virtual variable in a basic block. This wasn't a problem in
`unaliased_ssa`, but it became a huge problem in `aliased_ssa`, probably
because many variables can be modelled with a single virtual variable
there.
Before this commit, evaluation of `aliased_ssa`'s
`variableLiveOnEntryToBlock#ff#antijoin_rhs` on Wireshark took 80
_minutes_. After this commit, that predicate and its immediate
dependencies take around 5 _seconds_.