This PR adds new predicates to `Declaration` and `Type` to get a fully-qualified canonical name for the element, suitable for debugging and dumps. It includes template parameters, cv qualifiers, function parameter and return types, and fully-qualified names for all symbols. These strings are too large to compute in productions queries, so they should be used only for dumps and debugging. Feel free to suggest better names for these predicates.
I've updated PrintAST and PrintIR to use these instead of `Function.getFullSignature()`. The biggest advantage of the new predicates is that they handle lambdas and local classes, which `getQualifiedName` and `getFullSignature` do not. This makes IR and AST dumps much more usable for real-world snapshots.
Along the way, I cleaned up some of our handling of `IntegralType` to use a single table for tracking the signed, unsigned, and canonical versions of each type. The canonical part is new, and was necessary for `getTypeIdentityString` so that `signed int` and `int` both appear as `int`.
These predicates currently take a pair of `IRBlock`s - as it stands, at
most one edge can exist from one `IRBlock` to a given other `IRBlock`.
We may need to revisit that assumption and create an `IREdge` IPA type
at some future date
There are a few IR APIs that we've found to be confusingly named. This PR renames them to be more consistent within the IR and with the AST API:
`Instruction.getFunction` -> `Instruction.getEnclosingFunction`: This was especially confusing when you'd call `FunctionAddressInstruction.getFunction` to get the function whose address was taken, and wound up with the enclosing function instead.
`Instruction.getXXXOperand` -> `Instruction.getXXX`. Now that `Operand` is an exposed type, we want a way to get a specific `Operand` of an `Instruction`, but more often we want to get the definition instruction of that operand. Now, the pattern is that `getXXXOperand` returns the `Operand`, and `getXXX` is equivalent to `getXXXOperand().getDefinitionInstruction()`.
`Operand.getInstruction` -> `Operand.getUseInstruction`: More consistent with the existing `Operand.getDefinitionInstruction` predicate.
These queries have no results on our test cases in the repo, but
`ambiguousSuccessors` has results on any large C++ code base, and
`unexplainedLoop` has results on Windows builds of ChakraCore.
With this change, the `IRDataflowTestCommon.qll` and
`DataflowTestCommon.qll` files use the same definitions of sources and
sinks. Since the IR data flow library is meant to be compatible with the
AST data flow library, this is what we ought to be testing.
Two alerts change but not necessarily for the right reasons.
As we prepare to clarify how conversions are treated, we don't want a
`sink(...)` declaration where it's non-obvious which conversions are
applied to arguments.
The CFG construction code previously contained half of an approximation
of which address expressions are constant. Now this this property is
properly modelled by `Expr.isConstant`, we can remove this code.
This fixes most discrepancies between the QL-based CFG and the
extractor-based CFG on Wireshark.
Before this change, `Expr.isConstant` only was only true for those
constant expressions that could be represented as QL values: numbers,
Booleans, and string literals. It was not true for string literals
converted from arrays to pointers, and it was not true for addresses of
variables with static lifetime.
The concept of a "constant expression" varies between C and C++ and
between versions of the standard, but they all include addresses of data
with static lifetime. These are modelled by the new library
`AddressConstantExpression.qll`, which is based on the code in
`EscapesTree.qll` and modified for its new purpose.
I've tested the change for performance on Wireshark and for correctness
with the included tests. I've also checked on Wireshark that all static
initializers in C files are considered constant, which was not the case
before.
This test was intended to catch regressions in the CFG, but it looks
like it's just catching insignificant extractor changes. The test has
started failing after some recent extractor changes, but I have no way
to pinpoint the failure and understand whether it's a problem or not, so
I think it's better to delete this test.
The remaining tests check whether the QL-based CFG generates the same
graph as the extractor-based CFG. Furthermore, the `successor-tests`
check that the extractor-based CFG works as intended.
This reduces the number of bounds computed, and will simplify use of the
library. The resulting locations in the tests may be slightly strange,
because the example `Instruction` for a `ValueNumber` is the first
appearing in the IR, regardless of source order, and may not be the most
closely related `Instruction` to the bounded value. I think that's worth
doing for the performance and usability benefits.
This implements calculation of the control-flow graph in QL. The new
code is not enabled yet as we'll need more extractor changes first.
The `SyntheticDestructorCalls.qll` file is a temporary solution that can
be removed when the extractor produces this information directly.
The existing unreachable IR removal code only retargeted an infeasible edge to an `Unreached` instruction if the successor of the edge was an unreachable block. This is too conservative, because it doesn't remove an infeasible edge that targets a block that is still reachable via other paths. The trivial example of this is `do { } while (false);`, where the back edge is infeasible, but the body block is still reachable from the loop entry.
This change retargets all infeasible edges to `Unreached` instructions, regardless of the reachability of the successor block.