After a `queries.xml` was added to the test directory,
`Container.getRelativePath` now considers source files to be relative to
the `cpp/test` directory rather than the directory of the `*.ql*` file.
This caused some benign test output changes, and it also caused an
unwanted alert for `test3.c:14` to appear in
`cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/tainted/IntegerOverflowTainted.expected`.
This alert came about because `inSystemMacroExpansion` holds for files
that don't have a relative path, but the pretend system header in
`../system_header` now does have a relative path because it's below the
`cpp/test` directory. The fix is to add another `queries.xml` just for
the directory with the affected test.
This change fixes a few key problems with the existing SSA implementations:
For unaliased SSA, we were incorrectly choosing to model a local variable that had accesses that did not cover the entire variable. This has been changed to ensure that all accesses to the variable are at offset zero and have the same type as the variable itself. This was only possible to fix now that every `MemoryOperand` has its own type.
For aliased SSA, we now correctly track the offset and size of each memory access using an interval of bit offsets covered by the access. The offset interval makes the overlap computation more straightforward. Again, this is only possible now that operands have types.
The `getXXXMemoryAccess` predicates are now driven by the `MemoryAccessKind` on the operands and results, instead of by specific opcodes.
This change does fix an existing false negative in the IR dataflow tests.
I added a few simple test cases to the SSA IR tests, covering the various kinds of overlap (MustExcactly, MustTotally, and MayPartially).
I added "PrintSSA.qll", which can dump the SSA memory accesses as part of an IR dump.
For function parameters that are subject to "pointer decay", the database contains the type as originally declared (e.g. `T[]` instead of `T*`). The IR needs the actual type. Similarly, for variable declared as an array of unknown size, the actual size needs to be inferred from the initializer (e.g. `char a[] = "blah";` needs to have the type `char[5]`).
I've opened a ticket to have the extractor emit the actual type alongside the declared type, but for now, this workaround is enough to unblock progress for typical code.
The IR tests were getting kind of unwieldy. We were using "ir.cpp" to contain test cases that covered both IR construction (every language construct imaginable) and SSA construction. We would then build and dump all three flavors of IR. For IR construction tests, examining the SSA dumps when you add a new test case is tedious.
To make this easier to manage, I've split the SSA-specific test cases out into a separate directory. "ir.cpp" should now contain only IR construction test cases, and "ssa.cpp" should contain only SSA construction test cases. We dump just the raw IR for "ir.cpp", and just the two SSA flavors for "ssa.cpp". We still run all three flavors of the IR sanity tests for "ir.cpp", though.
I also removed the "ssa_block_count.ql" test, which wasn't really adding any coverage, because any change to the block count would be reflected in the dump as well.
This PR adds new predicates to `Declaration` and `Type` to get a fully-qualified canonical name for the element, suitable for debugging and dumps. It includes template parameters, cv qualifiers, function parameter and return types, and fully-qualified names for all symbols. These strings are too large to compute in productions queries, so they should be used only for dumps and debugging. Feel free to suggest better names for these predicates.
I've updated PrintAST and PrintIR to use these instead of `Function.getFullSignature()`. The biggest advantage of the new predicates is that they handle lambdas and local classes, which `getQualifiedName` and `getFullSignature` do not. This makes IR and AST dumps much more usable for real-world snapshots.
Along the way, I cleaned up some of our handling of `IntegralType` to use a single table for tracking the signed, unsigned, and canonical versions of each type. The canonical part is new, and was necessary for `getTypeIdentityString` so that `signed int` and `int` both appear as `int`.
These predicates currently take a pair of `IRBlock`s - as it stands, at
most one edge can exist from one `IRBlock` to a given other `IRBlock`.
We may need to revisit that assumption and create an `IREdge` IPA type
at some future date
There are a few IR APIs that we've found to be confusingly named. This PR renames them to be more consistent within the IR and with the AST API:
`Instruction.getFunction` -> `Instruction.getEnclosingFunction`: This was especially confusing when you'd call `FunctionAddressInstruction.getFunction` to get the function whose address was taken, and wound up with the enclosing function instead.
`Instruction.getXXXOperand` -> `Instruction.getXXX`. Now that `Operand` is an exposed type, we want a way to get a specific `Operand` of an `Instruction`, but more often we want to get the definition instruction of that operand. Now, the pattern is that `getXXXOperand` returns the `Operand`, and `getXXX` is equivalent to `getXXXOperand().getDefinitionInstruction()`.
`Operand.getInstruction` -> `Operand.getUseInstruction`: More consistent with the existing `Operand.getDefinitionInstruction` predicate.
These queries have no results on our test cases in the repo, but
`ambiguousSuccessors` has results on any large C++ code base, and
`unexplainedLoop` has results on Windows builds of ChakraCore.
With this change, the `IRDataflowTestCommon.qll` and
`DataflowTestCommon.qll` files use the same definitions of sources and
sinks. Since the IR data flow library is meant to be compatible with the
AST data flow library, this is what we ought to be testing.
Two alerts change but not necessarily for the right reasons.
As we prepare to clarify how conversions are treated, we don't want a
`sink(...)` declaration where it's non-obvious which conversions are
applied to arguments.
The CFG construction code previously contained half of an approximation
of which address expressions are constant. Now this this property is
properly modelled by `Expr.isConstant`, we can remove this code.
This fixes most discrepancies between the QL-based CFG and the
extractor-based CFG on Wireshark.
Before this change, `Expr.isConstant` only was only true for those
constant expressions that could be represented as QL values: numbers,
Booleans, and string literals. It was not true for string literals
converted from arrays to pointers, and it was not true for addresses of
variables with static lifetime.
The concept of a "constant expression" varies between C and C++ and
between versions of the standard, but they all include addresses of data
with static lifetime. These are modelled by the new library
`AddressConstantExpression.qll`, which is based on the code in
`EscapesTree.qll` and modified for its new purpose.
I've tested the change for performance on Wireshark and for correctness
with the included tests. I've also checked on Wireshark that all static
initializers in C files are considered constant, which was not the case
before.
This test was intended to catch regressions in the CFG, but it looks
like it's just catching insignificant extractor changes. The test has
started failing after some recent extractor changes, but I have no way
to pinpoint the failure and understand whether it's a problem or not, so
I think it's better to delete this test.
The remaining tests check whether the QL-based CFG generates the same
graph as the extractor-based CFG. Furthermore, the `successor-tests`
check that the extractor-based CFG works as intended.
This reduces the number of bounds computed, and will simplify use of the
library. The resulting locations in the tests may be slightly strange,
because the example `Instruction` for a `ValueNumber` is the first
appearing in the IR, regardless of source order, and may not be the most
closely related `Instruction` to the bounded value. I think that's worth
doing for the performance and usability benefits.