When building SSA, we'll be assuming that stack variables do not escape, at least until we improve our alias analysis. I've added a new `IREscapeAnalysisConfiguration` class to allow the query to control this, and a new `UseSoundEscapeAnalysis.qll` module that can be imported to switch to the sound escape analysis. I've cloned the existing IR and SSA tests to have both sound and unsound versions. There were relatively few diffs in the IR dump tests, and the sanity tests still give the same results after one change described below.
Assuming that stack variables do not escape exposed an existing bug where we do not emit an `Uninitialized` instruction for the temporary variables used by `return` statements and `throw` expressions, even if the initializer is a constructor call or array initializer. I've refactored the code for handling elements that initialize a variable to share a common base class. I added a test case for returning an object initialized by constructor call, and ensured that the IR diffs for the existing `throw` test cases are correct.
The IR data flow library now supports virtual dispatch with a library
that's similar to `security.TaintTracking`. In particular, it should
have the same performance characteristics. The main difference is that
non-recursive callers of `flowsFrom` now pass `_` instead of `true` for
`boolean allowFromArg`. This change allows flow through `return` to
actually work.
The aliased SSA code was assuming that, for every automatic variable, there would be at least one memory access that reads or writes the entire variable. We've encountered a couple cases where that isn't true due to extractor issues. As a workaround, we now always create the `VariableMemoryLocation` for every local variable.
I've also added a sanity test to detect this condition in the future.
Along the way, I had to fix a perf issue in the PrintIR code. When determining the ID of a result based on line number, we were considering all `Instruction`s generated for a particular line, regardless of whether they were all in the same `IRFunction`. In addition, the predicate had what appeared to be a bad join order that made it take forever on large snapshots. I've scoped it down to just consider `Instruction`s in the same function, and outlined that predicate to fix the join order issue. This causes some numbering changes, but they're for the better. I don't think there was actually any nondeterminism there before, but now the numbering won't depend on the number of instantiations of a template, either.
Previously, we had several predicates on `Instruction` and `Operand` whose values were determined solely by the opcode of the instruction. For large snapshots, this meant that we would populate large tables mapping each of the millions of `Instruction`s to the appropriate value, times three (once for each IR flavor).
This change moves all of these opcode properties onto `Opcode` itself, with inline wrapper predicates on `Instruction` and `Operand` where necessary. On smaller snapshots, like ChakraCore, performance is a wash, but this did speed up Wireshark by about 4%.
Even ignoring the modest performance benefit, having these properties defined on `Opcode` seems like a better organization than having them on `Instruction` and `Operand`.
With the coming `codeql test` support, the `predefined_macros` file will not
necessarily be located under a `tools` directory. Change the test to hide more
of its actual path, so it will work in both cases.
In the IR, some memory accesses are "must" accesses (the entire memory location is always read or written), and some are "may" accesses (some, all, or none of the bits in the location are written). We previously had to special case specific "may" accesses in a few places. This change regularizes our handling of "may" accesses.
The `MemoryAccessKind` enumeration now describes only the extent of the access (the set of locations potentially accessed), but does not distinguish "must" from "may". The new predicates `Operand.hasMayMemoryAccess()` and `Instruction.hasResultMayMemoryAccess()` hold when the access is a "may" access.
Unaliased SSA now correctly ignores variables that are ever accessed via a "may" access.
Aliased SSA now distinguishes `MemoryLocation`s for "may" and "must" accesses. I've refactored `getOverlap()` into the core `getExtentOverlap()`, which considers only the extent, but not the "may" vs. "must", and `getOverlap()`, which tweaks the result of `getExtentOverlap()` based on "may" vs. "must" and read-only locations.
When determining the overlap between a `Phi` operand and its definition, we now use the result of the defining `Chi` instruction, if one exists. This gives exact definitions for `Phi` operands for virtual variables.