The aliased SSA code was assuming that, for every automatic variable, there would be at least one memory access that reads or writes the entire variable. We've encountered a couple cases where that isn't true due to extractor issues. As a workaround, we now always create the `VariableMemoryLocation` for every local variable.
I've also added a sanity test to detect this condition in the future.
Along the way, I had to fix a perf issue in the PrintIR code. When determining the ID of a result based on line number, we were considering all `Instruction`s generated for a particular line, regardless of whether they were all in the same `IRFunction`. In addition, the predicate had what appeared to be a bad join order that made it take forever on large snapshots. I've scoped it down to just consider `Instruction`s in the same function, and outlined that predicate to fix the join order issue. This causes some numbering changes, but they're for the better. I don't think there was actually any nondeterminism there before, but now the numbering won't depend on the number of instantiations of a template, either.
Previously, we had several predicates on `Instruction` and `Operand` whose values were determined solely by the opcode of the instruction. For large snapshots, this meant that we would populate large tables mapping each of the millions of `Instruction`s to the appropriate value, times three (once for each IR flavor).
This change moves all of these opcode properties onto `Opcode` itself, with inline wrapper predicates on `Instruction` and `Operand` where necessary. On smaller snapshots, like ChakraCore, performance is a wash, but this did speed up Wireshark by about 4%.
Even ignoring the modest performance benefit, having these properties defined on `Opcode` seems like a better organization than having them on `Instruction` and `Operand`.
Ternary conditionals `b ? x : y` mistakenly had taint-tracking steps from both
`b`, `x`, and `y` to the conditional expression itself. Flow from `b` was not
intented, and flow from `x` and `y` is already part of ordinary data flow.
On wireshark/wireshark, `isInCycle` ran into a low-memory loop on the
`aliased_ssa` stage. It shouldn't be necessary to detect cycles after
the `raw` stage, so this commit moves cycle detection into the
`Construction` modules and makes it a no-op in `SSAConstruction.qll`.
This commit adds type information to data flow paths, by mapping node types onto
the smaller set of GVN types, and implementing `ppReprType()`.
The effect is a mere change in `DataFlow::PathNode::toString()`; no type-based
pruning is done yet.