The `automaticVariableAddressEscapes` predicate got join-ordered badly
in its `unaliased_ssa` version. These are the tuple counts on Wireshark,
where one pipeline step is seen to have 716 million tuples:
```
[2019-03-02 11:29:41] (42s) Starting to evaluate predicate AliasAnalysis::automaticVariableAddressEscapes#2#f
[2019-03-02 11:30:06] (67s) Tuple counts:
353419 ~0% {1} r1 = JOIN project#Instruction::VariableAddressInstruction#class#2#ff WITH AliasAnalysis::resultEscapesNonReturn#2#f ON project#Instruction::VariableAddressInstruction#class#2#ff.<0>=AliasAnalysis::resultEscapesNonReturn#2#f.<0> OUTPUT FIELDS {AliasAnalysis::resultEscapesNonReturn#2#f.<0>}
353419 ~0% {2} r2 = JOIN r1 WITH IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext ON r1.<0>=IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<0> OUTPUT FIELDS {IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<1>,r1.<0>}
353419 ~0% {2} r3 = JOIN r2 WITH FunctionIR::FunctionIR::getFunction_dispred#3#ff ON r2.<0>=FunctionIR::FunctionIR::getFunction_dispred#3#ff.<0> OUTPUT FIELDS {FunctionIR::FunctionIR::getFunction_dispred#3#ff.<1>,r2.<1>}
716040298 ~0% {2} r4 = JOIN r3 WITH IRVariable::IRVariable#class#3#ff_10#join_rhs ON r3.<0>=IRVariable::IRVariable#class#3#ff_10#join_rhs.<0> OUTPUT FIELDS {IRVariable::IRVariable#class#3#ff_10#join_rhs.<1>,r3.<1>}
4480139 ~0% {2} r5 = JOIN r4 WITH IRVariable::IRAutomaticVariable#class#3#ff ON r4.<0>=IRVariable::IRAutomaticVariable#class#3#ff.<0> OUTPUT FIELDS {r4.<1>,r4.<0>}
66760 ~91% {1} r6 = JOIN r5 WITH Instruction::VariableInstruction::getVariable_dispred#2#ff ON r5.<0>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<0> AND r5.<1>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<1> OUTPUT FIELDS {r5.<1>}
return r6
[2019-03-02 11:30:06] (67s) >>> Relation AliasAnalysis::automaticVariableAddressEscapes#2#f: 35531 rows using 0 MB
```
The predicate contained a cyclic join, which is always hard to optimize.
I couldn't see a reason to join the `FunctionIR`, so I removed that
part. The predicate is now fast, and there are no changes in the tests.
These two elements weren't cached, which meant that local data flow was
recalculated in every query that used data flow. They are also cached in
the Java version of `DataFlowUtil.qll`.
This commits also adds a test that uses `getParameter`. The new tests
demonstrate that support for array-to-pointer decay works, but we get
data flow to the array rather than its contents.
This accessor may not be forward-compatible with an IR-based version,
and it's unclear whether it has any use. The `VariableAccess` remains in
the `TDefinitionByReferenceNode` constructor since it's used to
implement `getType`.
Before this change, all the cached predicates in `IRGuards.qll` were in
separate cached stages, resulting in recomputation of most of the
library for each stage. This change groups the cached predicates in two
cached classes. A better grouping may be possible, but this grouping was
easy to do and seems to solve the problem.
Before this change, the `IRGuards` library accounted for five cached
stages when using the `RangeAnalysis` library. After this change, it
only accounts for one.
This change fixes a few key problems with the existing SSA implementations:
For unaliased SSA, we were incorrectly choosing to model a local variable that had accesses that did not cover the entire variable. This has been changed to ensure that all accesses to the variable are at offset zero and have the same type as the variable itself. This was only possible to fix now that every `MemoryOperand` has its own type.
For aliased SSA, we now correctly track the offset and size of each memory access using an interval of bit offsets covered by the access. The offset interval makes the overlap computation more straightforward. Again, this is only possible now that operands have types.
The `getXXXMemoryAccess` predicates are now driven by the `MemoryAccessKind` on the operands and results, instead of by specific opcodes.
This change does fix an existing false negative in the IR dataflow tests.
I added a few simple test cases to the SSA IR tests, covering the various kinds of overlap (MustExcactly, MustTotally, and MayPartially).
I added "PrintSSA.qll", which can dump the SSA memory accesses as part of an IR dump.
For function parameters that are subject to "pointer decay", the database contains the type as originally declared (e.g. `T[]` instead of `T*`). The IR needs the actual type. Similarly, for variable declared as an array of unknown size, the actual size needs to be inferred from the initializer (e.g. `char a[] = "blah";` needs to have the type `char[5]`).
I've opened a ticket to have the extractor emit the actual type alongside the declared type, but for now, this workaround is enough to unblock progress for typical code.
This change does some shuffling to make the distinction between memory operands and register operands more clear in the IR API. First, any given type that extends `Operand` is now either always a `MemoryOperand` or always a `RegisterOperand`. This required getting rid of `CopySourceOperand`, which was used for both the `CopyValue` instruction (as a `RegisterOperand`) and for the `Load` instruction (as a `MemoryOperand`). `CopyValue` is now just a `UnaryInstruction`, `Store` has a `StoreValueOperand` (`RegisterOperand`), and all of the instructions that read a value from memory indirectly (`Load`, `ReturnValue`, and `ThrowValue`) all now have a `LoadOperand` (`MemoryOperand`).
There are no diffs in the IR output for this commit, but this change is required for a subsequent commit that will make each `MemoryOperand` have a `Type`, which in turn is needed to fix a critical bug in aliased SSA construction.