Before this change, evaluation of the IR was spread out across about 5 stages. This resulted in a lot of redundant evaluation, especially tuple numbering of large IPA types like `TInstruction`. This change makes two small changes that, when combined, ensure that the IR is evaluated all in one stage:
First, we mark `TInstruction` as `cached`. This collapses all of the work to create instructions, across all three IR phases, into a single phase.
Second, we make the `SSA` module in `SSAConstruction.qll` just contain aliases to `cached` predicates defined in the `Cached` module. This ensures that all of the `Operand`-related SSA computation happens in the same stage as all of the `Instruction`-related SSA computation.
This updates C#'s IR to share `TInstruction` across stages the same way C++ does. The only interesting part is that, since we have not yet ported full alias analysis to C#, I stubbed out the required parts of the aliased SSA interface in `AliasedSSAStub.qll`.
Making these part of the IPA object identity changes the failure mode for cases where we assign multiple result types to an instruction. Previously, we would just have one instruction with two result types, but now we'd have two instructions, which breaks things worse. This change goes back to how things were before, to avoid any new surprises on real-world code with invalid ASTs or IR.
The taint-tracking configuration in `ExposureOfPrivateInformation.ql`
overlaps with the XSS taint-tracking configuration, as witnessed by this import chain:
```
semmle.code.csharp.security.dataflow.ExposureOfPrivateInformation.qll imports
semmle.code.csharp.security.dataflow.flowsinks.ExternalLocationSink imports
semmle.code.csharp.security.dataflow.flowsinks.Remote imports
semmle.code.csharp.security.dataflow.XSS
```
(The same for `CleartextStorage.qll` and `LogForging.ql`.)
The fix is to use `TaintTracking2` for the XSS configuration.
Now that `TInstruction` is shared between IR stages, several of the per-stage IR construction predicates can now be moved into the `Raw` interface exposed only by the initial construction of IR from the ASTs. This also removed a couple predicates that were not used previously at all.
Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the `TInstruction` type for the next stage. This causes use to create roughly three times as many `TInstruction` objects as we actually need.
Now that IPA union types are supported in the compiler, we can share a single `TInstruction` IPA type across stages. We create a single `TInstruction` IPA type, with individual branches of this type for instructions created directly from the AST (`TRawInstruction`) and for instructions added by each stage of SSA construction (`T*PhiInstruction`, `T*ChiInstruction`, `T*UnreachedInstruction`). Each stage then defines a `TStageInstruction` type that is a union of all of the branches that can appear in that particular stage. The public `Instruction` class for each phase extends the `TStageInstruction` type for that stage.
The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces:
- The `Raw` module, exposed only by the original IR construction stage. This module identifies which functions have IR, which `TRawInstruction`s exist, and which `IRVariable`s exist.
- The `SSA` module, exposed only by the two SSA construction stages. This identifiers which `Phi`, `Chi`, and `Unreached` instructions exist.
- The global module, exposed by all three stages. This module has all of the predicates whose implementation is different for each stage, like gathering definitions of `MemoryOperand`s.
Similarly, there is now a single `TIRFunction` IPA type that is shared across all three stages. There is a single `IRFunctionBase` class that exposes the stage-indepdendent predicates; the `IRFunction` class for each stage extends `IRFunctionBase`.
Most of the other changes are largely mechanical.
While investigating a bug with `TInstruction` sharing, I discovered that we had a case where alias analysis could create two `VirtualVariable`s for the same `Allocation`. For an indirect parameter allocation, we were using the type of the pointer variable as the type of the indirect allocation, instead of just `Unknown`. If the `IRType` of the pointer variable was the same type as the type of at least one access to the indirect allocation, we'd create both an `EntireAllocationVirtualVariable` and a `VariableVirtualVariable` for the allocation.
I added a new consistency test to guard against this in the future. This also turned out to be the root cause of the one existing known consistency failure in the IR tests.
The previous changes made the optimizer choose a bad join order for the RHS of the antijoin in `addressOperandAllocationAndOffset`. Once again, `unique` to the rescue.