The IR data flow library now supports virtual dispatch with a library
that's similar to `security.TaintTracking`. In particular, it should
have the same performance characteristics. The main difference is that
non-recursive callers of `flowsFrom` now pass `_` instead of `true` for
`boolean allowFromArg`. This change allows flow through `return` to
actually work.
The aliased SSA code was assuming that, for every automatic variable, there would be at least one memory access that reads or writes the entire variable. We've encountered a couple cases where that isn't true due to extractor issues. As a workaround, we now always create the `VariableMemoryLocation` for every local variable.
I've also added a sanity test to detect this condition in the future.
Along the way, I had to fix a perf issue in the PrintIR code. When determining the ID of a result based on line number, we were considering all `Instruction`s generated for a particular line, regardless of whether they were all in the same `IRFunction`. In addition, the predicate had what appeared to be a bad join order that made it take forever on large snapshots. I've scoped it down to just consider `Instruction`s in the same function, and outlined that predicate to fix the join order issue. This causes some numbering changes, but they're for the better. I don't think there was actually any nondeterminism there before, but now the numbering won't depend on the number of instantiations of a template, either.
The `overlappingVariableMemoryLocations` predicate was a helper
predicate introduced to fix a join-order issue in
`overlappingIRVariableMemoryLocations`. Unfortunately it caused a
performance issue of its own because it could grow too large. On the
small project (38MB zip) awslabs/s2n there were 181M rows in
`overlappingVariableMemoryLocations`, and it took 134s to evaluate.
The fix is to collapse the two predicates into one and fix join ordering
by including an extra column in the predicates being joined.
In addition, some parameters were reordered to avoid the overhead of
auto-generated `join_rhs` predicates.
Tuple counts of `overlappingVariableMemoryLocations` before:
623285 ~176% {2} r1 = JOIN AliasedSSA::isCoveredOffset#fff_120#join_rhs AS L WITH AliasedSSA::isCoveredOffset#fff_120#join_rhs AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
119138 ~3% {2} r2 = SCAN AliasedSSA::VariableMemoryLocation::getVirtualVariable_dispred#ff AS I OUTPUT I.<1>, I.<0>
172192346 ~0% {2} r3 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
172815631 ~0% {2} r4 = r1 \/ r3
172192346 ~0% {2} r5 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT r2.<1>, R.<1>
345007977 ~87% {2} r6 = r4 \/ r5
return r6
Tuple counts of `overlappingIRVariableMemoryLocations` after:
117021 ~134% {2} r1 = JOIN AliasedSSA::isCoveredOffset#ffff AS L WITH AliasedSSA::isCoveredOffset#ffff AS R ON FIRST 3 OUTPUT L.<3>, R.<3>
201486 ~1% {2} r2 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
318507 ~26% {2} r3 = r1 \/ r2
201486 ~3% {2} r4 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT R.<2>, L.<2>
519993 ~92% {2} r5 = r3 \/ r4
return r5