Made `Node::getType()`, `Node::asParameter()`, and `Node::asUninitialized()` operate directly on the IR. This actually fixed several diffs compared to the AST dataflow, because `getType()` wasn't holding for nodes that weren't `Exprs`.
Made `Uninitialized` a `VariableInstruction`. This makes it consistent with `InitializeParameter`.
This commit adds a new model interface that describes the known side effects (or lack thereof) of a library function. Does it read memory, does it write memory, and do any of its parameters escape? Initially, we have models for just two Standard Library functions: `std::move` and `std::forward`, which neither read nor write memory, and do not escape their parameter.
IR construction has been updated to insert the correct side effect instruction (or no side effect instruction) based on the model.
This fixes a subtle bug in the construction of aliased SSA. `getResultMemoryAccess` was failing to return a `MemoryAccess` for a store to a variable whose address escaped. This is because no `VirtualIRVariable` was being created for such variables. The code was assuming that any access to such a variable would be via `UnknownMemoryAccess`. The result is that accesses to such variables were not being modeled in SSA at all.
Instead, the way to handle this is to have a `VariableMemoryAccess` even when the variable being accessed has escaped, and to have `VariableMemoryAccess::getVirtualVariable()` return the `UnknownVirtualVariable` for escaped variables. In the future, this will also let us be less conservative about inserting `Chi` nodes, because we'll be able to determine that there's an exact overlap between two accesses to the same escaped variable in some cases.
The AST dataflow library essentially ignores conversions, which is probably the right behavior. Converting an `int` to a `long` preserves the value, even if the bit pattern might be different. It's arguable whether narrowing conversions should be treated as dataflow, but we'll do so for now. We can revisit that if we see it cause problems.
I noticed that queries using the data flow library spent significant
time in `#Dominance::bbIDominates#fbPlus`, which is the body of the
`bbStrictlyDominates` predicate. That predicate took 28 seconds to
compute on Wireshark.
The `b` in the predicate name means that magic was applied, and the
application of magic meant that it could not be evaluated with the
built-in `fastTC` HOP but became an explicit recursion instead. Applying
`pragma[nomagic]` to this predicate means that we will always get it
evaluated with `fastTC`, and that takes less than a second in my test
case.
This predicate was fast with the queries and engine from 1.18. With the
queries from `master` it got a bad join order in the
`UninitializedLocal.ql` query, which made it take 2m34s on Wireshark.
This commit decomposes `bbEntryReachesLocally` into two predicates that
together take only 4s.
The `nullValue` predicate performs a slow custom data-flow analysis to
find possible null values. It's so slow that it timed out after 1200s on
Wireshark.
In `UnsafeCreateProcessCall.ql`, the values found with `nullValue` were
used as sources in another data-flow analysis. By using the `NullValue`
class as sink instead of `nullValue`, we avoid the slow-down of doing
data flow twice. The `NullValue` class is essentially the base case of
`nullValue`. Confusing names, yes.
This commit adds Chi nodes to the successor relation and accounts for
them in the CFG, but does not add them to the SSA data graph. Chi nodes
are inserted for partial writes to any VirtualVariable, regardless of
whether the partial write reaches any uses.
This enables the addition of new instructions in later phases of IR
construction; in particular, aliasing write instructions and inference
instructions.