These queries have no results on our test cases in the repo, but
`ambiguousSuccessors` has results on any large C++ code base, and
`unexplainedLoop` has results on Windows builds of ChakraCore.
The existing unreachable IR removal code only retargeted an infeasible edge to an `Unreached` instruction if the successor of the edge was an unreachable block. This is too conservative, because it doesn't remove an infeasible edge that targets a block that is still reachable via other paths. The trivial example of this is `do { } while (false);`, where the back edge is infeasible, but the body block is still reachable from the loop entry.
This change retargets all infeasible edges to `Unreached` instructions, regardless of the reachability of the successor block.
This change removes any IR instructions that can be statically proven unreachable. To detect unreachable IR, we first run a simple constant value analysis on the IR. Then, any `ConditionalBranch` with a constant condition has the appropriate edge marked as "infeasible". We define a class `ReachableBlock` as any `IRBlock` with a path from the entry block of the function. SSA construction has been modified to operate only on `ReachableBlock` and `ReachableInstruction`, which ensures that only reachable IR gets translated into SSA form. For any infeasible edge where its predecessor block is reachable, we replace the original target of the branch with an `Unreached` instruction, which lets us preserve the invariant that all `ConditionalBranch` instructions have both a true and a false edge, and allows guard inference to still work.
The changes to `SSAConstruction.qll` are not as scary as they look. They are almost entirely a mechanical replacement of `OldIR::IRBlock` with `OldBlock`, which is just an alias for `ReachableBlock`.
Note that the `constant_func.ql` test can determine that the two new test functions always return 0.
Removing unreachable code helps get rid of some common FPs in IR-based dataflow analysis, especially for constructs like `while(true)`.
This change moves the simple constant analysis that was used by the const_func test into a pyrameterized module for use on any stage of the IR. This will be used to detect unreachable code.
Made `Node::getType()`, `Node::asParameter()`, and `Node::asUninitialized()` operate directly on the IR. This actually fixed several diffs compared to the AST dataflow, because `getType()` wasn't holding for nodes that weren't `Exprs`.
Made `Uninitialized` a `VariableInstruction`. This makes it consistent with `InitializeParameter`.
This fixes a subtle bug in the construction of aliased SSA. `getResultMemoryAccess` was failing to return a `MemoryAccess` for a store to a variable whose address escaped. This is because no `VirtualIRVariable` was being created for such variables. The code was assuming that any access to such a variable would be via `UnknownMemoryAccess`. The result is that accesses to such variables were not being modeled in SSA at all.
Instead, the way to handle this is to have a `VariableMemoryAccess` even when the variable being accessed has escaped, and to have `VariableMemoryAccess::getVirtualVariable()` return the `UnknownVirtualVariable` for escaped variables. In the future, this will also let us be less conservative about inserting `Chi` nodes, because we'll be able to determine that there's an exact overlap between two accesses to the same escaped variable in some cases.
This commit adds Chi nodes to the successor relation and accounts for
them in the CFG, but does not add them to the SSA data graph. Chi nodes
are inserted for partial writes to any VirtualVariable, regardless of
whether the partial write reaches any uses.
This test was failing due to a semantic merge conflict between #509,
which added `UninitializedInstruction`, and #517, which added new test
code that would get `UninitializedInstruction`s in it after merging with #509.
This commit inserts an `Uninitialized` instruction to "initialize" a local variable when that variable is initialized with an initializer list. This ensures that there is always a definition of the whole variable before any read or write to part of that variable.
This change appears in a different form in @rdmarsh2's Chi node PR, but I needed to refactor the initialization code anyway to handle ConditionDeclExpr.
@rdmarsh2 has been working on various queries and libraries on top of the IR, and has pointed out that having to always refer to an operand of an instruction by the pair of (instruction, operandTag) makes using the IR a bit clunky. This PR adds a new `Operand` IPA type that represents an operand of an instruction. `OperandTag` still exists, but is now an internal type used only in the IR implementation.
It turns out that when building aliased SSA IR, we were still keeping around the Phi instructions from unaliased SSA IR. These leftover instructions didn't show up in dumps because they were not assigned to a block. However, when dumping additional instruction properties, they would show up as a top-level node in the dump, without a label.
Moved IR flavors into "implementation", with internal files under "implementation/internal". Made `IRBlockConstruction` just a nested module of `IRConstruction`/`SSAConstruction`, so it gets picked up from the `Construction` parameter of the `IR` module, rather than being picked up just from being in the same directory as `IRBlock`.
This change makes the public IR.qll module resolve to the flavor of the IR that we want queries to use. Today, this is the aliased SSA flavor of the IR. Should we add additional IR iterations in the future, we'll update IR.qll to resolve to whichever one we consider the default.
I moved the PrintIR.ql and IRSanity.ql queries into the internal directories of the corresponding flavors. There's still a PrintIR.ql and an IRSanity.ql in the public IR directory, which use the same IR flavor as the public IR.qll.
There are no real code changes here, other than to fix up `import`s. All tests still hae the same output, as expected.
A future commit will hide the IR flavors other than the one we want queries to use directly.
Now that opcodes are in their own module that isn't imported into the global namespace, `Opcode::Call` no longer conflicts with `Call` from the ASTs. I've renamed `Opcode::Invoke` to `Opcode::Call`.
PrintAST.ql orders the functions by location, then in lexicographical order of the function signature. This is supposed to ensure a stable ordering, but functions without a location were not getting assigned an order at all.
I introduced some unnecessary base classes in the `TranslatedExpr` hierarchy with a previous commit. This commit refactors the hierarchy a bit to align with the following high-level description:
`TranslatedExpr` represents a translated piece of an `Expr`. Each `Expr` has exactly one `TranslatedCoreExpr`, which produces the result of that `Expr` ignoring any lvalue-to-rvalue conversion on its result. If an lvalue-to-rvalue converison is present, there is an additional `TranslatedLoad` for that `Expr` to do the conversion. For higher-level `Expr`s like `NewExpr`, there can also be additional `TranslatedExpr`s to represent the sub-operations within the overall `Expr`, such as the allocator call.
These expressions are a little trickier than most because they include an implicit call to an allocator function. The database tells us which function to call, but we have to synthesize the allocation size and alignment arguments ourselves. The alignment argument, if it exists, is always a constant, but the size argument requires multiplication by the element count for most `NewArrayExpr`s. I introduced the new `TranslatedAllocationSize` class to handle this.
The IR avoids having non-trivially-copyable and non-trivially-assignable types in register results, because objects of those types need to exist at a particular memory location. The `InitializeParameter` and `Uninitialized` instructions were violating this restriction because they returned register results, which were then stored into the destination location via a `Store`.
This change makes those two instructions take the destination address as an operand, and return a memory result representing the (un-)initialized memory, removing the need for a separate `Store` instruction.
Casts to `void` did not have a semantic conversion type in the AST, so they also weren't getting generated correctly in the IR. I've added a `VoidConversion` class to the AST, along with tests. I've also added IR translation for such conversions, using a new `ConvertToVoid` opcode. I'm not sure if it's really necessary to generate an instruction to represent this, but it may be useful for detecting values that are explicitly unused (e.g. return value from a call).
I added two new sanity queries for the IR to detect the following:
- IR blocks with no successors, which usually indicates bad IR translation
- Phi instruction without an operand for one of the predecessor blocks.
These sanity queries found another subtle IR translation bug. If an expression that is normally translated as a condition (e.g. `&&`, `||`, or parens in certain contexts) has a constant value, we were not creating a `TranslatedExpr` for the expression at all. I changed it to always treat a constant condition as a non-condition expression.