Commit Graph

146 Commits

Author SHA1 Message Date
Robert Marsh
ba8ebe9f3a C++: accept test changes 2020-03-10 11:41:59 -07:00
Robert Marsh
bba6b23019 Merge branch 'master' into rdmarsh/cpp/ir-flow-through-outparams 2020-03-10 11:12:19 -07:00
Robert Marsh
95a762c987 Merge master for submodule update 2020-02-26 13:44:26 -08:00
Robert Marsh
4333fe7905 Merge branch 'master' into rdmarsh/cpp/ir-flow-through-outparams 2020-02-26 13:15:27 -08:00
Jonas Jensen
2d9df70abc Merge pull request #2887 from MathiasVP/fix-ir-gen-switch
C++: Fix IR generation for switch statements
2020-02-24 13:29:27 +01:00
Mathias Vorreiter Pedersen
af364e66fc C++/C#: Move sanity check inside InstructionSanity module and accept tests 2020-02-23 20:53:49 +01:00
Geoffrey White
89bbb975f9 C++: Effects on tests. 2020-02-19 14:52:49 +00:00
Robert Marsh
7f69cdfb56 C++: Dynamic allocations in IR alias analysis 2020-02-07 16:37:36 -08:00
Robert Marsh
05c8610bbc C++: tests for alias analysis of malloc 2020-02-07 16:35:58 -08:00
Robert Marsh
2d3a742b7f C++: autoformat and accept test changes 2020-02-06 13:41:00 -08:00
Jonas Jensen
91927c9039 Merge remote-tracking branch 'upstream/master' into ir-crement-load
Conflicts:
	cpp/ql/test/library-tests/ir/ssa/aliased_ssa_ir.expected
	cpp/ql/test/library-tests/ir/ssa/aliased_ssa_ir_unsound.expected
	cpp/ql/test/library-tests/ir/ssa/unaliased_ssa_ir.expected
	cpp/ql/test/library-tests/ir/ssa/unaliased_ssa_ir_unsound.expected
2020-02-06 08:37:09 +01:00
Jonas Jensen
cdfcee3ae9 Merge remote-tracking branch 'upstream/master' into ir-crement-load
Conflicts:
	cpp/ql/test/library-tests/ir/ssa/aliased_ssa_ir.expected
	cpp/ql/test/library-tests/ir/ssa/aliased_ssa_ir_unsound.expected
2020-02-05 16:13:21 +01:00
Dave Bartolomeo
73ad2e9658 Merge from master 2020-02-04 18:33:10 -07:00
Dave Bartolomeo
a23d5afc6c C++: Add test case to demonstrate string literl aliasing change
Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple existing test functions in `ssa.cpp`.
2020-02-04 18:24:08 -07:00
Robert Marsh
3e2b0328b7 C++: update test expectations post-merge 2020-01-31 11:48:51 -08:00
Robert Marsh
2dd368fd1f C++: add SSA test for void* buffer parameters 2020-01-31 11:31:28 -08:00
Dave Bartolomeo
e27a0fe504 C++: Prevent AliasedVirtualVariable from overlapping string literals
We were hitting a combinatorial explosion in `hasDefinitionAtRank` for functions that contain a large number of string literals. The problem was that every `Chi` instruction for `AliasedVirtualVariable` was treated as a definition of every string literal. We already mark string literals as `isReadOnly()`, but we were allowing `AliasedVirtualVariable` to define read-only locations so that the `AliasedDefinition` instruction would provide the initial definition for all string literals.

To fix this, I've introduced the new `InitializeNonLocal` instruction, which is inserted in the prologue of every function right after `AliasedDefinition`. It provides the initial definition for every non-stack memory location, including read-only locations, but is never written to anywhere else. It is the conterpart of the `AliasedUse` instruction in the function epilogue, which represents the use of all non-stack memory after the function returns. I considered renaming `AliasedUse` to `ReturnNonLocal`, to match the `InitializeXXX`/`ReturnXXX` pattern we already use for parameters and indirections, but held off to avoid unnecessary churn. Any thoughts on whether I should make this name change?

This change has a significant speedup in evaluation time for a few of our troublesome databases:
`attnam/ivan`: 13%
`awslabs/s2n`: 26%
`SinaMostafanejad/OpenRDM`: 7%
`zcoinofficial/zcoin`: 8%
2020-01-31 11:33:46 -07:00
Jonas Jensen
4a77f2b53c Merge remote-tracking branch 'upstream/master' into ir-crement-load
Update test output to fix semantic merge conflict.
2020-01-29 15:56:05 +01:00
Dave Bartolomeo
dda32359fa C++: Accept IR dump test results changes due to new alias analysis 2020-01-28 10:58:05 -07:00
Dave Bartolomeo
7df3cf4c23 C++: Accept more test output after merge 2020-01-27 13:48:43 -07:00
Dave Bartolomeo
3b3502060b Merge remote-tracking branch 'upstream/master' into dbartol/NoEscape 2020-01-27 13:29:18 -07:00
Jonas Jensen
fb6ad5274f C++: Accept test changes 2020-01-24 22:28:20 +01:00
Jonas Jensen
c5950d2c9d C++: IR: Result of x in x++ is now the Load
Previously, the `Load` would be associated with the `CrementOperation`
rather than its operand, which gave surprising results when mapping
taint sinks back to `Expr`.

The changes in `raw_ir.expected` are to add `Copy` operations on the
`x++` in code like `y = x++`. This is now needed because the result that
`x++` would otherwise have (the Load) no longer belongs to the `++`
expression. Copies are inserted to ensure that all expressions are
associated with an `Instruction` result.

The changes in `*aliased_ssa_ir.expected` appear to be just wobble.
2020-01-24 09:02:50 +01:00
Dave Bartolomeo
9d35ff73c4 C++/C#: Make escape analysis unsound by default
When building SSA, we'll be assuming that stack variables do not escape, at least until we improve our alias analysis. I've added a new `IREscapeAnalysisConfiguration` class to allow the query to control this, and a new `UseSoundEscapeAnalysis.qll` module that can be imported to switch to the sound escape analysis. I've cloned the existing IR and SSA tests to have both sound and unsound versions. There were relatively few diffs in the IR dump tests, and the sanity tests still give the same results after one change described below.

Assuming that stack variables do not escape exposed an existing bug where we do not emit an `Uninitialized` instruction for the temporary variables used by `return` statements and `throw` expressions, even if the initializer is a constructor call or array initializer. I've refactored the code for handling elements that initialize a variable to share a common base class. I added a test case for returning an object initialized by constructor call, and ensured that the IR diffs for the existing `throw` test cases are correct.
2020-01-22 00:15:30 -07:00
Dave Bartolomeo
e60f902c36 C++/C#: Fix missing virtual variables
The aliased SSA code was assuming that, for every automatic variable, there would be at least one memory access that reads or writes the entire variable. We've encountered a couple cases where that isn't true due to extractor issues. As a workaround, we now always create the `VariableMemoryLocation` for every local variable.

I've also added a sanity test to detect this condition in the future.

Along the way, I had to fix a perf issue in the PrintIR code. When determining the ID of a result based on line number, we were considering all `Instruction`s generated for a particular line, regardless of whether they were all in the same `IRFunction`. In addition, the predicate had what appeared to be a bad join order that made it take forever on large snapshots. I've scoped it down to just consider `Instruction`s in the same function, and outlined that predicate to fix the join order issue. This causes some numbering changes, but they're for the better. I don't think there was actually any nondeterminism there before, but now the numbering won't depend on the number of instantiations of a template, either.
2020-01-14 17:57:15 -07:00
Dave Bartolomeo
9df37399f8 C++: Consolidate opcode properties onto Opcode class
Previously, we had several predicates on `Instruction` and `Operand` whose values were determined solely by the opcode of the instruction. For large snapshots, this meant that we would populate large tables mapping each of the millions of `Instruction`s to the appropriate value, times three (once for each IR flavor).

This change moves all of these opcode properties onto `Opcode` itself, with inline wrapper predicates on `Instruction` and `Operand` where necessary. On smaller snapshots, like ChakraCore, performance is a wash, but this did speed up Wireshark by about 4%.

Even ignoring the modest performance benefit, having these properties defined on `Opcode` seems like a better organization than having them on `Instruction` and `Operand`.
2020-01-07 13:17:27 -07:00
Robert Marsh
e209ed961a Merge branch 'master' into rdmarsh/cpp/ir-callee-side-effects 2019-12-17 15:11:02 -08:00
Dave Bartolomeo
cbb6797ca8 Merge from master and resolve conflicts 2019-12-04 10:14:52 -07:00
Dave Bartolomeo
50dc5e2ba3 Merge pull request #2438 from rdmarsh2/rdmarsh/ir-line-number-ids
C++/C#: use line numbers for instruction IDs
2019-12-03 18:48:28 -08:00
Dave Bartolomeo
acc3d23877 Clarify comment 2019-12-02 11:53:43 -08:00
Robert Marsh
e368d5dda0 C++: simplify getDisplayOrderInBlock 2019-11-26 16:02:30 -05:00
Dave Bartolomeo
f3b4140948 C++/C#: Consistent handling of "may" vs. "must" memory accesses
In the IR, some memory accesses are "must" accesses (the entire memory location is always read or written), and some are "may" accesses (some, all, or none of the bits in the location are written). We previously had to special case specific "may" accesses in a few places. This change regularizes our handling of "may" accesses.

The `MemoryAccessKind` enumeration now describes only the extent of the access (the set of locations potentially accessed), but does not distinguish "must" from "may". The new predicates `Operand.hasMayMemoryAccess()` and `Instruction.hasResultMayMemoryAccess()` hold when the access is a "may" access.

Unaliased SSA now correctly ignores variables that are ever accessed via a "may" access.

Aliased SSA now distinguishes `MemoryLocation`s for "may" and "must" accesses. I've refactored `getOverlap()` into the core `getExtentOverlap()`, which considers only the extent, but not the "may" vs. "must", and `getOverlap()`, which tweaks the result of `getExtentOverlap()` based on "may" vs. "must" and read-only locations.

When determining the overlap between a `Phi` operand and its definition, we now use the result of the defining `Chi` instruction, if one exists. This gives exact definitions for `Phi` operands for virtual variables.
2019-11-26 12:13:07 -07:00
Robert Marsh
60b384a6e5 C++/C#: use line numbers for instruction IDs
This should reduce the number of merge conflicts in the IR tests resulting
from instruction ID changes due to inserting or removing instructions
2019-11-25 18:27:59 -05:00
Dave Bartolomeo
51ff262cbc C++/C#: Add IR SSA sanity tests 2019-11-22 13:16:05 -07:00
Robert Marsh
34593701b2 Merge branch 'master' into rdmarsh/cpp/ir-callee-side-effects 2019-11-20 10:03:32 -08:00
Ian Lynagh
8e00516ecf C++: Accept changes in ir test 2019-11-15 14:42:36 +00:00
Robert Marsh
facbd32062 Merge branch 'master' into rdmarsh/cpp/ir-callee-side-effects 2019-11-14 11:09:13 -08:00
Robert Marsh
2fb1d4d1b1 C++: fix IR return block successors 2019-11-14 10:29:48 -08:00
Robert Marsh
64b34ad975 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-constructor-side-effects 2019-11-08 14:06:36 -08:00
Robert Marsh
1dc0cb89d0 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-constructor-side-effects 2019-11-08 12:47:27 -08:00
Dave Bartolomeo
c365b2f2f0 Merge from master
Resolve conflicts in test output
2019-11-08 10:42:29 -07:00
Dave Bartolomeo
17f76c2516 C++: Fix merge conflicts 2019-11-07 22:02:15 -07:00
Robert Marsh
2582b69e17 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-constructor-side-effects 2019-11-07 15:46:08 -08:00
Robert Marsh
e93dcdb16c Merge branch 'master' into rdmarsh/cpp/ir-constructor-side-effects 2019-11-07 15:19:46 -08:00
Robert Marsh
f483ec152b Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/uninit-string-initializers 2019-11-07 14:36:58 -08:00
Robert Marsh
ae1377447e C++: only generate uninits when needed 2019-11-07 13:55:49 -08:00
Dave Bartolomeo
6c1d219c86 Merge from master 2019-11-07 14:50:04 -07:00
Robert Marsh
81ad11090e C++: uninit instr for string literal initializers 2019-11-06 13:37:03 -08:00
Robert Marsh
51c4ef4f7f C++: add SSA IR test for array initializers 2019-11-06 13:32:35 -08:00
Dave Bartolomeo
a9e3bfbd11 C++/C#: Treat string literals like read-only global variables for alias purposes.
Previously, we didn't track string literals as known memory locations at all, so they all just got marked as `UnknownMemoryLocation`, just like an aribtrary read from a random pointer. This led to some confusing def-use chains, where it would look like the contents of a string literal were being written to by the side effect of an earlier function call, which of course is impossible.

To fix this, I've made two changes. First, each string literal is now given a corresponding `IRVariable` (specifically `IRStringLiteral`), since a string literal behaves more or less as a read-only global variable. Second, the `IRVariable` for each string literal is now marked `isReadOnly()`, which the alias analysis uses to determine that an arbitrary write to aliased memory will not overwrite the contents of a string literal.

I originally planned to treat all string literals with the same value as being the same memory location, since this is the usual behavior of modern compilers. However, this made implementing `IRVariable.getAST()` tricky for string literals, so I left them unpooled.
2019-11-06 13:08:28 -07:00