Commit Graph

62 Commits

Author SHA1 Message Date
Dave Bartolomeo
cbb6797ca8 Merge from master and resolve conflicts 2019-12-04 10:14:52 -07:00
Dave Bartolomeo
50dc5e2ba3 Merge pull request #2438 from rdmarsh2/rdmarsh/ir-line-number-ids
C++/C#: use line numbers for instruction IDs
2019-12-03 18:48:28 -08:00
Dave Bartolomeo
acc3d23877 Clarify comment 2019-12-02 11:53:43 -08:00
Robert Marsh
e368d5dda0 C++: simplify getDisplayOrderInBlock 2019-11-26 16:02:30 -05:00
Dave Bartolomeo
f3b4140948 C++/C#: Consistent handling of "may" vs. "must" memory accesses
In the IR, some memory accesses are "must" accesses (the entire memory location is always read or written), and some are "may" accesses (some, all, or none of the bits in the location are written). We previously had to special case specific "may" accesses in a few places. This change regularizes our handling of "may" accesses.

The `MemoryAccessKind` enumeration now describes only the extent of the access (the set of locations potentially accessed), but does not distinguish "must" from "may". The new predicates `Operand.hasMayMemoryAccess()` and `Instruction.hasResultMayMemoryAccess()` hold when the access is a "may" access.

Unaliased SSA now correctly ignores variables that are ever accessed via a "may" access.

Aliased SSA now distinguishes `MemoryLocation`s for "may" and "must" accesses. I've refactored `getOverlap()` into the core `getExtentOverlap()`, which considers only the extent, but not the "may" vs. "must", and `getOverlap()`, which tweaks the result of `getExtentOverlap()` based on "may" vs. "must" and read-only locations.

When determining the overlap between a `Phi` operand and its definition, we now use the result of the defining `Chi` instruction, if one exists. This gives exact definitions for `Phi` operands for virtual variables.
2019-11-26 12:13:07 -07:00
Robert Marsh
60b384a6e5 C++/C#: use line numbers for instruction IDs
This should reduce the number of merge conflicts in the IR tests resulting
from instruction ID changes due to inserting or removing instructions
2019-11-25 18:27:59 -05:00
Dave Bartolomeo
51ff262cbc C++/C#: Add IR SSA sanity tests 2019-11-22 13:16:05 -07:00
Ian Lynagh
8e00516ecf C++: Accept changes in ir test 2019-11-15 14:42:36 +00:00
Robert Marsh
64b34ad975 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-constructor-side-effects 2019-11-08 14:06:36 -08:00
Robert Marsh
1dc0cb89d0 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-constructor-side-effects 2019-11-08 12:47:27 -08:00
Dave Bartolomeo
c365b2f2f0 Merge from master
Resolve conflicts in test output
2019-11-08 10:42:29 -07:00
Dave Bartolomeo
17f76c2516 C++: Fix merge conflicts 2019-11-07 22:02:15 -07:00
Robert Marsh
2582b69e17 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-constructor-side-effects 2019-11-07 15:46:08 -08:00
Robert Marsh
e93dcdb16c Merge branch 'master' into rdmarsh/cpp/ir-constructor-side-effects 2019-11-07 15:19:46 -08:00
Robert Marsh
f483ec152b Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/uninit-string-initializers 2019-11-07 14:36:58 -08:00
Robert Marsh
ae1377447e C++: only generate uninits when needed 2019-11-07 13:55:49 -08:00
Dave Bartolomeo
6c1d219c86 Merge from master 2019-11-07 14:50:04 -07:00
Robert Marsh
81ad11090e C++: uninit instr for string literal initializers 2019-11-06 13:37:03 -08:00
Robert Marsh
51c4ef4f7f C++: add SSA IR test for array initializers 2019-11-06 13:32:35 -08:00
Dave Bartolomeo
a9e3bfbd11 C++/C#: Treat string literals like read-only global variables for alias purposes.
Previously, we didn't track string literals as known memory locations at all, so they all just got marked as `UnknownMemoryLocation`, just like an aribtrary read from a random pointer. This led to some confusing def-use chains, where it would look like the contents of a string literal were being written to by the side effect of an earlier function call, which of course is impossible.

To fix this, I've made two changes. First, each string literal is now given a corresponding `IRVariable` (specifically `IRStringLiteral`), since a string literal behaves more or less as a read-only global variable. Second, the `IRVariable` for each string literal is now marked `isReadOnly()`, which the alias analysis uses to determine that an arbitrary write to aliased memory will not overwrite the contents of a string literal.

I originally planned to treat all string literals with the same value as being the same memory location, since this is the usual behavior of modern compilers. However, this made implementing `IRVariable.getAST()` tricky for string literals, so I left them unpooled.
2019-11-06 13:08:28 -07:00
Jonas Jensen
76a3db9eed Merge remote-tracking branch 'upstream/master' into ir-copy-unloaded-result 2019-11-06 15:21:22 +01:00
Robert Marsh
31f25c8cfc C++: primary instrs for constructor side effects 2019-10-31 11:43:47 -07:00
Robert Marsh
86b5e97f76 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-constructor-side-effects 2019-10-31 11:34:22 -07:00
Robert Marsh
9477bd5698 Merge branch 'master' of github.com:Semmle/ql into rdmarsh/cpp/ir-buffer-read-call-se 2019-10-31 11:00:01 -07:00
Dave Bartolomeo
cc5a689293 C++/C#: Fix up after merge from master 2019-10-25 14:11:34 -07:00
Jonas Jensen
22de0efc58 Merge pull request #2008 from dave-bartolomeo/dave/IRType2
C++: Implement language-neutral IR type system
2019-10-25 09:42:23 +02:00
Jonas Jensen
7a6ec83572 C++: No CopyValue for immediately discarded exprs
Expressions like the `e` in `e;` or `e, e2`, whose result is immediately
discarded, should not get a synthetic `CopyValue`. This removes a lot of
redundancy from the IR.

To prevent these expressions from being confused with the expressions
from which they get their result, the predicate
`getInstructionConvertedResultExpression` now suppresses results for
expressions that don't produce their own result. This should fix the
mapping between expressions and IR data-flow nodes.
2019-10-23 11:56:30 +02:00
Jonas Jensen
cbbe9b4718 Merge remote-tracking branch 'upstream/master' into ir-copy-unloaded-result
Fixed conflicts by accepting new qltest output.

Conflicts:
      cpp/ql/test/library-tests/ir/ir/raw_ir.expected
      cpp/ql/test/library-tests/ir/ssa/aliased_ssa_ir.expected
      cpp/ql/test/library-tests/ir/ssa/unaliased_ssa_ir.expected
      cpp/ql/test/library-tests/syntax-zoo/aliased_ssa_sanity.expected
      cpp/ql/test/library-tests/syntax-zoo/unaliased_ssa_sanity.expected
2019-10-23 08:46:39 +02:00
Dave Bartolomeo
7241c1aae6 C++/C#: More sanity checks for IRType 2019-10-21 14:22:46 -07:00
Dave Bartolomeo
71a6b5dffe C++/C#: Fix some duplicate IRType problems, and add a sanity test 2019-10-21 10:46:30 -07:00
Robert Marsh
b29f88450b C++: buffer read side effects on unmodeled funcs 2019-10-17 12:10:23 -07:00
Robert Marsh
30d7238921 C++: fix missing getPrimaryInstruction 2019-10-16 17:05:37 -07:00
Robert Marsh
fffe3c2432 C++: add sanity test for side effect primaries 2019-10-16 16:53:55 -07:00
Dave Bartolomeo
167d2289c4 Merge from master 2019-10-16 10:10:10 -07:00
Robert Marsh
057c634fe4 C++: fix identical chi node operands 2019-10-04 13:05:47 -07:00
Robert Marsh
17e14348d5 C++: sanity test for identical Chi node operands 2019-10-04 12:57:30 -07:00
Robert Marsh
3377f88494 C++: generate Chi nodes on total IndirectMayWrites 2019-10-04 11:59:22 -07:00
Robert Marsh
5f8a3054d1 C++: add UninitializedInstructions for direct init 2019-10-04 11:34:14 -07:00
Robert Marsh
a76c4d9b3b C++: index for constructor qualifier side effects 2019-10-03 12:39:32 -07:00
Robert Marsh
47b9c497fa C++: IR SSA tests for explicit constructor calls 2019-10-03 12:25:41 -07:00
Robert Marsh
a45a6e48f8 C++: remove side effect operands from non-reads 2019-09-30 12:00:55 -07:00
Robert Marsh
8649978a43 C++: add indexes for specific side effects 2019-09-30 12:00:53 -07:00
Robert Marsh
24574be007 C++: add SizedBuffer side effect instructions 2019-09-30 12:00:53 -07:00
Robert Marsh
3d562243e4 C++: add side effects for outparams 2019-09-30 12:00:52 -07:00
Dave Bartolomeo
300e580874 C++: Implement language-neutral IR type system
The C++ IR currently has a very clunky way of specifying the type of an IR entity (`Instruction`, `Operand`, `IRVariable`, etc.). There are three separate predicates: `getType()`, `isGLValue()`, and `getSize()`. All three are necessary, rather than just having a `getType()` predicate, because some IR entities have types that are not represented via an existing `Type` object in the AST. Examples include the type for an lvalue returned from a `VariableAddress` instruction, the type for an array slice being zero-initialized in a variable initializer, and several others. It is very easy for QL code to just check the `getType()` predicate, while forgetting to use `isGLValue()` to determine if that type is the actual type of the entity (the prvalue case) or the type referred to by a glvalue entity. Furthermore, the C++ type system creates potentially many different `Type` objects for the same underlying type (e.g. typedefs, using declarations, `const`/`volatile` qualifiers, etc.), making it more difficult to tell when two entities have semantically equivalent types.

In addition, other languages for which we want to enable the IR have somewhat different type systems. The various language type systems differ in their structure, although they tend to share the basic building blocks necessary for the IR.

To address all of the above problems, I've introduced a new class hierarchy, rooted at the class `IRType`, that represents a bare-bones type system that is independent of source language (at least across C/C++/C#/Java). A type's identity is based on its kind (signed integer, unsigned integer, floating-point, Boolean, blob, etc.), size and in the case of blob types, a "tag" to differentiate between different classes and structs. No distinction is made between, say `signed int` and plain `int`, or between different language integer types that have the same signedness and size (e.g. `unsigned int` vs. `wchar_t` on Linux). `IRType` is intended for use by language-agnostic IR-based analyses, including range analysis, dataflow, SSA construction, and alias analysis. The set of available `IRType`s is determined by predicate provided by the language library implementation (e.g. `hasSignedIntegerType(int byteSize)`.

In addition to `IRType`, each language now defines a type alias named `LanguageType`, representing the type of an IR entity in more language-specific terms. The only predicate requried on `LanguageType` is `getIRType()`, which returns the single `IRType` object for the language-neutral representation of that `LanguageType`. All other predicates on and subclasses of `LanguageType` are language-specific. There may be many instances of `LanguageType` that map to a given `IRType`, to allow for typedefs, etc.

Most of the changes are mechanical changes in the IR construction code, to return the correct type for each IR entity. SSA construction has also been updated to avoid dependencies on language-specific types.

I have not yet removed the original `getType()` predicates that just return `Type`. These can be removed once we move the remaining existing libraries to use `IRType`.

Test results are, by design, pretty much unchanged. Once case changed for inline asm, because the previously IR generation for it played a little fast and loose with the input/output expressions. The test case now includes both input and output variables. The generated IR for `Conditional_LValue` is now more correct, because we now have a way to represent an lvalue of an lvalue. `syntax-zoo` is still a hot mess. Most of the changed outputs are due to wobble from having multiple functions with the same name, but with a slightly different order of evaluation due to the type changes. Others are wobble from already-invalid IR. A couple non-wobbly places have improved slightly, though.

The C# part of this change is waiting for #2005 to be merged, since that has some of the necessary C# implementation.
2019-09-23 16:14:00 -07:00
Jonas Jensen
cd5f3b84a8 C++: Make sure there's a Instruction for each Expr
This change ensures that all `Expr`s (except parentheses) have a
`TranslatedExpr` with a `getResult` that's one of its own instructions,
not an instruction from one of its operands. This means that when we
translate back and forth between `Expr` and `Instruction`, like in
`DataFlow::exprNode`, we will not conflate `e` with `&e` or `... = e`.
2019-09-23 15:23:31 +02:00
Dave Bartolomeo
6370391dbd C++: Add sanity test for definitions that don't dominate their uses. 2019-08-01 15:01:42 -07:00
Robert Marsh
c195420ba1 C++: respond to PR comments 2019-07-11 11:00:52 -07:00
Robert Marsh
41e4d920e3 C++: alias and side effect info for pure functions 2019-07-08 12:26:58 -07:00
Robert Marsh
ea7602b571 C++: add test for Alias and SideEffect models 2019-07-08 11:41:46 -07:00