Commit Graph

500 Commits

Author SHA1 Message Date
Robert Marsh
07bc9ca26c C++: fix whitespace 2019-03-07 13:14:58 -08:00
Robert Marsh
ef836c39bb C++: respond to PR comments 2019-03-07 13:14:57 -08:00
Robert Marsh
7e30ce0c09 C++: add phi node support to escape analysis 2019-03-07 13:14:56 -08:00
Robert Marsh
97c11a5222 C++: points-to for argument-returning calls 2019-03-07 13:14:55 -08:00
Robert Marsh
878502f82e C++: remove duplicate logic 2019-03-07 13:14:52 -08:00
Jonas Jensen
794a8954cd C++: Simplify automaticVariableAddressEscapes
The `automaticVariableAddressEscapes` predicate got join-ordered badly
in its `unaliased_ssa` version. These are the tuple counts on Wireshark,
where one pipeline step is seen to have 716 million tuples:

```
[2019-03-02 11:29:41] (42s) Starting to evaluate predicate AliasAnalysis::automaticVariableAddressEscapes#2#f
[2019-03-02 11:30:06] (67s) Tuple counts:
                      353419    ~0%      {1} r1 = JOIN project#Instruction::VariableAddressInstruction#class#2#ff WITH AliasAnalysis::resultEscapesNonReturn#2#f ON project#Instruction::VariableAddressInstruction#class#2#ff.<0>=AliasAnalysis::resultEscapesNonReturn#2#f.<0> OUTPUT FIELDS {AliasAnalysis::resultEscapesNonReturn#2#f.<0>}
                      353419    ~0%      {2} r2 = JOIN r1 WITH IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext ON r1.<0>=IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<0> OUTPUT FIELDS {IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<1>,r1.<0>}
                      353419    ~0%      {2} r3 = JOIN r2 WITH FunctionIR::FunctionIR::getFunction_dispred#3#ff ON r2.<0>=FunctionIR::FunctionIR::getFunction_dispred#3#ff.<0> OUTPUT FIELDS {FunctionIR::FunctionIR::getFunction_dispred#3#ff.<1>,r2.<1>}
                      716040298 ~0%      {2} r4 = JOIN r3 WITH IRVariable::IRVariable#class#3#ff_10#join_rhs ON r3.<0>=IRVariable::IRVariable#class#3#ff_10#join_rhs.<0> OUTPUT FIELDS {IRVariable::IRVariable#class#3#ff_10#join_rhs.<1>,r3.<1>}
                      4480139   ~0%      {2} r5 = JOIN r4 WITH IRVariable::IRAutomaticVariable#class#3#ff ON r4.<0>=IRVariable::IRAutomaticVariable#class#3#ff.<0> OUTPUT FIELDS {r4.<1>,r4.<0>}
                      66760     ~91%     {1} r6 = JOIN r5 WITH Instruction::VariableInstruction::getVariable_dispred#2#ff ON r5.<0>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<0> AND r5.<1>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<1> OUTPUT FIELDS {r5.<1>}
                                         return r6
[2019-03-02 11:30:06] (67s)  >>> Relation AliasAnalysis::automaticVariableAddressEscapes#2#f: 35531 rows using 0 MB
```

The predicate contained a cyclic join, which is always hard to optimize.
I couldn't see a reason to join the `FunctionIR`, so I removed that
part. The predicate is now fast, and there are no changes in the tests.
2019-03-07 13:14:51 -08:00
Robert Marsh
a72cd23d1d C++: fix escape test failures 2019-03-07 13:14:51 -08:00
Robert Marsh
09321ee062 C++: refactor escape analysis for performance 2019-03-07 13:14:51 -08:00
Robert Marsh
6f76c13385 C++: fix unused variable warning 2019-03-07 13:14:50 -08:00
Robert Marsh
726f38c802 C++: refactor alias analysis for performance 2019-03-07 13:14:50 -08:00
Robert Marsh
c70bd285de C++: assume arguments to virtual functions escape 2019-03-07 13:14:49 -08:00
Robert Marsh
6089172554 C++: escape analysis for this parameters 2019-03-07 13:14:49 -08:00
Robert Marsh
466e110338 C++: add new interprocedural escape analysis 2019-03-07 13:14:48 -08:00
Max Schaefer
7f5e2630a1 Merge pull request #1032 from xiemaisi/master-for-merge
Merge master into rc/1.20
2019-03-04 21:23:51 +00:00
Robert Marsh
b8f8ed55e6 Merge pull request #1000 from jbj/dataflow-defbyref
C++: Support definition by reference in data flow library
2019-03-01 13:54:37 -08:00
Nick Rolfe
e6ddf7f48a Merge pull request #1012 from ian-semmle/constexpr
C++: Add Variable.isConstexpr()
2019-03-01 14:42:35 +00:00
Geoffrey White
28304e4fde Merge pull request #1005 from jbj/dataflow-Node-cached
C++: Cache TNode and localFlowStep
2019-02-28 17:43:14 +00:00
Ian Lynagh
a709a2d0f3 C++: Add Variable.isConstexpr() 2019-02-28 15:26:15 +00:00
Jonas Jensen
264301be66 C++: Cache TNode and localFlowStep
These two elements weren't cached, which meant that local data flow was
recalculated in every query that used data flow. They are also cached in
the Java version of `DataFlowUtil.qll`.
2019-02-28 11:41:51 +01:00
Jonas Jensen
8e6daafd7c C++: Add DefinitionByReferenceNode.getParameter
This commits also adds a test that uses `getParameter`. The new tests
demonstrate that support for array-to-pointer decay works, but we get
data flow to the array rather than its contents.
2019-02-28 09:39:51 +01:00
Jonas Jensen
2bc0a8d6fb C++: Remove getVariableAccess from def-by-ref node
This accessor may not be forward-compatible with an IR-based version,
and it's unclear whether it has any use. The `VariableAccess` remains in
the `TDefinitionByReferenceNode` constructor since it's used to
implement `getType`.
2019-02-28 09:38:40 +01:00
Jonas Jensen
7ff732d962 C++: Use OO dispatch for getType and getFunction 2019-02-28 08:23:24 +01:00
Jonas Jensen
972d00822c C++: Generalize std::move data flow 2019-02-27 15:53:00 +01:00
Jonas Jensen
80183464d9 C++: Define DefinitionByReferenceNode
This enables data flow through `memcpy` and similar functions modeled in
`semmle.code.cpp.model`.
2019-02-27 15:53:00 +01:00
Jonas Jensen
5647a1a658 C++: BlockVar value stops at def by ref (partial) 2019-02-27 15:05:53 +01:00
Dave Bartolomeo
84c7f195d6 Merge pull request #994 from geoffw0/msalloc
CPP: Add lots more allocation functions to Alloc.qll
2019-02-26 11:59:45 -08:00
Geoffrey White
e32042d69c CPP: Add support for Microsoft functions in Alloc.qll. 2019-02-26 17:11:37 +00:00
Jonas Jensen
f12dfda28f Merge pull request #985 from rdmarsh2/rdmarsh/ir-call-side-effect
C++: fix PrimaryInstruction for call side effects
2019-02-26 10:36:18 +01:00
Robert Marsh
af490a9b3e C++: fix PrimaryInstruction for call side effects 2019-02-25 11:41:40 -08:00
Ian Lynagh
4bd03d52f1 C++: Add constexpr support for functions 2019-02-25 12:48:48 +00:00
Jonas Jensen
a9f8a53dac Merge pull request #972 from geoffw0/rtl
CPP: Add support for the Rtl* functions in BufferAccess.ql
2019-02-25 13:07:05 +01:00
Dave Bartolomeo
70bccf85fc Merge pull request #970 from jbj/ir-block-count
C++: Use the cached getInstructionCount
2019-02-22 10:19:39 -08:00
Geoffrey White
dc0044288b CPP: Add support for some Rtl* functions in BufferAccess.qll. 2019-02-22 15:54:16 +00:00
Jonas Jensen
6777c8c13c C++: Use the cached getInstructionCount
The object-oriented `IRBlock` interface was recomputing instruction
counts instead of using the cached count that had already been computed.
2019-02-22 14:55:09 +01:00
Robert Marsh
aa97302671 make loads from tainted addresses tainted 2019-02-21 17:17:49 -08:00
Robert Marsh
9a9ec7bb17 C++: add IR-based taint tracking library 2019-02-21 17:09:09 -08:00
Robert Marsh
173ade1336 C++: add arithmetic/bitwise instruction classes 2019-02-21 17:09:08 -08:00
Jonas Jensen
d200bda2ad C++: Reduce the IRGuards to two cached stages
Before this change, all the cached predicates in `IRGuards.qll` were in
separate cached stages, resulting in recomputation of most of the
library for each stage. This change groups the cached predicates in two
cached classes. A better grouping may be possible, but this grouping was
easy to do and seems to solve the problem.

Before this change, the `IRGuards` library accounted for five cached
stages when using the `RangeAnalysis` library. After this change, it
only accounts for one.
2019-02-21 12:03:35 +01:00
Jonas Jensen
2dea0b4270 Merge pull request #879 from rdmarsh2/rdmarsh/cpp/ir-guards-edges
C++: Add edge-based predicates to IRGuards
2019-02-19 16:54:52 +01:00
Robert Marsh
26a0f4b100 Merge pull request #938 from dave-bartolomeo/dave/AliasedSSA
C++: Better tracking of SSA memory accesses
2019-02-14 08:10:31 -08:00
Anders Schack-Mulligen
980a690b8b CPP/Java: Sync Dataflow 2019-02-14 09:59:08 +01:00
Dave Bartolomeo
b40fd95b8e C++: Better tracking of SSA memory accesses
This change fixes a few key problems with the existing SSA implementations:

For unaliased SSA, we were incorrectly choosing to model a local variable that had accesses that did not cover the entire variable. This has been changed to ensure that all accesses to the variable are at offset zero and have the same type as the variable itself. This was only possible to fix now that every `MemoryOperand` has its own type.

For aliased SSA, we now correctly track the offset and size of each memory access using an interval of bit offsets covered by the access. The offset interval makes the overlap computation more straightforward. Again, this is only possible now that operands have types.
The `getXXXMemoryAccess` predicates are now driven by the `MemoryAccessKind` on the operands and results, instead of by specific opcodes.

This change does fix an existing false negative in the IR dataflow tests.

I added a few simple test cases to the SSA IR tests, covering the various kinds of overlap (MustExcactly, MustTotally, and MayPartially).

I added "PrintSSA.qll", which can dump the SSA memory accesses as part of an IR dump.
2019-02-13 10:44:39 -08:00
Dave Bartolomeo
055485d9eb C++: Work around lack of size for enum type 2019-02-13 10:44:39 -08:00
Dave Bartolomeo
aff2ea3316 C++: Handle pointer decay and inferred array sizes
For function parameters that are subject to "pointer decay", the database contains the type as originally declared (e.g. `T[]` instead of `T*`). The IR needs the actual type. Similarly, for variable declared as an array of unknown size, the actual size needs to be inferred from the initializer (e.g. `char a[] = "blah";` needs to have the type `char[5]`).

I've opened a ticket to have the extractor emit the actual type alongside the declared type, but for now, this workaround is enough to unblock progress for typical code.
2019-02-12 12:41:21 -08:00
Dave Bartolomeo
f5121d71bc C++: Fix range analysis for new API 2019-02-12 09:38:11 -08:00
Dave Bartolomeo
c224bbd767 C++: Fix Operand.getSize() 2019-02-11 17:48:59 -08:00
Dave Bartolomeo
bd46c43067 C++: Add sanity test for missing operand type 2019-02-11 09:47:00 -08:00
Dave Bartolomeo
a54d86423a C++: Add Operand.getType() 2019-02-11 09:47:00 -08:00
Dave Bartolomeo
fa2ef620ac C++: Rationalize RegisterOperand vs. MemoryOperand
This change does some shuffling to make the distinction between memory operands and register operands more clear in the IR API. First, any given type that extends `Operand` is now either always a `MemoryOperand` or always a `RegisterOperand`. This required getting rid of `CopySourceOperand`, which was used for both the `CopyValue` instruction (as a `RegisterOperand`) and for the `Load` instruction (as a `MemoryOperand`). `CopyValue` is now just a `UnaryInstruction`, `Store` has a `StoreValueOperand` (`RegisterOperand`), and all of the instructions that read a value from memory indirectly (`Load`, `ReturnValue`, and `ThrowValue`) all now have a `LoadOperand` (`MemoryOperand`).

There are no diffs in the IR output for this commit, but this change is required for a subsequent commit that will make each `MemoryOperand` have a `Type`, which in turn is needed to fix a critical bug in aliased SSA construction.
2019-02-11 09:47:00 -08:00
Dave Bartolomeo
283991d520 C++: Handle ProxyClass in getIdentityString() 2019-02-07 14:26:01 -08:00