4510 Commits

Author SHA1 Message Date
Jonas Jensen
cdfcee3ae9 Merge remote-tracking branch 'upstream/master' into ir-crement-load
Conflicts:
	cpp/ql/test/library-tests/ir/ssa/aliased_ssa_ir.expected
	cpp/ql/test/library-tests/ir/ssa/aliased_ssa_ir_unsound.expected
2020-02-05 16:13:21 +01:00
Anders Schack-Mulligen
07482abed7 Java/C++/C#: Sync. 2020-02-05 15:17:20 +01:00
Jonas Jensen
2928f9e5b2 Merge pull request #2703 from rdmarsh2/connect-ir-dataflow-models
C++: IR dataflow through modeled functions
2020-02-05 11:28:48 +01:00
Dave Bartolomeo
73ad2e9658 Merge from master 2020-02-04 18:33:10 -07:00
Dave Bartolomeo
a23d5afc6c C++: Add test case to demonstrate string literal aliasing change
Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple existing test functions in `ssa.cpp`.
2020-02-04 18:24:08 -07:00
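The total-versus-partial distinction in commit a23d5afc6c can be pictured with a small sketch. It rests on an assumption: memory locations are modeled here as half-open byte ranges, which is not how the IR library represents them, and the names `Overlap`, `ByteRange`, and `classify` are hypothetical. A definition that covers every byte of a later access overlaps it totally; one that covers only some of those bytes overlaps it partially.

```cpp
#include <cassert>

// Hypothetical model for illustration only; not the IR library's types.
enum class Overlap { None, Partial, Total };

// A half-open byte range [begin, end).
struct ByteRange {
    long begin;
    long end;
};

// Classify how a definition of `def` overlaps a later access to `use`.
Overlap classify(ByteRange def, ByteRange use) {
    assert(def.begin <= def.end && use.begin <= use.end);
    // Disjoint ranges: the definition does not affect the access.
    if (def.end <= use.begin || use.end <= def.begin) return Overlap::None;
    // The definition covers every byte of the access.
    if (def.begin <= use.begin && use.end <= def.end) return Overlap::Total;
    // The definition covers only some bytes of the access.
    return Overlap::Partial;
}
```

Under this model, a location standing for all non-local memory would be a range containing every non-local variable, so `classify` would report `Overlap::Total` for any such variable, which matches the behavior the fix describes.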
Robert Marsh
1576bcfa3f C++: remove unused predicates 2020-02-04 12:08:03 -08:00
Tom Hvitved
6e14ba4e56 C++: Follow-up changes 2020-02-04 14:09:12 +01:00
Tom Hvitved
c591719df2 Data flow: Sync files 2020-02-04 14:09:12 +01:00
Mathias Vorreiter Pedersen
0276c97b9c Merge pull request #2755 from jbj/BarrierGuard-SSA
C++: Don't use GVN in AST DataFlow BarrierNode
2020-02-04 12:00:12 +01:00
Jonas Jensen
b4385c6e60 C++: Don't use GVN in AST DataFlow BarrierNode
It turns out that the evaluator will evaluate the GVN stage even when no
predicate from it is needed after optimization of the subsequent stages.
The GVN library is expensive to evaluate, and it'll become even more
expensive when we switch its implementation to IR.

This PR disables the use of GVN in `DataFlow::BarrierNode` for the AST
data-flow library, which should improve performance when evaluating a
single data-flow query on a snapshot with no cache. Precision decreases
slightly, leading to a new FP in the qltests.

There is no corresponding change for the IR data-flow library since IR
GVN is not very expensive.
2020-02-04 08:40:36 +01:00
Robert Marsh
677f0f090a Merge branch 'master' into rdmarsh/cpp/ir-flow-through-outparams 2020-02-03 13:06:35 -08:00
Robert Marsh
931c0e982e Merge pull request #2748 from MathiasVP/value-numbering-indirection
C++: Indirection for ValueNumbering
2020-02-03 14:41:58 -05:00
Robert Marsh
f51841ac37 Merge pull request #2736 from jbj/buffer-type-size
C++: Workaround for problem with memcpy flow
2020-02-03 14:31:28 -05:00
Robert Marsh
3bfcf0bf46 Merge branch 'master' into connect-ir-dataflow-models 2020-02-03 11:06:45 -08:00
Cornelius Riemenschneider
36479d3fd6 Support to keep bounds derived on implicit integer casts. 2020-02-03 17:33:06 +01:00
Robert Marsh
2b10cd6228 Merge pull request #2737 from jbj/DefaultTaintTracking-indirect-parameters
C++: Interprocedural indirections in DefaultTaintTracking.qll
2020-02-03 11:12:38 -05:00
Mathias Vorreiter Pedersen
8aae2990d0 C++: Formatting 2020-02-03 16:15:49 +01:00
Mathias Vorreiter Pedersen
a8b3bcb87d C++: Indirection for value numbering 2020-02-03 16:13:32 +01:00
Cornelius Riemenschneider
1b68f86d5b Fix bug in CPP range analysis. 2020-02-03 14:16:48 +01:00
Jonas Jensen
e2da98ae24 C++: Accept autoformat and test changes 2020-01-31 20:58:53 +01:00
Dave Bartolomeo
e27a0fe504 C++: Prevent AliasedVirtualVariable from overlapping string literals
We were hitting a combinatorial explosion in `hasDefinitionAtRank` for functions that contain a large number of string literals. The problem was that every `Chi` instruction for `AliasedVirtualVariable` was treated as a definition of every string literal. We already mark string literals as `isReadOnly()`, but we were allowing `AliasedVirtualVariable` to define read-only locations so that the `AliasedDefinition` instruction would provide the initial definition for all string literals.

To fix this, I've introduced the new `InitializeNonLocal` instruction, which is inserted in the prologue of every function right after `AliasedDefinition`. It provides the initial definition for every non-stack memory location, including read-only locations, but is never written to anywhere else. It is the counterpart of the `AliasedUse` instruction in the function epilogue, which represents the use of all non-stack memory after the function returns. I considered renaming `AliasedUse` to `ReturnNonLocal`, to match the `InitializeXXX`/`ReturnXXX` pattern we already use for parameters and indirections, but held off to avoid unnecessary churn. Any thoughts on whether I should make this name change?

This change significantly speeds up evaluation for a few of our troublesome databases:
`attnam/ivan`: 13%
`awslabs/s2n`: 26%
`SinaMostafanejad/OpenRDM`: 7%
`zcoinofficial/zcoin`: 8%
2020-01-31 11:33:46 -07:00
Jonas Jensen
83f807f182 C++: Interprocedural indirection taint tracking
As a temporary workaround in the `DefaultTaintTracking` library, we
funnel flow across calls by conflating pointer and object both at the
caller and the callee.

The three cases in `adjustedSink` were deleted because they are now
covered by the one case for `ReadSideEffectInstruction` in
`instructionTaintStep`.

When enabling `DefaultTaintTracking`, this commit on top of #2736 has
the effect of recovering two lost results:

    --- a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
    +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
    @@ -1,2 +1,4 @@
     | overflowdestination.cpp:30:2:30:8 | call to strncpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
     | overflowdestination.cpp:46:2:46:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
    +| overflowdestination.cpp:53:2:53:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
    +| overflowdestination.cpp:64:2:64:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |

In the internal repo, we recover one lost result. Additionally, there
are two queries that gain an extra source for an existing sink. I'll
classify that as noise. The new results look like this:

    foo(argv); // this `argv` is a new source for the sink in `bar`
    bar(argv); // this `argv` is the existing source for the sink in `bar`
2020-01-31 16:28:45 +01:00
Jonas Jensen
a1aed1ad93 C++: Workaround for problem with memcpy flow
The type of the source argument to `memcpy` is `void *`, and somehow
that meant that the copied object itself got type `void`. Since that has
size 0, the SSA construction did not model it as reading from the last
write.

This is probably not the right fix, but maybe it's good enough for now.
The right fix would ensure that the type reported by
`hasOperandMemoryAccess` is `UnknownType`.

When `DefaultTaintTracking.qll` is enabled, this commit has the effect
of restoring a lost result:

    --- a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
    +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
    @@ -1 +1,2 @@
     | overflowdestination.cpp:30:2:30:8 | call to strncpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
    +| overflowdestination.cpp:46:2:46:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
2020-01-31 16:04:43 +01:00
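The alert text restored by commit a1aed1ad93 ("bounded by destination-buffer size, not source-buffer size") refers to a pattern that can be sketched as below. This is an illustrative example, not the contents of `overflowdestination.cpp`; the function names `lengthBoundedBySource` and `copyBoundedByDestination` are hypothetical. Note that `memcpy` declares its source parameter as `const void *`, the type the commit message identifies as the origin of the size-0 `void` object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical illustration; not taken from overflowdestination.cpp.

// The risky bound: a length derived from the source string alone. Passing
// this to memcpy would overflow any destination smaller than the source.
std::size_t lengthBoundedBySource(const char *src) {
    assert(src != nullptr);
    return std::strlen(src) + 1; // includes the terminating NUL
}

// The safer bound: never copy more than the destination can hold.
// Returns the number of characters copied, excluding the terminating NUL.
std::size_t copyBoundedByDestination(char *dest, std::size_t destSize,
                                     const char *src) {
    assert(dest != nullptr && src != nullptr);
    if (destSize == 0) return 0;
    std::size_t n = std::strlen(src);
    if (n > destSize - 1) n = destSize - 1; // leave room for the NUL
    std::memcpy(dest, src, n); // source is passed as `const void *`
    dest[n] = '\0';
    return n;
}
```

With a 4-byte destination and an 8-character source, `lengthBoundedBySource` yields 9, which would overrun the buffer, while `copyBoundedByDestination` truncates the copy to 3 characters plus the terminator.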
alexet
cd688367c7 CPP: Avoid unnecessary recursion 2020-01-31 12:47:03 +00:00
Robert Marsh
83d611de11 C++: don't conflate pointers in data flow 2020-01-30 16:18:24 -08:00
Robert Marsh
209a30688a Merge pull request #2718 from jbj/DefaultTaintTracking-isUserInput
C++: Fix mapping of sources from Expr to Node
2020-01-30 16:22:48 -05:00
Robert Marsh
4617940eee Merge branch 'master' into connect-ir-dataflow-models 2020-01-30 08:49:42 -08:00
Jonas Jensen
148e87c61d C++: Put AliasedSSA.qll in new qlformat style 2020-01-30 11:38:16 +01:00
Jonas Jensen
f0f752844e Merge remote-tracking branch 'upstream/master' into dbartol/Indirections
Conflicts:
	cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/internal/AliasedSSA.qll
	csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/internal/AliasAnalysis.qll
2020-01-30 10:26:44 +01:00
Jonas Jensen
036e16af8b Merge remote-tracking branch 'upstream/master' into ir-crement-load
Conflicts:
	cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/TranslatedExpr.qll
2020-01-30 09:07:30 +01:00
Jonas Jensen
c4d2163321 Merge pull request #2673 from aschackmull/ql/autoformat-comparisonterm
Java/C++/C#: Autoformat comparison terms
2020-01-30 08:47:50 +01:00
Robert Marsh
71d87be773 C++: add flow through partial loads in DTT 2020-01-29 17:51:42 -08:00
Dave Bartolomeo
6249446ba0 Merge remote-tracking branch 'upstream/master' into dbartol/Indirections 2020-01-29 17:29:44 -07:00
Robert Marsh
1472101613 Merge branch 'master' into rdmarsh/cpp/ir-flow-through-outparams 2020-01-29 14:44:29 -08:00
Robert Marsh
74ea9bcdf4 C++: fix merge issue 2020-01-29 14:37:41 -08:00
Robert Marsh
1a458aa450 C++: IR dataflow edges through outparams 2020-01-29 14:37:41 -08:00
Robert Marsh
37570c7750 Merge pull request #2676 from jbj/dataflow-partial-chi
C++: data flow through partial chi operands where type is known
2020-01-29 13:44:06 -05:00
Jonas Jensen
52d2bebd1c C++: Taint through most partial chi operands
This changes the flow to be taint rather than data flow, and it extends
it to include chi instructions with unknown type as long as they're not
for the `AliasedVirtualVariable`.

We're losing three good test results because these tests are not
affected by `DefaultTaintTracking.qll`. The taint step added here can
later be ported to `TaintTrackingUtil.qll` to recover these results, but
we probably want a better API than transitive-closure search through
instructions before doing that.
2020-01-29 18:02:03 +01:00
Jonas Jensen
4a77f2b53c Merge remote-tracking branch 'upstream/master' into ir-crement-load
Update test output to fix semantic merge conflict.
2020-01-29 15:56:05 +01:00
Jonas Jensen
9b651ea92c C++: Fix mapping of sources from Expr to Node
The code contained the remains of how `isUserInput` in `Security.qll`
used to be ported to IR. It's wrong to use that port since many queries
call `userInput` directly to get the "cause" string.
2020-01-29 15:50:08 +01:00
Jonas Jensen
7bed6ad63b C++: Add taint from gets through memcpy 2020-01-29 15:42:43 +01:00
Jonas Jensen
d7e8ea7cc5 Merge pull request #2641 from marcrepo/master
Documentation update for Issue #2623
2020-01-29 13:37:00 +01:00
Jonas Jensen
386e8e87d1 Merge pull request #2645 from geoffw0/typo
CPP: Fix typo.
2020-01-29 13:35:55 +01:00
Anders Schack-Mulligen
0d4b2e4bf7 C#/C++: Autoformat post rebase. 2020-01-29 13:16:46 +01:00
Anders Schack-Mulligen
96e4a57edd C++: Autoformat. 2020-01-29 13:11:50 +01:00
Jonas Jensen
02cb8e9cc7 Merge remote-tracking branch 'upstream/master' into dataflow-partial-chi
Conflicts:
	cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected
2020-01-29 13:03:40 +01:00
Jonas Jensen
27b5902258 Merge pull request #2707 from geoffw0/taint-format
C++: Add TaintFunction model to FormattingFunction
2020-01-29 08:20:34 +01:00
Dave Bartolomeo
976b564b68 C++: Update AliasedSSA to use Allocation instead of IRVariable
This introduces a new type of `MemoryLocation`: `EntireAllocationMemoryLocation`, representing an entire contiguous allocation whose size is not known. This is used to model the memory accesses on `InitializeIndirection` and `ReturnIndirection`.
2020-01-28 10:55:24 -07:00
Dave Bartolomeo
165a45d9b5 C++/C#: Update SimpleSSA to use Allocation instead of IRVariable 2020-01-28 10:53:18 -07:00
Dave Bartolomeo
1bbc875442 C++/C#: Parameterize alias analysis based on AliasConfiguration
Instead of tracking `IRVariable`s directly, alias analysis now tracks instances of the `Allocation` type provided by its `Configuration` parameter. For unaliased SSA, an `Allocation` is just an `IRAutomaticVariable`. For aliased SSA, an `Allocation` is either an `IRVariable` or the memory pointed to by an indirect parameter.
2020-01-28 10:51:21 -07:00