Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple of existing test functions in `ssa.cpp`.
It turns out that the evaluator will evaluate the GVN stage even when no
predicate from it is needed after optimization of the subsequent stages.
The GVN library is expensive to evaluate, and it'll become even more
expensive when we switch its implementation to IR.
This PR disables the use of GVN in `DataFlow::BarrierNode` for the AST
data-flow library, which should improve performance when evaluating a
single data-flow query on a snapshot with no cache. Precision decreases
slightly, leading to a new FP in the qltests.
There is no corresponding change for the IR data-flow library since IR
GVN is not very expensive.
We were hitting a combinatorial explosion in `hasDefinitionAtRank` for functions that contain a large number of string literals. The problem was that every `Chi` instruction for `AliasedVirtualVariable` was treated as a definition of every string literal. We already mark string literals as `isReadOnly()`, but we were allowing `AliasedVirtualVariable` to define read-only locations so that the `AliasedDefinition` instruction would provide the initial definition for all string literals.
To fix this, I've introduced the new `InitializeNonLocal` instruction, which is inserted in the prologue of every function right after `AliasedDefinition`. It provides the initial definition for every non-stack memory location, including read-only locations, but is never written to anywhere else. It is the counterpart of the `AliasedUse` instruction in the function epilogue, which represents the use of all non-stack memory after the function returns. I considered renaming `AliasedUse` to `ReturnNonLocal`, to match the `InitializeXXX`/`ReturnXXX` pattern we already use for parameters and indirections, but held off to avoid unnecessary churn. Any thoughts on whether I should make this name change?
This change yields a significant reduction in evaluation time for a few of our troublesome databases:
- `attnam/ivan`: 13%
- `awslabs/s2n`: 26%
- `SinaMostafanejad/OpenRDM`: 7%
- `zcoinofficial/zcoin`: 8%
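
For illustration, the kind of function that triggered the blow-up might look roughly like the sketch below. This is an assumption about the shape of the input, not code from any of the databases above: `record` and `registerNames` are hypothetical names, and `record` stands in for any call whose side effects on aliased memory cannot be ruled out. A real offender would contain far more literals and calls.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: stands in for any call that may touch aliased memory.
std::size_t record(const char *name) {
    return std::strlen(name);
}

// Hypothetical function with several string literals interleaved with calls.
// If every aliased-memory `Chi` after a call counts as a definition of every
// string literal, the number of (definition, literal) pairs grows with the
// product of the number of calls and the number of literals.
std::size_t registerNames() {
    std::size_t total = 0;
    total += record("alpha");
    total += record("beta");
    total += record("gamma");
    total += record("delta");
    return total;
}
```

With `InitializeNonLocal` providing the only definition of read-only locations, the literals in such a function would no longer pick up a definition at each call.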
As a temporary workaround in the `DefaultTaintTracking` library, we
funnel flow across calls by conflating pointer and object both at the
caller and the callee.
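
A sketch of the kind of code this workaround is aimed at, under the assumption that the interesting data lives in the object behind a pointer argument rather than in the pointer value itself. The names `sinkLength` and `caller` are hypothetical and do not come from the test suite.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical callee: it receives a pointer, but what matters for taint
// is the object the pointer refers to.
std::size_t sinkLength(const char *text) {
    return std::strlen(text);
}

// Hypothetical caller: the data is written into `local` (the object), while
// the call passes a pointer to `local`. Conflating pointer and object at the
// call site and at the parameter lets flow cross this boundary.
std::size_t caller(const char *input) {
    char local[32];
    std::strncpy(local, input, sizeof local - 1);
    local[sizeof local - 1] = '\0';  // strncpy does not guarantee termination
    return sinkLength(local);
}
```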
The three cases in `adjustedSink` were deleted because they are now
covered by the one case for `ReadSideEffectInstruction` in
`instructionTaintStep`.
When enabling `DefaultTaintTracking`, this commit on top of #2736 has
the effect of recovering two lost results:
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
@@ -1,2 +1,4 @@
| overflowdestination.cpp:30:2:30:8 | call to strncpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
| overflowdestination.cpp:46:2:46:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
+| overflowdestination.cpp:53:2:53:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
+| overflowdestination.cpp:64:2:64:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
In the internal repo, we recover one lost result. Additionally, there
are two queries that gain an extra source for an existing sink. I'll
classify that as noise. The new results look like this:
foo(argv); // this `argv` is a new source for the sink in `bar`
bar(argv); // this `argv` is the existing source for the sink in `bar`
The type of the source argument to `memcpy` is `void *`, and somehow
that meant that the copied object itself got type `void`. Since that has
size 0, the SSA construction did not model it as reading from the last
write.
This is probably not the right fix, but maybe it's good enough for now.
The right fix would ensure that the type reported by
`hasOperandMemoryAccess` is `UnknownType`.
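
A minimal sketch of the pattern in question, assuming the problem shows up whenever a typed object is read through `memcpy`'s `const void *` source parameter. The function `roundTrip` is hypothetical and is not taken from the tests.

```cpp
#include <cstring>

// Hypothetical example: `source` is written once, then read only via memcpy.
int roundTrip(int value) {
    int source = value;   // the last write to `source`
    int destination = 0;
    // The second argument is converted to `const void *`, so the pointee type
    // seen at the call is `void`. For the copy to be modelled correctly, the
    // read performed by memcpy must be linked to the write above.
    std::memcpy(&destination, &source, sizeof source);
    return destination;
}
```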
When `DefaultTaintTracking.qll` is enabled, this commit has the effect
of restoring a lost result:
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
@@ -1 +1,2 @@
| overflowdestination.cpp:30:2:30:8 | call to strncpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
+| overflowdestination.cpp:46:2:46:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |