This works by adding data-flow edges to skip over array expressions when
reading from arrays. On the post-update side, there was already code to
skip over array expressions when storing to arrays. That happens in
`valueToUpdate` in `AddressFlow.qll`, which needed just a small tweak to
support assignments with non-field expressions at the top-level LHS,
like `*a = ...` or `a[0] = ...`.
The new code in `AddressFlow.qll` is copy-pasted from `EscapesTree.qll`,
and there is already a note in these files saying that they share a lot
of code and must be maintained in sync.
The conflicts came from how `this` is now a parameter but not a
`Parameter` on `master`.
Conflicts:
cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/defaulttainttracking.cpp
cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected
cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected
cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-ir-consistency.expected
cpp/ql/test/library-tests/dataflow/fields/ir-flow.expected
cpp/ql/test/library-tests/syntax-zoo/dataflow-ir-consistency.expected
g++ doesn't support this code:
sorry, unimplemented: non-trivial designated initializers not supported
twoIntFields sSwapped = { .m2 = source(), .m1 = 0 };
so we need to build it in clang mode.
This commit removes fields from the responsibilities of `FlowVar.qll`.
The treatment of fields in that file was slow and imprecise.
It then adds another copy of the shared global data flow library, used
only to find local field flow, and it exposes that local field flow
through `localFlow` and `localFlowStep`.
This has a performance cost. It adds two cached stages to any query that
uses `localFlow`: the stage from `DataFlowImplCommon`, which is shared
with all queries that use global data flow, and a new stage just for
`localFlowStep`.
This commit changes how data flow works in the following code.
MyType x = source();
defineByReference(&x);
sink(x);
The question here is whether there should be flow from `source` to
`sink`. Such flow is desirable if `defineByReference` doesn't write to
all of `x`, but it's undesirable if `defineByReference` is a typical
init function in `C` that writes to every field or if
`defineByReference` is `memcpy` or `memset` on the full range.
Before 1.20.0, there would be flow from `source` to `sink` in case `x`
happened to be modeled with `BlockVar` but not in case `x` happened to
be modelled with SSA. The choice of modelling depends on an analysis of
how `x` is used elsewhere in the function, and it's supposed to be an
internal implementation detail that there are two ways to model
variables. In 1.20.0, I changed the `BlockVar` behavior so it worked the
same as SSA, never allowing that flow. It turns out that this change
broke a customer's query.
This commit reverts `BlockVar` to its old behavior of letting flow
propagate past the `defineByReference` call and then regains consistency
by changing all variables that are ever defined by reference to be
modelled with `BlockVar` instead of SSA. This means we now get too much
flow in certain cases, but that appears to be better overall than
getting too little flow. See also the discussion in CPP-336.