There's now a `localFlowStep` predicate for use directly in queries and
other libraries and a `simpleLocalFlowStep` for use only by the global
data flow library. The former predicate is intended to include field
flow, but the latter may not.
This will let Java and C# (and possibly C++ IR) avoid getting two kinds
of field flow at the same time, both from SSA and from the global data
flow library. It should let C++ AST add some form of field flow to
`localFlowStep` without making it an input to the global data flow
library.
To keep the code changes minimal, and to keep the implementation similar
to C++ and Java, the `TaintTracking{Public,Private}` files are now
imported together through `TaintTrackingUtil`. This has the side effect
of exposing `localAdditionalTaintStep`. The corresponding predicate for
Java was already exposed.
Initial implementation of data flow through fields, using the algorithm of the
shared data flow implementation. Fields (and field-like properties) are covered,
and stores can be either
- ordinary assignments, `Foo = x`,
- object initializers, `new C() { Foo = x }`, or
- field initializers, `int Foo = x`.
For field initializers, we need to synthesize calls (`SynthesizedCall`),
callables (`SynthesizedCallable`), parameters (`InstanceParameterNode`), and
arguments (`SynthesizedThisArgumentNode`), as the C# extractor does not (yet)
extract such entities. For example, in
```
class C
{
int Field1 = 1;
int Field2 = 2;
C() { }
}
```
there is a synthesized call from the constructor `C`, with a synthesized `this`
argument, and the targets of that call are two synthesized callables with bodies
`this.Field1 = 1` and `this.Field2 = 2`, respectively.
A consequence of this is that `DataFlowCallable` is no longer an alias for
`DotNet::Callable`, but instead an IPA type.
- Speedup the `varBlockReaches()` predicate, by restricting to basic blocks
in which a given SSA definition may still be live, in constrast to just
being able to reach *any* access (read or write) to the underlying source
variable.
- Account for some missing cases in the `lastRead()` predicate.
The predicate `maxSplits()` was previously applied dynamically to ensure that
any control flow node would keep track of at most `maxSplits()` number of splits.
However, there was no guarantee that two different copies of the same AST element
wouldn't contain different splits, so in general the number of copies for a given
AST element `e` could be on the order `$\binom{n}{k}c^k$`, where `n` is the total
number of splits that apply to `e`, `k = maxSplits()`, and `c` is a constant.
With this change, the relevant splits for `e` are instead computed statically,
meaning that the order is instead `$c^k$`.