This also rewrites all uses of flow test common to use `DataFlow::ConfigSig`.
Note that the removed deprecated aliases are 14 months old by now and, hence,
can be safely removed.
Flow from a definition by reference of a field into its object was
working inconsistently and in a very syntax-dependent way. For a
function `f` receiving a reference, `f(a->x)` could propagate data back
to `a` via the _reverse read_ mechanism in the shared data-flow library,
but for a function `g` receiving a pointer, `g(&a->x)` would not work.
And `f((*a).x)` would not work either.
In all cases, the issue was that the shared data-flow library propagates
data backwards between `PostUpdateNode`s only, but there is no
`PostUpdateNode` for `a->x` in `g(&a->x)`. This pull request inserts
such post-update nodes where appropriate and links them to their
neighbors. In this exapmle, flow back from the output parameter of `g`
passes first to the `PostUpdateNode` of `&`, then to the (new)
`PostUpdateNode` of `a->x`, and finally, as a _reverse read_ with the
appropriate field projection, to `a`.
The data flow library conflates pointers and their objects in some
places but not others. For example, a member function call `x.f()` will
cause flow from `x` of type `T` to `this` of type `T*` inside `f`. It
might be ideal to avoid that conflation, but that's not realistic
without using the IR.
We've had good experience in the taint tracking library with conflating
pointers and objects, and it improves results for field flow, so perhaps
it's time to try it out for all data flow.
This string looked out of place compared to `ExplicitParameterNode`,
whose string is simply the name of the parameter and therefore
indistinguishable from an access to the parameter without looking at the
location also. This has not been a problem so far, and if we want to
distinguish more clearly between initial values and accesses at some
point, we should do it for `ExplicitParameterNode` and
`UninitializedNode` too.
Data flow probably never worked when a variable declared in a
`ConditionDeclExpr` was modeled with `BlockVar`. That combination did
not come up in testing before the last commit.
The data flow library conflates pointers and objects enough for the
`definitionByReference` predicate to be too strict in some cases. It was
too permissive in other cases that are now (or will be) handled better
by field flow.
See also the change note entry.
This commit removes fields from the responsibilities of `FlowVar.qll`.
The treatment of fields in that file was slow and imprecise.
It then adds another copy of the shared global data flow library, used
only to find local field flow, and it exposes that local field flow
through `localFlow` and `localFlowStep`.
This has a performance cost. It adds two cached stages to any query that
uses `localFlow`: the stage from `DataFlowImplCommon`, which is shared
with all queries that use global data flow, and a new stage just for
`localFlowStep`.
Because `ConstructorFieldInit` (member initializer lists) are not part
of the control flow graph, there was no data flow from the initial value
of parameters to their uses in member initializers. This commit adds the
necessary flow under the assumption that parameters are not overwritten
in member initializers.
This allows a member initializer list to be seen as a sequence of field
assignments. For example, the constructor
C() : a(taint()) { }
now has data flow similar to
C() { this.a = taint(); }
This commit changes C++ `ConstructorCall` to behave like
`new`-expressions in Java: they are both `ExprNode`s and
`PostUpdateNodes`, and there's a "pre-update node" (here called
`PreConstructorCallNode`) to play the role of the qualifier argument
when calling a constructor.
This removes a lot of flow steps, but it all seems to be flow that was
present twice: both exiting a `PartialDefNode` and a
`DefinitionByReferenceNode`. All `DefinitionByReferenceNode`s are now
`PartialDefNode`s.