The virtual-dispatch code for globals was missing any relationship
between the union field access and the global variable, which meant it
propagated function-pointer flow between any two fields of a global
struct. This resulted in false positives from
`cpp/tainted-format-string` on projects using SDL, such as
WohlSoft/PGE-Project.
In addition to fixing that bug, this commit also brings the code up to
date with the new style of modeling flow through global variables:
`DataFlow::Node.asVariable()`.
This test demonstrates that IR data flow conflates unrelated fields of a
global struct-typed variable and that this bug is not present in the old
AST-based implementation of `semmle.code.cpp.security.TaintTracking`.
This cleans up the test results, which were confusing because functions
like `sink` had multiple locations.
There are some additional results now involving casts to `const char *`
because previously it varied whether `sink` used `const`, and now it
always does.
Adding a new test case leads to changes in all `.expected` files in its
directory.
The new results show that the `DefinitionsAndUses` library does not
model `std::addressof` correctly, but that library is not intended to be
used for new code.
I didn't add this support in `AddressConstantExpression.qll` since I
think it would require extra work and testing to get the constexprness
right. My long-term plan for `AddressConstantExpression.qll` is to move
its functionality to the extractor.
Flow from a definition by reference of a field into its object was
working inconsistently and in a very syntax-dependent way. For a
function `f` receiving a reference, `f(a->x)` could propagate data back
to `a` via the _reverse read_ mechanism in the shared data-flow library,
but for a function `g` receiving a pointer, `g(&a->x)` would not work.
And `f((*a).x)` would not work either.
In all cases, the issue was that the shared data-flow library propagates
data backwards between `PostUpdateNode`s only, but there is no
`PostUpdateNode` for `a->x` in `g(&a->x)`. This pull request inserts
such post-update nodes where appropriate and links them to their
neighbors. In this exapmle, flow back from the output parameter of `g`
passes first to the `PostUpdateNode` of `&`, then to the (new)
`PostUpdateNode` of `a->x`, and finally, as a _reverse read_ with the
appropriate field projection, to `a`.