After #3382 changed the escape analysis to model qualifiers as escaping,
there was an imbalance in the SSA library, where `addressTakenVariable`
excludes variables from SSA analysis if they have their address taken
but are _not_ passed by reference. This showed up as a missing result in
`TOCTOUFilesystemRace.ql`, demonstrated with a test case in #3432.
This commit changes the definition of "pass by reference" to include
call qualifiers, which allows SSA modeling of variables that have member
function calls on them.
I didn't add this support in `AddressConstantExpression.qll` since I
think it would require extra work and testing to get the constexprness
right. My long-term plan for `AddressConstantExpression.qll` is to move
its functionality to the extractor.
It does so by first defining what a pointer dereference is (on the IR
`Instruction` level), and then using the array length analysis and the range
analysis together to prove that some of these pointer dereferences are safe.
The `fieldFlow` predicate contained a fragile join that has become
ordered wrong recently, either as result of an unrelated change in the
data-flow library or as part of the stats change for the last dbscheme
change.
The minimal fix is to use `getEnclosingCallable` instead of
`getFunction` since the former uses `unique` to ensure good join
ordering in its callers. A longer-term fix should be applied to the AST
base libraries, but this will be invasive and require independent
testing.
Tuple counts on Wireshark before (cancelled after a few minutes):
(747s) Starting to evaluate predicate DataFlowUtil::localFlowStep#ff/2@bdba82
(848s) Tuple counts for DataFlowUtil::localFlowStep#ff:
1766640980 ~1% {2} r1 = JOIN DataFlowUtil::Node::getFunction_dispred#ff_10#join_rhs AS L WITH DataFlowUtil::Node::getFunction_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT L.<1>, R.<1>
1327 ~0% {2} r2 = JOIN r1 WITH project#DataFlowImplLocal::Configuration::hasFlow#fbb AS R ON FIRST 2 OUTPUT r1.<0>, r1.<1>
9691232 ~0% {2} r3 = DataFlowUtil::simpleLocalFlowStep#ff@staged_ext \/ r2
return r3
After:
(0s) Starting to evaluate predicate DataFlowUtil::localFlowStep#ff/2@a852a0
(0s) Tuple counts for DataFlowUtil::localFlowStep#ff:
49017 ~4% {3} r1 = JOIN project#DataFlowImplLocal::Configuration::hasFlow#fff AS L WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff AS R ON FIRST 1 OUTPUT L.<1>, R.<1>, R.<0>
42359 ~0% {2} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff AS R ON FIRST 2 OUTPUT r1.<2>, r1.<0>
9732264 ~0% {2} r3 = DataFlowUtil::simpleLocalFlowStep#ff@staged_ext \/ r2
return r3
This gives a slight speedup, and I think it makes the code shorter and
clearer.
On Wireshark, the time from the beginning of the `IRBlock` stage until
just before evaluation of `getInstruction` drops from 44s to 34s.
Flow from a definition by reference of a field into its object was
working inconsistently and in a very syntax-dependent way. For a
function `f` receiving a reference, `f(a->x)` could propagate data back
to `a` via the _reverse read_ mechanism in the shared data-flow library,
but for a function `g` receiving a pointer, `g(&a->x)` would not work.
And `f((*a).x)` would not work either.
In all cases, the issue was that the shared data-flow library propagates
data backwards between `PostUpdateNode`s only, but there is no
`PostUpdateNode` for `a->x` in `g(&a->x)`. This pull request inserts
such post-update nodes where appropriate and links them to their
neighbors. In this exapmle, flow back from the output parameter of `g`
passes first to the `PostUpdateNode` of `&`, then to the (new)
`PostUpdateNode` of `a->x`, and finally, as a _reverse read_ with the
appropriate field projection, to `a`.