The user knows that an expression functionally determines its
hashCons value, and that an expression functionally determines
its number of children, but this is not provable from the
definitions, and so not usable by the optimiser. By storing
the result of those known-functional calls in a variable,
rather than repeating the call, we enable better join orders.
These new results seem better than the previous ones, but the previous
ones are still there. Perhaps the `Buffer.qll` library could use some
adjustment, but this seems like an improvement in isolation.
The data flow library conflates pointers and their objects in some
places but not others. For example, a member function call `x.f()` will
cause flow from `x` of type `T` to `this` of type `T*` inside `f`. It
might be ideal to avoid that conflation, but that's not realistic
without using the IR.
We've had good experience in the taint tracking library with conflating
pointers and objects, and it improves results for field flow, so perhaps
it's time to try it out for all data flow.
The `toString` for IR data-flow nodes are now similar to AST data-flow
nodes. This should make it easier to use the IR as a drop-in replacement
in the future. There are still differences because the IR data flow
library takes conversions into account.
I did not attempt to align the new nodes we use for field flow. That can
come later, when we add field flow to IR data flow.
This string looked out of place compared to `ExplicitParameterNode`,
whose string is simply the name of the parameter and therefore
indistinguishable from an access to the parameter without looking at the
location also. This has not been a problem so far, and if we want to
distinguish more clearly between initial values and accesses at some
point, we should do it for `ExplicitParameterNode` and
`UninitializedNode` too.
This is for symmetry with `exprNode` etc., and it should be handy for
the same reasons. I found one caller of `asInstruction` that got simpler
by using the new predicate instead.
This commit adds a `semmle.code.cpp.ir.dataflow.DefaultTaintTracking`
library that's API-compatible with the
`semmle.code.cpp.security.TaintTracking` library. The new library is
implemented on top of the IR data flow library.
The idea is to evolve this library until it can replace
`semmle.code.cpp.security.TaintTracking` without decreasing our SAMATE
score. Then we'll have the IR in production use, and we will have one
less taint-tracking library in production.
These partial defs don't do any harm, but they could hurt performance.
In typical C++ snapshots, between 5% and 20% of all calls are to `const`
functions.