The data flow library conflates pointers and their objects in some
places but not others. For example, a member function call `x.f()` will
cause flow from `x` of type `T` to `this` of type `T*` inside `f`. It
might be ideal to avoid that conflation, but that's not realistic
without using the IR.
We've had good experience in the taint tracking library with conflating
pointers and objects, and it improves results for field flow, so perhaps
it's time to try it out for all data flow.
The `toString` for IR data-flow nodes are now similar to AST data-flow
nodes. This should make it easier to use the IR as a drop-in replacement
in the future. There are still differences because the IR data flow
library takes conversions into account.
I did not attempt to align the new nodes we use for field flow. That can
come later, when we add field flow to IR data flow.
This string looked out of place compared to `ExplicitParameterNode`,
whose string is simply the name of the parameter and therefore
indistinguishable from an access to the parameter without looking at the
location also. This has not been a problem so far, and if we want to
distinguish more clearly between initial values and accesses at some
point, we should do it for `ExplicitParameterNode` and
`UninitializedNode` too.
This is for symmetry with `exprNode` etc., and it should be handy for
the same reasons. I found one caller of `asInstruction` that got simpler
by using the new predicate instead.
This commit adds a `semmle.code.cpp.ir.dataflow.DefaultTaintTracking`
library that's API-compatible with the
`semmle.code.cpp.security.TaintTracking` library. The new library is
implemented on top of the IR data flow library.
The idea is to evolve this library until it can replace
`semmle.code.cpp.security.TaintTracking` without decreasing our SAMATE
score. Then we'll have the IR in production use, and we will have one
less taint-tracking library in production.
These partial defs don't do any harm, but they could hurt performance.
In typical C++ snapshots, between 5% and 20% of all calls are to `const`
functions.
Comments like these will make the autoformatter produce bad indentation.
For the record (not for explainability), these issues were found with
git grep -P -A1 '^( */\*| +\*( |$))(.(?!\*/))*$' cpp/ql/src/'**/*.ql*' |grep -B10 'qll\?- [^*]*$'
It superficially looks like `@param` is supported in QLDoc, but this is
mostly an accident of how its parser works. Attributes starting with `@`
are only intended to be used in the top-level QLDoc of a query, and
there can only be one of each attribute. If there are multiple `@param`
entries, the QLDoc parser will only keep the first one.
Even though `parseConvSpec` in `Scanf.qll` documented multiple
parameters, only the first one would be shown in an IDE. The
corresponding predicate in `Print.qll` documented only its first
parameter, perhaps because of an autoformatting accident earlier in
time. I've attempted to reconstruct documentation for its other
parameters based on its sibling in `Scanf.qll`.
These functions were overly complicated, and the comments explaining the
complications did not auto-format well. A reference type cannot have
specifiers on it, so it's fine to call `getUnspecifiedType` before
checking if it's a reference type.
The autoformatter is opinionated about comment styles and assumes that
"short" comments attach to the following item while "long" comments are
items themselves. I found top-level short comments with the following
two commands and then searched the output for empty lines that came
after the comment.
git grep -A1 '^/\* .*\*/' cpp/ql/src
git grep -A1 '^//' 'cpp/ql/src/**/*.ql*'