I looked through a few hundred results from this query on lgtm.com and
found that most of the FPs had to do with comment lines ending in `}`.
This change should fix most of them, at the cost of very few false
negatives.
On Wireshark, this query goes from 7,425 results to 6,686 results before
filtering for generated code. Almost all the lost results were FP,
except a handful of results involving initializer lists.
The existing `Node.asExpr` predicate changes semantics so it becomes the
one that most users should use when they don't want to think about
`Conversion`s. A new `Node.asConvertedExpr` predicate is added and has
the same semantics as the old `Node.asExpr` predicate. It's for advanced
users that know about `Conversion`s and want to account for them.
With this change, the `IRDataflowTestCommon.qll` and
`DataflowTestCommon.qll` files use the same definitions of sources and
sinks. Since the IR data flow library is meant to be compatible with the
AST data flow library, this is what we ought to be testing.
Two alerts change but not necessarily for the right reasons.
Data flow nodes for expressions do not take CFG splitting into account. Example:
```
if (b)
x = tainted;
x = x.ToLower();
if (!b)
Use(x);
```
Flow is incorrectly reported from `tainted` to `x` in `Use(x)`, because the step
from `tainted` to `x.ToLower()` throws away the information that `b = true`.
The solution is to remember the splitting in data flow expression nodes, that is,
to represent the exact control flow node instead of just the expression. With that
we get flow from `tainted` to `[b = true] x.ToLower()`, but not from `tainted` to
`[b = false] x.ToLower()`.
The data flow API remains unchanged, but in order for analyses to fully benefit from
CFG splitting, sanitizers in particular should be CFG-based instead of expression-based:
```
if (b)
x = tainted;
if (IsInvalid(x))
return;
Use(x);
```
If the call to `IsInvalid()` is a sanitizer, then defining an expression node to be
a sanitizer using `GuardedExpr` will be too conservative (`x` in `Use(x)` is in fact
not guarded). However, `[b = true] x` in `[b = true] Use(x)` is guarded, and to help
defining guard-based sanitizers, the class `GuardedDataFlowNode` has been introduced.
As we prepare to clarify how conversions are treated, we don't want a
`sink(...)` declaration where it's non-obvious which conversions are
applied to arguments.