In the future, I could imagine we would have something like this, but for now,
I'm just keeping it simple.
```codeql
/**
* A collection of common guards that ensure the checked value cannot have arbitrary
* values.
*
* Currently only supports comparison with constant string value, but could also
* include checking whether all characters are alphanumeric, or whether a regex is
* matched against the value.
*
* Such guards will be useful for many taint-tracking queries, but not necessarily
* all, which is why you need to opt into these manually.
*/
class CommonNonArbitraryGuard extends BarrierGuard {
CommonNonArbitraryGuard() {
this instanceof StringConstCompare
}
override predicate checks(ControlFlowNode node, boolean branch) {
this.(StringConstCompare).checks(node, branch)
}
}
```
From a local evaluation against flask DB, after
https://github.com/github/codeql/pull/4649 was merged we would get:
```
Tuple counts for TypeTracker::callStep#ff/2@a21b71:
9876 ~0% {3} r1 = SCAN DataFlowPrivate::DataFlowCall::getArg_dispred#fff AS I OUTPUT I.<2>, I.<0>, I.<1>
9876 ~2% {3} r2 = JOIN r1 WITH project#DataFlowPrivate::DataFlowCall::getArg_dispred#fff AS R ON FIRST 1 OUTPUT r1.<2>, R.<0>, r1.<1>
72388997 ~0% {4} r3 = JOIN r2 WITH DataFlowPublic::ParameterNode::isParameterOf_dispred#fff_201#join_rhs AS R ON FIRST 1 OUTPUT r2.<2>, R.<2>, r2.<1>, R.<1>
4952 ~0% {2} r4 = JOIN r3 WITH DataFlowPrivate::DataFlowCall::getCallable_dispred#ff AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>
return r4
```
Add a step from that `CfgNode` to the corresponding `EssaNode`.
The intended effect is seen in `ImpliesDataflow.expected`.
The efeect seen in other `.expected`-files is that parameter nodes
change type, that the extra steps are seen, and that flow from
`EssaVar`s is mirrored in flow from `CfgNode`s.
There is one surprise, which is the `.0` node in
`coverage/localFlow.expected`.
This makes it easy to extend the sources/sinks of the configuration and re-run
the query from the query console on LGTM.com.
File location in `semmle.<lang>.security.dataflow.<QueryName>.qll` is matching
what we currently do in other languages (JS and C# sampled).
I did not follow the pattern in other languages for wrapping all the code in a
`module CodeInjection`, since I didn't understand the value in doing so -- I
would like confirmation from the other teams if we _should_ actually do that,
before merging.
Also fixes a bug ("`B`" was not recognised as a bytestring prefix).
The basic idea behind this fix is that the set of possible prefixes is
fairly small, so it's easier just to precompute them, and then join
them with the entire prefix of the string in question (rather than
look at each string in isolation, get its prefix, and _then_ check
whether it looks like it's a unicode string prefix, which essentially
is what the code did before).