Previously, heuristic sinks were always included, to avoid filtering
them out for not being an argument to an external library call.
In this commit we move the argument-to-an-external-library-call
filtering into the query-specific endpoint filters.
This lets us filter out heuristic sinks if they match one of the other
endpoint filters, reducing FPs.
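A minimal QL sketch of the change; the predicate names and toy endpoints below are illustrative stand-ins, not the actual implementation:

```ql
// Toy stand-ins for the real relations.
private predicate isHeuristicSink(string endpoint) { endpoint = ["h1", "h2"] }

private predicate isArgumentToExternalLibraryCall(string endpoint) { endpoint = ["h1", "e1"] }

private predicate matchesOtherEndpointFilter(string endpoint) { endpoint = ["h2", "e2"] }

// Old behaviour: heuristic sinks bypassed the endpoint filters entirely,
// so "h2" was always included.
predicate isCandidateSinkOld(string endpoint) {
  isHeuristicSink(endpoint)
  or
  isArgumentToExternalLibraryCall(endpoint) and not matchesOtherEndpointFilter(endpoint)
}

// New behaviour: only the external-library-call condition exempts heuristic
// sinks; the other query-specific endpoint filters still apply, so "h2" is
// now filtered out.
predicate isCandidateSinkNew(string endpoint) {
  (isHeuristicSink(endpoint) or isArgumentToExternalLibraryCall(endpoint)) and
  not matchesOtherEndpointFilter(endpoint)
}
```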
0e31439 introduced occasional duplicate tokens due to duplicate AST
node attributes. The long-term fix is to update `CodeToFeatures.qll`,
but in the short term we update the concatenation to concatenate only
unique (location, token) pairs.
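A minimal QL sketch of the short-term fix; the `tokenAt` relation is an illustrative stand-in for the real token extraction, which can yield the same token at the same location via two different AST node attributes:

```ql
// Toy input relation: the same (location, token) pair appears under two attributes.
private predicate tokenAt(string location, string token, string attribute) {
  location = "f.js:1:1" and token = "foo" and attribute = ["name", "text"]
  or
  location = "f.js:2:5" and token = "bar" and attribute = "name"
}

// A QL predicate is a set of tuples, so projecting away the attribute column
// deduplicates the (location, token) pairs before they are concatenated.
private predicate uniqueLocationToken(string location, string token) {
  tokenAt(location, token, _)
}

// Yields "foo bar" rather than "foo foo bar".
string tokenFeature() {
  result =
    strictconcat(string location, string token |
      uniqueLocationToken(location, token)
    |
      token, " " order by location
    )
}
```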
Use a `strictcount` to determine whether there is exactly one feature.
If so, we use it; if not, we use the empty string.
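A minimal QL sketch, with `featureValue` standing in for the real per-endpoint feature relation:

```ql
// Toy feature relation: "endpointA" has exactly one value, "endpointB" has two.
private predicate featureValue(string endpoint, string value) {
  endpoint = "endpointA" and value = "escapeHtml"
  or
  endpoint = "endpointB" and value = ["escapeHtml", "sanitize"]
}

// Exactly one value: use it. Zero or several values: fall back to "".
bindingset[endpoint]
string featureOrEmpty(string endpoint) {
  if strictcount(string v | featureValue(endpoint, v)) = 1
  then featureValue(endpoint, result)
  else result = ""
}
```

With this toy data, `featureOrEmpty("endpointA")` is `"escapeHtml"`, while the ambiguous and missing cases both produce the empty string.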
Add context to ensure we filter the set of data flow nodes down to only
the set of endpoint nodes.
This performance optimisation avoids calculating the Cartesian product
of data flow nodes and feature names, but it does not avoid calculating
the (slightly smaller) Cartesian product of endpoint nodes and feature names.
The size of that product is (number of endpoint nodes) * (number of feature names);
at the time of writing there are 8 feature names.
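A minimal QL sketch of the shape of the optimisation; all names are illustrative stand-ins, and only three placeholder feature names are listed here:

```ql
// Toy endpoint set (the real one is derived from the data flow graph).
private predicate isEndpointNode(string node) { node = ["sinkCandidate1", "sinkCandidate2"] }

// Placeholder feature names (8 in the real extraction at the time of writing).
private string featureName() { result = ["featureName1", "featureName2", "featureName3"] }

bindingset[node, name]
private string computeFeature(string node, string name) { result = node + ":" + name }

// Restricting the node to the endpoint set in the same conjunction as the
// feature-name enumeration means only (endpoint node, feature name) pairs are
// materialised, not (data flow node, feature name) pairs.
string endpointFeature(string node, string name) {
  isEndpointNode(node) and
  name = featureName() and
  result = computeFeature(node, name)
}
```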
Change the cutoff logic from `count` to `strictcount`, since we know it only applies
to a non-empty set of results.
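A minimal QL sketch, with `candidateToken` and the cutoff value standing in for the real ones:

```ql
// Toy token relation: the cutoff is only ever evaluated for endpoints that
// have at least one token, so `strictcount` is safe here and makes that
// assumption explicit.
private predicate candidateToken(string endpoint, string token) {
  endpoint = "endpointA" and token = ["t1", "t2", "t3"]
}

predicate exceedsTokenCutoff(string endpoint) {
  strictcount(string t | candidateToken(endpoint, t)) > 2
}
```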
Use a single `strictconcat` aggregate to combine tokens in order of location,
instead of computing a `rank` followed by a `concat`.
Strictness introduces a slight change of behaviour: if there are no tokens,
the predicate now has no result, rather than producing an empty feature string.
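A minimal QL sketch of the old and new shapes; the `tokenAtLocation` relation is an illustrative stand-in:

```ql
private predicate tokenAtLocation(int location, string token) {
  location = 1 and token = "foo"
  or
  location = 2 and token = "bar"
}

// Old shape: rank the locations, then concatenate the tokens by rank index.
private string tokenAtRank(int i) {
  i in [1 .. count(int l | tokenAtLocation(l, _))] and
  exists(int loc |
    loc = rank[i](int l | tokenAtLocation(l, _) | l) and
    tokenAtLocation(loc, result)
  )
}

string featureStringOld() {
  result = concat(int i | exists(tokenAtRank(i)) | tokenAtRank(i), " " order by i)
}

// New shape: a single `strictconcat`, ordered by location. If there are no
// tokens at all, this predicate has no result, whereas the old shape produced
// an empty string.
string featureStringNew() {
  result = strictconcat(int loc, string t | tokenAtLocation(loc, t) | t, " " order by loc)
}
```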
The `matchMarkerComment` predicate performs badly on any codebase with
a moderately large number of comments, because the current implementation
has to first compute the Cartesian product between the set of comments
and the set of framework library comment regexes.
Instead, match first against a single regex: the union of all framework
library comment regexes. This computes a more benign Cartesian product
(the set of comments against a single regex), the same size as the set of
comments.
See inline comments for more details.
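A minimal QL sketch of the approach; the per-framework regexes and the predicate signature (over plain strings rather than comment AST nodes) are simplified stand-ins:

```ql
// Toy stand-ins for the per-framework marker comment regexes.
private string frameworkMarkerRegex(string framework) {
  framework = "react" and result = "(?i).*\\breact\\b.*"
  or
  framework = "angular" and result = "(?i).*\\bangular\\b.*"
}

// The union of all framework regexes, as a single regex.
private string unionMarkerRegex() {
  result = strictconcat(string re | re = frameworkMarkerRegex(_) | "(" + re + ")", "|")
}

// Cheap first pass: one regex per comment, so the intermediate relation is
// only as large as the set of comments.
bindingset[commentText]
private predicate matchesAnyMarker(string commentText) {
  commentText.regexpMatch(unionMarkerRegex())
}

// Only comments that pass the first pass are matched against the individual
// framework regexes to recover which framework they refer to.
bindingset[commentText]
predicate matchMarkerComment(string commentText, string framework) {
  matchesAnyMarker(commentText) and
  commentText.regexpMatch(frameworkMarkerRegex(framework))
}
```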