Empty StmtSequences appear, for example, in the `else` branch of `if`
statements like the following:
foo
if cond
bar
else
end
baz
Before this change, the CFG for this code would look like this:
foo
│
│
▼
cond
│
true │
▼
bar
│
│
▼
if
│
│
▼
baz
i.e. there is linear flow through the condition, the `then` branch, and
out of the if. This doesn't account for the possibility that the
condition is false and `bar` is not executed. After this change, the CFG
looks like this:
foo
│
│
▼
cond
│ │
true │ │ false
▼ │
bar │
│ │
│ │
▼ ▼
if
│
│
▼
baz
i.e. we correctly account for the `false` condition.
Use a `strictcount` to identify whether there is exactly one feature or not.
If so, we use it. If not, we use the empty string.
Add context to ensure we filter the set of data flow nodes down to only
the set of endpoint nodes.
This performance optimisation avoids calculating the Cartesian product
of data flow nodes and feature names, but it does not avoid calculating
the (slightly smaller) Cartesian product of endpoint nodes and feature names.
Product size = number of endpoint nodes * number of feature names.
At time of writing there are 8 feature names.
Change the cutoff logic from `count` to `strictcount`, since we know it only applies
to a non-empty set of results.
Use a single `strictconcat` aggregate to combine tokens in order of location,
instead of computing a `rank` followed by a `concat`.
Strictness introduces a slight change of behaviour because missing tokens will now result
in no results from the predicate rather than an empty feature string.
The `matchMarkerComment` predicate performs badly on any codebase with
a moderately large number of comments, because the current implementation
has to first compute the Cartesian product between the set of comments
and the set of framework library comment regexes.
Instead, match first against a single regex:
the union of all framework library comment regexes.
This computes a more benign Cartesian product, the same size as the set of comments.
See inline comments for more details.