It still runs on uninstantiated templates because its underlying
libraries do. It's not clear whether that leads to other false
positives, but that's independent of the change I'm making here.
Certain Microsoft projects, such as CoreCLR and ChakraCore, use a
library called the PAL, which enables two-byte strings in the printf
family of functions, even when built on a platform with four-byte
strings. This adds support for determining the size of a wide character
from the definitions of such functions, rather than assuming that they
match the compiler's wchar_t.
This predicate looked like a join of two already-computed predicates,
but it was a bit more complicated because the `*` operator expands into
two cases: the reflexive case and the transitive case. The join order
for the transitive case placed the `PrimitiveBasicBlock` charpred call
_after_ the `member_step+` call, which means that all the tuples of
`member_step+` passed through the pipeline.
This commit changes the implementation by fully writing out the
expansion of `*` into two cases, where the base case is manually
specialised to make sure the join orderer doesn't get tempted into
reusing the same strategy for both cases. This speeds up the predicate
from 2m38s to 1s on a snapshot of our own C/C++ code.
The existing implementation of `primitive_basic_block_entry_node` was
"cleverly" computing two properties about `node` with a single
`strictcount`: whether `node` had multiple predecessors and whether any
of those predecessors had more than once successor. This was fast enough
on most snapshots, but on the snapshot of our own code it took 37
seconds to compute `primitive_basic_block_entry_node` and its auxiliary
predicates. This is likely to have affected other large snapshots too.
With this change, the property is computed like in our other languages,
and it brings the run time down to 4 seconds.