Now that the query has both tests and qhelp, we can use it on LGTM. This
commit also adds a change note.
I renamed the query to reduce the confusion caused by the lower-case,
unquoted word "alloca".
The `IRBlock::backEdgeSuccessor` predicate, in its three copies, had
become slow:
6:IRBlock::Cached::backEdgeSuccessor#fff ...... 1m1s
7:IRBlock::Cached::backEdgeSuccessor#2#fff .... 52.3s
8:IRBlock::Cached::backEdgeSuccessor#3#fff .... 26.4s
The slow part was finding all the nodes involved in cycles in the
`forwardEdgeRaw` graph. This was done with `forwardEdgeRaw+(pred, pred)`,
but that got compiled into a materialization of `forwardEdgeRaw+`, which
is a huge relation with 1,816,752,107 rows on Wireshark:
(1474s) Starting to evaluate predicate IRBlock::Cached::backEdgeSuccessor#3#fff
(1501s) Tuple counts:
0 ~0% {2} r1 = SELECT #IRBlock::Cached::forwardEdgeRaw#3#ffPlus ON FIELDS #IRBlock::Cached::forwardEdgeRaw#3#ffPlus.<0>=#IRBlock::Cached::forwardEdgeRaw#3#ffPlus.<1>
0 ~0% {1} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>}
0 ~0% {3} r3 = JOIN r2 WITH IRBlock::Cached::blockSuccessor#6#fff ON r2.<0>=IRBlock::Cached::blockSuccessor#6#fff.<0> OUTPUT FIELDS {r2.<0>,IRBlock::Cached::blockSuccessor#6#fff.<1>,IRBlock::Cached::blockSuccessor#6#fff.<2>}
12411 ~7% {3} r4 = IRBlock::Cached::backEdgeSuccessorRaw#3#fff \/ r3
return r4
(1501s) >>> Relation IRBlock::Cached::backEdgeSuccessor#3#fff: 12411 rows using 0 MB
The problem is the `SELECT`. Joining against a fastTC result is cheap
once we already know which tuples we're looking for, so this fix
materializes the identity relation on `IRBlock` and joins with it,
putting the fastTC relation on the RHS of a join. I had to introduce a
helper predicate because, even with `noopt`, I couldn't get
`pred = pred2` to come _before_ `forwardEdgeRaw+(pred, pred2)`. The
predicate now takes less than a second to evaluate:
(539s) Starting to evaluate predicate IRBlock::Cached::backEdgeSuccessor#fff
(539s) >>> Relation IRBlock::Cached::blockImmediatelyDominates#ff: 574677 rows using 0 MB
(539s) ... created with 574677 rows and 2 columns.
(539s) Tuple counts:
702445 ~1% {2} r1 = SELECT IRBlock::Cached::blockIdentity#ff ON FIELDS IRBlock::Cached::blockIdentity#ff.<0>=IRBlock::Cached::blockIdentity#ff.<1>
702445 ~1% {2} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>,r1.<0>}
0 ~0% {1} r3 = JOIN r2 WITH #IRBlock::Cached::forwardEdgeRaw#ffPlus ON r2.<0>=#IRBlock::Cached::forwardEdgeRaw#ffPlus.<0> AND r2.<1>=#IRBlock::Cached::forwardEdgeRaw#ffPlus.<1> OUTPUT FIELDS {r2.<0>}
0 ~0% {3} r4 = JOIN r3 WITH IRBlock::Cached::blockSuccessor#2#fff ON r3.<0>=IRBlock::Cached::blockSuccessor#2#fff.<0> OUTPUT FIELDS {r3.<0>,IRBlock::Cached::blockSuccessor#2#fff.<1>,IRBlock::Cached::blockSuccessor#2#fff.<2>}
20487 ~0% {3} r5 = IRBlock::Cached::backEdgeSuccessorRaw#fff \/ r4
return r5
(539s) >>> Relation IRBlock::Cached::backEdgeSuccessor#fff: 20487 rows using 0 MB
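For illustration only, the fix is roughly shaped like the sketch below.
It reuses the predicate names from the evaluator output above, but the
signatures are simplified to two columns (the real relations carry an
extra column), so this is not the actual code:

// The helper predicate: a materialized identity relation on IRBlock.
// Materializing it forces `pred = pred2` to be computed before the
// transitive closure below is consulted.
predicate blockIdentity(IRBlock b1, IRBlock b2) { b1 = b2 }

predicate backEdgeSuccessor(IRBlock pred, IRBlock succ) {
  backEdgeSuccessorRaw(pred, succ)
  or
  // A block lies on a cycle if forwardEdgeRaw+ relates it to itself.
  // Going through the materialized blockIdentity relation keeps the
  // fastTC relation on the RHS of a join instead of materializing it
  // in full.
  exists(IRBlock pred2 |
    blockIdentity(pred, pred2) and
    forwardEdgeRaw+(pred, pred2)
  ) and
  blockSuccessor(pred, succ)
}

Evaluated this way, the closure relation is only probed for the blocks
produced by blockIdentity, which is what the r1-r3 steps in the log
above show.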
Use the iterated dominance frontier algorithm to speed up dominance
frontier calculations. The implementation is copied from d310338c9b.
Before this change, the unaliased and aliased SSA calculations spent a
total of 169.9 seconds in these predicates:
7:Dominance::getDominanceFrontier#2#ff .. 49s
7:Dominance::blockDominates#2#ff ........ 47.5s
8:Dominance::getDominanceFrontier#ff .... 44.4s
8:Dominance::blockDominates#ff .......... 29s
After this change, the above predicates are replaced by two copies of
`getDominanceFrontier`, each of which takes less than a second.
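For context, one common way to compute dominance frontiers without a
quadratic `blockDominates` relation is to walk up the immediate-dominator
tree from each predecessor of a block (the Cooper-Harvey-Kennedy
formulation). Purely as a sketch, not necessarily the code copied from
d310338c9b, and with `getAPredecessor` and `getImmediateDominator`
assumed as helpers, it looks roughly like:

// `result` is in the dominance frontier of `dom`: `dom` lies on the
// immediate-dominator chain starting at some predecessor of `result`,
// stopping before the immediate dominator of `result` itself.
IRBlock getDominanceFrontier(IRBlock dom) {
  dom = getAPredecessor(result) and
  not dom = getImmediateDominator(result)
  or
  exists(IRBlock prev |
    result = getDominanceFrontier(prev) and
    dom = getImmediateDominator(prev) and
    not dom = getImmediateDominator(result)
  )
}

The recursion only ever follows immediate-dominator edges, which is what
avoids materializing a full dominance relation.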