Avoids a Cartesian product on nodes:
```
[2020-02-07 11:01:22] (432s) Tuple counts for dom#DataFlowImpl::TPathNodeSink#ff:
0 ~0% {2} r1 = JOIN DataFlowImpl::Configuration::isSource_dispred#ff AS L WITH DataFlowImpl::Configuration::isSink_dispred#ff AS R ON FIRST 2 OUTPUT R.<1>, R.<0>
101611 ~0% {2} r2 = SCAN DataFlowImpl::PathNodeMid#class#ffffff AS I OUTPUT I.<5>, I.<0>
3534537047 ~3% {3} r3 = JOIN r2 WITH DataFlowImpl::Configuration::isSink_dispred#ff AS R ON FIRST 1 OUTPUT r2.<1>, R.<1>, R.<0>
251 ~41% {3} r4 = JOIN r3 WITH project#DataFlowImpl::pathStep#fffff AS R ON FIRST 2 OUTPUT R.<2>, r3.<2>, r3.<1>
251 ~50% {2} r5 = JOIN r4 WITH DataFlowImpl::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r4.<2>, r4.<1>
251 ~50% {2} r6 = r1 \/ r5
323 ~67% {3} r7 = JOIN r6 WITH DataFlowImpl::flow#ff AS R ON FIRST 1 OUTPUT r6.<1>, r6.<0>, R.<1>
288 ~58% {3} r8 = SELECT r7 ON r7.<2> >= r7.<0>
251 ~53% {3} r9 = SELECT r8 ON r8.<2> <= r8.<0>
251 ~50% {2} r10 = SCAN r9 OUTPUT r9.<1>, r9.<0>
```
Joining on the enclosing callable before the kind is crucial, as witnessed by this pipeline:
```
[2020-02-06 17:58:21] (1086s) Starting to evaluate predicate DataFlowImplCommon::getReturnPosition#ff/2@83c546
[2020-02-06 18:53:16] (4382s) Tuple counts for DataFlowImplCommon::getReturnPosition#ff:
385478 ~1% {3} r1 = SCAN DataFlowImplCommon::Cached::TReturnPosition0#fff@staged_ext AS I OUTPUT I.<2>, I.<0>, I.<1>
385478 ~2% {3} r2 = JOIN r1 WITH DataFlowImplCommon::Cached::TReturnPosition0#fff_2#join_rhs AS R ON FIRST 1 OUTPUT r1.<2>, r1.<1>, r1.<0>
58638116860 ~0% {3} r3 = JOIN r2 WITH DataFlowImplCommon::ReturnNodeExt::getKind_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>, r2.<2>
914049 ~0% {2} r4 = JOIN r3 WITH DataFlowImplCommon::returnNodeGetEnclosingCallable#ff AS R ON FIRST 2 OUTPUT r3.<0>, r3.<2>
return r4
```
Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple existing test functions in `ssa.cpp`.
It turns out that the evaluator will evaluate the GVN stage even when no
predicate from it is needed after optimization of the subsequent stages.
The GVN library is expensive to evaluate, and it'll become even more
expensive when we switch its implementation to IR.
This PR disables the use of GVN in `DataFlow::BarrierNode` for the AST
data-flow library, which should improve performance when evaluating a
single data-flow query on a snapshot with no cache. Precision decreases
slightly, leading to a new FP in the qltests.
There is no corresponding change for the IR data-flow library since IR
GVN is not very expensive.