Fixes issue https://github.com/github/codeql-c-analysis-team/issues/20
This change undoes the workaround in https://github.com/Semmle/ql/pull/2736, and replaces it with a fix for the underlying cause. The problem was that the IR construction code for side effects incorrectly assumed that `BufferAccessOpcode` included `SizedBufferAccessOpcode`. I think that was actually a perfectly reasonable assumption to make, so I changed the `Opcode` hierarchy to make it true.
On some snapshots, notably ffmpeg, the IR `ValueNumbering` recursion
would generate billions of tuples and eventually run out of space.
It turns out it was fairly common for an `Instruction` to get more than
one `ValueNumber` in the base cases for `VariableAddressInstruction` and
`InitializeParameterInstruction`, and it could also happen in an
instruction with more than one operand of the same `OperandTag`. When a
binary operation was applied to an instruction with `m` value numbers
and another instruction with `n` value numbers, the result would get
`m * n` value numbers. This led to doubly-exponential growth in the
number of value numbers in rare cases.
The underlying reason why a `VariableAddressInstruction` could get
multiple value numbers is that it was keyed on the associated
`IRVariable`, and the `IRVariable` is defined in part by the type of its
underlying `Variable` (or other AST element). If the extractor defines a
variable to have multiple types because of linker ambiguity, this leads
to the creation of multiple `IRVariable`s. That should ideally be solved
in `TIRVariable.qll`, but for now I've put a workaround in
`ValueNumberingInternal.qll` instead.
To remove the problem with instructions having multiple operands, the
construction in `Operand.qll` will now filter out any such operand. It
wasn't enough to apply that filter to the `raw` stage, so I've applied
it to all three stages.
Avoids a Cartesian product on nodes:
```
[2020-02-07 11:01:22] (432s) Tuple counts for dom#DataFlowImpl::TPathNodeSink#ff:
0 ~0% {2} r1 = JOIN DataFlowImpl::Configuration::isSource_dispred#ff AS L WITH DataFlowImpl::Configuration::isSink_dispred#ff AS R ON FIRST 2 OUTPUT R.<1>, R.<0>
101611 ~0% {2} r2 = SCAN DataFlowImpl::PathNodeMid#class#ffffff AS I OUTPUT I.<5>, I.<0>
3534537047 ~3% {3} r3 = JOIN r2 WITH DataFlowImpl::Configuration::isSink_dispred#ff AS R ON FIRST 1 OUTPUT r2.<1>, R.<1>, R.<0>
251 ~41% {3} r4 = JOIN r3 WITH project#DataFlowImpl::pathStep#fffff AS R ON FIRST 2 OUTPUT R.<2>, r3.<2>, r3.<1>
251 ~50% {2} r5 = JOIN r4 WITH DataFlowImpl::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r4.<2>, r4.<1>
251 ~50% {2} r6 = r1 \/ r5
323 ~67% {3} r7 = JOIN r6 WITH DataFlowImpl::flow#ff AS R ON FIRST 1 OUTPUT r6.<1>, r6.<0>, R.<1>
288 ~58% {3} r8 = SELECT r7 ON r7.<2> >= r7.<0>
251 ~53% {3} r9 = SELECT r8 ON r8.<2> <= r8.<0>
251 ~50% {2} r10 = SCAN r9 OUTPUT r9.<1>, r9.<0>
```
Joining on the enclosing callable before the kind is crucial, as witnessed by this pipeline:
```
[2020-02-06 17:58:21] (1086s) Starting to evaluate predicate DataFlowImplCommon::getReturnPosition#ff/2@83c546
[2020-02-06 18:53:16] (4382s) Tuple counts for DataFlowImplCommon::getReturnPosition#ff:
385478 ~1% {3} r1 = SCAN DataFlowImplCommon::Cached::TReturnPosition0#fff@staged_ext AS I OUTPUT I.<2>, I.<0>, I.<1>
385478 ~2% {3} r2 = JOIN r1 WITH DataFlowImplCommon::Cached::TReturnPosition0#fff_2#join_rhs AS R ON FIRST 1 OUTPUT r1.<2>, r1.<1>, r1.<0>
58638116860 ~0% {3} r3 = JOIN r2 WITH DataFlowImplCommon::ReturnNodeExt::getKind_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>, r2.<2>
914049 ~0% {2} r4 = JOIN r3 WITH DataFlowImplCommon::returnNodeGetEnclosingCallable#ff AS R ON FIRST 2 OUTPUT r3.<0>, r3.<2>
return r4
```