Without this fix, running the full LGTM suite would get the IR evaluated
twice. That's because we have multiple IPA types and constructors with
the same name (like `TInstruction` and `MkIRFunction`), and the QL
compiler chooses how to disambiguate those names differently depending
on import order.
I've tested that the IR is only evaluated once now by running the whole
suite on a tiny project (jbj/magicrescue) and looking at the output of
perl -ne 'print if /^RESULTS IN:/ .. /^\[/ and not /^\[/' runSnapshotQueries-debug.log | sort |uniq -c |sort -n |less
Instructions that are removed from the normal value numbering recursion
because they have a duplicated type or AST element get unique value
numbers rather than going unnumbered. This ensures comparisons of value
numbers using `!=` hold for filtered instructions.
This entry in `.codeqlmanifest.json` was intended to allow
unpacking the CodeQL CLI as a subdirectory of `ql`, and things
would Just Work.
However, it is not necessary anymore because recent releases of
the CLI will search their own directory as a fallback
_independently_ of the parent directory.
On the contrary, removing this link will make internal testing
easier because you then run a test build of the CLI with
`--search-path` pointing to the `ql` checkout without inadvertently
making extractors in a _different_ build that is unpacked there visible.
On some snapshots, notably ffmpeg, the IR `ValueNumbering` recursion
would generate billions of tuples and eventually run out of space.
It turns out it was fairly common for an `Instruction` to get more than
one `ValueNumber` in the base cases for `VariableAddressInstruction` and
`InitializeParameterInstruction`, and it could also happen in an
instruction with more than one operand of the same `OperandTag`. When a
binary operation was applied to an instruction with `m` value numbers
and another instruction with `n` value numbers, the result would get
`m * n` value numbers. This led to doubly-exponential growth in the
number of value numbers in rare cases.
The underlying reason why a `VariableAddressInstruction` could get
multiple value numbers is that it was keyed on the associated
`IRVariable`, and the `IRVariable` is defined in part by the type of its
underlying `Variable` (or other AST element). If the extractor defines a
variable to have multiple types because of linker ambiguity, this leads
to the creation of multiple `IRVariable`s. That should ideally be solved
in `TIRVariable.qll`, but for now I've put a workaround in
`ValueNumberingInternal.qll` instead.
To remove the problem with instructions having multiple operands, the
construction in `Operand.qll` will now filter out any such operand. It
wasn't enough to apply that filter to the `raw` stage, so I've applied
it to all three stages.