On some snapshots, notably ffmpeg, the IR `ValueNumbering` recursion
would generate billions of tuples and eventually run out of space.
It turns out it was fairly common for an `Instruction` to get more than
one `ValueNumber` in the base cases for `VariableAddressInstruction` and
`InitializeParameterInstruction`, and it could also happen in an
instruction with more than one operand of the same `OperandTag`. When a
binary operation was applied to an instruction with `m` value numbers
and another instruction with `n` value numbers, the result would get
`m * n` value numbers. This led to doubly-exponential growth in the
number of value numbers in rare cases.
The underlying reason why a `VariableAddressInstruction` could get
multiple value numbers is that it was keyed on the associated
`IRVariable`, and the `IRVariable` is defined in part by the type of its
underlying `Variable` (or other AST element). If the extractor defines a
variable to have multiple types because of linker ambiguity, this leads
to the creation of multiple `IRVariable`s. That should ideally be solved
in `TIRVariable.qll`, but for now I've put a workaround in
`ValueNumberingInternal.qll` instead.
To remove the problem with instructions having multiple operands, the
construction in `Operand.qll` will now filter out any such operand. It
wasn't enough to apply that filter to the `raw` stage, so I've applied
it to all three stages.
Before I added `--max-import-depth=2`, there was a bit of trouble, where it
would alert on `from pkg_ok import foo2` -- since all the `pkg_ok.foo<n>`
modules were missing, I guess the analysis didn't make any assumptions on
whether `foo2` is a module or a regular attribute.
So I've been thinking a bit about import pkg_ok.foo1 after reading the Python
references for imports of submodules
https://docs.python.org/3/reference/import.html#submodules
> When a submodule is loaded using any mechanism (...) a binding is placed in the
parent module’s namespace to the submodule object. For example, if package spam
has a submodule foo, after importing spam.foo, spam will have an attribute foo
which is bound to the submodule.
That does at least explain what is going on here.
I feel that import pkg_ok.foo1 might be a very contrived example. In principle
it should be an alert, since the module pkg_ok ends up with an import of itself,
but my gut feeling is that in practice it's not a very important piece of code
to give alerts for. if we really care about giving these import related alerts,
we could probably add a new query for this pattern, as it's kind of surprising
that it works when you're just an ordinary python programmer.