This change introduces a new synthesized `IRVariable` in every varargs function. This variable represents the entire set of arguments passed to the ellipsis by the caller. We give it an opaque type big enough hold all of the arguments passed by the largest vararg call in the database. It is treated just like any other parameter. It is initialized the same, it has indirect buffers, etc.
I had to introduce a couple new APIs to `Call` and `Function`. The QLDoc comments should explain these. I added tests for these new APIs as well.
The next step will be to change the IR generation for the `va_*` macros to manipulate the ellipsis parameter.
Added a new `StaticLocalVariable` class, which made several other pieces of the original change a bit cleaner.
Fixed test failures due to a mistake in the original `CFG.qll` change.
Added a test case for static local variables with constructors.
Removed the `Uninitialized` instruction from the initialization of a static local, because all objects with static storage duration are zero-initialized at startup.
Fixed expectations for `SignAnalysis.ql` to reflect that a bad result is now fixed.
Previously, the IR for the initialization of a static local variable ran the initialization unconditionally, every time the declaration was reached during execution. This means that we don't model the possibility that an access to the static variable fetches a value that was set on a previous execution of the function.
I've added some simple modelling of the correct behavior to the IR. For each static local variable that has a dynamic initializer, we synthesize a (static) `bool` variable to hold whether the initializer for the original variable has executed. When executing a declaration, we check the value of the synthesized variable, and skip the initialization code if it is `true`. If it is `false`, we execute the initialization code as before, and then set the flag to `true`. This doesn't capture the thread-safe nature of static initialization, but I think it's more than enough to handle anything we're likely to care about for the foreseeable future.
In `TranslatedDeclarationEntry.qll`, I split the translation of a static local variable declaration into two `TranslatedElement`s: one for the declaration itself, and one for the initialization. The declaration part handles the checking and setting of the flag; the initialization just does the initialization as before.
I've added an IR test case that has static variables with constant, zero, and dynamic initialization. I've also verified the new IR generated for @jbj's previous test cases for constant initialization.
I inverted the sense of the `hasConstantInitialization()` predicate to be `hasDynamicInitialization()`. Mostly this just made more sense to me, but I think it also fixed a potential bug where `hasConstantInitialization()` would not hold for a zero-initialized variable. Technically, constant initialization isn't the same as zero initialization, but I believe that most code really cares about the distinction between dynamic initialization and static initialization, where static initialization includes both constant and zero initialization.
I've fixed up the C# side of IR generation to continue working, but it doesn't use any of the dynamic initialization stuff. In theory, it could use something similar to model the initialization of static fields.
Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple existing test functions in `ssa.cpp`.
We were hitting a combinatorial explosion in `hasDefinitionAtRank` for functions that contain a large number of string literals. The problem was that every `Chi` instruction for `AliasedVirtualVariable` was treated as a definition of every string literal. We already mark string literals as `isReadOnly()`, but we were allowing `AliasedVirtualVariable` to define read-only locations so that the `AliasedDefinition` instruction would provide the initial definition for all string literals.
To fix this, I've introduced the new `InitializeNonLocal` instruction, which is inserted in the prologue of every function right after `AliasedDefinition`. It provides the initial definition for every non-stack memory location, including read-only locations, but is never written to anywhere else. It is the conterpart of the `AliasedUse` instruction in the function epilogue, which represents the use of all non-stack memory after the function returns. I considered renaming `AliasedUse` to `ReturnNonLocal`, to match the `InitializeXXX`/`ReturnXXX` pattern we already use for parameters and indirections, but held off to avoid unnecessary churn. Any thoughts on whether I should make this name change?
This change has a significant speedup in evaluation time for a few of our troublesome databases:
`attnam/ivan`: 13%
`awslabs/s2n`: 26%
`SinaMostafanejad/OpenRDM`: 7%
`zcoinofficial/zcoin`: 8%