C++: Model dynamic initialization of static local variables in IR

Previously, the IR for the initialization of a static local variable ran the initialization unconditionally, every time the declaration was reached during execution. This means that we don't model the possibility that an access to the static variable fetches a value that was set on a previous execution of the function.

I've added some simple modelling of the correct behavior to the IR. For each static local variable that has a dynamic initializer, we synthesize a (static) `bool` variable to hold whether the initializer for the original variable has executed. When executing a declaration, we check the value of the synthesized variable, and skip the initialization code if it is `true`. If it is `false`, we execute the initialization code as before, and then set the flag to `true`. This doesn't capture the thread-safe nature of static initialization, but I think it's more than enough to handle anything we're likely to care about for the foreseeable future.

In `TranslatedDeclarationEntry.qll`, I split the translation of a static local variable declaration into two `TranslatedElement`s: one for the declaration itself, and one for the initialization. The declaration part handles the checking and setting of the flag; the initialization just does the initialization as before.

I've added an IR test case that has static variables with constant, zero, and dynamic initialization. I've also verified the new IR generated for @jbj's previous test cases for constant initialization.

I inverted the sense of the `hasConstantInitialization()` predicate to be `hasDynamicInitialization()`. Mostly this just made more sense to me, but I think it also fixed a potential bug where `hasConstantInitialization()` would not hold for a zero-initialized variable. Technically, constant initialization isn't the same as zero initialization, but I believe that most code really cares about the distinction between dynamic initialization and static initialization, where static initialization includes both constant and zero initialization.

I've fixed up the C# side of IR generation to continue working, but it doesn't use any of the dynamic initialization stuff. In theory, it could use something similar to model the initialization of static fields.
This commit is contained in:
Dave Bartolomeo
2020-03-12 18:29:16 -04:00
parent 66fd566b66
commit 1526400a81
18 changed files with 517 additions and 74 deletions

View File

@@ -8303,6 +8303,70 @@ ir.cpp:
# 1219| Type = [IntType] int
# 1219| ValueCategory = prvalue(load)
# 1220| 4: [ReturnStmt] return ...
# 1222| [TopLevelFunction] int staticLocalInit(int)
# 1222| params:
# 1222| 0: [Parameter] x
# 1222| Type = [IntType] int
# 1222| body: [Block] { ... }
# 1223| 0: [DeclStmt] declaration
# 1223| 0: [VariableDeclarationEntry] definition of a
# 1223| Type = [IntType] int
# 1223| init: [Initializer] initializer for a
# 1223| expr: [Literal] 0
# 1223| Type = [IntType] int
# 1223| Value = [Literal] 0
# 1223| ValueCategory = prvalue
# 1224| 1: [DeclStmt] declaration
# 1224| 0: [VariableDeclarationEntry] definition of b
# 1224| Type = [IntType] int
# 1224| init: [Initializer] initializer for b
# 1224| expr: [CStyleCast] (int)...
# 1224| Conversion = [IntegralConversion] integral conversion
# 1224| Type = [IntType] int
# 1224| Value = [CStyleCast] 4
# 1224| ValueCategory = prvalue
# 1224| expr: [SizeofExprOperator] sizeof(<expr>)
# 1224| Type = [LongType] unsigned long
# 1224| Value = [SizeofExprOperator] 4
# 1224| ValueCategory = prvalue
# 1224| 0: [ParenthesisExpr] (...)
# 1224| Type = [IntType] int
# 1224| ValueCategory = lvalue
# 1224| expr: [VariableAccess] x
# 1224| Type = [IntType] int
# 1224| ValueCategory = lvalue
# 1225| 2: [DeclStmt] declaration
# 1225| 0: [VariableDeclarationEntry] definition of c
# 1225| Type = [IntType] int
# 1225| init: [Initializer] initializer for c
# 1225| expr: [VariableAccess] x
# 1225| Type = [IntType] int
# 1225| ValueCategory = prvalue(load)
# 1226| 3: [DeclStmt] declaration
# 1226| 0: [VariableDeclarationEntry] definition of d
# 1226| Type = [IntType] int
# 1228| 4: [ReturnStmt] return ...
# 1228| 0: [AddExpr] ... + ...
# 1228| Type = [IntType] int
# 1228| ValueCategory = prvalue
# 1228| 0: [AddExpr] ... + ...
# 1228| Type = [IntType] int
# 1228| ValueCategory = prvalue
# 1228| 0: [AddExpr] ... + ...
# 1228| Type = [IntType] int
# 1228| ValueCategory = prvalue
# 1228| 0: [VariableAccess] a
# 1228| Type = [IntType] int
# 1228| ValueCategory = prvalue(load)
# 1228| 1: [VariableAccess] b
# 1228| Type = [IntType] int
# 1228| ValueCategory = prvalue(load)
# 1228| 1: [VariableAccess] c
# 1228| Type = [IntType] int
# 1228| ValueCategory = prvalue(load)
# 1228| 1: [VariableAccess] d
# 1228| Type = [IntType] int
# 1228| ValueCategory = prvalue(load)
perf-regression.cpp:
# 4| [CopyAssignmentOperator] Big& Big::operator=(Big const&)
# 4| params:

View File

@@ -1219,4 +1219,13 @@ void switch2Case_default(int x) {
int z = y;
}
int staticLocalInit(int x) {
static int a = 0; // Constant initialization
static int b = sizeof(x); // Constant initialization
static int c = x; // Dynamic initialization
static int d; // Zero initialization
return a + b + c + d;
}
// semmle-extractor-options: -std=c++17 --clang

View File

@@ -6228,6 +6228,49 @@ ir.cpp:
# 1205| v1205_9(void) = AliasedUse : ~mu1205_4
# 1205| v1205_10(void) = ExitFunction :
# 1222| int staticLocalInit(int)
# 1222| Block 0
# 1222| v1222_1(void) = EnterFunction :
# 1222| mu1222_2(unknown) = AliasedDefinition :
# 1222| mu1222_3(unknown) = InitializeNonLocal :
# 1222| mu1222_4(unknown) = UnmodeledDefinition :
# 1222| r1222_5(glval<int>) = VariableAddress[x] :
# 1222| mu1222_6(int) = InitializeParameter[x] : &:r1222_5
# 1225| r1225_1(glval<bool>) = VariableAddress[c#init] :
# 1225| r1225_2(bool) = Load : &:r1225_1, ~mu1222_4
# 1225| v1225_3(void) = ConditionalBranch : r1225_2
#-----| False -> Block 2
#-----| True -> Block 1
# 1228| Block 1
# 1228| r1228_1(glval<int>) = VariableAddress[#return] :
# 1228| r1228_2(glval<int>) = VariableAddress[a] :
# 1228| r1228_3(int) = Load : &:r1228_2, ~mu1222_4
# 1228| r1228_4(glval<int>) = VariableAddress[b] :
# 1228| r1228_5(int) = Load : &:r1228_4, ~mu1222_4
# 1228| r1228_6(int) = Add : r1228_3, r1228_5
# 1228| r1228_7(glval<int>) = VariableAddress[c] :
# 1228| r1228_8(int) = Load : &:r1228_7, ~mu1222_4
# 1228| r1228_9(int) = Add : r1228_6, r1228_8
# 1228| r1228_10(glval<int>) = VariableAddress[d] :
# 1228| r1228_11(int) = Load : &:r1228_10, ~mu1222_4
# 1228| r1228_12(int) = Add : r1228_9, r1228_11
# 1228| mu1228_13(int) = Store : &:r1228_1, r1228_12
# 1222| r1222_7(glval<int>) = VariableAddress[#return] :
# 1222| v1222_8(void) = ReturnValue : &:r1222_7, ~mu1222_4
# 1222| v1222_9(void) = UnmodeledUse : mu*
# 1222| v1222_10(void) = AliasedUse : ~mu1222_4
# 1222| v1222_11(void) = ExitFunction :
# 1225| Block 2
# 1225| r1225_4(glval<int>) = VariableAddress[c] :
# 1225| r1225_5(glval<int>) = VariableAddress[x] :
# 1225| r1225_6(int) = Load : &:r1225_5, ~mu1222_4
# 1225| mu1225_7(int) = Store : &:r1225_4, r1225_6
# 1225| r1225_8(bool) = Constant[1] :
# 1225| mu1225_9(bool) = Store : &:r1225_1, r1225_8
#-----| Goto -> Block 1
perf-regression.cpp:
# 6| void Big::Big()
# 6| Block 0
@@ -6301,8 +6344,6 @@ struct_init.cpp:
# 20| mu20_2(unknown) = AliasedDefinition :
# 20| mu20_3(unknown) = InitializeNonLocal :
# 20| mu20_4(unknown) = UnmodeledDefinition :
# 21| r21_1(glval<Info[2]>) = VariableAddress[static_infos] :
# 21| mu21_2(Info[2]) = Uninitialized[static_infos] : &:r21_1
# 25| r25_1(glval<unknown>) = FunctionAddress[let_info_escape] :
# 25| r25_2(glval<Info[2]>) = VariableAddress[static_infos] :
# 25| r25_3(Info *) = Convert : r25_2
@@ -6358,45 +6399,57 @@ struct_init.cpp:
# 36| void declare_static_runtime_infos(char const*)
# 36| Block 0
# 36| v36_1(void) = EnterFunction :
# 36| mu36_2(unknown) = AliasedDefinition :
# 36| mu36_3(unknown) = InitializeNonLocal :
# 36| mu36_4(unknown) = UnmodeledDefinition :
# 36| r36_5(glval<char *>) = VariableAddress[name1] :
# 36| mu36_6(char *) = InitializeParameter[name1] : &:r36_5
# 36| r36_7(char *) = Load : &:r36_5, ~mu36_6
# 36| mu36_8(unknown) = InitializeIndirection[name1] : &:r36_7
# 37| r37_1(glval<Info[2]>) = VariableAddress[static_infos] :
# 37| mu37_2(Info[2]) = Uninitialized[static_infos] : &:r37_1
# 37| r37_3(int) = Constant[0] :
# 37| r37_4(glval<Info>) = PointerAdd[16] : r37_1, r37_3
# 38| r38_1(glval<char *>) = FieldAddress[name] : r37_4
# 38| r38_2(glval<char *>) = VariableAddress[name1] :
# 38| r38_3(char *) = Load : &:r38_2, ~mu36_4
# 38| mu38_4(char *) = Store : &:r38_1, r38_3
# 38| r38_5(glval<..(*)(..)>) = FieldAddress[handler] : r37_4
# 38| r38_6(..(*)(..)) = FunctionAddress[handler1] :
# 38| mu38_7(..(*)(..)) = Store : &:r38_5, r38_6
# 37| r37_5(int) = Constant[1] :
# 37| r37_6(glval<Info>) = PointerAdd[16] : r37_1, r37_5
# 39| r39_1(glval<char *>) = FieldAddress[name] : r37_6
# 39| r39_2(glval<char[2]>) = StringConstant["2"] :
# 39| r39_3(char *) = Convert : r39_2
# 39| mu39_4(char *) = Store : &:r39_1, r39_3
# 39| r39_5(glval<..(*)(..)>) = FieldAddress[handler] : r37_6
# 39| r39_6(glval<..()(..)>) = FunctionAddress[handler2] :
# 39| r39_7(..(*)(..)) = CopyValue : r39_6
# 39| mu39_8(..(*)(..)) = Store : &:r39_5, r39_7
# 41| r41_1(glval<unknown>) = FunctionAddress[let_info_escape] :
# 41| r41_2(glval<Info[2]>) = VariableAddress[static_infos] :
# 41| r41_3(Info *) = Convert : r41_2
# 41| v41_4(void) = Call : func:r41_1, 0:r41_3
# 41| mu41_5(unknown) = ^CallSideEffect : ~mu36_4
# 41| v41_6(void) = ^BufferReadSideEffect[0] : &:r41_3, ~mu36_4
# 41| mu41_7(unknown) = ^BufferMayWriteSideEffect[0] : &:r41_3
# 42| v42_1(void) = NoOp :
# 36| v36_9(void) = ReturnIndirection : &:r36_7, ~mu36_4
# 36| v36_10(void) = ReturnVoid :
# 36| v36_11(void) = UnmodeledUse : mu*
# 36| v36_12(void) = AliasedUse : ~mu36_4
# 36| v36_13(void) = ExitFunction :
# 36| v36_1(void) = EnterFunction :
# 36| mu36_2(unknown) = AliasedDefinition :
# 36| mu36_3(unknown) = InitializeNonLocal :
# 36| mu36_4(unknown) = UnmodeledDefinition :
# 36| r36_5(glval<char *>) = VariableAddress[name1] :
# 36| mu36_6(char *) = InitializeParameter[name1] : &:r36_5
# 36| r36_7(char *) = Load : &:r36_5, ~mu36_6
# 36| mu36_8(unknown) = InitializeIndirection[name1] : &:r36_7
# 37| r37_1(glval<bool>) = VariableAddress[static_infos#init] :
# 37| r37_2(bool) = Load : &:r37_1, ~mu36_4
# 37| v37_3(void) = ConditionalBranch : r37_2
#-----| False -> Block 2
#-----| True -> Block 1
# 41| Block 1
# 41| r41_1(glval<unknown>) = FunctionAddress[let_info_escape] :
# 41| r41_2(glval<Info[2]>) = VariableAddress[static_infos] :
# 41| r41_3(Info *) = Convert : r41_2
# 41| v41_4(void) = Call : func:r41_1, 0:r41_3
# 41| mu41_5(unknown) = ^CallSideEffect : ~mu36_4
# 41| v41_6(void) = ^BufferReadSideEffect[0] : &:r41_3, ~mu36_4
# 41| mu41_7(unknown) = ^BufferMayWriteSideEffect[0] : &:r41_3
# 42| v42_1(void) = NoOp :
# 36| v36_9(void) = ReturnIndirection : &:r36_7, ~mu36_4
# 36| v36_10(void) = ReturnVoid :
# 36| v36_11(void) = UnmodeledUse : mu*
# 36| v36_12(void) = AliasedUse : ~mu36_4
# 36| v36_13(void) = ExitFunction :
# 37| Block 2
# 37| r37_4(glval<Info[2]>) = VariableAddress[static_infos] :
# 37| mu37_5(Info[2]) = Uninitialized[static_infos] : &:r37_4
# 37| r37_6(int) = Constant[0] :
# 37| r37_7(glval<Info>) = PointerAdd[16] : r37_4, r37_6
# 38| r38_1(glval<char *>) = FieldAddress[name] : r37_7
# 38| r38_2(glval<char *>) = VariableAddress[name1] :
# 38| r38_3(char *) = Load : &:r38_2, ~mu36_4
# 38| mu38_4(char *) = Store : &:r38_1, r38_3
# 38| r38_5(glval<..(*)(..)>) = FieldAddress[handler] : r37_7
# 38| r38_6(..(*)(..)) = FunctionAddress[handler1] :
# 38| mu38_7(..(*)(..)) = Store : &:r38_5, r38_6
# 37| r37_8(int) = Constant[1] :
# 37| r37_9(glval<Info>) = PointerAdd[16] : r37_4, r37_8
# 39| r39_1(glval<char *>) = FieldAddress[name] : r37_9
# 39| r39_2(glval<char[2]>) = StringConstant["2"] :
# 39| r39_3(char *) = Convert : r39_2
# 39| mu39_4(char *) = Store : &:r39_1, r39_3
# 39| r39_5(glval<..(*)(..)>) = FieldAddress[handler] : r37_9
# 39| r39_6(glval<..()(..)>) = FunctionAddress[handler2] :
# 39| r39_7(..(*)(..)) = CopyValue : r39_6
# 39| mu39_8(..(*)(..)) = Store : &:r39_5, r39_7
# 37| r37_10(bool) = Constant[1] :
# 37| mu37_11(bool) = Store : &:r37_1, r37_10
#-----| Goto -> Block 1