mirror of
https://github.com/github/codeql.git
synced 2026-01-05 02:30:19 +01:00
Merge pull request #17665 from jketema/printir-doc
C++: Add some documentation on the printed IR
This commit is contained in:
@@ -6,6 +6,112 @@
|
||||
* uses, however, it is better to write a query that imports `PrintIR.qll`, extends
|
||||
* `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
|
||||
* to dump.
|
||||
*
|
||||
* Anatomy of a printed IR instruction
|
||||
*
|
||||
* An instruction:
|
||||
*
|
||||
* ```
|
||||
* # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
|
||||
* ```
|
||||
*
|
||||
* The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
|
||||
* Scrolling up in the printed output, one will eventually find the name of the file to which the line
|
||||
* belongs.
|
||||
*
|
||||
* `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
|
||||
* there should be no later uses of the result; see below for other possible values). The `2281_19` is a
|
||||
* unique ID for the result. This is usually just the line number plus a small integer suffix to make it
|
||||
* unique within the function. The type of the result is `void`. In this case, it is `void`, because
|
||||
* `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
|
||||
* but it will sometimes be a type like `glval<int>`, which means result holds a glvalue, which at the
|
||||
* IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
|
||||
* more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
|
||||
*
|
||||
* `Call` is the opcode of the instruction. Common opcodes include:
|
||||
*
|
||||
* * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
|
||||
* * Memory access operations: `Load`, `Store`.
|
||||
* * Function calls: `Call`.
|
||||
* * Literals: `Constant`.
|
||||
* * Variable addresses: `VariableAddress`.
|
||||
* * Function entry points: `EnterFunction`.
|
||||
* * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
|
||||
* `Store` to a special `#return` variable.
|
||||
* * Stack unwinding for C++ function that throw and where the exception escapes the function: `Unwind`.
|
||||
* * Common exit point for `Unwind` and `Return`: `ExitFunction`.
|
||||
* * SSA-related opcodes: `Phi`, `Chi`.
|
||||
*
|
||||
* `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
|
||||
* for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
|
||||
* is the name of the variable that loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
|
||||
* `VariableAddress`, the information between brackets does not occur earlier.
|
||||
*
|
||||
* `func:r2281_18` and `this:r28281_17` are the operands of the instruction. The `func:` prefix denotes the operand
|
||||
* that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
|
||||
* parameter of an instance member function. `r2281_18`, `r2281_17` are the unique IDs of the operands. Each of these
|
||||
* matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
|
||||
* "register" operands (see below).
|
||||
*
|
||||
* Result and operand kinds:
|
||||
*
|
||||
* Every result and operand is one of these three kinds:
|
||||
*
|
||||
* * `r` "register". These operands are not stored in any particular memory location. We can think of them as
|
||||
* temporary values created during the evaluation of an expression. A register operand almost always has one
|
||||
* use, often in the same block as its definition.
|
||||
* * `m` "memory". These operands represents accesses to a specific memory location. The location could be a
|
||||
* local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
|
||||
* to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load` or on the
|
||||
* SSA instructions (`Phi`, `Chi`).
|
||||
* * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
|
||||
* so we know that there is no actual value there.
|
||||
*
|
||||
* Branches in the IR:
|
||||
*
|
||||
* The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
|
||||
* control flow successors of the block.
|
||||
*
|
||||
* ```
|
||||
* # 44| v44_3(void) = ConditionalBranch : r44_2
|
||||
* #-----| False -> Block 4
|
||||
* #-----| True -> Block 3
|
||||
* ```
|
||||
* Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
|
||||
* on whether the condition is true or false.
|
||||
*
|
||||
* SSA instructions:
|
||||
*
|
||||
* We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
|
||||
* control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
|
||||
* and the resulting definition is then used wherever that variable is accessed later on.
|
||||
*
|
||||
* When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
|
||||
* or might not have been updated by a store, depending on the actual address that was written to. For example, take:
|
||||
*
|
||||
* ```cpp
|
||||
* int x = 5;
|
||||
* int y = 7;
|
||||
* int* p = condition ? &x : &y;
|
||||
* *p = 6;
|
||||
* return x;
|
||||
* ```
|
||||
*
|
||||
* At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
|
||||
* `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
|
||||
* because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
|
||||
*
|
||||
* ```
|
||||
* r2(int) = Constant[6]
|
||||
* r3(int*) = <<value of p>>
|
||||
* m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
|
||||
* m5(unknown) = Chi : total:m1, partial:m4
|
||||
* ```
|
||||
* The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
|
||||
* contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
|
||||
* `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
|
||||
* which parts of that memory were overwritten, but it does model that any of that memory could have been modified, so
|
||||
* that later instructions do not assume that the memory was unchanged.
|
||||
*/
|
||||
|
||||
private import internal.IRInternal
|
||||
|
||||
@@ -6,6 +6,112 @@
|
||||
* uses, however, it is better to write a query that imports `PrintIR.qll`, extends
|
||||
* `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
|
||||
* to dump.
|
||||
*
|
||||
* Anatomy of a printed IR instruction
|
||||
*
|
||||
* An instruction:
|
||||
*
|
||||
* ```
|
||||
* # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
|
||||
* ```
|
||||
*
|
||||
* The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
|
||||
* Scrolling up in the printed output, one will eventually find the name of the file to which the line
|
||||
* belongs.
|
||||
*
|
||||
* `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
|
||||
* there should be no later uses of the result; see below for other possible values). The `2281_19` is a
|
||||
* unique ID for the result. This is usually just the line number plus a small integer suffix to make it
|
||||
* unique within the function. The type of the result is `void`. In this case, it is `void`, because
|
||||
* `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
|
||||
* but it will sometimes be a type like `glval<int>`, which means result holds a glvalue, which at the
|
||||
* IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
|
||||
* more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
|
||||
*
|
||||
* `Call` is the opcode of the instruction. Common opcodes include:
|
||||
*
|
||||
* * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
|
||||
* * Memory access operations: `Load`, `Store`.
|
||||
* * Function calls: `Call`.
|
||||
* * Literals: `Constant`.
|
||||
* * Variable addresses: `VariableAddress`.
|
||||
* * Function entry points: `EnterFunction`.
|
||||
* * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
|
||||
* `Store` to a special `#return` variable.
|
||||
* * Stack unwinding for C++ function that throw and where the exception escapes the function: `Unwind`.
|
||||
* * Common exit point for `Unwind` and `Return`: `ExitFunction`.
|
||||
* * SSA-related opcodes: `Phi`, `Chi`.
|
||||
*
|
||||
* `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
|
||||
* for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
|
||||
* is the name of the variable that loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
|
||||
* `VariableAddress`, the information between brackets does not occur earlier.
|
||||
*
|
||||
* `func:r2281_18` and `this:r28281_17` are the operands of the instruction. The `func:` prefix denotes the operand
|
||||
* that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
|
||||
* parameter of an instance member function. `r2281_18`, `r2281_17` are the unique IDs of the operands. Each of these
|
||||
* matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
|
||||
* "register" operands (see below).
|
||||
*
|
||||
* Result and operand kinds:
|
||||
*
|
||||
* Every result and operand is one of these three kinds:
|
||||
*
|
||||
* * `r` "register". These operands are not stored in any particular memory location. We can think of them as
|
||||
* temporary values created during the evaluation of an expression. A register operand almost always has one
|
||||
* use, often in the same block as its definition.
|
||||
* * `m` "memory". These operands represents accesses to a specific memory location. The location could be a
|
||||
* local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
|
||||
* to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load` or on the
|
||||
* SSA instructions (`Phi`, `Chi`).
|
||||
* * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
|
||||
* so we know that there is no actual value there.
|
||||
*
|
||||
* Branches in the IR:
|
||||
*
|
||||
* The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
|
||||
* control flow successors of the block.
|
||||
*
|
||||
* ```
|
||||
* # 44| v44_3(void) = ConditionalBranch : r44_2
|
||||
* #-----| False -> Block 4
|
||||
* #-----| True -> Block 3
|
||||
* ```
|
||||
* Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
|
||||
* on whether the condition is true or false.
|
||||
*
|
||||
* SSA instructions:
|
||||
*
|
||||
* We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
|
||||
* control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
|
||||
* and the resulting definition is then used wherever that variable is accessed later on.
|
||||
*
|
||||
* When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
|
||||
* or might not have been updated by a store, depending on the actual address that was written to. For example, take:
|
||||
*
|
||||
* ```cpp
|
||||
* int x = 5;
|
||||
* int y = 7;
|
||||
* int* p = condition ? &x : &y;
|
||||
* *p = 6;
|
||||
* return x;
|
||||
* ```
|
||||
*
|
||||
* At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
|
||||
* `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
|
||||
* because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
|
||||
*
|
||||
* ```
|
||||
* r2(int) = Constant[6]
|
||||
* r3(int*) = <<value of p>>
|
||||
* m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
|
||||
* m5(unknown) = Chi : total:m1, partial:m4
|
||||
* ```
|
||||
* The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
|
||||
* contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
|
||||
* `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
|
||||
* which parts of that memory were overwritten, but it does model that any of that memory could have been modified, so
|
||||
* that later instructions do not assume that the memory was unchanged.
|
||||
*/
|
||||
|
||||
private import internal.IRInternal
|
||||
|
||||
@@ -6,6 +6,112 @@
|
||||
* uses, however, it is better to write a query that imports `PrintIR.qll`, extends
|
||||
* `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
|
||||
* to dump.
|
||||
*
|
||||
* Anatomy of a printed IR instruction
|
||||
*
|
||||
* An instruction:
|
||||
*
|
||||
* ```
|
||||
* # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
|
||||
* ```
|
||||
*
|
||||
* The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
|
||||
* Scrolling up in the printed output, one will eventually find the name of the file to which the line
|
||||
* belongs.
|
||||
*
|
||||
* `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
|
||||
* there should be no later uses of the result; see below for other possible values). The `2281_19` is a
|
||||
* unique ID for the result. This is usually just the line number plus a small integer suffix to make it
|
||||
* unique within the function. The type of the result is `void`. In this case, it is `void`, because
|
||||
* `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
|
||||
* but it will sometimes be a type like `glval<int>`, which means result holds a glvalue, which at the
|
||||
* IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
|
||||
* more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
|
||||
*
|
||||
* `Call` is the opcode of the instruction. Common opcodes include:
|
||||
*
|
||||
* * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
|
||||
* * Memory access operations: `Load`, `Store`.
|
||||
* * Function calls: `Call`.
|
||||
* * Literals: `Constant`.
|
||||
* * Variable addresses: `VariableAddress`.
|
||||
* * Function entry points: `EnterFunction`.
|
||||
* * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
|
||||
* `Store` to a special `#return` variable.
|
||||
* * Stack unwinding for C++ function that throw and where the exception escapes the function: `Unwind`.
|
||||
* * Common exit point for `Unwind` and `Return`: `ExitFunction`.
|
||||
* * SSA-related opcodes: `Phi`, `Chi`.
|
||||
*
|
||||
* `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
|
||||
* for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
|
||||
* is the name of the variable that loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
|
||||
* `VariableAddress`, the information between brackets does not occur earlier.
|
||||
*
|
||||
* `func:r2281_18` and `this:r28281_17` are the operands of the instruction. The `func:` prefix denotes the operand
|
||||
* that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
|
||||
* parameter of an instance member function. `r2281_18`, `r2281_17` are the unique IDs of the operands. Each of these
|
||||
* matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
|
||||
* "register" operands (see below).
|
||||
*
|
||||
* Result and operand kinds:
|
||||
*
|
||||
* Every result and operand is one of these three kinds:
|
||||
*
|
||||
* * `r` "register". These operands are not stored in any particular memory location. We can think of them as
|
||||
* temporary values created during the evaluation of an expression. A register operand almost always has one
|
||||
* use, often in the same block as its definition.
|
||||
* * `m` "memory". These operands represents accesses to a specific memory location. The location could be a
|
||||
* local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
|
||||
* to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load` or on the
|
||||
* SSA instructions (`Phi`, `Chi`).
|
||||
* * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
|
||||
* so we know that there is no actual value there.
|
||||
*
|
||||
* Branches in the IR:
|
||||
*
|
||||
* The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
|
||||
* control flow successors of the block.
|
||||
*
|
||||
* ```
|
||||
* # 44| v44_3(void) = ConditionalBranch : r44_2
|
||||
* #-----| False -> Block 4
|
||||
* #-----| True -> Block 3
|
||||
* ```
|
||||
* Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
|
||||
* on whether the condition is true or false.
|
||||
*
|
||||
* SSA instructions:
|
||||
*
|
||||
* We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
|
||||
* control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
|
||||
* and the resulting definition is then used wherever that variable is accessed later on.
|
||||
*
|
||||
* When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
|
||||
* or might not have been updated by a store, depending on the actual address that was written to. For example, take:
|
||||
*
|
||||
* ```cpp
|
||||
* int x = 5;
|
||||
* int y = 7;
|
||||
* int* p = condition ? &x : &y;
|
||||
* *p = 6;
|
||||
* return x;
|
||||
* ```
|
||||
*
|
||||
* At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
|
||||
* `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
|
||||
* because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
|
||||
*
|
||||
* ```
|
||||
* r2(int) = Constant[6]
|
||||
* r3(int*) = <<value of p>>
|
||||
* m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
|
||||
* m5(unknown) = Chi : total:m1, partial:m4
|
||||
* ```
|
||||
* The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
|
||||
* contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
|
||||
* `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
|
||||
* which parts of that memory were overwritten, but it does model that any of that memory could have been modified, so
|
||||
* that later instructions do not assume that the memory was unchanged.
|
||||
*/
|
||||
|
||||
private import internal.IRInternal
|
||||
|
||||
Reference in New Issue
Block a user