diff --git a/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/AllocationToInvalidPointer.qll b/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/AllocationToInvalidPointer.qll index efa307527cb..43120e1c7f0 100644 --- a/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/AllocationToInvalidPointer.qll +++ b/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/AllocationToInvalidPointer.qll @@ -1,6 +1,54 @@ /** * This file provides the first phase of the `cpp/invalid-pointer-deref` query that identifies flow * from an allocation to a pointer-arithmetic instruction that constructs a pointer that is out of bounds. + * + * Consider the following snippet: + * ```cpp + * 1. char* base = (char*)malloc(size); + * 2. char* end = base + size; + * 3. for(int *p = base; p <= end; p++) { + * 4. use(*p); // BUG: Should have been bounded by `p < end`. + * 5. } + * ``` + * this file identifies the flow from `new int[size]` to `base + size`. + * + * This is done using the product-flow library. The configuration tracks flow from the pair + * `(allocation, size of allocation)` to a pair `(a, b)` where there exists a pointer-arithmetic instruction + * `pai = a + r` such that `b` is a dataflow node where `b <= r`. Because there will be a dataflow-path from + * `allocation` to `a` this means that the `pai` will compute a pointer that is some number of elements beyond + * the end position of the allocation. See `pointerAddInstructionHasBounds` for the implementation of this. + * + * In the above example, the pair `(a, b)` is `(base, size)` with `base` and `size` coming from the expression + * `base + size` on line 2, which is also the pointer-arithmetic instruction. In general, the pair does not necessarily + * correspond directly to the operands of the pointer-arithmetic instruction. + * In the following example, the pair is again `(base, size)`, but with `base` coming from line 3 and `size` from line 2, + * and the pointer-arithmetic instruction being `base + n` on line 3: + * ```cpp + * 1. int* base = new int[size]; + * 2. if(n <= size) { + * 3. int* end = base + n; + * 4. for(int* p = base; p <= end; ++p) { + * 5. *p = 0; // BUG: Should have been bounded by `p < end`. + * 6. } + * 7. } + * ``` + * + * Handling false positives: + * + * Consider a snippet such as: + * ```cpp + * 1. int* base = new int[size]; + * 2. int n = condition() ? size : 0; + * 3. if(n >= size) return; + * 4. int* end = base + n; + * 5. for(int* p = base; p <= end; ++p) { + * 6. *p = 0; // This is fine since `end < base + size` + * 7. } + * ``` + * In order to remove this false positive we define a barrier (see `SizeBarrier::SizeBarrierConfig`) that finds the + * possible guards that compares a value to the size of the allocation. In the above example, this is the `(n >= size)` + * guard on line 3. `SizeBarrier::getABarrierNode` then defines any node that is guarded by such a guard as a barrier + * in the dataflow configuration. */ private import cpp @@ -149,22 +197,6 @@ private module InterestingPointerAddInstruction { /** * A product-flow configuration for flow from an `(allocation, size)` pair to a * pointer-arithmetic operation `pai` such that `pai <= allocation + size`. - * - * The goal of this query is to find patterns such as: - * ```cpp - * 1. char* begin = (char*)malloc(size); - * 2. char* end = begin + size; - * 3. for(int *p = begin; p <= end; p++) { - * 4. use(*p); - * 5. } - * ``` - * - * We do this by splitting the task up into two configurations: - * 1. `AllocToInvalidPointerConfig` find flow from `malloc(size)` to `begin + size`, and - * 2. `InvalidPointerToDerefConfig` finds flow from `begin + size` to an `end` (on line 3). - * - * Finally, the range-analysis library will find a load from (or store to) an address that - * is non-strictly upper-bounded by `end` (which in this case is `*p`). */ private module Config implements ProductFlow::StateConfigSig { class FlowState1 = Unit; diff --git a/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/InvalidPointerToDereference.qll b/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/InvalidPointerToDereference.qll index d4fce2e20a8..b84bd819df5 100644 --- a/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/InvalidPointerToDereference.qll +++ b/cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/InvalidPointerToDereference.qll @@ -2,6 +2,75 @@ * This file provides the second phase of the `cpp/invalid-pointer-deref` query that identifies flow * from the out-of-bounds pointer identified by the `AllocationToInvalidPointer.qll` library to * a dereference of the out-of-bounds pointer. + * + * Consider the following snippet: + * ```cpp + * 1. char* base = (char*)malloc(size); + * 2. char* end = base + size; + * 3. for(char *p = base; p <= end; p++) { + * 4. use(*p); // BUG: Should have been bounded by `p < end`. + * 5. } + * ``` + * this file identifies the flow from `base + size` to `end`. We call `base + size` the "dereference source" and `end` + * the "dereference sink" (even though `end` is not actually dereferenced we will use this term because we will perform + * dataflow to find a use of a pointer `x` such that `x <= end` which is dereferenced. In the above example, `x` is `p` + * on line 4). + * + * Merely _constructing_ a pointer that's out-of-bounds is fine if the pointer is never dereferenced (in reality, the + * standard only guarantees that it is safe to move the pointer one element past the last element, but we ignore that + * here). So this step is about identifying which of the out-of-bounds pointers found by `pointerAddInstructionHasBounds` + * in `AllocationToInvalidPointer.qll` are actually being dereferenced. We do this using a regular dataflow + * configuration (see `InvalidPointerToDerefConfig`). + * + * The dataflow traversal defines the set of sources as any dataflow node `n` such that there exists a pointer-arithmetic + * instruction `pai` found by `AllocationToInvalidPointer.qll` and a `n.asInstruction() >= pai + deltaDerefSourceAndPai`. + * Here, `deltaDerefSourceAndPai` is the constant difference between the source we track for finding a dereference and the + * pointer-arithmetic instruction. + * + * The set of sinks is defined as any dataflow node `n` such that `addr <= n.asInstruction() + deltaDerefSinkAndDerefAddress` + * for some address operand `addr` and constant difference `deltaDerefSinkAndDerefAddress`. Since an address operand is + * always consumed by an instruction that performs a dereference this lets us identify a "bad dereference". We call the + * instruction that consumes the address operand the "operation". + * + * For example, consider the flow from `base + size` to `end` above. The sink is `end` on line 3 because + * `p <= end.asInstruction() + deltaDerefSinkAndDerefAddress`, where `p` is the address operand in `use(*p)` and + * `deltaDerefSinkAndDerefAddress >= 0`. The load attached to `*p` is the "operation". To ensure that the path makes + * intuitive sense, we only pick operations that are control-flow reachable from the dereference sink. + * + * To compute how many elements the dereference is beyond the end position of the allocation, we sum the two deltas + * `deltaDerefSourceAndPai` and `deltaDerefSinkAndDerefAddress`. This is done in the `operationIsOffBy` predicate + * (which is the only predicate exposed by this file). + * + * Handling false positives: + * + * Consider the following snippet: + * ```cpp + * 1. char *p = new char[size]; + * 2. char *end = p + size; + * 3. if (p < end) { + * 4. p += 1; + * 5. } + * 6. if (p < end) { + * 7. int val = *p; // GOOD + * 8. } + * ``` + * this is safe because `p` is guarded to be strictly less than `end` on line 6 before the dereference on line 7. However, if we + * run the query on the above without further modifications we would see an alert on line 7. This is because range analysis infers + * that `p <= end` after the increment on line 4, and thus the result of `p += 1` is seen as a valid dereference source. This + * node then flows to `p` on line 6 (which is a valid dereference sink since it non-strictly upper bounds an address operand), and + * range analysis then infers that the address operand of `*p` (i.e., `p`) is non-strictly upper bounded by `p`, and thus reports + * an alert on line 7. + * + * In order to handle the above false positive, we define a barrier that identifies guards such as `p < end` that ensures that a value + * is less than the pointer-arithmetic instruction that computed the invalid pointer. This is done in the `InvalidPointerToDerefBarrier` + * module. Since the node we are tracking is not necessarily _equal_ to the pointer-arithmetic instruction, but rather satisfies + * `node.asInstruction() <= pai + deltaDerefSourceAndPai`, we need to account for the delta when checking if a guard is sufficiently + * strong to infer that a future dereference is safe. To do this, we check that the guard guarantees that a node `n` satisfies + * `n < node + k` where `node` is a node we know is equal to the value of the dereference source (i.e., it satisfies + * `node.asInstruction() <= pai + deltaDerefSourceAndPai`) and `k <= deltaDerefSourceAndPai`. Combining this we have + * `n < node + k <= node + deltaDerefSourceAndPai <= pai + 2*deltaDerefSourceAndPai` (TODO: Oops. This math doesn't quite work out. + * I think this is because we need to redefine the `BarrierConfig` to start flow at the pointer-arithmetic instruction instead of + * at the dereference source. When combined with TODO above it's easy to show that this guard ensures that the dereference is safe). */ private import cpp diff --git a/cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql b/cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql index 93cac5939d4..f778b4f776f 100644 --- a/cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql +++ b/cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql @@ -15,6 +15,51 @@ * external/cwe/cwe-787 */ +/* + * High-level description of the query: + * + * The goal of this query is to identify issues such as: + * ```cpp + * 1. int* base = new int[size]; + * 2. int* end = base + size; + * 3. for(int* p = base; p <= end; ++p) { + * 4. *p = 0; // BUG: Should have been bounded by `p < end`. + * 5. } + * ``` + * In order to do this, we split the problem into three subtasks: + * 1. First, we find flow from `new int[size]` to `base + size`. + * 2. Then, we find flow from `base + size` to `end` (on line 3). + * 3. Finally, we use range-analysis to find a write to (or read from) a pointer that may be greater than or equal to `end`. + * + * Step 1 is implemented in `AllocationToInvalidPointer.qll`, and step 2 is implemented by + * `InvalidPointerToDereference.qll`. See those files for the description of these. + * + * This file imports both libraries and defines a final dataflow configuration that constructs the full path from + * the allocation to the dereference of the out-of-bounds pointer. This is done for several reasons: + * 1. It means the user is able to inspect the entire path from the allocation to the dereference, which can be useful + * to understand the problem highlighted. + * 2. It ensures that the call-contexts line up correctly when we transition from step 1 to step 2. See the + * `test_missing_call_context_1` and `test_missing_call_context_2` tests for how this may flag false positives + * without this final configuration. + * + * The source of the final path is an allocation that is: + * 1. identified as flowing to an invalid pointer (by `AllocationToInvalidPointer`), and + * 2. for which the invalid pointer flows to a dereference (as identified by `InvalidPointerToDereference`). + * + * The path can be described in 3 "chunks": + * 1. One path from the allocation to the construction of the invalid pointer + * 2. Another path from the construction of the invalid pointer to the final pointer that is about to be dereferenced. + * 3. Finally, a single step from the dataflow node that represents the final pointer to the dereference. + * + * Step 1 happens when the flow state is `TInitial`, and step 2 and 3 happen when the flow state is `TPointerArith(pai)` + * where the pointer-arithmetic instruction `pai` tracks the instruction that generated the out-of-bounds pointer. This + * instruction is used in the construction of the alert message. + * + * The set of pointer-arithmetic instructions that define the `TPointerArith` flow state is restricted to be the pointer- + * arithmetic instructions that both receive flow from the allocation (as identified by `AllocationToInvalidPointer.qll`), + * and further flow to a dereference (as identified by `InvalidPointerToDereference.qll`). + */ + import cpp import semmle.code.cpp.dataflow.new.DataFlow import semmle.code.cpp.ir.IR