mirror of
https://github.com/github/codeql.git
synced 2025-12-22 19:56:32 +01:00
262 lines
12 KiB
Plaintext
262 lines
12 KiB
Plaintext
/**
|
|
* This file provides the second phase of the `cpp/invalid-pointer-deref` query that identifies flow
|
|
* from the out-of-bounds pointer identified by the `AllocationToInvalidPointer.qll` library to
|
|
* a dereference of the out-of-bounds pointer.
|
|
*
|
|
* Consider the following snippet:
|
|
* ```cpp
|
|
* 1. char* base = (char*)malloc(size);
|
|
* 2. char* end = base + size;
|
|
* 3. for(int *p = base; p <= end; p++) {
|
|
* 4. use(*p);
|
|
* 5. }
|
|
* ```
|
|
* this file identifies the flow from `base + size` to `end`. We call `base + size` the "dereference source" and `end`
|
|
* the "dereference sink" (even though `end` is not actually dereferenced - it will be used to find the correct
|
|
* dereference eventually).
|
|
*
|
|
* Merely _constructing_ a pointer that's out-of-bounds is fine if the pointer is never dereferenced (in reality, the
|
|
* standard only guarentees that it's safe to move the pointer one element past the last element. But we ignore that
|
|
* here). So this step is about identifying which of those out-of-bounds pointers identified from step 1 that are
|
|
* actually being dereferenced. We do this using a regular dataflow configuration (see `InvalidPointerToDerefConfig`).
|
|
*
|
|
* This dataflow traversal defines the set of sources as any dataflow node that is non-strictly upper-bounded by the
|
|
* pointer-arithmetic instruction identified by `AllocationToInvalidPointer.qll`. (TODO: I'm pretty sure this is incorrect,
|
|
* and we should define the set of sources as anything that is non-strictly _lower_ bounded by the pointer-arithmetic
|
|
* instruction). That is, the set of sources is any dataflow node `source` such that `source.asInstruction <= pai + delta1`
|
|
* for some `delta1 >= 0`.
|
|
*
|
|
* The set of sinks is defined to be any address operand `addr` that is non-strictly upper-bounded by the sink. That is,
|
|
* any dataflow node `n` such that `addr <= sink.asInstruction() + delta2` for some `delta2`. We call the instruction that
|
|
* consumes the address operand the "operation".
|
|
*
|
|
* For example, consider the flow from `base + size` to `end` above. The sink is `end` on line 3 because that is a dataflow
|
|
* node whose underlying instruction non-strictly upper bounds the address operand `p` in `use(*p)`. The load attached to `*p`
|
|
* is the "operation". To ensure that the path makes intuitive sense, we only pick operations that are control-flow reachable
|
|
* from the dereference sink.
|
|
*
|
|
* To compute the amount of the dereference is away from the final entry of the allocation, we sum the two deltas `delta1` and
|
|
* `delta2`. This is done in the `operationIsOffBy` predicate (which is the only predicate exposed by this file).
|
|
*
|
|
* Handling false positives:
|
|
*
|
|
* Consider the following snippet:
|
|
* ```cpp
|
|
* 1. char *p = new char[size];
|
|
* 2. char *end = p + size; // $ alloc=L363
|
|
* 3. if (p < end) {
|
|
* 4. p += 1;
|
|
* 5. }
|
|
* 6. if (p < end) {
|
|
* 7. int val = *p; // GOOD
|
|
* 8. }
|
|
* ```
|
|
* this is safe because `p` is guarded to be strictly less than `end` on line 6 before the dereference on line 7. However, if we
|
|
* run the query on the above without further modifications we'd see an alert on line 7. This is because range analysis infers
|
|
* that `p <= end` after the increment on line 4, and thus the result of `p += 1` is seen as a valid dereference source. This
|
|
* node then flows to `p` on line 6 (which is a valid dereference sink since it non-strictly upper bounds an address operand), and
|
|
* range analysis then infers that the address operand of `*p` (i.e., `p`) is non-strictly upper bounded by `p`, and thus reports
|
|
* an alert on line 7.
|
|
*
|
|
* In order to handle this false positive, we define a barrier that identifies guards such as `p < end` that ensures that a value
|
|
* is less than the pointer-arithmetic instruction that computed the invalid pointer. This is done in the `InvalidPointerToDerefBarrier`
|
|
* module. Since the node we're tracking isn't necessarily _equal_ to the pointer-arithmetic instruction, but rather satisfies
|
|
* `node.asInstruction() <= pai + delta`, we need to account for the delta when checking if a guard is sufficiently strong to infer
|
|
* that a future dereference is safe. To do this, we check that the guard guarantees that a node `n` satisfies `n < node + d` where
|
|
* `node` is a node we know is equal to the value of the dereference source (i.e., it satisfies `node.asInstruction() <= pai + delta`)
|
|
* and `d <= delta`. Combining this we have `n < node + d <= node + delta <= pai + 2*delta` (TODO: Oops. This math doesn't quite work
|
|
* out. This is because we need to redefine the `BarrierConfig` to start flow at the pointer-arithmetic instruction instead of at the
|
|
* dereference source. When combined with TODO above it's easy to show that this guard ensures that the dereference is safe).
|
|
*/
|
|
|
|
private import cpp
|
|
private import semmle.code.cpp.dataflow.new.DataFlow
|
|
private import semmle.code.cpp.ir.ValueNumbering
|
|
private import semmle.code.cpp.controlflow.IRGuards
|
|
private import AllocationToInvalidPointer as AllocToInvalidPointer
|
|
private import RangeAnalysisUtil
|
|
|
|
private module InvalidPointerToDerefBarrier {
|
|
private module BarrierConfig implements DataFlow::ConfigSig {
|
|
predicate isSource(DataFlow::Node source) {
|
|
// The sources is the same as in the sources for `InvalidPointerToDerefConfig`.
|
|
invalidPointerToDerefSource(_, _, source, _)
|
|
}
|
|
|
|
additional predicate isSink(
|
|
DataFlow::Node left, DataFlow::Node right, IRGuardCondition g, int state, boolean testIsTrue
|
|
) {
|
|
// The sink is any "large" side of a relational comparison.
|
|
g.comparesLt(left.asOperand(), right.asOperand(), state, true, testIsTrue)
|
|
}
|
|
|
|
predicate isSink(DataFlow::Node sink) { isSink(_, sink, _, _, _) }
|
|
}
|
|
|
|
private module BarrierFlow = DataFlow::Global<BarrierConfig>;
|
|
|
|
private int getInvalidPointerToDerefSourceDelta(DataFlow::Node node) {
|
|
exists(DataFlow::Node source |
|
|
BarrierFlow::flow(source, node) and
|
|
invalidPointerToDerefSource(_, _, source, result)
|
|
)
|
|
}
|
|
|
|
private predicate operandGuardChecks(
|
|
IRGuardCondition g, Operand left, Operand right, int state, boolean edge
|
|
) {
|
|
exists(DataFlow::Node nLeft, DataFlow::Node nRight, int state0 |
|
|
nRight.asOperand() = right and
|
|
nLeft.asOperand() = left and
|
|
BarrierConfig::isSink(nLeft, nRight, g, state0, edge) and
|
|
state = getInvalidPointerToDerefSourceDelta(nRight) and
|
|
state0 <= state
|
|
)
|
|
}
|
|
|
|
Instruction getABarrierInstruction(int state) {
|
|
exists(IRGuardCondition g, ValueNumber value, Operand use, boolean edge |
|
|
use = value.getAUse() and
|
|
operandGuardChecks(pragma[only_bind_into](g), pragma[only_bind_into](use), _, state,
|
|
pragma[only_bind_into](edge)) and
|
|
result = value.getAnInstruction() and
|
|
g.controls(result.getBlock(), edge)
|
|
)
|
|
}
|
|
|
|
DataFlow::Node getABarrierNode() { result.asOperand() = getABarrierInstruction(_).getAUse() }
|
|
|
|
pragma[nomagic]
|
|
IRBlock getABarrierBlock(int state) { result.getAnInstruction() = getABarrierInstruction(state) }
|
|
}
|
|
|
|
/**
|
|
* A configuration to track flow from a pointer-arithmetic operation found
|
|
* by `AllocToInvalidPointerConfig` to a dereference of the pointer.
|
|
*/
|
|
private module InvalidPointerToDerefConfig implements DataFlow::ConfigSig {
|
|
predicate isSource(DataFlow::Node source) { invalidPointerToDerefSource(_, _, source, _) }
|
|
|
|
pragma[inline]
|
|
predicate isSink(DataFlow::Node sink) { isInvalidPointerDerefSink(sink, _, _, _) }
|
|
|
|
predicate isBarrier(DataFlow::Node node) {
|
|
node = any(DataFlow::SsaPhiNode phi | not phi.isPhiRead()).getAnInput(true)
|
|
or
|
|
node = InvalidPointerToDerefBarrier::getABarrierNode()
|
|
}
|
|
}
|
|
|
|
private import DataFlow::Global<InvalidPointerToDerefConfig>
|
|
|
|
/**
|
|
* Holds if `source1` is dataflow node that represents an allocation that flows to the
|
|
* left-hand side of the pointer-arithmetic `pai`, and `derefSource` is a dataflow node with
|
|
* a pointer-value that is non-strictly upper bounded by `pai + delta`.
|
|
*
|
|
* For example, if `pai` is a pointer-arithmetic operation `p + size` in an expression such
|
|
* as `(p + size) + 1` and `derefSource` is the node representing `(p + size) + 1`. In this
|
|
* case `delta` is 1.
|
|
*/
|
|
private predicate invalidPointerToDerefSource(
|
|
DataFlow::Node source1, PointerArithmeticInstruction pai, DataFlow::Node derefSource, int delta
|
|
) {
|
|
exists(int delta0 |
|
|
// Note that `delta` is not necessarily equal to `delta0`:
|
|
// `delta0` is the constant offset added to the size of the allocation, and
|
|
// delta is the constant difference between the pointer-arithmetic instruction
|
|
// and the instruction computing the address for which we will search for a dereference.
|
|
AllocToInvalidPointer::pointerAddInstructionHasBounds(source1, pai, _, delta0) and
|
|
bounded2(derefSource.asInstruction(), pai, delta) and
|
|
delta >= 0 and
|
|
// TODO: This condition will go away once #13725 is merged, and then we can make `Barrier2`
|
|
// private to `AllocationToInvalidPointer.qll`.
|
|
not derefSource.getBasicBlock() = AllocToInvalidPointer::Barrier2::getABarrierBlock(delta0)
|
|
)
|
|
}
|
|
|
|
/**
|
|
* Holds if `sink` is a sink for `InvalidPointerToDerefConfig` and `i` is a `StoreInstruction` that
|
|
* writes to an address that non-strictly upper-bounds `sink`, or `i` is a `LoadInstruction` that
|
|
* reads from an address that non-strictly upper-bounds `sink`.
|
|
*/
|
|
pragma[inline]
|
|
private predicate isInvalidPointerDerefSink(
|
|
DataFlow::Node sink, Instruction i, string operation, int delta
|
|
) {
|
|
exists(AddressOperand addr, Instruction s, IRBlock b |
|
|
s = sink.asInstruction() and
|
|
bounded(addr.getDef(), s, delta) and
|
|
delta >= 0 and
|
|
i.getAnOperand() = addr and
|
|
b = i.getBlock() and
|
|
not b = InvalidPointerToDerefBarrier::getABarrierBlock(delta)
|
|
|
|
|
i instanceof StoreInstruction and
|
|
operation = "write"
|
|
or
|
|
i instanceof LoadInstruction and
|
|
operation = "read"
|
|
)
|
|
}
|
|
|
|
/**
|
|
* Yields any instruction that is control-flow reachable from `instr`.
|
|
*/
|
|
bindingset[instr, result]
|
|
pragma[inline_late]
|
|
private Instruction getASuccessor(Instruction instr) {
|
|
exists(IRBlock b, int instrIndex, int resultIndex |
|
|
b.getInstruction(instrIndex) = instr and
|
|
b.getInstruction(resultIndex) = result
|
|
|
|
|
resultIndex >= instrIndex
|
|
)
|
|
or
|
|
instr.getBlock().getASuccessor+() = result.getBlock()
|
|
}
|
|
|
|
private predicate paiForDereferenceSink(PointerArithmeticInstruction pai, DataFlow::Node derefSink) {
|
|
exists(DataFlow::Node derefSource |
|
|
invalidPointerToDerefSource(_, pai, derefSource, _) and
|
|
flow(derefSource, derefSink)
|
|
)
|
|
}
|
|
|
|
/**
|
|
* Holds if `derefSink` is a dataflow node that represents an out-of-bounds address that is about to
|
|
* be dereferenced by `operation` (which is either a `StoreInstruction` or `LoadInstruction`), and
|
|
* `pai` is the pointer-arithmetic operation that caused the `derefSink` to be out-of-bounds.
|
|
*/
|
|
private predicate derefSinkToOperation(
|
|
DataFlow::Node derefSink, PointerArithmeticInstruction pai, DataFlow::Node operation,
|
|
string description, int delta
|
|
) {
|
|
exists(Instruction i |
|
|
paiForDereferenceSink(pai, pragma[only_bind_into](derefSink)) and
|
|
isInvalidPointerDerefSink(derefSink, i, description, delta) and
|
|
i = getASuccessor(derefSink.asInstruction()) and
|
|
operation.asInstruction() = i
|
|
)
|
|
}
|
|
|
|
/**
|
|
* Holds if `allocation` is the result of an allocation that flows to the left-hand side of `pai`, and where
|
|
* the right-hand side of `pai` is an offset such that the result of `pai` points to an out-of-bounds pointer.
|
|
*
|
|
* Furthermore, `derefSource` is at least as large as `pai` and flows to `derefSink` before being dereferenced
|
|
* by `operation` (which is either a `StoreInstruction` or `LoadInstruction`). The result is that `operation`
|
|
* dereferences a pointer that's "off by `delta`" number of elements.
|
|
*/
|
|
predicate operationIsOffBy(
|
|
DataFlow::Node allocation, PointerArithmeticInstruction pai, DataFlow::Node derefSource,
|
|
DataFlow::Node derefSink, string description, DataFlow::Node operation, int delta
|
|
) {
|
|
exists(int deltaDerefSourceAndPai, int deltaDerefSinkAndDerefAddress |
|
|
invalidPointerToDerefSource(allocation, pai, derefSource, deltaDerefSourceAndPai) and
|
|
flow(derefSource, derefSink) and
|
|
derefSinkToOperation(derefSink, pai, operation, description, deltaDerefSinkAndDerefAddress) and
|
|
delta = deltaDerefSourceAndPai + deltaDerefSinkAndDerefAddress
|
|
)
|
|
}
|