mirror of
https://github.com/hohn/codeql-dataflow-sql-injection.git
synced 2025-12-16 10:13:04 +01:00
move isAdditionalTaintStep explanation to Taint Flow Configuration
This commit is contained in:
committed by
=Michael Hohn
parent
999f665ceb
commit
ab99c0fb44
@@ -298,6 +298,8 @@ VS Code) are good ways explore the libraries.
|
|||||||
|
|
||||||
|
|
||||||
### Codeql Recap
|
### Codeql Recap
|
||||||
|
XX:
|
||||||
|
|
||||||
As quick test of your setup, import the ql cpp library and run the empty query
|
As quick test of your setup, import the ql cpp library and run the empty query
|
||||||
```ql
|
```ql
|
||||||
import cpp
|
import cpp
|
||||||
@@ -307,7 +309,6 @@ select 1
|
|||||||
We'll assume the `import cpp` is in the header of our query and not rewrite it
|
We'll assume the `import cpp` is in the header of our query and not rewrite it
|
||||||
every time.
|
every time.
|
||||||
|
|
||||||
XX:
|
|
||||||
|
|
||||||
### The Data Sink
|
### The Data Sink
|
||||||
Now let's find the function `sqlite3_exec`. In CodeQL, this uses `Function`
|
Now let's find the function `sqlite3_exec`. In CodeQL, this uses `Function`
|
||||||
@@ -369,10 +370,10 @@ select read, buf
|
|||||||
|
|
||||||
### The Extra Flow Step
|
### The Extra Flow Step
|
||||||
The codeql data flow library traverses *visible* source code fairly well, but flow
|
The codeql data flow library traverses *visible* source code fairly well, but flow
|
||||||
through opaque functions requires additional support. Functions for which only a
|
through opaque functions requires additional support (more on this later).
|
||||||
headers is available are opaque, and we have one of these here: the call to
|
Functions for which only a headers is available are opaque, and we have one of
|
||||||
`snprintf`. Once we get this call, there are *two* nodes to identify: the inflow
|
these here: the call to `snprintf`. Once we locate this call, there are *two* nodes
|
||||||
and outflow.
|
to identify: the inflow and outflow.
|
||||||
|
|
||||||
Let's start with `snprintf`. If we try
|
Let's start with `snprintf`. If we try
|
||||||
```ql
|
```ql
|
||||||
@@ -450,8 +451,20 @@ These are done next.
|
|||||||
### Taint Flow Configuration
|
### Taint Flow Configuration
|
||||||
The way we configure global taint flow is by creating a custom extension of the
|
The way we configure global taint flow is by creating a custom extension of the
|
||||||
`TaintTracking::Configuration` class, and speciyfing `isSource`, `isSink`, and
|
`TaintTracking::Configuration` class, and speciyfing `isSource`, `isSink`, and
|
||||||
`isAdditionalTaintStep` predicates. A starting configuration can look like the
|
`isAdditionalTaintStep` predicates.
|
||||||
following, with details to follow.
|
|
||||||
|
The sources and sinks were explained earlier. Data flow and taint tracking
|
||||||
|
configuration classes support a number of additional features that help configure
|
||||||
|
the process of building and exploring the data flow path.
|
||||||
|
|
||||||
|
One such feature is adding additional taint steps. This is useful if you use
|
||||||
|
libraries which are not modelled by the default taint tracking. You can implement
|
||||||
|
this by overriding `isAdditionalTaintStep` predicate. This has two parameters, the
|
||||||
|
`from` and the `to` node, and essentially allows you to add extra edges into the
|
||||||
|
taint tracking or data flow graph.
|
||||||
|
|
||||||
|
A starting configuration can look like the following, with details to be filled
|
||||||
|
in.
|
||||||
|
|
||||||
```ql
|
```ql
|
||||||
class SqliFlowConfig extends TaintTracking::Configuration {
|
class SqliFlowConfig extends TaintTracking::Configuration {
|
||||||
@@ -594,13 +607,12 @@ predicate isSink(DataFlow::Node sink) {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### The Data Source
|
### The isSource Predicate
|
||||||
|
Recall that the external data enters through the `buf` argument to the call
|
||||||
The external data enters through the call
|
|
||||||
|
|
||||||
count = read(STDIN_FILENO, buf, BUFSIZE);
|
count = read(STDIN_FILENO, buf, BUFSIZE);
|
||||||
|
|
||||||
We thus want the `buf` argument to the call of the `read` function. Together, this is
|
and we got this via the query
|
||||||
|
|
||||||
```ql
|
```ql
|
||||||
from FunctionCall read, Expr buf
|
from FunctionCall read, Expr buf
|
||||||
@@ -610,43 +622,51 @@ where
|
|||||||
select read, buf
|
select read, buf
|
||||||
```
|
```
|
||||||
|
|
||||||
|
As for the `isSink` predicate in the previous section, we need to convert this to
|
||||||
|
a predicate of a single argument, `predicate isSource(DataFlow::Node source)`.
|
||||||
|
Following the same steps, we introduce a `DataFlow::Node` and an `exists()`:
|
||||||
|
|
||||||
|
```ql
|
||||||
|
import cpp
|
||||||
|
import semmle.code.cpp.dataflow.TaintTracking
|
||||||
|
|
||||||
|
from DataFlow::Node source
|
||||||
|
where
|
||||||
|
exists(FunctionCall read |
|
||||||
|
read.getTarget().getName() = "read" and
|
||||||
|
read.getArgument(1) = source.asExpr()
|
||||||
|
)
|
||||||
|
select source
|
||||||
|
```
|
||||||
|
|
||||||
|
There is one more adjustment needed for this to work. The `buf` argument is both
|
||||||
|
read by and written to by the `snprintf` function call. Because we are specifying
|
||||||
|
it as a *source*, the value of interest is the value *after* the call. We get
|
||||||
|
this value by
|
||||||
|
[casting](https://help.semmle.com/QL/ql-handbook/expressions.html#casts) to the
|
||||||
|
post-update node selector. Instead of `source.asExpr()`, we use
|
||||||
|
`source.(DataFlow::PostUpdateNode).getPreUpdateNode()`
|
||||||
|
|
||||||
|
|
||||||
|
Last, we incorporate this into a predicate:
|
||||||
|
|
||||||
|
```ql
|
||||||
|
predicate isSource(DataFlow::Node source) {
|
||||||
|
// count = read(STDIN_FILENO, buf, BUFSIZE);
|
||||||
|
exists(FunctionCall read |
|
||||||
|
read.getTarget().getName() = "read" and
|
||||||
|
read.getArgument(1) = source.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr()
|
||||||
|
)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
If you quick-eval this predicate, you will see that `source` is now `ref arg buf`
|
||||||
|
instead of `buf`.
|
||||||
|
|
||||||
|
|
||||||
### The Extra Flow Step
|
### The Extra Flow Step
|
||||||
The codeql data flow library traverses *visible* source code fairly well, but flow
|
xx:
|
||||||
through opaque functions requires additional support. Functions for which only a
|
|
||||||
headers is available are opaque, and we have one of these here: the call to
|
|
||||||
`snprintf`. Once we get this call, there are *two* nodes to identify: the inflow
|
|
||||||
and outflow.
|
|
||||||
|
|
||||||
Let's start with `snprintf`. If we try
|
|
||||||
```ql
|
|
||||||
from FunctionCall printf
|
|
||||||
where printf.getTarget().getName() = "snprintf"
|
|
||||||
select printf
|
|
||||||
```
|
|
||||||
we get zero results. This is puzzling; if we visit the `add-user.c` source and
|
|
||||||
follow the definition of `snprintf`, it turns out to be a macro on MacOS:
|
|
||||||
```c
|
|
||||||
#undef snprintf
|
|
||||||
#define snprintf(str, len, ...) \
|
|
||||||
__builtin___snprintf_chk (str, len, 0, __darwin_obsz(str), __VA_ARGS__)
|
|
||||||
#endif
|
|
||||||
```
|
|
||||||
|
|
||||||
Fortunately, the underlying function `__builtin___snprintf_chk` has `snprintf` in
|
|
||||||
the name. So instead of working with C macros from codeql, we generalize our
|
|
||||||
query using a name pattern with `.matches`:
|
|
||||||
```ql
|
|
||||||
from FunctionCall printf
|
|
||||||
where printf.getTarget().getName().matches("%snprintf%")
|
|
||||||
select printf
|
|
||||||
```
|
|
||||||
|
|
||||||
This identifies our call
|
|
||||||
|
|
||||||
snprintf(query, bufsize, "INSERT INTO users VALUES (%d, '%s')", id, info);
|
|
||||||
|
|
||||||
and we need the inflow and outflow nodes next. `query` is the outflow, `info` is
|
|
||||||
the inflow.
|
|
||||||
|
|
||||||
In the `snprintf` macro call, those have indices 0 and 4. In the underlying function
|
In the `snprintf` macro call, those have indices 0 and 4. In the underlying function
|
||||||
`__builtin___snprintf_chk`, the indices are 0 and 6. Using the latter:
|
`__builtin___snprintf_chk`, the indices are 0 and 6. Using the latter:
|
||||||
@@ -661,89 +681,9 @@ select printf, out, into
|
|||||||
|
|
||||||
This correctly identifies the call and the extra flow arguments.
|
This correctly identifies the call and the extra flow arguments.
|
||||||
|
|
||||||
<!-- !-- Practice exercise: !-- Very specific: shifted index for macro.
|
|
||||||
Generalize this to consider !-- all trailing arguments as sources. -->
|
|
||||||
|
|
||||||
|
|
||||||
Practice exercise: If you are using linux or windows, generalize this query for
|
|
||||||
the `snprintf` arguments found there. One way to do this is using `or`:
|
|
||||||
|
|
||||||
```ql
|
|
||||||
printf.getTarget().getName().matches("%snprintf%") and
|
|
||||||
(
|
|
||||||
// mac version
|
|
||||||
or
|
|
||||||
// linux version
|
|
||||||
or
|
|
||||||
// windows version
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
### The isSource Predicate
|
|
||||||
Recall the query we have to find variable accesses:
|
|
||||||
|
|
||||||
from VariableAccess va
|
|
||||||
where va.getLocation().getFile().getShortName() = "simple"
|
|
||||||
select va, va.getTarget() as definition
|
|
||||||
|
|
||||||
This query uses the structural information from `VariableAccess`. For taint flow,
|
|
||||||
we use a `DataFlow::Node nd`.
|
|
||||||
|
|
||||||
This is a direct conversion:
|
|
||||||
- to get a `VariableAccess` from a `Node`, use `node.asExpr` to get an `Expr`
|
|
||||||
- and then narrow the `Expr` to `VariableAccess`
|
|
||||||
- the location information is also available from the `Node nd`
|
|
||||||
|
|
||||||
Together, this gives us
|
|
||||||
|
|
||||||
```ql
|
|
||||||
import cpp
|
|
||||||
import semmle.code.cpp.dataflow.TaintTracking
|
|
||||||
|
|
||||||
from DataFlow::Node nd
|
|
||||||
where
|
|
||||||
nd.asExpr() instanceof VariableAccess and
|
|
||||||
nd.getLocation().getFile().getShortName() = "simple"
|
|
||||||
select nd
|
|
||||||
```
|
|
||||||
|
|
||||||
In the TaintTracking configuration, we use
|
|
||||||
|
|
||||||
sqliSourceDemo(source)
|
|
||||||
|
|
||||||
so we convert to a predicate:
|
|
||||||
|
|
||||||
predicate sqliSourceDemo(DataFlow::Node nd) {
|
|
||||||
// variable use
|
|
||||||
nd.asExpr() instanceof VariableAccess and
|
|
||||||
nd.getLocation().getFile().getShortName() = "simple"
|
|
||||||
}
|
|
||||||
|
|
||||||
This source definition is good for our example but needs adjustment for larger
|
|
||||||
codebases. See `sqliSourceProduction` in the appendix for an adjusted version.
|
|
||||||
|
|
||||||
|
|
||||||
### The isAdditionalTaintStep Predicate
|
|
||||||
|
|
||||||
Data flow and taint tracking configuration classes support a number of additional
|
|
||||||
features that help configure the process of building and exploring the data flow
|
|
||||||
path.
|
|
||||||
|
|
||||||
One such feature is adding additional taint steps. This is useful if you use
|
|
||||||
libraries which are not modelled by the default taint tracking. You can implement
|
|
||||||
this by overriding `isAdditionalTaintStep` predicate. This has two parameters, the
|
|
||||||
`from` and the `to` node, and essentially allows you to add extra edges into the
|
|
||||||
taint tracking or data flow graph.
|
|
||||||
|
|
||||||
For this tutorial, we have provided several predicates that track string and integer
|
|
||||||
taint flow across `stl` and `bsl` functions. They are listed in the appendix
|
|
||||||
`bslstrings library`; here we will use them as library functions via the single
|
|
||||||
predicate
|
|
||||||
```ql
|
|
||||||
stlBslTaintStep(n1, n2)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Complete query
|
### Complete query
|
||||||
|
xx:
|
||||||
|
|
||||||
Using the previous predicates
|
Using the previous predicates
|
||||||
|
|
||||||
sqliSourceDemo(source)
|
sqliSourceDemo(source)
|
||||||
|
|||||||
Reference in New Issue
Block a user