topic reordering

This commit is contained in:
Michael Hohn
2020-07-22 10:51:37 -07:00
committed by =Michael Hohn
parent 62234f4d55
commit 12a90e9a54


@@ -3,12 +3,16 @@
-->
# CodeQL tutorial for C/C++: data flow and SQL injection
xx:
md_toc github < codeql-dataflow-sql-injection.md
- [CodeQL tutorial for C/C++: data flow and SQL injection](#codeql-tutorial-for-cc-data-flow-and-sql-injection)
- [Setup instructions](#setup-instructions)
- [Documentation links](#documentation-links)
- [The Problem in Action: running the code to see the problem](#the-problem-in-action-running-the-code-to-see-the-problem)
- [Problem statement](#problem-statement)
- [Tutorial, part 1: running the code to see the problem](#tutorial-part-1-running-the-code-to-see-the-problem)
- [Data flow overview and illustration](#data-flow-overview-and-illustration)
- [Tutorial, part 3: recap, sources and sinks](#tutorial-part-3-recap-sources-and-sinks) - [Tutorial: recap, sources and sinks](#tutorial-recap-sources-and-sinks)
- [Codeql recap](#codeql-recap)
- [Call to SQL query execution (the data sink)](#call-to-sql-query-execution-the-data-sink)
- [Non-constant query strings and untrusted data (the data source)](#non-constant-query-strings-and-untrusted-data-the-data-source)
@@ -16,7 +20,7 @@
- [Taint flow configuration](#taint-flow-configuration)
- [Path problem setup](#path-problem-setup)
- [Path problem query format](#path-problem-query-format)
- [Tutorial, part 2: data flow details](#tutorial-part-2-data-flow-details) - [Tutorial: data flow details](#tutorial-data-flow-details)
- [isSource predicate ](#issource-predicate-)
- [isSink predicate ](#issink-predicate-)
- [Additional data flow features: the isAdditionalTaintStep predicate](#additional-data-flow-features-the-isadditionaltaintstep-predicate)
@@ -54,6 +58,78 @@ If you get stuck, try searching our documentation and blog posts for help and id
- [Learning CodeQL for C/C++](https://help.semmle.com/QL/learn-ql/cpp/ql-for-cpp.html)
- [Using the CodeQL extension for VS Code](https://help.semmle.com/codeql/codeql-for-vscode.html)
## The Problem in Action: running the code to see the problem
This program can be compiled and linked, and a simple sqlite db created via
```sh
# Build
./build.sh
# Prepare db
./admin rm-db
./admin create-db
./admin show-db
```
Users can be added via `stdin` in several ways; the second is a pretend "server"
using the `echo` command.
```sh
# Add regular user interactively
./add-user 2>> users.log
First User
# Regular user via "external" process
echo "User Outside" | ./add-user 2>> users.log
```
Check the db and log:
```
# Check
./admin show-db
tail -4 users.log
```
Looks ok:
```
0:$ ./admin show-db
87797|First User
87808|User Outside
0:$ tail -4 users.log
[Tue Jul 21 14:15:46 2020] query: INSERT INTO users VALUES (87797, 'First User')
[Tue Jul 21 14:17:07 2020] query: INSERT INTO users VALUES (87808, 'User Outside')
```
But there may be bad input; this one guesses the table name and drops it:
```sh
# Add Johnny Droptable
./add-user 2>> users.log
Johnny'); DROP TABLE users; --
```
And then we have this:
```sh
# And the problem:
./admin show-db
0:$ ./admin show-db
Error: near line 2: no such table: users
```
What happened? The log shows that data was treated as a command:
```
1:$ tail -4 users.log
[Tue Jul 21 14:15:46 2020] query: INSERT INTO users VALUES (87797, 'First User')
[Tue Jul 21 14:17:07 2020] query: INSERT INTO users VALUES (87808, 'User Outside')
[Tue Jul 21 14:18:25 2020] query: INSERT INTO users VALUES (87817, 'Johnny'); DROP TABLE users; --')
```
Looking ahead, we now *know* that there is unsafe external data (source)
which reaches (flow path) a database-writing command (sink). Thus, a query
written against this code should find at least one taint flow path.
## Problem statement
Many security problems can be phrased in terms of _information flow_:
@@ -70,9 +146,9 @@ data etc.
We will use CodeQL to analyze the source code constructing a SQL
query using string concatenation and then executing that query
string. The following example uses the `sqlite3` library and string. The following example uses the `sqlite3` library; it
- receives user-provided data from `stdin`, - receives user-provided data from `stdin` and keeps it in `buf`
- uses environment data in `id`, - uses environment data and stores it in `id`,
- runs a query in `sqlite3_exec`
This is intentionally simple code, but it has all the elements that have to be
@@ -150,8 +226,8 @@ int main(int argc, char* argv[]) {
```
In terms of sources, sinks, and information flow, the concrete problem is: In terms of sources, sinks, and information flow, the concrete problem for codeql is:
1. specifying `stdin` as **source** using codeql, 1. specifying `buf` as **source**,
2. specifying the `query` argument to `sqlite3_exec()` as **sink**,
3. specifying some code-specific data flow steps for the codeql library,
4. using the codeql taint flow library to find taint flow paths (if there are any)
@@ -160,77 +236,7 @@ In terms of sources, sinks, and information flow, the concrete problem is:
In the following, we go into more concrete detail and develop codeql scripts to
solve this problem.
## Tutorial, part 1: running the code to see the problem
This program can be compiled and linked, and a simple sqlite db created via
```sh
# Build
./build.sh
# Prepare db
./admin rm-db
./admin create-db
./admin show-db
```
Users can be added via `stdin` in several ways; the second is a pretend "server"
using the `echo` command.
```sh
# Add regular user interactively
./add-user 2>> users.log
First User
# Regular user via "external" process
echo "User Outside" | ./add-user 2>> users.log
```
Check the db and log:
```
# Check
./admin show-db
tail -4 users.log
```
Looks ok:
```
0:$ ./admin show-db
87797|First User
87808|User Outside
0:$ tail -4 users.log
[Tue Jul 21 14:15:46 2020] query: INSERT INTO users VALUES (87797, 'First User')
[Tue Jul 21 14:17:07 2020] query: INSERT INTO users VALUES (87808, 'User Outside')
```
But there may be bad input; this one guesses the table name and drops it:
```sh
# Add Johnny Droptable
./add-user 2>> users.log
Johnny'); DROP TABLE users; --
```
And then we have this:
```sh
# And the problem:
./admin show-db
0:$ ./admin show-db
Error: near line 2: no such table: users
```
What happened? The log shows that data was treated as a command:
```
1:$ tail -4 users.log
[Tue Jul 21 14:15:46 2020] query: INSERT INTO users VALUES (87797, 'First User')
[Tue Jul 21 14:17:07 2020] query: INSERT INTO users VALUES (87808, 'User Outside')
[Tue Jul 21 14:18:25 2020] query: INSERT INTO users VALUES (87817, 'Johnny'); DROP TABLE users; --')
```
Looking ahead, we now *know* that there is unsafe external data (source)
which reaches (flow path) a database-writing command (sink). Thus, a query
written against this code should find at least one taint flow path.
## Data flow overview and illustration
In the previous sections we identified the sources of problematic strings
@@ -271,7 +277,7 @@ nodes, rather than for the full graph.
To illustrate the dataflow for this problem, we have a [collection of slides](https://drive.google.com/file/d/1eEG0eGVDVEQh0C-0_4UIMcD23AWwnGtV/view?usp=sharing)
for this workshop.
## Tutorial, part 3: recap, sources and sinks ## Tutorial: recap, sources and sinks
XX:
<!--
!-- The complete project can be downloaded via this
@@ -332,7 +338,7 @@ from FunctionCall fc
where fc.<tab>
```
Now, we are looking for the call's `*target*`; completion shows `getTarget()`, Now, we are looking for the call's *target*; completion shows `getTarget()`,
and we can finish that to
```ql
@@ -482,7 +488,7 @@ where
select sink, source, sink, "Sqli flow from $@", source, "source"
```
## Tutorial, part 2: data flow details ## Tutorial: data flow details
With the dataflow configuration in place, we just need to provide the details for source(s), sink(s), and taint step(s).
### isSource predicate