19 Commits

Author | SHA1 | Message | Date
Michael Hohn | bc7cda5274 | update from pre/postUpdate node to new forms | 2025-03-05 11:12:35 -08:00
Michael Hohn | bf69cb0f45 | comment isAdditionalFlowStep--no longer needed | 2025-03-03 12:08:36 -08:00
Michael Hohn | 684b4c145a | fix flow indirection | 2025-03-03 12:04:02 -08:00
Michael Hohn | 7ec8b18eac | db | 2025-03-03 11:59:42 -08:00
Michael Hohn | 01048300c0 | from...where...select with class | 2025-02-18 19:21:30 -08:00
Michael Hohn | e6b23a9d86 | from...where...select | 2025-02-18 19:13:19 -08:00
Michael Hohn | 7b1daa9a8b | updates for pack lock | 2025-02-17 17:17:49 -08:00
Michael Hohn | f3b703a35f | updates for module system; include a db | 2025-02-17 17:09:04 -08:00
Michael Hohn | c1b3c8d901 | Updated readme | 2022-08-21 21:05:38 -07:00
Michael Hohn | c01a039d23 | Partially revert qlpack to get working cli command | 2022-08-21 20:59:06 -07:00
This, from the README, now works:
    codeql database analyze                                 \
           -v                                               \
           --ram=14000                                      \
           -j12                                             \
           --rerun                                          \
           --search-path $HOME/local/codeql-v2.9.3/ql       \
           --format=sarif-latest                            \
           --output cpp-sqli.sarif                          \
           --                                               \
           $DB                                              \
           $SRCDIR/SqlInjection.ql

It failed with

    ERROR: Referenced pack 'codeql/cpp-all' not found. (/Users/hohn/local/codeql-dataflow-sql-injection/qlpack.yml:1,1-1)

when using
    dependencies:
Michael Hohn | 83e4ac9be8 | Add xkcd link for inspiration | 2022-08-03 10:33:07 -07:00
Michael Hohn | 48dede015c | Change from codeql v2.7.6 to codeql v2.9.3 | 2022-08-03 10:27:03 -07:00
Michael Hohn | f64503ae1d | remove git lfs | 2022-08-03 10:25:51 -07:00
Michael Hohn | bd93cad633 | remove git lfs | 2022-08-03 10:25:22 -07:00
Michael Hohn | 3851fcb9eb | database w/o git lfs | 2022-06-08 15:08:17 +02:00
Michael Hohn | f9eba14771 | include git lfs | 2022-06-08 15:06:19 +02:00
Michael Hohn | 4c7b111ea9 | include database | 2022-06-08 15:03:00 +02:00
Michael Hohn | 3fe610d354 | workshop updates | 2022-06-08 14:05:32 +02:00
Michael Hohn | dd664fe4ef | Insert updates from github.com:hohn/codeql.git | 2022-06-08 08:36:05 +02:00
12 changed files with 378 additions and 68 deletions

Binary file not shown.


@@ -1,3 +1,9 @@
[[https://imgs.xkcd.com/comics/exploits_of_a_mom.png]]
(from https://xkcd.com/327/)
* SQL injection example
** Setup and sample run
#+BEGIN_SRC sh
@@ -11,9 +17,9 @@
./build.sh
# Prepare db
./admin rm-db
./admin create-db
./admin show-db
./admin -r
./admin -c
./admin -s
# Add regular user interactively
./add-user 2>> users.log
@@ -22,34 +28,202 @@
# Regular user via "external" process
echo "User Outside" | ./add-user 2>> users.log
./admin show-db
# Check
./admin show-db
./admin -s
# Add Johnny Droptable
./add-user 2>> users.log
Johnny'); DROP TABLE users; --
# And the problem:
./admin show-db
./admin -s
# Check the log
tail users.log
#+END_SRC
** Identify the problem
=./add-user= is reading from =STDIN=, and writing to a database; looking at the code in
[[./add-user.c]] leads to
: count = read(STDIN_FILENO, buf, BUFSIZE - 1);
for the read and
: rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);
for the write.
This problem is thus a dataflow problem; in codeql terminology we have
- a /source/ at the =read(STDIN_FILENO, buf, BUFSIZE - 1);=
- a /sink/ at the =sqlite3_exec(db, query, NULL, 0, &zErrMsg);=
We write codeql to identify these two, and then connect them via
- a /dataflow configuration/ -- for this problem, the more general /taintflow
configuration/.
** Build codeql database
To get started, build the codeql database (adjust paths to your setup):
#+BEGIN_SRC sh
# Build the db with source commit id.
export PATH=$HOME/local/vmsync/codeql224:"$PATH"
SRCDIR=$HOME/local/codeql-dataflow-sql-injection/
DB=$HOME/local/db/codeql-dataflow-sql-injection-$(cd $SRCDIR && git rev-parse --short HEAD)
# export PATH=$HOME/local/vmsync/codeql250:"$PATH"
SRCDIR=$(pwd)
DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)
echo $DB
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
cd $SRCDIR
codeql database create --language=cpp -s $SRCDIR -j 8 -v $DB --command='./build.sh'
cd $SRCDIR && codeql database create --language=cpp -s . -j 8 -v $DB --command='./build.sh'
#+END_SRC
Then add this database directory to your VS Code =DATABASES= tab.
** Build codeql database in steps
For larger projects, using a single command to build everything is costly when
any part of the build fails.
To build a database in steps, use the following sequence, adjusting paths to
your setup:
#+BEGIN_SRC sh
# Build the db with source commit id.
export PATH=$HOME/local/vmsync/codeql250:"$PATH"
SRCDIR=$HOME/local/codeql-training-material.cpp-sqli/cpp/codeql-dataflow-sql-injection
DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)
# Check paths
echo $DB
echo $SRCDIR
# Prepare db directory
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
# Run the build
cd $SRCDIR
codeql database init --language=cpp -s . -v $DB
# Repeat trace-command as needed to cover all targets
codeql database trace-command -v $DB -- make
codeql database finalize -j4 $DB
#+END_SRC
Then add this database directory to your VS Code =DATABASES= tab.
** Develop the query bottom-up
1. Identify the /source/ part of the
: read(STDIN_FILENO, buf, BUFSIZE - 1);
expression, the =buf= argument.
Start from a =from..where..select=, then convert to a predicate.
2. Identify the /sink/ part of the
: sqlite3_exec(db, query, NULL, 0, &zErrMsg);
expression, the =query= argument. Again start from =from..where..select=,
then convert to a predicate.
3. Fill in the /taintflow configuration/ boilerplate
#+BEGIN_SRC java
class CppSqli extends TaintTracking::Configuration {
CppSqli() { this = "CppSqli" }
override predicate isSource(DataFlow::Node node) {
none()
}
override predicate isSink(DataFlow::Node node) {
none()
}
}
#+END_SRC
Note that an inout-argument in C/C++ (the =buf= pointer is passed to =read=
and points to updated data after the return) is accessed as a codeql source
via
: source.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr()
instead of the usual
: source.asExpr()
The final query (without =isAdditionalTaintStep=) is
#+BEGIN_SRC java
/**
,* @name SQLI Vulnerability
,* @description Using untrusted strings in a sql query allows sql injection attacks.
,* @kind path-problem
,* @id cpp/SQLIVulnerable
,* @problem.severity warning
,*/
import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph
class SqliFlowConfig extends TaintTracking::Configuration {
SqliFlowConfig() { this = "SqliFlow" }
override predicate isSource(DataFlow::Node source) {
// count = read(STDIN_FILENO, buf, BUFSIZE);
exists(FunctionCall read |
read.getTarget().getName() = "read" and
read.getArgument(1) = source.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr()
)
}
override predicate isSink(DataFlow::Node sink) {
// rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);
exists(FunctionCall exec |
exec.getTarget().getName() = "sqlite3_exec" and
exec.getArgument(1) = sink.asExpr()
)
}
}
from SqliFlowConfig conf, DataFlow::PathNode source, DataFlow::PathNode sink
where conf.hasFlowPath(source, sink)
select sink, source, sink, "Possible SQL injection"
#+END_SRC
** Optional: sarif file review of the results
Query results are available in several output formats using the cli. The
following produces the sarif format, a json-based result description.
#+BEGIN_SRC sh
# The setup information from before
export PATH=$HOME/local/vmsync/codeql250:"$PATH"
SRCDIR=$HOME/local/codeql-training-material.cpp-sqli/cpp/codeql-dataflow-sql-injection
DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)
# Check paths
echo $DB
echo $SRCDIR
# To see the help
codeql database analyze -h
# Run a query
codeql database analyze \
-v \
--ram=14000 \
-j12 \
--rerun \
--search-path ~/local/vmsync/ql \
--format=sarif-latest \
--output cpp-sqli.sarif \
-- \
$DB \
$SRCDIR/SqlInjection.ql
# Examine the file in an editor
edit cpp-sqli.sarif
#+END_SRC
An example of using the sarif data is in the jq script [[./sarif-summary.jq]].
When run against the sarif input via
#+BEGIN_SRC sh
jq --raw-output --join-output -f sarif-summary.jq < cpp-sqli.sarif > cpp-sqli.txt
#+END_SRC
it produces output in a form close to that of compiler error messages:
#+BEGIN_SRC text
query-id: message line
Path
...
Path
...
#+END_SRC


@@ -1,53 +1,39 @@
/**
* @name SQLI Vulnerability
* @description Using untrusted strings in a sql query allows sql injection attacks.
* @kind path-problem
* @id cpp/SQLIVulnerable
* @problem.severity warning
*/
* @name SQLI Vulnerability
* @description Using untrusted strings in a sql query allows sql injection attacks.
* @kind path-problem
* @id cpp/sqlivulnerable
* @problem.severity warning
*/
import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph
import semmle.code.cpp.dataflow.new.TaintTracking
class SqliFlowConfig extends TaintTracking::Configuration {
SqliFlowConfig() { this = "SqliFlow" }
module SqliFlowConfig implements DataFlow::ConfigSig {
override predicate isSource(DataFlow::Node source) {
predicate isSource(DataFlow::Node source) {
// count = read(STDIN_FILENO, buf, BUFSIZE);
exists(FunctionCall read |
read.getTarget().getName() = "read" and
read.getArgument(1) = source.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr()
read.getArgument(1) = source.asDefiningArgument()
)
}
override predicate isSanitizer(DataFlow::Node sanitizer) { none() }
predicate isBarrier(DataFlow::Node sanitizer) { none() }
override predicate isAdditionalTaintStep(DataFlow::Node into, DataFlow::Node out) {
// Extra taint step
// snprintf(query, bufsize, "INSERT INTO users VALUES (%d, '%s')", id, info);
// But snprintf is a macro on mac os. The actual function's name is
// #undef snprintf
// #define snprintf(str, len, ...) \
// __builtin___snprintf_chk (str, len, 0, __darwin_obsz(str), __VA_ARGS__)
// #endif
exists(FunctionCall printf |
printf.getTarget().getName().matches("%snprintf%") and
printf.getArgument(0) = out.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr() and
// very specific: shifted index for macro.
printf.getArgument(6) = into.asExpr()
)
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
// rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);
exists(FunctionCall exec |
exec.getTarget().getName() = "sqlite3_exec" and
exec.getArgument(1) = sink.asExpr()
exec.getArgument(1) = sink.asIndirectArgument()
)
}
}
from SqliFlowConfig conf, DataFlow::PathNode source, DataFlow::PathNode sink
where conf.hasFlowPath(source, sink)
module MyFlow = TaintTracking::Global<SqliFlowConfig>;
import MyFlow::PathGraph
from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink, source, sink, "Possible SQL injection"


@@ -42,14 +42,17 @@ void abort_on_exec_error(int rc, sqlite3 *db, char* zErrMsg) {
char* get_user_info() {
#define BUFSIZE 1024
char* buf = (char*) malloc(BUFSIZE * sizeof(char));
if(buf==NULL) abort();
int count;
// Disable buffering to avoid need for fflush
// after printf().
setbuf( stdout, NULL );
printf("*** Welcome to sql injection ***\n");
printf("Please enter name: ");
count = read(STDIN_FILENO, buf, BUFSIZE);
count = read(STDIN_FILENO, buf, BUFSIZE - 1);
if (count <= 0) abort();
// ensure the buffer is zero-terminated
buf[count] = '\0';
/* strip trailing whitespace */
while (count && isspace(buf[count-1])) {
buf[count-1] = 0; --count;
@@ -90,6 +93,7 @@ int main(int argc, char* argv[]) {
info = get_user_info();
id = get_new_id();
write_info(id, info);
free(info);
/*
* show_info(id);
*/

43
admin

@@ -1,5 +1,24 @@
#!/bin/bash
rm-db () {
set -e
script=$(basename "$0")
GREEN='\033[0;32m'
MAGENTA='\033[0;95m'
NC='\033[0m'
RED='\033[0;31m'
YELLOW='\033[0;33m'
help() {
echo -e "Usage: ./${script} [options]" \
"\n${YELLOW}Options: ${NC}" \
"\n\t -h ${GREEN}Show Help ${NC}" \
"\n\t -c ${MAGENTA}Creates a users table ${NC}" \
"\n\t -s ${MAGENTA}Shows all records in the users table ${NC}" \
"\n\t -r ${RED}Removes users table ${NC}"
}
remove-db () {
rm users.sqlite
}
@@ -18,4 +37,24 @@ show-db () {
' | sqlite3 users.sqlite
}
eval $@
if [ $# == 0 ]; then
help
exit 0
fi
while getopts "h?csr" option
do
case "${option}"
in
h|\?)
help
exit 0
;;
c) create-db
;;
s) show-db
;;
r) remove-db
;;
esac
done


@@ -2,10 +2,10 @@
"folders": [
{
"path": "."
},
{
"name": "[codeql-dataflow-sql-injection-d5b28fb source archive]",
"uri": "codeql-zip-archive://0-66/Users/hohn/local/db/codeql-dataflow-sql-injection-d5b28fb/src.zip/"
}
]
],
"settings": {
"codeQL.runningQueries.autoSave": true,
"makefile.configureOnOpen": false
}
}


@@ -65,7 +65,7 @@ If you get stuck, try searching our documentation and blog posts for help and id
- [Using the CodeQL extension for VS Code](https://help.semmle.com/codeql/codeql-for-vscode.html)
## Codeql Recap
This is a brief review of codeql taken from the [full
This is a brief review of CodeQL taken from the [full
introduction](https://git.io/JJqdS). For more details, see the [documentation
links](#documentation-links). We will revisit all of this during the tutorial.
@@ -89,7 +89,7 @@ select /* ... expressions ... */
The `from` clause specifies some variables that will be used in the query. The
`where` clause specifies some conditions on those variables in the form of logical
formulas. The `select` clauses speciifes what the results should be, and can refer
formulas. The `select` clauses specifies what the results should be, and can refer
to variables defined in the `from` clause.
The `from` clause is defined as a series of variable declarations, where each
@@ -206,9 +206,9 @@ This program can be compiled and linked, and a simple sqlite db created via
./build.sh
# Prepare db
./admin rm-db
./admin create-db
./admin show-db
./admin -r
./admin -c
./admin -s
```
Users can be added via `stdin` in several ways; the second is a pretend "server"
@@ -226,14 +226,14 @@ echo "User Outside" | ./add-user 2>> users.log
Check the db and log:
```
# Check
./admin show-db
./admin -s
tail -4 users.log
```
Looks ok:
```
0:$ ./admin show-db
0:$ ./admin -s
87797|First User
87808|User Outside
@@ -252,8 +252,8 @@ Johnny'); DROP TABLE users; --
And then we have this:
```sh
# And the problem:
./admin show-db
0:$ ./admin show-db
./admin -s
0:$ ./admin -s
Error: near line 2: no such table: users
```
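
The lost table follows from how the query text is assembled. Here is a minimal shell sketch, assuming the `INSERT INTO users VALUES (%d, '%s')` format string that the sample program hands to `snprintf`. The id value `42` and the variable names are made up for illustration, and nothing is sent to a database:

```sh
#!/bin/sh
# Sketch only: rebuild the SQL text the way a printf-style format would.
# The id 42 and the variable names are hypothetical.
info="Johnny'); DROP TABLE users; --"
id=42

# The untrusted text is pasted into the statement without any escaping.
query=$(printf "INSERT INTO users VALUES (%d, '%s')" "$id" "$info")

# Show the statement that would be handed to sqlite3_exec.
printf '%s\n' "$query"
```

The printed text is `INSERT INTO users VALUES (42, 'Johnny'); DROP TABLE users; --')`. The quote inside the input closes the string literal early, so `DROP TABLE users;` runs as a second statement and the trailing `--` comments out the leftover `')`.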
@@ -580,7 +580,7 @@ the process of building and exploring the data flow path.
One such feature is adding additional taint steps. This is useful if you use
libraries which are not modelled by the default taint tracking. You can implement
this by overriding `isAdditionalTaintStep` predicate. This has two parameters, the
`from` and the `to` node, and essentially allows you to add extra edges into the
`from` and the `to` node, and it essentially allows you to add extra edges into the
taint tracking or data flow graph.
A starting configuration can look like the following, with details to be filled

14
codeql-pack.lock.yml Normal file

@@ -0,0 +1,14 @@
---
lockVersion: 1.0.0
dependencies:
codeql/cpp-all:
version: 0.9.1
codeql/dataflow:
version: 0.0.2
codeql/ssa:
version: 0.1.3
codeql/tutorial:
version: 0.1.3
codeql/util:
version: 0.1.3
compiled: false

BIN
cpp-sqli-3fe610d-1.zip (Stored with Git LFS) Normal file

Binary file not shown.


@@ -1,3 +1,4 @@
name: cpp_sql_injection
version: 0.0.0
libraryPathDependencies: codeql-cpp
name: codeql-workshop/cpp-sql-injection
version: 0.0.1
dependencies:
codeql/cpp-all: "*"

60
sarif-summary.jq Normal file

@@ -0,0 +1,60 @@
# -*- sh -*-
.runs | .[] | .results | .[] |
( (.ruleId, ": ",
(.message.text | split("\n") | ( .[0], " [", length-1 , " more]")),
"\n")
,
(if (.codeFlows != null) then
(.codeFlows | .[] |
(" Path\n"
,
( .threadFlows | .[] | .locations | .[] | .location | " "
,
( .physicalLocation | ( .artifactLocation.uri, ":", .region.startLine, ":"))
,
(.message.text, " ")
,
"\n"
)))
else
(.locations | .[] |
( " "
,
(.physicalLocation | ( .artifactLocation.uri, ":", .region.startLine, ":"))
))
,
# .message.text,
"\n"
end)
) | tostring
# This script extracts the following parts of the sarif output:
#
# # problem
# "runs" : [ {
# "results" : [ {
# "ruleId" : "cpp/UncheckedErrorCode",
# # path problem
# "runs" : [ {
# "tool" : {
# "driver" : {
# "rules" : [ {
# "properties" : {
# "kind" : "path-problem",
# "runs" : [ {
# "results" : [ {
# "ruleId" : "cpp/DangerousArithmetic",
# "ruleIndex" : 6,
# "message" : {
# "text" : "Potential overflow (conversion: int -> unsigned int)\nPotential overflow (con
# "runs" : [ {
# "results" : [ {
# "codeFlows" : [ {
# "threadFlows" : [ {
# "locations" : [ {
# "location" : {
# "message" : {
# "text" : "buff"

29
session.ql Normal file

@@ -0,0 +1,29 @@
import cpp
// 1. invalid input -- source
// count = read(STDIN_FILENO, buf, BUFSIZE - 1);
//
// 2. gets to a sql statement -- flow
// flow config
//
// 3. drops table -- sink
// rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);
// All predicates and classes are using one of:
// AST Abstract syntax tree
// CFG Control flow graph
// DFG Data flow graph
// Type hierarchy
class DataSource extends VariableAccess {
DataSource() {
exists(FunctionCall read |
read.getTarget().getName() = "read" and
read.getArgument(1) = this
)
}
}
from FunctionCall read, VariableAccess buf
where
read.getTarget().getName() = "read" and
read.getArgument(1) = buf
select buf