Using sqlite to illustrate models-as-data
This description reuses a codeql workshop. The original instructions are reproduced below, starting at SQL injection example.
Build the codeql database
To get started, build the codeql database (adjust paths to your setup):
# Build the db with source commit id.
SRCDIR=$(pwd)
DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db
echo $DB
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
# Use the correct codeql
export PATH="$(cd ../codeql && pwd):$PATH"
codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'
# Check for AddUser in the db
unzip -v $DB/src.zip | grep AddUser
Then add this database directory to your VS Code DATABASES tab.
Tests using a default query
You can run the stdlib query ../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql, but it returns no results. It does point at classes to inspect, in particular the source and sink classes. Run ./Illustrations.ql from the command line or from VS Code. Via the CLI:
# run query
codeql query run \
-v \
--database java-sqlite-e2e555c.db \
--output result.bqrs \
--threads=12 \
--ram=14000 \
Illustrations.ql
# format results
codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p'
This shows
Result set: #select
| ui | qsi |
+------+-------+
| args | query |
In the editor, these link to
main(ARGS) and conn.createStatement().executeUpdate(QUERY);
The second is correct, but System.console().readLine(); is not found.
Thus, SqlTainted.ql will not find anything.
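The sed filter used when formatting the results above prints everything from the first line matching "Result set: #select" to the end of the input, dropping whatever the decoder emits before it. A minimal sketch on made-up input; the "#other" result set is hypothetical and only stands in for any earlier output:

```shell
# Hypothetical decoder output: an unwanted result set followed by #select.
sample='Result set: #other
| x |
Result set: #select
| ui   | qsi   |
| args | query |'

# Same sed range as in the decode step: from the #select header to end of input.
filtered=$(printf '%s\n' "$sample" | sed -n '/^Result set: #select/,$p')
printf '%s\n' "$filtered"
```

Only the header line and the two table rows after it survive the filter.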
TODO supplement sources via the model editor
- We have no flow
  - check source, sink
  - we have a sink
  - but ActiveThreatModelSource finds no source
- We can
  - supplement codeql: Write full manual query: already in workshop
  - supplement codeql: Add to FlowSource or a subclass
    Note: this is one area that just has to be known. Browsing the source will not help you.
    CodeQL reading hint:
    class ActiveThreatModelSource extends DataFlow::Node
    uses
    this.(SourceNode).getThreatModel()
    So following the cast (SourceNode) may be useful:
    /**
     * A data flow source.
     */
    abstract class SourceNode extends DataFlow::Node
    Following the abstract class is promising: abstract class RemoteFlowSource extends SourceNode and others.
    ql/java/ql/lib/Customizations.qll
    - customizations in staging repo
  - supplement codeql: Add to models-as-data
    - checkkn
- Also check RemoteFlowSource, from
  import semmle.code.java.dataflow.FlowSources
The goal now is to supplement sources via the model editor.
SQL injection example
This directory contains the problematic Java source code. The rest of this README covers
- the Setup and sample run for the problem,
- how to Identify the problem, and
- how to Build the codeql database.
The codeql query is developed in ../session/README.org.
Setup and sample run
The JDBC connector from https://github.com/xerial/sqlite-jdbc is included in the git repository.
# Use a simple headline prompt
PS1='
\033[32m---- SQL injection demo ----\[\033[33m\033[0m\]
$?:$ '
# Build
./build.sh
# Prepare db
./admin -r
./admin -c
./admin -s
# Add regular user interactively
./add-user 2>> users.log
First User
# Check
./admin -s
# Add Johnny Droptable
./add-user 2>> users.log
Johnny'); DROP TABLE users; --
# And the problem:
./admin -s
# Check the log
tail users.log
Identify the problem
./add-user is reading from STDIN, and writing to a database; looking at the code in
./AddUser.java leads to
System.console().readLine();
for the read and
conn.createStatement().executeUpdate(query);
for the write.
This problem is thus a dataflow problem; in codeql terminology we have
- a source at System.console().readLine();
- a sink at conn.createStatement().executeUpdate(query);
We write codeql to identify these two, and then connect them via
- a dataflow configuration – for this problem, the more general taintflow configuration.
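To see why taint reaching the sink matters, the sketch below mimics how unescaped input changes the shape of a query string. The INSERT template and the build_query helper are hypothetical stand-ins for illustration; the real query is built in ./AddUser.java and may differ.

```shell
# build_query NAME: paste NAME into a hypothetical INSERT template without
# any escaping, as string concatenation or formatting would.
build_query() {
    printf "INSERT INTO users VALUES ('%s');" "$1"
}

benign=$(build_query "First User")
malicious=$(build_query "Johnny'); DROP TABLE users; --")

echo "$benign"
echo "$malicious"
```

With the second input, the quote and parenthesis close the INSERT early, so the text that follows is read as a separate DROP TABLE statement and the remainder is commented out. This is the path from source to sink that the taintflow configuration is meant to report.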
Build the codeql database
To get started, build the codeql database (adjust paths to your setup):
# Build the db with source commit id.
SRCDIR=$(pwd)
DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db
echo $DB
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
# Use the correct codeql
export PATH="$(cd ../codeql && pwd):$PATH"
codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'
# Check for AddUser in the db
unzip -v $DB/src.zip | grep AddUser
Then add this database directory to your VS Code DATABASES tab.
(optional) Build the codeql database in steps
For larger projects, using a single command to build everything is costly when any part of the build fails. The sequence here is also used by the GHAS default setup, so familiarity with it helps in reviewing logs.
The purpose of these sections is to illustrate the codeql commands used in default setup and to make the connection between the GHAS default action and the CodeQL CLI explicit.
After running default setup and downloading the log, you will see the following entries embedded in the full log. They are repeated here for completeness; you can skip the command-line options for now.
codeql version --format=json
codeql resolve languages --format=betterjson --extractor-options-verbosity=4 --extractor-include-aliases
codeql database init --force-overwrite --db-cluster /home/runner/work/_temp/codeql_databases --source-root=/home/runner/work/codeql-workshop-sql-injection-java/codeql-workshop-sql-injection-java --extractor-include-aliases --language=java --codescanning-config=/home/runner/work/_temp/user-config.yaml --build-mode=none --calculate-language-specific-baseline --sublanguage-file-coverage
codeql database trace-command --use-build-mode --working-dir /home/runner/work/codeql-workshop-sql-injection-java/codeql-workshop-sql-injection-java /home/runner/work/_temp/codeql_databases/java
codeql database finalize --finalize-dataset --threads=4 --ram=14567 /home/runner/work/_temp/codeql_databases/java
codeql database run-queries --ram=14567 --threads=4 /home/runner/work/_temp/codeql_databases/java --expect-discarded-cache --min-disk-free=1024 -v --intra-layer-parallelism
codeql database cleanup /home/runner/work/_temp/codeql_databases/java --cache-cleanup=brutal
codeql database bundle /home/runner/work/_temp/codeql_databases/java --output=/home/runner/work/_temp/codeql_databases/java.zip --name=java
To build a database in steps locally, use the following sequence, adjusting paths to your setup:
# Build the db with source commit id.
SRCDIR=$HOME/local/codeql-workshop-sql-injection-java/src
DB=$SRCDIR/java-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)
# Check paths
echo "DB will be: $DB"
echo "SRC is in: $SRCDIR"
# Prepare db directory
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
# Run the build, without --db-cluster
# Init database
cd $SRCDIR
codeql database init \
--language=java \
--build-mode=none \
--source-root=. \
-v $DB
# Repeat trace-command as needed to cover all targets
codeql database trace-command \
--use-build-mode \
--working-dir . \
$DB
# Finalize database
codeql database finalize \
--finalize-dataset \
--threads=4 \
--ram=14567 \
$DB
# Use the database; get the location
echo $DB
# /Users/hohn/local/codeql-workshop-sql-injection-java/src/java-sqli-161a1d5
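Because the point of building in steps is to avoid redoing everything when one part fails, the sequence above could be wrapped so that it stops at the first failing step. The following is a minimal sketch, not part of the workshop: run_step, build_db_in_steps, the DRY_RUN variable and the /tmp path are names made up here, and only flags already shown above are used (the --threads and --ram options are left out for brevity).

```shell
# run_step CMD...: echo the command, then run it unless DRY_RUN=1.
# Returns the command's exit status so the caller can stop early.
run_step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    "$@"
}

# build_db_in_steps DB: init, trace, finalize; stop at the first failing step.
build_db_in_steps() {
    db=$1
    run_step codeql database init --language=java --build-mode=none \
        --source-root=. "$db" || return 1
    run_step codeql database trace-command --use-build-mode \
        --working-dir . "$db" || return 2
    run_step codeql database finalize --finalize-dataset "$db" || return 3
}

# Dry run: print the three commands without needing codeql on PATH.
DRY_RUN=1
build_db_in_steps /tmp/java-sqli-example
```

Setting DRY_RUN=0 and passing $DB runs the real commands from $SRCDIR; the return code (1, 2 or 3) then tells you which step to repeat.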
To also analyze the database just built, we use the log's command but add an explicit query name:
codeql database run-queries \
--ram=14567 \
--threads=4 $DB \
--expect-discarded-cache \
--min-disk-free=1024 \
-v \
--intra-layer-parallelism \
-- \
../session/simple.ql
This only gives us a bqrs file, but we want SARIF. Checking the help:
codeql database run-queries --help
Usage: codeql database run-queries [OPTIONS] -- <database> [<query|dir|suite|pack>...]
[Plumbing] Run a set of queries together.
Run one or more queries against a CodeQL database, saving the results to the results
subdirectory of the database directory.
The results can later be converted to readable formats by codeql database interpret-results,
or query-for-query by with codeql bqrs decode or codeql bqrs interpret.
So we run the following
VERSION=$(cd $SRCDIR && git rev-parse --short HEAD)
codeql database interpret-results \
--format=sarifv2.1.0 \
-o simple-$VERSION.sarif \
-- $DB ../session/simple.ql
echo "Results in simple-$VERSION.sarif"
We kept the output for this sample in ./simple-161a1d5.sarif