Using SQLite to Illustrate Models-as-Data
This section demonstrates the use of the models-as-data system by analyzing a small Java application that uses the SQLite JDBC driver. The example is adapted from a CodeQL workshop.
Build the CodeQL Database
To get started, build the CodeQL database for the SQLite-backed Java sample. Adjust paths as needed.
SRCDIR=$(pwd)
DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db
echo $DB
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
# Ensure the correct CodeQL version is in your PATH
export PATH="$(cd ../codeql && pwd):$PATH"
codeql database create --language=java -s . -j 8 -v "$DB" --command='./build.sh'
# Check for presence of AddUser.java in the resulting database
unzip -v "$DB/src.zip" | grep AddUser
Then add this database directory to your VS Code DATABASES tab.
Tests Using a Default Query
You can run the standard SQL injection query:
../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql
but it returns no results. To see which nodes the analysis treats as candidate sources and sinks, run the diagnostic query Illustrations.ql instead.
You can run it from the CLI:
codeql query run \
-v \
--database java-sqlite-e2e555c.db \
--output result.bqrs \
--threads=12 \
--ram=14000 \
Illustrations.ql
codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p'
The result will look like:
Result set: #select
| ui | qsi |
+------+-------+
| args | query |
In the editor, these correspond to:
- main(String[] args): source-like
- conn.createStatement().executeUpdate(query): sink
However, System.console().readLine() is not detected as a source. Therefore, SqlTainted.ql cannot find a complete flow.
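When comparing runs before and after a model change, it can help to reduce the decoded output to a row count. The helper below is a minimal sketch, not part of the workshop material: the function name `count_select_rows` is hypothetical, and it assumes the text table layout shown above, with `#select` as the last result set (the same assumption the `sed` filter makes).

```shell
# Count data rows in the "#select" result set of decoded BQRS text.
# Reads the output of `codeql bqrs decode --format=text` on stdin.
# Rows start with "|"; the first such line is the column header,
# and the "+----+" separator line is ignored.
count_select_rows() {
    sed -n '/^Result set: #select/,$p' |
        awk '/^\|/ { n++ } END { print (n > 1 ? n - 1 : 0) }'
}
```

Usage: `codeql bqrs decode --format=text result.bqrs | count_select_rows`. A count of 0 for SqlTainted.ql means no complete source-to-sink flow was found.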
Supplement Sources via the Model Editor
We observe no flow from source to sink:
- A sink exists (executeUpdate)
- But no recognized source is found
There are two ways to fix this:
- Add a new source in Customizations.qll
- Add a new source in the models-as-data YAML format
Supplement CodeQL: Write a Full Manual Query
A manual dataflow query is already available:
This can trace the data manually even when standard configuration fails.
Supplement CodeQL: Add to FlowSource or a Subclass
Sometimes, the only way to identify how to extend a source is to understand how CodeQL internally resolves source nodes.
Key class hierarchies:
abstract class SourceNode extends DataFlow::Node
abstract class RemoteFlowSource extends SourceNode
Follow usage in:
Then modify Customizations.qll by adding the custom source. The modified
../ql/java/ql/lib/Customizations.qll is
import java
private import semmle.code.java.dataflow.FlowSources
class ReadLine extends RemoteFlowSource {
ReadLine() {
exists(Call read |
read.getCallee().getName() = "readLine" and
read = this.asExpr()
)
}
override string getSourceType() { result = "Console readline" }
}
This allows
predicate isSource(DataFlow::Node src) {
src instanceof ActiveThreatModelSource
}
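to include readLine() calls, even though we extended RemoteFlowSource rather than ActiveThreatModelSource: under the default threat model, every RemoteFlowSource is also an ActiveThreatModelSource.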
TODO Supplement CodeQL: Add to Models-as-Data
- schema in codeql: ../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll
- data sample: ../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml
With the "show already modeled" option enabled, the model editor shows that java.io.Console.readLine is already modeled. A search of the library finds the existing entries:
$ rg -i 'java.io.*Console.*readline' ql/java
ql/java/ql/lib/ext/generated/java.io.model.yml
16: - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
17: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
18: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
19: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
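Note: this file is in the generated/ tree; there are other model files as well.
The current readLine modeling is in the summaryModel section. A summaryModel only describes how taint propagates through a call; for a flow to start at readLine(), its return value must also appear in a sourceModel.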
extensions:
- addsTo:
pack: codeql/java-all
extensible: summaryModel
data:
...
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument
The model editor will not offer this method for modeling because it is already modeled. To illustrate text-based additions, we write the YAML by hand instead.
Given the summaryModel entries shown above and the field information for sourceModel
extensible predicate sourceModel(
string package, string type, boolean subtypes, string name, string signature, string ext,
string output, string kind, string provenance, QlBuiltins::ExtensionId madId
);
Starting from summaryModel
# summaryModel
# string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
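we can construct the sourceModel row by dropping the input column ("Argument[this]"), changing the kind from "taint" to "remote", and setting the provenance to "manual":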
extensions:
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
# sourceModel
# string package, string type, boolean subtypes, string name, string signature, string ext, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
- ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
# # from original
# # summaryModel
# # string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
# - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
and move this into ../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml
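A model file is normally only picked up when its directory is a CodeQL extension pack, meaning it also contains a codeql-pack.yml that names the target pack and the data files. The following is a minimal sketch of that layout, assuming no pack manifest exists yet; the function name `make_sqlite_pack` and the pack name `example/sqlite-db` are hypothetical and should be adapted.

```shell
# Sketch: create an extension pack directory holding the sourceModel row.
# Usage: make_sqlite_pack <pack-directory>
make_sqlite_pack() {
    ext_dir="$1"
    mkdir -p "$ext_dir/models" || return 1

    # Pack manifest. "example/sqlite-db" is a placeholder name.
    cat > "$ext_dir/codeql-pack.yml" <<'EOF'
name: example/sqlite-db
version: 0.0.1
library: true
extensionTargets:
  codeql/java-all: "*"
dataExtensions:
  - models/**/*.yml
EOF

    # The sourceModel row constructed above, with YAML indentation restored.
    cat > "$ext_dir/models/sqlite.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
EOF
}
```

For the layout used here, the call would be `make_sqlite_pack ../.github/codeql/extensions/sqlite-db`. Note that it overwrites any existing files of the same names in that directory.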
To ensure that these model extensions are applied during query runs, include this setting
{
...,
"settings": {
...,
"codeQL.runningQueries.useExtensionPacks": "all"
}
}
in the workspace configuration file ../qllab.code-workspace
In some environments (e.g., older VS Code versions), you may also need to replicate this setting in ../.vscode/settings.json; there it simplifies to
"codeQL.runningQueries.useExtensionPacks": "all"
Now we can run ../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql again.