Files
codeql-lab/codeql-sqlite-java/README.org
Michael Hohn fa875f4ea0 major revision
2025-07-30 21:26:35 -07:00

233 lines
8.9 KiB
Org Mode
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

* Using sqlite to illustrate models-as-data
This section demonstrates the use of the models-as-data system by analyzing a
small Java application that uses the SQLite JDBC driver. The example is adapted
from a CodeQL workshop.
** Build the CodeQL Database
To get started, build the CodeQL database for the SQLite-backed Java sample. Adjust paths as needed.
#+BEGIN_SRC sh
SRCDIR=$(pwd)
DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db
echo $DB
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
# Ensure the correct CodeQL version is in your PATH
export PATH="$(cd ../codeql && pwd):$PATH"
codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'
# Check for presence of AddUser.java in the resulting database
unzip -v $DB/src.zip | grep AddUser
#+END_SRC
Then add this database directory to your VS Code =DATABASES= tab.
** Tests Using a Default Query
You can run the standard SQL injection query:
[[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
but it will return no results. However, it does help identify which classes are being analyzed as potential sources and sinks. Instead, run the diagnostic query:
[[./Illustrations.ql]]
You can run it from the CLI:
#+BEGIN_SRC sh
codeql query run \
-v \
--database java-sqlite-e2e555c.db \
--output result.bqrs \
--threads=12 \
--ram=14000 \
Illustrations.ql
codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p'
#+END_SRC
The result will look like:
#+BEGIN_SRC text
Result set: #select
| ui | qsi |
+------+-------+
| args | query |
#+END_SRC
In the editor, these correspond to:
1. =main(String[] args)= — source-like
2. =conn.createStatement().executeUpdate(query)= — sink
However, =System.console().readLine()= is not detected as a source. Therefore, =SqlTainted.ql= cannot find a complete flow.
** Supplement Sources via the Model Editor
- [ ] We observe no flow from source to sink
- A sink exists (=executeUpdate=)
- But no recognized source is found
- [ ] There are two ways to fix this:
1. Add a new source in =Customizations.qll=
2. Add a new source in the models-as-data YAML format
** Supplement CodeQL: Write a Full Manual Query
A manual dataflow query is already available:
[[./full-query.ql]]
This can trace the data manually even when standard configuration fails.
** Supplement CodeQL: Add to FlowSource or a Subclass
Sometimes, the only way to identify how to extend a source is to understand how CodeQL internally resolves source nodes.
Key class hierarchies:
#+BEGIN_SRC java
abstract class SourceNode extends DataFlow::Node
abstract class RemoteFlowSource extends SourceNode
#+END_SRC
Follow usage in:
- [[../ql/java/ql/lib/Customizations.qll]]
- [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
Then modify =Customizations.qll= by adding the custom source. The modified
[[../ql/java/ql/lib/Customizations.qll]] is
#+BEGIN_SRC java
import java
private import semmle.code.java.dataflow.FlowSources
class ReadLine extends RemoteFlowSource {
ReadLine() {
exists(Call read |
read.getCallee().getName() = "readLine" and
read = this.asExpr()
)
}
override string getSourceType() { result = "Console readline" }
}
#+END_SRC
This allows
#+BEGIN_SRC java
predicate isSource(DataFlow::Node src) {
src instanceof ActiveThreatModelSource
}
#+END_SRC
to include =readLine()= even though we extended =RemoteFlowSource=.
** Supplement CodeQL: Add to models-as-data
To modify the dataflow configuration using the models-as-data mechanism, we will explicitly define a new source model for =java.io.Console.readLine=. This function is already modeled in CodeQL—but only as a =summaryModel=. For SQL injection tracking, we want to treat it as a =sourceModel=. Since its already covered in auto-generated data, it does not appear in the model editor interface.
We begin by locating the existing model:
- The model schema used for extensions is defined here:
[[../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll]]
- For reference, see a sample manually written model YAML:
[[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml]]
- To verify that readline is already modeled, use ripgrep:
#+BEGIN_SRC sh
1:$ rg -i 'java.io.*Console.*readline' ql/java
ql/java/ql/lib/ext/generated/java.io.model.yml
16: - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
17: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
18: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
19: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
Note: this model is auto-generated (=df-generated=) and appears under =summaryModel=.
Here is an example of that structure:
#+BEGIN_SRC yaml
extensions:
- addsTo:
pack: codeql/java-all
extensible: summaryModel
data:
...
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument
#+END_SRC
Because this modeling is already present, the model editor UI will hide the
function. To override it, well define a new source manually in plain YAML.
First, recall the schema definition for =sourceModel=:
#+BEGIN_SRC java
extensible predicate sourceModel(
string package, string type, boolean subtypes, string name, string signature, string ext,
string output, string kind, string provenance, QlBuiltins::ExtensionId madId
);
#+END_SRC
Starting from the existing =summaryModel=,
#+BEGIN_SRC yaml
# summaryModel
# string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
we construct the following =sourceModel= definition instead:
#+BEGIN_SRC yaml
extensions:
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
# sourceModel
# string package, string type, boolean subtypes, string name, string signature, string ext, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
- ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
# # from original
# # summaryModel
# # string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
# - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
Place this in:
[[../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml]]
To ensure the model extension is applied, you must instruct the CodeQL
extension to include all extension packs. In =qllab.code-workspace=, add:
#+BEGIN_SRC javascript
{
...,
"settings": {
...,
"codeQL.runningQueries.useExtensionPacks": "all"
}
}
#+END_SRC
If needed, also include this setting in =.vscode/settings.json=:
#+BEGIN_SRC javascript
"codeQL.runningQueries.useExtensionPacks": "all"
#+END_SRC
Now re-run the query:
[[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
You should see flows that originate at =readLine()= and reach the SQL sink. This confirms that your manual =sourceModel= extension is effective.