mirror of
https://github.com/hohn/codeql-lab.git
synced 2025-12-16 09:53:04 +01:00
233 lines
8.9 KiB
Org Mode
233 lines
8.9 KiB
Org Mode
* Using sqlite to illustrate models-as-data
|
||
|
||
This section demonstrates the use of the models-as-data system by analyzing a
|
||
small Java application that uses the SQLite JDBC driver. The example is adapted
|
||
from a CodeQL workshop.
|
||
|
||
** Build the CodeQL Database
|
||
|
||
To get started, build the CodeQL database for the SQLite-backed Java sample. Adjust paths as needed.
|
||
|
||
#+BEGIN_SRC sh
|
||
SRCDIR=$(pwd)
|
||
DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db
|
||
|
||
echo $DB
|
||
test -d "$DB" && rm -fR "$DB"
|
||
mkdir -p "$DB"
|
||
|
||
# Ensure the correct CodeQL version is in your PATH
|
||
export PATH="$(cd ../codeql && pwd):$PATH"
|
||
codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'
|
||
|
||
# Check for presence of AddUser.java in the resulting database
|
||
unzip -v $DB/src.zip | grep AddUser
|
||
#+END_SRC
|
||
|
||
Then add this database directory to your VS Code =DATABASES= tab.
|
||
|
||
** Tests Using a Default Query
|
||
|
||
You can run the standard SQL injection query:
|
||
|
||
[[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
|
||
|
||
but it will return no results. However, it does help identify which classes are being analyzed as potential sources and sinks. Instead, run the diagnostic query:
|
||
|
||
[[./Illustrations.ql]]
|
||
|
||
You can run it from the CLI:
|
||
|
||
#+BEGIN_SRC sh
|
||
codeql query run \
|
||
-v \
|
||
--database java-sqlite-e2e555c.db \
|
||
--output result.bqrs \
|
||
--threads=12 \
|
||
--ram=14000 \
|
||
Illustrations.ql
|
||
|
||
codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p'
|
||
#+END_SRC
|
||
|
||
The result will look like:
|
||
|
||
#+BEGIN_SRC text
|
||
Result set: #select
|
||
| ui | qsi |
|
||
+------+-------+
|
||
| args | query |
|
||
#+END_SRC
|
||
|
||
In the editor, these correspond to:
|
||
1. =main(String[] args)= — source-like
|
||
2. =conn.createStatement().executeUpdate(query)= — sink
|
||
|
||
However, =System.console().readLine()= is not detected as a source. Therefore, =SqlTainted.ql= cannot find a complete flow.
|
||
|
||
** Supplement Sources via the Model Editor
|
||
|
||
- [ ] We observe no flow from source to sink
|
||
- A sink exists (=executeUpdate=)
|
||
- But no recognized source is found
|
||
- [ ] There are two ways to fix this:
|
||
1. Add a new source in =Customizations.qll=
|
||
2. Add a new source in the models-as-data YAML format
|
||
|
||
** Supplement CodeQL: Write a Full Manual Query
|
||
|
||
A manual dataflow query is already available:
|
||
|
||
[[./full-query.ql]]
|
||
|
||
This can trace the data manually even when standard configuration fails.
|
||
|
||
** Supplement CodeQL: Add to FlowSource or a Subclass
|
||
|
||
Sometimes, the only way to identify how to extend a source is to understand how CodeQL internally resolves source nodes.
|
||
|
||
Key class hierarchies:
|
||
#+BEGIN_SRC java
|
||
abstract class SourceNode extends DataFlow::Node
|
||
abstract class RemoteFlowSource extends SourceNode
|
||
#+END_SRC
|
||
|
||
Follow usage in:
|
||
- [[../ql/java/ql/lib/Customizations.qll]]
|
||
- [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
|
||
|
||
Then modify =Customizations.qll= by adding the custom source. The modified
|
||
[[../ql/java/ql/lib/Customizations.qll]] is
|
||
#+BEGIN_SRC java
|
||
import java
|
||
private import semmle.code.java.dataflow.FlowSources
|
||
|
||
class ReadLine extends RemoteFlowSource {
|
||
ReadLine() {
|
||
exists(Call read |
|
||
read.getCallee().getName() = "readLine" and
|
||
read = this.asExpr()
|
||
)
|
||
}
|
||
|
||
override string getSourceType() { result = "Console readline" }
|
||
}
|
||
#+END_SRC
|
||
|
||
This allows
|
||
|
||
#+BEGIN_SRC java
|
||
predicate isSource(DataFlow::Node src) {
|
||
src instanceof ActiveThreatModelSource
|
||
}
|
||
#+END_SRC
|
||
|
||
to include =readLine()= even though we extended =RemoteFlowSource=.
|
||
|
||
** Supplement CodeQL: Add to models-as-data
|
||
|
||
To modify the dataflow configuration using the models-as-data mechanism, we will explicitly define a new source model for =java.io.Console.readLine=. This function is already modeled in CodeQL—but only as a =summaryModel=. For SQL injection tracking, we want to treat it as a =sourceModel=. Since it’s already covered in auto-generated data, it does not appear in the model editor interface.
|
||
|
||
We begin by locating the existing model:
|
||
|
||
- The model schema used for extensions is defined here:
|
||
[[../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll]]
|
||
|
||
- For reference, see a sample manually written model YAML:
|
||
[[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml]]
|
||
|
||
- To verify that readline is already modeled, use ripgrep:
|
||
#+BEGIN_SRC sh
|
||
1:$ rg -i 'java.io.*Console.*readline' ql/java
|
||
ql/java/ql/lib/ext/generated/java.io.model.yml
|
||
16: - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
|
||
17: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
|
||
18: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
|
||
19: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
|
||
#+END_SRC
|
||
|
||
Note: this model is auto-generated (=df-generated=) and appears under =summaryModel=.
|
||
|
||
Here is an example of that structure:
|
||
#+BEGIN_SRC yaml
|
||
extensions:
|
||
- addsTo:
|
||
pack: codeql/java-all
|
||
extensible: summaryModel
|
||
data:
|
||
...
|
||
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
|
||
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
|
||
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
|
||
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument
|
||
#+END_SRC
|
||
|
||
Because this modeling is already present, the model editor UI will hide the
|
||
function. To override it, we’ll define a new source manually in plain YAML.
|
||
|
||
First, recall the schema definition for =sourceModel=:
|
||
|
||
#+BEGIN_SRC java
|
||
extensible predicate sourceModel(
|
||
string package, string type, boolean subtypes, string name, string signature, string ext,
|
||
string output, string kind, string provenance, QlBuiltins::ExtensionId madId
|
||
);
|
||
#+END_SRC
|
||
|
||
Starting from the existing =summaryModel=,
|
||
|
||
#+BEGIN_SRC yaml
|
||
# summaryModel
|
||
# string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
|
||
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
|
||
#+END_SRC
|
||
|
||
we construct the following =sourceModel= definition instead:
|
||
|
||
#+BEGIN_SRC yaml
|
||
extensions:
|
||
- addsTo:
|
||
pack: codeql/java-all
|
||
extensible: sourceModel
|
||
data:
|
||
# sourceModel
|
||
# string package, string type, boolean subtypes, string name, string signature, string ext, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
|
||
- ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
|
||
|
||
# # from original
|
||
# # summaryModel
|
||
# # string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
|
||
# - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
|
||
#+END_SRC
|
||
|
||
Place this in:
|
||
|
||
[[../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml]]
|
||
|
||
To ensure the model extension is applied, you must instruct the CodeQL
|
||
extension to include all extension packs. In =qllab.code-workspace=, add:
|
||
|
||
#+BEGIN_SRC javascript
|
||
{
|
||
...,
|
||
"settings": {
|
||
...,
|
||
"codeQL.runningQueries.useExtensionPacks": "all"
|
||
}
|
||
}
|
||
#+END_SRC
|
||
|
||
If needed, also include this setting in =.vscode/settings.json=:
|
||
|
||
#+BEGIN_SRC javascript
|
||
"codeQL.runningQueries.useExtensionPacks": "all"
|
||
#+END_SRC
|
||
|
||
Now re-run the query:
|
||
|
||
[[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
|
||
|
||
You should see flows that originate at =readLine()= and reach the SQL sink. This confirms that your manual =sourceModel= extension is effective.
|
||
|
||
|