This commit is contained in:
Michael Hohn
2025-07-18 10:36:52 -07:00
committed by =Michael Hohn
parent d5732f5509
commit 63b453bbf6
3 changed files with 160 additions and 199 deletions

View File

@@ -1,64 +1,9 @@
[[https://imgs.xkcd.com/comics/exploits_of_a_mom.png]] [[https://imgs.xkcd.com/comics/exploits_of_a_mom.png]]
(from https://xkcd.com/327/) (from https://xkcd.com/327/)
* SQL injection example * Using sqlite to illustrate models-as-data
** Setup and sample run
#+BEGIN_SRC sh
# Use a simple headline prompt
PS1='
\033[32m---- SQL injection demo ----\[\033[33m\033[0m\]
$?:$ '
# Build
./build.sh
# Prepare db
./admin -r
./admin -c
./admin -s
# Add regular user interactively
./add-user 2>> users.log
First User
# Regular user via "external" process
echo "User Outside" | ./add-user 2>> users.log
# Check
./admin -s
# Add Johnny Droptable
./add-user 2>> users.log
Johnny'); DROP TABLE users; --
# And the problem:
./admin -s
# Check the log
tail users.log
#+END_SRC
** Identify the problem
=./add-user= is reading from =STDIN=, and writing to a database; looking at the code in
[[./add-user.c]] leads to
: count = read(STDIN_FILENO, buf, BUFSIZE - 1);
for the read and
: rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);
for the write.
This problem is thus a dataflow problem; in codeql terminology we have
- a /source/ at the =read(STDIN_FILENO, buf, BUFSIZE - 1);=
- a /sink/ at the =sqlite3_exec(db, query, NULL, 0, &zErrMsg);=
We write codeql to identify these two, and then connect them via
- a /dataflow configuration/ -- for this problem, the more general /taintflow
configuration/.
** Build codeql database ** Build codeql database
To get started, build the codeql database (adjust paths to your setup): To get started, build the codeql database (adjust paths to your setup):
#+BEGIN_SRC sh #+BEGIN_SRC sh
@@ -75,155 +20,168 @@
#+END_SRC #+END_SRC
Then add this database directory to your VS Code =DATABASES= tab. Then add this database directory to your VS Code =DATABASES= tab.
** Tests using a default query
** TODO supplement sources via the model editor
** TODO supplement codeql: Add to FlowSource or a subclass
Note: this /one area/ that just has to be known. Browsing source will *not*
help you.
** Build codeql database in steps CodeQL reading hint:
For larger projects, using a single command to build everything is costly when : class ActiveThreatModelSource extends DataFlow::Node
any part of the build fails. uses
: this.(SourceNode).getThreatModel()
To build a database in steps, use the following sequence, adjusting paths to So following the cast (SourceNode) may be useful:
your setup:
#+BEGIN_SRC sh
# Build the db with source commit id.
export PATH=$HOME/local/vmsync/codeql250:"$PATH"
SRCDIR=$HOME/local/codeql-training-material.cpp-sqli/cpp/codeql-dataflow-sql-injection
DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)
# Check paths
echo $DB
echo $SRCDIR
# Prepare db directory
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"
# Run the build
cd $SRCDIR
codeql database init --language=cpp -s . -v $DB
# Repeat trace-command as needed to cover all targets
codeql database trace-command -v $DB -- make
codeql database finalize -j4 $DB
#+END_SRC
Then add this database directory to your VS Code =DATABASES= tab.
** Develop the query bottom-up
1. Identify the /source/ part of the
: read(STDIN_FILENO, buf, BUFSIZE - 1);
expression, the =buf= argument.
Start from a =from..where..select=, then convert to a predicate.
2. Identify the /sink/ part of the
: sqlite3_exec(db, query, NULL, 0, &zErrMsg);
expression, the =query= argument. Again start from =from..where..select=,
then convert to a predicate.
3. Fill in the /taintflow configuration/ boilerplate
#+BEGIN_SRC java
class CppSqli extends TaintTracking::Configuration {
CppSqli() { this = "CppSqli" }
override predicate isSource(DataFlow::Node node) {
none()
}
override predicate isSink(DataFlow::Node node) {
none()
}
}
#+END_SRC
Note that an inout-argument in C/C++ (the =buf= pointer is passed to =read=
and points to updated data after the return) is accessed as a codeql source
via
: source.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr()
instead of the usual
: source.asExpr()
The final query (without =isAdditionalTaintStep=) is
#+BEGIN_SRC java #+BEGIN_SRC java
/** /**
,* @name SQLI Vulnerability ,* A data flow source.
,* @description Using untrusted strings in a sql query allows sql injection attacks.
,* @kind path-problem
,* @id cpp/SQLIVulnerable
,* @problem.severity warning
,*/ ,*/
abstract class SourceNode extends DataFlow::Node
#+END_SRC
Following the =abstract class= is promising:
#+BEGIN_SRC java
abstract class RemoteFlowSource extends SourceNode
#+END_SRC
and others.
import cpp In
import semmle.code.cpp.dataflow.TaintTracking [[../ql/java/ql/lib/Customizations.qll]]
import DataFlow::PathGraph notice the comments mentioning RemoteFlowSource.
Use imports from [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
but note that there are conflicts. you will use
: private import semmle.code.java.dataflow.FlowSources
Follow this to FlowSources, and find the mentioned RemoteFlowSource
: abstract class RemoteFlowSource extends SourceNode
class SqliFlowConfig extends TaintTracking::Configuration { Add the custom source. The modified [[../ql/java/ql/lib/Customizations.qll]] is
SqliFlowConfig() { this = "SqliFlow" } #+BEGIN_SRC java
import java
private import semmle.code.java.dataflow.FlowSources
override predicate isSource(DataFlow::Node source) { class ReadLine extends RemoteFlowSource {
// count = read(STDIN_FILENO, buf, BUFSIZE); ReadLine() {
exists(FunctionCall read | exists(Call read |
read.getTarget().getName() = "read" and read.getCallee().getName() = "readLine" and
read.getArgument(1) = source.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr() read = this.asExpr()
) )
} }
override predicate isSink(DataFlow::Node sink) { override string getSourceType() { result = "Console readline" }
// rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg); }
exists(FunctionCall exec | #+END_SRC
exec.getTarget().getName() = "sqlite3_exec" and
exec.getArgument(1) = sink.asExpr() Note that the predicate
) #+BEGIN_SRC java
module QueryInjectionFlowConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node src) { src instanceof ActiveThreatModelSource }
...;
}
#+END_SRC
now also returns the readLine() result -- although we extended
RemoteFlowSource, not ActiveThreatModelSource
** TODO supplement codeql: Add to models-as-data
- schema in codeql: [[../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll]]
- data sample: [[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml]]
In the model editor, we see a java.io.*Console.*readline' (using =show already modeled= option)
#+BEGIN_SRC sh
1:$ rg -i 'java.io.*Console.*readline' ql/java
ql/java/ql/lib/ext/generated/java.io.model.yml
16: - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
17: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
18: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
19: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
note: this file is in the generated/ tree.
The current readline modeling is in the =summaryModel= section; we need it
in a =sourceModel=
#+BEGIN_SRC yaml
extensions:
- addsTo:
pack: codeql/java-all
extensible: summaryModel
data:
...
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument
#+END_SRC
The model editor will not show this because its already modeled. To
illustrate text-based additions, we'll use plain text.
Starting from
#+BEGIN_SRC yaml
extensions:
- addsTo:
pack: codeql/java-all
extensible: summaryModel
data:
...
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
- ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument
#+END_SRC
and the field information
#+BEGIN_SRC java
extensible predicate sourceModel(
string package, string type, boolean subtypes, string name, string signature, string ext,
string output, string kind, string provenance, QlBuiltins::ExtensionId madId
);
#+END_SRC
Starting from =summaryModel=
#+BEGIN_SRC yaml
# summaryModel
# string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
we can construct the =sourceModel=
#+BEGIN_SRC yaml
extensions:
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
# sourceModel
# string package, string type, boolean subtypes, string name, string signature, string ext, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
- ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
# # from original
# # summaryModel
# # string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
# - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
and move this into [[../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml]]
To ensure that these model extensions are applied during query runs, include
this setting
#+begin_src javascript
{
...,
"settings": {
...,
"codeQL.runningQueries.useExtensionPacks": "all"
} }
} }
#+end_src
from SqliFlowConfig conf, DataFlow::PathNode source, DataFlow::PathNode sink in the workspace configuration file [[../qllab.code-workspace]]
where conf.hasFlowPath(source, sink)
select sink, source, sink, "Possible SQL injection"
#+END_SRC
** Optional: sarif file review of the results In some environments (e.g., older VS Code versions), you may also need to
Query results are available in several output formats using the cli. The replicate this setting in [[../.vscode/settings.json]]; there it simplifies to
following produces the sarif format, a json-based result description. #+begin_src javascript
"codeQL.runningQueries.useExtensionPacks": "all"
#+end_src
#+BEGIN_SRC sh Now we can run [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] again.
# The setup information from before
export PATH=$HOME/local/vmsync/codeql250:"$PATH"
SRCDIR=$HOME/local/codeql-training-material.cpp-sqli/cpp/codeql-dataflow-sql-injection
DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)
# Check paths
echo $DB
echo $SRCDIR
# To see the help
codeql database analyze -h
# Run a query
codeql database analyze \
-v \
--ram=14000 \
-j12 \
--rerun \
--search-path ~/local/vmsync/ql \
--format=sarif-latest \
--output cpp-sqli.sarif \
-- \
$DB \
$SRCDIR/SqlInjection.ql
# Examine the file in an editor
edit cpp-sqli.sarif
#+END_SRC
An example of using the sarif data is in the the jq script [[./sarif-summary.jq]].
When run against the sarif input via
#+BEGIN_SRC sh
jq --raw-output --join-output -f sarif-summary.jq < cpp-sqli.sarif > cpp-sqli.txt
#+END_SRC
it produces output in a form close to that of compiler error messages:
#+BEGIN_SRC text
query-id: message line
Path
...
Path
...
#+END_SRC

View File

@@ -176,13 +176,6 @@
#+END_SRC #+END_SRC
* Modeling sqlite as dependency
The tree
: src-sqlite
contains a trivial sample taken from a workshop. It uses
=sqlite-jdbc-3.36.0.1.jar=, so we can use it to illustrate modeling on a smaller
example.
* Modeling Jedis as a Dependency in Model Editor * Modeling Jedis as a Dependency in Model Editor
** Set up and run Editor ** Set up and run Editor
To model =jedis= for taint analysis using the /model editor/, select the /"model To model =jedis= for taint analysis using the /model editor/, select the /"model
@@ -311,6 +304,16 @@
* TODO for java, the sqltainted query will find the sink, not the source yet. * TODO for java, the sqltainted query will find the sink, not the source yet.
[[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
* TODO Modeling sqlite as dependency
The tree [[../codeql-sqlite/]] contains a trivial sample taken from a workshop. It
uses =sqlite-jdbc-3.36.0.1.jar=, so we can use it to illustrate modeling on a
smaller example. This one is unusual; the function
java.io.Console.readLine() is already modeled, but as a taint step, not a
source. We need it as source. The model editor won't show it at all because it
is already modeled,
* TODO vulnerable sample, sqlite * TODO vulnerable sample, sqlite
For .eval() to show in a query, it has to be used in an application. So we For .eval() to show in a query, it has to be used in an application. So we
modify src-sqlite/AddUser.java for jedis. modify src-sqlite/AddUser.java for jedis.

View File

@@ -58,7 +58,7 @@
+ but ActiveThreatModelSource finds no source + but ActiveThreatModelSource finds no source
- [ ] We can supplement in different ways - [ ] We can supplement in different ways
** supplement codeql: Write full manual query: already in workshop ** supplement codeql: Write full manual query: already in workshop
** supplement codeql: Add to FlowSource or a subclass ** TODO supplement codeql: Add to FlowSource or a subclass
Note: this /one area/ that just has to be known. Browsing source will *not* Note: this /one area/ that just has to be known. Browsing source will *not*
help you. help you.
@@ -116,7 +116,7 @@
now also returns the readLine() result -- although we extended now also returns the readLine() result -- although we extended
RemoteFlowSource, not ActiveThreatModelSource RemoteFlowSource, not ActiveThreatModelSource
** supplement codeql: Add to models-as-data ** TODO supplement codeql: Add to models-as-data
- schema in codeql: [[../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll]] - schema in codeql: [[../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll]]
- data sample: [[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml]] - data sample: [[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml]]