codeql-lab/codeql-sqlite/README.org
Michael Hohn e7996c24b5 wip: outline
2025-07-11 10:58:36 -07:00


Using sqlite to illustrate models-as-data

This description reuses a codeql workshop. The original instructions are included below, starting at "SQL injection example".

Build the codeql database

To get started, build the codeql database (adjust paths to your setup):

  # Build the db with source commit id.
  SRCDIR=$(pwd)
  DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db

  echo $DB
  test -d "$DB" && rm -fR "$DB"
  mkdir -p "$DB"

  # Use the correct codeql
  export PATH="$(cd ../codeql && pwd):$PATH"
  codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'

  # Check for AddUser in the db
  unzip -v $DB/src.zip | grep AddUser

Then add this database directory to your VS Code DATABASES tab.

Tests using a default query

You can run the stdlib query ../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql, but it returns no results. It does, however, point at the classes to inspect, in particular the source and sink classes. Run ./Illustrations.ql from the command line or from VS Code. Via the CLI:

  # run query
  codeql query run                                \
         -v                                       \
         --database java-sqlite-e2e555c.db        \
         --output result.bqrs                     \
         --threads=12                             \
         --ram=14000                              \
         Illustrations.ql

  # format results
  codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p'

This shows

  Result set: #select
  |  ui  |  qsi  |
  +------+-------+
  | args | query |

In the editor, these link to

  1. main(ARGS) and
  2. conn.createStatement().executeUpdate(QUERY);

The second is correct, but System.console().readLine(); is not found. Thus, SqlTainted.ql will not find anything.
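
The decoded table is easy to read but awkward to compare between runs. A minimal sketch, assuming the text layout shown above (a header row, a +---+ rule, then data rows); select_rows is a hypothetical helper, not part of the codeql CLI:

```shell
# Reduce decoded bqrs text (read on stdin) to the data rows of the
# "#select" result set, printed as comma-separated values.
# Assumes the layout shown above; select_rows is a hypothetical helper.
select_rows() {
  sed -n '/^Result set: #select/,$p' |
    awk -F'|' '
      /^Result set:/          { next }                 # banner line
      /^[[:space:]]*\+[-+]+/  { seen_rule = 1; next }  # rule under the header
      seen_rule && NF >= 3 {
        out = ""
        for (i = 2; i < NF; i++) {
          cell = $i
          gsub(/^[ \t]+|[ \t]+$/, "", cell)            # trim padding
          out = out (i > 2 ? "," : "") cell
        }
        print out
      }'
}

# Stand-in for: codeql bqrs decode --format=text result.bqrs
printf '%s\n' \
  'Result set: #select' \
  '|  ui  |  qsi  |' \
  '+------+-------+' \
  '| args | query |' | select_rows
# prints: args,query
```

With a real result file, pipe the output of codeql bqrs decode --format=text result.bqrs into select_rows instead of the printf stand-in.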

TODO supplement sources via the model editor

  • We have no flow

    • check source, sink
    • we have a sink
    • but ActiveThreatModelSource finds no source
  • We can supplement in different ways

supplement codeql: Write full manual query: already in workshop

supplement codeql: Add to FlowSource or a subclass

Note: this is one area that just has to be known. Browsing the source will not help you.

CodeQL reading hint:

class ActiveThreatModelSource extends DataFlow::Node

uses

this.(SourceNode).getThreatModel()

So following the cast (SourceNode) may be useful:

  /**
   * A data flow source.
   */
  abstract class SourceNode extends DataFlow::Node

Following the abstract class is promising:

  abstract class RemoteFlowSource extends SourceNode

and others.

In ../ql/java/ql/lib/Customizations.qll, notice the comments mentioning RemoteFlowSource. Use the imports from ../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql, but note that there are conflicts. You will use

private import semmle.code.java.dataflow.FlowSources

Follow this to FlowSources, and find the mentioned RemoteFlowSource

abstract class RemoteFlowSource extends SourceNode

Add the custom source. The modified ../ql/java/ql/lib/Customizations.qll is

  import java
  private import semmle.code.java.dataflow.FlowSources

  class ReadLine extends RemoteFlowSource {
    ReadLine() {
      exists(Call read |
        read.getCallee().getName() = "readLine" and
        read = this.asExpr()
      )
    }

    override string getSourceType() { result = "Console readline" }
  }

Note that the predicate

  module QueryInjectionFlowConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node src) { src instanceof ActiveThreatModelSource }
        ...;
  }

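now also returns the readLine() result, although we extended RemoteFlowSource, not ActiveThreatModelSource. This works because ActiveThreatModelSource selects its nodes through this.(SourceNode).getThreatModel(), and RemoteFlowSource is a SourceNode.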

  • customizations in staging repo

supplement codeql: Add to models-as-data

In the model editor, we see an entry matching java.io.*Console.*readline. Searching the library for that pattern:

  1:$ rg -i 'java.io.*Console.*readline' ql/java
  ql/java/ql/lib/ext/generated/java.io.model.yml
  16:      - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
  17:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
  18:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
  19:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]

Note: this file is in the generated/ tree.

The current readLine modeling is in a summaryModel section, but we need it in a sourceModel. The existing entries are:

  extensions:
    - addsTo:
        pack: codeql/java-all
        extensible: summaryModel
      data:
        ...
        - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]

The model editor will not show this because it is already modeled. To illustrate text-based additions, we'll use plain text. Starting from

  extensions:
    - addsTo:
        pack: codeql/java-all
        extensible: summaryModel
      data:
        ...
        - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]

and the field information

  extensible predicate sourceModel(
    string package, string type, boolean subtypes, string name, string signature, string ext,
    string output, string kind, string provenance, QlBuiltins::ExtensionId madId
  );

From

  # summaryModel
  # string package, string type, boolean subtypes, string name, string signature, string ext, string input,     string output, string kind,  string provenance, QlBuiltins::ExtensionId madId
  - ["java.io",     "Console",   False,            "readLine",  "()",             "",         "Argument[this]", "ReturnValue", "taint",      "df-generated"]

we can construct

  extensions:
    - addsTo:
        pack: codeql/java-all
        extensible: sourceModel
      data: 
        # sourceModel
        # string package, string type, boolean subtypes, string name, string signature, string ext,                   string output,    string kind,   string provenance, QlBuiltins::ExtensionId madId
        - ["java.io",     "Console",   False,            "readLine",  "()",             "",                           "ReturnValue",    "remote",      "manual"]

        # # from original
        # # summaryModel
        # # string package, string type, boolean subtypes, string name, string signature, string ext, string input,     string output, string kind,  string provenance, QlBuiltins::ExtensionId madId
        # - ["java.io",     "Console",   False,            "readLine",  "()",             "",         "Argument[this]", "ReturnValue", "taint",      "df-generated"]

and move this into ../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml
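
For orientation, here is a sketch of what a model pack around that file might look like. It writes into a scratch directory rather than ../.github/codeql/extensions, and the pack name example/sqlite-db is hypothetical. The codeql-pack.yml keys (extensionTargets, dataExtensions) follow my understanding of how model packs are declared, so compare them against the pack the model editor generates before relying on them.

```shell
# Scaffold a model (extension) pack in a scratch directory.
# PACK_ROOT and the pack name "example/sqlite-db" are hypothetical;
# the real location in this repository is
# ../.github/codeql/extensions/sqlite-db/
PACK_ROOT="${PACK_ROOT:-$(mktemp -d)}/sqlite-db"
mkdir -p "$PACK_ROOT/models"

# Pack manifest: declares which library pack the models extend and
# where the data extension files live.
cat > "$PACK_ROOT/codeql-pack.yml" <<'EOF'
name: example/sqlite-db
version: 0.0.0
library: true
extensionTargets:
  codeql/java-all: '*'
dataExtensions:
  - models/**/*.yml
EOF

# The sourceModel row constructed above, without the comment lines.
cat > "$PACK_ROOT/models/sqlite.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
EOF

# Show what was written.
find "$PACK_ROOT" -type f | sort
```

Keeping the manifest next to the models directory means the whole sqlite-db directory can be copied or versioned as one unit.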

SQL injection example

This directory contains the problematic Java source code. The rest of this README describes

The codeql query is developed in ../session/README.org.

Setup and sample run

The JDBC connector from https://github.com/xerial/sqlite-jdbc is included in the git repository.

  # Use a simple headline prompt 
  PS1='
  \033[32m---- SQL injection demo ----\[\033[33m\033[0m\]
  $?:$ '

  
  # Build
  ./build.sh

  # Prepare db
  ./admin -r
  ./admin -c
  ./admin -s 

  # Add regular user interactively
  ./add-user 2>> users.log
  First User

  # Check
  ./admin -s

  # Add Johnny Droptable 
  ./add-user 2>> users.log
  Johnny'); DROP TABLE users; --

  # And the problem:
  ./admin -s

  # Check the log
  tail users.log

Identify the problem

./add-user is reading from STDIN, and writing to a database; looking at the code in ./AddUser.java leads to

System.console().readLine();

for the read and

conn.createStatement().executeUpdate(query);

for the write.
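
To see why this read/write pair is dangerous, here is a sketch of how the query text changes with the input. The INSERT template is an assumption standing in for the one in ./AddUser.java, which is not reproduced in this README; build_query is a hypothetical helper.

```shell
# Assemble a query the way naive string formatting would.
# The template is a hypothetical stand-in for the one in ./AddUser.java.
build_query() {
  # $1: numeric user id, $2: text read from the console
  printf "INSERT INTO users VALUES (%d, '%s')\n" "$1" "$2"
}

build_query 1 "First User"
# prints: INSERT INTO users VALUES (1, 'First User')

build_query 2 "Johnny'); DROP TABLE users; --"
# prints: INSERT INTO users VALUES (2, 'Johnny'); DROP TABLE users; --')
```

In the second case the quote in the input closes the string literal early, so the rest of the input is read as SQL rather than data.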

This problem is thus a dataflow problem; in codeql terminology we have

  • a source at the System.console().readLine();
  • a sink at the conn.createStatement().executeUpdate(query);

We write codeql to identify these two, and then connect them via

  • a dataflow configuration; for this problem, the more general taintflow configuration.

Build the codeql database

To get started, build the codeql database (adjust paths to your setup):

  # Build the db with source commit id.
  SRCDIR=$(pwd)
  DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db

  echo $DB
  test -d "$DB" && rm -fR "$DB"
  mkdir -p "$DB"

  # Use the correct codeql
  export PATH="$(cd ../codeql && pwd):$PATH"
  codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'

  # Check for AddUser in the db
  unzip -v $DB/src.zip | grep AddUser

Then add this database directory to your VS Code DATABASES tab.

(optional) Build the codeql database in steps

For larger projects, using a single command to build everything is costly when any part of the build fails. The sequence here is also used by the GHAS default setup, so familiarity with it helps in reviewing logs.

The purpose of these sections is to illustrate the codeql commands used in default setup and to make the connection between the GHAS default action and the CodeQL CLI explicit.

After running default setup and downloading the log, you will see the following entries embedded in the full log. They are repeated here for completeness; you can skip the command-line options for now.

  codeql version --format=json

  codeql resolve languages --format=betterjson --extractor-options-verbosity=4 --extractor-include-aliases

  codeql database init --force-overwrite --db-cluster /home/runner/work/_temp/codeql_databases --source-root=/home/runner/work/codeql-workshop-sql-injection-java/codeql-workshop-sql-injection-java --extractor-include-aliases --language=java --codescanning-config=/home/runner/work/_temp/user-config.yaml --build-mode=none --calculate-language-specific-baseline --sublanguage-file-coverage

  codeql database trace-command --use-build-mode --working-dir /home/runner/work/codeql-workshop-sql-injection-java/codeql-workshop-sql-injection-java /home/runner/work/_temp/codeql_databases/java

  codeql database finalize --finalize-dataset --threads=4 --ram=14567 /home/runner/work/_temp/codeql_databases/java

  codeql database run-queries --ram=14567 --threads=4 /home/runner/work/_temp/codeql_databases/java --expect-discarded-cache --min-disk-free=1024 -v --intra-layer-parallelism

  codeql database cleanup /home/runner/work/_temp/codeql_databases/java --cache-cleanup=brutal

  codeql database bundle /home/runner/work/_temp/codeql_databases/java --output=/home/runner/work/_temp/codeql_databases/java.zip --name=java

To build a database in steps locally, use the following sequence, adjusting paths to your setup:

  # Build the db with source commit id.

  SRCDIR=$HOME/local/codeql-workshop-sql-injection-java/src
  DB=$SRCDIR/java-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)

  # Check paths
  echo "DB will be: $DB"
  echo "SRC is in:  $SRCDIR"

  # Prepare db directory
  test -d "$DB" && rm -fR "$DB"
  mkdir -p "$DB"

  # Run the build, without --db-cluster
  #   Init database
  cd $SRCDIR
  codeql database init                            \
         --language=java                          \
         --build-mode=none                        \
         --source-root=.                          \
         -v $DB

  #   Repeat trace-command as needed to cover all targets
  codeql database trace-command                   \
         --use-build-mode                         \
         --working-dir .                          \
         $DB 

  #   Finalize database
  codeql database finalize                        \
         --finalize-dataset                       \
         --threads=4                              \
         --ram=14567                              \
         $DB

  # Use the database; get the location
  echo $DB
  # /Users/hohn/local/codeql-workshop-sql-injection-java/src/java-sqli-161a1d5

To also analyze the database just built, we use the log's command but add an explicit query name:

  codeql database run-queries                     \
         --ram=14567                              \
         --threads=4 $DB                          \
         --expect-discarded-cache                 \
         --min-disk-free=1024                     \
         -v                                       \
         --intra-layer-parallelism                \
         --                                       \
         ../session/simple.ql

This only gives us a bqrs file, but we want sarif. Checking the help:

  codeql database run-queries --help
  Usage: codeql database run-queries [OPTIONS] -- <database> [<query|dir|suite|pack>...]
  [Plumbing] Run a set of queries together.

  Run one or more queries against a CodeQL database, saving the results to the results
  subdirectory of the database directory.

  The results can later be converted to readable formats by codeql database interpret-results,
  or query-for-query by with codeql bqrs decode or codeql bqrs interpret.

So we run the following

  VERSION=$(cd $SRCDIR && git rev-parse --short HEAD)
  codeql database interpret-results                                   \
         --format=sarifv2.1.0                                         \
         -o simple-$VERSION.sarif  \
         -- $DB ../session/simple.ql

  echo "Results in simple-$VERSION.sarif"

We kept the output for this sample in ./simple-161a1d5.sarif
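
A quick sanity check on a sarif file is to count its results. The sketch below is deliberately rough: it assumes each result object carries exactly one "ruleId" key and that the key appears nowhere else in the file, which matches the SARIF output I have seen from codeql but is not guaranteed. sarif_result_count is a hypothetical helper, and the sample file is a hand-written stand-in, not the contents of ./simple-161a1d5.sarif.

```shell
# Rough result count for a SARIF file: one "ruleId" key per result.
# Hypothetical helper; a JSON-aware tool is more robust.
sarif_result_count() {
  grep -o '"ruleId"' "$1" | wc -l | tr -d ' '
}

# Hand-written stand-in for a real SARIF file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{ "version": "2.1.0",
  "runs": [ { "results": [
    { "ruleId": "example/rule", "message": { "text": "first" } },
    { "ruleId": "example/rule", "message": { "text": "second" } } ] } ] }
EOF

sarif_result_count "$sample"
# prints: 2
```

For the file produced above, the call would be sarif_result_count "simple-$VERSION.sarif".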