Files
codeql-lab/codeql-sqlite-java

Using sqlite to illustrate models-as-data

This section demonstrates the use of the models-as-data system by analyzing a small Java application that uses the SQLite JDBC driver. The example is adapted from a CodeQL workshop.

Build the CodeQL Database

To get started, build the CodeQL database for the SQLite-backed Java sample. Adjust paths as needed.

  SRCDIR=$(pwd)
  DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db

  echo $DB
  test -d "$DB" && rm -fR "$DB"
  mkdir -p "$DB"

  # Ensure the correct CodeQL version is in your PATH
  export PATH="$(cd ../codeql && pwd):$PATH"
  codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'

  # Check for presence of AddUser.java in the resulting database
  unzip -v $DB/src.zip | grep AddUser

Then add this database directory to your VS Code DATABASES tab.

Tests Using a Default Query

You can run the standard SQL injection query:

../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql

but it will return no results. However, it does help identify which classes are being analyzed as potential sources and sinks. Instead, run the diagnostic query:

./Illustrations.ql

You can run it from the CLI:

  codeql query run                                \
         -v                                       \
         --database java-sqlite-e2e555c.db        \
         --output result.bqrs                     \
         --threads=12                             \
         --ram=14000                              \
         Illustrations.ql

  codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p'

The result will look like:

  Result set: #select
  |  ui  |  qsi  |
  +------+-------+
  | args | query |

In the editor, these correspond to:

  1. main(String[] args) — source-like
  2. conn.createStatement().executeUpdate(query) — sink

However, System.console().readLine() is not detected as a source. Therefore, SqlTainted.ql cannot find a complete flow.

Supplement Sources via the Model Editor

  • We observe no flow from source to sink

    • A sink exists (executeUpdate)
    • But no recognized source is found
  • There are two ways to fix this:

    1. Add a new source in Customizations.qll
    2. Add a new source in the models-as-data YAML format

Supplement CodeQL: Write a Full Manual Query

A manual dataflow query is already available:

./full-query.ql

This can trace the data manually even when standard configuration fails.

Supplement CodeQL: Add to FlowSource or a Subclass

Sometimes, the only way to identify how to extend a source is to understand how CodeQL internally resolves source nodes.

Key class hierarchies:

  abstract class SourceNode extends DataFlow::Node
  abstract class RemoteFlowSource extends SourceNode

Follow usage in:

Then modify Customizations.qll by adding the custom source. The modified ../ql/java/ql/lib/Customizations.qll is

  import java
  private import semmle.code.java.dataflow.FlowSources

  class ReadLine extends RemoteFlowSource {
    ReadLine() {
      exists(Call read |
        read.getCallee().getName() = "readLine" and
        read = this.asExpr()
      )
    }

    override string getSourceType() { result = "Console readline" }
  }

This allows

  predicate isSource(DataFlow::Node src) {
      src instanceof ActiveThreatModelSource
  }

to include readLine() even though we extended RemoteFlowSource.

Supplement CodeQL: Add to models-as-data

To modify the dataflow configuration using the models-as-data mechanism, we will explicitly define a new source model for java.io.Console.readLine. This function is already modeled in CodeQL—but only as a summaryModel. For SQL injection tracking, we want to treat it as a sourceModel. Since its already covered in auto-generated data, it does not appear in the model editor interface.

We begin by locating the existing model:

  • The model schema used for extensions is defined here: ../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll
  • For reference, see a sample manually written model YAML: ../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml
  • To verify that readline is already modeled, use ripgrep:

      1:$ rg -i 'java.io.*Console.*readline' ql/java
      ql/java/ql/lib/ext/generated/java.io.model.yml
      16:      - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
      17:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
      18:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
      19:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]

Note: this model is auto-generated (df-generated) and appears under summaryModel.

Here is an example of that structure:

  extensions:
    - addsTo:
        pack: codeql/java-all
        extensible: summaryModel
      data:
        ...
        - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument

Because this modeling is already present, the model editor UI will hide the function. To override it, well define a new source manually in plain YAML.

First, recall the schema definition for sourceModel:

  extensible predicate sourceModel(
    string package, string type, boolean subtypes, string name, string signature, string ext,
    string output, string kind, string provenance, QlBuiltins::ExtensionId madId
  );

Starting from the existing summaryModel,

  # summaryModel
  # string package, string type, boolean subtypes, string name, string signature, string ext, string input,     string output, string kind,  string provenance, QlBuiltins::ExtensionId madId
  - ["java.io",     "Console",   False,            "readLine",  "()",             "",         "Argument[this]", "ReturnValue", "taint",      "df-generated"]

we construct the following sourceModel definition instead:

  extensions:
    - addsTo:
        pack: codeql/java-all
        extensible: sourceModel
      data: 
        # sourceModel
        # string package, string type, boolean subtypes, string name, string signature, string ext,                   string output,    string kind,   string provenance, QlBuiltins::ExtensionId madId
        - ["java.io",     "Console",   False,            "readLine",  "()",             "",                           "ReturnValue",    "remote",      "manual"]

        # # from original
        # # summaryModel
        # # string package, string type, boolean subtypes, string name, string signature, string ext, string input,     string output, string kind,  string provenance, QlBuiltins::ExtensionId madId
        # - ["java.io",     "Console",   False,            "readLine",  "()",             "",         "Argument[this]", "ReturnValue", "taint",      "df-generated"]

Place this in:

../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml

To ensure the model extension is applied, you must instruct the CodeQL extension to include all extension packs. In qllab.code-workspace, add:

  {
      ...,
      "settings": {
          ...,
          "codeQL.runningQueries.useExtensionPacks": "all"
      }
  }

If needed, also include this setting in .vscode/settings.json:

  "codeQL.runningQueries.useExtensionPacks": "all"

Now re-run the query:

../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql

You should see flows that originate at readLine() and reach the SQL sink. This confirms that your manual sourceModel extension is effective.