Using SQLite to Illustrate Models-as-Data

This section demonstrates the use of the models-as-data system by analyzing a small Java application that uses the SQLite JDBC driver. The example is adapted from a CodeQL workshop.

Build the CodeQL Database

To get started, build the CodeQL database for the SQLite-backed Java sample. Adjust paths as needed.

  SRCDIR=$(pwd)
  DB="$SRCDIR/java-sqlite-$(cd "$SRCDIR" && git rev-parse --short HEAD).db"

  echo "$DB"
  # Start from a fresh database directory
  test -d "$DB" && rm -fR "$DB"
  mkdir -p "$DB"

  # Ensure the correct CodeQL version is in your PATH
  export PATH="$(cd ../codeql && pwd):$PATH"
  codeql database create --language=java -s . -j 8 -v "$DB" --command='./build.sh'

  # Check for presence of AddUser.java in the resulting database
  unzip -v "$DB/src.zip" | grep AddUser

Then add this database directory to your VS Code DATABASES tab.

Tests Using a Default Query

You can run the standard SQL injection query:

../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql

but it returns no results. Reading it does, however, show which classes it uses as potential sources and sinks. To see which parts of this program those classes actually match, run the diagnostic query instead:

./Illustrations.ql

You can run it from the CLI:

  codeql query run                                \
         -v                                       \
         --database java-sqlite-e2e555c.db        \
         --output result.bqrs                     \
         --threads=12                             \
         --ram=14000                              \
         Illustrations.ql

  codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p'

The result will look like:

  Result set: #select
  |  ui  |  qsi  |
  +------+-------+
  | args | query |

In the editor, these correspond to:

  1. main(String[] args) — source-like
  2. conn.createStatement().executeUpdate(query) — sink

However, System.console().readLine() is not detected as a source. Therefore, SqlTainted.ql cannot find a complete flow.

Supplement Sources via the Model Editor

  • We observe no flow from source to sink

    • A sink exists (executeUpdate)
    • But no recognized source is found
  • There are two ways to fix this:

    1. Add a new source in Customizations.qll
    2. Add a new source in the models-as-data YAML format

Supplement CodeQL: Write a Full Manual Query

A manual dataflow query is already available:

./full-query.ql

This query traces the data with its own hand-written dataflow configuration, so it can find the flow even when the standard configuration does not.

Supplement CodeQL: Add to FlowSource or a Subclass

Sometimes, the only way to identify how to extend a source is to understand how CodeQL internally resolves source nodes.

Key class hierarchies:

  abstract class SourceNode extends DataFlow::Node
  abstract class RemoteFlowSource extends SourceNode

Follow usage in:

Then add the custom source to Customizations.qll. The modified ../ql/java/ql/lib/Customizations.qll is:

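  import java
  private import semmle.code.java.dataflow.FlowSources

  // Treat the result of any call to a method named readLine (this includes
  // System.console().readLine(), but also e.g. BufferedReader.readLine())
  // as a remote flow source.
  class ReadLine extends RemoteFlowSource {
    ReadLine() {
      exists(Call read |
        read.getCallee().getName() = "readLine" and
        read = this.asExpr()
      )
    }

    // Description reported for this kind of source
    override string getSourceType() { result = "Console readline" }
  }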

This allows

  predicate isSource(DataFlow::Node src) {
      src instanceof ActiveThreatModelSource
  }

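to include readLine() calls, even though we extended RemoteFlowSource rather than ActiveThreatModelSource: a RemoteFlowSource has the "remote" threat model, which is active by default.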

Supplement CodeQL: Add to Models-as-Data

In the model editor (with the "show already modeled" option enabled), java.io.Console.readLine appears as already modeled. Searching the library models confirms this:

  $ rg -i 'java.io.*Console.*readline' ql/java
  ql/java/ql/lib/ext/generated/java.io.model.yml
  16:      - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
  17:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
  18:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
  19:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]

Note: this file is in the generated/ tree; there are other model files as well.

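The existing readLine() model is a summaryModel: it only propagates taint from the Console object (Argument[this]) to the return value. To treat readLine() as a source, we need a sourceModel entry.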

  extensions:
    - addsTo:
        pack: codeql/java-all
        extensible: summaryModel
      data:
        ...
        - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
        - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]

The model editor will not show this entry because it is already modeled, so to illustrate text-based additions we will edit the YAML by hand, using the summaryModel entries above and the field list of sourceModel:

  extensible predicate sourceModel(
    string package, string type, boolean subtypes, string name, string signature, string ext,
    string output, string kind, string provenance, QlBuiltins::ExtensionId madId
  );

Starting from summaryModel

  # summaryModel
  # string package, string type, boolean subtypes, string name, string signature, string ext, string input,     string output, string kind,  string provenance, QlBuiltins::ExtensionId madId
  - ["java.io",     "Console",   False,            "readLine",  "()",             "",         "Argument[this]", "ReturnValue", "taint",      "df-generated"]

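we can construct the sourceModel row by dropping the input column, keeping ReturnValue as the output, and changing the kind to "remote" and the provenance to "manual" (madId is not written in the data rows):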

  extensions:
    - addsTo:
        pack: codeql/java-all
        extensible: sourceModel
      data: 
        # sourceModel
        # string package, string type, boolean subtypes, string name, string signature, string ext,                   string output,    string kind,   string provenance, QlBuiltins::ExtensionId madId
        - ["java.io",     "Console",   False,            "readLine",  "()",             "",                           "ReturnValue",    "remote",      "manual"]

        # # from original
        # # summaryModel
        # # string package, string type, boolean subtypes, string name, string signature, string ext, string input,     string output, string kind,  string provenance, QlBuiltins::ExtensionId madId
        # - ["java.io",     "Console",   False,            "readLine",  "()",             "",         "Argument[this]", "ReturnValue", "taint",      "df-generated"]

and move this into ../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml
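Data extension files are loaded through a pack manifest that lists them under dataExtensions. If ../.github/codeql/extensions/sqlite-db does not already contain a codeql-pack.yml (the model editor normally creates one), the shell sketch below writes a minimal manifest together with the model file. The function name make_sqlite_model_pack and the pack name local/sqlite-db are hypothetical placeholders; the manifest fields follow the model-pack layout in the CodeQL documentation, but check them against your CLI version.

```shell
# Sketch: create a minimal CodeQL model pack holding the readLine sourceModel.
# Assumption: the pack name "local/sqlite-db" is a hypothetical placeholder.
make_sqlite_model_pack() {
  # Default location matches the path used in this lab; pass another to override.
  local ext_dir="${1:-../.github/codeql/extensions/sqlite-db}"
  mkdir -p "$ext_dir/models" || return 1

  # Pack manifest: a library pack whose data extensions extend codeql/java-all.
  # Written only if no manifest exists yet (e.g. one created by the model editor).
  if [ ! -f "$ext_dir/codeql-pack.yml" ]; then
    cat > "$ext_dir/codeql-pack.yml" <<'EOF'
name: local/sqlite-db
version: 0.0.0
library: true
extensionTargets:
  codeql/java-all: '*'
dataExtensions:
  - models/**/*.yml
EOF
  fi

  # sourceModel columns: package, type, subtypes, name, signature, ext,
  # output, kind, provenance (madId is not written in the data rows).
  cat > "$ext_dir/models/sqlite.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
EOF
}
```

Run make_sqlite_model_pack from the codeql-sqlite-java directory. It overwrites an existing sqlite.model.yml, so the commented-out reference rows shown above are not kept.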

To ensure that these model extensions are applied during query runs, include this setting

  {
      ...,
      "settings": {
          ...,
          "codeQL.runningQueries.useExtensionPacks": "all"
      }
  }

in the workspace configuration file ../qllab.code-workspace

In some environments (e.g., older VS Code versions), you may also need to replicate this setting in ../.vscode/settings.json; there it simplifies to

  "codeQL.runningQueries.useExtensionPacks": "all"

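To confirm the setting is present in both files, a small text check like the following can help. It is a plain pattern match rather than a JSON parse, because VS Code settings files may contain comments that a strict JSON parser rejects; the function name check_extension_setting is hypothetical.

```shell
# Sketch: report whether each given VS Code settings file enables extension packs.
# Plain text match, not a JSON parse; a commented-out line would still match.
check_extension_setting() {
  local f status=0
  for f in "$@"; do
    if grep -Eq '"codeQL\.runningQueries\.useExtensionPacks"[[:space:]]*:[[:space:]]*"all"' "$f" 2>/dev/null; then
      echo "$f: enabled"
    else
      echo "$f: missing"
      status=1
    fi
  done
  return "$status"
}
```

For example: check_extension_setting ../qllab.code-workspace ../.vscode/settings.json. The function returns non-zero if any file lacks the setting.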
Now we can run ../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql again.
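The same run can be sketched from the command line. The function below assumes the --additional-packs and --model-packs options of codeql database analyze (present in recent CodeQL CLI releases; verify with codeql database analyze --help) and a model pack named local/sqlite-db under ../.github/codeql/extensions. That pack name and the function name run_sqltainted are hypothetical placeholders; substitute the name from your own codeql-pack.yml.

```shell
# Sketch: run SqlTainted.ql against a database with the local model pack applied.
# Assumptions: run from codeql-sqlite-java; pack name "local/sqlite-db" is hypothetical.
run_sqltainted() {
  local db="$1"
  local pack="${2:-local/sqlite-db}"
  codeql database analyze "$db" \
    ../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql \
    --format=csv \
    --output=sqltainted.csv \
    --additional-packs=../.github/codeql/extensions \
    --model-packs="$pack"
}
```

For example: run_sqltainted java-sqlite-e2e555c.db && cat sqltainted.csv. Passing the pack explicitly keeps the CLI run independent of the VS Code setting above.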