Files
codeql-lab/codeql-jedis-java

Jedis CodeQL Setup

  • Fork: https://github.com/hohn/jedis
  • GitHub DB build: enable code scanning with the advanced configuration

  • Local DB build:

      cd ~/work-gh/codeql-lab/
    
      # Add the submodule
      git submodule add https://github.com/hohn/jedis extern/jedis
    
      # Initialize and clone the submodule
      git submodule update --init --recursive
    
    
      # Build directly once to resolve any errors
      cd ~/work-gh/codeql-lab/extern/jedis
      mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V
    
      # Build under codeql
      # Step 1: Clean any prior Maven builds
      cd ~/work-gh/codeql-lab/extern/jedis
      mvn clean
    
      # Step 2: Run CodeQL DB creation with mvn install
      cd ~/work-gh/codeql-lab
      codeql database create assets/jedis-db-local \
             --overwrite \
             --language=java \
             --command="mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V" \
             --source-root=extern/jedis
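The modeling steps later in this document unpack assets/jedis-db-local.zip, while the command above produces the directory assets/jedis-db-local. The source does not say how the archive was made. A minimal sketch, assuming it was produced with the CLI's own bundling command and that DB_DIR (a hypothetical helper variable) points at the database directory:

```shell
# Assumption: run from ~/work-gh/codeql-lab after the database was created.
# DB_DIR is a hypothetical helper variable, not part of the original notes.
DB_DIR="${DB_DIR:-assets/jedis-db-local}"

bundle_db() {
  # Package the database directory as a relocatable zip archive.
  codeql database bundle --output="$DB_DIR.zip" -- "$DB_DIR"
}

if command -v codeql >/dev/null 2>&1; then
  bundle_db
else
  echo "codeql is not on PATH; skipping bundle" >&2
fi
```

The guard only keeps the snippet harmless on machines where the CLI has not been unpacked yet.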

Jedis CodeQL Modeling

Setup and Start

  # Step 1: Go to your CodeQL lab directory
  cd ~/work-gh/codeql-lab

  # Step 2: Extract the prebuilt CodeQL database for the Jedis project
  unzip -q assets/jedis-db-local.zip

  # Step 3: Extract the CodeQL command-line tools (platform-specific)
  unzip -q assets/codeql-osx64.zip

  # Step 4: Change directory to the unpacked CodeQL CLI tools
  cd ~/work-gh/codeql-lab/codeql

  # Step 5: Add the CodeQL CLI directory to your shell's PATH
  # This allows you to run `codeql` from any location
  export PATH="$(pwd):$PATH"

  # Step 6: Launch Visual Studio Code with the lab workspace
  # (the workspace file lives in the lab root, not in the codeql/ directory)
  code ~/work-gh/codeql-lab/qllab.code-workspace

  # In VS Code, perform the following setup manually:
  # - Set the current database to: jedis-db-local
  #   (Usually done from the CodeQL extension pane; this connects the UI to your analysis DB)
  # - Set the CodeQL CLI executable to: ~/work-gh/codeql-lab/codeql/codeql
  #   (Tell the extension where to find the CLI you just extracted)
  # - In the CodeQL extension tab, scroll to the bottom and select:
  #   'CodeQL: Method modeling' to begin a guided modeling tutorial

Using the Editor

Note that just by starting CodeQL: Method modeling, the new file

.github/codeql/extensions/jedis-db-local-java/codeql-pack.yml

is created.

Relevant Queries

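A quick grep over a listing of the repository's files (stored here in a file named files) shows the model generator sources:

  grep 'java.*modelgen' files | grep -v test/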

  ql/java/ql/src/utils/modelgenerator
  ql/java/ql/src/utils/modelgenerator/CaptureNeutralModels.ql
  ql/java/ql/src/utils/modelgenerator/CaptureTypeBasedSummaryModels.ql
  ql/java/ql/src/utils/modelgenerator/CaptureSinkModels.ql
  ql/java/ql/src/utils/modelgenerator/CaptureContentSummaryModels.ql
  ql/java/ql/src/utils/modelgenerator/internal
  ql/java/ql/src/utils/modelgenerator/internal/CaptureModels.qll
  ql/java/ql/src/utils/modelgenerator/internal/CaptureTypeBasedSummaryModels.qll
  ql/java/ql/src/utils/modelgenerator/internal/CaptureModelsPrinting.qll
  ql/java/ql/src/utils/modelgenerator/CaptureSummaryModels.ql
  ql/java/ql/src/utils/modelgenerator/RegenerateModels.py
  ql/java/ql/src/utils/modelgenerator/CaptureSourceModels.ql
  ql/java/ql/src/utils/modelgenerator/debug
  ql/java/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPartialPath.ql
  ql/java/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPath.ql
  ql/java/ql/src/utils/modelgenerator/debug/README.md

Primary Query File

The primary query file is

../ql/java/ql/src/utils/modelgenerator/internal/CaptureModels.qll

This library acts as the backbone, exposing modules and predicates such as:

  • SummaryModelGeneratorInput
  • ModelGeneratorCommonInput
  • isPrimitiveTypeUsedForBulkData(…)
  • Probably further shared predicates (names not verified against the library), such as:

    • hasNoSideEffects(…)
    • isNeutralReturn(…)
    • isBulkGetterLike(…)

These are imported by:

  • CaptureSinkModels.ql
  • CaptureSummaryModels.ql
  • CaptureContentSummaryModels.ql
  • CaptureHeuristicSummaryModels.ql (not present in the file listing above)

Design: Three Modeling Targets

Module                      Implements                     Purpose
--------------------------  -----------------------------  ------------------------------------------------
SummaryModelGeneratorInput  SummaryModelGeneratorInputSig  Models pass-through or computed summaries
SourceModelGeneratorInput   SourceModelGeneratorInputSig   Models user-controlled or origin taint sources
SinkModelGeneratorInput     SinkModelGeneratorInputSig     Models taint sinks (e.g., logging, SQL, network)

Shared Input System

ModelGeneratorCommonInput provides:

  • Name formatting
  • Type filtering (isRelevantType)
  • Signature stringification
  • “Approximate output” helpers like Argument[pos].Element

This gives a stable data interface to the rest of the system.

Filtering logic

  private predicate relevant(Callable api) {
    api.isPublic() and
    api.getDeclaringType().isPublic() and
    api.fromSource() and
    not isUninterestingForModels(api) and
    not isInfrequentlyUsed(api.getCompilationUnit())
  }

Experiment with test clone

The needed imports are private, so clone

ql/java/ql/test/utils/modelgenerator/dataflow/CaptureSourceModels.ql

and experiment there.

  import java
  import utils.modelgenerator.internal.CaptureModels
  import SourceModels
  import utils.test.InlineMadTest

  module InlineMadTestConfig implements InlineMadTestConfigSig {
    string getCapturedModel(Callable c) { result = Heuristic::captureSource(c) }

    string getKind() { result = "source" }
  }

  import InlineMadTest<InlineMadTestConfig>

Modeling Jedis as a Dependency in Model Editor

Set up and run Editor

To model Jedis for taint analysis using the model editor, select the "model as dependency" option.

When this mode is active, the following CodeQL query is used:

../ql/java/ql/src/utils/modeleditor/FrameworkModeEndpoints.ql

Its from/where/select clause is:

  from PublicEndpointFromSource endpoint, boolean supported, string type
  where
      supported = isSupported(endpoint) and
      type = supportedType(endpoint)
  select endpoint, endpoint.getPackageName(), endpoint.getTypeName(), endpoint.getName(),
      endpoint.getParameterTypes(), supported,
      endpoint.getCompilationUnit().getParentContainer().getBaseName(), type

There is a direct connection between this query and output columns in the model editor:

  • supported = true → shown in the UI as "Method already modeled"
  • supported = false → shown in the UI as "Unmodeled"

Files Created or Modified by the Modeling Workflow

Workspace Configuration Required

To ensure that these model extensions are applied during query runs, include the setting

"codeQL.runningQueries.useExtensionPacks": "all"

in the workspace configuration file ../qllab.code-workspace

In some environments (e.g., older VS Code versions), you may also need to replicate this setting in ../.vscode/settings.json.
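To confirm that the setting is actually present in either file, a quick text check is usually enough. The sketch below assumes it is run from the lab root (~/work-gh/codeql-lab); has_extension_packs is a hypothetical helper name, and the check is a plain pattern match rather than a real JSON parse:

```shell
# Report whether a settings file enables extension packs for query runs.
# has_extension_packs is a hypothetical helper; it greps, it does not parse JSON.
has_extension_packs() {
  grep -Eq '"codeQL\.runningQueries\.useExtensionPacks"[[:space:]]*:[[:space:]]*"all"' "$1"
}

for f in qllab.code-workspace .vscode/settings.json; do
  if [ -f "$f" ] && has_extension_packs "$f"; then
    echo "$f: extension packs enabled"
  else
    echo "$f: setting missing (or file not found)"
  fi
done
```

In a .code-workspace file the key normally sits inside the top-level "settings" object. The grep does not check nesting, so a match confirms only that the key and value appear somewhere in the file.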

Verifying the Modeled Sink

Once the modeling is in place, a dataflow query like the following can be used to confirm the modeled sinks:

  import java
  private import semmle.code.java.dataflow.ExternalFlow
  private import semmle.code.java.dataflow.DataFlow

  from DataFlow::Node n, string type
  where sinkNode(n, type) and type = "code-injection"
  select n, type

Sample query result (run on the jedis-db-local database):

  • example.ql on jedis-db-local - finished in 2 seconds (14 results)

    1 script code-injection
    2 getBytes(…) code-injection
    3 script code-injection
    4 script code-injection
    5 script code-injection
    6 script code-injection
    7 "return redis.call('get','foo')" code-injection
    8 "return redis.call('get','foo')" code-injection
    9 encode(…) code-injection
    10 encode(…) code-injection
    11 "return redis.call('get','foo')" code-injection
    12 "return redis.call('get','foo')" code-injection
    13 script code-injection
    14 "return {}" code-injection
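The same check can be run outside VS Code. This is a sketch under stated assumptions: the query is saved as example.ql inside a query pack whose dependencies (codeql/java-all) are already resolved, the database was unpacked to jedis-db-local in the current directory, and sinks.bqrs is an arbitrary output name. DB, QUERY, and run_sink_query are hypothetical names introduced for illustration.

```shell
# Hypothetical paths; adjust to where the database and the query live.
DB="${DB:-jedis-db-local}"
QUERY="${QUERY:-example.ql}"

run_sink_query() {
  # Evaluate the query and store the raw results in a BQRS file.
  codeql query run --database="$DB" --output=sinks.bqrs -- "$QUERY" &&
  # Decode the BQRS file to CSV on standard output.
  codeql bqrs decode --format=csv sinks.bqrs
}

if command -v codeql >/dev/null 2>&1; then
  run_sink_query
else
  echo "codeql is not on PATH; see 'Setup and Start'" >&2
fi
```

Whether extension packs are picked up on the command line depends on how the pack is located. That is not covered by the original notes, so the result count may differ from the editor run.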

Identify usage of injection-related models in existing queries

To check whether existing CodeQL queries use the injection-related models, search the ql/java and ql/cpp directories for files that contain the string -injection. This string is the common suffix of sink kinds such as code-injection, and it also appears in query metadata.

Java Queries

The following command locates .ql and .qll files in the Java query suite that reference -injection:

  rg -l -- '-injection' ql/java | grep -E '\.qll?$'

Example output:

  ql/java/ql/src/Security/CWE/CWE-643/XPathInjection.ql
  ql/java/ql/src/Security/CWE/CWE-078/ExecTainted.ql
  ql/java/ql/src/Security/CWE/CWE-022/TaintedPath.ql
  ql/java/ql/src/Security/CWE/CWE-117/LogInjection.ql
  ql/java/ql/src/Security/CWE/CWE-470/FragmentInjection.ql
  ql/java/ql/src/Security/CWE/CWE-470/FragmentInjectionInPreferenceActivity.ql
  ql/java/ql/src/Security/CWE/CWE-730/RegexInjection.ql
  ql/java/ql/lib/semmle/code/java/security/XsltInjection.qll
  ql/java/ql/src/Security/CWE/CWE-090/LdapInjection.ql
  ql/java/ql/lib/semmle/code/java/security/GroovyInjection.qll
  ql/java/ql/lib/semmle/code/java/security/XPath.qll
  ql/java/ql/lib/semmle/code/java/security/TaintedEnvironmentVariableQuery.qll
  ql/java/ql/src/Security/CWE/CWE-074/XsltInjection.ql
  ql/java/ql/src/Security/CWE/CWE-074/JndiInjection.ql
  ...
  ql/java/ql/src/utils/modelgenerator/internal/CaptureModels.qll

These files include both top-level queries (under src/Security/...) and reusable model libraries (under lib/semmle/...). Experimental and framework-specific queries are also included.
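When filtering such a file list by extension, anchoring the pattern matters. An unanchored pattern such as '\.qll*' matches any path that contains .ql, including .qlref and .qls files, whereas '\.qll?$' with grep -E keeps only paths ending in .ql or .qll. A minimal sketch comparing the two on hypothetical file names:

```shell
# Hypothetical file list, used only to compare the two filters.
paths='ql/java/ql/src/Security/CWE/CWE-117/LogInjection.ql
ql/java/ql/lib/semmle/code/java/security/XsltInjection.qll
ql/java/ql/test/query-tests/security/CWE-117/LogInjection.qlref
ql/java/ql/src/codeql-suites/java-security.qls'

# Unanchored filter: matches any path containing ".ql".
loose=$(printf '%s\n' "$paths" | grep -c '\.qll*')

# Anchored filter: only paths ending in ".ql" or ".qll".
strict=$(printf '%s\n' "$paths" | grep -cE '\.qll?$')

echo "loose=$loose strict=$strict"   # prints: loose=4 strict=2
```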

C++ Queries

Likewise, to check for C++ queries that reference -injection, use:

  rg -l -- '-injection' ql/cpp | grep -E '\.qll?$'

Example output:

  ql/cpp/ql/src/Security/CWE/CWE-078/ExecTainted.ql
  ql/cpp/ql/src/Security/CWE/CWE-022/TaintedPath.ql
  ql/cpp/ql/src/experimental/Security/CWE/CWE-078/WordexpTainted.ql
  ql/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql

These files indicate active use of injection-related taint tracking in the C++ suite as well.

TODO Modeling Gaps in SqlTainted.ql (Java)

The built-in SQL injection query ../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql correctly identifies the sink in the Jedis sample, but not the source. This is because java.io.Console.readLine() is modeled as a taint step, not a source. Since the model editor excludes functions that are already modeled in any capacity, this function is not visible for editing.

To detect the source, we must override or supplement the model manually, either through the models-as-data mechanism or by extending Customizations.qll with a new source declaration.
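For the models-as-data route, a hand-written data extension could look roughly like the sketch below. Several parts are assumptions rather than facts from these notes: the models subdirectory and the file name java.io.model.yml are hypothetical, the pack's codeql-pack.yml is assumed to list that directory under dataExtensions, and the source kind "remote" is chosen only on the assumption that the default threat model covers it.

```shell
# Hypothetical location inside the extension pack created by the model editor.
MODEL_DIR="${MODEL_DIR:-.github/codeql/extensions/jedis-db-local-java/models}"
mkdir -p "$MODEL_DIR"

# Write a data extension that adds a source model for Console.readLine.
cat > "$MODEL_DIR/java.io.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      # package, type, subtypes, name, signature, ext, output, kind, provenance
      # Empty signature: apply to all readLine overloads.
      # Kind "remote" is an assumption, not taken from the original notes.
      - ["java.io", "Console", False, "readLine", "", "", "ReturnValue", "remote", "manual"]
EOF

echo "wrote $MODEL_DIR/java.io.model.yml"
```

After adding the file, re-run the query with extension packs enabled and check whether a path from readLine to the sink is now reported.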

TODO Modeling SQLite as a Dependency

The directory ../codeql-sqlite-java/ contains a minimal Java sample derived from a prior workshop. It uses sqlite-jdbc-3.36.0.1.jar and serves as a small-scale test case for dependency-based modeling. This example is especially useful for illustrating subtle modeling issues.

In particular, it uses java.io.Console.readLine(), which is already modeled as a taint step. For SQL injection tracking, however, it needs to act as a source. Because it is already modeled, it does not appear in the model editor. To handle this, we must add a manual source override, either as a raw YAML model (a models-as-data extension file) or as a hardcoded entry via Customizations.qll.

TODO Creating a Vulnerable SQLite Sample for Query Visibility

To ensure that taint-based queries (e.g., SqlTainted.ql) report the vulnerable behavior, a sink function such as .eval() or sqlite3_exec() must actually be called from application code. It is not sufficient for the function to merely exist in a linked library or dependency: CodeQL reports a flow only when the call to the sink appears in the source code extracted into the database.

To address this, we modify the file ../codeql-sqlite-java/AddUser.java to include a realistic, vulnerable flow that mimics typical usage patterns. For example, the program should:

  1. Accept user input (e.g., via System.in, BufferedReader, or Console.readLine()),
  2. Store it in a variable without sanitization,
  3. Construct an SQL query using string concatenation,
  4. Call eval() or sqlite3_exec() with the tainted query.

This guarantees that the sink is both present and exercised, allowing built-in and custom CodeQL queries to detect the dataflow path from source to sink.

The same flow structure used in the Jedis version can be reused here. That way, we maintain consistency across modeling examples while switching the underlying dependency from Redis to SQLite.