codeql-lab: Centralized Git Repository for CodeQL Development

Overview

codeql-lab is a consolidated Git repository that collects all relevant CodeQL components, resources, and tooling into a single version-controlled location.

Purpose

The goal of this repository is to provide an integrated development environment (“lab”) for CodeQL research, experimentation, and custom query development. It simplifies setup by maintaining all required submodules, configuration files, and datasets in one place.

Repository Location

The primary repository is hosted at: https://github.com/hohn/codeql-lab

Intended Use Cases

  • Local experimentation with CodeQL queries and libraries.
  • End-to-end testing of custom model data and query logic. This includes writing and validating custom data flow models, adjusting model coverage, and confirming that query results behave as expected across controlled datasets. The lab setup supports rapid iteration on QL logic, helping detect unintended changes and enabling reproducible evaluations of taint tracking, control flow, or API usage patterns.
  • Structured collaboration and controlled updates across all CodeQL-related artifacts.
  • Simplified onboarding and reproducible setup for new contributors or analysis environments.

Prerequisites

Working with this repository assumes prior experience with:

  • Git, Bash, and standard Unix command-line tools. These are used throughout and are required for setup and day-to-day tasks. Familiarity with ripgrep and grep/regex workflows is assumed.
  • At least one supported programming language, such as C, C++, Java, Python, Go, or Ruby. A solid understanding of the target language is necessary to interpret analysis results and write effective queries. See general background on programming languages if needed.
  • Basic familiarity with program structure concepts, including abstract syntax trees (ASTs), control-flow graphs (CFGs), and data-flow graphs (DFGs). These are core to how CodeQL models code behavior.
  • Optional but helpful: familiarity with structural or functional programming languages (e.g. Lisp or OCaml) can make working with CodeQL's query language and type system more intuitive. See overview of functional programming for related context.

Repository Layout

Core Structure

  • Repository is based on: https://github.com/github/vscode-codeql-starter.git
  • All development work is done on the branch: qllab
  • CodeQL version is pinned via the ql/ submodule:

    commit 4d681f05bd671f8b5e31624f16a2b4d75e61c071 (tag: codeql-cli/v2.22.0)
    
  • A prebuilt CodeQL CLI binary is included:

    1104625939  assets/codeql-osx64.zip
    
  • Project-specific repositories can be added directly under the root. Example: the C dataflow workshop in ./codeql-dataflow-sql-injection-c
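The version pin listed above can be checked from a local checkout. The following is a minimal sketch, not part of the repository's tooling: the helper name verify_pin is hypothetical, and it assumes the command is run from the repository root after the ql/ submodule has been initialized (for example via git clone --recurse-submodules --branch qllab).

```shell
# Hypothetical helper: compare the checked-out commit of a Git directory
# against an expected full commit hash.
#   $1: path to a Git checkout (e.g. the ql/ submodule)
#   $2: expected commit hash
verify_pin() {
  local dir="$1" expected="$2" actual
  actual=$(git -C "$dir" rev-parse HEAD) || return 2
  if [ "$actual" = "$expected" ]; then
    echo "ok: $dir is at $expected"
  else
    echo "mismatch: $dir is at $actual, expected $expected" >&2
    return 1
  fi
}

# Only run the check when an initialized ql/ submodule is present.
if [ -e ql/.git ]; then
  verify_pin ql 4d681f05bd671f8b5e31624f16a2b4d75e61c071 || echo "pin check failed" >&2
fi
```

A mismatch usually means the submodule was updated or never initialized; git submodule update --init ql restores the recorded commit.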

Additional Structure Notes

Possible Reading Orders

Data Flow

Debugging data flow config (instead of taint flow), Java

We can illustrate taint-flow debugging in the Java SQL injection sample.

Debugging data flow config (instead of taint flow), C

Modeling

Review: SQLite Injection Workshop, Java

We begin with a recap of the Java-based injection example, focusing on the vulnerable code in AddUser.java. Following that, we examine a fully manual CodeQL query available in full-query.ql, which was written to explicitly trace tainted data through the program. Next, we explore the out-of-the-box query SqlTainted.ql included in the standard CodeQL packs, and conclude with an inspection of the relevant base classes and framework modeling in Illustrations.ql.

Customizations via codeql (Java)

To customize CodeQL for Java, we identify and extend base classes to add custom flow sources and sinks. A general explanation of this approach is available in the file README.org, particularly the section "supplement codeql: Add to FlowSource or a subclass". For Java, java.qll includes Customizations.qll, which provides extension points for custom flow modeling. This structure is common across most CodeQL-supported languages, with the notable exception of C. Further details on this customization process can be found in incoming.codeql-customizations-workshop.md.

Model Editor: Single-function case (Java SQLite sample)

  1. Extend the Java example using the model editor. The data and spec are present.

    1. This sample illustrates a subtle problem with the model editor: java.io.Console.readLine() is already modeled as a taint step and therefore does not appear in the editor. However, we need it modeled as a source, which requires special handling.
    2. Extensions included: ./.github/codeql/extensions/sqlite-db/codeql-pack.yml
    3. Extension data: ./.github/codeql/extensions/sqlite-db/models/sqlite.model.yml
    4. Explanation: Using sqlite to illustrate models-as-data
  2. Explain how the "models-as-data" system works internally:

    1. Use a diagnostic query to enumerate current sources and sinks.
    2. Identify the relevant entry points (e.g., classes and QL predicates) by inspecting representative queries such as: ~/work-gh/codeql-lab/ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql::@name Query built from user-controlled sources
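For orientation, a models-as-data source row for the java.io.Console.readLine() case discussed above could look like the following. This is a hypothetical sketch, not the contents of the repository's sqlite.model.yml: the scratch file location is made up, and the remote source kind is an assumption, chosen so that queries using the default threat model would see the source. The row layout follows the Java sourceModel extensible predicate: package, type, subtypes, name, signature, ext, output, kind, provenance.

```shell
# Write a hypothetical data extension to a scratch file
# (NOT the lab's real model file under .github/codeql/extensions/).
model_file="$(mktemp -d)/example.model.yml"

cat > "$model_file" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      # package, type, subtypes, name, signature, ext, output, kind, provenance
      - ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
EOF

echo "wrote $model_file"
```

The signature "()" restricts the row to the zero-argument overload of readLine. Comparing such a row against the real extension data is one way to see what the model editor would have generated had the method appeared in it.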

Model Editor: Jedis Example (Java Redis client)

  1. This sample is straightforward and has no surprises.
  2. There are many functions, but they all follow a simple, repetitive pattern.
  3. Use the model editor to define sources and sinks at scale.
  4. Explanation: Modeling Jedis as a Dependency in Model Editor
  5. Validation: Verifying the Modeled Sink
  6. Query usage: Identify usage of injection-related models in existing queries

TODO Review: SQLite Injection Workshop (C)

  • C++ version of the workshop.

TODO Extending Queries with Customizations.qll for C

  • Supported in most languages, but not C++ by default.
  • Can be enabled by building a custom CodeQL bundle.
  • Use this CLI tool: https://github.com/advanced-security/codeql-bundle
  • Demonstrate using `codeql-lab`.

    • in ./codeql-sqlite-java/README.org
    • ql/cpp/ql/lib/semmle/code/cpp/security/FlowSources.qll

        abstract class FlowSource extends DataFlow::Node
    • The other languages include Customizations.qll via <language>.qll, e.g., ql/python/ql/lib/python.qll

      1. Modify

        ql/python/ql/lib/python.qll
        
      2. Add

        ql/python/ql/lib/Customizations.qll
        
    • For C/C++,

      1. Modify

        ql/cpp/ql/lib/cpp.qll
        
      2. Add

        ql/cpp/ql/lib/Customizations.qll
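Before modifying anything, it can help to check which top-level library files already import Customizations in the pinned ql/ checkout. A minimal sketch, assuming it is run from the repository root; the helper name imports_customizations is hypothetical, and files that are not present are skipped:

```shell
# Hypothetical helper: succeed if a <language>.qll file contains an
# "import Customizations" line.
#   $1: path to the .qll file
imports_customizations() {
  grep -Eq '^[[:space:]]*import[[:space:]]+Customizations' "$1"
}

# Report the status for the two library files named above.
for f in ql/python/ql/lib/python.qll ql/cpp/ql/lib/cpp.qll; do
  if [ -f "$f" ]; then
    if imports_customizations "$f"; then
      echo "$f: imports Customizations"
    else
      echo "$f: no Customizations import"
    fi
  fi
done
```

The output depends on the pinned CodeQL version, so the check is worth rerunning whenever the ql/ submodule is moved to a new tag.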
        

TODO Use models-as-data QL code directly (no graphical editor).

Summary

TODO codeql-bundling

Tool Setup

This lab uses some scripts, found in ./bin/. To ensure that the ones written in Python have access to their prerequisites, set up a virtual environment via

  # 1. Create the virtualenv
  python3 -m venv ~/codeql-lab/venv

  # 2. Install any packages
  source ~/codeql-lab/venv/bin/activate
  pip install pyyaml

For any of these scripts to work, add them to the PATH via

  export PATH="$HOME/codeql-lab/bin:$PATH"
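The export above prepends the directory each time it is evaluated, so sourcing it repeatedly from a shell startup file makes PATH grow. A minimal sketch of an idempotent variant, assuming the same ~/codeql-lab location; the helper name add_to_path is hypothetical:

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not
# already listed.
#   $1: directory to add
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

add_to_path "$HOME/codeql-lab/bin"
export PATH
```

Calling add_to_path a second time with the same directory leaves PATH unchanged.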