- codeql-lab: Centralized Git Repository for CodeQL Development
- Prerequisites
- Repository Layout
- Possible Reading Orders
- Data Flow
- Modeling
- Review: SQLite Injection Workshop, Java
- Customizations via codeql (Java)
- Customizations via Model Editor: Jedis Example (Java Redis client)
- Customizations via Model Editor: Single-function case (Java SQLite sample)
- Review: SQLite Injection Workshop (C)
- Extending Queries with Customizations.qll for C
- Use models-as-data QL code directly (no graphical editor).
- codeql-bundling
- Tool Setup
codeql-lab: Centralized Git Repository for CodeQL Development
Overview
codeql-lab is a consolidated Git repository that collects all relevant CodeQL components, resources, and tooling into a single version-controlled location.
Purpose
The goal of this repository is to provide an integrated development environment (“lab”) for CodeQL research, experimentation, and custom query development. It simplifies setup by maintaining all required submodules, configuration files, and datasets in one place.
Repository Location
The primary repository is hosted at: https://github.com/hohn/codeql-lab
Intended Use Cases
- Local experimentation with CodeQL queries and libraries.
- End-to-end testing of custom model data and query logic. This includes writing and validating custom data flow models, adjusting model coverage, and confirming that query results behave as expected across controlled datasets. The lab setup supports rapid iteration on QL logic, helping detect unintended changes and enabling reproducible evaluations of taint tracking, control flow, or API usage patterns.
- Structured collaboration and controlled updates across all CodeQL-related artifacts.
- Simplified onboarding and reproducible setup for new contributors or analysis environments.
Prerequisites
Working with this repository assumes prior experience with:
- Git, GNU Bash, and standard Unix command-line tools. These are used throughout for setup and day-to-day tasks; familiarity with ripgrep and grep/regex workflows is assumed.
- At least one supported programming language, such as C, C++, Java, Python, Go, or Ruby. A solid understanding of the target language is necessary to interpret analysis results and write effective queries. See general background on programming languages if needed.
- Basic familiarity with program structure concepts, including abstract syntax trees (ASTs), control-flow graphs (CFGs), and data-flow graphs (DFGs). These are core to how CodeQL models code behavior.
- Optional but helpful: familiarity with structural or functional programming languages (e.g. Lisp or OCaml) can make working with CodeQL’s query language and type system more intuitive. See overview of functional programming for related context.
Repository Layout
Core Structure
- Repository is based on: https://github.com/github/vscode-codeql-starter.git
- All development work is done on the branch: qllab
- CodeQL version is pinned via the ql/ submodule: commit 4d681f05bd671f8b5e31624f16a2b4d75e61c071 (tag: codeql-cli/v2.22.0)
- A prebuilt CodeQL CLI binary is included: 1104625939 assets/codeql-osx64.zip
- Project-specific repositories can be added directly under the root. Example: the C dataflow workshop in ./codeql-dataflow-sql-injection-c
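The pinned commit listed above can be verified before running any analysis. The following is a minimal sketch, not part of the repository: the helper names `submodule_head` and `matches_pin` are hypothetical, and it assumes the submodule lives at `ql/` relative to the repository root and that `git` is on the PATH.

```python
import subprocess

# Pinned commit quoted in the layout notes above (tag: codeql-cli/v2.22.0).
PINNED_COMMIT = "4d681f05bd671f8b5e31624f16a2b4d75e61c071"


def submodule_head(path="ql"):
    """Return the commit hash checked out in the Git repository at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def matches_pin(head, pinned=PINNED_COMMIT):
    """Compare two commit hashes, ignoring case and surrounding whitespace."""
    return head.strip().lower() == pinned.strip().lower()


# Example use from the repository root (needs git and an initialized submodule):
#   if not matches_pin(submodule_head("ql")):
#       raise SystemExit("ql/ is not at the pinned CodeQL commit")
```

Keeping the comparison separate from the `git` call makes the check usable in environments where the hash is obtained some other way, for example from CI metadata.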
Additional Structure Notes
- The original upstream README.md is preserved at ./README-vscode-codeql-starter.md
Possible Reading Orders
Data Flow
Debugging data flow config (instead of taint flow), Java
We can illustrate taint-flow debugging using the Java SQL injection sample.
Debugging data flow config (instead of taint flow), C
Modeling
Review: SQLite Injection Workshop, Java
We begin with a recap of the Java-based injection example, focusing on the vulnerable code in AddUser.java. Following that, we examine a fully manual CodeQL query available in full-query.ql, which was written to explicitly trace tainted data through the program. Next, we explore the out-of-the-box query SqlTainted.ql included in the standard CodeQL packs, and conclude with an inspection of the relevant base classes and framework modeling in Illustrations.ql.
Customizations via codeql (Java)
To customize CodeQL for Java, we identify and extend base classes to add custom flow sources and sinks. A general explanation of this approach is available in the file README.org, particularly the section supplement codeql: Add to FlowSource or a subclass. For Java, java.qll includes Customizations.qll, which provides extension points for custom flow modeling – this structure is common across most CodeQL-supported languages, with the notable exception of C. Further details on this customization process can be found in incoming.codeql-customizations-workshop.md.
Customizations via Model Editor: Jedis Example (Java Redis client)
The Jedis example is a straightforward case with no unexpected behavior. Although the library contains many functions, they follow a simple and repetitive pattern, making it ideal for large-scale modeling. The CodeQL model editor can be used to efficiently define sources and sinks for such cases. A detailed explanation is provided in Modeling Jedis as a Dependency in Model Editor, while validation of the modeled sink is discussed in Verifying the Modeled Sink. Finally, the query-level usage of these models can be seen in Identify usage of injection-related models in existing queries.
Customizations via Model Editor: Single-function case (Java SQLite sample)
We extend the Java SQLite example using the model editor, with both the
necessary data and specification already available. This example highlights a
subtle issue with the model editor: the method java.io.Console.readLine() is
already modeled as a taint step and therefore does not appear in the editor
interface, even though we need it modeled as a source. This requires special
handling. The relevant extensions are defined in
./.github/codeql/extensions/sqlite-db/codeql-pack.yml, and the extension data
is provided in
./.github/codeql/extensions/sqlite-db/models/sqlite.model.yml. A detailed
explanation is available in Using sqlite to illustrate models-as-data.
To support this, we explain how the "models-as-data" system works internally. A diagnostic query can be used to enumerate currently recognized sources and sinks. From there, the relevant entry points – such as QL classes and predicates – can be identified by inspecting representative queries like SqlTainted.ql.
Review: SQLite Injection Workshop (C)
This is the C version of the workshop.
Extending Queries with Customizations.qll for C
While most CodeQL-supported languages provide out-of-the-box support for `Customizations.qll`, C and C++ do not include this by default. However, it is possible to enable such support by building a custom CodeQL bundle. This can be done using the CLI tool at https://github.com/advanced-security/codeql-bundle. Since the tool functions largely as a black box, we provide a more detailed illustration of the underlying steps.
A working demonstration is available in ./codeql-dataflow-sql-injection-c/README.org. In languages like Java, `Customizations.qll` is pulled in automatically by `<language>.qll` (for example, java.qll imports Customizations.qll); that file defines user-extensible predicates for flow modeling.
For C/C++, the process requires explicit modification:
- Modify `ql/cpp/ql/lib/cpp.qll` to import `Customizations.qll`.
- Create and populate `ql/cpp/ql/lib/Customizations.qll` with custom sources/sinks or extensions.
- Rebuild the CodeQL bundle to include these changes.
This customization enables consistent user-defined flow modeling across languages, making it possible to reuse modeling patterns from Java or Python in C/C++ contexts.
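The first two steps of the C/C++ procedure can be scripted. The sketch below is illustrative only: the function name `enable_customizations` is hypothetical, the import statement `import Customizations` is assumed to mirror what java.qll does, and the stub contents (including `import cpp`) are an assumption modeled on the Java layout rather than taken from this repository. Rebuilding the bundle is left to the bundling tool.

```python
from pathlib import Path

# Assumed import statement, mirroring the Java library's java.qll.
IMPORT_LINE = "import Customizations"

# Assumed minimal stub; real customizations would be added below the import.
STUB = "/** Local customizations; add custom sources and sinks here. */\n\nimport cpp\n"


def enable_customizations(lib_dir):
    """Add the Customizations import to cpp.qll and create a stub if missing.

    `lib_dir` is the directory holding cpp.qll, e.g. ql/cpp/ql/lib.
    Returns True if cpp.qll was modified, False if the import was already there.
    """
    lib = Path(lib_dir)
    cpp_qll = lib / "cpp.qll"
    custom_qll = lib / "Customizations.qll"

    text = cpp_qll.read_text()
    already_imported = any(line.strip() == IMPORT_LINE for line in text.splitlines())
    if not already_imported:
        # Append the import on its own line, keeping a single trailing newline.
        cpp_qll.write_text(text.rstrip("\n") + "\n" + IMPORT_LINE + "\n")

    if not custom_qll.exists():
        # Never overwrite an existing Customizations.qll.
        custom_qll.write_text(STUB)

    return not already_imported
```

The function is idempotent, so it can be rerun after updating the `ql` submodule without duplicating the import or clobbering existing customizations.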
TODO Use models-as-data QL code directly (no graphical editor).
Summary
- The model definition files exist
- Data files exist
- There is no editor
- Generate YAML manually
- Use the C version of the SQLite injection workshop as reinforcement.
- Apply models-as-data QL logic directly (no graphical editor).
  - Add model for: count = read(STDIN_FILENO, buf, BUFSIZE);
  - Add model for: rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);
  - Reference Java version (structure only, not editor): Using sqlite to illustrate models-as-data
  - C-specific walkthrough: Using sqlite to illustrate models-as-data
- Manually define YAML models for standard functions (e.g., read) and test propagation via QL.
- Customizations using models-as-data, via text
  - Continue with codeql-dataflow-sql-injection-c
  - The ./ql/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql query works out of the box
  - Add char* get_user_info() as extra source for illustration
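Since there is no editor for C, the YAML has to be written by hand or generated. The sketch below generates a data-extension file for the two calls named above. Several details are assumptions to check against the CodeQL documentation for the pinned version: the nine-column row layout (namespace, type, subtypes, name, signature, ext, access path, kind, provenance), the pack name `codeql/cpp-all`, the access path `Argument[*1]` for the pointed-to contents of the second argument, and the kinds `local` and `sql-injection`. The helper names are hypothetical.

```python
import json


def model_block(extensible, rows, pack="codeql/cpp-all"):
    """Render one `addsTo` entry; each row becomes a YAML flow sequence."""
    lines = [
        "  - addsTo:",
        f"      pack: {pack}",
        f"      extensible: {extensible}",
        "    data:",
    ]
    # JSON arrays of strings and booleans are also valid YAML flow sequences.
    lines += [f"      - {json.dumps(row)}" for row in rows]
    return "\n".join(lines)


def render_models(sources, sinks):
    """Return the full text of a data-extension file."""
    parts = [
        "extensions:",
        model_block("sourceModel", sources),
        model_block("sinkModel", sinks),
    ]
    return "\n".join(parts) + "\n"


# Assumed rows: the buffer filled by read() is a source, and the query
# string passed to sqlite3_exec() is a sink.
READ_SOURCE = ["", "", False, "read", "", "", "Argument[*1]", "local", "manual"]
SQLITE_SINK = ["", "", False, "sqlite3_exec", "", "", "Argument[*1]", "sql-injection", "manual"]

if __name__ == "__main__":
    print(render_models([READ_SOURCE], [SQLITE_SINK]), end="")
```

Writing rows through `json.dumps` avoids a dependency on a YAML library and keeps quoting consistent; the output would still need to be placed in a model pack and validated with a query.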
TODO codeql-bundling
Tool Setup
This lab uses some scripts, found in ./bin/. To ensure that the ones written in Python have access to their prerequisites, set up a virtual environment via
# 1. Create the virtualenv
python3 -m venv ~/codeql-lab/venv
# 2. Install any packages
source ~/codeql-lab/venv/bin/activate
pip install pyyaml
For any of these scripts to work, add them to the PATH via
export PATH="$HOME/codeql-lab/bin:$PATH"
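A script can confirm that this setup is in effect before doing any work. The following is a minimal sketch with hypothetical helper names; it assumes the only third-party dependency is pyyaml, whose import name is `yaml`.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside an environment created by venv."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example use at the top of a script:
#   if not in_virtualenv() or missing_modules(["yaml"]):
#       raise SystemExit("activate the venv and run: pip install pyyaml")
```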