* codeql-lab: Centralized Git Repository for CodeQL Development
** Overview
codeql-lab is a consolidated Git repository that collects all relevant
CodeQL components, resources, and tooling into a single
version-controlled location.
** Purpose
The goal of this repository is to provide an integrated development
environment (“lab”) for CodeQL research, experimentation, and custom
query development. It simplifies setup by maintaining all required
submodules, configuration files, and datasets in one place.
** Repository Location
The primary repository is hosted at:
https://github.com/hohn/codeql-lab
** Intended Use Cases
- Local experimentation with CodeQL queries and libraries.
- End-to-end testing of custom model data and query logic.
  This includes writing and validating custom data flow models,
  adjusting model coverage, and confirming that query results behave
  as expected across controlled datasets. The lab setup supports rapid
  iteration on QL logic, helping detect unintended changes and enabling
  reproducible evaluations of taint tracking, control flow, or API usage
  patterns.
- Structured collaboration and controlled updates across all
  CodeQL-related artifacts.
- Simplified onboarding and reproducible setup for new contributors or
  analysis environments.
* Prerequisites
Working with this repository assumes prior experience with:
- *Git, Bash, and standard Unix command-line tools*. These are used
  throughout and are required for setup and day-to-day tasks.
  Tools such as [[https://man.archlinux.org/man/rg.1][ripgrep]], [[https://www.gnu.org/software/bash/][GNU Bash]], and [[https://en.wikipedia.org/wiki/Grep][grep/regex workflows]] are assumed.
- *At least one supported programming language*, such as C, C++, Java,
  Python, Go, or Ruby. A solid understanding of the target language is
  necessary to interpret analysis results and write effective queries.
  See general background on [[https://en.wikipedia.org/wiki/Programming_language][programming languages]] if needed.
- *Basic familiarity with program structure concepts*, including
  [[https://en.wikipedia.org/wiki/Abstract_syntax_tree][abstract syntax trees (ASTs)]], [[https://en.wikipedia.org/wiki/Control-flow_graph][control-flow graphs (CFGs)]], and
  [[https://en.wikipedia.org/wiki/Data-flow_analysis][data-flow graphs (DFGs)]]. These are core to how CodeQL models code behavior.
- *Optional but helpful*: familiarity with structural or functional
  programming languages (e.g. [[https://en.wikipedia.org/wiki/Lisp_(programming_language)][Lisp]] or [[https://en.wikipedia.org/wiki/OCaml][OCaml]]) can make working with
  CodeQL's query language and type system more intuitive.
  See overview of [[https://en.wikipedia.org/wiki/Functional_programming][functional programming]] for related context.
* Repository Layout
** Core Structure
- Repository is based on: https://github.com/github/vscode-codeql-starter.git
- All development work is done on the branch: qllab
- CodeQL version is pinned via the =ql/= submodule:
  : commit 4d681f05bd671f8b5e31624f16a2b4d75e61c071 (tag: codeql-cli/v2.22.0)
- A prebuilt CodeQL CLI binary is included:
  : 1104625939 assets/codeql-osx64.zip
- Project-specific repositories can be added directly under the root.
  Example: the C dataflow workshop in =./codeql-dataflow-sql-injection-c=
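To confirm that a local checkout matches the pin above, a check along the following lines may help. This is a minimal sketch, assuming the submodule has been initialized at =ql/= and that =git= is on the PATH; =check_pin= is a hypothetical helper, not one of the scripts shipped in this repository.

#+BEGIN_SRC sh
# Commit recorded in this README for the ql/ submodule (codeql-cli/v2.22.0).
expected=4d681f05bd671f8b5e31624f16a2b4d75e61c071

# check_pin DIR COMMIT (hypothetical helper)
# Succeeds when the checkout in DIR has COMMIT as its HEAD.
check_pin() {
    actual=$(git -C "$1" rev-parse HEAD 2>/dev/null) || {
        echo "cannot read HEAD in $1" >&2
        return 2
    }
    if [ "$actual" = "$2" ]; then
        echo "pinned: $actual"
    else
        echo "mismatch: expected $2, found $actual" >&2
        return 1
    fi
}

# Run the check only when the submodule directory is present.
if [ -d ql ]; then
    check_pin ql "$expected"
fi
#+END_SRC

Run from the repository root, this prints the pinned commit on success and reports a mismatch on standard error otherwise.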
** Additional Structure Notes
- The original upstream README.md is preserved at [[./README-vscode-codeql-starter.md]]
* Possible Reading Orders
** Data Flow
*** Debugging data flow config (instead of taint flow), Java
The Java SQL injection sample illustrates taint-flow debugging:
- [[./codeql-sqlite-java/TaintFlowDebugging.ql]]
- [[./codeql-sqlite-java/TaintFlowDebugging.md]]
*** Debugging data flow config (instead of taint flow), C
** Modeling
*** Review: SQLite Injection Workshop, Java
- Recap the Java-based injection example.
*** Customizations via CodeQL, Java
- codeql-dataflow-sql-injection-c/README.org, [[file:codeql-dataflow-sql-injection-c/README.org::*supplement codeql: Add to FlowSource or a subclass][supplement codeql: Add to FlowSource or a subclass]]
- TODO raw md from staging: codeql-dataflow-sql-injection-c/incoming.codeql-customizations-workshop.md
*** Model Editor: Single-function case (Java, SQLite sample)
1. Extend the Java example using the model editor. The data and spec are present.
   1. This sample illustrates a subtle problem with the model editor:
      =java.io.Console.readLine()= is already modeled as a /taint step/ and
      therefore does not appear in the editor. However, we need it modeled as a /source/,
      which requires special handling.
   2. Extension spec: [[file:~/work-gh/codeql-lab/.github/codeql/extensions/sqlite-db/codeql-pack.yml::name: pack/sqlite-db]]
   3. Extension data: [[file:~/work-gh/codeql-lab/.github/codeql/extensions/sqlite-db/models/sqlite.model.yml::extensions:]]
   4. Explanation: [[file:~/work-gh/codeql-lab/codeql-sqlite-java/README.org::*Using sqlite to illustrate models-as-data][Using sqlite to illustrate models-as-data]]
2. Explain how the "models-as-data" system works internally:
   1. Use a diagnostic query to enumerate current sources and sinks.
   2. Identify the relevant entry points (e.g., classes and QL predicates)
      by inspecting representative queries such as:
      [[file:~/work-gh/codeql-lab/ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql::@name Query built from user-controlled sources]]
*** Model Editor: Jedis Example (Java Redis client)
1. This sample is straightforward and has no surprises.
2. There are many functions, but they all follow a simple, repetitive pattern.
3. Use the model editor to define sources and sinks at scale.
4. Explanation: [[file:~/work-gh/codeql-lab/codeql-jedis-java/README.org::*Modeling Jedis as a Dependency in Model Editor][Modeling Jedis as a Dependency in Model Editor]]
5. Validation: [[file:~/work-gh/codeql-lab/codeql-jedis-java/README.org::*Verifying the Modeled Sink][Verifying the Modeled Sink]]
6. Query usage: [[file:~/work-gh/codeql-lab/codeql-jedis-java/README.org::*Identify usage of injection-related models in existing queries][Identify usage of injection-related models in existing queries]]
*** TODO Review: SQLite Injection Workshop (C)
- C++ version of the workshop.
*** TODO Extending Queries with Customizations.qll for C
- Supported in most languages, but not C++ by default.
- Can be enabled by building a custom CodeQL bundle.
- Use this CLI tool: https://github.com/advanced-security/codeql-bundle
- Demonstrate using =codeql-lab=.
  + in [[./codeql-sqlite-java/README.org]]
  + ql/cpp/ql/lib/semmle/code/cpp/security/FlowSources.qll
    #+BEGIN_SRC text
    abstract class FlowSource extends DataFlow::Node
    #+END_SRC
  + The other languages include Customizations.qll via =<language>.qll=, e.g.,
    ql/python/ql/lib/python.qll
    1. Modify
       : ql/python/ql/lib/python.qll
    2. Add
       : ql/python/ql/lib/Customizations.qll
  + For C/C++,
    1. Modify
       : ql/cpp/ql/lib/cpp.qll
    2. Add
       : ql/cpp/ql/lib/Customizations.qll
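The two C/C++ steps above can be sketched as a small shell function. This is a sketch under stated assumptions, not the procedure used by codeql-bundle: =enable_customizations= is a hypothetical helper, the placeholder file content is illustrative, and it assumes that C/C++ can follow the pattern of the other languages, where the top-level library file pulls in the customization file with an =import Customizations= line.

#+BEGIN_SRC sh
# enable_customizations LIBDIR TOPFILE (hypothetical helper)
# Example arguments: ql/cpp/ql/lib cpp.qll
enable_customizations() {
    top=$1/$2
    custom=$1/Customizations.qll

    # Refuse to continue if the top-level library file is missing.
    [ -f "$top" ] || { echo "no such file: $top" >&2; return 1; }

    # "Add": create a placeholder file unless one already exists.
    if [ ! -f "$custom" ]; then
        printf '%s\n' '/** Local customizations; add classes and predicates here. */' > "$custom"
    fi

    # "Modify": import the file from the top-level library, only once.
    if ! grep -qx 'import Customizations' "$top"; then
        printf '%s\n' 'import Customizations' >> "$top"
    fi
}
#+END_SRC

Calling =enable_customizations ql/cpp/ql/lib cpp.qll= from the repository root would apply both steps; running it a second time leaves the files unchanged.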
*** TODO Use models-as-data QL code directly (no graphical editor).
Summary:
- The model definition files exist
- Data files exist
- There is no editor
- Generate YAML manually.
- Use the C version of the SQLite injection workshop as reinforcement.
  1. Code: [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c]]
  2. Query: [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/SqlInjection.ql]]
- Apply models-as-data QL logic directly (no graphical editor).
  1. [ ] Add model for: =count = read(STDIN_FILENO, buf, BUFSIZE);=
  2. [ ] Add model for: =rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);=
  3. [X] Reference Java version (structure only, not editor): [[file:~/work-gh/codeql-lab/codeql-sqlite-java/README.org::*Using sqlite to illustrate models-as-data][Using sqlite to illustrate models-as-data]]
  4. [ ] C-specific walkthrough: [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/README.org::*Using sqlite to illustrate models-as-data][Using sqlite to illustrate models-as-data]]
- Manually define YAML models for standard functions (e.g., =read=) and test propagation via QL.
- Customizations using models-as-data, via text
- Continue with codeql-dataflow-sql-injection-c
- The ./ql/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql query works out of
  the box
- Add =char* get_user_info()= as extra source for illustration
** TODO codeql-bundling
* Tool Setup
This repository uses several scripts, found in [[./bin/]]. To ensure that the ones
written in Python have access to their prerequisites, set up a virtual environment:
#+BEGIN_SRC sh
# 1. Create the virtualenv
python3 -m venv ~/codeql-lab/venv
# 2. Install any packages
source ~/codeql-lab/venv/bin/activate
pip install pyyaml
#+END_SRC
For any of these scripts to work, add their directory to the PATH:
#+BEGIN_SRC sh
export PATH="$HOME/codeql-lab/bin:$PATH"
#+END_SRC
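A quick way to confirm the setup is to check that the directory is on the PATH and that PyYAML imports inside the virtual environment. The following is a minimal sketch; =path_has_dir= is a hypothetical helper, not one of the scripts in =./bin=.

#+BEGIN_SRC sh
# path_has_dir DIR (hypothetical helper)
# Succeeds when DIR is one of the colon-separated entries of PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_has_dir "$HOME/codeql-lab/bin"; then
    echo "bin directory is on PATH"
else
    echo "bin directory is missing from PATH"
fi

# With the virtualenv active, this should succeed once pyyaml is installed:
#   python3 -c 'import yaml'
#+END_SRC

Wrapping PATH in colons before matching avoids false positives from entries that merely share a prefix with the directory being tested.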