# Mirror of https://github.com/hohn/codeql-lab.git (synced 2025-12-16 18:03:08 +01:00)
* codeql-lab: Centralized Git Repository for CodeQL Development

** Overview
codeql-lab is a consolidated Git repository that collects all relevant
CodeQL components, resources, and tooling into a single
version-controlled location.

** Purpose
The goal of this repository is to provide an integrated development
environment (“lab”) for CodeQL research, experimentation, and custom
query development. It simplifies setup by maintaining all required
submodules, configuration files, and datasets in one place.

** Repository Location
The primary repository is hosted at:
https://github.com/hohn/codeql-lab

** Intended Use Cases
- Local experimentation with CodeQL queries and libraries.
- End-to-end testing of custom model data and query logic.
  This includes writing and validating custom data flow models,
  adjusting model coverage, and confirming that query results behave
  as expected across controlled datasets. The lab setup supports rapid
  iteration on QL logic, helping detect unintended changes and enabling
  reproducible evaluations of taint tracking, control flow, or API usage
  patterns.
- Structured collaboration and controlled updates across all
  CodeQL-related artifacts.
- Simplified onboarding and reproducible setup for new contributors or
  analysis environments.
* Prerequisites

Working with this repository assumes prior experience with:

- *Git, Bash, and standard Unix command-line tools*. These are used
  throughout and are required for setup and day-to-day tasks.
  Tools such as [[https://man.archlinux.org/man/rg.1][ripgrep]], [[https://www.gnu.org/software/bash/][GNU Bash]], and [[https://en.wikipedia.org/wiki/Grep][grep/regex workflows]] are assumed.

- *At least one supported programming language*, such as C, C++, Java,
  Python, Go, or Ruby. A solid understanding of the target language is
  necessary to interpret analysis results and write effective queries.
  See general background on [[https://en.wikipedia.org/wiki/Programming_language][programming languages]] if needed.

- *Basic familiarity with program structure concepts*, including
  [[https://en.wikipedia.org/wiki/Abstract_syntax_tree][abstract syntax trees (ASTs)]], [[https://en.wikipedia.org/wiki/Control-flow_graph][control-flow graphs (CFGs)]], and
  [[https://en.wikipedia.org/wiki/Data-flow_analysis][data-flow graphs (DFGs)]]. These are core to how CodeQL models code behavior.

- *Optional but helpful*: familiarity with structural or functional
  programming languages (e.g. [[https://en.wikipedia.org/wiki/Lisp_(programming_language)][Lisp]] or [[https://en.wikipedia.org/wiki/OCaml][OCaml]]) can make working with
  CodeQL’s query language and type system more intuitive.
  See overview of [[https://en.wikipedia.org/wiki/Functional_programming][functional programming]] for related context.
* Repository Layout
** Core Structure
- Repository is based on: https://github.com/github/vscode-codeql-starter.git
- All development work is done on the branch: qllab
- CodeQL version is pinned via the =ql/= submodule:
  : commit 4d681f05bd671f8b5e31624f16a2b4d75e61c071 (tag: codeql-cli/v2.22.0)
- A prebuilt CodeQL CLI binary is included:
  : 1104625939 assets/codeql-osx64.zip
- Project-specific repositories can be added directly under the root.
  Example: the C dataflow workshop in =./codeql-dataflow-sql-injection-c=
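The submodule pin listed under Core Structure can be checked from a shell. The following is a minimal sketch, assuming it is run from the repository root after the =ql/= submodule has been initialized. The helper name =check_pin= is hypothetical and not part of this repository.

#+BEGIN_SRC sh
# Hypothetical helper: compare a checkout's HEAD with an expected commit hash.
# Returns 0 on a match, 1 on a mismatch, 2 if HEAD cannot be read.
check_pin() {
    dir=$1
    expected=$2
    # git rev-parse HEAD prints the full hash of the checked-out commit
    actual=$(git -C "$dir" rev-parse HEAD 2>/dev/null) || {
        echo "cannot read HEAD in $dir" >&2
        return 2
    }
    if [ "$actual" = "$expected" ]; then
        echo "$dir is at the pinned commit"
    else
        echo "$dir is at $actual, expected $expected" >&2
        return 1
    fi
}

# Usage, from the repository root:
#   git submodule update --init ql
#   check_pin ql 4d681f05bd671f8b5e31624f16a2b4d75e61c071
#+END_SRC

A mismatch usually just means the submodule has not been updated after switching branches; it is a prompt to look, not necessarily an error.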
** Additional Structure Notes
- The original upstream README.md is preserved at [[./README-vscode-codeql-starter.md]]
* Possible Reading Orders

** Data Flow
*** Debugging data flow config (instead of taint flow), Java
We can illustrate taint-flow debugging in the Java SQL injection sample:
- [[./codeql-sqlite-java/TaintFlowDebugging.ql]]
- [[./codeql-sqlite-java/TaintFlowDebugging.md]]

*** Debugging data flow config (instead of taint flow), C
** Modeling
*** Review: SQLite Injection Workshop, Java
- Recap the Java-based injection example.

*** Customizations via codeql, java
- codeql-dataflow-sql-injection-c/README.org, [[file:codeql-dataflow-sql-injection-c/README.org::*supplement codeql: Add to FlowSource or a subclass][supplement codeql: Add to FlowSource or a subclass]]
- TODO raw md from staging: codeql-dataflow-sql-injection-c/incoming.codeql-customizations-workshop.md

*** Model Editor: Single-function case (Java, SQLite sample)
1. Extend the Java example using the model editor. The data and spec are present.
   1. This sample illustrates a subtle problem with the model editor:
      =java.io.Console.readLine()= is already modeled as a /taint step/ and
      therefore does not appear in the editor. However, we need it modeled as a /source/,
      which requires special handling.
   2. Extension spec: [[file:~/work-gh/codeql-lab/.github/codeql/extensions/sqlite-db/codeql-pack.yml::name: pack/sqlite-db]]
   3. Extension data: [[file:~/work-gh/codeql-lab/.github/codeql/extensions/sqlite-db/models/sqlite.model.yml::extensions:]]
   4. Explanation: [[file:~/work-gh/codeql-lab/codeql-sqlite-java/README.org::*Using sqlite to illustrate models-as-data][Using sqlite to illustrate models-as-data]]
2. Explain how the "models-as-data" system works internally:
   1. Use a diagnostic query to enumerate current sources and sinks.
   2. Identify the relevant entry points (e.g., classes and QL predicates)
      by inspecting representative queries such as:
      [[file:~/work-gh/codeql-lab/ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql::@name Query built from user-controlled sources]]
*** Model Editor: Jedis Example (Java Redis client)
1. This sample is straightforward and has no surprises.
2. There are many functions, but they all follow a simple, repetitive pattern.
3. Use the model editor to define sources and sinks at scale.
4. Explanation: [[file:~/work-gh/codeql-lab/codeql-jedis-java/README.org::*Modeling Jedis as a Dependency in Model Editor][Modeling Jedis as a Dependency in Model Editor]]
5. Validation: [[file:~/work-gh/codeql-lab/codeql-jedis-java/README.org::*Verifying the Modeled Sink][Verifying the Modeled Sink]]
6. Query usage: [[file:~/work-gh/codeql-lab/codeql-jedis-java/README.org::*Identify usage of injection-related models in existing queries][Identify usage of injection-related models in existing queries]]
*** TODO Review: SQLite Injection Workshop (C)
- C++ version of the workshop.

*** TODO Extending Queries with Customizations.qll for C
- Supported in most languages, but not C++ by default.
- Can be enabled by building a custom CodeQL bundle.
- Use this CLI tool: https://github.com/advanced-security/codeql-bundle
- Demonstrate using =codeql-lab=.
  + in [[./codeql-sqlite-java/README.org]]
  + ql/cpp/ql/lib/semmle/code/cpp/security/FlowSources.qll
    #+BEGIN_SRC text
    abstract class FlowSource extends DataFlow::Node
    #+END_SRC

  + The other languages include Customizations.qll via <language.qll>, e.g.,
    ql/python/ql/lib/python.qll
    1. Modify
       : ql/python/ql/lib/python.qll
    2. Add
       : ql/python/ql/lib/Customizations.qll

  + For C/C++,
    1. Modify
       : ql/cpp/ql/lib/cpp.qll
    2. Add
       : ql/cpp/ql/lib/Customizations.qll
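The two C/C++ steps (modify =cpp.qll=, add =Customizations.qll=) can be scripted. This is a minimal sketch, not the repository's procedure: it assumes the C/C++ library can follow the same pattern as the other languages, where =<language>.qll= contains =import Customizations= and =Customizations.qll= imports the language library back. The helper name =add_customizations= is hypothetical, and as noted above a custom bundle may still be needed before the standard queries pick up the change.

#+BEGIN_SRC sh
# Hypothetical helper: wire a Customizations.qll into a language library.
#   $1: library directory, e.g. ql/cpp/ql/lib
#   $2: language entry module without extension, e.g. cpp
add_customizations() {
    libdir=$1
    entry=$2
    cust="$libdir/Customizations.qll"
    main="$libdir/$entry.qll"

    [ -f "$main" ] || { echo "missing $main" >&2; return 1; }

    # Step "Add": create Customizations.qll only if it does not exist yet.
    if [ ! -f "$cust" ]; then
        printf '%s\n' \
            '/** Local customizations of the standard library. */' \
            "import $entry" > "$cust"
    fi

    # Step "Modify": append the import once; repeated runs change nothing.
    grep -qx 'import Customizations' "$main" ||
        printf '%s\n' 'import Customizations' >> "$main"
}

# Usage, from the repository root:
#   add_customizations ql/cpp/ql/lib cpp
#+END_SRC

Because the edit touches the pinned =ql/= submodule, it shows up as a local modification there; keeping it on a separate branch of the submodule is one way to keep the pin reproducible.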
*** TODO Use models-as-data QL code directly (no graphical editor).
Summary:
- The model definition files exist
- Data files exist
- There is no editor
- Generate YAML manually.

- Use the C version of the SQLite injection workshop as reinforcement.
  1. Code: [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c]]
  2. Query: [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/SqlInjection.ql]]
- Apply models-as-data QL logic directly (no graphical editor).
  1. [ ] Add model for: =count = read(STDIN_FILENO, buf, BUFSIZE);=
  2. [ ] Add model for: =rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);=
  3. [X] Reference Java version (structure only, not editor): [[file:~/work-gh/codeql-lab/codeql-sqlite-java/README.org::*Using sqlite to illustrate models-as-data][Using sqlite to illustrate models-as-data]]
  4. [ ] C-specific walkthrough: [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/README.org::*Using sqlite to illustrate models-as-data][Using sqlite to illustrate models-as-data]]
- Manually define YAML models for standard functions (e.g., =read=) and test propagation via QL.

- Customizations using models-as-data, via text
  - Continue with codeql-dataflow-sql-injection-c
  - The ./ql/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql query works out of
    the box
  - Add =char* get_user_info()= as extra source for illustration
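Since there is no editor for C, the YAML for the =read= and =sqlite3_exec= items has to be written by hand. The following is a minimal sketch of what that could look like, not the content of any file in this repository. Assumptions: the pack name =pack/sqlite-db-c= and the helper name =write_c_models= are hypothetical (patterned on the Java pack =pack/sqlite-db=); the source kind =remote=, the sink kind =sql-injection=, and the access path =Argument[*1]= (the data that argument 1 points to) are illustrative choices that should be checked against the CodeQL C/C++ model documentation and against the query that is expected to consume them.

#+BEGIN_SRC sh
# Hypothetical helper: write a minimal C/C++ model pack into the directory $1.
write_c_models() {
    dir=$1
    mkdir -p "$dir/models"

    # Pack spec: a library pack that extends the standard C/C++ library pack.
    cat > "$dir/codeql-pack.yml" <<'EOF'
name: pack/sqlite-db-c
version: 0.0.1
library: true
extensionTargets:
  codeql/cpp-all: '*'
dataExtensions:
  - models/**/*.yml
EOF

    # Model data. Tuple columns:
    # namespace, type, subtypes, name, signature, ext, access path, kind, provenance
    cat > "$dir/models/sqlite.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/cpp-all
      extensible: sourceModel
    data:
      # count = read(STDIN_FILENO, buf, BUFSIZE);  buf is argument 1
      - ["", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
  - addsTo:
      pack: codeql/cpp-all
      extensible: sinkModel
    data:
      # rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);  query is argument 1
      - ["", "", False, "sqlite3_exec", "", "", "Argument[*1]", "sql-injection", "manual"]
EOF
}

# Usage (the target directory is an assumption, patterned on the Java extension path):
#   write_c_models .github/codeql/extensions/sqlite-db-c
#+END_SRC

The heredoc delimiters are quoted (='EOF'=) so that the shell writes the YAML verbatim, without expanding =$= or backslashes inside the model tuples.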
** TODO codeql-bundling

* Tool Setup
This repository uses some helper scripts, found in [[./bin/]]. To ensure the ones written in
Python have access to their prerequisites, set up a virtual environment via
#+BEGIN_SRC sh
# 1. Create the virtualenv
python3 -m venv ~/codeql-lab/venv

# 2. Activate it and install any packages
source ~/codeql-lab/venv/bin/activate
pip install pyyaml
#+END_SRC

For any of these scripts to work, add their directory to the PATH via
#+BEGIN_SRC sh
export PATH="$HOME/codeql-lab/bin:$PATH"
#+END_SRC
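Whether the PATH change took effect in the current shell can be checked with a small sketch like the one below. The helper name =on_path= is hypothetical, and the directory is the one used in the =export= line above.

#+BEGIN_SRC sh
# Hypothetical helper: succeed if directory $1 is an entry of $PATH.
on_path() {
    # Wrapping both sides in ':' makes the match exact per PATH entry.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if on_path "$HOME/codeql-lab/bin"; then
    echo "codeql-lab/bin is on PATH"
else
    echo "codeql-lab/bin is not on PATH in this shell"
fi

# With the virtualenv active, this succeeds only if pyyaml is importable:
#   python3 -c 'import yaml'
#+END_SRC

The =export= only affects the current shell session; adding it to a shell startup file makes it persistent.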