* CodeQL Bundling
The ultimate form of customizing CodeQL queries is building custom bundles.
This process is typically treated as a black box. That approach introduces
significant problems, especially when multiple bundles must be unified or merged.
The purpose of this module is to illustrate the steps and components involved
in bundling, assuming a solid understanding of Unix tools.
From a high-level deployment perspective, the typical flow is:
- Obtain released bundle B_o
- Modify it to create custom bundle B_m
- Run =codeql database analyze= with B_m to produce results R_m
- Review and post-process R_m
At this level, bundling appears trivial. The complexity arises within the
intermediate steps — especially modifying or composing bundles.
A "black-box" bundler is available at:
https://github.com/advanced-security/codeql-bundle.git
It is also included here as a submodule: [[../extern/codeql-bundle/]]
The following sections examine each step in detail.
** Get released bundle B_o
This is straightforward. Download a prebuilt CodeQL bundle:
#+BEGIN_SRC sh
wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-linux64.tar.gz
wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-osx64.tar.gz
#+END_SRC
These tarballs are also included in this repository via =git lfs= under [[../assets]].
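A small sketch for picking the tarball that matches the current platform. It assumes only the two file names shown above; the helper name =bundle_tarball= is made up for illustration.

#+BEGIN_SRC sh
# Map an OS name (as printed by `uname -s`) to the bundle tarball name.
# With no argument, the current OS is used.
bundle_tarball() {
  case "${1:-$(uname -s)}" in
    Linux)  echo "codeql-bundle-linux64.tar.gz" ;;
    Darwin) echo "codeql-bundle-osx64.tar.gz" ;;
    *)      echo "no tarball name known for this OS" >&2; return 1 ;;
  esac
}

base=https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0
# Print the download URL; pass it to wget as in the block above.
echo "$base/$(bundle_tarball)"
#+END_SRC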
** Modify bundle B_m
To construct a custom bundle B_m from the base bundle B_o:
1. Unpack the bundle
A CodeQL bundle is a self-contained tarball including the CLI and all
standard query/library packs.
#+BEGIN_SRC sh
# extract original
cd ~/codeql-lab
mkdir tmp.bundle
tar -zxf assets/codeql-bundle-osx64.tar.gz -C tmp.bundle
#+END_SRC
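Before extracting, it can help to check what the tarball unpacks to. A minimal sketch; =top_level_entries= is a hypothetical helper name, and the asset path is the one used above.

#+BEGIN_SRC sh
# Print the distinct top-level entries of a gzipped tarball without extracting it.
top_level_entries() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

bundle=~/codeql-lab/assets/codeql-bundle-osx64.tar.gz
if [ -f "$bundle" ]; then
  # The later steps cd into tmp.bundle/codeql, so codeql should be listed here.
  top_level_entries "$bundle"
fi
#+END_SRC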
2. Understand the bundle layout
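Key entries under the top-level =codeql/= directory:
- =codeql=: the CLI launcher
- =tools/=: helper tools, rarely needed directly
- =qlpacks/=: all QL packs, both libraries and queries
Each pack lives in a versioned directory, =qlpacks/<scope>/<name>/<version>/=, containing:
- =.packinfo=: pack metadata, including the list of extensible predicates
- =qlpack.yml= (=codeql-pack.yml= is the alternative name): pack metadata and dependencies,
including the =dataExtensions= globs for modelling extensions (the data used by extensible predicates)
- QL libraries and queries; in the released packs these are rooted directly at the version directory rather than under =src/=
- =test/=: optional regression test cases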
In the shell:
- Get information about and source code of extensible predicates.
#+BEGIN_SRC sh
#
cd ~/codeql-lab/tmp.bundle/codeql
ls qlpacks/codeql/
# controlflow/ go-all/ javascript-queries/ ruby-examples/ threat-models/
# cpp-all/ go-examples/ mad/ ruby-queries/ tutorial/
# cpp-examples/ go-queries/ python-all/ rust-all/ typeflow/
# cpp-queries/ java-all/ python-examples/ rust-queries/ typetracking/
# csharp-all/ java-examples/ python-queries/ ssa/ typos/
# csharp-examples/ java-queries/ rangeanalysis/ suite-helpers/ util/
# csharp-queries/ javascript-all/ regex/ swift-all/ xml/
# dataflow/ javascript-examples/ ruby-all/ swift-queries/ yaml/
# Extensible predicates are listed in .packinfo:
jq . < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less
# Same, indented in HOV box style
hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less
# {
# "extensible_predicate_metadata": {
# "extensible_predicates": [
# {
# "name": "sourceModel",
# "parameters": [
# {"name": "namespace","type": "string"},
# {"name": "type","type": "string"},
# {"name": "subtypes","type": "boolean"},
# {"name": "name","type": "string"},
# {"name": "signature","type": "string"},
# {"name": "ext","type": "string"},
# {"name": "output","type": "string"},
# {"name": "kind","type": "string"},
# {"name": "provenance","type": "string"}
# ],
# "path": "semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll",
# ... (output truncated)
# Following the "path" entry, rooted at .../3.0.0, leads to the source of the predicates:
less qlpacks/codeql/cpp-all/3.0.0/semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll
#+END_SRC
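To get an overview instead of paging through the JSON, the names and defining files can be pulled out with =jq=. This is a sketch that relies only on the fields visible in the output above (=name= and =path=); the function name is made up.

#+BEGIN_SRC sh
# Print "<predicate name><TAB><defining .qll path>" for every
# extensible predicate recorded in a .packinfo file.
list_extensible_predicates() {
  jq -r '.extensible_predicate_metadata.extensible_predicates[]
         | [.name, .path] | @tsv' "$1"
}

packinfo=~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/.packinfo
if [ -f "$packinfo" ]; then
  list_extensible_predicates "$packinfo"
fi
#+END_SRC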
- Examine extension data files (yaml)
#+BEGIN_SRC sh
#
tail -4 ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/qlpack.yml
# dataExtensions:
# - ext/*.model.yml
# - ext/deallocation/*.model.yml
# - ext/allocation/*.model.yml
#
# add-user.c reads its input with read():
rg read ~/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c
# 52: count = read(STDIN_FILENO, buf, BUFSIZE - 1);
#
# Is read() modeled in the extension data? It does not look like it;
# only the Boost.Asio variants match:
rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/
# ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
# 7: - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
# 8: - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
# 9: - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
# 10: - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
# 11: - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
# 12: - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]
#
# or the broader search
rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp* --type=yaml
# ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
# 7: - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
# 8: - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
# 9: - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
# 10: - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
# 11: - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
# 12: - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]
#+END_SRC
- Note how the fields of the JSON spec line up, in order and in type, with each row of the YAML data
#+BEGIN_SRC sh
cd ~/codeql-lab/tmp.bundle/codeql
hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | head -16
# {
# "extensible_predicate_metadata": {
# "extensible_predicates": [
# {
# "name": "sourceModel",
# "parameters": [
# {"name": "namespace","type": "string"},
# {"name": "type","type": "string"},
# {"name": "subtypes","type": "boolean"},
# {"name": "name","type": "string"},
# {"name": "signature","type": "string"},
# {"name": "ext","type": "string"},
# {"name": "output","type": "string"},
# {"name": "kind","type": "string"},
# {"name": "provenance","type": "string"}
# ],
# Just the parameter names of sourceModel, in order:
jq '.extensible_predicate_metadata.extensible_predicates[]
| select(.name == "sourceModel")
| .parameters[]
| .name ' < qlpacks/codeql/cpp-all/3.0.0/.packinfo
# "namespace"
# "type"
# "subtypes"
# "name"
# "signature"
# "ext"
# "output"
# "kind"
# "provenance"
rg -A 2 sourceModel qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
# 5: extensible: sourceModel
# 6- data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
# 7- - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
#+END_SRC
In table form, with the type from =.packinfo= and the value from the first =Boost.Asio.model.yml= row:
| parameter  | type    | value          |
|------------+---------+----------------|
| namespace  | string  | "boost::asio"  |
| type       | string  | ""             |
| subtypes   | boolean | False          |
| name       | string  | "read"         |
| signature  | string  | ""             |
| ext        | string  | ""             |
| output     | string  | "Argument[*1]" |
| kind       | string  | "remote"       |
| provenance | string  | "manual"       |
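The search above found no model for the POSIX =read()= call in =add-user.c=. A data extension could add one. The following is a sketch under assumptions, not a vetted model: the file name =posix-read.model.yml= is hypothetical, the empty namespace and type are a guess at how a global C function is addressed, and =kind= and =provenance= are simply copied from the Boost.Asio row.

#+BEGIN_SRC sh
# ASSUMPTION: EXT_DIR is the ext/ directory of the unpacked cpp-all pack.
# It defaults to ./ext so the sketch can be tried outside the bundle.
ext_dir=${EXT_DIR:-./ext}
mkdir -p "$ext_dir"

# Hypothetical file name; it matches the ext/*.model.yml glob from qlpack.yml.
cat > "$ext_dir/posix-read.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/cpp-all
      extensible: sourceModel
    data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
      - ["", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
EOF

# Show the new row, in the same form as the search results above.
grep -n '"read"' "$ext_dir/posix-read.model.yml"
#+END_SRC

=Argument[*1]= follows the Boost.Asio rows: the data pointed to by the second argument, which for =read(fd, buf, count)= is the buffer.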
- <<XX: continue>> Check the =Customizations.qll= files, which allow extending /existing/ queries
with /custom/ CodeQL. There is none for C++, but one can be added.
#+BEGIN_SRC sh
cd ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql && find * -name Customizations.qll
# csharp-all/4.0.0/Customizations.qll
# csharp-examples/0.0.0/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
# csharp-queries/1.0.13/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
# go-all/3.0.0/Customizations.qll
# go-examples/0.0.0/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
# go-queries/1.1.4/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
# java-all/5.0.0/Customizations.qll
# java-examples/0.0.0/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
# java-queries/1.1.10/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
# javascript-all/2.2.0/Customizations.qll
# javascript-examples/0.0.0/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
# javascript-queries/1.2.5/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
# python-all/3.0.0/Customizations.qll
# python-examples/0.0.0/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
# python-queries/1.3.4/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
# ruby-all/3.0.0/Customizations.qll
# ruby-examples/0.0.0/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
# ruby-queries/1.1.8/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
#+END_SRC
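The inverse question, which language library packs ship no =Customizations.qll=, can be answered with a short loop. A sketch; the function name is made up, and it assumes the =<pack>/<version>/Customizations.qll= layout seen in the =find= output.

#+BEGIN_SRC sh
# List the *-all library packs under a directory that contain no
# Customizations.qll at <pack>/<version>/Customizations.qll.
packs_without_customizations() {
  for pack in "$1"/*-all; do
    [ -d "$pack" ] || continue
    if ! find "$pack" -maxdepth 2 -name Customizations.qll | grep -q .; then
      basename "$pack"
    fi
  done
}

root=~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql
if [ -d "$root" ]; then
  packs_without_customizations "$root"
fi
#+END_SRC

Judging from the two listings above, this should report =cpp-all=, =rust-all= and =swift-all= for this bundle.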
3. Make customizations
   1. Choose a target pack to modify, e.g., =codeql/java-queries=. The pack root is its version directory:
      #+BEGIN_SRC sh
      cd ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/java-queries/1.1.10
      #+END_SRC
   2. Decide on a customization approach; you can use one or both.
      1. Models-as-data: add YAML files that provide data for extensible predicates.
      2. Extend existing classes: add subclasses and predicates for base classes,
         to extend sink/source/flow/barrier definitions.
   3. Add or modify QL modules
      1. Edit =Customizations.qll= in the language's library pack (e.g. =java-all/5.0.0/Customizations.qll= above);
         create it where the language has none, as for C++
      2. Import and extend existing modules/predicates
   4. Add or modify =.ql= files using your new predicates
   5. Optionally run tests for the modified pack:
      #+BEGIN_SRC sh
      codeql test run .
      #+END_SRC
4. Optionally add a new QL pack
If your changes are substantial or logically separate:
- Create a new directory, e.g., =qlpacks/myorg/custom-queries=
- Add a =codeql-pack.yml= file:
#+BEGIN_SRC yaml
name: myorg/custom-queries
version: 0.0.1
dependencies:
  codeql/java-queries: "*"
#+END_SRC
- Add QL source files in =src/=
- Reuse existing logic by importing its modules; a QL =import= names a module path (e.g. =import java=), not a pack name such as =codeql/java-queries=
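- =qlpack.yml= is the alternative name for this file (the bundled packs use it); run =codeql pack install= in the pack directory to resolve dependencies when testing the pack separately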
5. Rebundle (optional)
The unpacked tree can be used directly with =codeql database analyze=.
But to distribute or version the result, repackage it:
#+BEGIN_SRC sh
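# archive the top-level codeql/ directory, matching the layout of the released bundle
tar -czf codeql-bundle-custom.tar.gz -C ~/codeql-lab/tmp.bundle codeql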
#+END_SRC
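A rebuilt tarball should unpack to the same top-level =codeql/= directory as the released one, so that it can be swapped in anywhere the original was used. A sketch that wraps the archive step and lists the first entries as a sanity check; =rebundle= is a hypothetical helper name.

#+BEGIN_SRC sh
# $1: directory containing the unpacked codeql/ tree
# $2: output tarball
rebundle() {
  tar -czf "$2" -C "$1" codeql
}

src=~/codeql-lab/tmp.bundle
if [ -d "$src/codeql" ]; then
  rebundle "$src" codeql-bundle-custom.tar.gz
  # Every listed path should start with codeql/
  tar -tzf codeql-bundle-custom.tar.gz | head -n 3
fi
#+END_SRC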
*XX: REFERENCE* B_m is now a customized bundle containing your logic.
** =codeql database analyze= with B_m, get results R_m
Run the usual analysis command using your modified bundle B_m:
#+BEGIN_SRC sh
cd ~/codeql-lab/tmp.bundle
# replace <database> with the path to a CodeQL database
./codeql/codeql database analyze \
    <database> \
    --format=sarifv2.1.0 \
    --output=results.sarif \
    myorg/custom-queries
#+END_SRC
Adjust =--format= and output path as needed. You may also specify individual
=.ql= files or packs.
** Review / process R_m
Review or consume results as normal:
- Use the CodeQL extension or SARIF tools to explore the output
- Export summaries or ingest into downstream pipelines
- Consider writing postprocessors for bulk result handling
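As a starting point for such a postprocessor, a SARIF file can be flattened to one line per result with =jq=. This sketch uses only standard SARIF 2.1.0 fields; the function name is made up, and results without a location are shown with =-= placeholders.

#+BEGIN_SRC sh
# Print "<rule id><TAB><file><TAB><start line>" for every result in a SARIF file.
summarize_sarif() {
  jq -r '.runs[].results[]
         | [ .ruleId,
             (.locations[0].physicalLocation.artifactLocation.uri // "-"),
             (.locations[0].physicalLocation.region.startLine // "-" | tostring) ]
         | @tsv' "$1"
}

if [ -f results.sarif ]; then
  # Results per rule, most frequent first
  summarize_sarif results.sarif | cut -f1 | sort | uniq -c | sort -rn
fi
#+END_SRC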