CodeQL Bundling
The ultimate form of customizing CodeQL queries is building custom bundles. This process is typically treated as a black box. That approach introduces significant problems, especially when multiple bundles must be unified or merged.
The purpose of this module is to illustrate the steps and components involved in bundling, assuming a solid understanding of Unix tools.
From a high-level deployment perspective, the typical flow is:
- Obtain released bundle B_o
- Modify it to create custom bundle B_m
- Run codeql database analyze with B_m to produce results R_m
- Review and post-process R_m
At this level, bundling appears trivial. The complexity arises within the intermediate steps — especially modifying or composing bundles.
A "black-box" bundler is available at: https://github.com/advanced-security/codeql-bundle.git
It is also included here as a submodule: ../extern/codeql-bundle/
The following sections examine each step in detail.
Get released bundle B_o
This is straightforward. Download a prebuilt CodeQL bundle:
wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-linux64.tar.gz
wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-osx64.tar.gz
These tarballs are also included in this repository via git lfs under ../assets.
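The two download URLs above differ only in version and platform, so a small helper can build them. This is a minimal sketch: the function name bundle_url is hypothetical, and the URL pattern is inferred from the two wget commands above, not from any documented API.

```shell
# Build the release URL for a bundle version and platform.
# Assumption: the pattern holds for other releases; only 2.20.0 with
# linux64 and osx64 is confirmed by the commands in this document.
bundle_url() {
  bundle_version="$1"   # e.g. 2.20.0
  bundle_platform="$2"  # linux64 or osx64
  printf 'https://github.com/github/codeql-action/releases/download/codeql-bundle-v%s/codeql-bundle-%s.tar.gz\n' \
    "$bundle_version" "$bundle_platform"
}

# Usage (needs network access):
#   wget "$(bundle_url 2.20.0 linux64)"
```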
Modify bundle B_m
To construct a custom bundle B_m from the base bundle B_o:
- Unpack the bundle
A CodeQL bundle is a self-contained tarball including the CLI and all standard query/library packs.
# extract original
cd ~/codeql-lab
mkdir tmp.bundle
tar -zxf assets/codeql-bundle-osx64.tar.gz -C tmp.bundle
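Before unpacking, it can help to confirm what the tarball will create. The sketch below lists the archive's top-level entries without extracting anything; the function name is hypothetical, and for the bundles used here the expectation is a single codeql directory.

```shell
# List the distinct top-level entries of a gzip-compressed tarball
# without extracting it.
bundle_top_level() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Usage:
#   bundle_top_level assets/codeql-bundle-osx64.tar.gz
```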
- Understand the bundle layout
Key directories:
  - codeql/: the root of the unpacked bundle; it holds the CLI executable codeql
  - tools/: helper tools, rarely needed directly
  - qlpacks/: all QL packs (libraries and queries)
Each pack is a directory containing:
  - .packinfo: information about extensible predicates
  - codeql-pack.yml or qlpack.yml: pack metadata and dependencies, including modelling extensions (which are used by extensible predicates)
  - src/: QL libraries and queries
  - test/: optional regression test cases
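To see which packs and versions a bundle ships, the qlpacks tree can be walked directly. This is a minimal sketch that assumes the qlpacks/<scope>/<name>/<version>/ layout seen in paths such as qlpacks/codeql/cpp-all/3.0.0; the function name is hypothetical.

```shell
# Print every pack in an unpacked bundle as scope/name/version.
# $1: path to the bundle's qlpacks directory
list_packs() {
  (
    cd "$1" || exit 1
    for d in */*/*/; do
      # Skip the unexpanded pattern when nothing matches.
      if [ -d "$d" ]; then
        printf '%s\n' "${d%/}"
      fi
    done
  )
}

# Usage:
#   list_packs ~/codeql-lab/tmp.bundle/codeql/qlpacks
```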
In the shell:
  - Get information about and source code of extensible predicates.
cd ~/codeql-lab/tmp.bundle/codeql
ls qlpacks/codeql/
# controlflow/      go-all/               javascript-queries/  ruby-examples/  threat-models/
# cpp-all/          go-examples/          mad/                 ruby-queries/   tutorial/
# cpp-examples/     go-queries/           python-all/          rust-all/       typeflow/
# cpp-queries/      java-all/             python-examples/     rust-queries/   typetracking/
# csharp-all/       java-examples/        python-queries/      ssa/            typos/
# csharp-examples/  java-queries/         rangeanalysis/       suite-helpers/  util/
# csharp-queries/   javascript-all/       regex/               swift-all/      xml/
# dataflow/         javascript-examples/  ruby-all/            swift-queries/  yaml/

# extensible predicates are listed:
jq . < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less

# Indent in HOV Box style
hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less
# {
#   "extensible_predicate_metadata": {
#     "extensible_predicates": [
#       {
#         "name": "sourceModel",
#         "parameters": [
#           {"name": "namespace","type": "string"},
#           {"name": "type","type": "string"},
#           {"name": "subtypes","type": "boolean"},
#           {"name": "name","type": "string"},
#           {"name": "signature","type": "string"},
#           {"name": "ext","type": "string"},
#           {"name": "output","type": "string"},
#           {"name": "kind","type": "string"},
#           {"name": "provenance","type": "string"}
#         ],
#         "path": "semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll",

# Following this, rooted at .../3.0.0, shows us the source of the predicates
less qlpacks/codeql/cpp-all/3.0.0/semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll
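For a quicker overview than paging through the whole file, jq can reduce .packinfo to one line per extensible predicate. A minimal sketch, assuming every entry carries the name and path fields shown in the output above; the function name is hypothetical.

```shell
# Print "name<TAB>path" for each extensible predicate in a .packinfo file.
# $1: path to the .packinfo file
list_extensible_predicates() {
  jq -r '.extensible_predicate_metadata.extensible_predicates[]
         | [.name, .path] | @tsv' "$1"
}

# Usage:
#   list_extensible_predicates qlpacks/codeql/cpp-all/3.0.0/.packinfo
```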
  - Examine extension data files (YAML)
tail -4 ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/qlpack.yml
# dataExtensions:
# - ext/*.model.yml
# - ext/deallocation/*.model.yml
# - ext/allocation/*.model.yml

# Is the read() function from the line
rg read ~/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c
# 52: count = read(STDIN_FILENO, buf, BUFSIZE - 1);
# present?

# Does not look like it
rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/
# ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
# 7: - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
# 8: - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
# 9: - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
# 10: - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
# 11: - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
# 12: - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]

# or the broader search
rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp* --type=yaml
# ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
# 7: - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
# 8: - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
# 9: - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
# 10: - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
# 11: - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
# 12: - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]
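The rg -i read searches above match any line containing read, including read_at and async_read. A narrower sketch using plain grep looks for the quoted string "read" exactly. The function name is hypothetical, and the match is on the quoted text anywhere in a row, not strictly on the name column, so a hit still needs a look at its namespace.

```shell
# Print model rows that contain the exact quoted string "$1",
# searching *.model.yml files under directory $2.
find_model_rows() {
  grep -rn --include='*.model.yml' -F "\"$1\"" "$2"
}

# Usage:
#   find_model_rows read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext
```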
  - Note the entry alignment and types between the JSON spec and the YAML data
cd ~/codeql-lab/tmp.bundle/codeql
hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | head -16
# {
#   "extensible_predicate_metadata": {
#     "extensible_predicates": [
#       {
#         "name": "sourceModel",
#         "parameters": [
#           {"name": "namespace","type": "string"},
#           {"name": "type","type": "string"},
#           {"name": "subtypes","type": "boolean"},
#           {"name": "name","type": "string"},
#           {"name": "signature","type": "string"},
#           {"name": "ext","type": "string"},
#           {"name": "output","type": "string"},
#           {"name": "kind","type": "string"},
#           {"name": "provenance","type": "string"}
#         ],

# In table form, for sourceModel
jq '.extensible_predicate_metadata.extensible_predicates[] | select(.name == "sourceModel") | .parameters[] | .name' < qlpacks/codeql/cpp-all/3.0.0/.packinfo
# "namespace"
# "type"
# "subtypes"
# "name"
# "signature"
# "ext"
# "output"
# "kind"
# "provenance"

rg -A 2 sourceModel qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
# 5: extensible: sourceModel
# 6- data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
# 7- - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
In table form, these are
parameter     value
"namespace"   "boost::asio"
"type"        ""
"subtypes"    False
"name"        "read"
"signature"   ""
"ext"         ""
"output"      "Argument[*1]"
"kind"        "remote"
"provenance"  "manual"
  - Check the Customizations.qll files, which are the hook for extending existing queries with custom CodeQL. Note that there isn't one for C++, but it can be added.
cd ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql && find * -name Customizations.qll
# csharp-all/4.0.0/Customizations.qll
# csharp-examples/0.0.0/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
# csharp-queries/1.0.13/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
# go-all/3.0.0/Customizations.qll
# go-examples/0.0.0/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
# go-queries/1.1.4/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
# java-all/5.0.0/Customizations.qll
# java-examples/0.0.0/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
# java-queries/1.1.10/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
# javascript-all/2.2.0/Customizations.qll
# javascript-examples/0.0.0/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
# javascript-queries/1.2.5/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
# python-all/3.0.0/Customizations.qll
# python-examples/0.0.0/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
# python-queries/1.3.4/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
# ruby-all/3.0.0/Customizations.qll
# ruby-examples/0.0.0/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
# ruby-queries/1.1.8/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
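The inverse question, which library packs lack a Customizations.qll, can be answered with a short loop. This sketch assumes library packs are the directories named <lang>-all with one subdirectory per version, as in the listing above; the function name is hypothetical.

```shell
# Print each <lang>-all pack version under $1 that has no
# top-level Customizations.qll.
# $1: a scope directory such as qlpacks/codeql
packs_without_customizations() {
  (
    cd "$1" || exit 1
    for d in *-all/*/; do
      if [ -d "$d" ] && [ ! -f "${d}Customizations.qll" ]; then
        printf '%s\n' "${d%/}"
      fi
    done
  )
}

# Usage:
#   packs_without_customizations ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql
```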
- Make customizations
  - Choose a target pack to modify, e.g., codeql/java-queries
cd qlpacks/codeql/java-queries
  - Decide on a customization approach; you can use one or both.
    - Models-as-data: add YAML files that provide data for extensible predicates.
    - Extend existing classes: add subclasses and predicates for base classes to extend sink/source/flow/barrier definitions.
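As an illustration of the models-as-data route, the sketch below writes a data extension with one sourceModel row for the read() call discussed earlier. Everything specific here is an assumption: the file name custom-read.model.yml and the function name are hypothetical, and the row values (empty namespace, Argument[*1], kind "remote") are copied from the shape of the Boost.Asio rows and would need checking against the real threat model. The file is meant to land in a pack's ext/ directory so that the ext/*.model.yml pattern under dataExtensions picks it up.

```shell
# Write a data extension that adds one sourceModel row for read().
# $1: the ext/ directory of the target pack
write_read_model() {
  cat > "$1/custom-read.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/cpp-all
      extensible: sourceModel
    data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
      - ["", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
EOF
}

# Usage:
#   write_read_model ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext
```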
  - Add or modify QL modules
    - Create Customizations.qll in src/
    - Import and extend existing modules/predicates
  - Add or modify .ql files using your new predicates
  - Optionally run tests for the modified pack:
codeql test run .
- Optionally add a new QL pack
If your changes are substantial or logically separate:
  - Create a new directory, e.g., qlpacks/myorg/custom-queries
  - Add a codeql-pack.yml file:
name: myorg/custom-queries
version: 0.0.1
dependencies:
  codeql/java-queries: "*"
  - Add QL source files in src/
  - Use QL import statements to reuse existing logic. Imports name modules, not packs (for example, import java from codeql/java-all), so import codeql/java-queries is not valid QL.
  - qlpack.yml is an alternative name for codeql-pack.yml; a pack needs only one of the two.
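The pack skeleton described above can be created in one step. A minimal sketch: the pack name, version, and dependency are the example values from this section, the function name is hypothetical, and the result is only a starting point that still needs queries in src/.

```shell
# Create a skeleton query pack under a qlpacks directory.
# $1: path to the bundle's qlpacks directory
new_pack_skeleton() {
  pack_dir="$1/myorg/custom-queries"
  mkdir -p "$pack_dir/src" || return 1
  cat > "$pack_dir/codeql-pack.yml" <<'EOF'
name: myorg/custom-queries
version: 0.0.1
dependencies:
  codeql/java-queries: "*"
EOF
}

# Usage:
#   new_pack_skeleton ~/codeql-lab/tmp.bundle/codeql/qlpacks
```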
- Rebundle (optional)
The unpacked tree can be used directly with codeql database analyze. To distribute or version the result, repackage the codeql/ directory that was unpacked into tmp.bundle:
cd ~/codeql-lab
tar -czf codeql-bundle-custom.tar.gz -C tmp.bundle codeql
B_m is now a customized bundle containing your logic.
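Repackaging and a basic sanity check can be combined. This sketch assumes the modified tree is the codeql/ directory unpacked earlier into tmp.bundle; the function name is hypothetical, and listing the archive only shows that it is readable, not that the bundle works.

```shell
# Repackage an unpacked bundle and confirm the archive is readable.
# $1: directory that contains the unpacked codeql/ tree (e.g. tmp.bundle)
# $2: output tarball path
rebundle() {
  tar -czf "$2" -C "$1" codeql || return 1
  # Listing the archive is a cheap integrity check; the listing is discarded.
  tar -tzf "$2" > /dev/null
}

# Usage:
#   rebundle ~/codeql-lab/tmp.bundle ~/codeql-lab/codeql-bundle-custom.tar.gz
```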
codeql database analyze with B_m, get results R_m
Run the usual analysis command using your modified bundle B_m:
./codeql/codeql database analyze \
<database> \
--format=sarifv2.1.0 \
--output=results.sarif \
myorg/custom-queries
Adjust --format and output path as needed. You may also specify individual
.ql files or packs.
Review / process R_m
Review or consume results as normal:
- Use the CodeQL extension or SARIF tools to explore the output
- Export summaries or ingest into downstream pipelines
- Consider writing postprocessors for bulk result handling
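As a starting point for such a postprocessor, jq can summarize a SARIF file by rule. A minimal sketch, assuming each result carries a ruleId field (the SARIF 2.1.0 property of that name); the function name is hypothetical.

```shell
# Count results per rule in a SARIF 2.1.0 file, most frequent first.
# $1: path to the SARIF file, e.g. results.sarif
sarif_rule_counts() {
  jq -r '.runs[].results[].ruleId' "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   sarif_rule_counts results.sarif
```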