mirror of
https://github.com/hohn/codeql-lab.git
synced 2025-12-16 09:53:04 +01:00
* CodeQL Bundling
Building a custom bundle is the most thorough way to customize CodeQL queries.
The process is typically treated as a black box, and that causes significant
problems, especially when multiple bundles must be unified or merged.

This module illustrates the steps and components involved in bundling. It
assumes a solid understanding of Unix tools.

From a high-level deployment perspective, the typical flow is:

- Obtain released bundle B_o
- Modify it to create custom bundle B_m
- Run =codeql database analyze= with B_m to produce results R_m
- Review and post-process R_m

At this level, bundling appears trivial. The complexity lies in the
intermediate steps, especially modifying or composing bundles.

A "black-box" bundler is available at:
https://github.com/advanced-security/codeql-bundle.git
It is also included here as a submodule: [[../extern/codeql-bundle/]]

The following sections examine each step in detail.

** Get released bundle B_o
This is straightforward. Download a prebuilt CodeQL bundle:

#+BEGIN_SRC sh
  wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-linux64.tar.gz
  wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-osx64.tar.gz
#+END_SRC

These tarballs are also included in this repository via =git lfs= under [[../assets]].

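When the download is scripted, the asset name can be picked from the host platform. The following is a minimal sketch that assumes only the two asset names and the release tag shown above; the function name =bundle_asset_name= and the =uname= mapping are illustrative and not part of any CodeQL tooling.

#+BEGIN_SRC sh
  # Hypothetical helper: map a kernel name (default: this host's `uname -s`)
  # to one of the two bundle asset names used above.
  bundle_asset_name() {
      case "${1:-$(uname -s)}" in
          Linux)  echo "codeql-bundle-linux64.tar.gz" ;;
          Darwin) echo "codeql-bundle-osx64.tar.gz" ;;
          *)      echo "unsupported platform: ${1:-$(uname -s)}" >&2; return 1 ;;
      esac
  }

  # Release tag and base URL taken from the wget commands above.
  tag=codeql-bundle-v2.20.0
  base=https://github.com/github/codeql-action/releases/download

  # Print the download URL instead of fetching it, so the sketch has no side effects.
  echo "$base/$tag/$(bundle_asset_name)"
#+END_SRC

Passing the printed URL to =wget= reproduces one of the two downloads above.
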
** Modify bundle B_m
To construct a custom bundle B_m from the base bundle B_o:

1. Unpack the bundle

   A CodeQL bundle is a self-contained tarball including the CLI and all
   standard query/library packs.

   #+BEGIN_SRC sh
     # extract original
     cd ~/codeql-lab
     mkdir tmp.bundle
     tar -zxf assets/codeql-bundle-osx64.tar.gz -C tmp.bundle
   #+END_SRC

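   Before going further, it can help to confirm that the archive unpacked into the expected top-level =codeql/= directory. This is a small sketch using only =tar= and =test=; the function name =unpack_bundle= is made up for this example.

   #+BEGIN_SRC sh
     # Hypothetical helper: unpack TARBALL into a fresh DEST and check the layout.
     unpack_bundle() {
         tarball=$1
         dest=$2
         if [ -e "$dest" ]; then
             echo "refusing to overwrite existing $dest" >&2
             return 1
         fi
         mkdir -p "$dest" || return 1
         tar -zxf "$tarball" -C "$dest" || return 1
         # The rest of this walkthrough expects DEST/codeql/qlpacks to exist.
         if [ ! -d "$dest/codeql/qlpacks" ]; then
             echo "unexpected layout: $dest/codeql/qlpacks is missing" >&2
             return 1
         fi
         echo "unpacked into $dest/codeql"
     }
   #+END_SRC

   From =~/codeql-lab=, a call such as =unpack_bundle assets/codeql-bundle-osx64.tar.gz tmp.bundle= would replace the =mkdir= and =tar= steps above.
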
2. Understand the bundle layout

   The tarball unpacks into a top-level =codeql/= directory. Key entries inside it:
   - =codeql= - the CLI executable
   - =tools/= - helper tools, rarely needed directly
   - =qlpacks/= - all QL packs: libraries and queries

   Each pack lives under =qlpacks/<scope>/<name>/<version>/= and contains:
   - =.packinfo= - information about extensible predicates
   - =qlpack.yml= (or =codeql-pack.yml=) - pack metadata and dependencies,
     including modelling extensions (which are used by extensible predicates)
   - QL libraries and queries - in the released packs these sit directly under
     the version directory (e.g. =semmle/code/cpp/...= in =cpp-all/3.0.0=);
     packs you author may keep them under =src/=
   - =test/= - optional regression test cases

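   The =<scope>/<name>/<version>= layout can be turned into a quick inventory of what a bundle contains. This is a sketch only; =list_packs= is a hypothetical helper name, and it assumes =find= supports =-mindepth= and =-maxdepth= (GNU and BSD =find= both do).

   #+BEGIN_SRC sh
     # Hypothetical helper: print every pack under ROOT as scope/name@version,
     # relying on the <scope>/<name>/<version>/ layout described above.
     list_packs() {
         ( cd "$1" &&
           find . -mindepth 3 -maxdepth 3 -type d |
               sed -e 's|^\./||' -e 's|/\([^/]*\)$|@\1|' |
               sort )
     }
   #+END_SRC

   For example, =list_packs ~/codeql-lab/tmp.bundle/codeql/qlpacks= should print lines such as =codeql/cpp-all@3.0.0=.
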
   In the shell:
   - Get information about, and the source code of, extensible predicates.
     #+BEGIN_SRC sh
       cd ~/codeql-lab/tmp.bundle/codeql

       ls qlpacks/codeql/
       # controlflow/      go-all/               javascript-queries/  ruby-examples/  threat-models/
       # cpp-all/          go-examples/          mad/                 ruby-queries/   tutorial/
       # cpp-examples/     go-queries/           python-all/          rust-all/       typeflow/
       # cpp-queries/      java-all/             python-examples/     rust-queries/   typetracking/
       # csharp-all/       java-examples/        python-queries/      ssa/            typos/
       # csharp-examples/  java-queries/         rangeanalysis/       suite-helpers/  util/
       # csharp-queries/   javascript-all/       regex/               swift-all/      xml/
       # dataflow/         javascript-examples/  ruby-all/            swift-queries/  yaml/

       # extensible predicates are listed:
       jq . < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less

       # Indent in HOV Box style
       hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less
       # {
       #   "extensible_predicate_metadata": {
       #     "extensible_predicates": [
       #       {
       #         "name": "sourceModel",
       #         "parameters": [
       #           {"name": "namespace","type": "string"},
       #           {"name": "type","type": "string"},
       #           {"name": "subtypes","type": "boolean"},
       #           {"name": "name","type": "string"},
       #           {"name": "signature","type": "string"},
       #           {"name": "ext","type": "string"},
       #           {"name": "output","type": "string"},
       #           {"name": "kind","type": "string"},
       #           {"name": "provenance","type": "string"}
       #         ],
       #         "path": "semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll",

       # Following this path, rooted at .../3.0.0, shows us the source of the predicates
       less qlpacks/codeql/cpp-all/3.0.0/semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll
     #+END_SRC

   - Examine extension data files (yaml)
     #+BEGIN_SRC sh
       tail -4 ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/qlpack.yml
       # dataExtensions:
       # - ext/*.model.yml
       # - ext/deallocation/*.model.yml
       # - ext/allocation/*.model.yml

       # Is the read() function from this line
       rg read ~/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c
       # 52:    count = read(STDIN_FILENO, buf, BUFSIZE - 1);
       # already present in the extension data?

       # It does not look like it; only boost::asio functions match
       rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/
       # ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
       # 7:      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
       # 8:      - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
       # 9:      - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
       # 10:      - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
       # 11:      - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
       # 12:      - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]

       # The broader search over all cpp* packs
       rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp* --type=yaml
       # (the same six matches in Boost.Asio.model.yml as above)
     #+END_SRC

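     Since =read()= is not covered by the extension data, one option is to add a data file of your own that matches the =ext/*.model.yml= glob. The sketch below writes such a file. The helper name =write_read_model= and the file name =posix-read.model.yml= are hypothetical, and the row is an assumption modelled on the Boost.Asio rows: empty namespace and type for a global C function, =Argument[*1]= for the buffer, and ="local"= as the kind, which you may need to change to suit your threat model.

     #+BEGIN_SRC sh
       # Hypothetical helper: write a models-as-data file for read() into EXT_DIR.
       # The row follows the nine sourceModel columns used in Boost.Asio.model.yml:
       # namespace, type, subtypes, name, signature, ext, output, kind, provenance.
       # Assumptions: empty namespace/type for a global C function; "local" as kind.
       write_read_model() {
           ext_dir=$1
           mkdir -p "$ext_dir" || return 1
           printf '%s\n' \
               'extensions:' \
               '  - addsTo:' \
               '      pack: codeql/cpp-all' \
               '      extensible: sourceModel' \
               '    data:' \
               '      - ["", "", False, "read", "", "", "Argument[*1]", "local", "manual"]' \
               > "$ext_dir/posix-read.model.yml"
       }
     #+END_SRC

     After =write_read_model ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext=, the =rg -i read= search above should also list the new file.
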
   - Note how the entries and their types line up between the JSON spec and the yaml data
     #+BEGIN_SRC sh
       cd ~/codeql-lab/tmp.bundle/codeql

       hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | head -16
       # {
       #   "extensible_predicate_metadata": {
       #     "extensible_predicates": [
       #       {
       #         "name": "sourceModel",
       #         "parameters": [
       #           {"name": "namespace","type": "string"},
       #           {"name": "type","type": "string"},
       #           {"name": "subtypes","type": "boolean"},
       #           {"name": "name","type": "string"},
       #           {"name": "signature","type": "string"},
       #           {"name": "ext","type": "string"},
       #           {"name": "output","type": "string"},
       #           {"name": "kind","type": "string"},
       #           {"name": "provenance","type": "string"}
       #         ],

       # The parameter names of sourceModel, in order
       jq '.extensible_predicate_metadata.extensible_predicates[]
           | select(.name == "sourceModel")
           | .parameters[]
           | .name ' < qlpacks/codeql/cpp-all/3.0.0/.packinfo
       # "namespace"
       # "type"
       # "subtypes"
       # "name"
       # "signature"
       # "ext"
       # "output"
       # "kind"
       # "provenance"

       rg -A 2 sourceModel qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
       # 5:      extensible: sourceModel
       # 6-    data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
       # 7-      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
     #+END_SRC
     In table form, with the declared type of each parameter, these are

     | parameter  | type    | value          |
     |------------+---------+----------------|
     | namespace  | string  | "boost::asio"  |
     | type       | string  | ""             |
     | subtypes   | boolean | False          |
     | name       | string  | "read"         |
     | signature  | string  | ""             |
     | ext        | string  | ""             |
     | output     | string  | "Argument[*1]" |
     | kind       | string  | "remote"       |
     | provenance | string  | "manual"       |

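     A row that has too few or too many columns no longer lines up with the parameter list, so a quick column count is a cheap sanity check. This is a rough sketch; =row_arity= is a hypothetical helper, and it assumes that no column value contains a comma, which holds for the rows shown here.

     #+BEGIN_SRC sh
       # Hypothetical helper: count the columns of each data row read from stdin.
       # Assumption: no column value contains a comma.
       row_arity() {
           awk -F',' '{ print NF }'
       }

       row='- ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]'
       echo "$row" | row_arity    # prints 9, matching the nine sourceModel parameters
     #+END_SRC
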
   - Check the =Customizations.qll= files, which are used to extend /existing/
     queries via /custom/ CodeQL. Note that there isn't one for C++, but it can be added.
     #+BEGIN_SRC sh
       cd ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql && find * -name Customizations.qll
       # csharp-all/4.0.0/Customizations.qll
       # csharp-examples/0.0.0/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
       # csharp-queries/1.0.13/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
       # go-all/3.0.0/Customizations.qll
       # go-examples/0.0.0/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
       # go-queries/1.1.4/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
       # java-all/5.0.0/Customizations.qll
       # java-examples/0.0.0/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
       # java-queries/1.1.10/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
       # javascript-all/2.2.0/Customizations.qll
       # javascript-examples/0.0.0/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
       # javascript-queries/1.2.5/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
       # python-all/3.0.0/Customizations.qll
       # python-examples/0.0.0/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
       # python-queries/1.3.4/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
       # ruby-all/3.0.0/Customizations.qll
       # ruby-examples/0.0.0/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
       # ruby-queries/1.1.8/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
     #+END_SRC

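     The inverse question, which library packs ship no =Customizations.qll=, can be answered with a short loop. This is a sketch under the assumption that language library packs are the =*-all= directories and keep the file at the top of their version directory, as in the listing above; =packs_without_customizations= is a hypothetical name.

     #+BEGIN_SRC sh
       # Hypothetical helper: list <lang>-all/<version> directories under ROOT
       # that have no Customizations.qll at their top level.
       packs_without_customizations() {
           root=$1
           for dir in "$root"/*-all/*/; do
               [ -d "$dir" ] || continue          # the glob matched nothing
               [ -f "${dir}Customizations.qll" ] && continue
               dir=${dir#"$root"/}                # make the path relative to ROOT
               echo "${dir%/}"                    # drop the trailing slash
           done
       }
     #+END_SRC

     Run as =packs_without_customizations ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql=, this should report =cpp-all/3.0.0= among its results, consistent with the note about C++ above.
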
3. Make customizations

   1. Choose a target pack to modify, e.g., =codeql/java-queries=. Inside the
      bundle, each pack has a version directory (=1.1.10= for this pack in the
      listing above).
      #+BEGIN_SRC sh
        cd qlpacks/codeql/java-queries/1.1.10
      #+END_SRC

   2. Decide on a customization approach; you can use one or both.
      1. Models-as-data. This means adding yaml files to provide data for
         extensible predicates.
      2. Extend existing classes. This means adding subclasses and predicates
         for base classes, to extend sink/source/flow/barrier definitions.

   3. Add or modify QL modules
      1. Create =Customizations.qll= in =src/=, or edit the existing file where
         the library pack already ships one (see the listing above)
      2. Import and extend existing modules/predicates

   4. Add or modify =.ql= files using your new predicates

   5. Optionally run tests for the modified pack:
      #+BEGIN_SRC sh
        codeql test run .
      #+END_SRC

4. Optionally add a new QL pack

   If your changes are substantial or logically separate:
   - Create a new directory, e.g., =qlpacks/myorg/custom-queries=
   - Add a =codeql-pack.yml= file:
     #+BEGIN_SRC yaml
       name: myorg/custom-queries
       version: 0.0.1
       dependencies:
         codeql/java-queries: "*"
     #+END_SRC
   - Add QL source files in =src/=
   - Reuse existing logic with QL =import= statements; these name QL modules
     provided by the declared dependencies, not the pack name itself
   - The CLI also recognizes =qlpack.yml= as the name of the pack metadata file
     (the released packs above use it); use one name or the other

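   The directory and metadata steps above can be scripted. This is a sketch only: =scaffold_pack= is a hypothetical helper, and the pack name, version, and dependency are the ones from the example above.

   #+BEGIN_SRC sh
     # Hypothetical helper: create the pack directory and metadata file described
     # above under QLPACKS_DIR, and print the new pack's path.
     scaffold_pack() {
         pack_dir=$1/myorg/custom-queries
         mkdir -p "$pack_dir/src" || return 1
         printf '%s\n' \
             'name: myorg/custom-queries' \
             'version: 0.0.1' \
             'dependencies:' \
             '  codeql/java-queries: "*"' \
             > "$pack_dir/codeql-pack.yml"
         echo "$pack_dir"
     }
   #+END_SRC

   For the tree unpacked earlier, the call would be =scaffold_pack ~/codeql-lab/tmp.bundle/codeql/qlpacks=.
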
5. Rebundle (optional)

   The unpacked tree can be used directly with =codeql database analyze=.
   But to distribute or version the result, repackage it:
   #+BEGIN_SRC sh
     # run from ~/codeql-lab; archives the unpacked top-level codeql/ directory
     tar -czf codeql-bundle-custom.tar.gz -C tmp.bundle codeql
   #+END_SRC

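   A repackaged archive should unpack the same way the released one did, with everything under a top-level =codeql/= directory. The sketch below archives the tree and checks that property; =rebundle= is a hypothetical helper name.

   #+BEGIN_SRC sh
     # Hypothetical helper: archive PARENT/codeql into OUT and verify the stored paths.
     rebundle() {
         parent=$1
         out=$2
         [ -d "$parent/codeql" ] || { echo "no codeql/ under $parent" >&2; return 1; }
         tar -czf "$out" -C "$parent" codeql || return 1
         # Every stored path should start with codeql, as in the tree unpacked earlier.
         if tar -tzf "$out" | grep -qv '^codeql'; then
             echo "archive contains paths outside codeql/" >&2
             return 1
         fi
         echo "wrote $out"
     }
   #+END_SRC

   From =~/codeql-lab=, the equivalent of the command above is =rebundle tmp.bundle codeql-bundle-custom.tar.gz=.
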
B_m is now a customized bundle containing your logic.

** =codeql database analyze= with B_m, get results R_m
Run the usual analysis command using your modified bundle B_m:
#+BEGIN_SRC sh
  ./codeql/codeql database analyze \
      <database> \
      --format=sarifv2.1.0 \
      --output=results.sarif \
      myorg/custom-queries
#+END_SRC

Adjust =--format= and the output path as needed. In place of
=myorg/custom-queries=, you may also specify individual =.ql= files or other packs.

** Review / process R_m
Review or consume results as normal:
- Use the CodeQL extension or SARIF tools to explore the output
- Export summaries or ingest into downstream pipelines
- Consider writing postprocessors for bulk result handling
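
As one example of a postprocessor, the sketch below counts results per rule in the SARIF output. It uses =jq=, as elsewhere in this module, and assumes that every result carries a =ruleId= field; =sarif_rule_counts= is a hypothetical name.

#+BEGIN_SRC sh
  # Hypothetical post-processor: count results per rule in a SARIF 2.1.0 file.
  # Prints "<ruleId><TAB><count>", most frequent rule first.
  sarif_rule_counts() {
      jq -r '.runs[].results[].ruleId' "$1" |
          sort | uniq -c | sort -rn |
          awk '{ print $2 "\t" $1 }'
  }
#+END_SRC

Calling =sarif_rule_counts results.sarif= on the file produced above gives a quick overview before opening the results in a viewer.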