CodeQL Bundling

Building a custom bundle is the most far-reaching way to customize CodeQL queries. The process is usually treated as a black box, which causes significant problems when multiple bundles must be unified or merged.

This module illustrates the steps and components involved in bundling. It assumes a solid understanding of Unix tools.

From a high-level deployment perspective, the typical flow is:

  • Obtain released bundle B_o
  • Modify it to create custom bundle B_m
  • Run codeql database analyze with B_m to produce results R_m
  • Review and post-process R_m

At this level, bundling appears trivial. The complexity arises within the intermediate steps — especially modifying or composing bundles.

A "black-box" bundler is available at:

  https://github.com/advanced-security/codeql-bundle.git

It is also included here as a submodule: ../extern/codeql-bundle/

The following sections examine each step in detail.

Get released bundle B_o

This is straightforward. Download a prebuilt CodeQL bundle:

  wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-linux64.tar.gz
  wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-osx64.tar.gz

These tarballs are also included in this repository via git lfs under ../assets.
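
When scripting the download, the tarball name can be chosen from the platform. The following is a minimal sketch; bundle_tarball_for is a hypothetical helper, and only the linux64 and osx64 names shown above are assumed to exist.

```shell
# Hypothetical helper: map a `uname -s` value to a bundle tarball name.
# Only the two tarballs named in this module are covered.
bundle_tarball_for() {
    case "$1" in
        Linux)  echo "codeql-bundle-linux64.tar.gz" ;;
        Darwin) echo "codeql-bundle-osx64.tar.gz" ;;
        *)      echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

base_url="https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0"

# Print the URL instead of downloading, so the sketch has no side effects.
if tarball="$(bundle_tarball_for "$(uname -s)")"; then
    echo "would download: $base_url/$tarball"
fi
```

Replace the echo with wget once the printed URL looks right.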

Modify bundle B_m

To construct a custom bundle B_m from the base bundle B_o:

  1. Unpack the bundle

    A CodeQL bundle is a self-contained tarball including the CLI and all standard query/library packs.

      # extract original
      cd ~/codeql-lab
      mkdir tmp.bundle
      tar -zxf assets/codeql-bundle-osx64.tar.gz -C tmp.bundle
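
Before extracting, it can help to confirm that the archive has a single top-level codeql/ directory. The sketch below builds a tiny stand-in tarball (an assumption, so it runs without the real bundle) and applies the same extraction command to it.

```shell
# Build a small stand-in tarball; the real bundle is far larger.
work="$(mktemp -d)"
mkdir -p "$work/src/codeql/qlpacks" "$work/src/codeql/tools"
touch "$work/src/codeql/codeql"
tar -czf "$work/bundle.tar.gz" -C "$work/src" codeql

# List the distinct top-level entries in the archive without extracting it.
top_level="$(tar -tzf "$work/bundle.tar.gz" | cut -d/ -f1 | sort -u)"
echo "top-level entries: $top_level"

# Extract into a fresh directory, as in the step above.
mkdir "$work/tmp.bundle"
tar -zxf "$work/bundle.tar.gz" -C "$work/tmp.bundle"
ls "$work/tmp.bundle/codeql"
```
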
  2. Understand the bundle layout

    Key entries, all under the top-level codeql/ directory created by the extraction:

    • codeql - the CLI executable
    • tools/ - helper tools, rarely needed directly
    • qlpacks/ - all QL packs: libraries and queries

    Each pack is a directory containing:

    • .packinfo - info about extensible predicates
    • qlpack.yml (or codeql-pack.yml) - pack metadata and dependencies, including the dataExtensions list of model files that supply data to extensible predicates
    • src/ — QL libraries and queries
    • test/ — optional regression test cases

    In the shell:

    • Get information about and source code of extensible predicates.

        # list the packs shipped in the bundle
        cd ~/codeql-lab/tmp.bundle/codeql
      
        ls qlpacks/codeql/
        # controlflow/         go-all/              javascript-queries/  ruby-examples/       threat-models/
        # cpp-all/             go-examples/         mad/                 ruby-queries/        tutorial/
        # cpp-examples/        go-queries/          python-all/          rust-all/            typeflow/
        # cpp-queries/         java-all/            python-examples/     rust-queries/        typetracking/
        # csharp-all/          java-examples/       python-queries/      ssa/                 typos/
        # csharp-examples/     java-queries/        rangeanalysis/       suite-helpers/       util/
        # csharp-queries/      javascript-all/      regex/               swift-all/           xml/
        # dataflow/            javascript-examples/ ruby-all/            swift-queries/       yaml/
      
        # extensible predicates are listed in .packinfo:
        jq . < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less
      
        # the same data, indented in HOV box style by the hovjson tool
        hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less
      
        # {
        #   "extensible_predicate_metadata": {
        #     "extensible_predicates": [
        #       {
        #         "name": "sourceModel",
        #         "parameters": [
        #           {"name": "namespace","type": "string"},
        #           {"name": "type","type": "string"},
        #           {"name": "subtypes","type": "boolean"},
        #           {"name": "name","type": "string"},
        #           {"name": "signature","type": "string"},
        #           {"name": "ext","type": "string"},
        #           {"name": "output","type": "string"},
        #           {"name": "kind","type": "string"},
        #           {"name": "provenance","type": "string"}
        #         ],
        #         "path": "semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll",
      
        # That path is relative to .../3.0.0; following it shows the source of the predicates
        less qlpacks/codeql/cpp-all/3.0.0/semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll
    • Examine extension data files (yaml)

        # data extension files are declared at the end of the pack's qlpack.yml
        tail -4 ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/qlpack.yml
        # dataExtensions:
        #  - ext/*.model.yml
        #  - ext/deallocation/*.model.yml
        #  - ext/allocation/*.model.yml
      
        # add-user.c calls read():
        rg read ~/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c
        # 52:    count = read(STDIN_FILENO, buf, BUFSIZE - 1);
      
        # Is that read() modeled in the extension data? It does not look like it;
        # only boost::asio functions match:
        rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/
        # ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
        # 7:      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
        # 8:      - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
        # 9:      - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
        # 10:      - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
        # 11:      - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
        # 12:      - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]
      
        # A broader search across all cpp packs gives the same result:
        rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp* --type=yaml
        # ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
        # 7:      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
        # 8:      - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
        # 9:      - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
        # 10:      - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
        # 11:      - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
        # 12:      - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]
    • Note how the parameter order and types in the JSON spec line up with the columns of the YAML data

        cd ~/codeql-lab/tmp.bundle/codeql
      
        hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | head -16
        # {
        #   "extensible_predicate_metadata": {
        #     "extensible_predicates": [
        #       {
        #         "name": "sourceModel",
        #         "parameters": [
        #           {"name": "namespace","type": "string"},
        #           {"name": "type","type": "string"},
        #           {"name": "subtypes","type": "boolean"},
        #           {"name": "name","type": "string"},
        #           {"name": "signature","type": "string"},
        #           {"name": "ext","type": "string"},
        #           {"name": "output","type": "string"},
        #           {"name": "kind","type": "string"},
        #           {"name": "provenance","type": "string"}
        #         ],
      
        # In table form, for sourceModel
        jq '.extensible_predicate_metadata.extensible_predicates[]
            | select(.name == "sourceModel")
            | .parameters[]
            | .name ' < qlpacks/codeql/cpp-all/3.0.0/.packinfo
        # "namespace"
        # "type"
        # "subtypes"
        # "name"
        # "signature"
        # "ext"
        # "output"
        # "kind"
        # "provenance"
      
        rg -A 2 sourceModel qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
        # 5:      extensible: sourceModel
        # 6-    data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
        # 7-      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]

      In table form, these are:

      "namespace" "boost::asio"
      "type" ""
      "subtypes" False
      "name" "read"
      "signature" ""
      "ext" ""
      "output" "Argument[*1]"
      "kind" "remote"
      "provenance" "manual"
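
The pairing above can be reproduced mechanically. This is a minimal sketch using only paste; the names and values are copied by hand from the listings above rather than parsed from the pack, which is an assumption made to keep it self-contained.

```shell
work="$(mktemp -d)"

# Parameter names, in the order listed in .packinfo for sourceModel.
printf '%s\n' namespace type subtypes name signature ext output kind provenance \
    > "$work/names"

# Fields of the first Boost.Asio row, one per line.
printf '%s\n' '"boost::asio"' '""' 'False' '"read"' '""' '""' \
    '"Argument[*1]"' '"remote"' '"manual"' > "$work/values"

# A row is only well-formed if it has one value per parameter.
n_names="$(wc -l < "$work/names" | tr -d ' ')"
n_values="$(wc -l < "$work/values" | tr -d ' ')"
[ "$n_names" -eq "$n_values" ] || echo "arity mismatch" >&2

# paste joins line i of each file with a tab.
table="$(paste "$work/names" "$work/values")"
echo "$table"
```
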
    • Check the Customizations.qll files, which allow extending existing queries with custom CodeQL. Note that there is none for C++, but one can be added.

        cd  ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql && find * -name Customizations.qll
        # csharp-all/4.0.0/Customizations.qll
        # csharp-examples/0.0.0/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
        # csharp-queries/1.0.13/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
        # go-all/3.0.0/Customizations.qll
        # go-examples/0.0.0/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
        # go-queries/1.1.4/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
        # java-all/5.0.0/Customizations.qll
        # java-examples/0.0.0/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
        # java-queries/1.1.10/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
        # javascript-all/2.2.0/Customizations.qll
        # javascript-examples/0.0.0/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
        # javascript-queries/1.2.5/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
        # python-all/3.0.0/Customizations.qll
        # python-examples/0.0.0/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
        # python-queries/1.3.4/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
        # ruby-all/3.0.0/Customizations.qll
        # ruby-examples/0.0.0/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
        # ruby-queries/1.1.8/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
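
To see which library packs lack a Customizations.qll, the find can be inverted. The sketch below runs against a two-pack stand-in tree (an assumption, so it works without the bundle); pointing the loop at the real qlpacks/codeql directory should work the same way.

```shell
# Stand-in tree: one pack with Customizations.qll, one without.
work="$(mktemp -d)"
mkdir -p "$work/qlpacks/codeql/java-all/5.0.0" "$work/qlpacks/codeql/cpp-all/3.0.0"
touch "$work/qlpacks/codeql/java-all/5.0.0/Customizations.qll"

missing=""
for pack in "$work"/qlpacks/codeql/*-all; do
    # find prints nothing when the pack has no Customizations.qll
    if [ -z "$(find "$pack" -name Customizations.qll)" ]; then
        missing="$missing $(basename "$pack")"
    fi
done
echo "packs without Customizations.qll:$missing"
```
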
  3. Make customizations

    1. Choose a target pack to modify, e.g., codeql/java-queries

        cd qlpacks/codeql/java-queries
    2. Decide on customization approach; you can use one or both.

      1. models-as-data. This means adding yaml files to provide data for extensible predicates.
      2. Extend existing classes. This means adding subclasses and predicates for base classes, to extend sink/source/flow/barrier definitions.
    3. Add or modify QL modules

      1. Create Customizations.qll in src/
      2. Import and extend existing modules/predicates
    4. Add or modify .ql files using your new predicates
    5. Optionally run tests for the modified pack:

        codeql test run .
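
For the models-as-data approach, a new model file can follow the shape of the Boost.Asio rows seen in step 2. The sketch below writes a hypothetical posix-read.model.yml for the read() call in add-user.c. The file name, the empty namespace, and the reuse of the "remote" kind are assumptions, and the file is written to a temporary stand-in for cpp-all's ext/ directory rather than into the bundle.

```shell
# Stand-in for qlpacks/codeql/cpp-all/3.0.0 so the sketch runs anywhere.
pack_dir="$(mktemp -d)"
mkdir -p "$pack_dir/ext"

# Hypothetical model file; the row mirrors the Boost.Asio "read" entry,
# with an empty namespace for the C library function.
cat > "$pack_dir/ext/posix-read.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/cpp-all
      extensible: sourceModel
    data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
      - ["", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
EOF

# Sanity check: count the commas in the data row; sourceModel has nine
# parameters, so a well-formed row has eight commas.
commas="$(grep -- '- \[' "$pack_dir/ext/posix-read.model.yml" | tr -cd ',' | wc -c | tr -d ' ')"
echo "fields in row: $((commas + 1))"
```

The name matches the ext/*.model.yml pattern from the pack's dataExtensions list, so a file like this placed in the real ext/ directory would be covered by that pattern.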
  4. Optionally add a new QL pack

    If your changes are substantial or logically separate:

    • Create a new directory, e.g., qlpacks/myorg/custom-queries
    • Add a codeql-pack.yml file:

        name: myorg/custom-queries
        version: 0.0.1
        dependencies:
            codeql/java-queries: "*"
    • Add QL source files in src/
    • Reuse existing logic through QL import statements, which name modules (for example, import java) rather than packs
    • qlpack.yml is the older name for codeql-pack.yml; a pack needs only one of the two
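
The pack skeleton described above can be created in a few commands. This is a sketch under the assumption that the pack lives in a scratch directory; the metadata is the same as in the example above.

```shell
# Scratch root standing in for the unpacked bundle's codeql/ directory.
root="$(mktemp -d)"
pack="$root/qlpacks/myorg/custom-queries"
mkdir -p "$pack/src"

# Pack metadata, as in the example above.
cat > "$pack/codeql-pack.yml" <<'EOF'
name: myorg/custom-queries
version: 0.0.1
dependencies:
    codeql/java-queries: "*"
EOF

# Show the resulting layout relative to the scratch root.
(cd "$root" && find qlpacks | sort)
```
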
  5. Rebundle (optional)

    The unpacked tree can be used directly with codeql database analyze. But to distribute or version the result, repackage it:

      # run from ~/codeql-lab, where step 1 unpacked the bundle into tmp.bundle
      tar -czf codeql-bundle-custom.tar.gz -C tmp.bundle codeql
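
After repackaging, the archive contents can be compared with the tree they came from. The sketch below does this on a small stand-in tree, assuming the tmp.bundle/codeql layout from step 1; the file names inside are placeholders.

```shell
# Stand-in for the customized tree under tmp.bundle/codeql.
work="$(mktemp -d)"
mkdir -p "$work/tmp.bundle/codeql/qlpacks/myorg/custom-queries"
touch "$work/tmp.bundle/codeql/codeql"

# Repackage so that codeql/ is the single top-level directory.
tar -czf "$work/codeql-bundle-custom.tar.gz" -C "$work/tmp.bundle" codeql

# Compare the path list on disk with the path list in the archive.
# tar prints directories with a trailing slash, which sed strips.
in_tree="$(cd "$work/tmp.bundle" && find codeql | sort)"
in_tar="$(tar -tzf "$work/codeql-bundle-custom.tar.gz" | sed 's,/$,,' | sort)"

if [ "$in_tree" = "$in_tar" ]; then
    echo "archive matches tree"
else
    echo "archive and tree differ" >&2
fi
```
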

B_m is now a customized bundle containing your logic.

codeql database analyze with B_m, get results R_m

Run the usual analysis command using your modified bundle B_m:

  ./codeql/codeql database analyze \
                  <database> \
                  --format=sarifv2.1.0 \
                  --output=results.sarif \
                  myorg/custom-queries

Adjust --format and output path as needed. You may also specify individual .ql files or packs.
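
Because packs and individual .ql files are simply listed after the database, the command line can be assembled by a small helper and reviewed before it is run. analyze_cmd below is a hypothetical helper that only prints the command; the database name and the .ql path are placeholders.

```shell
# Hypothetical helper: print the analyze command line without executing it.
# $1 = database, $2 = output file, remaining arguments = packs or .ql files.
analyze_cmd() {
    db="$1"; out="$2"; shift 2
    printf './codeql/codeql database analyze %s --format=sarifv2.1.0 --output=%s' "$db" "$out"
    # Append each pack or query file in the order given.
    for target in "$@"; do
        printf ' %s' "$target"
    done
    printf '\n'
}

analyze_cmd my-db results.sarif myorg/custom-queries
analyze_cmd my-db results.sarif myorg/custom-queries path/to/Extra.ql
```
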

Review / process R_m

Review or consume results as normal:

  • Use the CodeQL extension or SARIF tools to explore the output
  • Export summaries or ingest into downstream pipelines
  • Consider writing postprocessors for bulk result handling
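
As a starting point for a postprocessor, results can be counted per rule with standard text tools. The sketch below is deliberately crude: it greps for ruleId fields in a hand-written sample file (not real analysis output) and does not parse JSON, so a JSON-aware tool such as jq would be more robust on real SARIF.

```shell
work="$(mktemp -d)"

# Hand-written sample with three results; not real analysis output.
cat > "$work/results.sarif" <<'EOF'
{"version": "2.1.0", "runs": [{"results": [
  {"ruleId": "java/sql-injection", "message": {"text": "sample a"}},
  {"ruleId": "java/sql-injection", "message": {"text": "sample b"}},
  {"ruleId": "java/xss", "message": {"text": "sample c"}}
]}]}
EOF

# Extract every ruleId value, one per line.
rule_ids="$(grep -o '"ruleId": *"[^"]*"' "$work/results.sarif" \
    | sed 's/.*: *"\(.*\)"/\1/')"

# Per-rule counts, then the overall total.
echo "$rule_ids" | sort | uniq -c
total="$(echo "$rule_ids" | wc -l | tr -d ' ')"
echo "total results: $total"
```
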