* CodeQL Bundling
  The ultimate form of customizing CodeQL queries is building custom bundles. This
  process is typically treated as a black box. That approach introduces significant
  problems, especially when multiple bundles must be unified or merged.

  The purpose of this module is to illustrate the steps and components involved in
  bundling, assuming a solid understanding of Unix tools.

  From a high-level deployment perspective, the typical flow is:
  - Obtain released bundle B_o
  - Modify it to create custom bundle B_m
  - Run =codeql database analyze= with B_m to produce results R_m
  - Review and post-process R_m

  At this level, bundling appears trivial. The complexity arises within the
  intermediate steps, especially modifying or composing bundles.

  A "black-box" bundler is available at:
  https://github.com/advanced-security/codeql-bundle.git

  It is also included here as a submodule: [[../extern/codeql-bundle/]]

  The following sections examine each step in detail.

** Get released bundle B_o
   This is straightforward. Download a prebuilt CodeQL bundle:
   #+BEGIN_SRC sh
     wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-linux64.tar.gz
     wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.0/codeql-bundle-osx64.tar.gz
   #+END_SRC

   These tarballs are also included in this repository via =git lfs= under [[../assets]].

** Modify bundle B_m
   To construct a custom bundle B_m from the base bundle B_o:

   1. Unpack the bundle

      A CodeQL bundle is a self-contained tarball including the CLI and all
      standard query/library packs.
      #+BEGIN_SRC sh
        # extract original
        cd ~/codeql-lab
        mkdir tmp.bundle
        tar -zxf assets/codeql-bundle-osx64.tar.gz -C tmp.bundle
      #+END_SRC

   2.
      Understand the bundle layout

      Key directories:
      - =codeql/=: the bundle root, holding the CLI executable (=codeql/codeql=)
      - =tools/=: helper tools, rarely needed directly
      - =qlpacks/=: all QL packs, both libraries and queries

      Each pack is a directory containing:
      - =.packinfo=: information about extensible predicates
      - =qlpack.yml= (or =codeql-pack.yml=): pack metadata and dependencies,
        including modelling extensions (which are used by extensible predicates)
      - =src/=: QL libraries and queries
      - =test/=: optional regression test cases

      In the shell:
      - Get information about and source code of extensible predicates.
        #+BEGIN_SRC sh
          cd ~/codeql-lab/tmp.bundle/codeql
          ls qlpacks/codeql/
          # controlflow/      go-all/               javascript-queries/  ruby-examples/  threat-models/
          # cpp-all/          go-examples/          mad/                 ruby-queries/   tutorial/
          # cpp-examples/     go-queries/           python-all/          rust-all/       typeflow/
          # cpp-queries/      java-all/             python-examples/     rust-queries/   typetracking/
          # csharp-all/       java-examples/        python-queries/      ssa/            typos/
          # csharp-examples/  java-queries/         rangeanalysis/       suite-helpers/  util/
          # csharp-queries/   javascript-all/       regex/               swift-all/      xml/
          # dataflow/         javascript-examples/  ruby-all/            swift-queries/  yaml/

          # extensible predicates are listed:
          jq . < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less

          # Indent in HOV Box style
          hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | less
          # {
          #   "extensible_predicate_metadata": {
          #     "extensible_predicates": [
          #       {
          #         "name": "sourceModel",
          #         "parameters": [
          #           {"name": "namespace","type": "string"},
          #           {"name": "type","type": "string"},
          #           {"name": "subtypes","type": "boolean"},
          #           {"name": "name","type": "string"},
          #           {"name": "signature","type": "string"},
          #           {"name": "ext","type": "string"},
          #           {"name": "output","type": "string"},
          #           {"name": "kind","type": "string"},
          #           {"name": "provenance","type": "string"}
          #         ],
          #         "path": "semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll",

          # Following this path, rooted at .../3.0.0, shows us the source of the predicates
          less \
            qlpacks/codeql/cpp-all/3.0.0/semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll
        #+END_SRC

      - Examine extension data files (yaml)
        #+BEGIN_SRC sh
          tail -4 ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/qlpack.yml
          # dataExtensions:
          #   - ext/*.model.yml
          #   - ext/deallocation/*.model.yml
          #   - ext/allocation/*.model.yml

          # Is the read() function from the line
          rg read ~/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c
          # 52:    count = read(STDIN_FILENO, buf, BUFSIZE - 1);
          # present?
          #
          # Does not look like it
          rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/
          # ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
          # 7:      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
          # 8:      - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
          # 9:      - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
          # 10:      - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
          # 11:      - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
          # 12:      - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]

          # or the broader search
          rg -i read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp* --type=yaml
          # ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
          # 7:      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
          # 8:      - ["boost::asio", "", False, "read_at", "", "", "Argument[*2]", "remote", "manual"]
          # 9:      - ["boost::asio", "", False, "read_until", "", "", "Argument[*1]", "remote", "manual"]
          # 10:      - ["boost::asio", "", False, "async_read", "", "", "Argument[*1]", "remote", "manual"]
          # 11:      - ["boost::asio", "", False, "async_read_at", "", "", "Argument[*2]", "remote", "manual"]
          # 12:      - ["boost::asio", "", False, "async_read_until", "", "", "Argument[*1]", "remote", "manual"]
        #+END_SRC
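
        The searches above match substrings, so =read= also hits =read_at= and
        =async_read=. As a rough sketch, a narrower check can look for the name as a
        complete quoted field. The helper name =has_model= is hypothetical, and it
        uses plain =grep= instead of =rg= only to avoid an extra dependency. It can
        still match other columns that happen to hold the same string.
        #+BEGIN_SRC sh
          # has_model NAME DIR
          # Print every line in *.model.yml files under DIR that contains the
          # quoted field "NAME" exactly, e.g. "read" but not "read_at".
          # Exit status is grep's: 0 if at least one line matched, 1 otherwise.
          has_model() {
            grep -rnF --include='*.model.yml' "\"$1\"" "$2"
          }

          # Example (paths as used in this module):
          #   has_model read ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql/cpp-all/3.0.0/ext/
        #+END_SRC

        Any row it prints still has to be read by hand: here the only exact match is
        the =boost::asio= entry, not the POSIX =read()= called in =add-user.c=.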
      - Note how the entries line up, in order and type, between the JSON spec and the YAML data
        #+BEGIN_SRC sh
          cd ~/codeql-lab/tmp.bundle/codeql
          hovjson < qlpacks/codeql/cpp-all/3.0.0/.packinfo | head -16
          # {
          #   "extensible_predicate_metadata": {
          #     "extensible_predicates": [
          #       {
          #         "name": "sourceModel",
          #         "parameters": [
          #           {"name": "namespace","type": "string"},
          #           {"name": "type","type": "string"},
          #           {"name": "subtypes","type": "boolean"},
          #           {"name": "name","type": "string"},
          #           {"name": "signature","type": "string"},
          #           {"name": "ext","type": "string"},
          #           {"name": "output","type": "string"},
          #           {"name": "kind","type": "string"},
          #           {"name": "provenance","type": "string"}
          #         ],

          # The parameter names alone, for sourceModel
          jq '.extensible_predicate_metadata.extensible_predicates[]
              | select(.name == "sourceModel")
              | .parameters[]
              | .name' < qlpacks/codeql/cpp-all/3.0.0/.packinfo
          # "namespace"
          # "type"
          # "subtypes"
          # "name"
          # "signature"
          # "ext"
          # "output"
          # "kind"
          # "provenance"

          rg -A 2 sourceModel qlpacks/codeql/cpp-all/3.0.0/ext/Boost.Asio.model.yml
          # 5:      extensible: sourceModel
          # 6-    data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
          # 7-      - ["boost::asio", "", False, "read", "", "", "Argument[*1]", "remote", "manual"]
        #+END_SRC

        In table form, these are
        | "namespace"  | "boost::asio"  |
        | "type"       | ""             |
        | "subtypes"   | False          |
        | "name"       | "read"         |
        | "signature"  | ""             |
        | "ext"        | ""             |
        | "output"     | "Argument[*1]" |
        | "kind"       | "remote"       |
        | "provenance" | "manual"       |

        # {"name": "namespace","type": "string"},
        # {"name": "type","type": "string"},
        # {"name": "subtypes","type": "boolean"},
        # {"name": "name","type": "string"},
        # {"name": "signature","type": "string"},
        # {"name": "ext","type": "string"},
        # {"name": "output","type": "string"},
        # {"name": "kind","type": "string"},
        # {"name": "provenance","type": "string"}

      - Check the =Customizations.qll= files, which extend /existing/ queries via
        /custom/ CodeQL. Note that there isn't one for C++, but it can be added.
        #+BEGIN_SRC sh
          cd ~/codeql-lab/tmp.bundle/codeql/qlpacks/codeql && find * -name Customizations.qll
          # csharp-all/4.0.0/Customizations.qll
          # csharp-examples/0.0.0/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
          # csharp-queries/1.0.13/.codeql/libraries/codeql/csharp-all/4.0.0/Customizations.qll
          # go-all/3.0.0/Customizations.qll
          # go-examples/0.0.0/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
          # go-queries/1.1.4/.codeql/libraries/codeql/go-all/3.0.0/Customizations.qll
          # java-all/5.0.0/Customizations.qll
          # java-examples/0.0.0/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
          # java-queries/1.1.10/.codeql/libraries/codeql/java-all/5.0.0/Customizations.qll
          # javascript-all/2.2.0/Customizations.qll
          # javascript-examples/0.0.0/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
          # javascript-queries/1.2.5/.codeql/libraries/codeql/javascript-all/2.2.0/Customizations.qll
          # python-all/3.0.0/Customizations.qll
          # python-examples/0.0.0/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
          # python-queries/1.3.4/.codeql/libraries/codeql/python-all/3.0.0/Customizations.qll
          # ruby-all/3.0.0/Customizations.qll
          # ruby-examples/0.0.0/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
          # ruby-queries/1.1.8/.codeql/libraries/codeql/ruby-all/3.0.0/Customizations.qll
        #+END_SRC

   3. Make customizations
      1. Choose a target pack to modify, e.g., =codeql/java-queries=
         #+BEGIN_SRC sh
           cd qlpacks/codeql/java-queries
         #+END_SRC
      2. Decide on customization approach; you can use one or both.
         1. models-as-data. This means adding yaml files to provide data for
            extensible predicates.
         2. Extend existing classes. This means adding subclasses and predicates
            for base classes, to extend sink/source/flow/barrier definitions.
      3. Add or modify QL modules
         1. Create =Customizations.qll= in =src/=
         2. Import and extend existing modules/predicates
      4. Add or modify =.ql= files using your new predicates
      5.
         Optionally run tests for the modified pack:
         #+BEGIN_SRC sh
           codeql test run .
         #+END_SRC

   4. Optionally add a new QL pack

      If your changes are substantial or logically separate:
      - Create a new directory, e.g., =qlpacks/myorg/custom-queries=
      - Add a =codeql-pack.yml= file:
        #+BEGIN_SRC yaml
          name: myorg/custom-queries
          version: 0.0.1
          dependencies:
            codeql/java-queries: "*"
        #+END_SRC
      - Add QL source files in =src/=
      - Reuse existing logic with QL =import= statements. These name modules inside
        the packs listed under =dependencies=, not the packs themselves.
      - The pack file may also be named =qlpack.yml=; =codeql pack install= and
        separate testing need one of the two to be present.

   5. Rebundle (optional)

      The unpacked tree can be used directly with =codeql database analyze=.
      To distribute or version the result, repackage it. The bundle was unpacked
      into =tmp.bundle/=, so archive the =codeql/= directory from there:
      #+BEGIN_SRC sh
        cd ~/codeql-lab
        tar -czf codeql-bundle-custom.tar.gz -C tmp.bundle codeql
      #+END_SRC

      *XX: REFERENCE*

   B_m is now a customized bundle containing your logic.

** =codeql database analyze= with B_m, get results R_m
   Run the usual analysis command using your modified bundle B_m:
   #+BEGIN_SRC sh
     # path/to/database is a placeholder for your CodeQL database directory
     ./codeql/codeql database analyze \
       path/to/database \
       --format=sarifv2.1.0 \
       --output=results.sarif \
       myorg/custom-queries
   #+END_SRC

   Adjust =--format= and the output path as needed. You may also specify
   individual =.ql= files or packs.

** Review / process R_m
   Review or consume results as normal:
   - Use the CodeQL extension or SARIF tools to explore the output
   - Export summaries or ingest into downstream pipelines
   - Consider writing postprocessors for bulk result handling
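
   As one possible starting point for such a postprocessor, the sketch below counts
   results per rule in a SARIF 2.1.0 file using =jq=. The function name
   =sarif_rule_counts= is hypothetical. It assumes each result carries a top-level
   =ruleId= field; results without one are grouped under =null=.
   #+BEGIN_SRC sh
     # sarif_rule_counts FILE
     # Print "COUNT<TAB>RULE_ID" for every rule in FILE, most frequent first.
     sarif_rule_counts() {
       jq -r '
         [.runs[].results[].ruleId]           # collect the rule id of every result
         | group_by(.)                        # one group per distinct rule id
         | map({rule: .[0], n: length})       # rule id plus number of results
         | sort_by(-.n)                       # highest count first
         | .[]
         | "\(.n)\t\(.rule)"
       ' "$1"
     }

     # Example, using the output file from the analyze step:
     #   sarif_rule_counts results.sarif
   #+END_SRC

   The tab-separated output is meant to feed further Unix tooling such as =head=,
   =cut=, or =awk=.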