diff --git a/doc/readme.md b/doc/readme.md
new file mode 100644
index 0000000..2125a91
--- /dev/null
+++ b/doc/readme.md
@@ -0,0 +1,1066 @@
+
+# Table of Contents
+
+1. [End-to-end demo of CodeQL command line usage](#orgbec345e)
+ 1. [Run analyses](#org120a28d)
+ 1. [Get collection of databases (already handy)](#org10f5d2f)
+ 1. [Get https://github.com/hohn/codeql-workshop-vulnerable-linux-driver](#org3098062)
+ 2. [Quick check using VS Code](#orga84eb1e)
+ 3. [Install codeql](#org6e8bf77)
+ 4. [Install pack dependencies](#orgefc5f79)
+ 2. [Run queries](#org38093b3)
+ 1. [Individual: 1 database -> N sarif files](#org5dc500d)
+ 2. [Use directory of queries: 1 database -> 1 sarif file (least effort)](#org696d4ba)
+ 3. [Use suite: 1 database -> 1 sarif file (more flexible, more effort)](#org5f683b3)
+ 3. [The importance of versioning](#org3172fe1)
+ 1. [CodeQL cli version](#orgc022fc5)
+ 2. [Database version](#org1b0ed6d)
+ 3. [Query set version](#org1a6cfa6)
+ 2. [Review results](#orgebc1392)
+ 1. [SARIF Documentation](#orgd483425)
+ 2. [SARIF viewer plugin](#org1dd7344)
+ 1. [Install plugin in VS Code](#org41d6b5c)
+ 2. [Review](#orgc0d3ad4)
+ 3. [View raw sarif with `jq`](#orgeb5a147)
+ 4. [View raw sarif with `jq` and fzf](#orga406a9a)
+ 5. [sarif-cli](#org08832bc)
+ 1. [Setup / local install](#org49abefb)
+ 2. [Compiler-style textual output from SARIF](#org4e881d5)
+ 3. [SQL conversion – not compatible with codeql v2.13.4](#org3c1536b)
+ 3. [Running sequence](#orgcc12fc2)
+ 1. [Smallest query suite to largest](#org267aec6)
+ 2. [Working with results based on counts](#org9a3230f)
+ 4. [Comparing analysis results across sarif files](#org60393a3)
+ 5. [Miscellany](#orgbec3025)
+
+
+
+
+
+# End-to-end demo of CodeQL command line usage
+
+
+
+
+## Run analyses
+
+
+
+
+### Get collection of databases (already handy)
+
+
+
+
+#### Get
+
+ cd ~/local
+ git clone git@github.com:hohn/codeql-workshop-vulnerable-linux-driver.git
+ cd codeql-workshop-vulnerable-linux-driver/
+ unzip vulnerable-linux-driver.zip
+ tree -L 2 vulnerable-linux-driver-db/
+ vulnerable-linux-driver-db/
+ ├── codeql-database.yml
+ ├── db-cpp
+ │ ├── default
+ │ ├── semmlecode.cpp.dbscheme
+ │ └── semmlecode.cpp.dbscheme.stats
+ └── src.zip
+
+ 3 directories, 4 files
+
+
+
+
+#### Quick check using VS Code
+
+The same steps will repeat for the cli.
+
+- select DB
+- select query
+- run query
+- view results
+
+
+
+
+#### Install codeql
+
+1. Full docs
+
+ -
+ -
+
+2. In short:
+
+ cd ~/local/codeql-cli-end-to-end
+ # Decide on version / os via browser, then:
+ wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.13.4/codeql-bundle-osx64.tar.gz
+
+ # Fix attributes on mac
+ if [ `uname` = Darwin ] ; then
+ xattr -c *.tar.gz
+ fi
+
+ # Extract
+ tar zxf ./codeql-bundle-osx64.tar.gz
+
+ # Check binary
+ pwd
+ # /Users/hohn/local/codeql-cli-end-to-end
+ ./codeql/codeql --version
+ # CodeQL command-line toolchain release 2.13.4.
+ # Copyright (C) 2019-2023 GitHub, Inc.
+ # Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
+ # Analysis results depend critically on separately distributed query and
+ # extractor modules. To list modules that are visible to the toolchain,
+ # use 'codeql resolve qlpacks' and 'codeql resolve languages'.
+
+ # Check packs
+ 0:$ ./codeql/codeql resolve qlpacks |head -5
+ # codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3)
+ # codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0)
+ # codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3)
+ # codeql/csharp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-all/0.6.3)
+ # codeql/csharp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-examples/0.0.0)
+
+ # Fix the path
+ export PATH=$(pwd -P)/codeql:"$PATH"
+
+ # Check languages
+ codeql resolve languages | head -5
+ # go (/Users/hohn/local/codeql-cli-end-to-end/codeql/go)
+ # python (/Users/hohn/local/codeql-cli-end-to-end/codeql/python)
+ # java (/Users/hohn/local/codeql-cli-end-to-end/codeql/java)
+ # html (/Users/hohn/local/codeql-cli-end-to-end/codeql/html)
+ # xml (/Users/hohn/local/codeql-cli-end-to-end/codeql/xml)
+
+3. A more fancy version
+
+ # Reference urls:
+ # https://github.com/github/codeql-cli-binaries/releases/download/v2.8.0/codeql-linux64.zip
+ # https://github.com/github/codeql/archive/refs/tags/codeql-cli/v2.8.0.zip
+ #
+ # grab -- retrieve and extract codeql cli and library
+ # Usage: grab version url prefix
+ grab() {
+ version=$1; shift
+ platform=$1; shift
+ prefix=$1; shift
+ mkdir -p $prefix/codeql-$version &&
+ cd $prefix/codeql-$version || return
+
+ # Get cli
+ wget "https://github.com/github/codeql-cli-binaries/releases/download/$version/codeql-$platform.zip"
+ # Get lib
+ wget "https://github.com/github/codeql/archive/refs/tags/codeql-cli/$version.zip"
+ # Fix attributes
+ if [ `uname` = Darwin ] ; then
+ xattr -c *.zip
+ fi
+ # Extract
+ unzip -q codeql-$platform.zip
+ unzip -q $version.zip
+ # Rename library directory for VS Code
+ mv codeql-codeql-cli-$version/ ql
+ # remove archives?
+ # rm codeql-$platform.zip
+ # rm $version.zip
+ }
+
+ grab v2.7.6 osx64 $HOME/local
+ grab v2.8.3 osx64 $HOME/local
+ grab v2.8.4 osx64 $HOME/local
+
+ grab v2.6.3 linux64 /opt
+
+ grab v2.6.3 osx64 $HOME/local
+ grab v2.4.6 osx64 $HOME/local
+
+4. Most flexible in use, but more initial setup
+
+ `gh`, the GitHub command-line tool from
+
+ - gh api repos/{owner}/{repo}/releases
+
+ - gh extension create
+
+ - gh codeql extension
+
+ - gh gist list
+
+
+ 0:$ gh codeql
+ GitHub command-line wrapper for the CodeQL CLI.
+
+
+
+
+#### Install pack dependencies
+
+1. Full docs
+
+ -
+ -
+
+2. View installed docs via `-h` flag, highly recommended
+
+ # Overview
+ codeql -h
+
+ # Sub 1
+ codeql pack -h
+
+ # Sub 2
+ codeql pack install -h
+
+3. In short
+
+ 1. Create the qlpack
+
+ Create the qlpack files if not there, one per directory. In this project,
+ that's already done:
+
+ 0:$ find codeql-workshop-vulnerable-linux-driver -name "qlpack.yml"
+ codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
+ codeql-workshop-vulnerable-linux-driver/solutions/qlpack.yml
+ codeql-workshop-vulnerable-linux-driver/common/qlpack.yml
+
+ For example:
+
+ cat codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
+
+ shows
+
+ ---
+ library: false
+ name: queries
+ version: 0.0.1
+ dependencies:
+ codeql/cpp-all: ^0.7.0
+ common: "*"
+
+ So the queries directory does not contain a library, but it depends on one,
+
+ cat codeql-workshop-vulnerable-linux-driver/common/qlpack.yml
+
+ ---
+ library: true
+ name: common
+ version: 0.0.1
+ dependencies:
+ codeql/cpp-all: 0.7.0
+
+ 2. Install each pack's dependencies
+
+ The first time you install dependencies, it's a good idea to do this
+ menually, per `qlpack.yml` file, and deal with any errors that may occur.
+
+ pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ codeql pack install --no-strict-mode queries/
+
+ After the initial setup and for automation, install each pack's
+ dependencies via a loop using `codeql pack install`
+
+ pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ find . -name "qlpack.yml"
+ # ./queries/qlpack.yml
+ # ./solutions/qlpack.yml
+ # ./common/qlpack.yml
+
+ codeql pack install --no-strict-mode queries/
+ # Dependencies resolved. Installing packages...
+ # Install location: /Users/hohn/.codeql/packages
+ # Nothing to install.
+ # Package install location: /Users/hohn/.codeql/packages
+ # Nothing downloaded.
+
+ for sub in `find . -name "qlpack.yml" | sed s@qlpack.yml@@g;`
+ do
+ codeql pack install --no-strict-mode $sub
+ done
+
+
+
+
+### Run queries
+
+
+
+
+#### Individual: 1 database -> N sarif files
+
+ #* Set environment
+ PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ DB=$PROJ/vulnerable-linux-driver-db
+ QLQUERY=$PROJ/solutions/BufferOverflow.ql
+ QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-BufferOverflow.sarif
+
+ #* Run query
+ pushd $PROJ
+ codeql database analyze --format=sarif-latest --rerun \
+ --output $QUERY_RES_SARIF \
+ -j6 \
+ --ram=24000 \
+ -- \
+ $DB \
+ $QLQUERY
+
+ # if you get
+ # fatal error occurred: Error initializing the IMB disk cache: the cache
+ # directory is already locked by another running process. Only one instance of
+ # the IMB can access a cache directory at a time. The lock file is located at
+ # /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable-linux-driver-db/db-cpp/default/cache/.lock
+ # exit vs code and try again
+
+And after some time:
+
+ BufferOverflow.ql: [1/1 eval 1.8s] Results written to solutions/BufferOverfl
+ Shutting down query evaluator.
+ Interpreting results.
+
+ echo The query $QLQUERY
+ echo run on $DB
+ echo produced output in $QUERY_RES_SARIF:
+ head -5 $QUERY_RES_SARIF
+ # {
+ # "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
+ # "version" : "2.1.0",
+ # "runs" : [ {
+ # "tool" : {
+ # ...
+
+And run another, get another sarif file. Bad idea in general, but good for
+debugging timing etc.
+
+ #* Use prior variable settings
+
+ #* Run query
+ pushd $PROJ
+ qo=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-UseAfterFree.sarif
+ codeql database analyze --format=sarif-latest --rerun \
+ --output $qo \
+ -j6 \
+ --ram=24000 \
+ -- \
+ $DB \
+ $PROJ/solutions/UseAfterFree.ql
+ popd
+
+ echo "Query results in $qo"
+ head -5 "$qo"
+
+ # Query results in /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
+ # {
+ # "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
+ # "version" : "2.1.0",
+ # "runs" : [ {
+ # "tool" : {
+
+
+
+
+#### Use directory of queries: 1 database -> 1 sarif file (least effort)
+
+ #* Set environment
+ P1_PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ P1_DB=$PROJ/vulnerable-linux-driver-db
+ P1_QLQUERYDIR=$PROJ/solutions/
+ P1_QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD).sarif
+
+ #* check variables
+ set | grep P1_
+
+ #* Run query
+ pushd $P1_PROJ
+ codeql database analyze --format=sarif-latest --rerun \
+ --output $P1_QUERY_RES_SARIF \
+ -j6 \
+ --ram=24000 \
+ -- \
+ $P1_DB \
+ $P1_PROJ/solutions/
+ popd
+
+We can compare SARIF result sizes:
+
+ ls -la "$qo" $P1_QUERY_RES_SARIF $QUERY_RES_SARIF
+
+And for these tiny results, it's mostly metadata:
+
+ -rw-r--r-- 1 hohn staff 29K Jun 20 10:06 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif
+ -rw-r--r-- 1 hohn staff 33K Jun 20 10:02 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189.sarif
+ -rw-r--r-- 1 hohn staff 28K Jun 20 09:51 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
+
+
+
+
+#### Use suite: 1 database -> 1 sarif file (more flexible, more effort)
+
+A useful, general purpose template is at
+.
+
+1. Documentation
+
+ - [built-in-codeql-query-suites](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/built-in-codeql-query-suites)
+ - [creating-codeql-query-suites](https://docs.github.com/en/code-security/codeql-cli/using-the-codeql-cli/creating-codeql-query-suites)
+ Important:
+
+ You must add at least one query, queries, or qlpack instruction to your
+ suite definition, otherwise no queries will be selected. If the suite
+ contains no further instructions, all the queries found from the list of
+ files, in the given directory, or in the named CodeQL pack are
+ selected. If there are further filtering instructions, only queries that
+ match the constraints imposed by those instructions will be selected.
+
+ Also, a suite definition must be *in* a codeql pack.
+
+2. In short
+
+ codeql resolve qlpacks | grep cpp
+
+ # Copy query suite into the pack
+ cd ~/local/codeql-cli-end-to-end
+ cp custom-suite-1.qls codeql-workshop-vulnerable-linux-driver/solutions/
+ codeql resolve queries \
+ codeql-workshop-vulnerable-linux-driver/solutions/custom-suite-1.qls
+
+ #
+ # Taken from
+ # codeql-v2.12.3/codeql/qlpacks/codeql/suite-helpers/0.4.3/code-scanning-selectors.yml
+ # and modified
+ #
+ - description: Security sample queries
+ - queries: .
+ # - qlpack: some-pack-cpp
+ - include:
+ kind:
+ # UseAfterFree
+ - problem
+ # # BufferOverflow
+ # - path-problem
+ # precision:
+ # - high
+ # - very-high
+ # problem.severity:
+ # - error
+ # tags contain:
+ # - security
+
+ # - exclude:
+ # deprecated: //
+ # - exclude:
+ # query path:
+ # - /^experimental\/.*/
+ # - Metrics/Summaries/FrameworkCoverage.ql
+ # - /Diagnostics/Internal/.*/
+ # - exclude:
+ # tags contain:
+ # - modelgenerator
+
+
+
+
+### The importance of versioning
+
+
+
+
+#### CodeQL cli version
+
+Easy:
+
+ export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
+ codeql --version
+
+ CodeQL command-line toolchain release 2.13.4.
+ Copyright (C) 2019-2023 GitHub, Inc.
+ Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
+ Analysis results depend critically on separately distributed query and
+ extractor modules. To list modules that are visible to the toolchain,
+ use 'codeql resolve qlpacks' and 'codeql resolve languages'.
+
+
+
+
+#### Database version
+
+An attempt to run an analysis with an older version of the cli against a
+database created with a newer cli version will likely abort with an error.
+
+In terms of commands, the codeql versions used for
+
+ codeql database create ...
+
+and
+
+ codeql database analyze ..
+
+should be the same.
+
+If you just have a collection of databases, you can check what version of
+the cli produced it.
+The database directory contains the codeql version used in a yaml file,
+a human-readable check:
+
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ grep -A 2 creationMetadata vulnerable-linux-driver-db/codeql-database.yml
+
+ creationMetadata:
+ cliVersion: 2.13.0
+ creationTime: 2023-04-24T21:39:15.963711665Z
+
+
+
+
+#### Query set version
+
+- For suites in our own source code
+
+ Your query sets *may* have release versions or tags. But they almost
+ certainly have git commit ids that can be used, like the following:
+
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ git rev-parse --short HEAD
+
+ d548189
+
+ If you use packs, you can fix the ids of dependencies in the `qlpack.yml`
+ file. In our example, this is done in several places. The `common`
+ version:
+
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ cat common/qlpack.yml
+
+ ---
+ library: true
+ name: common
+ version: 0.0.1
+ dependencies:
+ codeql/cpp-all: 0.7.0
+
+ The dependencies are transitive; both `queries` and `solutions` depend on
+ `common`, so packs fixed by common also fix packs used by the others.
+ And `common` is fixed by our `git` id, so we're done.
+
+- Some optional details
+
+ We have specified these packs:
+
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ grep codeql/cpp-all */qlpack.yml
+
+ common/qlpack.yml: codeql/cpp-all: 0.7.0
+ queries/qlpack.yml: codeql/cpp-all: ^0.7.0
+
+ The caret notation `^` means "at least". So at least version 0.7.0.
+
+ After we install packs via
+
+ codeql pack install --no-strict-mode ...
+
+ some lock files are generated, and those fix versions further down the
+ dependency chain:
+
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ cat common/codeql-pack.lock.yml
+
+ ---
+ lockVersion: 1.0.0
+ dependencies:
+ codeql/cpp-all:
+ version: 0.7.0
+ codeql/ssa:
+ version: 0.0.15
+ codeql/tutorial:
+ version: 0.0.8
+ codeql/util:
+ version: 0.0.8
+ compiled: false
+
+- Note that a query suite is always in a codeql pack, so the pack id is also
+ the suite id.
+
+ For example, above we copied a suite and resolved it:
+
+ cd ~/local/codeql-cli-end-to-end
+ cp custom-suite-1.qls codeql-workshop-vulnerable-linux-driver/solutions/
+ codeql resolve queries \
+ codeql-workshop-vulnerable-linux-driver/solutions/custom-suite-1.qls
+
+ /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/solutions/UseAfterFree.ql
+
+ To assign a version number, we can use the revision id:
+
+ cd ~/local/codeql-cli-end-to-end
+ git rev-parse --short head
+
+ ab6131f
+
+- For manually selected library suites
+
+ For a library suite, we can use the pack id. For example, we can
+ list the packs
+
+ export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
+ codeql resolve qlpacks | grep cpp
+
+ codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3)
+ codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0)
+ codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3)
+
+ Following the last one, we can find some query suites manually.
+ The pack is already known; 0.6.3.
+
+ find ~/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3 \
+ -name "*.qls"
+
+ /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-security-extended.qls
+ /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-security-and-quality.qls
+ /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-security-experimental.qls
+ /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-code-scanning.qls
+ /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-lgtm-full.qls
+ /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-lgtm.qls
+
+- For predefined suites from `codeql resolve queries`
+
+ A full list of suites is produced via `codeql resolve queries`, here is a
+ filtered version.
+
+ export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
+ codeql resolve queries 2>&1 | grep cpp
+
+ cpp-code-scanning.qls - Standard Code Scanning queries for C and C++
+ cpp-lgtm-full.qls - Standard LGTM queries for C/C++, including ones not displayed by default
+ cpp-lgtm.qls - Standard LGTM queries for C/C++
+ cpp-security-and-quality.qls - Security-and-quality queries for C and C++
+ cpp-security-experimental.qls - Extended and experimental security queries for C and C++
+ cpp-security-extended.qls - Security-extended queries for C and C++
+
+ The following just counts the list but notice the header output has version
+ info reported on `stderr`:
+
+ export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
+ ( codeql resolve queries cpp-code-scanning.qls | wc ) 2>&1
+
+ Recording pack reference codeql/cpp-queries at /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3.
+ Recording pack reference codeql/suite-helpers at /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/.codeql/libraries/codeql/suite-helpers/0.5.3.
+ 47 65 5813
+
+ So we can use the codeql/cpp-queries version, 0.6.3, if we run the
+ `cpp-code-scanning.qls` query suite.
+
+The difference in the last two approaches is the way the suite is chosen. The
+version number will be the same.
+
+
+
+
+## Review results
+
+
+
+
+### SARIF Documentation
+
+The standard is defined at
+
+
+
+
+
+### SARIF viewer plugin
+
+
+
+
+#### Install plugin in VS Code
+
+
+
+Sarif Viewer
+v3.3.7
+Microsoft DevLabs
+microsoft.com
+53,335
+(1)
+
+
+
+
+#### Review
+
+ cd ~/local/codeql-cli-end-to-end
+ find . -maxdepth 2 -name "*.sarif"
+
+Pick one in VS Code. Either
+
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ cd codeql-workshop-vulnerable-linux-driver/
+ code d548189.sarif
+
+or manually.
+
+We need the source.
+
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ git submodule init
+ git submodule update
+
+When we review, VS Code will ask for the path.
+
+ cd /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable_linux_driver
+ ls src/vuln_driver.c
+
+Reviewing looks as follows.
+
+
+
+
+
+
+### View raw sarif with `jq`
+
+List the SARIF files again
+
+ cd ~/local/codeql-cli-end-to-end
+ find . -maxdepth 2 -name "*.sarif"
+
+ ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif
+ ./codeql-workshop-vulnerable-linux-driver/d548189.sarif
+ ./codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif
+ ./codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
+ ./codeql-workshop-vulnerable-linux-driver/e402cf5-BufferOverflow.sarif
+
+The CodeQL version
+
+ cd ~/local/codeql-cli-end-to-end
+ jq '.runs | .[0] | .tool.driver.semanticVersion ' < ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif
+
+ "2.13.4"
+
+The names of rules processed
+
+ cd ~/local/codeql-cli-end-to-end
+ jq '.runs | .[] | .tool.driver.rules | .[] | .name ' < ./codeql-workshop-vulnerable-linux-driver/d548189.sarif
+
+ "cpp/buffer_overflow"
+ "cpp/use_after_free"
+
+
+
+
+### View raw sarif with `jq` and fzf
+
+Install the fuzzy finder
+
+ brew install fzf
+
+or `apt-get=/=yum` on linux
+
+Try working to `.runs[0].tool.driver.rules` and follow the output in real
+time.
+
+ pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ res=e402cf5-UseAfterFree.sarif
+ echo '' | fzf --print-query --preview="jq {q} < $res"
+ popd
+
+
+
+
+### sarif-cli
+
+
+
+
+#### Setup / local install
+
+Clone or
+
+
+ cd ~/local/codeql-cli-end-to-end
+ git clone git@github.com:hohn/sarif-cli.git
+
+ cd ~/local/codeql-cli-end-to-end/sarif-cli
+ python3.9 -m venv .venv
+ . .venv/bin/activate
+
+ python -m pip install -r requirementsDEV.txt
+
+ # Put bin/ contents into venv PATH
+ pip install -e .
+
+
+
+
+#### Compiler-style textual output from SARIF
+
+The sarif-cli has several script to use from the shell level:
+
+ cd ~/local/codeql-cli-end-to-end/sarif-cli
+ ls -1 bin/
+
+ json-to-yaml
+ sarif-aggregate-scans
+ sarif-create-aggregate-report
+ sarif-digest
+ sarif-extract-multi
+ sarif-extract-scans
+ sarif-extract-scans-runner
+ sarif-extract-tables
+ sarif-labeled
+ sarif-list-files
+ sarif-pad-aggregate
+ sarif-results-summary
+ sarif-to-dot
+
+The simplest one just list the source files found during analysis:
+
+ . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ sarif-list-files d548189.sarif
+
+ src/buffer_overflow.h
+ src/use_after_free.h
+ src/vuln_driver.c
+
+Much more useful is a compiler-style summary of all results found:
+
+ . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ sarif-results-summary d548189.sarif
+
+ RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
+ PATH 0
+ FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
+ FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
+ FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
+ FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
+
+ RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
+ The dangling pointer is used here: [fn](2)
+ The dangling pointer is used here: [arg](3)
+ The dangling pointer is used here: [fn](4)
+ The dangling pointer is used here: [arg](5)
+
+This sarif file has only two results, so the output is short:
+
+ RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
+ PATH 0
+ FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
+ FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
+ FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
+ FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
+
+ RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
+ The dangling pointer is used here: [fn](2)
+ The dangling pointer is used here: [arg](3)
+ The dangling pointer is used here: [fn](4)
+ The dangling pointer is used here: [arg](5)
+
+This illustrates the differences in the output between the two result `@kind`
+s:
+
+- `@kind problem` is a single list of results found
+- `@kind path-problem` is a list of flow paths. Each path in turn is a list
+ of locations.
+
+Most of these scripts take options that significantly change their output; to
+see them, use the `-h` or `--help` flags. E.g.,
+
+ . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
+ sarif-results-summary -h
+
+ usage: sarif-results-summary [-h] [-s srcroot] [-r] [-e] [-c] sarif-file
+
+ summary of results
+
+ positional arguments:
+ sarif-file input file, - for stdin
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -s srcroot, --list-source srcroot
+ list source snippets using srcroot as sarif SRCROOT
+ -r, --related-locations
+ list related locations like "hides [parameter](1)"
+ -e, --endpoints-only only list source and sink, dropping the path.
+ Identical, successive source/sink pairs are combined
+ -c, --csv output csv instead of human-readable summary
+
+Some of these make output much more informative, like `-r` and `-s`:
+
+With `-r`:
+
+ . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ sarif-results-summary -r d548189.sarif
+
+ RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
+ REFERENCE: src/buffer_overflow.h:20:17:20:23: memcpy
+ REFERENCE: src/buffer_overflow.h:8:22:8:33: stack buffer
+ PATH 0
+ FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
+ FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
+ FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
+ FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
+
+ RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
+ The dangling pointer is used here: [fn](2)
+ The dangling pointer is used here: [arg](3)
+ The dangling pointer is used here: [fn](4)
+ The dangling pointer is used here: [arg](5)
+ REFERENCE: src/use_after_free.h:84:22:84:24: fn
+ REFERENCE: src/use_after_free.h:87:70:87:72: fn
+ REFERENCE: src/use_after_free.h:87:90:87:93: arg
+ REFERENCE: src/use_after_free.h:89:20:89:22: fn
+ REFERENCE: src/use_after_free.h:89:39:89:42: arg
+
+If the source code is available, we can use `-s` to include snippets in the
+output. This effectively converts sarif to the format used by gcc and clang
+to report warnings and errors.
+
+ . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
+ cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
+ sarif-results-summary -s vulnerable_linux_driver/ d548189.sarif
+
+ RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
+ memcpy(kernel_buff, buff, size);
+ ^^^^
+ PATH 0
+ FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
+ static long do_ioctl(struct file *filp, unsigned int cmd, unsigned long args)
+ ^^^^
+ FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
+ buffer_overflow((char *) args);
+ ^^^^^^^^^^^^^
+ FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
+ static int buffer_overflow(char __user *buff)
+ ^^^^
+ FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
+ memcpy(kernel_buff, buff, size);
+ ^^^^
+
+ RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
+ The dangling pointer is used here: [fn](2)
+ The dangling pointer is used here: [arg](3)
+ The dangling pointer is used here: [fn](4)
+ The dangling pointer is used here: [arg](5)
+ uaf_obj *global_uaf_obj = NULL;
+ ^^^^^^^^^^^^^^
+
+
+
+
+#### SQL conversion – not compatible with codeql v2.13.4
+
+The ultimate purpose of the sarif-cli is producing CSV files for import into
+SQL databases. This requires a completely defined static structure, without
+any optional fields. The internals of the tool are beyond the scope of this
+workshop, some details are their external effects are important:
+
+1. a (very large and comprehensive) type signature is defined in sarif-cli
+2. sarif files that have extra fields not in the signature will produce warnings
+3. sarif files that are missing fields from the signature will produce a fatal
+ error. A message will be printed and the scripts will abort.
+4. Sometimes, sarif files will have a field but no content. For a number of
+ these, dummy values are inserted. One example are queries that don't
+ produce line numbers in their output; for those, -1 is used as value.
+
+Unfortunately, this version of codeql
+
+ cd ~/local/codeql-cli-end-to-end
+ ./codeql/codeql --version
+
+ CodeQL command-line toolchain release 2.13.4.
+ Copyright (C) 2019-2023 GitHub, Inc.
+ Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
+ Analysis results depend critically on separately distributed query and
+ extractor modules. To list modules that are visible to the toolchain,
+ use 'codeql resolve qlpacks' and 'codeql resolve languages'.
+
+has signature changes incompatible with (the older) sarif-cli (version
+e62c351)
+
+
+
+
+## Running sequence
+
+
+
+
+### Smallest query suite to largest
+
+A short script to show us how many queries the standard suites have:
+
+ export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
+
+ queries=`codeql resolve queries 2>&1 | grep cpp | awk '{print($1)}'`
+ (
+ for suite in $queries
+ do
+ len=`codeql resolve queries $suite | wc -l`
+ echo "Suite $suite has $len queries"
+ done
+ ) 2>/dev/null
+
+ Suite cpp-code-scanning.qls has 47 queries
+ Suite cpp-lgtm-full.qls has 169 queries
+ Suite cpp-lgtm.qls has 100 queries
+ Suite cpp-security-and-quality.qls has 167 queries
+ Suite cpp-security-experimental.qls has 118 queries
+ Suite cpp-security-extended.qls has 83 queries
+
+If we want to gradually increase coverage using the standard suites, we would
+thus use them in this order:
+
+- cpp-code-scanning.qls, 47 queries
+- cpp-security-extended.qls, 83 queries
+- cpp-lgtm.qls, 100 queries
+- cpp-security-experimental.qls, 118 queries
+- cpp-security-and-quality.qls, 167 queries
+- cpp-lgtm-full.qls, 169 queries
+
+
+
+
+### Working with results based on counts
+
+- Lots of result (> 5000)
+
+ Use the [sarif-cli](#org08832bc), e.g., `sarif-results-summary -r d548189.sarif`, as above.
+
+- Medium result sets (~ 2000 results)
+
+ Use the [sarif-cli](#org08832bc) or try the [SARIF viewer plugin](#org1dd7344).
+
+- Few results
+
+ Use the [SARIF viewer plugin](#org1dd7344) for detailed review and working with the results
+ / queries. Use the [sarif-cli](#org08832bc) for quick command-line comparison.
+
+
+
+
+## Comparing analysis results across sarif files
+
+Use the [sarif-cli](#org08832bc).
+
+Options:
+
+- use `sarif-results-summary` on each sarif result file individually, then
+ compare the resulting text files via `diff`-style tools
+- (powerful, but effort required) if your version of CodeQL is compatible, use
+ `sarif-extract-scans-runner` to put all results into an SQL database and use
+ that to query the results.
+
+
+
+
+## Miscellany
+
+- Scale factor for building DBs: Common case: 15 minutes for a parallel cpp
+ compilation can be a 2 hour database build for codeql.
+
diff --git a/readme.org b/readme.org
index fafe79c..550f4c0 100644
--- a/readme.org
+++ b/readme.org
@@ -6,695 +6,6 @@
* End-to-end demo of CodeQL command line usage
-** Run analyses
-*** Get collection of databases (already handy)
-**** DONE Get https://github.com/hohn/codeql-workshop-vulnerable-linux-driver
-#+begin_src text
- cd ~/local
- git clone git@github.com:hohn/codeql-workshop-vulnerable-linux-driver.git
- cd codeql-workshop-vulnerable-linux-driver/
- unzip vulnerable-linux-driver.zip
- tree -L 2 vulnerable-linux-driver-db/
- vulnerable-linux-driver-db/
- ├── codeql-database.yml
- ├── db-cpp
- │ ├── default
- │ ├── semmlecode.cpp.dbscheme
- │ └── semmlecode.cpp.dbscheme.stats
- └── src.zip
-
- 3 directories, 4 files
-#+end_src
-**** DONE Quick check using VS Code. Same steps will repeat:
-***** select DB
-***** select query
-***** run query
-***** view results
-**** DONE Install codeql
-***** Full docs:
-https://docs.github.com/en/code-security/codeql-cli/using-the-codeql-cli/getting-started-with-the-codeql-cli#getting-started-with-the-codeql-cli
-https://docs.github.com/en/code-security/code-scanning/using-codeql-code-scanning-with-your-existing-ci-system/installing-codeql-cli-in-your-ci-system#setting-up-the-codeql-cli-in-your-ci-system
-***** In short:
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-endw
- # Decide on version / os via browser, then:
- wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.13.4/codeql-bundle-osx64.tar.gz
-
- # Fix attributes on mac
- if [ `uname` = Darwin ] ; then
- xattr -c *.tar.gz
- fi
-
- # Extract
- tar zxf ./codeql-bundle-osx64.tar.gz
-
- # Check binary
- pwd
- # /Users/hohn/local/codeql-cli-end-to-end
- ./codeql/codeql --version
- # CodeQL command-line toolchain release 2.13.4.
- # Copyright (C) 2019-2023 GitHub, Inc.
- # Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
- # Analysis results depend critically on separately distributed query and
- # extractor modules. To list modules that are visible to the toolchain,
- # use 'codeql resolve qlpacks' and 'codeql resolve languages'.
-
- # Check packs
- 0:$ ./codeql/codeql resolve qlpacks |head -5
- # codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3)
- # codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0)
- # codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3)
- # codeql/csharp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-all/0.6.3)
- # codeql/csharp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-examples/0.0.0)
-
- # Fix the path
- export PATH=$(pwd -P)/codeql:"$PATH"
-
- # Check languages
- codeql resolve languages | head -5
- # go (/Users/hohn/local/codeql-cli-end-to-end/codeql/go)
- # python (/Users/hohn/local/codeql-cli-end-to-end/codeql/python)
- # java (/Users/hohn/local/codeql-cli-end-to-end/codeql/java)
- # html (/Users/hohn/local/codeql-cli-end-to-end/codeql/html)
- # xml (/Users/hohn/local/codeql-cli-end-to-end/codeql/xml)
-#+end_src
-***** A more fancy version
-#+begin_src sh
- # Reference urls:
- # https://github.com/github/codeql-cli-binaries/releases/download/v2.8.0/codeql-linux64.zip
- # https://github.com/github/codeql/archive/refs/tags/codeql-cli/v2.8.0.zip
- #
- # grab -- retrieve and extract codeql cli and library
- # Usage: grab version url prefix
- grab() {
- version=$1; shift
- platform=$1; shift
- prefix=$1; shift
- mkdir -p $prefix/codeql-$version &&
- cd $prefix/codeql-$version || return
-
- # Get cli
- wget "https://github.com/github/codeql-cli-binaries/releases/download/$version/codeql-$platform.zip"
- # Get lib
- wget "https://github.com/github/codeql/archive/refs/tags/codeql-cli/$version.zip"
- # Fix attributes
- if [ `uname` = Darwin ] ; then
- xattr -c *.zip
- fi
- # Extract
- unzip -q codeql-$platform.zip
- unzip -q $version.zip
- # Rename library directory for VS Code
- mv codeql-codeql-cli-$version/ ql
- # remove archives?
- # rm codeql-$platform.zip
- # rm $version.zip
- }
-
- grab v2.7.6 osx64 $HOME/local
- grab v2.8.3 osx64 $HOME/local
- grab v2.8.4 osx64 $HOME/local
-
- grab v2.6.3 linux64 /opt
-
- grab v2.6.3 osx64 $HOME/local
- grab v2.4.6 osx64 $HOME/local
-#+end_src
-***** Most flexible in use, but more initial setup
-=gh=, the GitHub command-line tool from https://github.com/cli/cli
-
-****** gh api repos/{owner}/{repo}/releases
-https://cli.github.com/manual/gh_api
-****** gh extension create
-https://cli.github.com/manual/gh_extension
-****** gh codeql extension
-https://github.com/github/gh-codeql
-****** gh gist list
-https://cli.github.com/manual/gh_gist_list
-
-#+begin_src text
- 0:$ gh codeql
- GitHub command-line wrapper for the CodeQL CLI.
-#+end_src
-**** Install pack dependencies
-***** Full docs
-https://docs.github.com/en/code-security/codeql-cli/codeql-cli-reference/about-codeql-packs#about-qlpackyml-files
-https://docs.github.com/en/code-security/codeql-cli/codeql-cli-manual/pack-install
-***** View installed docs via =-h= flag, highly recommended
-#+begin_src sh
- # Overview
- codeql -h
-
- # Sub 1
- codeql pack -h
-
- # Sub 2
- codeql pack install -h
-#+end_src
-***** In short
-****** Create the qlpack
-Create the qlpack files if not there, one per directory. In this project,
-that's already done:
-#+begin_src sh
- 0:$ find codeql-workshop-vulnerable-linux-driver -name "qlpack.yml"
- codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
- codeql-workshop-vulnerable-linux-driver/solutions/qlpack.yml
- codeql-workshop-vulnerable-linux-driver/common/qlpack.yml
-#+end_src
-For example:
-: cat codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
-
-shows
-#+begin_src yaml
- ---
- library: false
- name: queries
- version: 0.0.1
- dependencies:
- codeql/cpp-all: ^0.7.0
- common: "*"
-#+end_src
-So the queries directory does not contain a library, but it depends on one,
-: cat codeql-workshop-vulnerable-linux-driver/common/qlpack.yml
-
-#+begin_src yaml
- ---
- library: true
- name: common
- version: 0.0.1
- dependencies:
- codeql/cpp-all: 0.7.0
-#+end_src
-
-****** Install each pack's dependencies
-The first time you install dependencies, it's a good idea to do this
-menually, per =qlpack.yml= file, and deal with any errors that may occur.
-
-#+begin_src sh
- pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- codeql pack install --no-strict-mode queries/
-#+end_src
-
-After the initial setup and for automation, install each pack's
-dependencies via a loop: =codeql pack install=
-#+begin_src sh
- pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- find . -name "qlpack.yml"
- # ./queries/qlpack.yml
- # ./solutions/qlpack.yml
- # ./common/qlpack.yml
-
- codeql pack install --no-strict-mode queries/
- # Dependencies resolved. Installing packages...
- # Install location: /Users/hohn/.codeql/packages
- # Nothing to install.
- # Package install location: /Users/hohn/.codeql/packages
- # Nothing downloaded.
-
- for sub in `find . -name "qlpack.yml" | sed s@qlpack.yml@@g;`
- do
- codeql pack install --no-strict-mode $sub
- done
-#+end_src
-*** Run queries
-**** Individual: 1 database -> N sarif files
-#+begin_src sh
- #* Set environment
- PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- DB=$PROJ/vulnerable-linux-driver-db
- QLQUERY=$PROJ/solutions/BufferOverflow.ql
- QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-BufferOverflow.sarif
-
- #* Run query
- pushd $PROJ
- codeql database analyze --format=sarif-latest --rerun \
- --output $QUERY_RES_SARIF \
- -j6 \
- --ram=24000 \
- -- \
- $DB \
- $QLQUERY
-
- # if you get
- # fatal error occurred: Error initializing the IMB disk cache: the cache
- # directory is already locked by another running process. Only one instance of
- # the IMB can access a cache directory at a time. The lock file is located at
- # /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable-linux-driver-db/db-cpp/default/cache/.lock
- # exit vs code and try again
-#+end_src
-
-And after some time:
-
-#+begin_src text
- BufferOverflow.ql: [1/1 eval 1.8s] Results written to solutions/BufferOverfl
- Shutting down query evaluator.
- Interpreting results.
-#+end_src
-
-#+begin_src sh
- echo The query $QLQUERY
- echo run on $DB
- echo produced output in $QUERY_RES_SARIF:
- head -5 $QUERY_RES_SARIF
- # {
- # "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
- # "version" : "2.1.0",
- # "runs" : [ {
- # "tool" : {
- # ...
-#+end_src
-
-And run another, get another sarif file. Bad idea in general, but good for
-debugging timing etc.
-
-#+begin_src sh
- #* Use prior variable settings
-
- #* Run query
- pushd $PROJ
- qo=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-UseAfterFree.sarif
- codeql database analyze --format=sarif-latest --rerun \
- --output $qo \
- -j6 \
- --ram=24000 \
- -- \
- $DB \
- $PROJ/solutions/UseAfterFree.ql
- popd
-
- echo "Query results in $qo"
- head -5 "$qo"
-
- # Query results in /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
- # {
- # "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
- # "version" : "2.1.0",
- # "runs" : [ {
- # "tool" : {
-#+end_src
-
-**** Use directory of queries: 1 database -> 1 sarif file (least effort)
-#+begin_src sh
- #* Set environment
- P1_PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- P1_DB=$PROJ/vulnerable-linux-driver-db
- P1_QLQUERYDIR=$PROJ/solutions/
- P1_QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD).sarif
-
- #* check variables
- set | grep P1_
-
- #* Run query
- pushd $P1_PROJ
- codeql database analyze --format=sarif-latest --rerun \
- --output $P1_QUERY_RES_SARIF \
- -j6 \
- --ram=24000 \
- -- \
- $P1_DB \
- $P1_PROJ/solutions/
- popd
-#+end_src
-
-We can compare SARIF result sizes:
-#+begin_src sh
- ls -la "$qo" $P1_QUERY_RES_SARIF $QUERY_RES_SARIF
-#+end_src
-
-And for these tiny results, it's mostly metadata:
-#+begin_src text
- -rw-r--r-- 1 hohn staff 29K Jun 20 10:06 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif
- -rw-r--r-- 1 hohn staff 33K Jun 20 10:02 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189.sarif
- -rw-r--r-- 1 hohn staff 28K Jun 20 09:51 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
-#+end_src
-
-**** Use suite: 1 database -> 1 sarif file (more flexible, more effort)
-A useful, general purpose template is at
-https://github.com/rvermeulen/codeql-example-project-layout.
-
-***** Documentation
-- [[https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/built-in-codeql-query-suites][built-in-codeql-query-suites]]
-- [[https://docs.github.com/en/code-security/codeql-cli/using-the-codeql-cli/creating-codeql-query-suites][creating-codeql-query-suites]]
- Important:
-
- You must add at least one query, queries, or qlpack instruction to your
- suite definition, otherwise no queries will be selected. If the suite
- contains no further instructions, all the queries found from the list of
- files, in the given directory, or in the named CodeQL pack are
- selected. If there are further filtering instructions, only queries that
- match the constraints imposed by those instructions will be selected.
-
- Also, a suite definition must be /in/ a codeql pack.
-***** In short
-#+begin_src sh
- codeql resolve qlpacks | grep cpp
-
- # Copy query suite into the pack
- cd ~/local/codeql-cli-end-to-end
- cp custom-suite-1.qls codeql-workshop-vulnerable-linux-driver/solutions/
- codeql resolve queries \
- codeql-workshop-vulnerable-linux-driver/solutions/custom-suite-1.qls
-#+end_src
-
-#+begin_src yaml
- #
- # Taken from
- # codeql-v2.12.3/codeql/qlpacks/codeql/suite-helpers/0.4.3/code-scanning-selectors.yml
- # and modified
- #
- - description: Security sample queries
- - queries: .
- # - qlpack: some-pack-cpp
- - include:
- kind:
- # UseAfterFree
- - problem
- # # BufferOverflow
- # - path-problem
- # precision:
- # - high
- # - very-high
- # problem.severity:
- # - error
- # tags contain:
- # - security
-
- # - exclude:
- # deprecated: //
- # - exclude:
- # query path:
- # - /^experimental\/.*/
- # - Metrics/Summaries/FrameworkCoverage.ql
- # - /Diagnostics/Internal/.*/
- # - exclude:
- # tags contain:
- # - modelgenerator
-#+end_src
-
-**** TODO Include versioning:
-***** TODO codeql cli
-***** TODO query set version
-Checks:
-**** For building DBs: Common case: 15 minutes for || cpp compilation, can
-be 2 h with codeql.
-** Review results
-*** SARIF Documentation
-The standard is defined at
-https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html
-*** SARIF viewer plugin
-**** Install plugin in VS Code
-https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer
-
-Sarif Viewer
-v3.3.7
-Microsoft DevLabs
-microsoft.com
-53,335
-(1)
-
-**** Review
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end
- find . -maxdepth 2 -name "*.sarif"
-#+end_src
-Pick one in VS Code. Either
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- cd codeql-workshop-vulnerable-linux-driver/
- code d548189.sarif
-#+end_src
-or manually.
-
-We need the source.
-
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- git submodule init
- git submodule update
-#+end_src
-
-When we review, VS Code will ask for the path.
-
-#+begin_src sh
- cd /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable_linux_driver
- ls src/vuln_driver.c
-#+end_src
-
-Reviewing looks as follows.
-[[file:./img/sarif-view-1.png]]
-
-*** View raw sarif with =jq=
-List the SARIF files again
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end
- find . -maxdepth 2 -name "*.sarif"
-#+end_src
-
-The CodeQL version
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end
- jq '.runs | .[0] | .tool.driver.semanticVersion ' < ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif
-#+end_src
-
-#+results:
-: 2.13.4
-
-
-The names of rules processed
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end
- jq '.runs | .[] | .tool.driver.rules | .[] | .name ' < ./codeql-workshop-vulnerable-linux-driver/d548189.sarif
-#+end_src
-
-#+results:
-| cpp/buffer_overflow |
-| cpp/use_after_free |
-
-*** View raw sarif with =jq= and fzf
-Install the fuzzy finder
-: brew install fzf
-
-or =apt-get=/=yum= on linux
-
-Try working to =.runs[0].tool.driver.rules= and follow the output in real
-time.
-
-#+begin_src sh
- pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- res=e402cf5-UseAfterFree.sarif
- echo '' | fzf --print-query --preview="jq {q} < $res"
- popd
-#+end_src
-
-*** sarif-cli
-**** Setup / local install
-Clone https://github.com/hohn/sarif-cli or
-https://github.com/knewbury01/sarif-cli
-
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end
- git clone git@github.com:hohn/sarif-cli.git
-
- cd ~/local/codeql-cli-end-to-end/sarif-cli
- python3.9 -m venv .venv
- . .venv/bin/activate
-
- python -m pip install -r requirementsDEV.txt
-
- # Put bin/ contents into venv PATH
- pip install -e .
-#+end_src
-
-**** Compiler-style textual output from SARIF
-The sarif-cli has several script to use from the shell level:
-#+begin_src sh
- cd ~/local/codeql-cli-end-to-end/sarif-cli
- ls -1 bin/
-#+end_src
-
-#+results:
-#+begin_example
- json-to-yaml
- sarif-aggregate-scans
- sarif-create-aggregate-report
- sarif-digest
- sarif-extract-multi
- sarif-extract-scans
- sarif-extract-scans-runner
- sarif-extract-tables
- sarif-labeled
- sarif-list-files
- sarif-pad-aggregate
- sarif-results-summary
- sarif-to-dot
-#+end_example
-
-
-The simplest one just list the source files found during analysis:
-#+begin_src sh
- . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
- cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- sarif-list-files d548189.sarif
-#+end_src
-
-#+results:
-: src/buffer_overflow.h
-: src/use_after_free.h
-: src/vuln_driver.c
-
-
-Much more useful is a compiler-style summary of all results found:
-#+begin_src sh
- . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
- cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- sarif-results-summary d548189.sarif
-#+end_src
-
-#+results:
-#+begin_example
- RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
- PATH 0
- FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
- FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
- FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
- FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
-
- RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
- The dangling pointer is used here: [fn](2)
- The dangling pointer is used here: [arg](3)
- The dangling pointer is used here: [fn](4)
- The dangling pointer is used here: [arg](5)
-#+end_example
-
-This sarif file has only two results, so the output is short:
-
-#+results:
-#+begin_example
- RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
- PATH 0
- FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
- FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
- FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
- FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
-
- RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
- The dangling pointer is used here: [fn](2)
- The dangling pointer is used here: [arg](3)
- The dangling pointer is used here: [fn](4)
- The dangling pointer is used here: [arg](5)
-#+end_example
-
-This illustrates the differences in the output between the two result =@kind=
-s:
-- =@kind problem= is a single list of results found
-- =@kind path-problem= is a list of flow paths. Each path in turn is a list
- of locations.
-
-Most of these scripts take options that significantly change their output; to
-see them, use the =-h= or =--help= flags. E.g.,
-#+begin_src sh
- . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
- sarif-results-summary -h
-#+end_src
-
-#+results:
-#+begin_example
- usage: sarif-results-summary [-h] [-s srcroot] [-r] [-e] [-c] sarif-file
-
- summary of results
-
- positional arguments:
- sarif-file input file, - for stdin
-
- optional arguments:
- -h, --help show this help message and exit
- -s srcroot, --list-source srcroot
- list source snippets using srcroot as sarif SRCROOT
- -r, --related-locations
- list related locations like "hides [parameter](1)"
- -e, --endpoints-only only list source and sink, dropping the path.
- Identical, successive source/sink pairs are combined
- -c, --csv output csv instead of human-readable summary
-#+end_example
-
-Some of these make output much more informative, like =-r= and =-s=:
-
-With =-r=:
-
-#+begin_src sh
- . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
- cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- sarif-results-summary -r d548189.sarif
-#+end_src
-
-#+results:
-#+begin_example
- RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
- REFERENCE: src/buffer_overflow.h:20:17:20:23: memcpy
- REFERENCE: src/buffer_overflow.h:8:22:8:33: stack buffer
- PATH 0
- FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
- FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
- FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
- FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
-
- RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
- The dangling pointer is used here: [fn](2)
- The dangling pointer is used here: [arg](3)
- The dangling pointer is used here: [fn](4)
- The dangling pointer is used here: [arg](5)
- REFERENCE: src/use_after_free.h:84:22:84:24: fn
- REFERENCE: src/use_after_free.h:87:70:87:72: fn
- REFERENCE: src/use_after_free.h:87:90:87:93: arg
- REFERENCE: src/use_after_free.h:89:20:89:22: fn
- REFERENCE: src/use_after_free.h:89:39:89:42: arg
-#+end_example
-
-If the source code is available, we can use =-s= to include snippets in the
-output. This effectively converts sarif to the format used by gcc and clang
-to report warnings and errors.
-#+begin_src sh
- . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
- cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
- sarif-results-summary -s vulnerable_linux_driver/ d548189.sarif
-#+end_src
-
-#+results:
-#+begin_example
- RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
- memcpy(kernel_buff, buff, size);
- ^^^^
- PATH 0
- FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
- static long do_ioctl(struct file *filp, unsigned int cmd, unsigned long args)
- ^^^^
- FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
- buffer_overflow((char *) args);
- ^^^^^^^^^^^^^
- FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
- static int buffer_overflow(char __user *buff)
- ^^^^
- FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
- memcpy(kernel_buff, buff, size);
- ^^^^
-
- RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
- The dangling pointer is used here: [fn](2)
- The dangling pointer is used here: [arg](3)
- The dangling pointer is used here: [fn](4)
- The dangling pointer is used here: [arg](5)
- uaf_obj *global_uaf_obj = NULL;
- ^^^^^^^^^^^^^^
-#+end_example
-
-**** TODO SQL conversion
-** Running sequence
-*** Smallest query suite (security suite).
-*** Check results.
-**** Lots of result (> 5000) -> cli review via compiler-style dump.
-**** Medium result sets (~ 2000) (sarif review plugin, can only load 5000
-results)
-**** Few results (sarif review plugin, can only load 5000 results)
-*** Expand query
-** Compare results.
-*** sarif-cli using compiler-style dump
+ This workshop has pre-rendered material in
+ - doc/readme.html :: best quality, for downloading and viewing locally.
+ - doc/readme.md :: for quick browsing. Formatting may be limited.