Files
codeql-cli-end-to-end/doc
2023-06-21 18:56:36 -07:00
..
2023-06-21 18:56:36 -07:00

 -*- mode: org; org-confirm-babel-evaluate: nil; coding: utf-8 -*-
#+OPTIONS: H:3 num:t \n:nil @:t ::t |:t ^:{} f:t *:t TeX:t LaTeX:t skip:nil p:nil
#+OPTIONS: org-confirm-babel-evaluate:nil

# readme.in is the (partial) literate program used to generate readme.org

* End-to-end demo of CodeQL command line usage

** Run analyses
*** Get collection of databases (already handy)
**** DONE Get https://github.com/hohn/codeql-workshop-vulnerable-linux-driver
     #+begin_src text
       cd ~/local
       git clone git@github.com:hohn/codeql-workshop-vulnerable-linux-driver.git
       cd codeql-workshop-vulnerable-linux-driver/
       unzip vulnerable-linux-driver.zip
       tree -L 2 vulnerable-linux-driver-db/
       vulnerable-linux-driver-db/
       ├── codeql-database.yml
       ├── db-cpp
       │   ├── default
       │   ├── semmlecode.cpp.dbscheme
       │   └── semmlecode.cpp.dbscheme.stats
       └── src.zip

       3 directories, 4 files
     #+end_src
**** DONE Quick check using VS Code.  Same steps will repeat:
***** select DB
***** select query
***** run query
***** view results
**** DONE Install codeql
***** Full docs:
      https://docs.github.com/en/code-security/codeql-cli/using-the-codeql-cli/getting-started-with-the-codeql-cli#getting-started-with-the-codeql-cli
      https://docs.github.com/en/code-security/code-scanning/using-codeql-code-scanning-with-your-existing-ci-system/installing-codeql-cli-in-your-ci-system#setting-up-the-codeql-cli-in-your-ci-system
***** In short:
      #+begin_src sh
        cd ~/local/codeql-cli-end-to-end
        # Decide on version / os via browser, then: 
        wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.13.4/codeql-bundle-osx64.tar.gz

        # Fix attributes on mac
        if [ `uname` = Darwin ] ; then
            xattr -c *.tar.gz
        fi

        # Extract
        tar zxf ./codeql-bundle-osx64.tar.gz

        # Check binary
        pwd
        # /Users/hohn/local/codeql-cli-end-to-end
        ./codeql/codeql --version
        # CodeQL command-line toolchain release 2.13.4.
        # Copyright (C) 2019-2023 GitHub, Inc.
        # Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
        #    Analysis results depend critically on separately distributed query and
        #    extractor modules. To list modules that are visible to the toolchain,
        #    use 'codeql resolve qlpacks' and 'codeql resolve languages'.

        # Check packs
        0:$ ./codeql/codeql resolve qlpacks |head -5
        # codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3)
        # codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0)
        # codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3)
        # codeql/csharp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-all/0.6.3)
        # codeql/csharp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-examples/0.0.0) 

        # Fix the path
        export PATH=$(pwd -P)/codeql:"$PATH"

        # Check languages
        codeql resolve languages | head -5
        # go (/Users/hohn/local/codeql-cli-end-to-end/codeql/go)
        # python (/Users/hohn/local/codeql-cli-end-to-end/codeql/python)
        # java (/Users/hohn/local/codeql-cli-end-to-end/codeql/java)
        # html (/Users/hohn/local/codeql-cli-end-to-end/codeql/html)
        # xml (/Users/hohn/local/codeql-cli-end-to-end/codeql/xml)
      #+end_src
***** A more fancy version
      #+begin_src sh
        # Reference urls:
        # https://github.com/github/codeql-cli-binaries/releases/download/v2.8.0/codeql-linux64.zip
        # https://github.com/github/codeql/archive/refs/tags/codeql-cli/v2.8.0.zip
        #
        # grab -- retrieve and extract codeql cli and library
        # Usage: grab version url prefix
        grab() {
            version=$1; shift
            platform=$1; shift
            prefix=$1; shift
            mkdir -p $prefix/codeql-$version &&
                cd $prefix/codeql-$version || return

            # Get cli
            wget "https://github.com/github/codeql-cli-binaries/releases/download/$version/codeql-$platform.zip"
            # Get lib
            wget "https://github.com/github/codeql/archive/refs/tags/codeql-cli/$version.zip"
            # Fix attributes
            if [ `uname` = Darwin ] ; then
                xattr -c *.zip
            fi
            # Extract
            unzip -q codeql-$platform.zip
            unzip -q $version.zip
            # Rename library directory for VS Code
            mv codeql-codeql-cli-$version/ ql
            # remove archives?
            # rm codeql-$platform.zip
            # rm $version.zip
        }

        grab v2.7.6 osx64 $HOME/local
        grab v2.8.3 osx64 $HOME/local
        grab v2.8.4 osx64 $HOME/local

        grab v2.6.3 linux64 /opt

        grab v2.6.3 osx64 $HOME/local
        grab v2.4.6 osx64 $HOME/local
      #+end_src
***** Most flexible in use, but more initial setup
      =gh=, the GitHub command-line tool from https://github.com/cli/cli

****** gh api repos/{owner}/{repo}/releases
       https://cli.github.com/manual/gh_api
****** gh extension create
       https://cli.github.com/manual/gh_extension
****** gh codeql extension
       https://github.com/github/gh-codeql
****** gh gist list
       https://cli.github.com/manual/gh_gist_list

       #+begin_src text
         0:$ gh codeql
         GitHub command-line wrapper for the CodeQL CLI.
       #+end_src
**** Install pack dependencies
***** Full docs
      https://docs.github.com/en/code-security/codeql-cli/codeql-cli-reference/about-codeql-packs#about-qlpackyml-files
      https://docs.github.com/en/code-security/codeql-cli/codeql-cli-manual/pack-install
***** View installed docs via =-h= flag, highly recommended
      #+begin_src sh
        # Overview
        codeql -h

        # Sub 1
        codeql pack -h

        # Sub 2
        codeql pack install -h
      #+end_src
***** In short
****** Create the qlpack
       Create the qlpack files if not there, one per directory.  In this project,
       that's already done:
       #+begin_src sh
         0:$ find codeql-workshop-vulnerable-linux-driver  -name "qlpack.yml" 
         codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
         codeql-workshop-vulnerable-linux-driver/solutions/qlpack.yml
         codeql-workshop-vulnerable-linux-driver/common/qlpack.yml
       #+end_src
       For example:
       : cat codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
       shows
       #+BEGIN_SRC yaml
         ---
         library: false
         name: queries
         version: 0.0.1
         dependencies:
           codeql/cpp-all: ^0.7.0
           common: "*"
       #+END_SRC
       So the queries directory does not contain a library, but it depends on one,
       : cat codeql-workshop-vulnerable-linux-driver/common/qlpack.yml
       #+BEGIN_SRC yaml
         ---
         library: true
         name: common
         version: 0.0.1
         dependencies:
           codeql/cpp-all: 0.7.0
       #+END_SRC
       
****** Install each pack's dependencies
       The first time you install dependencies, it's a good idea to do this
       menually, per =qlpack.yml= file, and deal with any errors that may occur.

       #+BEGIN_SRC sh
         pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
         codeql pack install --no-strict-mode queries/
       #+END_SRC

       After the initial setup and for automation, install each pack's
       dependencies via a loop: =codeql pack install=
       #+begin_src sh
         pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
         find . -name "qlpack.yml"
         # ./queries/qlpack.yml
         # ./solutions/qlpack.yml
         # ./common/qlpack.yml

         codeql pack install --no-strict-mode queries/
         # Dependencies resolved. Installing packages...
         # Install location: /Users/hohn/.codeql/packages
         # Nothing to install.
         # Package install location: /Users/hohn/.codeql/packages
         # Nothing downloaded.

         for sub in `find . -name "qlpack.yml" | sed s@qlpack.yml@@g;`
         do
             codeql pack install --no-strict-mode $sub
         done
       #+end_src
*** Run queries
**** Individual: 1 database -> N sarif files
     #+BEGIN_SRC sh
       #* Set environment
       PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       DB=$PROJ/vulnerable-linux-driver-db
       QLQUERY=$PROJ/solutions/BufferOverflow.ql
       QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-BufferOverflow.sarif

       #* Run query
       pushd $PROJ
       codeql database analyze --format=sarif-latest --rerun   \
              --output $QUERY_RES_SARIF                        \
              -j6                                              \
              --ram=24000                                      \
              --                                               \
              $DB                                              \
              $QLQUERY

       # if you get
           # fatal error occurred: Error initializing the IMB disk cache: the cache
           # directory is already locked by another running process. Only one instance of
           # the IMB can access a cache directory at a time. The lock file is located at
           # /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable-linux-driver-db/db-cpp/default/cache/.lock
       #  exit vs code and try again
     #+END_SRC

     And after some time:

     #+BEGIN_SRC text
       BufferOverflow.ql: [1/1 eval 1.8s] Results written to solutions/BufferOverfl
       Shutting down query evaluator.
       Interpreting results.
     #+END_SRC

     #+BEGIN_SRC sh
       echo The query $QLQUERY
       echo run on $DB
       echo produced output in $QUERY_RES_SARIF:
       head -5 $QUERY_RES_SARIF
       # {
       #   "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
       #   "version" : "2.1.0",
       #   "runs" : [ {
       #     "tool" : {
       # ...
     #+END_SRC

     And run another, get another sarif file.  Bad idea in general, but good for
     debugging timing etc.

     #+BEGIN_SRC sh
       #* Use prior variable settings

       #* Run query
       pushd $PROJ
       qo=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-UseAfterFree.sarif
       codeql database analyze --format=sarif-latest --rerun   \
              --output $qo                                     \
              -j6                                              \
              --ram=24000                                      \
              --                                               \
              $DB                                              \
              $PROJ/solutions/UseAfterFree.ql
       popd

       echo "Query results in $qo"
       head -5 "$qo"

       # Query results in /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
       # {
       #   "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
       #   "version" : "2.1.0",
       #   "runs" : [ {
       #     "tool" : {
     #+END_SRC

**** Use directory of queries: 1 database -> 1 sarif file (least effort)
     #+BEGIN_SRC sh
       #* Set environment
       P1_PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       P1_DB=$PROJ/vulnerable-linux-driver-db
       P1_QLQUERYDIR=$PROJ/solutions/
       P1_QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD).sarif

       #* check variables
       set | grep P1_

       #* Run query
       pushd $P1_PROJ
       codeql database analyze --format=sarif-latest --rerun   \
              --output $P1_QUERY_RES_SARIF                     \
              -j6                                              \
              --ram=24000                                      \
              --                                               \
              $P1_DB                                           \
              $P1_PROJ/solutions/
       popd
     #+END_SRC

     We can compare SARIF result sizes:
     #+BEGIN_SRC sh
       ls -la "$qo" $P1_QUERY_RES_SARIF $QUERY_RES_SARIF
     #+END_SRC

     And for these tiny results, it's mostly metadata:
     #+BEGIN_SRC text
       -rw-r--r-- 1 hohn staff 29K Jun 20 10:06 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif
       -rw-r--r-- 1 hohn staff 33K Jun 20 10:02 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189.sarif
       -rw-r--r-- 1 hohn staff 28K Jun 20 09:51 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
     #+END_SRC
     
**** Use suite: 1 database -> 1 sarif file (more flexible, more effort)
     A useful, general purpose template is at
     https://github.com/rvermeulen/codeql-example-project-layout.

***** Documentation
       - [[https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/built-in-codeql-query-suites][built-in-codeql-query-suites]]
       - [[https://docs.github.com/en/code-security/codeql-cli/using-the-codeql-cli/creating-codeql-query-suites][creating-codeql-query-suites]]
         Important:

         You must add at least one query, queries, or qlpack instruction to your
         suite definition, otherwise no queries will be selected. If the suite
         contains no further instructions, all the queries found from the list of
         files, in the given directory, or in the named CodeQL pack are
         selected. If there are further filtering instructions, only queries that
         match the constraints imposed by those instructions will be selected. 

         Also, a suite definition must be /in/ a codeql pack.
***** In short
     #+BEGIN_SRC sh
       codeql resolve qlpacks | grep cpp

       # Copy query suite into the pack
       cd ~/local/codeql-cli-end-to-end
       cp custom-suite-1.qls codeql-workshop-vulnerable-linux-driver/solutions/
       codeql resolve queries \
              codeql-workshop-vulnerable-linux-driver/solutions/custom-suite-1.qls
     #+END_SRC

     #+RESULTS:
     : /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/solutions/UseAfterFree.ql

     #+INCLUDE: "../custom-suite-1.qls" src yaml

*** The importance of versioning
**** TODO CodeQL cli version
     XX: for the sarif-cli
     The CLI versions used against development of the CLI support were: 2.6.3,
     2.9.4, and 2.11.4.

**** Database version
     An attempt to run an analysis with an older version of the cli against a
     database created with a newer cli version will likely abort with an error.

     In terms of commands, the codeql versions used for
     #+BEGIN_SRC sh 
       codeql database create ...
     #+END_SRC
     and
     #+BEGIN_SRC sh
       codeql database analyze ..
     #+END_SRC
     should be the same.

     If you just have a collection of databases, you can check what version of
     the cli produced it.
     The database directory contains the codeql version used in a yaml file,
     a human-readable check:
     #+BEGIN_SRC sh :exports both :results output
       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       grep -A 2 creationMetadata  vulnerable-linux-driver-db/codeql-database.yml 
     #+END_SRC

     #+RESULTS:
     : creationMetadata:
     :   cliVersion: 2.13.0
     :   creationTime: 2023-04-24T21:39:15.963711665Z

**** Query set version
     - For suites in our own source code

         Your query sets /may/ have release versions or tags.  But they almost
         certainly have git commit ids that can be used, like the following:
         #+BEGIN_SRC sh :exports both :results output
           cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
           git rev-parse --short HEAD
         #+END_SRC

         #+RESULTS:
         : d548189

         If you use packs, you can fix the ids of dependencies in the =qlpack.yml=
         file.  In our example, this is done in several places.  The =common=
         version:
         #+BEGIN_SRC sh :exports both :results output
           cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
           cat common/qlpack.yml 
         #+END_SRC

         #+RESULTS:
         : ---
         : library: true
         : name: common
         : version: 0.0.1
         : dependencies:
         :   codeql/cpp-all: 0.7.0

         The dependencies are transitive; both =queries= and =solutions= depend on
         =common=, so packs fixed by common also fix packs used by the others.
         And =common= is fixed by our =git= id, so we're done.

     - Some optional details

         We have specified these packs:
         #+BEGIN_SRC sh :exports both :results output
           cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
           grep codeql/cpp-all */qlpack.yml
         #+END_SRC

         #+RESULTS:
         : common/qlpack.yml:  codeql/cpp-all: 0.7.0
         : queries/qlpack.yml:  codeql/cpp-all: ^0.7.0

         The caret notation =^= means "at least".  So at least version 0.7.0. 

         After we install packs via
         #+begin_src sh 
           codeql pack install --no-strict-mode ...
         #+end_src
         some lock files are generated, and those fix versions further down the
         dependency chain:
         #+begin_src sh :exports both :results output
           cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
           cat common/codeql-pack.lock.yml
         #+end_src

         #+results:
         #+begin_example
         ---
         lockversion: 1.0.0
         dependencies:
           codeql/cpp-all:
             version: 0.7.0
           codeql/ssa:
             version: 0.0.15
           codeql/tutorial:
             version: 0.0.8
           codeql/util:
             version: 0.0.8
         compiled: false
         #+end_example

     - Note that a query suite is always in a codeql pack, so the pack id is also
       the suite id.

       For example, above we copied a suite and resolved it:
       #+begin_src sh :exports both :results output
         cd ~/local/codeql-cli-end-to-end
         cp custom-suite-1.qls codeql-workshop-vulnerable-linux-driver/solutions/
         codeql resolve queries \
                codeql-workshop-vulnerable-linux-driver/solutions/custom-suite-1.qls
       #+end_src
       #+results:
       : /users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/solutions/useafterfree.ql

       To assign a version number, we can use the revision id:
       #+begin_src sh :exports both :results output
         cd ~/local/codeql-cli-end-to-end
         git rev-parse --short head
       #+end_src

       #+RESULTS:
       : 94fd0a3

     - For manually selected library suites

       For a library suite, we can use the pack id.  For example, we can 
       list the packs
       #+BEGIN_SRC sh :exports both :results output
         export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
         codeql resolve qlpacks | grep cpp
       #+END_SRC

       #+RESULTS:
       : codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3)
       : codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0)
       : codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3)

       Following the last one, we can find some query suites manually.
       The pack is already known; 0.6.3.
       #+BEGIN_SRC sh :exports both :results output
         find ~/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3 \
              -name "*.qls"
       #+END_SRC
      
       #+RESULTS:
       : /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-security-extended.qls
       : /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-security-and-quality.qls
       : /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-security-experimental.qls
       : /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-code-scanning.qls
       : /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-lgtm-full.qls
       : /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/codeql-suites/cpp-lgtm.qls

     - For predefined suites from =codeql resolve queries=

       A full list of suites is produced via =codeql resolve queries=, here is a
       filtered version.
       #+BEGIN_SRC sh :exports both :results output
         export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
         codeql resolve queries 2>&1  | grep cpp
       #+END_SRC

       #+RESULTS:
       :   cpp-code-scanning.qls - Standard Code Scanning queries for C and C++
       :   cpp-lgtm-full.qls - Standard LGTM queries for C/C++, including ones not displayed by default
       :   cpp-lgtm.qls - Standard LGTM queries for C/C++
       :   cpp-security-and-quality.qls - Security-and-quality queries for C and C++
       :   cpp-security-experimental.qls - Extended and experimental security queries for C and C++
       :   cpp-security-extended.qls - Security-extended queries for C and C++

       The following just counts the list but notice the header output has version
       info reported on =stderr=:
       #+BEGIN_SRC sh :exports both :results output
         export PATH=$HOME/local/codeql-cli-end-to-end/codeql:"$PATH"
         ( codeql resolve queries cpp-code-scanning.qls | wc ) 2>&1
       #+END_SRC

       #+RESULTS:
       : Recording pack reference codeql/cpp-queries at /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3.
       : Recording pack reference codeql/suite-helpers at /Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3/.codeql/libraries/codeql/suite-helpers/0.5.3.
       :       47      65    5813

       So we can use the codeql/cpp-queries version, 0.6.3, if we run the
       =cpp-code-scanning.qls= query suite.

    The difference in the last two approaches is the way the suite is chosen.  The
    version number will be the same.

** Review results
*** SARIF Documentation
    The standard is defined at
    https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html 
*** SARIF viewer plugin
**** Install plugin in VS Code
     https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer

     Sarif Viewer
     v3.3.7
     Microsoft DevLabs
     microsoft.com
     53,335
     (1)

**** Review
     #+BEGIN_SRC sh
       cd ~/local/codeql-cli-end-to-end
       find . -maxdepth 2 -name "*.sarif" 
     #+END_SRC
     Pick one in VS Code.  Either
     #+BEGIN_SRC sh
       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       cd codeql-workshop-vulnerable-linux-driver/
       code d548189.sarif
     #+END_SRC
     or manually.

     We need the source.

     #+BEGIN_SRC sh
       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       git submodule init
       git submodule update
     #+END_SRC

     When we review, VS Code will ask for the path.

     #+BEGIN_SRC sh
       cd /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable_linux_driver
       ls src/vuln_driver.c
     #+END_SRC

     #+RESULTS:
     : src/vuln_driver.c

     Reviewing looks as follows.
     #+ATTR_HTML: :alt sarif viewer :width 90%
     [[./img/sarif-view-1.png]]

*** View raw sarif with =jq=
    List the SARIF files again 
    #+BEGIN_SRC sh :exports both :results output
      cd ~/local/codeql-cli-end-to-end
      find . -maxdepth 2 -name "*.sarif" 
    #+END_SRC

    #+RESULTS:
    : ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif
    : ./codeql-workshop-vulnerable-linux-driver/d548189.sarif
    : ./codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif
    : ./codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
    : ./codeql-workshop-vulnerable-linux-driver/e402cf5-BufferOverflow.sarif

    The CodeQL version
    #+BEGIN_SRC sh :exports both :results output
      cd ~/local/codeql-cli-end-to-end
      jq '.runs | .[0] | .tool.driver.semanticVersion ' < ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif
    #+END_SRC

    #+RESULTS:
    : "2.13.4"

    The names of rules processed
    #+BEGIN_SRC sh :exports both :results output
      cd ~/local/codeql-cli-end-to-end
      jq '.runs | .[] | .tool.driver.rules | .[] | .name ' < ./codeql-workshop-vulnerable-linux-driver/d548189.sarif
    #+END_SRC

    #+RESULTS:
    : "cpp/buffer_overflow"
    : "cpp/use_after_free"

*** View raw sarif with =jq= and fzf
    Install the fuzzy finder
    : brew install fzf
    or =apt-get=/=yum= on linux

    Try working to =.runs[0].tool.driver.rules= and follow the output in real
    time.

    #+BEGIN_SRC sh
      pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
      res=e402cf5-UseAfterFree.sarif
      echo '' | fzf --print-query --preview="jq {q} < $res"
      popd
    #+END_SRC

*** sarif-cli
**** Setup / local install
     Clone https://github.com/hohn/sarif-cli or
     https://github.com/knewbury01/sarif-cli 

     #+BEGIN_SRC sh
       cd ~/local/codeql-cli-end-to-end
       git clone git@github.com:hohn/sarif-cli.git

       cd ~/local/codeql-cli-end-to-end/sarif-cli
git checkout 203343df
       python3.9 -m venv .venv
       . .venv/bin/activate

       python -m pip install -r requirementsDEV.txt

       # Put bin/ contents into venv PATH
       pip install -e .
     #+END_SRC

**** Compiler-style textual output from SARIF
     The sarif-cli has several script to use from the shell level:
     #+BEGIN_SRC sh :exports both :results output
       cd ~/local/codeql-cli-end-to-end/sarif-cli
       ls -1 bin/
     #+END_SRC

     #+RESULTS:
     #+begin_example
     json-to-yaml
     sarif-aggregate-scans
     sarif-create-aggregate-report
     sarif-digest
     sarif-extract-multi
     sarif-extract-scans
     sarif-extract-scans-runner
     sarif-extract-tables
     sarif-labeled
     sarif-list-files
     sarif-pad-aggregate
     sarif-results-summary
     sarif-to-dot
     #+end_example
     
     The simplest one just list the source files found during analysis:
     #+BEGIN_SRC sh :exports both :results output
       . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       sarif-list-files d548189.sarif 
     #+END_SRC

     #+RESULTS:
     : src/buffer_overflow.h
     : src/use_after_free.h
     : src/vuln_driver.c

     Much more useful is a compiler-style summary of all results found: 
     #+BEGIN_SRC sh :exports both :results output
       . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       sarif-results-summary  d548189.sarif
     #+END_SRC

     This sarif file has only two results, so the output is short:

     #+RESULTS:
     #+begin_example
     RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
     PATH 0
     FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
     FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
     FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
     FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size

     RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
     The dangling pointer is used here: [fn](2)
     The dangling pointer is used here: [arg](3)
     The dangling pointer is used here: [fn](4)
     The dangling pointer is used here: [arg](5)
     #+end_example

     This illustrates the differences in the output between the two result =@kind=
     s:
     - =@kind problem= is a single list of results found
     - =@kind path-problem= is a list of flow paths.  Each path in turn is a list
       of locations.

     Most of these scripts take options that significantly change their output; to
     see them, use the =-h= or =--help= flags.  E.g.,
     #+BEGIN_SRC sh :exports both :results output
       . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
       sarif-results-summary -h
     #+END_SRC

     #+RESULTS:
     #+begin_example
     usage: sarif-results-summary [-h] [-s srcroot] [-r] [-e] [-c] sarif-file

     summary of results

     positional arguments:
       sarif-file            input file, - for stdin

     optional arguments:
       -h, --help            show this help message and exit
       -s srcroot, --list-source srcroot
                             list source snippets using srcroot as sarif SRCROOT
       -r, --related-locations
                             list related locations like "hides [parameter](1)"
       -e, --endpoints-only  only list source and sink, dropping the path.
                             Identical, successive source/sink pairs are combined
       -c, --csv             output csv instead of human-readable summary
     #+end_example

     Some of these make output much more informative, like =-r= and =-s=:

     With =-r=:
     
     #+BEGIN_SRC sh :exports both :results output
       . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       sarif-results-summary -r d548189.sarif
     #+END_SRC

     #+RESULTS:
     #+begin_example
     RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
     REFERENCE: src/buffer_overflow.h:20:17:20:23: memcpy
     REFERENCE: src/buffer_overflow.h:8:22:8:33: stack buffer
     PATH 0
     FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
     FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
     FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
     FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size

     RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
     The dangling pointer is used here: [fn](2)
     The dangling pointer is used here: [arg](3)
     The dangling pointer is used here: [fn](4)
     The dangling pointer is used here: [arg](5)
     REFERENCE: src/use_after_free.h:84:22:84:24: fn
     REFERENCE: src/use_after_free.h:87:70:87:72: fn
     REFERENCE: src/use_after_free.h:87:90:87:93: arg
     REFERENCE: src/use_after_free.h:89:20:89:22: fn
     REFERENCE: src/use_after_free.h:89:39:89:42: arg
     #+end_example
     
     If the source code is available, we can use =-s= to include snippets in the
     output.  This effectively converts sarif to the format used by gcc and clang
     to report warnings and errors.
     #+BEGIN_SRC sh :exports both :results output
       . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
       sarif-results-summary  -s vulnerable_linux_driver/ d548189.sarif
     #+END_SRC

     #+RESULTS:
     #+begin_example
     RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
                     memcpy(kernel_buff, buff, size);
                                               ^^^^
     PATH 0
     FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
     static long do_ioctl(struct file *filp, unsigned int cmd, unsigned long args)
                                                                             ^^^^
     FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
        buffer_overflow((char *) args);
                        ^^^^^^^^^^^^^
     FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
      static int buffer_overflow(char __user *buff)
                                              ^^^^
     FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
                     memcpy(kernel_buff, buff, size);
                                               ^^^^

     RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
     The dangling pointer is used here: [fn](2)
     The dangling pointer is used here: [arg](3)
     The dangling pointer is used here: [fn](4)
     The dangling pointer is used here: [arg](5)
      uaf_obj *global_uaf_obj = NULL;
               ^^^^^^^^^^^^^^
     #+end_example

**** SQL conversion -- not compatible with codeql v2.13.4
     The ultimate purpose of the sarif-cli is producing CSV files for import into
     SQL databases.  This requires a completely defined static structure, without
     any optional fields.  The internals of the tool are beyond the scope of this
     workshop, some details are their external effects are important:

     1. a (very large and comprehensive) type signature is defined in sarif-cli
     2. sarif files that have extra fields not in the signature will produce warnings
     3. sarif files that are missing fields from the signature will produce a fatal 
        error.  A message will be printed and the scripts will abort.
     4. Sometimes, sarif files will have a field but no content.  For a number of
        these, dummy values are inserted.  One example are queries that don't
        produce line numbers in their output; for those, -1 is used as value.

     Unfortunately, this version of codeql
     #+BEGIN_SRC sh 
       cd ~/local/codeql-cli-end-to-end
       ./codeql/codeql --version
     #+END_SRC

     : CodeQL command-line toolchain release 2.13.4.
     : Copyright (C) 2019-2023 GitHub, Inc.
     : Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
     :    Analysis results depend critically on separately distributed query and
     :    extractor modules. To list modules that are visible to the toolchain,
     :    use 'codeql resolve qlpacks' and 'codeql resolve languages'.

     has signature changes incompatible with (the older) sarif-cli (version
     e62c351) 

     #     #+BEGIN_SRC sh :exports both :results output
     #       . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate 
     #       cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
     #       # sarif-extract-tables d548189.sarif tabular
     #       sarif-extract-scans-runner - <<EOF
     #       d548189.sarif
     #       EOF
     #       echo d548189.sarif | sarif-extract-scans-runner - - EOF
     #     #+END_SRC

** Running sequence
*** Smallest query suite (security suite).
*** Check results.
**** Lots of result (> 5000) -> cli review via compiler-style dump.
**** Medium result sets (~ 2000) (sarif review plugin, can only load 5000
     results) 
**** Few results (sarif review plugin, can only load 5000 results)
*** Expand query 
** Compare results.
*** sarif-cli using compiler-style dump
** Miscellany
   - Scale factor for building DBs: Common case: 15 minutes for a parallel cpp
     compilation can be a 2 hour database build for codeql.