codeql-cli-end-to-end

  cd ~/local/codeql-cli-end-to-end
  git clone git@github.com:hohn/codeql-workshop-vulnerable-linux-driver.git
  cd codeql-workshop-vulnerable-linux-driver/
  unzip vulnerable-linux-driver.zip
  tree -L 2 vulnerable-linux-driver-db/
  vulnerable-linux-driver-db/
  ├── codeql-database.yml
  ├── db-cpp
  │   ├── default
  │   ├── semmlecode.cpp.dbscheme
  │   └── semmlecode.cpp.dbscheme.stats
  └── src.zip

  3 directories, 4 files
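
A quick sanity check before going further can be sketched as below. The helper name `is_codeql_db` is hypothetical, and it only tests for the `codeql-database.yml` marker file seen in the listing above; it does not validate the database contents.

```shell
# is_codeql_db DIR -- succeed if DIR looks like an unpacked CodeQL database.
# Hypothetical helper: it only checks for the marker file at the top level.
is_codeql_db() {
    [ -f "$1/codeql-database.yml" ]
}

# Example:
#   is_codeql_db vulnerable-linux-driver-db && echo "database found"
```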

DONE Quick check using VS Code. Same steps will repeat:

select DB

select query

run query

view results

DONE Install codeql

Full docs:

https://docs.github.com/en/code-security/codeql-cli/using-the-codeql-cli/getting-started-with-the-codeql-cli#getting-started-with-the-codeql-cli

https://docs.github.com/en/code-security/code-scanning/using-codeql-code-scanning-with-your-existing-ci-system/installing-codeql-cli-in-your-ci-system#setting-up-the-codeql-cli-in-your-ci-system

In short:

  cd ~/local/codeql-cli-end-to-end
  # Decide on version / os via browser, then: 
  wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.13.4/codeql-bundle-osx64.tar.gz

  # Fix attributes on mac
  if [ `uname` = Darwin ] ; then
      xattr -c *.tar.gz
  fi

  # Extract
  tar zxf ./codeql-bundle-osx64.tar.gz

  # Check binary
  pwd
  # /Users/hohn/local/codeql-cli-end-to-end
  ./codeql/codeql --version
  # CodeQL command-line toolchain release 2.13.4.
  # Copyright (C) 2019-2023 GitHub, Inc.
  # Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
  #    Analysis results depend critically on separately distributed query and
  #    extractor modules. To list modules that are visible to the toolchain,
  #    use 'codeql resolve qlpacks' and 'codeql resolve languages'.

  # Check packs
  ./codeql/codeql resolve qlpacks | head -5
  # codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3)
  # codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0)
  # codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3)
  # codeql/csharp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-all/0.6.3)
  # codeql/csharp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-examples/0.0.0) 

  # Fix the path
  export PATH=$(pwd -P)/codeql:"$PATH"

  # Check languages
  codeql resolve languages | head -5
  # go (/Users/hohn/local/codeql-cli-end-to-end/codeql/go)
  # python (/Users/hohn/local/codeql-cli-end-to-end/codeql/python)
  # java (/Users/hohn/local/codeql-cli-end-to-end/codeql/java)
  # html (/Users/hohn/local/codeql-cli-end-to-end/codeql/html)
  # xml (/Users/hohn/local/codeql-cli-end-to-end/codeql/xml)
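
To avoid picking the archive by hand, the URL can be built from the version and the OS. This is a sketch only: the function name `bundle_url` is made up here, and the `linux64` asset name is assumed to follow the same pattern as the `osx64` one used above.

```shell
# bundle_url VERSION [OS] -- print the download URL of a CodeQL bundle.
# OS defaults to the output of `uname`. Hypothetical helper; the asset
# naming for Linux is an assumption based on the osx64 URL above.
bundle_url() {
    version=$1
    os=${2:-$(uname)}
    case "$os" in
        Darwin) platform=osx64 ;;
        Linux)  platform=linux64 ;;
        *)      echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    echo "https://github.com/github/codeql-action/releases/download/codeql-bundle-$version/codeql-bundle-$platform.tar.gz"
}

# Example:
#   wget "$(bundle_url v2.13.4)"
```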

A more fancy version

  # Reference urls:
  # https://github.com/github/codeql-cli-binaries/releases/download/v2.8.0/codeql-linux64.zip
  # https://github.com/github/codeql/archive/refs/tags/codeql-cli/v2.8.0.zip
  #
  # grab -- retrieve and extract codeql cli and library
  # Usage: grab version platform prefix
  grab() {
      version=$1; shift
      platform=$1; shift
      prefix=$1; shift
      mkdir -p $prefix/codeql-$version &&
          cd $prefix/codeql-$version || return

      # Get cli
      wget "https://github.com/github/codeql-cli-binaries/releases/download/$version/codeql-$platform.zip"
      # Get lib
      wget "https://github.com/github/codeql/archive/refs/tags/codeql-cli/$version.zip"
      # Fix attributes
      if [ `uname` = Darwin ] ; then
          xattr -c *.zip
      fi
      # Extract
      unzip -q codeql-$platform.zip
      unzip -q $version.zip
      # Rename library directory for VS Code
      mv codeql-codeql-cli-$version/ ql
      # remove archives?
      # rm codeql-$platform.zip
      # rm $version.zip
  }

  grab v2.7.6 osx64 $HOME/local
  grab v2.8.3 osx64 $HOME/local
  grab v2.8.4 osx64 $HOME/local

  grab v2.6.3 linux64 /opt

  grab v2.6.3 osx64 $HOME/local
  grab v2.4.6 osx64 $HOME/local
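
With several versions installed side by side, switching between them is a matter of adjusting PATH. A minimal sketch, assuming the `PREFIX/codeql-VERSION/codeql` layout that `grab` produces; the name `use_codeql` is hypothetical.

```shell
# use_codeql VERSION PREFIX -- put one grab-installed CLI first on PATH.
# Hypothetical helper; assumes the CLI zip unpacked into
# PREFIX/codeql-VERSION/codeql as done by grab.
use_codeql() {
    dir="$2/codeql-$1/codeql"
    if [ ! -x "$dir/codeql" ]; then
        echo "no codeql binary in $dir" >&2
        return 1
    fi
    PATH="$dir:$PATH"
    export PATH
}

# Example:
#   use_codeql v2.8.4 "$HOME/local" && codeql --version
```

Earlier entries stay on PATH behind the new one, so repeated calls grow the variable; for a throwaway shell session that is usually acceptable.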

Most flexible in use, but more initial setup

gh, the GitHub command-line tool from https://github.com/cli/cli

gh api repos/{owner}/{repo}/releases

https://cli.github.com/manual/gh_api

gh extension create

https://cli.github.com/manual/gh_extension

gh codeql extension

https://github.com/github/gh-codeql

gh gist list

https://cli.github.com/manual/gh_gist_list

  gh codeql
  GitHub command-line wrapper for the CodeQL CLI.
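
As an illustration of the `gh api` route, the release tags of the CLI binaries repository can be listed as sketched below. The wrapper name `list_cli_releases` is hypothetical; it needs an installed and authenticated `gh`, and the API returns only the first page of releases unless asked for more.

```shell
# list_cli_releases -- print release tags of the CodeQL CLI binaries repo.
# Hypothetical wrapper around `gh api`; --jq filters the JSON response.
list_cli_releases() {
    gh api repos/github/codeql-cli-binaries/releases --jq '.[].tag_name'
}

# Example:
#   list_cli_releases | head -5
```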

Install pack dependencies

Full docs

https://docs.github.com/en/code-security/codeql-cli/codeql-cli-reference/about-codeql-packs#about-qlpackyml-files

https://docs.github.com/en/code-security/codeql-cli/codeql-cli-manual/pack-install

View installed docs via `-h` flag, highly recommended

  # Overview
  codeql -h

  # Sub 1
  codeql pack -h

  # Sub 2
  codeql pack install -h

In short

Create the qlpack

Create the qlpack files if not there, one per directory. In this project, that's already done:

  find codeql-workshop-vulnerable-linux-driver -name "qlpack.yml"
  codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
  codeql-workshop-vulnerable-linux-driver/solutions/qlpack.yml
  codeql-workshop-vulnerable-linux-driver/common/qlpack.yml

For example:

cat codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml

shows

  ---
  library: false
  name: queries
  version: 0.0.1
  dependencies:
    codeql/cpp-all: ^0.7.0
    common: "*"

So the queries directory does not contain a library, but it depends on one,

cat codeql-workshop-vulnerable-linux-driver/common/qlpack.yml

  ---
  library: true
  name: common
  version: 0.0.1
  dependencies:
    codeql/cpp-all: 0.7.0

Install each pack's dependencies

The first time you install dependencies, it's a good idea to do this manually, per qlpack.yml file, and deal with any errors that may occur.

  pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  codeql pack install --no-strict-mode queries/

After the initial setup, and for automation, install each pack's dependencies by calling codeql pack install in a loop:

  pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  find . -name "qlpack.yml"
  # ./queries/qlpack.yml
  # ./solutions/qlpack.yml
  # ./common/qlpack.yml

  codeql pack install --no-strict-mode queries/
  # Dependencies resolved. Installing packages...
  # Install location: /Users/hohn/.codeql/packages
  # Nothing to install.
  # Package install location: /Users/hohn/.codeql/packages
  # Nothing downloaded.

  for sub in `find . -name "qlpack.yml" | sed s@qlpack.yml@@g;`
  do
      codeql pack install --no-strict-mode $sub
  done
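
The backtick loop above splits on whitespace, so it breaks on directory names with spaces. A sketch of a variant that reads line by line and stops at the first failure; the function names `pack_dirs` and `install_all_packs` are hypothetical.

```shell
# pack_dirs ROOT -- print each directory under ROOT that holds a qlpack.yml.
pack_dirs() {
    find "$1" -name qlpack.yml -exec dirname {} \;
}

# install_all_packs ROOT -- run `codeql pack install` once per pack directory.
# Reads one directory per line, so names containing spaces survive.
install_all_packs() {
    pack_dirs "$1" | while IFS= read -r dir; do
        codeql pack install --no-strict-mode "$dir" || return 1
    done
}

# Example:
#   install_all_packs ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
```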

Run queries

Individual: 1 database -> N sarif files

  #* Set environment
  PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  DB=$PROJ/vulnerable-linux-driver-db
  QLQUERY=$PROJ/solutions/BufferOverflow.ql
  QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-BufferOverflow.sarif

  #* Run query
  pushd $PROJ
  codeql database analyze --format=sarif-latest --rerun   \
         --output $QUERY_RES_SARIF                        \
         -j6                                              \
         --ram=24000                                      \
         --                                               \
         $DB                                              \
         $QLQUERY

  # if you get
      # fatal error occurred: Error initializing the IMB disk cache: the cache
      # directory is already locked by another running process. Only one instance of
      # the IMB can access a cache directory at a time. The lock file is located at
      # /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable-linux-driver-db/db-cpp/default/cache/.lock
  #  exit vs code and try again

And after some time:

  BufferOverflow.ql: [1/1 eval 1.8s] Results written to solutions/BufferOverfl
  Shutting down query evaluator.
  Interpreting results.

  echo The query $QLQUERY
  echo run on $DB
  echo produced output in $QUERY_RES_SARIF:
  head -5 $QUERY_RES_SARIF
  # {
  #   "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
  #   "version" : "2.1.0",
  #   "runs" : [ {
  #     "tool" : {
  # ...

Running another query produces another sarif file. This is a bad idea in general, but useful for debugging, timing, etc.

  #* Use prior variable settings

  #* Run query
  pushd $PROJ
  qo=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-UseAfterFree.sarif
  codeql database analyze --format=sarif-latest --rerun   \
         --output $qo                                     \
         -j6                                              \
         --ram=24000                                      \
         --                                               \
         $DB                                              \
         $PROJ/solutions/UseAfterFree.ql
  popd

  echo "Query results in $qo"
  head -5 "$qo"

  # Query results in /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
  # {
  #   "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
  #   "version" : "2.1.0",
  #   "runs" : [ {
  #     "tool" : {
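
The two runs above differ only in the query and the output name, so they can be folded into a loop. A sketch under the naming pattern used above (`<commit>-<query>.sarif`); `sarif_path` and `analyze_each` are hypothetical names, and the `-j` and `--ram` settings are left out for brevity.

```shell
# sarif_path PROJ COMMIT QUERY -- print the output name for one query,
# following the <commit>-<query>.sarif pattern used above.
sarif_path() {
    echo "$1/$2-$(basename "$3" .ql).sarif"
}

# analyze_each PROJ DB QUERY... -- run every query separately, one SARIF each.
# Hypothetical wrapper; stops at the first failing analysis.
analyze_each() {
    proj=$1; db=$2; shift 2
    commit=$(cd "$proj" && git rev-parse --short HEAD) || return 1
    for query in "$@"; do
        codeql database analyze --format=sarif-latest --rerun \
               --output "$(sarif_path "$proj" "$commit" "$query")" \
               -- "$db" "$query" || return 1
    done
}

# Example:
#   analyze_each "$PROJ" "$DB" "$PROJ"/solutions/*.ql
```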

Use directory of queries: 1 database -> 1 sarif file (least effort)

  #* Set environment
  P1_PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  P1_DB=$P1_PROJ/vulnerable-linux-driver-db
  P1_QLQUERYDIR=$P1_PROJ/solutions/
  P1_QUERY_RES_SARIF=$P1_PROJ/$(cd $P1_PROJ && git rev-parse --short HEAD).sarif

  #* check variables
  set | grep P1_

  #* Run query
  pushd $P1_PROJ
  codeql database analyze --format=sarif-latest --rerun   \
         --output $P1_QUERY_RES_SARIF                     \
         -j6                                              \
         --ram=24000                                      \
         --                                               \
         $P1_DB                                           \
         $P1_QLQUERYDIR
  popd

We can compare SARIF result sizes:

  ls -lah "$qo" $P1_QUERY_RES_SARIF $QUERY_RES_SARIF

And for these tiny results, it's mostly metadata:

  -rw-r--r-- 1 hohn staff 29K Jun 20 10:06 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif
  -rw-r--r-- 1 hohn staff 33K Jun 20 10:02 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189.sarif
  -rw-r--r-- 1 hohn staff 28K Jun 20 09:51 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif

Use suite: 1 database -> 1 sarif file (more flexible, more effort)

A useful, general purpose template is at https://github.com/rvermeulen/codeql-example-project-layout.

Documentation

built-in-codeql-query-suites
creating-codeql-query-suites

Important: You must add at least one query, queries, or qlpack instruction to your suite definition, otherwise no queries will be selected. If the suite contains no further instructions, all the queries found from the list of files, in the given directory, or in the named CodeQL pack are selected. If there are further filtering instructions, only queries that match the constraints imposed by those instructions will be selected. Also, a suite definition must be in a CodeQL pack.

In short

  codeql resolve qlpacks | grep cpp

  # Copy query suite into the pack
  cd ~/local/codeql-cli-end-to-end
  cp custom-suite-1.qls codeql-workshop-vulnerable-linux-driver/solutions/
  codeql resolve queries \
         codeql-workshop-vulnerable-linux-driver/solutions/custom-suite-1.qls

The suite definition, custom-suite-1.qls:

  # 
  # Taken from
  # codeql-v2.12.3/codeql/qlpacks/codeql/suite-helpers/0.4.3/code-scanning-selectors.yml
  # and modified
  # 
  - description: Security sample queries
  - queries: .
  # - qlpack: some-pack-cpp
  - include:
      kind:
      # UseAfterFree
      - problem
      # # BufferOverflow
      # - path-problem
      # precision:
      # - high
      # - very-high
      # problem.severity:
      # - error
      # tags contain:
      # - security

  # - exclude:
  #     deprecated: //
  # - exclude:
  #     query path:
  #       - /^experimental\/.*/
  #       - Metrics/Summaries/FrameworkCoverage.ql
  #       - /Diagnostics/Internal/.*/
  # - exclude:
  #     tags contain:
  #       - modelgenerator
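
Once `codeql resolve queries` lists the expected queries, the suite can be passed to `codeql database analyze` in place of a query or directory. A sketch; the wrapper name `analyze_suite` is hypothetical, and the `-j` and `--ram` settings used earlier are omitted.

```shell
# analyze_suite DB SUITE OUT -- run every query the suite selects,
# writing a single SARIF file. Hypothetical wrapper.
analyze_suite() {
    codeql database analyze --format=sarif-latest --rerun \
           --output "$3" -- "$1" "$2"
}

# Example, reusing the variables set earlier:
#   analyze_suite "$DB" "$PROJ/solutions/custom-suite-1.qls" "$PROJ/suite-1.sarif"
```

With the suite as written above (only `kind: problem` included), the run should pick up UseAfterFree but not BufferOverflow, going by the comments in the suite.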

TODO Include versioning:

TODO codeql cli

TODO query set version
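
One possible approach to the CLI part of this TODO is to pull the version number out of the `codeql --version` banner shown earlier and put it in the output name. A sketch only; `cli_version` is a hypothetical helper and relies on the first banner line keeping its current wording.

```shell
# cli_version -- read the `codeql --version` banner on stdin and print
# just the version number. Hypothetical helper; depends on the banner text.
cli_version() {
    sed -n 's/^CodeQL command-line toolchain release \([0-9][0-9.]*[0-9]\)\.*$/\1/p'
}

# Example:
#   out="$(git rev-parse --short HEAD)-cli$(codeql --version | cli_version).sarif"
```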

Checks:

For building DBs, the common case: 15 minutes for a parallel (||) cpp compilation can be 2 h with codeql.

Review results

SARIF Documentation

The standard is defined at https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html

SARIF viewer plugin

Install plugin in VS Code

https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer

Sarif Viewer v3.3.7, Microsoft DevLabs

Review

  cd ~/local/codeql-cli-end-to-end
  find . -maxdepth 2 -name "*.sarif"

Pick one in VS Code. Either

  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  code d548189.sarif

or manually.

We need the source.

  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  git submodule init
  git submodule update

When we review, VS Code will ask for the path.

  cd /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable_linux_driver
  ls src/vuln_driver.c

Reviewing looks as follows.

View raw sarif with `jq`

List the SARIF files again

  cd ~/local/codeql-cli-end-to-end
  find . -maxdepth 2 -name "*.sarif"

The CodeQL version

  cd ~/local/codeql-cli-end-to-end
  jq '.runs | .[0] | .tool.driver.semanticVersion ' < ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif

2.13.4

The names of rules processed

  cd ~/local/codeql-cli-end-to-end
  jq '.runs | .[] | .tool.driver.rules | .[] | .name ' < ./codeql-workshop-vulnerable-linux-driver/d548189.sarif

cpp/buffer_overflow

cpp/use_after_free
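
The same style of query gives a result count per rule, which is a useful first look at a larger SARIF file. A sketch, assuming every result carries a `ruleId` field as CodeQL output normally does; the name `results_per_rule` is hypothetical.

```shell
# results_per_rule FILE -- print "<ruleId> <count>" for each rule with results.
# Hypothetical helper; needs jq. Assumes each result has a ruleId.
results_per_rule() {
    jq -r '.runs[].results[].ruleId' "$1" | sort | uniq -c | awk '{print $2, $1}'
}

# Example:
#   results_per_rule ./codeql-workshop-vulnerable-linux-driver/d548189.sarif
```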

View raw sarif with `jq` and fzf

Install the fuzzy finder

brew install fzf

or apt-get / yum on Linux

Try working your way to .runs[0].tool.driver.rules, following the output in real time as you type.

  pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  res=e402cf5-UseAfterFree.sarif
  echo '' | fzf --print-query --preview="jq {q} < $res"
  popd

sarif-cli

Setup / local install

Clone https://github.com/hohn/sarif-cli or https://github.com/knewbury01/sarif-cli

  cd ~/local/codeql-cli-end-to-end
  git clone git@github.com:hohn/sarif-cli.git

  cd ~/local/codeql-cli-end-to-end/sarif-cli
  python3.9 -m venv .venv
  . .venv/bin/activate

  python -m pip install -r requirementsDEV.txt

  # Put bin/ contents into venv PATH
  pip install -e .

Compiler-style textual output from SARIF

The sarif-cli has several scripts to use from the shell level:

  cd ~/local/codeql-cli-end-to-end/sarif-cli
  ls -1 bin/

  json-to-yaml
  sarif-aggregate-scans
  sarif-create-aggregate-report
  sarif-digest
  sarif-extract-multi
  sarif-extract-scans
  sarif-extract-scans-runner
  sarif-extract-tables
  sarif-labeled
  sarif-list-files
  sarif-pad-aggregate
  sarif-results-summary
  sarif-to-dot

The simplest one just lists the source files found during analysis:

  . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  sarif-list-files d548189.sarif

src/buffer_overflow.h
src/use_after_free.h
src/vuln_driver.c

Much more useful is a compiler-style summary of all results found:

  . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  sarif-results-summary  d548189.sarif

This sarif file has only two results, so the output is short:

  RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
  PATH 0
  FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
  FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
  FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
  FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size

  RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
  The dangling pointer is used here: [fn](2)
  The dangling pointer is used here: [arg](3)
  The dangling pointer is used here: [fn](4)
  The dangling pointer is used here: [arg](5)

This illustrates the differences in the output between the two result kinds:

@kind problem is a single list of results found
@kind path-problem is a list of flow paths. Each path in turn is a list of locations.

Most of these scripts take options that significantly change their output; to see them, use the -h or --help flags. E.g.,

  . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
  sarif-results-summary -h

  usage: sarif-results-summary [-h] [-s srcroot] [-r] [-e] [-c] sarif-file

  summary of results

  positional arguments:
    sarif-file            input file, - for stdin

  optional arguments:
    -h, --help            show this help message and exit
    -s srcroot, --list-source srcroot
                          list source snippets using srcroot as sarif SRCROOT
    -r, --related-locations
                          list related locations like "hides [parameter](1)"
    -e, --endpoints-only  only list source and sink, dropping the path.
                          Identical, successive source/sink pairs are combined
    -c, --csv             output csv instead of human-readable summary

Some of these make output much more informative, like -r and -s:

With -r:

  . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  sarif-results-summary -r d548189.sarif

  RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
  REFERENCE: src/buffer_overflow.h:20:17:20:23: memcpy
  REFERENCE: src/buffer_overflow.h:8:22:8:33: stack buffer
  PATH 0
  FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
  FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
  FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
  FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size

  RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
  The dangling pointer is used here: [fn](2)
  The dangling pointer is used here: [arg](3)
  The dangling pointer is used here: [fn](4)
  The dangling pointer is used here: [arg](5)
  REFERENCE: src/use_after_free.h:84:22:84:24: fn
  REFERENCE: src/use_after_free.h:87:70:87:72: fn
  REFERENCE: src/use_after_free.h:87:90:87:93: arg
  REFERENCE: src/use_after_free.h:89:20:89:22: fn
  REFERENCE: src/use_after_free.h:89:39:89:42: arg

If the source code is available, we can use -s to include snippets in the output. This effectively converts sarif to the format used by gcc and clang to report warnings and errors.

  . ~/local/codeql-cli-end-to-end/sarif-cli/.venv/bin/activate
  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  sarif-results-summary  -s vulnerable_linux_driver/ d548189.sarif

  RESULT: src/buffer_overflow.h:20:43:20:47: User-controlled size argument in call to [memcpy](1) copying to a [stack buffer](2)
                  memcpy(kernel_buff, buff, size);
                                            ^^^^
  PATH 0
  FLOW STEP 0: src/vuln_driver.c:17:73:17:77: args
  static long do_ioctl(struct file *filp, unsigned int cmd, unsigned long args)
                                                                          ^^^^
  FLOW STEP 1: src/vuln_driver.c:28:20:28:33: args
     buffer_overflow((char *) args);
                     ^^^^^^^^^^^^^
  FLOW STEP 2: src/buffer_overflow.h:6:42:6:46: buff
   static int buffer_overflow(char __user *buff)
                                           ^^^^
  FLOW STEP 3: src/buffer_overflow.h:20:43:20:47: size
                  memcpy(kernel_buff, buff, size);
                                            ^^^^

  RESULT: src/use_after_free.h:28:11:28:25: The dangling pointer is used here: [fn](1)
  The dangling pointer is used here: [fn](2)
  The dangling pointer is used here: [arg](3)
  The dangling pointer is used here: [fn](4)
  The dangling pointer is used here: [arg](5)
   uaf_obj *global_uaf_obj = NULL;
            ^^^^^^^^^^^^^^
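
With more than a handful of SARIF files, the same report can be written next to each one. A sketch that combines the `-r` and `-s` flags from the help text above (their combination is assumed to work as it does individually); the function name `summarize_all` is hypothetical.

```shell
# summarize_all DIR SRCROOT -- write a compiler-style report next to each
# SARIF file in DIR, e.g. d548189.sarif -> d548189.txt. Hypothetical helper.
summarize_all() {
    for sarif in "$1"/*.sarif; do
        [ -e "$sarif" ] || continue   # no matches: the glob stays literal
        sarif-results-summary -r -s "$2" "$sarif" > "${sarif%.sarif}.txt" || return 1
    done
}

# Example, with the sarif-cli venv active:
#   cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
#   summarize_all . vulnerable_linux_driver/
```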

readme.org

End-to-end demo of CodeQL command line usage

Run analyses

Get collection of databases (already handy)

DONE Get https://github.com/hohn/codeql-workshop-vulnerable-linux-driver

TODO SQL conversion

Running sequence

Smallest query suite (security suite).

Check results.

Lots of results (> 5000) -> cli review via compiler-style dump.

Medium result sets (~ 2000) (sarif review plugin, can only load 5000 results)

Few results (sarif review plugin, can only load 5000 results)

Expand query

Compare results.

sarif-cli using compiler-style dump
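
The choice between the two review paths in this sequence comes down to the result count. A sketch of that decision, using the 5000-result viewer limit stated above; the name `review_mode` is hypothetical, and the count could come from a jq query over `.runs[].results`.

```shell
# review_mode COUNT -- print "cli" when COUNT exceeds the viewer's
# 5000-result limit, else "viewer". Hypothetical helper.
review_mode() {
    if [ "$1" -gt 5000 ]; then
        echo cli
    else
        echo viewer
    fi
}

# Example:
#   review_mode 12000    # suggests the compiler-style dump
```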
