codeql-cli-end-to-end/readme.org
2023-06-20 17:50:29 -07:00


End-to-end demo of CodeQL command-line usage

Run analyses

Get a collection of databases (already at hand)

DONE Get https://github.com/hohn/codeql-workshop-vulnerable-linux-driver
  mkdir -p ~/local/codeql-cli-end-to-end
  cd ~/local/codeql-cli-end-to-end
  git clone git@github.com:hohn/codeql-workshop-vulnerable-linux-driver.git
  cd codeql-workshop-vulnerable-linux-driver/
  unzip vulnerable-linux-driver.zip
  tree -L 2 vulnerable-linux-driver-db/
  # vulnerable-linux-driver-db/
  # ├── codeql-database.yml
  # ├── db-cpp
  # │   ├── default
  # │   ├── semmlecode.cpp.dbscheme
  # │   └── semmlecode.cpp.dbscheme.stats
  # └── src.zip
  #
  # 3 directories, 4 files
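Before running anything, a quick check can confirm that the unzipped directory really is a CodeQL database. This is a minimal sketch; the helper name is_codeql_db is hypothetical, and it only tests for the codeql-database.yml marker visible in the tree above.

```shell
# is_codeql_db DIR: succeed if DIR looks like a CodeQL database.
# Hypothetical helper; it only checks for the codeql-database.yml marker file.
is_codeql_db() {
    [ -d "$1" ] && [ -f "$1/codeql-database.yml" ]
}

# Usage:
#   is_codeql_db vulnerable-linux-driver-db && echo "database found"
```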
DONE Quick check using VS Code. The same steps repeat on the command line below:
  • select the database
  • select the query
  • run the query
  • view the results
DONE Install codeql
In short:
  cd ~/local/codeql-cli-end-to-end
  # Decide on version / OS via the browser, then:
  wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.13.4/codeql-bundle-osx64.tar.gz

  # Fix attributes on macOS
  if [ "$(uname)" = Darwin ] ; then
      xattr -c *.tar.gz
  fi

  # Extract
  tar zxf ./codeql-bundle-osx64.tar.gz

  # Check binary
  pwd
  # /Users/hohn/local/codeql-cli-end-to-end
  ./codeql/codeql --version
  # CodeQL command-line toolchain release 2.13.4.
  # Copyright (C) 2019-2023 GitHub, Inc.
  # Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql
  #    Analysis results depend critically on separately distributed query and
  #    extractor modules. To list modules that are visible to the toolchain,
  #    use 'codeql resolve qlpacks' and 'codeql resolve languages'.

  # Check packs
  ./codeql/codeql resolve qlpacks | head -5
  # codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3)
  # codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0)
  # codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3)
  # codeql/csharp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-all/0.6.3)
  # codeql/csharp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-examples/0.0.0)

  # Fix the path
  export PATH=$(pwd -P)/codeql:"$PATH"

  # Check languages
  codeql resolve languages | head -5
  # go (/Users/hohn/local/codeql-cli-end-to-end/codeql/go)
  # python (/Users/hohn/local/codeql-cli-end-to-end/codeql/python)
  # java (/Users/hohn/local/codeql-cli-end-to-end/codeql/java)
  # html (/Users/hohn/local/codeql-cli-end-to-end/codeql/html)
  # xml (/Users/hohn/local/codeql-cli-end-to-end/codeql/xml)
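The bundle file name depends on the platform. Here is a small sketch that builds the download URL from the version and uname. The helper bundle_url is hypothetical and the URL pattern is the one used with wget above. The mapping of Darwin to osx64 and Linux to linux64 is an assumption based on the bundle names published for 2.13.x.

```shell
# bundle_url VERSION [OS]: print the CodeQL bundle download URL for VERSION.
# Sketch: URL pattern copied from the wget line above; the OS-to-suffix
# mapping (Darwin -> osx64, Linux -> linux64) is an assumption.
bundle_url() {
    local version=$1
    local os=${2:-$(uname)}
    local platform
    case "$os" in
        Darwin) platform=osx64 ;;
        Linux)  platform=linux64 ;;
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    echo "https://github.com/github/codeql-action/releases/download/codeql-bundle-v$version/codeql-bundle-$platform.tar.gz"
}

# Usage:
#   wget "$(bundle_url 2.13.4)"
```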
A fancier version: fetch the CLI binaries and the library sources separately
  # Reference URLs:
  # https://github.com/github/codeql-cli-binaries/releases/download/v2.8.0/codeql-linux64.zip
  # https://github.com/github/codeql/archive/refs/tags/codeql-cli/v2.8.0.zip
  #
  # grab -- retrieve and extract the codeql CLI and library
  # Usage: grab version platform prefix
  #   e.g. grab v2.8.4 osx64 $HOME/local
  # Note: leaves the shell in $prefix/codeql-$version
  grab() {
      version=$1; shift
      platform=$1; shift
      prefix=$1; shift
      mkdir -p "$prefix/codeql-$version" &&
          cd "$prefix/codeql-$version" || return

      # Get the CLI
      wget "https://github.com/github/codeql-cli-binaries/releases/download/$version/codeql-$platform.zip" || return
      # Get the library sources
      wget "https://github.com/github/codeql/archive/refs/tags/codeql-cli/$version.zip" || return
      # Fix attributes on macOS
      if [ "$(uname)" = Darwin ] ; then
          xattr -c *.zip
      fi
      # Extract
      unzip -q "codeql-$platform.zip"
      unzip -q "$version.zip"
      # Rename the library directory for VS Code
      mv "codeql-codeql-cli-$version/" ql
      # Optionally remove the archives:
      # rm "codeql-$platform.zip"
      # rm "$version.zip"
  }

  grab v2.7.6 osx64 $HOME/local
  grab v2.8.3 osx64 $HOME/local
  grab v2.8.4 osx64 $HOME/local

  grab v2.6.3 linux64 /opt

  grab v2.6.3 osx64 $HOME/local
  grab v2.4.6 osx64 $HOME/local
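With several versions unpacked side by side, switching between them is a matter of PATH order. This is a hypothetical helper that assumes the layout grab creates (PREFIX/codeql-VERSION/codeql/codeql).

```shell
# use_codeql VERSION PREFIX: put the CLI unpacked by grab() first on PATH.
# Hypothetical helper; assumes the layout PREFIX/codeql-VERSION/codeql/codeql.
use_codeql() {
    local dir="$2/codeql-$1/codeql"
    if [ ! -x "$dir/codeql" ]; then
        echo "no codeql binary in $dir" >&2
        return 1
    fi
    export PATH="$dir:$PATH"
}

# Usage:
#   use_codeql v2.8.4 "$HOME/local" && codeql --version
```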
Most flexible in use, but more initial setup

gh, the GitHub command-line tool from https://github.com/cli/cli, for example:

  gh api repos/{owner}/{repo}/releases
  gh gist list
  # See https://cli.github.com/manual/gh_gist_list

  # CodeQL through gh (provided by the gh-codeql extension):
  gh codeql
  # GitHub command-line wrapper for the CodeQL CLI.
Install pack dependencies
View the installed documentation via the -h flag (highly recommended)
  # Overview
  codeql -h

  # Sub 1
  codeql pack -h

  # Sub 2
  codeql pack install -h
In short
Create the qlpack

Create a qlpack.yml file in each pack directory if one is not there yet. In this project, that's already done:

  find codeql-workshop-vulnerable-linux-driver -name "qlpack.yml"
  # codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml
  # codeql-workshop-vulnerable-linux-driver/solutions/qlpack.yml
  # codeql-workshop-vulnerable-linux-driver/common/qlpack.yml

For example,

  cat codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml

shows

  ---
  library: false
  name: queries
  version: 0.0.1
  dependencies:
    codeql/cpp-all: ^0.7.0
    common: "*"

So the queries pack is not a library itself, but it depends on one, common:

  cat codeql-workshop-vulnerable-linux-driver/common/qlpack.yml
  ---
  library: true
  name: common
  version: 0.0.1
  dependencies:
    codeql/cpp-all: 0.7.0
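To see at a glance which packs are libraries and which hold queries, you can pull the name and library keys out of every qlpack.yml. This is a rough sketch. pack_summary is a hypothetical helper that runs sed on top-level keys rather than using a real YAML parser, and it treats a missing library key as false.

```shell
# pack_summary ROOT: for each qlpack.yml under ROOT, print its directory,
# pack name, and library flag. Sketch only: reads top-level `name:` and
# `library:` lines with sed; not a YAML parser.
pack_summary() {
    find "$1" -name qlpack.yml | sort | while read -r f; do
        name=$(sed -n 's/^name:[[:space:]]*//p' "$f")
        lib=$(sed -n 's/^library:[[:space:]]*//p' "$f")
        printf '%s\t%s\tlibrary=%s\n' "$(dirname "$f")" "$name" "${lib:-false}"
    done
}

# Usage:
#   pack_summary codeql-workshop-vulnerable-linux-driver
```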
Install each pack's dependencies

The first time you install dependencies, it's a good idea to do this manually, one qlpack.yml at a time, and deal with any errors as they occur.

  pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  codeql pack install --no-strict-mode queries/

After the initial setup, and for automation, run codeql pack install for every pack in a loop:

  pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  find . -name "qlpack.yml"
  # ./queries/qlpack.yml
  # ./solutions/qlpack.yml
  # ./common/qlpack.yml

  codeql pack install --no-strict-mode queries/
  # Dependencies resolved. Installing packages...
  # Install location: /Users/hohn/.codeql/packages
  # Nothing to install.
  # Package install location: /Users/hohn/.codeql/packages
  # Nothing downloaded.

  for sub in $(find . -name "qlpack.yml" -exec dirname {} \;)
  do
      codeql pack install --no-strict-mode "$sub"
  done
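For automation it helps to stop at the first failing pack and report which one failed. This sketch uses a hypothetical install_all_packs helper built only on the codeql pack install invocation above. Like the loop above, it assumes pack paths contain no whitespace.

```shell
# install_all_packs ROOT: run `codeql pack install` in every directory under
# ROOT that holds a qlpack.yml; stop and report at the first failure.
# Hypothetical helper; assumes paths without whitespace.
install_all_packs() {
    local dir
    for dir in $(find "$1" -name qlpack.yml -exec dirname {} \; | sort); do
        echo "== $dir"
        if ! codeql pack install --no-strict-mode "$dir"; then
            echo "pack install failed in $dir" >&2
            return 1
        fi
    done
}

# Usage:
#   install_all_packs ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
```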

Run queries

Individual: 1 database -> N sarif files
  #* Set environment
  PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  DB=$PROJ/vulnerable-linux-driver-db
  QLQUERY=$PROJ/solutions/BufferOverflow.ql
  QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-BufferOverflow.sarif

  #* Run query
  pushd $PROJ
  codeql database analyze --format=sarif-latest --rerun   \
         --output $QUERY_RES_SARIF                        \
         -j6                                              \
         --ram=24000                                      \
         --                                               \
         $DB                                              \
         $QLQUERY

  # If you get
      # fatal error occurred: Error initializing the IMB disk cache: the cache
      # directory is already locked by another running process. Only one instance of
      # the IMB can access a cache directory at a time. The lock file is located at
      # /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable-linux-driver-db/db-cpp/default/cache/.lock
  # exit VS Code (which may still hold the lock) and try again.

And after some time:

  # BufferOverflow.ql: [1/1 eval 1.8s] Results written to solutions/BufferOverfl
  # Shutting down query evaluator.
  # Interpreting results.

  echo "The query $QLQUERY"
  echo "run on $DB"
  echo "produced output in $QUERY_RES_SARIF:"
  head -5 "$QUERY_RES_SARIF"
  # {
  #   "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
  #   "version" : "2.1.0",
  #   "runs" : [ {
  #     "tool" : {
  # ...
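The manual steps above, which name the SARIF file with the short commit hash plus the query name, can be wrapped in a function. This sketch uses a hypothetical analyze_one helper. Run it from inside $PROJ, because it names the output after git rev-parse --short HEAD of the current directory and writes it there.

```shell
# analyze_one DB QUERY: run one query against DB, writing
# <short-sha>-<query-name>.sarif into the current directory.
# Hypothetical helper; flags are the ones used in the manual command above.
analyze_one() {
    local db=$1 query=$2 sha out
    sha=$(git rev-parse --short HEAD) || return 1
    out="$sha-$(basename "$query" .ql).sarif"
    codeql database analyze --format=sarif-latest --rerun \
           --output "$out" -j6 --ram=24000 \
           -- "$db" "$query" || return 1
    echo "Query results in $out"
}

# Usage (from inside $PROJ):
#   analyze_one "$DB" solutions/BufferOverflow.ql
```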

Run another query to get another SARIF file. One SARIF file per query is a bad idea in general, but useful for debugging timing and similar issues.

  #* Use prior variable settings

  #* Run query
  pushd $PROJ
  qo=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-UseAfterFree.sarif
  codeql database analyze --format=sarif-latest --rerun   \
         --output $qo                                     \
         -j6                                              \
         --ram=24000                                      \
         --                                               \
         $DB                                              \
         $PROJ/solutions/UseAfterFree.ql
  popd

  echo "Query results in $qo"
  head -5 "$qo"

  # Query results in /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
  # {
  #   "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
  #   "version" : "2.1.0",
  #   "runs" : [ {
  #     "tool" : {
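When the goal is timing, a loop can run every query on its own and report the wall-clock time for each. This sketch uses a hypothetical time_each_query helper. It writes one SARIF file per query into the current directory, which is exactly the one-file-per-query pattern described above as a bad idea in general, so use it for debugging only.

```shell
# time_each_query DB QUERYDIR: analyze each .ql file in QUERYDIR separately,
# one SARIF file per query in the current directory, and print the elapsed
# wall-clock seconds per query. Hypothetical helper for debugging.
time_each_query() {
    local db=$1 dir=$2 q start end
    for q in "$dir"/*.ql; do
        [ -e "$q" ] || continue
        start=$(date +%s)
        codeql database analyze --format=sarif-latest --rerun \
               --output "$(basename "$q" .ql).sarif" -j6 --ram=24000 \
               -- "$db" "$q" || return 1
        end=$(date +%s)
        echo "$(basename "$q"): $((end - start))s"
    done
}

# Usage:
#   time_each_query "$DB" "$PROJ/solutions"
```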
Use directory of queries: 1 database -> 1 sarif file (least effort)
  #* Set environment
  P1_PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  P1_DB=$P1_PROJ/vulnerable-linux-driver-db
  P1_QLQUERYDIR=$P1_PROJ/solutions/
  P1_QUERY_RES_SARIF=$P1_PROJ/$(cd $P1_PROJ && git rev-parse --short HEAD).sarif

  #* Check variables
  set | grep P1_

  #* Run queries
  pushd $P1_PROJ
  codeql database analyze --format=sarif-latest --rerun   \
         --output $P1_QUERY_RES_SARIF                     \
         -j6                                              \
         --ram=24000                                      \
         --                                               \
         $P1_DB                                           \
         $P1_QLQUERYDIR
  popd

We can compare SARIF result sizes:

  ls -la "$qo" $P1_QUERY_RES_SARIF $QUERY_RES_SARIF

And for these tiny results, it's mostly metadata:

  -rw-r--r-- 1 hohn staff 29K Jun 20 10:06 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif
  -rw-r--r-- 1 hohn staff 33K Jun 20 10:02 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189.sarif
  -rw-r--r-- 1 hohn staff 28K Jun 20 09:51 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
Use suite: 1 database -> 1 sarif file (more flexible, more effort)

A useful, general purpose template is at https://github.com/rvermeulen/codeql-example-project-layout.

Documentation
  • built-in-codeql-query-suites
  • creating-codeql-query-suites Important: You must add at least one query, queries, or qlpack instruction to your suite definition, otherwise no queries will be selected. If the suite contains no further instructions, all the queries found from the list of files, in the given directory, or in the named CodeQL pack are selected. If there are further filtering instructions, only queries that match the constraints imposed by those instructions will be selected. Also, a suite definition must be in a codeql pack.
In short
  codeql resolve qlpacks | grep cpp

  # Copy query suite into the pack
  cd ~/local/codeql-cli-end-to-end
  cp custom-suite-1.qls codeql-workshop-vulnerable-linux-driver/solutions/
  codeql resolve queries \
         codeql-workshop-vulnerable-linux-driver/solutions/custom-suite-1.qls
  # /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/solutions/UseAfterFree.ql
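The contents of custom-suite-1.qls are not shown in this README. As a hypothetical illustration that would resolve to the single query above, a suite could select all queries in its pack and then keep only one @id. This assumes the query's @id is cpp/use_after_free, as the rule names in the SARIF output suggest.

```shell
# Write a hypothetical custom-suite-1.qls (the real file is not shown here).
# `queries: .` selects every query in the pack holding the suite
# (solutions/ once copied there); the `include` filter then keeps only
# the query whose @id is cpp/use_after_free.
cat > custom-suite-1.qls <<'EOF'
- description: Example suite, use-after-free only (hypothetical)
- queries: .
- include:
    id: cpp/use_after_free
EOF
```

After copying it into solutions/, the suite can be given to codeql database analyze in place of a query file or directory, e.g. with $DB and solutions/custom-suite-1.qls as the final arguments.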
TODO Include versioning:
TODO codeql cli
TODO query set version

Checks:

For building databases, the common case: a parallel C++ build that takes 15 minutes can take 2 hours under CodeQL.

Review results

SARIF Documentation

SARIF viewer plugin

Install plugin in VS Code

https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer

Sarif Viewer v3.3.7, Microsoft DevLabs

Review
  cd ~/local/codeql-cli-end-to-end
  find . -maxdepth 2 -name "*.sarif"

Pick one and open it in VS Code, either from the shell:

  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  code d548189.sarif

or manually.

We also need the source code, which is a git submodule:

  cd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  git submodule init
  git submodule update

When reviewing, VS Code will ask for the path to the source files. This checkout has them:

  cd /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable_linux_driver
  ls src/vuln_driver.c
  # src/vuln_driver.c

Reviewing looks as follows.

(Screenshot: SARIF viewer)

View raw sarif with jq

List the SARIF files again

  cd ~/local/codeql-cli-end-to-end
  find . -maxdepth 2 -name "*.sarif"
  # ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif
  # ./codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif
  # ./codeql-workshop-vulnerable-linux-driver/e402cf5-BufferOverflow.sarif

The CodeQL version

  cd ~/local/codeql-cli-end-to-end
  jq -r '.runs[0].tool.driver.semanticVersion' < ./codeql-workshop-vulnerable-linux-driver/e402cf5.sarif
  # 2.13.4

The names of rules processed

  cd ~/local/codeql-cli-end-to-end
  jq -r '.runs[].tool.driver.rules[].name' < ./codeql-workshop-vulnerable-linux-driver/d548189.sarif
  # cpp/buffer_overflow
  # cpp/use_after_free

View raw sarif with jq and fzf

Install the fuzzy finder

  brew install fzf

or use apt-get / yum on Linux.

Try typing your way to .runs[0].tool.driver.rules in the fzf query line and watch the jq preview update in real time.

  pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver
  res=e402cf5-UseAfterFree.sarif
  echo '' | fzf --print-query --preview="jq {q} < $res"
  popd

TODO sarif-cli

TODO Install
TODO Dump
TODO SQL conversion

Running sequence

  • Start with the smallest query suite (the security suite).
  • Check the results:
    • Lots of results (> 5000): review on the command line via a compiler-style dump.
    • Medium result sets (~2000): use the SARIF review plugin, which can only load 5000 results.
    • Few results: use the SARIF review plugin.
  • Expand the query.
  • Compare results.
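The triage above can be scripted from the SARIF file itself. In this sketch, sarif_result_count counts results across all runs with jq, and review_mode applies the 5000-result limit of the SARIF viewer plugin mentioned above. Both helper names and the labels they print are hypothetical.

```shell
# sarif_result_count FILE: total number of results across all runs (needs jq).
# `results[]?` tolerates runs without a results array.
sarif_result_count() {
    jq '[.runs[].results[]?] | length' < "$1"
}

# review_mode COUNT: suggest a review route; the SARIF viewer plugin
# can only load 5000 results, so larger sets go to a CLI dump.
review_mode() {
    if [ "$1" -gt 5000 ]; then
        echo "cli-dump"
    else
        echo "sarif-viewer"
    fi
}

# Usage:
#   review_mode "$(sarif_result_count e402cf5.sarif)"
```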

sarif-cli using compiler-style dump