* End-to-end demo of CodeQL command line usage ** Run analyses *** Get collection of databases (already handy) **** DONE Get https://github.com/hohn/codeql-workshop-vulnerable-linux-driver #+begin_src text cd ~/local git clone git@github.com:hohn/codeql-workshop-vulnerable-linux-driver.git cd codeql-workshop-vulnerable-linux-driver/ unzip vulnerable-linux-driver.zip tree -L 2 vulnerable-linux-driver-db/ vulnerable-linux-driver-db/ ├── codeql-database.yml ├── db-cpp │   ├── default │   ├── semmlecode.cpp.dbscheme │   └── semmlecode.cpp.dbscheme.stats └── src.zip 3 directories, 4 files #+end_src **** DONE Quick check using VS Code. Same steps will repeat: ***** select DB ***** select query ***** run query ***** view results **** DONE Install codeql ***** Full docs: https://docs.github.com/en/code-security/codeql-cli/using-the-codeql-cli/getting-started-with-the-codeql-cli#getting-started-with-the-codeql-cli https://docs.github.com/en/code-security/code-scanning/using-codeql-code-scanning-with-your-existing-ci-system/installing-codeql-cli-in-your-ci-system#setting-up-the-codeql-cli-in-your-ci-system ***** In short: #+begin_src sh cd ~/local/codeql-cli-end-to-endw # Decide on version / os via browser, then: wget https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.13.4/codeql-bundle-osx64.tar.gz # Fix attributes on mac if [ `uname` = Darwin ] ; then xattr -c *.tar.gz fi # Extract tar zxf ./codeql-bundle-osx64.tar.gz # Check binary pwd # /Users/hohn/local/codeql-cli-end-to-end ./codeql/codeql --version # CodeQL command-line toolchain release 2.13.4. # Copyright (C) 2019-2023 GitHub, Inc. # Unpacked in: /Users/hohn/local/codeql-cli-end-to-end/codeql # Analysis results depend critically on separately distributed query and # extractor modules. To list modules that are visible to the toolchain, # use 'codeql resolve qlpacks' and 'codeql resolve languages'. # Check packs 0:$ ./codeql/codeql resolve qlpacks |head -5 # codeql/cpp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-all/0.7.3) # codeql/cpp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-examples/0.0.0) # codeql/cpp-queries (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/cpp-queries/0.6.3) # codeql/csharp-all (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-all/0.6.3) # codeql/csharp-examples (/Users/hohn/local/codeql-cli-end-to-end/codeql/qlpacks/codeql/csharp-examples/0.0.0) # Fix the path export PATH=$(pwd -P)/codeql:"$PATH" # Check languages codeql resolve languages | head -5 # go (/Users/hohn/local/codeql-cli-end-to-end/codeql/go) # python (/Users/hohn/local/codeql-cli-end-to-end/codeql/python) # java (/Users/hohn/local/codeql-cli-end-to-end/codeql/java) # html (/Users/hohn/local/codeql-cli-end-to-end/codeql/html) # xml (/Users/hohn/local/codeql-cli-end-to-end/codeql/xml) #+end_src ***** A more fancy version #+begin_src sh # Reference urls: # https://github.com/github/codeql-cli-binaries/releases/download/v2.8.0/codeql-linux64.zip # https://github.com/github/codeql/archive/refs/tags/codeql-cli/v2.8.0.zip # # grab -- retrieve and extract codeql cli and library # Usage: grab version url prefix grab() { version=$1; shift platform=$1; shift prefix=$1; shift mkdir -p $prefix/codeql-$version && cd $prefix/codeql-$version || return # Get cli wget "https://github.com/github/codeql-cli-binaries/releases/download/$version/codeql-$platform.zip" # Get lib wget "https://github.com/github/codeql/archive/refs/tags/codeql-cli/$version.zip" # Fix attributes if [ `uname` = Darwin ] ; then xattr -c *.zip fi # Extract unzip -q codeql-$platform.zip unzip -q $version.zip # Rename library directory for VS Code mv codeql-codeql-cli-$version/ ql # remove archives? # rm codeql-$platform.zip # rm $version.zip } grab v2.7.6 osx64 $HOME/local grab v2.8.3 osx64 $HOME/local grab v2.8.4 osx64 $HOME/local grab v2.6.3 linux64 /opt grab v2.6.3 osx64 $HOME/local grab v2.4.6 osx64 $HOME/local #+end_src ***** Most flexible in use, but more initial setup =gh=, the GitHub command-line tool from https://github.com/cli/cli ****** gh api repos/{owner}/{repo}/releases https://cli.github.com/manual/gh_api ****** gh extension create https://cli.github.com/manual/gh_extension ****** gh codeql extension https://github.com/github/gh-codeql ****** gh gist list https://cli.github.com/manual/gh_gist_list #+begin_src text 0:$ gh codeql GitHub command-line wrapper for the CodeQL CLI. #+end_src **** Install pack dependencies ***** Full docs https://docs.github.com/en/code-security/codeql-cli/codeql-cli-reference/about-codeql-packs#about-qlpackyml-files https://docs.github.com/en/code-security/codeql-cli/codeql-cli-manual/pack-install ***** View installed docs via =-h= flag, highly recommended #+begin_src sh # Overview codeql -h # Sub 1 codeql pack -h # Sub 2 codeql pack install -h #+end_src ***** In short ****** Create the qlpack Create the qlpack files if not there, one per directory. In this project, that's already done: #+begin_src sh 0:$ find codeql-workshop-vulnerable-linux-driver -name "qlpack.yml" codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml codeql-workshop-vulnerable-linux-driver/solutions/qlpack.yml codeql-workshop-vulnerable-linux-driver/common/qlpack.yml #+end_src For example: : cat codeql-workshop-vulnerable-linux-driver/queries/qlpack.yml shows #+BEGIN_SRC yaml --- library: false name: queries version: 0.0.1 dependencies: codeql/cpp-all: ^0.7.0 common: "*" #+END_SRC So the queries directory does not contain a library, but it depends on one, : cat codeql-workshop-vulnerable-linux-driver/common/qlpack.yml #+BEGIN_SRC yaml --- library: true name: common version: 0.0.1 dependencies: codeql/cpp-all: 0.7.0 #+END_SRC ****** Install each pack's dependencies The first time you install dependencies, it's a good idea to do this menually, per =qlpack.yml= file, and deal with any errors that may occur. #+BEGIN_SRC sh pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver codeql pack install --no-strict-mode queries/ #+END_SRC After the initial setup and for automation, install each pack's dependencies via a loop: =codeql pack install= #+begin_src sh pushd ~/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver find . -name "qlpack.yml" # ./queries/qlpack.yml # ./solutions/qlpack.yml # ./common/qlpack.yml codeql pack install --no-strict-mode queries/ # Dependencies resolved. Installing packages... # Install location: /Users/hohn/.codeql/packages # Nothing to install. # Package install location: /Users/hohn/.codeql/packages # Nothing downloaded. for sub in `find . -name "qlpack.yml" | sed s@qlpack.yml@@g;` do codeql pack install --no-strict-mode $sub done #+end_src *** Run queries **** Individual: 1 database -> N sarif files #+BEGIN_SRC sh #* Set environment PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver DB=$PROJ/vulnerable-linux-driver-db QLQUERY=$PROJ/solutions/BufferOverflow.ql QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-BufferOverflow.sarif #* Run query pushd $PROJ codeql database analyze --format=sarif-latest --rerun \ --output $QUERY_RES_SARIF \ -j6 \ --ram=24000 \ -- \ $DB \ $QLQUERY # if you get # fatal error occurred: Error initializing the IMB disk cache: the cache # directory is already locked by another running process. Only one instance of # the IMB can access a cache directory at a time. The lock file is located at # /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/vulnerable-linux-driver-db/db-cpp/default/cache/.lock # exit vs code and try again #+END_SRC And after some time: #+BEGIN_SRC text BufferOverflow.ql: [1/1 eval 1.8s] Results written to solutions/BufferOverfl Shutting down query evaluator. Interpreting results. #+END_SRC #+BEGIN_SRC sh echo The query $QLQUERY echo run on $DB echo produced output in $QUERY_RES_SARIF: head -5 $QUERY_RES_SARIF # { # "$schema" : "https://json.schemastore.org/sarif-2.1.0.json", # "version" : "2.1.0", # "runs" : [ { # "tool" : { # ... #+END_SRC And run another, get another sarif file. Bad idea in general, but good for debugging timing etc. #+BEGIN_SRC sh #* Use prior variable settings #* Run query pushd $PROJ qo=$PROJ/$(cd $PROJ && git rev-parse --short HEAD)-UseAfterFree.sarif codeql database analyze --format=sarif-latest --rerun \ --output $qo \ -j6 \ --ram=24000 \ -- \ $DB \ $PROJ/solutions/UseAfterFree.ql popd echo "Query results in $qo" head -5 "$qo" # Query results in /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif # { # "$schema" : "https://json.schemastore.org/sarif-2.1.0.json", # "version" : "2.1.0", # "runs" : [ { # "tool" : { #+END_SRC **** Use directory of queries: 1 database -> 1 sarif file (least effort) #+BEGIN_SRC sh #* Set environment P1_PROJ=$HOME/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver P1_DB=$PROJ/vulnerable-linux-driver-db P1_QLQUERYDIR=$PROJ/solutions/ P1_QUERY_RES_SARIF=$PROJ/$(cd $PROJ && git rev-parse --short HEAD).sarif #* check variables set | grep P1_ #* Run query pushd $PROJ codeql database analyze --format=sarif-latest --rerun \ --output $P1_QUERY_RES_SARIF \ -j6 \ --ram=24000 \ -- \ $P1_DB \ $P1_PROJ/solutions/ #+END_SRC We can compare SARIF result sizes: #+BEGIN_SRC sh ls -la "$qo" $P1_QUERY_RES_SARIF $QUERY_RES_SARIF #+END_SRC And for these tiny results, it's mostly metadata: #+BEGIN_SRC text -rw-r--r-- 1 hohn staff 29K Jun 20 10:06 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189-BufferOverflow.sarif -rw-r--r-- 1 hohn staff 33K Jun 20 10:02 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/d548189.sarif -rw-r--r-- 1 hohn staff 28K Jun 20 09:51 /Users/hohn/local/codeql-cli-end-to-end/codeql-workshop-vulnerable-linux-driver/e402cf5-UseAfterFree.sarif #+END_SRC **** TODO Use suite: 1 database -> 1 sarif file (more flexible, more effort) **** Include versioning: ***** codeql cli ***** query set version Checks: **** For building DBs: Common case: 15 minutes for || cpp compilation, can be 2 h with codeql. ** Review results *** sarif viewer plugin *** raw sarif with =jq= *** sarif-cli **** dump **** sql conversion ** Running sequence *** Smallest query suite (security suite). *** Check results. **** Lots of result (> 5000) -> cli review via compiler-style dump. **** Medium result sets (~ 2000) (sarif review plugin, can only load 5000 results) **** Few results (sarif review plugin, can only load 5000 results) *** Expand query ** Compare results. *** sarif-cli using compiler-style dump. * Short end-to-end illustration 1. Overall procedure 2. Command-line use 1. For 3.2 also using sarif-cli 3. sarif viewer plugin https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer Sarif Viewer v3.3.7 Microsoft DevLabs microsoft.com 53,335 (1) 4. Details on query suite use (3. Use suite: 1 database -> 1 sarif file (more flexible, more effort))