Collection of CLI tools for SARIF processing

THIS IS A WORK IN PROGRESS

Each of these tools presents a high-level command-line interface to extract a specific subset of information from a SARIF file. The format of each tool's output is versioned and, as much as possible, independent of the input.

These tools are intended to

  • hide the internals of SARIF from their users
  • provide examples of extracting information from SARIF files, for use when writing your own tools or extending these

Setup for development

This repository uses Git LFS for some larger files. Installation steps are at git-lfs; on a Mac with Homebrew, install it via

  brew install git-lfs
  git lfs install

Set up the virtual environment and install the packages:

  python3 -m venv .venv
  . .venv/bin/activate
  python3 -m pip install -r requirements.txt
  # Or separately:
  pip install --upgrade pip
  pip install ipython pyyaml

"Install" in editable mode for local development:

  pip install -e .

Examples

To use git parlance, the porcelain tool is sarif-results-summary, while the plumbing tools are sarif-digest, sarif-labeled and sarif-list-files.

Short summaries of each follow.

sarif-results-summary

Display the SARIF results in human-readable plain text form. Taking the warning around

  src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30:

as an example, there are two display options that use only the SARIF file, and a third when the source code is available.

  1. Display only the main result. Using

      sarif-results-summary data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 | less -p LexMySQL.cxx

    only displays

      RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).

  2. Display the related information. Using

      sarif-results-summary -r data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 | less -p LexMySQL.cxx

    displays

      RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).
    
      REFERENCE: src/stc/scintilla/lexers/LexMySQL.cxx:108:68:108:74: parameter of the same name

  3. Either display can be supplemented by source code snippets if the source is available. Using

      sarif-results-summary -s data/wxWidgets-small  -r data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 |less

    displays the source code with underlines

      RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).
    
                Sci_Position length = sc.LengthCurrent() + 1;
                             ^^^^^^
      REFERENCE: src/stc/scintilla/lexers/LexMySQL.cxx:108:68:108:74: parameter of the same name
    
      static void ColouriseMySQLDoc(Sci_PositionU startPos, Sci_Position length, int initStyle, WordList *keywordlists[],
                                                                         ^^^^^^
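The three displays above can be approximated directly in Python. This is a minimal sketch, not the tool's actual implementation: the names format_location, underline, summarize and sample_result are hypothetical, the sample is hand-built from the values shown above, and the code assumes the usual SARIF 2.1.0 layout (physicalLocation.artifactLocation.uri, region.startLine and so on) with 1-based columns and an exclusive endColumn.

```python
def format_location(loc):
    """Return 'uri:startLine:startColumn:endLine:endColumn' for a SARIF location."""
    phys = loc["physicalLocation"]
    region = phys.get("region", {})
    start_line = region.get("startLine", 1)
    start_col = region.get("startColumn", 1)
    # Assumption: a missing endLine means the region stays on the start line.
    end_line = region.get("endLine", start_line)
    end_col = region.get("endColumn", start_col)
    uri = phys["artifactLocation"]["uri"]
    return "{}:{}:{}:{}:{}".format(uri, start_line, start_col, end_line, end_col)


def underline(start_col, end_col):
    """Return a caret line marking columns [start_col, end_col), 1-based."""
    return " " * (start_col - 1) + "^" * (end_col - start_col)


def summarize(result, related=False):
    """Build the RESULT line and, if requested, one REFERENCE line per related location."""
    lines = ["RESULT: {}: {}".format(
        format_location(result["locations"][0]), result["message"]["text"])]
    if related:
        for rel in result.get("relatedLocations", []):
            text = rel.get("message", {}).get("text", "")
            lines.append("REFERENCE: {}: {}".format(format_location(rel), text))
    return lines


# Hand-built sample mirroring the LexMySQL.cxx warning shown above.
URI = "src/stc/scintilla/lexers/LexMySQL.cxx"
sample_result = {
    "message": {"text": "Local variable 'length' hides a [parameter of the same name](1)."},
    "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": URI},
        "region": {"startLine": 153, "startColumn": 24, "endColumn": 30}}}],
    "relatedLocations": [{"id": 1, "physicalLocation": {
        "artifactLocation": {"uri": URI},
        "region": {"startLine": 108, "startColumn": 68, "endColumn": 74}},
        "message": {"text": "parameter of the same name"}}],
}

for line in summarize(sample_result, related=True):
    print(line)
print(underline(24, 30))
```

Keeping the formatting of a location separate from the choice of which locations to print lets the main-only and related displays share one code path.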

sarif-digest

Get an idea of the SARIF file structure by showing only the first and last entries of each array.

  sarif-digest  data/torvalds_linux__2021-10-21_10_07_00__export.sarif |less
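The idea of a digest can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: the function names digest and digest_file, the keep parameter and the "..." placeholder are all hypothetical, and the real output format may differ.

```python
import json


def digest(node, keep=1):
    """Recursively shorten every list to its first and last `keep` entries."""
    if isinstance(node, dict):
        return {key: digest(value, keep) for key, value in node.items()}
    if isinstance(node, list):
        if len(node) > 2 * keep:
            # Hypothetical placeholder marking the omitted middle entries.
            node = node[:keep] + ["..."] + node[-keep:]
        return [digest(item, keep) for item in node]
    return node


def digest_file(path):
    """Load a SARIF (JSON) file and print its digest."""
    with open(path) as handle:
        print(json.dumps(digest(json.load(handle)), indent=2))


print(digest({"runs": [{"results": [1, 2, 3, 4, 5]}]}))
```

Calling digest_file on one of the files in data/ would print a much shorter document with the same nesting as the original.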

sarif-labeled

Display the SARIF file with explicit paths inserted before JSON objects and selected array entries. This is handy when reverse-engineering the format by searching for results.

  sarif-labeled  data/torvalds_linux__2021-10-21_10_07_00__export.sarif |less

For example, the

  "uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c",

is nested; the labeled display shows where:

  "sarif_struct['runs'][1]['results'][4]['locations'][0]['physicalLocation']['artifactLocation']": "----path----",
  "artifactLocation": {
  "uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c",
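A similar lookup can be sketched in Python by walking the parsed JSON and building the subscript path to each leaf. The function name labeled_paths and the small sample document are hypothetical. The sketch yields paths down to individual values, whereas the tool inserts labels before objects and selected array entries.

```python
def labeled_paths(node, path="sarif_struct"):
    """Yield (path, value) for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from labeled_paths(value, "{}[{!r}]".format(path, key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from labeled_paths(value, "{}[{}]".format(path, index))
    else:
        yield path, node


# Hand-built miniature document, for illustration only.
sample = {"runs": [{"results": [{"locations": [{"physicalLocation": {
    "artifactLocation": {"uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c"}}}]}]}]}

for path, value in labeled_paths(sample):
    if value == "drivers/gpu/drm/i915/gt/uc/intel_guc.c":
        print(path)
```

The printed path is a valid Python subscript expression, so it can be pasted after a variable named sarif_struct that holds the loaded file.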

sarif-list-files

Display the list of files referenced by a SARIF file. This is the tool used to get the file names that ultimately went into data/linux-small/ and data/wxWidgets-small/.

  sarif-list-files data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif
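One way to collect such a list by hand is to gather every uri that appears under an artifactLocation object, wherever it is nested. This is a sketch under that assumption. The function name list_files is hypothetical, and the tool itself may select files differently.

```python
def list_files(node, found=None):
    """Return the sorted set of artifactLocation.uri strings found anywhere in node."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        location = node.get("artifactLocation")
        if isinstance(location, dict) and "uri" in location:
            found.add(location["uri"])
        for value in node.values():
            list_files(value, found)
    elif isinstance(node, list):
        for value in node:
            list_files(value, found)
    return sorted(found)


# Hand-built miniature document, for illustration only.
sample = {"runs": [{"results": [
    {"locations": [{"physicalLocation": {"artifactLocation": {"uri": "src/b.cxx"}}}],
     "relatedLocations": [{"physicalLocation": {"artifactLocation": {"uri": "src/a.cxx"}}}]},
    {"locations": [{"physicalLocation": {"artifactLocation": {"uri": "src/b.cxx"}}}]},
]}]}

print("\n".join(list_files(sample)))
```

Using a set removes duplicates, since the same file is usually referenced by many results.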

Sample Data

The query results in data/ are taken from lgtm.com, which ran the

  ql/$LANG/ql/src/codeql-suites/$LANG-lgtm.qls

queries.

The Linux kernel has both single-location results ("kind": "problem") and path results ("kind": "path-problem"). It also has results for multiple source languages.

The subset of files referenced by the SARIF results is in data/linux-small/ and is taken from

  "versionControlProvenance": [
      {
          "repositoryUri": "https://github.com/torvalds/linux.git",
          "revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822"
      }
  ]

The wxWidgets library has both single-location results ("kind": "problem") and path results ("kind": "path-problem").

The subset of files referenced by the SARIF results is in data/wxWidgets-small/ and is taken from

  "repositoryUri": "https://github.com/wxWidgets/wxWidgets.git",
  "revisionId": "7a03d5fe9bca2d2a2cd81fc0620bcbd2cbc4c7b0"
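The repository and revision behind a SARIF file can be read back programmatically. This is a minimal sketch assuming versionControlProvenance sits directly under each run, as in SARIF 2.1.0. The function name provenance is hypothetical.

```python
def provenance(sarif_struct):
    """Return (repositoryUri, revisionId) pairs from every run."""
    pairs = []
    for run in sarif_struct.get("runs", []):
        for entry in run.get("versionControlProvenance", []):
            pairs.append((entry.get("repositoryUri"), entry.get("revisionId")))
    return pairs


# Hand-built sample using the wxWidgets values quoted above.
sample = {"runs": [{"versionControlProvenance": [{
    "repositoryUri": "https://github.com/wxWidgets/wxWidgets.git",
    "revisionId": "7a03d5fe9bca2d2a2cd81fc0620bcbd2cbc4c7b0"}]}]}

for uri, revision in provenance(sample):
    print(uri, revision)
```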