Michael Hohn ad738abed3 sarif-extract-tables: also output relatedLocations table
With --related-locations,

    ../../bin/sarif-results-summary -r results.sarif

produces the details

    RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:72:722:73: Character ''' is repeated [here](1) in the same character class.
    Character ''' is repeated [here](2) in the same character class.
    Character ''' is repeated [here](3) in the same character class.
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:74:722:75: here
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:76:722:77: here
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:78:722:79: here

Via

    ../../bin/sarif-extract-tables results.sarif tables

sarif-extract-tables now produces two output tables,

    tables/
    ├── messages.csv
    └── relatedLocations.csv

that contain the relevant information and can be joined or otherwise combined on
the struct_id_4055 key.

For example, adding

    import IPython
    IPython.embed()

to the end of sarif-extract-tables and entering the following in the resulting
IPython session selects the result and its related locations:

    msg = d2[d2.message.str.startswith("Character ''' is repeated [here]")]
    dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]

    In [24]: msg
    Out[24]:
         struct_id_4055  ...                                            message
    180      4796917312  ...  Character ''' is repeated [here](1) in the sam...

    [1 rows x 7 columns]

    In [25]: dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]
    Out[25]:
         struct_id_4055                                                uri  startLine  startColumn  endLine  endColumn message
    180      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           74      722         75    here
    181      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           76      722         77    here
    182      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           78      722         79    here

or manually from the shell:

    # pick up the struct_id_4055:
    0:$ grep "static.*Character ''' is repeated \[here\]" tables/messages.csv
    180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,72,722,73,"Character ''' is repeated [here](1) in the same character class.

    # and find relatedLocations:
    0:$ grep 4927448704 tables/relatedLocations.csv
    180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,74,722,75,here
    181,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,76,722,77,here
    182,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,78,722,79,here
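
The same lookup can also be written as a pandas merge on the struct_id_4055 key. The following is a minimal sketch, not the code used by sarif-extract-tables: the two frames are small hypothetical stand-ins built in memory, with column names taken from the IPython output above and shortened uri values, so that no assumption about the exact CSV layout is needed.

```python
import pandas as pd

# Hypothetical stand-in for tables/messages.csv (one matching row, one unrelated row).
messages = pd.DataFrame({
    "struct_id_4055": [4927448704, 1],
    "uri": ["editor_plugin_src.js", "other.js"],
    "startLine": [722, 10],
    "startColumn": [72, 1],
    "endLine": [722, 10],
    "endColumn": [73, 5],
    "message": [
        "Character ''' is repeated [here](1) in the same character class.",
        "Unrelated message",
    ],
})

# Hypothetical stand-in for tables/relatedLocations.csv.
related = pd.DataFrame({
    "struct_id_4055": [4927448704] * 3,
    "uri": ["editor_plugin_src.js"] * 3,
    "startLine": [722] * 3,
    "startColumn": [74, 76, 78],
    "endLine": [722] * 3,
    "endColumn": [75, 77, 79],
    "message": ["here"] * 3,
})

# Inner join on the shared key; overlapping column names get a suffix.
joined = messages.merge(
    related, on="struct_id_4055", suffixes=("_result", "_related")
)
print(joined[["message_result", "startColumn_related", "endColumn_related"]])
```

Each related location becomes one row carrying the columns of its result, so results without related locations drop out of the inner join.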

Changes:
- Introduce scli-dyys, a random id string for later identification and removal of
  dummy table rows.

- Keep the struct_id_4055 column to join tables as needed.

- Output is now written to a directory as there are always multiple files.
2022-02-16 17:03:58 -08:00

Collection of cli tools for SARIF processing

THIS IS A WORK IN PROGRESS

Each of these tools presents a high-level command-line interface to extract a specific subset of information from a SARIF file. The format of each tool's output will be versioned and, as much as possible, independent of the input.

For human use and to fit with existing tools, the default output format is line-oriented and resembles compiler error formatting.

The goal of this tool set is to support working with SARIF files

  • at the shell / file level,
  • across multiple versions of the same SARIF result set,
  • and across many repositories.

The implementation language is Python, but that is a detail. The scripts should work well when used with other shell tools, especially diff and git.
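
Because the output is line-oriented, it is also easy to read back into a script. The sketch below parses the RESULT, REFERENCE, and FLOW STEP lines in the shape they have in the examples of this README. The regular expression and the names `Entry` and `parse_line` are illustrative assumptions, not part of the tool set, and continuation lines of multi-line messages are simply skipped.

```python
import re
from typing import NamedTuple, Optional

# Shape assumed from the sample output in this README:
#   KIND: path:startLine:startColumn:endLine:endColumn: message
LINE_RE = re.compile(
    r"^(?P<kind>RESULT|REFERENCE|FLOW STEP \d+): "
    r"(?P<path>.+?):(?P<sl>\d+):(?P<sc>\d+):(?P<el>\d+):(?P<ec>\d+): "
    r"(?P<message>.*)$"
)


class Entry(NamedTuple):
    kind: str
    path: str
    start_line: int
    start_column: int
    end_line: int
    end_column: int
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry for a location line, or None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(
        m["kind"], m["path"],
        int(m["sl"]), int(m["sc"]), int(m["el"]), int(m["ec"]),
        m["message"],
    )
```

Lines such as `PATH 0` or source snippets do not match and yield `None`, so a caller can filter them out or treat them as separators.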

Setup for development

This repository uses git lfs for some larger files. Installation steps are on the git-lfs site; on a Mac with Homebrew, install it via

  brew install git-lfs
  git lfs install

Set up the virtual environment and install the packages:

  # Create and activate the virtual environment
  python3 -m venv .venv
  . .venv/bin/activate
  # Install the packages using requirements.txt
  python3 -m pip install -r requirements.txt
  # Or install them separately:
  pip install --upgrade pip
  pip install ipython pyyaml pandas

"Install" for local development:

  pip install -e .

Examples

To use git parlance, the porcelain tool is sarif-results-summary, while the plumbing tools are sarif-digest, sarif-labeled and sarif-list-files.

Following are short summaries of each.

sarif-results-summary

Display the SARIF results in human-readable plain text form.

Starting with the data/wxWidgets sample and the warning around

  src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30:

there are several options using only the SARIF file, and one more when source code is available.

The following show the command and the output, limited to the intended result via sed:

  1. Display only main result, using no options.

      .venv/bin/sarif-results-summary \
          data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 |\
          sed -n "/LexMySQL.cxx:153:24:153:30/,/RESULT/p" | sed '$d'
    RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).

  2. Display the related information.

      .venv/bin/sarif-results-summary \
          -r data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 |\
          sed -n "/LexMySQL.cxx:153:24:153:30/,/RESULT/p" | sed '$d'
    RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).
    REFERENCE: src/stc/scintilla/lexers/LexMySQL.cxx:108:68:108:74: parameter of the same name

  3. Include source code snippets (when the source is available):

      .venv/bin/sarif-results-summary \
          -s data/wxWidgets-small \
          -r data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 |\
          sed -n "/LexMySQL.cxx:153:24:153:30/,/RESULT/p" | sed '$d'
    RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).
              Sci_Position length = sc.LengthCurrent() + 1;
                           ^^^^^^
    REFERENCE: src/stc/scintilla/lexers/LexMySQL.cxx:108:68:108:74: parameter of the same name
    static void ColouriseMySQLDoc(Sci_PositionU startPos, Sci_Position length, int initStyle, WordList *keywordlists[],
                                                                       ^^^^^^
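
The caret line under each snippet follows from the region columns. In the samples above the start column is 1-based and the end column is exclusive (153:24:153:30 covers the six characters of `length`). A minimal sketch of that rendering for a single-line region follows; the function names are hypothetical, and tabs and multi-line regions are not handled.

```python
def underline(start_column: int, end_column: int) -> str:
    """Build the caret marker for a single-line region.

    Columns are 1-based and end_column is exclusive, as in the samples.
    """
    return " " * (start_column - 1) + "^" * (end_column - start_column)


def snippet(source_line: str, start_column: int, end_column: int) -> str:
    """Return the source line followed by its caret marker."""
    return source_line.rstrip("\n") + "\n" + underline(start_column, end_column)


line = "        return HttpResponse(data, content_type='application/json', status=200)"
print(snippet(line, 29, 33))
```

For the treeio region 395:29:395:33, the carets land under `data`, matching the output shown for that result.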

To illustrate the output for results with flow steps, switch to the data/treeio sample:

  1. Result with flow steps and relatedLocations

      read -r file srcroot <<< "data/treeio/results.sarif data/treeio/treeio"
      start="treeio.core.middleware.chat.py:395:29:395:33"
      .venv/bin/sarif-results-summary -r $file | sed -n "/$start/,/RESULT/p" | sed '$d'
    RESULT: treeio/core/middleware/chat.py:395:29:395:33: [Error information](1) may be exposed to an external user
    REFERENCE: treeio/core/middleware/chat.py:394:50:394:64: Error information
    PATH 0
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
    FLOW STEP 1: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
    FLOW STEP 2: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
    FLOW STEP 3: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data
    PATH 1
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
    FLOW STEP 1: treeio/core/middleware/chat.py:394:46:394:65: ControlFlowNode for str()
    FLOW STEP 2: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
    FLOW STEP 3: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
    FLOW STEP 4: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data

  2. Result with flow steps, relatedLocations, and source

      read -r file srcroot <<< "data/treeio/results.sarif data/treeio/treeio"
      start="treeio.core.middleware.chat.py:395:29:395:33"
      .venv/bin/sarif-results-summary -r -s $srcroot $file | \
          sed -n "/$start/,/RESULT/p" | sed '$d'
    RESULT: treeio/core/middleware/chat.py:395:29:395:33: [Error information](1) may be exposed to an external user
            return HttpResponse(data, content_type='application/json', status=200)
                                ^^^^
    REFERENCE: treeio/core/middleware/chat.py:394:50:394:64: Error information
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                                     ^^^^^^^^^^^^^^
    PATH 0
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                                     ^^^^^^^^^^^^^^
    FLOW STEP 1: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 2: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 3: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data
            return HttpResponse(data, content_type='application/json', status=200)
                                ^^^^
    PATH 1
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                                     ^^^^^^^^^^^^^^
    FLOW STEP 1: treeio/core/middleware/chat.py:394:46:394:65: ControlFlowNode for str()
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                                 ^^^^^^^^^^^^^^^^^^^
    FLOW STEP 2: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 3: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
                {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 4: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data
            return HttpResponse(data, content_type='application/json', status=200)
                                ^^^^
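
In the SARIF 2.1.0 format, path information for a result is stored under `codeFlows`, each holding `threadFlows` whose `locations` are the individual steps. The sketch below formats them in the PATH / FLOW STEP style shown above. It is an illustration under assumptions, not the implementation of sarif-results-summary: the function name is hypothetical, paths are numbered across all thread flows, and a missing `endLine` is taken to equal `startLine`.

```python
def flow_step_lines(result: dict) -> list:
    """Format the code flows of one SARIF result as PATH / FLOW STEP lines."""
    lines = []
    path_index = 0
    for code_flow in result.get("codeFlows", []):
        for thread_flow in code_flow.get("threadFlows", []):
            lines.append(f"PATH {path_index}")
            for step, tfl in enumerate(thread_flow.get("locations", [])):
                location = tfl.get("location", {})
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "")
                region = physical.get("region", {})
                start_line = region.get("startLine", 0)
                start_col = region.get("startColumn", 1)
                # Assumption: an omitted endLine means a single-line region.
                end_line = region.get("endLine", start_line)
                end_col = region.get("endColumn", start_col)
                text = location.get("message", {}).get("text", "")
                lines.append(
                    f"FLOW STEP {step}: {uri}:{start_line}:{start_col}:"
                    f"{end_line}:{end_col}: {text}"
                )
            path_index += 1
    return lines
```

Results without `codeFlows` produce an empty list, which matches single-location ("problem") results having no PATH section.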

sarif-digest

Get an idea of the SARIF file structure by showing only first / last entries in arrays.

  sarif-digest data/torvalds_linux__2021-10-21_10_07_00__export.sarif | less
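
The idea of keeping only the first and last array entries can be sketched as a short recursive function over the parsed JSON. This is an illustration of the technique only; the name `digest` and the choice to drop the middle entries without a marker are assumptions, and the actual sarif-digest output may differ.

```python
def digest(node):
    """Recursively keep only the first and last entry of every array."""
    if isinstance(node, dict):
        return {key: digest(value) for key, value in node.items()}
    if isinstance(node, list):
        kept = node if len(node) <= 2 else [node[0], node[-1]]
        return [digest(item) for item in kept]
    return node  # scalars are returned unchanged


sample = {"runs": [{"results": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]}]}
print(digest(sample))
```

Applied to the result of `json.load` on a SARIF file, this keeps the nesting visible while shrinking long `results` and `locations` arrays.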

sarif-labeled

Display the SARIF file with explicit paths inserted before JSON objects and selected array entries. Handy when reverse-engineering the format by searching for results.

  sarif-labeled data/torvalds_linux__2021-10-21_10_07_00__export.sarif | less

For example, the

  "uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c",

is nested; the labeled display shows where:

  "sarif_struct['runs'][1]['results'][4]['locations'][0]['physicalLocation']['artifactLocation']": "----path----",
  "artifactLocation": {
  "uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c",
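
Path labels of this form can be produced by walking the parsed JSON and extending a path string at each level. The sketch below yields one labeled path per leaf value rather than inserting labels into the document as sarif-labeled does; the function name and the per-leaf output are assumptions made for illustration.

```python
def labeled_paths(node, path="sarif_struct"):
    """Yield (path, value) pairs for every leaf value in a parsed JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from labeled_paths(value, f"{path}[{key!r}]")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from labeled_paths(value, f"{path}[{index}]")
    else:
        yield path, node


doc = {"runs": [{"results": [{"locations": [{"physicalLocation": {
    "artifactLocation": {"uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c"}}}]}]}]}
for path, value in labeled_paths(doc):
    print(path, "=", value)
```

Filtering the pairs by value answers the question of where a given uri is nested.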

sarif-list-files

Display the list of files referenced by a SARIF file. This is the tool used to get file names that ultimately went into data/linux-small/ and data/wxWidgets-small/.

  sarif-list-files data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif
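
One way to collect the referenced files is to gather every `artifactLocation` uri found anywhere in the parsed SARIF structure. The following is a sketch under that assumption; the function name is hypothetical, and sarif-list-files itself may select locations differently.

```python
def list_files(node) -> list:
    """Return the sorted, de-duplicated artifactLocation URIs found in node."""
    found = set()

    def walk(item):
        if isinstance(item, dict):
            location = item.get("artifactLocation")
            if isinstance(location, dict) and "uri" in location:
                found.add(location["uri"])
            for value in item.values():
                walk(value)
        elif isinstance(item, list):
            for value in item:
                walk(value)

    walk(node)
    return sorted(found)
```

With a SARIF file loaded via `json.load`, `"\n".join(list_files(sarif))` gives one path per line, which fits the line-oriented style of the other tools.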

Sample Data

The query results in data/ are taken from lgtm.com, which ran the

  ql/$LANG/ql/src/codeql-suites/$LANG-lgtm.qls

queries.

The Linux kernel has both single-location results ("kind": "problem") and path results ("kind": "path-problem"). It also has results for multiple source languages.

The subset of files referenced by the SARIF results is in data/linux-small/ and is taken from

  "versionControlProvenance": [
      {
          "repositoryUri": "https://github.com/torvalds/linux.git",
          "revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822"
      }
  ]

The wxWidgets library has both single-location results ("kind": "problem") and path results ("kind": "path-problem").

The subset of files referenced by the SARIF results is in data/wxWidgets-small/ and is taken from

  "repositoryUri": "https://github.com/wxWidgets/wxWidgets.git",
  "revisionId": "7a03d5fe9bca2d2a2cd81fc0620bcbd2cbc4c7b0"