This command introduces a new tree structure that pulls in a collection
of sarif files. In yaml format, an example is
- creation_date: '2021-12-09' # Repository creation date
  primary_language: javascript # By lines of code
  project_name: treeio/treeio # Repo name-short name
  query_commit_id: fa9571646c # Commit id for custom (non-library) queries
  sarif_content: {} # The sarif content will be attached here
  sarif_file_name: 2021-12-09/results.sarif # Path to sarif file
  scan_start_date: '2021-12-09' # Beginning date/time of scan
  scan_stop_date: '2021-12-10' # End date/time of scan
  tool_name: codeql
  tool_version: v1.27
- creation_date: '2022-02-25'
  primary_language: javascript
  ...
At run time,
cd ~/local/sarif-cli/data/treeio
sarif-extract-multi multi-sarif-01.json test-multi-table
will load the specified sarif files and put them in place of
`sarif_content`, then build tables against the new signature found in
sarif_cli/signature_multi.py, and merge those into 6 larger tables. The
exported tables are
artifacts.csv path-problem.csv project.csv
codeflows.csv problem.csv related-locations.csv
and they have join keys for further operations.
The new typegraph is rendered in
notes/typegraph-multi.pdf
using the instructions in
sarif_cli/signature_multi.py
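The loading step described above (reading each listed sarif file and putting it in place of `sarif_content`) can be sketched roughly as follows. This is only an illustration, not the code in sarif_cli: the function name `attach_sarif_content` and the `root` parameter are hypothetical, and it assumes each `sarif_file_name` is a path relative to `root`.

```python
import json
from pathlib import Path


def attach_sarif_content(entries, root):
    """Return copies of the project entries with `sarif_content` filled in.

    `entries` is a list of dicts shaped like the example above.  `root` is
    the directory each `sarif_file_name` is relative to (an assumption).
    """
    filled = []
    for entry in entries:
        sarif_path = Path(root) / entry["sarif_file_name"]
        with open(sarif_path, encoding="utf-8") as handle:
            content = json.load(handle)
        # Copy the entry so the caller's list is left untouched.
        filled.append({**entry, "sarif_content": content})
    return filled
```

Returning new dicts instead of mutating the input keeps the original project list reusable, for example when the same list is loaded against several data directories.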
Collection of cli tools for SARIF processing
THIS IS A WORK IN PROGRESS
Each of these tools presents a high-level command-line interface to extract a specific subset of information from a SARIF file. The format of each tool's output will be versioned and, as much as possible, independent of the input.
For human use and to fit with existing tools, the default output format is line-oriented and resembles compiler error formatting.
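As an illustration of the line-oriented, compiler-like format, here is a minimal sketch that turns one SARIF result into a single `RESULT:` line. The function name `format_result` is hypothetical and the `example` dict is hand-built for this sketch; the fallbacks for missing region fields are a simplification, not necessarily what the tools do.

```python
def format_result(result):
    """Render one SARIF result as a single compiler-style line."""
    physical = result["locations"][0]["physicalLocation"]
    uri = physical["artifactLocation"]["uri"]
    region = physical["region"]
    start_line = region["startLine"]
    # SARIF lets a region omit these fields; the fallbacks are a simplification.
    start_col = region.get("startColumn", 1)
    end_line = region.get("endLine", start_line)
    end_col = region.get("endColumn", start_col)
    message = result["message"]["text"]
    return f"RESULT: {uri}:{start_line}:{start_col}:{end_line}:{end_col}: {message}"


# Hand-built result object, for illustration only.
example = {
    "message": {
        "text": "Local variable 'length' hides a [parameter of the same name](1)."
    },
    "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "src/stc/scintilla/lexers/LexMySQL.cxx"},
        "region": {"startLine": 153, "startColumn": 24, "endColumn": 30},
    }}],
}
print(format_result(example))
```

Because every result occupies one line that starts with `path:line:column`, the output can be filtered with grep or sed and compared with diff.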
The goal of this tool set is to support working with sarif files
- at the shell / file level,
- across multiple versions of the same sarif result set,
- and across many repositories.
The implementation language is Python, but that is a detail. The scripts should
work well when used with other shell tools, especially diff and git.
Setup for development
This repository uses git lfs for some larger files; installation steps are at
git-lfs. On a Mac with Homebrew, install it via
brew install git-lfs
git lfs install
Set up the virtual environment and install the packages:
# Using requirements.txt
python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install -r requirements.txt
# Or separately:
pip install --upgrade pip
pip install ipython pyyaml pandas
"Install" for local development:
pip install -e .
Examples
To use git parlance, the porcelain tool is sarif-results-summary, while the
plumbing tools are sarif-digest, sarif-labeled and sarif-list-files.
Following are short summaries of each.
sarif-results-summary
Display the SARIF results in human-readable plain text form.
Starting with the data/wxWidgets sample and the warning around
src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30:
there are several options using only the SARIF file, and one more when source code is available.
The following examples show each command and its output, limited to the intended result
via sed:
- Display only main result, using no options.

    .venv/bin/sarif-results-summary \
        data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 |\
        sed -n "/LexMySQL.cxx:153:24:153:30/,/RESULT/p" | sed '$d'

    RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).

- Display the related information.

    .venv/bin/sarif-results-summary \
        -r data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 |\
        sed -n "/LexMySQL.cxx:153:24:153:30/,/RESULT/p" | sed '$d'

    RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).
    REFERENCE: src/stc/scintilla/lexers/LexMySQL.cxx:108:68:108:74: parameter of the same name

- Include source code snippets (when the source is available):

    .venv/bin/sarif-results-summary \
        -s data/wxWidgets-small \
        -r data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif 2>&1 |\
        sed -n "/LexMySQL.cxx:153:24:153:30/,/RESULT/p" | sed '$d'

    RESULT: src/stc/scintilla/lexers/LexMySQL.cxx:153:24:153:30: Local variable 'length' hides a [parameter of the same name](1).
    Sci_Position length = sc.LengthCurrent() + 1;
                 ^^^^^^
    REFERENCE: src/stc/scintilla/lexers/LexMySQL.cxx:108:68:108:74: parameter of the same name
    static void ColouriseMySQLDoc(Sci_PositionU startPos, Sci_Position length, int initStyle, WordList *keywordlists[],
                                                                       ^^^^^^
To illustrate the flow steps options, switch to the data/treeio sample:
- Result with flow steps and relatedLocations

    read -r file srcroot <<< "data/treeio/results.sarif data/treeio/treeio"
    start="treeio.core.middleware.chat.py:395:29:395:33"
    .venv/bin/sarif-results-summary -r $file | sed -n "/$start/,/RESULT/p" | sed '$d'

    RESULT: treeio/core/middleware/chat.py:395:29:395:33: [Error information](1) may be exposed to an external user
    REFERENCE: treeio/core/middleware/chat.py:394:50:394:64: Error information
    PATH 0
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
    FLOW STEP 1: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
    FLOW STEP 2: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
    FLOW STEP 3: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data
    PATH 1
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
    FLOW STEP 1: treeio/core/middleware/chat.py:394:46:394:65: ControlFlowNode for str()
    FLOW STEP 2: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
    FLOW STEP 3: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
    FLOW STEP 4: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data

- Result with flow steps, relatedLocations, and source

    read -r file srcroot <<< "data/treeio/results.sarif data/treeio/treeio"
    start="treeio.core.middleware.chat.py:395:29:395:33"
    .venv/bin/sarif-results-summary -r -s $srcroot $file | \
        sed -n "/$start/,/RESULT/p" | sed '$d'

    RESULT: treeio/core/middleware/chat.py:395:29:395:33: [Error information](1) may be exposed to an external user
    return HttpResponse(data, content_type='application/json', status=200)
                        ^^^^
    REFERENCE: treeio/core/middleware/chat.py:394:50:394:64: Error information
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                         ^^^^^^^^^^^^^^
    PATH 0
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                         ^^^^^^^^^^^^^^
    FLOW STEP 1: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 2: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 3: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data
    return HttpResponse(data, content_type='application/json', status=200)
                        ^^^^
    PATH 1
    FLOW STEP 0: treeio/core/middleware/chat.py:394:50:394:64: ControlFlowNode for Attribute()
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                         ^^^^^^^^^^^^^^
    FLOW STEP 1: treeio/core/middleware/chat.py:394:46:394:65: ControlFlowNode for str()
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                                     ^^^^^^^^^^^^^^^^^^^
    FLOW STEP 2: treeio/core/middleware/chat.py:394:38:394:66: ControlFlowNode for Dict
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 3: treeio/core/middleware/chat.py:394:13:394:67: ControlFlowNode for Dict
    {"cmd": "Error", "data": {"msg": str(sys.exc_info())}})
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FLOW STEP 4: treeio/core/middleware/chat.py:395:29:395:33: ControlFlowNode for data
    return HttpResponse(data, content_type='application/json', status=200)
                        ^^^^
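The PATH / FLOW STEP lines above correspond to the `codeFlows` part of a SARIF result. The following is a minimal sketch of how such lines could be produced from one result object. The helper names `location_text` and `flow_step_lines` are hypothetical, and numbering one PATH per thread flow is an assumption of this sketch, not a statement about how sarif-results-summary is implemented.

```python
def location_text(location):
    """Return `path:l1:c1:l2:c2` for a SARIF location object."""
    physical = location["physicalLocation"]
    region = physical["region"]
    start_line = region["startLine"]
    # Fallbacks for omitted fields are a simplification.
    start_col = region.get("startColumn", 1)
    end_line = region.get("endLine", start_line)
    end_col = region.get("endColumn", start_col)
    uri = physical["artifactLocation"]["uri"]
    return f"{uri}:{start_line}:{start_col}:{end_line}:{end_col}"


def flow_step_lines(result):
    """List the PATH / FLOW STEP lines for one result."""
    # Assumption: every thread flow of every code flow counts as one path.
    paths = [
        thread_flow
        for code_flow in result.get("codeFlows", [])
        for thread_flow in code_flow.get("threadFlows", [])
    ]
    lines = []
    for path_index, thread_flow in enumerate(paths):
        lines.append(f"PATH {path_index}")
        for step_index, step in enumerate(thread_flow.get("locations", [])):
            location = step["location"]
            message = location.get("message", {}).get("text", "")
            lines.append(
                f"FLOW STEP {step_index}: {location_text(location)}: {message}"
            )
    return lines
```

A result without `codeFlows` (a plain "problem" result) simply yields an empty list.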
sarif-digest
Get an idea of the SARIF file structure by showing only first / last entries in arrays.
sarif-digest data/torvalds_linux__2021-10-21_10_07_00__export.sarif |less
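The digest idea (keep only the first and last entry of every array) can be sketched in a few lines. The function name `digest` is hypothetical, and leaving arrays with two or fewer entries unchanged is an assumption of this sketch; the actual tool may format or mark elided entries differently.

```python
def digest(node):
    """Recursively keep only the first and last entries of every list."""
    if isinstance(node, dict):
        return {key: digest(value) for key, value in node.items()}
    if isinstance(node, list):
        # Lists with at most two entries are kept whole.
        kept = node if len(node) <= 2 else [node[0], node[-1]]
        return [digest(item) for item in kept]
    return node
```

For example, `digest({"runs": [{"results": [1, 2, 3, 4]}]})` keeps the single run and reduces its results to `[1, 4]`.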
sarif-labeled
Display the SARIF file with explicit paths inserted before json objects and selected array entries. Handy when reverse-engineering the format by searching for results.
sarif-labeled data/torvalds_linux__2021-10-21_10_07_00__export.sarif |less
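The underlying idea, recording the access path that leads to each value, can be sketched as a small recursive walk. The function name `find_paths` and the `sarif_struct` default prefix are choices made for this illustration; the sketch reports the paths of matching leaf values and does not reproduce the tool's inline "----path----" output.

```python
def find_paths(node, wanted, path="sarif_struct"):
    """Yield the access path of every leaf value equal to `wanted`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_paths(value, wanted, f"{path}[{key!r}]")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_paths(value, wanted, f"{path}[{index}]")
    elif node == wanted:
        yield path


# Hand-built fragment, for illustration only.
sample = {"runs": [{"results": [{"locations": [{"physicalLocation": {
    "artifactLocation": {"uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c"},
}}]}]}]}
for found in find_paths(sample, "drivers/gpu/drm/i915/gt/uc/intel_guc.c"):
    print(found)
```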
For example, the
"uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c",
is nested; the labeled display shows where:
"sarif_struct['runs'][1]['results'][4]['locations'][0]['physicalLocation']['artifactLocation']": "----path----",
"artifactLocation": {
"uri": "drivers/gpu/drm/i915/gt/uc/intel_guc.c",
sarif-list-files
Display the list of files referenced by a SARIF file. This is the tool used to
get the file names that ultimately went into data/linux-small/ and
data/wxWidgets-small/.
sarif-list-files data/wxWidgets_wxWidgets__2021-11-21_16_06_30__export.sarif
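A rough sketch of the same idea: walk the parsed SARIF structure and collect every `uri` that sits under an `artifactLocation`. The function name `list_files` is reused here only for illustration, and returning a sorted, de-duplicated list is an assumption; the real tool may order or filter its output differently.

```python
def list_files(node):
    """Collect every `artifactLocation.uri` found anywhere in the structure."""
    found = set()

    def walk(item):
        if isinstance(item, dict):
            location = item.get("artifactLocation")
            if isinstance(location, dict) and "uri" in location:
                found.add(location["uri"])
            for value in item.values():
                walk(value)
        elif isinstance(item, list):
            for value in item:
                walk(value)

    walk(node)
    return sorted(found)
```

The resulting list can be fed to a copy step to build a reduced source tree that contains only the files the results mention.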
Sample Data
The query results in data/ are taken from lgtm.com, which ran the
ql/$LANG/ql/src/codeql-suites/$LANG-lgtm.qls
queries.
The linux kernel has both single-location results ("kind": "problem") and path
results ("kind": "path-problem"). It also has results for multiple source
languages.
The subset of files referenced by the sarif results is in data/linux-small/
and is taken from
"versionControlProvenance": [
{
"repositoryUri": "https://github.com/torvalds/linux.git",
"revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822"
}
]
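To find the matching source revision for a sarif file programmatically, the `versionControlProvenance` entries can be read from each run. A minimal sketch follows; the function name `provenance` is hypothetical and the `sample` dict is a hand-built fragment using the values shown above.

```python
def provenance(sarif):
    """Return (repositoryUri, revisionId) pairs from every run."""
    pairs = []
    for run in sarif.get("runs", []):
        for entry in run.get("versionControlProvenance", []):
            pairs.append((entry.get("repositoryUri"), entry.get("revisionId")))
    return pairs


# Hand-built fragment, for illustration only.
sample = {"runs": [{"versionControlProvenance": [{
    "repositoryUri": "https://github.com/torvalds/linux.git",
    "revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822",
}]}]}
for uri, revision in provenance(sample):
    print(uri, revision)
```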
The wxWidgets library has both single-location results ("kind": "problem") and path
results ("kind": "path-problem").
The subset of files referenced by the sarif results is in data/wxWidgets-small/
and is taken from
"repositoryUri": "https://github.com/wxWidgets/wxWidgets.git",
"revisionId": "7a03d5fe9bca2d2a2cd81fc0620bcbd2cbc4c7b0"