Michael Hohn f0aa815a9a Fix encoding read error
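When reading a file with
: with open(fname, 'r') as file:
the read hits the accented letter á (in the name Vrána) in the file
: data/wxWidgets-small/src/stc/scintilla/lexers/LexCSS.cxx
and fails with
: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 119: invalid continuation byte

Byte 0xe1 is á in Latin-1, so this file is most likely Latin-1 encoded rather than UTF-8. We are reading source code, so we likely don't care whether non-ASCII characters are decoded exactly. Using
: with codecs.open(fname, 'r', encoding="latin-1") as file:
avoids the problem, because Latin-1 maps every byte to a character and decoding therefore never fails.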
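The Python sketch below shows the failure and one possible variant of the fix. The helper name read_source and its try-UTF-8-then-Latin-1 fallback are assumptions made for illustration. The commit itself opens every file as Latin-1 unconditionally.

```python
import os
import tempfile


def read_source(fname: str) -> str:
    """Hypothetical helper: read a source file as UTF-8, falling back to Latin-1.

    Latin-1 maps each of the 256 byte values to a code point, so the
    fallback read cannot raise UnicodeDecodeError.
    """
    try:
        with open(fname, "r", encoding="utf-8") as file:
            return file.read()
    except UnicodeDecodeError:
        with open(fname, "r", encoding="latin-1") as file:
            return file.read()


if __name__ == "__main__":
    # Write a tiny Latin-1 encoded file in which "á" is the single byte 0xe1.
    with tempfile.NamedTemporaryFile("wb", suffix=".cxx", delete=False) as tmp:
        tmp.write("// Copyright Vr\xe1na\n".encode("latin-1"))
        path = tmp.name
    try:
        print(read_source(path))  # strict UTF-8 fails, the Latin-1 fallback succeeds
    finally:
        os.remove(path)
```

Trying UTF-8 first keeps genuinely UTF-8 files intact. Decoding them as Latin-1 unconditionally would turn á (UTF-8 bytes 0xc3 0xa1) into the two characters Ã¡.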

Collection of CLI tools for SARIF processing

THIS IS A WORK IN PROGRESS

Each of these tools presents a high-level command-line interface for extracting a specific subset of information from a SARIF file. Each tool's output format is versioned and, as far as possible, independent of the input.

These tools are intended to

  • hide the internals of SARIF from their users
  • provide examples of extracting information from SARIF files, for writing your own tools or extending these
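The following minimal sketch shows the kind of extraction these tools wrap, using only the standard library. It assumes a SARIF 2.1.0 log with the usual runs → results → locations layout. The function name iter_results and the sample data are hypothetical and are not part of this repository.

```python
from __future__ import annotations

import json
from typing import Any, Iterator


def iter_results(sarif: dict[str, Any]) -> Iterator[tuple[str, str, int | None, str]]:
    """Yield (ruleId, uri, startLine, message) for every result in a SARIF log.

    Hypothetical helper: only each result's first location is used, and
    missing fields are replaced by placeholders.
    """
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "<no ruleId>")
            message = result.get("message", {}).get("text", "")
            locations = result.get("locations", [])
            if locations:
                physical = locations[0].get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "<no uri>")
                line = physical.get("region", {}).get("startLine")
            else:
                uri, line = "<no location>", None
            yield rule_id, uri, line, message


# Made-up sample log; a real one would come from json.load() on a .sarif file.
SAMPLE = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "results": [{
      "ruleId": "example/rule-id",
      "message": {"text": "Example message."},
      "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "src/example.c"},
        "region": {"startLine": 42}
      }}]
    }]
  }]
}
""")

if __name__ == "__main__":
    for rule_id, uri, line, message in iter_results(SAMPLE):
        print(f"{uri}:{line}: {rule_id}: {message}")
```

The generator only hides the nesting of the SARIF structure. Choosing and versioning an output format would be a separate layer on top of it.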

Setup for development

This repository uses Git LFS for some larger files; installation steps are at git-lfs. On a Mac with Homebrew, install it via

  brew install git-lfs
  git lfs install

Set up the virtual environment and install the packages:

  python3 -m venv .venv
  . .venv/bin/activate
  python3 -m pip install -r requirements.txt
  # Or separately:
  pip install --upgrade pip
  pip install ipython pyyaml

"Install" for local development:

  pip install -e .

Sample Data

The query results in data/ are taken from lgtm.com, which ran the queries in the suite

  ql/$LANG/ql/src/codeql-suites/$LANG-lgtm.qls

where $LANG is the source language.

The Linux kernel results include both single-location results ("kind": "problem") and path results ("kind": "path-problem"). They also cover multiple source languages.
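To separate the two kinds of result in a log like the kernel one, a minimal sketch can check for a codeFlows array, which path results use to describe the source-to-sink path. Treating "has a non-empty codeFlows" as "path result" is an assumption of this sketch. The function name split_by_kind is hypothetical.

```python
from __future__ import annotations

from typing import Any


def split_by_kind(sarif: dict[str, Any]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (path_results, single_location_results) from all runs.

    Assumption: a result with a non-empty "codeFlows" array is a path result
    ("kind": "path-problem"); any other result is a single-location result
    ("kind": "problem").
    """
    path_results: list[dict[str, Any]] = []
    single_results: list[dict[str, Any]] = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            if result.get("codeFlows"):
                path_results.append(result)
            else:
                single_results.append(result)
    return path_results, single_results


if __name__ == "__main__":
    # Made-up results: one with a (truncated) code flow, one without.
    sample = {"runs": [{"results": [
        {"ruleId": "example/path-rule", "codeFlows": [{"threadFlows": []}]},
        {"ruleId": "example/problem-rule"},
    ]}]}
    paths, singles = split_by_kind(sample)
    print(len(paths), "path result(s),", len(singles), "single-location result(s)")
```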

The subset of files referenced by the SARIF results is in data/linux-small/ and was taken from the revision recorded in the results' versionControlProvenance:

  "versionControlProvenance": [
      {
          "repositoryUri": "https://github.com/torvalds/linux.git",
          "revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822"
      }
  ]
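The repository and revision can be read back from any SARIF log that records them. The following is a minimal sketch, assuming the versionControlProvenance array sits on each run as in SARIF 2.1.0. The function name provenance is hypothetical.

```python
from __future__ import annotations

from typing import Any


def provenance(sarif: dict[str, Any]) -> list[tuple[str, str]]:
    """Collect (repositoryUri, revisionId) pairs from every run's versionControlProvenance."""
    pairs = []
    for run in sarif.get("runs", []):
        for details in run.get("versionControlProvenance", []):
            pairs.append((details.get("repositoryUri", ""), details.get("revisionId", "")))
    return pairs


if __name__ == "__main__":
    # Values copied from the versionControlProvenance excerpt above.
    sample = {"runs": [{"versionControlProvenance": [{
        "repositoryUri": "https://github.com/torvalds/linux.git",
        "revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822",
    }]}]}
    for uri, rev in provenance(sample):
        print(f"{uri} @ {rev}")
```

With the repository URI and revision in hand, the referenced sources can be fetched at exactly that commit, which is how a subset like data/linux-small/ can be reproduced.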
Description
Command line tools for working with SARIF files
License: MIT