Reading a file opened with: with open(fname, 'r') as file: fails when it reaches the accented letter á in "Vrána" in the file: data/wxWidgets-small/src/stc/scintilla/lexers/LexCSS.cxx. The result is: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 119: invalid continuation byte. We are reading source code, so we likely don't need non-ASCII characters to be decoded faithfully; using: with codecs.open(fname, 'r', encoding="latin-1") as file: avoids the error, because latin-1 maps every possible byte to a character and so decoding cannot fail.
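As a minimal sketch of an alternative, the file can be decoded as UTF-8 first, with latin-1 used only as a fallback, so that files that are valid UTF-8 keep their non-ASCII characters intact. The helper names decode_source and read_source are hypothetical and are not part of these tools.

```python
from pathlib import Path


def decode_source(data: bytes) -> str:
    """Decode raw source bytes, preferring UTF-8 and falling back to latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 assigns a character to each of the 256 byte values,
        # so this decode cannot raise.
        return data.decode("latin-1")


def read_source(fname) -> str:
    """Read a source file as text (hypothetical helper, not part of the tools)."""
    return decode_source(Path(fname).read_bytes())
```

With this approach the byte 0xe1 from LexCSS.cxx decodes to á via the fallback instead of raising.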
Collection of CLI tools for SARIF processing
THIS IS A WORK IN PROGRESS
Each of these tools presents a high-level command-line interface to extract a specific subset of information from a SARIF file. The format of each tool's output is versioned and, as much as possible, independent of the input.
It is the intent of these tools to
- hide the internals of SARIF when the tools are used
- provide examples of extracting information from SARIF files, for use when writing your own tools or extending these
Setup for development
This repository uses Git LFS for some larger files. Installation steps are at
git-lfs; on a Mac with Homebrew, install it via
brew install git-lfs
git lfs install
Set up the virtual environment and install the packages:
python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install -r requirements.txt
# Or separately:
pip install --upgrade pip
pip install ipython pyyaml
"Install" for local development:
pip install -e .
Sample Data
The query results in data/ are taken from lgtm.com, which ran the
ql/$LANG/ql/src/codeql-suites/$LANG-lgtm.qls
queries.
The Linux kernel data has both single-location results ("kind": "problem") and path
results ("kind": "path-problem"). It also has results for multiple source
languages.
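The two kinds of result can be told apart in a loaded SARIF document roughly as sketched below. This assumes the SARIF 2.1.0 layout (runs, then results) and assumes that path results carry a non-empty codeFlows array while single-location results do not; the function name split_results is hypothetical and is not one of the tools in this repository.

```python
def split_results(sarif: dict):
    """Partition results into (single_location, path) lists.

    Assumption: a result is a path result when it has a non-empty
    "codeFlows" array; everything else is treated as single-location.
    """
    single, paths = [], []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            if result.get("codeFlows"):
                paths.append(result)
            else:
                single.append(result)
    return single, paths
```

A file can be loaded with json.load before being passed in; the SARIF file name depends on the data set and is not assumed here.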
The subset of files referenced by the SARIF results is in data/linux-small/
and is taken from
    "versionControlProvenance": [
      {
        "repositoryUri": "https://github.com/torvalds/linux.git",
        "revisionId": "d9abdee5fd5abffd0e763e52fbfa3116de167822"
      }
    ]
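The repository and revision shown above can be read back out of a loaded SARIF document with a few lines of Python. This is a sketch that assumes versionControlProvenance sits directly on each run object, as in SARIF 2.1.0; the function name provenance is hypothetical.

```python
def provenance(sarif: dict):
    """Return (repositoryUri, revisionId) pairs for every run in the document."""
    pairs = []
    for run in sarif.get("runs", []):
        # Runs without the property simply contribute no pairs.
        for entry in run.get("versionControlProvenance", []):
            pairs.append((entry.get("repositoryUri"), entry.get("revisionId")))
    return pairs
```

The returned pair is what is needed to check out the same source tree that the results refer to.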