Region / line / column information is present in most messages. The one that
caused this error has no region and refers to the whole file:
ipdb> p sarif_struct
{'ruleId': 'com.lgtm/cpp-queries:cpp/missing-header-guard', 'ruleIndex': 12,
'message': {'text': 'This header file should contain a header guard to prevent
multiple inclusion.'}, 'locations': [{'physicalLocation': {'artifactLocation':
{'uri': 'diff/cmpbuf.h', 'uriBaseId': '%SRCROOT%', 'index': 13}}}],
'partialFingerprints': {'primaryLocationLineHash': 'd04cb834fa64727d:1',
'primaryLocationStartColumnFingerprint': '0'}}
The goal is output with a fixed structure, so a whole-file result uses
-1,-1,-1,-1 for the four line and column fields.
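A minimal sketch of that fallback, for illustration only. The helper name `region_tuple` and the constant `WHOLE_FILE` are hypothetical and not taken from the tool. The defaults for a missing `startColumn` and `endLine` follow the SARIF region conventions; using -1 for a missing `endColumn` is an assumption made here.

```python
# Hypothetical helper: always return four numbers so the output shape is fixed.
WHOLE_FILE = (-1, -1, -1, -1)

def region_tuple(result):
    """Return (start_line, start_col, end_line, end_col) for one SARIF result."""
    physical = result["locations"][0]["physicalLocation"]
    region = physical.get("region")
    if region is None:
        # No region: the message refers to the whole file.
        return WHOLE_FILE
    start_line = region["startLine"]
    return (
        start_line,
        region.get("startColumn", 1),       # SARIF default: first column
        region.get("endLine", start_line),  # SARIF default: same line
        region.get("endColumn", -1),        # assumption: -1 marks "unknown"
    )

# Trimmed copy of the whole-file result shown above.
whole_file_result = {
    "ruleId": "com.lgtm/cpp-queries:cpp/missing-header-guard",
    "locations": [{"physicalLocation": {"artifactLocation": {
        "uri": "diff/cmpbuf.h", "uriBaseId": "%SRCROOT%", "index": 13}}}],
}

print(region_tuple(whole_file_result))
```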
When a read via
: with open(fname, 'r') as file:
hits the accented letter á in Vrána in the file
: data/wxWidgets-small/src/stc/scintilla/lexers/LexCSS.cxx
it results in a
: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 119: invalid continuation byte
We are reading source code, so we likely don't care about decoding non-ASCII characters correctly; using
: with codecs.open(fname, 'r', encoding="latin-1") as file:
avoids this problem, because latin-1 maps every byte to a character and so never fails to decode.
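The difference can be reproduced with a short sketch. It writes the single byte 0xe1 (á in latin-1) into a temporary file rather than reading LexCSS.cxx, and it uses the built-in `open` with an `encoding` argument, which behaves the same as `codecs.open` for this purpose. The sample bytes are an assumption standing in for the real file content.

```python
import os
import tempfile

# "Vrána" encoded as latin-1: the byte 0xe1 is not valid UTF-8 here.
raw = b"Vr\xe1na\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    fname = tmp.name

try:
    # Reading as UTF-8 fails on the 0xe1 byte.
    try:
        with open(fname, "r", encoding="utf-8") as file:
            file.read()
        utf8_failed = False
    except UnicodeDecodeError as err:
        utf8_failed = True
        print("utf-8 failed:", err)

    # latin-1 assigns a character to every byte value, so it cannot fail.
    with open(fname, "r", encoding="latin-1") as file:
        text = file.read()
    print(repr(text))
finally:
    os.remove(fname)
```

Note that latin-1 only guarantees that decoding succeeds. A file that is really UTF-8 will decode without error but show each multi-byte character as several unrelated characters.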
Using
sarif-results-summary -s data/linux-small data/torvalds_linux__2021-10-21_10_07_00__export.sarif |less
now underscores the indicated regions, e.g.
tools/cgroup/iocost_monitor.py:64:5:64:27: Normal methods should have 'self', rather than 'blkcg', as their first parameter.
def blkcg_name(blkcg):
^^^^^^^^^^^^^^^^^^^^^^
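A minimal sketch of how such an underscore line could be built for a single-line region. The function name `underline` is hypothetical. It assumes 1-based columns with an exclusive end column, which is consistent with the 22 carets shown for columns 5 to 27, and it assumes the source line is indented by four spaces.

```python
def underline(start_col, end_col, mark="^"):
    """Build the marker line for a single-line region.

    Columns are 1-based; end_col is the column just past the region.
    """
    return " " * (start_col - 1) + mark * (end_col - start_col)

# Assumed source line: the four-space indentation is a guess.
source_line = "    def blkcg_name(blkcg):"
print(source_line)
print(underline(5, 27))
```

Regions that span several lines would need one marker line per source line, which this sketch does not attempt.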