The command
../../bin/sarif-to-dot results.sarif -u -t -d | dot -Tpdf > raw-nested-types.pdf
produces a good illustration of the problems arising when optional values are absent.
To clean this up, structures missing fields have to be supplemented with those fields,
from right to left in the graph.
This is basically what sarif-results-summary does on the fly; it just has to be applied
to the input tree before collecting the signatures and producing this graph.
Once that is done, the types collected here can be used in SQL table export.
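A minimal sketch of supplementing missing fields, assuming the tree is made of plain dicts and lists as returned by json.load. The fill_missing helper and the template shape are hypothetical illustrations, not the tool's own code:

```python
import copy

def fill_missing(node, template):
    """Return a copy of node with fields from template added where absent.

    Hypothetical helper: template is a dict giving a default for every
    optional field; a one-element list in the template describes the
    shape of each element of the corresponding list in node.
    """
    if isinstance(template, dict):
        result = dict(node) if isinstance(node, dict) else {}
        for key, default in template.items():
            if key in result:
                # Field present: descend to fill nested optional fields.
                result[key] = fill_missing(result[key], default)
            else:
                # Field absent: supply a private copy of the default.
                result[key] = copy.deepcopy(default)
        return result
    if isinstance(template, list) and isinstance(node, list) and template:
        return [fill_missing(item, template[0]) for item in node]
    return node

# Assumed defaults, chosen only for illustration.
template = {
    'message': {'text': ''},
    'physicalLocation': {'artifactLocation': {'uri': '', 'index': -1}},
}
partial = {'message': {'text': 'request'}}
filled = fill_missing(partial, template)
```

After this pass every structure of the same kind carries the same set of fields, so the collected signatures should no longer differ only by which optional fields were present.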
LGTM occasionally returns an error object instead of results:
{"code":404,"error":"The specified analysis could not be found"}
With this patch, the csv output is now
"ERROR","invalid json contents %s","some-file.json"
and the plain text output becomes
ERROR: invalid json contents in some-file.json
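One way such a check could look, as a sketch only. The names looks_like_sarif and report_invalid are hypothetical, and testing for the top-level 'version' and 'runs' keys is an assumed heuristic, not necessarily what the patch does:

```python
import csv
import io
import json

def looks_like_sarif(data):
    """Assumed heuristic: a SARIF log is an object with 'version' and 'runs'."""
    return isinstance(data, dict) and 'version' in data and 'runs' in data

def report_invalid(fname, fmt):
    """Format the error line for csv or plain text output."""
    if fmt == 'csv':
        buf = io.StringIO()
        writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator='\n')
        writer.writerow(["ERROR", "invalid json contents %s", fname])
        return buf.getvalue().rstrip('\n')
    return "ERROR: invalid json contents in %s" % fname

# The LGTM error response shown above is valid JSON but not a SARIF log.
response = json.loads(
    '{"code":404,"error":"The specified analysis could not be found"}')
if not looks_like_sarif(response):
    csv_line = report_invalid("some-file.json", "csv")
    text_line = report_invalid("some-file.json", "text")
```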
Problem:
The
artifact = get(related_location, 'physicalLocation', 'artifactLocation')
requested by
message, artifact, region = S.get_relatedlocation_message_info(location)
cannot be resolved, because the related location only carries a message:
ipdb> p related_location
{'message': {'text': 'request'}}
Fix:
Introduce the NoFile class to propagate the missing artifact and handle it where needed.
The output now simply reports <NoFile> in place of the file name.
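A rough sketch of the idea, not the actual implementation. The get_or_nofile helper is a hypothetical stand-in for the real get, and it only handles nested dicts:

```python
class NoFile:
    """Marker for a location that carries no file information."""
    def __repr__(self):
        return "<NoFile>"

def get_or_nofile(sarif_struct, *path):
    """Follow path through nested dicts; return NoFile() when a key is missing."""
    res = sarif_struct
    for key in path:
        if not isinstance(res, dict) or key not in res:
            return NoFile()
        res = res[key]
    return res

related_location = {'message': {'text': 'request'}}
artifact = get_or_nofile(related_location,
                         'physicalLocation', 'artifactLocation')

# Callers test for the marker and format it like any other file name.
if isinstance(artifact, NoFile):
    line = "FLOW STEP 0: %s: %s" % (artifact,
                                    related_location['message']['text'])
```

Returning a dedicated marker object rather than None lets the formatting code print it directly while still allowing an explicit isinstance check where the missing file matters.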
For plain text output:
RESULT: src/optionsparser/ ..
FLOW STEP 0: <NoFile>: request
FLOW STEP 1: <NoFile>: request_mp
FLOW STEP 2: src/....
For csv output:
"result","src/optionsparser/...","116","26","116","34","`& ...` used as ..."
"flow_step","0","<NoFile>","-1","-1","-1","-1","request"
"flow_step","1","<NoFile>","-1","-1","-1","-1","request_mp"
"flow_step","2","src/foo.cpp","119","97","119","104","request"
Region / line / column information is present in most messages. The one that
caused this error refers to the whole file:
ipdb> p sarif_struct
{'ruleId': 'com.lgtm/cpp-queries:cpp/missing-header-guard', 'ruleIndex': 12,
'message': {'text': 'This header file should contain a header guard to prevent
multiple inclusion.'}, 'locations': [{'physicalLocation': {'artifactLocation':
{'uri': 'diff/cmpbuf.h', 'uriBaseId': '%SRCROOT%', 'index': 13}}}],
'partialFingerprints': {'primaryLocationLineHash': 'd04cb834fa64727d:1',
'primaryLocationStartColumnFingerprint': '0'}}
The goal is fixed-structure output formatting, so whole-file output uses
-1,-1,-1,-1 for the line and column information.
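As an illustration, a lookup along these lines keeps the output at four fields in every case. The region_fields name is hypothetical, and the defaults used when only part of a region is present (startColumn 1, endLine equal to startLine, endColumn -1) are assumptions of this sketch:

```python
WHOLE_FILE = (-1, -1, -1, -1)

def region_fields(physical_location):
    """Return (start line, start column, end line, end column).

    A physicalLocation without a 'region' refers to the whole file and
    yields the fixed placeholder tuple.
    """
    region = physical_location.get('region')
    if region is None:
        return WHOLE_FILE
    start_line = region['startLine']
    return (start_line,
            region.get('startColumn', 1),       # assumed default
            region.get('endLine', start_line),  # assumed default
            region.get('endColumn', -1))        # assumed placeholder

# Whole-file location, as in the missing-header-guard result above.
whole = region_fields({'artifactLocation': {'uri': 'diff/cmpbuf.h',
                                            'uriBaseId': '%SRCROOT%',
                                            'index': 13}})
# A location with an explicit region (values chosen for illustration).
ranged = region_fields({'region': {'startLine': 116, 'startColumn': 26,
                                   'endColumn': 34}})
```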
When a read via
: with open(fname, 'r') as file:
hits the accented letter á in Vrána in the file
: data/wxWidgets-small/src/stc/scintilla/lexers/LexCSS.cxx
it results in a
: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 119: invalid continuation byte
We are reading source code, so we likely don't care about decoding non-ASCII characters correctly; using
: with codecs.open(fname, 'r', encoding="latin-1") as file:
avoids this problem, because latin-1 maps every byte to a character and so never fails to decode.
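A small self-contained reproduction, using a temporary file as a stand-in for the real source file. It uses the built-in open, which accepts the same encoding argument; read_source_lines is a hypothetical name:

```python
import os
import tempfile

def read_source_lines(fname):
    """Read a source file as latin-1, which accepts every byte value."""
    with open(fname, 'r', encoding='latin-1') as file:
        return file.readlines()

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "sample.cxx")
    # Byte 0xe1 is 'á' in latin-1 but starts an incomplete sequence in utf-8.
    with open(fname, 'wb') as out:
        out.write(b"Vr\xe1na\n")

    try:
        with open(fname, 'r', encoding='utf-8') as file:
            file.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    lines = read_source_lines(fname)
```

The trade-off is that a file that really is utf-8 will show multi-byte characters as several latin-1 characters, which is acceptable when the text is only displayed next to a result.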
Using
sarif-results-summary -s data/linux-small data/torvalds_linux__2021-10-21_10_07_00__export.sarif |less
now underlines the indicated regions, e.g.
tools/cgroup/iocost_monitor.py:64:5:64:27: Normal methods should have 'self', rather than 'blkcg', as their first parameter.
def blkcg_name(blkcg):
^^^^^^^^^^^^^^^^^^^^^^
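The marker line can be produced along these lines. This is a sketch for single-line regions only; the underline name is hypothetical, and it assumes 1-based columns with the end column pointing one past the last marked character:

```python
def underline(start_col, end_col):
    """Return a line of carets covering columns start_col..end_col-1."""
    width = max(end_col - start_col, 1)   # always mark at least one column
    return " " * (start_col - 1) + "^" * width

# Region 64:5:64:27 from the example above, with the source line
# indented by four spaces (assumed here).
source_line = "    def blkcg_name(blkcg):"
marker = underline(5, 27)
report = source_line + "\n" + marker
```

Printing the source line followed by the marker gives the two-line display; regions spanning several lines would need one marker per line.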