
210 Commits

Author SHA1 Message Date
Michael Hohn
2b75988b9a sarif-to-dot: add more support for --fill-structure option
Expand all 'properties' objects to a common signature; instead of the following 3
entries, get a single one:

( 'struct',
('kind', 'String'),
('precision', 'String'),
('severity', 'String'),
('tags', 'Array003'))

( 'struct',
('kind', 'String'),
('precision', 'String'),
('security-severity', 'String'),
('severity', 'String'),
('tags', 'Array003'))

( 'struct',
('kind', 'String'),
('precision', 'String'),
('severity', 'String'),
('sub-severity', 'String'),
('tags', 'Array003'))
2022-01-26 15:41:26 -08:00
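Merging the three entries amounts to taking the union of their field lists. The following is a minimal sketch that assumes signatures are tuples of the form `('struct', (key, type), ...)` with keys in sorted order, as printed above. `union_struct` is a hypothetical helper and not the sarif-to-dot implementation.

```python
def union_struct(*signatures):
    """Merge several ('struct', (key, type), ...) signatures into one.

    Hypothetical helper: a key that appears with two different types is
    treated as an error rather than silently merged.
    """
    fields = {}
    for sig in signatures:
        tag, *pairs = sig
        if tag != 'struct':
            raise ValueError(f'not a struct signature: {sig!r}')
        for key, typ in pairs:
            seen = fields.setdefault(key, typ)
            if seen != typ:
                raise ValueError(f'conflicting types for {key!r}: {seen} vs {typ}')
    # Sort by key so the result uses the same ordering as the listings above.
    return ('struct', *sorted(fields.items()))


a = ('struct', ('kind', 'String'), ('precision', 'String'),
     ('severity', 'String'), ('tags', 'Array003'))
b = ('struct', ('kind', 'String'), ('precision', 'String'),
     ('security-severity', 'String'), ('severity', 'String'), ('tags', 'Array003'))
c = ('struct', ('kind', 'String'), ('precision', 'String'),
     ('severity', 'String'), ('sub-severity', 'String'), ('tags', 'Array003'))
print(union_struct(a, b, c))
```

The result contains all six keys. Raising on a type conflict is a design choice of this sketch: it avoids hiding genuinely different structures behind one name.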
Michael Hohn
153eba8346 sarif-to-dot: to reduce graph clutter, add option --no-edges-to-scalars 2022-01-26 00:41:31 -08:00
Michael Hohn
d7d566c5db sarif-to-dot: add more support for --fill-structure option
Collapse multiple 'physicalLocation's into one; from
 ( 'Struct006',
    ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005'))),

 ('Struct036', ('struct', ('artifactLocation', 'Struct000'))),

to

 ( 'Struct006',
    ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005'))),
2022-01-25 23:43:43 -08:00
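One way to decide that `Struct036` can be folded into `Struct006` is a field-subset test: every field of the smaller struct also appears, with the same type, in the larger one. The following is a minimal sketch under that assumption. `field_set` and `collapse_subsumed` are hypothetical names, and the typedef names are copied from the listing above.

```python
def field_set(sig):
    """Return the (key, type) pairs of a ('struct', ...) signature as a set."""
    tag, *pairs = sig
    if tag != 'struct':
        raise ValueError(f'not a struct signature: {sig!r}')
    return set(pairs)


def collapse_subsumed(typedefs):
    """Map each typedef name to the largest typedef containing all its fields.

    typedefs: dict of name -> ('struct', (key, type), ...).
    A struct that is not contained in any larger one maps to itself.
    """
    mapping = {}
    for name, sig in typedefs.items():
        best = name
        for other, other_sig in typedefs.items():
            if (field_set(sig) <= field_set(other_sig)
                    and len(field_set(other_sig)) > len(field_set(typedefs[best]))):
                best = other
        mapping[name] = best
    return mapping


typedefs = {
    'Struct006': ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005')),
    'Struct036': ('struct', ('artifactLocation', 'Struct000')),
}
print(collapse_subsumed(typedefs))  # {'Struct006': 'Struct006', 'Struct036': 'Struct006'}
```

This sketch only picks an existing superset. It does not merge two structs when neither contains the other.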
Michael Hohn
b816705574 sarif-to-dot: add --fill-structure option and initial library support
This collapses the rightmost column of the signature output from

    ../../bin/sarif-to-dot -u -t -d -f results.sarif | dot -Tpdf

which has multiple distinct entries

 ('Struct030', ('struct', ('endColumn', 'Int'), ('startLine', 'Int'))),
 ( 'Struct016',
    ( 'struct',
      ('endColumn', 'Int'),
      ('startColumn', 'Int'),
      ('startLine', 'Int'))),
 ( 'Struct025',
    ( 'struct',
      ('endColumn', 'Int'),
      ('endLine', 'Int'),
      ('startColumn', 'Int'),
      ('startLine', 'Int'))),

to a single entry,

  ( 'Struct005',
    ( 'struct',
      ('endColumn', 'Int'),
      ('endLine', 'Int'),
      ('startColumn', 'Int'),
      ('startLine', 'Int'))),

when using

    ../../bin/sarif-to-dot results.sarif -u -t -f
2022-01-25 23:18:20 -08:00
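At the value level, collapsing the region variants means giving every `region` object all four fields before signatures are collected. The following is a minimal sketch of a recursive fill from a template. `fill_structure` and `REGION` are hypothetical names, and using `-1` as the fill value is an assumption borrowed from the whole-file convention of sarif-results-summary.

```python
import copy


def fill_structure(value, template):
    """Return a copy of `value` in which every key of `template` is present.

    Missing keys receive a deep copy of the template's value; nested dicts
    are filled recursively; keys present only in `value` are kept.
    The input is not modified.
    """
    if not isinstance(value, dict) or not isinstance(template, dict):
        return value
    filled = {k: fill_structure(v, template.get(k)) for k, v in value.items()}
    for key, default in template.items():
        if key not in filled:
            filled[key] = copy.deepcopy(default)
    return filled


# Hypothetical template: the full region signature with -1 placeholders.
REGION = {'startLine': -1, 'startColumn': -1, 'endLine': -1, 'endColumn': -1}

# Sample regions with the three partial shapes listed above.
regions = [
    {'startLine': 64, 'endColumn': 27},
    {'startLine': 64, 'startColumn': 5, 'endColumn': 27},
    {'startLine': 64, 'startColumn': 5, 'endLine': 64, 'endColumn': 27},
]
for r in regions:
    print(sorted(fill_structure(r, REGION)))
# each line prints ['endColumn', 'endLine', 'startColumn', 'startLine']
```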
Michael Hohn
edfe1f3363 sarif-to-dot: move signature functions into their own module 2022-01-25 17:57:44 -08:00
Michael Hohn
0444a87076 sarif-to-dot: remove module-variable references 2022-01-25 17:49:07 -08:00
Michael Hohn
86caa3f56f sarif-to-dot: small renaming 2022-01-24 17:24:30 -08:00
Michael Hohn
939ba9bd8a sarif-to-dot: output array signatures as nodes, not edges; fix raise statements 2022-01-20 18:09:45 -08:00
Michael Hohn
cef9b47b58 sarif-to-dot: produce dot output using -d option
The command
   ../../bin/sarif-to-dot results.sarif -u -t -d | dot -Tpdf > raw-nested-types.pdf
produces a good illustration of the problems arising when optional values are absent.
To clean this up, structures missing fields have to be supplemented with those fields,
from right to left in the graph.
This is basically what sarif-results-summary does on the fly; it just has to be applied
to the input tree before collecting the signatures and producing this graph.
Once that is done, the types collected here can be used in SQL table export.
2022-01-16 14:21:23 -08:00
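The following is a minimal sketch of turning typedef signatures into Graphviz dot text. It creates one node per typedef and one labelled edge per field. `to_dot` and its `edges_to_scalars` flag are hypothetical; the flag only mirrors the idea behind the `--no-edges-to-scalars` option, and the typedef contents below are illustrative.

```python
SCALARS = {'String', 'Int'}   # scalar leaf types as printed in the signatures


def to_dot(typedefs, edges_to_scalars=True):
    """Render {name: ('struct', (key, type), ...)} as Graphviz dot source.

    Each typedef becomes a node and each field becomes an edge labelled with
    the field name. With edges_to_scalars=False, fields whose type is a
    scalar are skipped to reduce clutter.
    """
    lines = ['digraph signatures {', '  node [shape=box];']
    for name, (tag, *fields) in typedefs.items():
        lines.append(f'  "{name}" [label="{name}\\n{tag}"];')
        for key, typ in fields:
            if typ in SCALARS and not edges_to_scalars:
                continue
            lines.append(f'  "{name}" -> "{typ}" [label="{key}"];')
    lines.append('}')
    return '\n'.join(lines)


# Hypothetical typedefs, named in the style of the listings above.
typedefs = {
    'Struct006': ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005')),
    'Struct000': ('struct', ('index', 'Int'), ('uri', 'String')),
}
print(to_dot(typedefs, edges_to_scalars=False))
```

Piping the output through `dot -Tpdf > sketch.pdf` renders it in the same way as the command above.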
Michael Hohn
113fa483ca traverse: add file header 2022-01-16 13:23:33 -08:00
Michael Hohn
d64b100101 sarif-to-dot: move processing code to the end 2022-01-16 01:39:24 -08:00
Michael Hohn
b94be6a21e sarif-to-dot: map values to their typedef 2022-01-16 01:26:24 -08:00
Michael Hohn
afca6b341a sarif-to-dot: improved output, add three options
-   Full view with some clean-up:
    45608 lines

        cd ~/local/sarif-cli/data/treeio
        ../../bin/sarif-to-dot results.sarif | tr -d "',[]"  |less

-   Only show unique array entry signatures
    1573 lines

        cd ~/local/sarif-cli/data/treeio
        ../../bin/sarif-to-dot results.sarif -u | tr -d "',[]"  |less

-   Only show unique array entry signatures, typedef object signatures
    338 lines

        cd ~/local/sarif-cli/data/treeio
        ../../bin/sarif-to-dot results.sarif -u -t | tr -d "',[]"  |less
2022-01-16 01:07:24 -08:00
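The `-u` and `-t` views can be thought of as interning. Each distinct signature is stored once and referred to by a generated name. The following is a minimal sketch that assumes names of the form `Struct000` and `Array001`, as in the listings above. `TypedefTable` is a hypothetical class, and its use of a single counter shared across kinds is an assumption.

```python
class TypedefTable:
    """Give each distinct signature a name, numbered in first-seen order.

    Signatures are hashable tuples such as ('struct', (key, type), ...) or
    ('array', elem, ...); equal signatures always receive the same name.
    """

    PREFIX = {'struct': 'Struct', 'array': 'Array'}

    def __init__(self):
        self.names = {}    # signature -> typedef name

    def name_of(self, sig):
        if sig not in self.names:
            # One counter shared across kinds (an assumption of this sketch).
            self.names[sig] = f'{self.PREFIX[sig[0]]}{len(self.names):03d}'
        return self.names[sig]

    def typedefs(self):
        """Return the unique (name, signature) pairs."""
        return [(name, sig) for sig, name in self.names.items()]


t = TypedefTable()
region = ('struct', ('endColumn', 'Int'), ('startLine', 'Int'))
print(t.name_of(region))                   # Struct000
print(t.name_of(('array', 'Struct000')))   # Array001
print(t.name_of(region))                   # Struct000 (same signature, same name)
```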
Michael Hohn
706e4cdd54 sarif-to-dot: Print the type signature of a sarif file, at various levels of verbosity. 2022-01-15 23:05:06 -08:00
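The type signature of a parsed SARIF (JSON) value can be computed by a short recursion over dicts, lists, and scalars. The following is a minimal sketch that assumes the tuple shapes printed in the commits above: `('struct', (key, type), ...)` with sorted keys, plus `'String'` and `'Int'`. The `('array', ...)` form and the `'Bool'`, `'Number'`, and `'Null'` labels are assumptions of this example. `signature` is hypothetical.

```python
import json


def signature(value):
    """Return a hashable type signature for a JSON value."""
    if isinstance(value, dict):
        return ('struct', *((k, signature(v)) for k, v in sorted(value.items())))
    if isinstance(value, list):
        # Keep each distinct element signature once, in first-seen order.
        elems = []
        for item in value:
            s = signature(item)
            if s not in elems:
                elems.append(s)
        return ('array', *elems)
    if isinstance(value, bool):   # bool is a subclass of int, so test it first
        return 'Bool'
    if isinstance(value, int):
        return 'Int'
    if isinstance(value, float):
        return 'Number'
    if isinstance(value, str):
        return 'String'
    if value is None:
        return 'Null'
    raise TypeError(f'unexpected JSON value: {value!r}')


loc = json.loads('{"artifactLocation": {"uri": "diff/cmpbuf.h", "index": 13}}')
print(signature(loc))
# ('struct', ('artifactLocation', ('struct', ('index', 'Int'), ('uri', 'String'))))
```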
Michael Hohn
ef08825b43 Processing in stages: Move the initial sarif_cli code to sarif_cli/traverse 2021-12-22 18:03:34 -08:00
Michael Hohn
7d49c3bd08 Update the sarif-results-summary examples 2021-12-22 17:48:24 -08:00
Michael Hohn
558e218d3b Add endpoints-only option for path output and a collection of usage samples 2021-12-21 14:05:27 -08:00
Michael Hohn
79649a6226 Add treeio/ files referenced in sarif 2021-12-18 14:58:51 -08:00
Michael Hohn
979042ff5c Add an example with 3 =relatedLocations= and 3 =threadFlows= 2021-12-18 14:58:10 -08:00
Michael Hohn
f0e52753f6 Illustration of the steps needed to pull in used source files only 2021-12-18 14:56:39 -08:00
Michael Hohn
9590d0a677 Add newline after dbg(message) output 2021-12-18 14:19:38 -08:00
Michael Hohn
291726dd58 Add smaller sarif test files 2021-12-18 13:19:11 -08:00
Michael Hohn
68a661fffb Added notes on more thorough examination of multiple results 2021-12-18 00:33:38 -08:00
Michael Hohn
7e66e29f53 Fix editing error 2021-12-15 14:02:27 -08:00
Michael Hohn
62ae8dca4a Correct the =sarif-results-summary= commands 2021-12-10 11:56:10 -08:00
Michael Hohn
780def7063 Add utility scripts to retrieve sarif files from lgtm 2021-12-10 11:25:03 -08:00
Michael Hohn
5386310b1b Prepend path index to data flow results; use single newlines 2021-12-08 16:28:32 -08:00
Michael Hohn
f1d21e4a43 Fix missing 'region' key in relatedLocations: use whole-file output
The goal is fixed-structure output formatting, so whole-file output uses
-1,-1,-1,-1 for line, column information.
2021-12-08 16:02:31 -08:00
Michael Hohn
1271589bc4 Fix class NoFile: comment 2021-12-06 15:34:03 -08:00
Michael Hohn
92d904ee10 Add quick check to verify that input is sarif
An occasional output from LGTM is
    {"code":404,"error":"The specified analysis could not be found"}

With this patch, the csv output is now
    "ERROR","invalid json contents %s","some-file.json"

and the plain text output becomes
    ERROR: invalid json contents in some-file.json
2021-12-06 14:24:08 -08:00
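The following is a minimal sketch of such a check. It assumes that requiring a JSON object with a `runs` list is enough: a SARIF log has top-level `version` and `runs` properties, and the LGTM error object above has neither. `looks_like_sarif` and `load_sarif` are hypothetical names, and the message text copies the plain-text output shown above.

```python
import json


def looks_like_sarif(data):
    """Heuristic: a SARIF log is a JSON object whose 'runs' is a list."""
    return isinstance(data, dict) and isinstance(data.get('runs'), list)


def load_sarif(fname):
    """Return the parsed log, or None after reporting an error."""
    with open(fname, 'r') as file:
        try:
            data = json.load(file)
        except json.JSONDecodeError:
            data = None
    if not looks_like_sarif(data):
        print(f'ERROR: invalid json contents in {fname}')
        return None
    return data


lgtm_error = json.loads('{"code":404,"error":"The specified analysis could not be found"}')
print(looks_like_sarif(lgtm_error))   # False
```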
Michael Hohn
120e673424 Fix: handle relatedLocations without physicalLocations (files)
Problem:
    The
        artifact = get(related_location, 'physicalLocation', 'artifactLocation')
    requested by
        message, artifact, region = S.get_relatedlocation_message_info(location)
    is incomplete:
        ipdb> p related_location
        {'message': {'text': 'request'}}

Fix:
    Introduce the NoFile class to propagate this and handle it where needed.

Now simply report <NoFile> as appropriate.
    For plain text output:

        RESULT: src/optionsparser/ ..
        FLOW STEP 0: <NoFile>: request
        FLOW STEP 1: <NoFile>: request_mp
        FLOW STEP 2: src/....

    For csv output:

        "result","src/optionsparser/...","116","26","116","34","`& ...` used as ..."
        "flow_step","0","<NoFile>","-1","-1","-1","-1","request"
        "flow_step","1","<NoFile>","-1","-1","-1","-1","request_mp"
        "flow_step","2","src/foo.cpp","119","97","119","104","request"
2021-12-06 12:37:35 -08:00
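The following is a minimal sketch of the sentinel approach. A lookup that walks nested keys returns a `NoFile` instance instead of failing, and the output code prints it as `<NoFile>`. `NoFile`, `get_path`, and `flow_step_text` are illustrative reimplementations and not the project's code. The file case omits the line and column numbers that the real output includes.

```python
class NoFile:
    """Sentinel for a location that does not name a file."""

    def __str__(self):
        return '<NoFile>'


def get_path(obj, *keys):
    """Follow nested dict keys; return a NoFile instance if any is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return NoFile()
        obj = obj[key]
    return obj


def flow_step_text(index, related_location):
    """Format one flow step in the plain-text style shown above."""
    artifact = get_path(related_location, 'physicalLocation', 'artifactLocation')
    uri = artifact if isinstance(artifact, NoFile) else artifact['uri']
    message = related_location['message']['text']
    return f'FLOW STEP {index}: {uri}: {message}'


print(flow_step_text(0, {'message': {'text': 'request'}}))
# FLOW STEP 0: <NoFile>: request
```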
Michael Hohn
2c3ca3c0eb Fix for KeyError: 'region', caused by result without region
Region / line / column information is present in most messages. The one that
caused this error refers to the whole file:

    ipdb> p sarif_struct

    {'ruleId': 'com.lgtm/cpp-queries:cpp/missing-header-guard', 'ruleIndex': 12,
    'message': {'text': 'This header file should contain a header guard to prevent
    multiple inclusion.'}, 'locations': [{'physicalLocation': {'artifactLocation':
    {'uri': 'diff/cmpbuf.h', 'uriBaseId': '%SRCROOT%', 'index': 13}}}],
    'partialFingerprints': {'primaryLocationLineHash': 'd04cb834fa64727d:1',
    'primaryLocationStartColumnFingerprint': '0'}}

The goal is fixed-structure output formatting, so whole-file output uses
-1,-1,-1,-1 for line, column information.
2021-12-06 11:48:53 -08:00
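With the fixed-structure convention, every location yields exactly four numbers. The following is a minimal sketch that assumes SARIF's `startLine`, `startColumn`, `endLine`, and `endColumn` property names and the output order used in this log (`file:64:5:64:27`). `region_numbers` is hypothetical. The SARIF specification defines defaults for some absent region properties, but this sketch does not apply them; it maps every absent field to -1.

```python
WHOLE_FILE = (-1, -1, -1, -1)


def region_numbers(location):
    """Return (startLine, startColumn, endLine, endColumn) for a location.

    A location without a 'region' refers to the whole file and yields -1s;
    fields missing from a present region also become -1 in this sketch.
    """
    region = location.get('physicalLocation', {}).get('region')
    if region is None:
        return WHOLE_FILE
    return tuple(region.get(k, -1)
                 for k in ('startLine', 'startColumn', 'endLine', 'endColumn'))


loc = {'physicalLocation': {'artifactLocation':
       {'uri': 'diff/cmpbuf.h', 'uriBaseId': '%SRCROOT%', 'index': 13}}}
print(region_numbers(loc))   # (-1, -1, -1, -1)
```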
Michael Hohn
ffcacec630 sarif-results-summary: add csv output option 2021-12-06 11:48:53 -08:00
Michael Hohn
f9c3e18842 Add * Examples to README 2021-12-06 11:48:53 -08:00
Michael Hohn
44f61dc70c Add wxWidget subset as test case 2021-12-06 11:48:53 -08:00
Michael Hohn
f0aa815a9a Fix encoding read error
When
: with open(fname, 'r') as file:
reads the file
: data/wxWidgets-small/src/stc/scintilla/lexers/LexCSS.cxx
and hits the accented letter á in Vrána, it results in a
: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 119: invalid continuation byte

We are reading source code for display, so we likely don't care if the occasional
non-ASCII character is shown incorrectly; using
: with codecs.open(fname, 'r', encoding="latin-1") as file:
avoids this problem, since latin-1 maps every byte to a character and never fails to decode.
2021-12-06 11:48:53 -08:00
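The failure and the workaround can be reproduced on a few bytes. The following sketch uses the built-in `open(..., encoding='latin-1')` instead of the `codecs.open` call from the commit; both decode the bytes the same way. The sample bytes are illustrative and not the actual contents of LexCSS.cxx, and `read_source_lines` is a hypothetical helper.

```python
import os
import tempfile


def read_source_lines(fname):
    # latin-1 assigns a character to every byte value 0x00-0xFF, so reading
    # with it cannot raise UnicodeDecodeError (UTF-8 text may be mis-rendered).
    with open(fname, 'r', encoding='latin-1') as file:
        return file.read().splitlines()


raw = b'/* Copyright Vr\xe1na */\n'   # 0xe1 is 'a acute' in latin-1, invalid UTF-8 here

try:
    raw.decode('utf-8')
except UnicodeDecodeError as err:
    print('utf-8:', err.reason)       # utf-8: invalid continuation byte

with tempfile.NamedTemporaryFile('wb', suffix='.cxx', delete=False) as tmp:
    tmp.write(raw)
try:
    print(read_source_lines(tmp.name))  # ['/* Copyright Vrána */']
finally:
    os.remove(tmp.name)
```

In this sample the file really is latin-1 encoded, so `á` even comes out correctly. A UTF-8 file read this way would show multi-byte characters as several wrong characters instead of raising an error.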
Michael Hohn
85ddaaafe1 sarif-results-summary: add codeFlow (path-problem) output, remove meta-data
The per-language result counts are removed; they belong in a separate sarif-info script.
2021-12-06 11:48:53 -08:00
Michael Hohn
29b62b8b1a Remove unused requirements 2021-12-06 11:48:53 -08:00
Michael Hohn
303d063940 Add note on git lfs requirement 2021-12-06 11:48:53 -08:00
Michael Hohn
6147e57260 Introduce get_relatedlocation_message_info to co-locate tree information 2021-11-17 16:34:20 -08:00
Michael Hohn
1f7e78b049 refactor: introduce get_location_message_info 2021-11-17 16:28:43 -08:00
Michael Hohn
8036ea5ffc factor common result prefix 2021-11-17 16:14:36 -08:00
Michael Hohn
90758f769f factor common code into display_underlined 2021-11-17 15:56:43 -08:00
Michael Hohn
f5bb156c8c Add option to print related location info (sarif-results-summary -r) 2021-11-16 21:46:55 -08:00
Michael Hohn
9f3be7bcb0 Log missing files, but try to continue execution 2021-11-16 21:45:54 -08:00
Michael Hohn
502cb21850 Add source files for relatedLocations 2021-11-16 21:42:28 -08:00
Michael Hohn
4ca7dda579 Add TODO to sarif-list-files
TODO: list files from the relatedLocations property
2021-11-16 21:32:07 -08:00
Michael Hohn
e36874cb54 sarif-results-summary: underline affected code region
Using
    sarif-results-summary -s data/linux-small data/torvalds_linux__2021-10-21_10_07_00__export.sarif |less
now underlines the indicated regions with carets, e.g.

tools/cgroup/iocost_monitor.py:64:5:64:27: Normal methods should have 'self', rather than 'blkcg', as their first parameter.

    def blkcg_name(blkcg):
    ^^^^^^^^^^^^^^^^^^^^^^
2021-11-15 14:16:23 -08:00
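The caret line can be computed from the region's columns alone. The following is a minimal sketch that assumes SARIF's column semantics: columns are 1-based and `endColumn` is the column just after the region. `underline` is a hypothetical helper and not the project's `display_underlined`. For the example above, it marks the 22 characters of `def blkcg_name(blkcg):`.

```python
def underline(line, start_column, end_column):
    """Return a caret line marking columns [start_column, end_column) of `line`.

    Columns are 1-based and end_column is exclusive, which is how SARIF
    defines startColumn/endColumn. Only single-line regions are handled.
    """
    end_column = min(end_column, len(line) + 1)   # don't run past the line
    width = max(end_column - start_column, 1)      # always show at least one caret
    return ' ' * (start_column - 1) + '^' * width


source = '    def blkcg_name(blkcg):'
print(source)
print(underline(source, 5, 27))
```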
Michael Hohn
a756abbb09 Consistency with tabs in Python source code
In load_lines, use 1 space for each tab
2021-11-15 14:00:18 -08:00
Michael Hohn
912f75c52a fix load_lines: only strip newlines 2021-11-15 13:41:51 -08:00
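Taken together, the last two commits describe a `load_lines` that keeps leading indentation and turns each tab into a single space. With that behavior, character positions do not shift when carets are drawn under a line. The following is a minimal sketch of that behavior. It is a hypothetical reimplementation and not the project's function, and it assumes that region columns count a tab as one character.

```python
import os
import tempfile


def load_lines(fname):
    """Read a source file into a list of display lines.

    Only the trailing newline is removed: stripping all whitespace would
    drop leading indentation and shift every column. Each tab becomes one
    space, so the character count of the line is unchanged.
    """
    with open(fname, 'r', encoding='latin-1') as file:
        return [line.rstrip('\n').replace('\t', ' ') for line in file]


with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write('def f():\n\treturn 1  \n')
print(load_lines(tmp.name))   # ['def f():', ' return 1  ']
os.remove(tmp.name)
```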