sarif-cli

mirror of https://github.com/hohn/sarif-cli.git synced 2025-12-16 09:13:04 +01:00

Author	SHA1	Message	Date
Michael Hohn	82a8e7a6dc	fix: set id and scan_id type to uint64 to suppress float conversion	2022-06-01 13:00:37 -07:00
Michael Hohn	0fc6eb3cce	Improve error reporting in sarif destructuring routines	2022-05-30 00:09:13 -07:00
Michael Hohn	f5e258de52	Enhance the fillsig() routines to supplement lgtm.com/lgtm enterprise signature differences	2022-05-30 00:08:09 -07:00
Michael Hohn	eb8e2f18e9	Initial version of sarif-extract-scans, to be tested. Running `cd ~/local/sarif-cli/data/treeio` and then `sarif-extract-scans scan-spec-0.json test-scan` produces the 2 derived tables and one sarif-based table (codeflows.csv); `ls test-scan/` shows codeflows.csv, results.csv, scans.csv. Adding -r, via `sarif-extract-scans -r scan-spec-0.json test-scan`, writes all tables; `ls test-scan/` shows artifacts.csv, kind_pathproblem.csv, project.csv, results.csv, scans.csv, codeflows.csv, kind_problem.csv, relatedLocations.csv, rules.csv.	2022-05-16 18:58:53 -07:00
Michael Hohn	154b0bdc56	WIP: assemble derived 'results' table	2022-05-13 17:01:18 -07:00
Michael Hohn	b212423907	WIP: sarif-extract-scans: back to single sarif file handling, incorporate multi-file libraries	2022-05-10 19:01:38 -07:00
Michael Hohn	8e5d9c464b	Add snowflake implementation	2022-04-11 19:24:12 -07:00
Michael Hohn	d5390bb87e	Full revision of the base tables derived from multiple sarif input files. The new base tables produced by `sarif-extract-multi` are artifacts, codeflows, kind_pathproblem, kind_problem, project, relatedLocations, and rules. The revised table overview is in the Jupyter notebook scripts/multi-table-overview.ipynb. The file notes/typegraph-multi-with-tables.pdf illustrates which original (sarif) tables are used to form the base (derived) tables.	2022-03-23 16:37:41 -07:00
Michael Hohn	db00f17137	Some cleanup based on pyflakes output	2022-03-17 17:23:53 -07:00
Michael Hohn	b82c620a1e	Add overview of the base tables derived from multi-sarif input; add rules.csv. The table overview is in the Jupyter notebook scripts/multi-table-overview.ipynb and uses some formatting customizations to make the tables readable at a glance. The initial `projects` table had far too many entries; the `rules` part is now in a separate `rules` table.	2022-03-16 16:54:14 -07:00
Michael Hohn	926e083991	Added field to multi-file signature; the steps are documented in adding-to-typegraph.org	2022-03-15 12:30:05 -07:00
Michael Hohn	0f070a6ae4	sarif-extract-multi: extract combined tables from multiple sarif files. This command introduces a new tree structure that pulls in a collection of sarif files. In yaml format, an example is: - creation_date: '2021-12-09' # Repository creation date primary_language: javascript # By lines of code project_name: treeio/treeio # Repo name-short name query_commit_id: fa9571646c # Commit id for custom (non-library) queries sarif_content: {} # The sarif content will be attached here sarif_file_name: 2021-12-09/results.sarif # Path to sarif file scan_start_date: '2021-12-09' # Beginning date/time of scan scan_stop_date: '2021-12-10' # End date/time of scan tool_name: codeql tool_version: v1.27 - creation_date: '2022-02-25' primary_language: javascript ... At run time, `cd ~/local/sarif-cli/data/treeio` followed by `sarif-extract-multi multi-sarif-01.json test-multi-table` will load the specified sarif files and put them in place of `sarif_content`, then build tables against the new signature found in sarif_cli/signature_multi.py, and merge those into 6 larger tables. The exported tables are artifacts.csv, path-problem.csv, project.csv, codeflows.csv, problem.csv, and related-locations.csv, and they have join keys for further operations. The new typegraph is rendered in notes/typegraph-multi.pdf using the instructions in sarif_cli/signature_multi.py.	2022-03-11 23:00:53 -08:00
Michael Hohn	ad738abed3	sarif-extract-tables: also output relatedLocations table. With --related-locations, `../../bin/sarif-results-summary -r results.sarif` produces the details: RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:72:722:73: Character ''' is repeated [here](1) in the same character class. Character ''' is repeated [here](2) in the same character class. Character ''' is repeated [here](3) in the same character class. REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:74:722:75: here REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:76:722:77: here REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:78:722:79: here Via `../../bin/sarif-extract-tables results.sarif tables`, sarif-extract-tables now produces two output tables, tables/ ├── messages.csv └── relatedLocations.csv, that contain the relevant information and can be joined or otherwise combined on the struct_id_4055 key. For example, adding to the end of sarif-extract-tables: import IPython IPython.embed() msg = d2[d2.message.str.startswith("Character ''' is repeated [here]")] dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]] In [24]: msg Out[24]: struct_id_4055 ... message 180 4796917312 ... Character ''' is repeated [here](1) in the sam... [1 rows x 7 columns] In [25]: dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]] Out[25]: struct_id_4055 uri startLine startColumn endLine endColumn message 180 4796917312 static/js/tinymce/jscripts/tiny_mce/plugins/pa... 722 74 722 75 here 181 4796917312 static/js/tinymce/jscripts/tiny_mce/plugins/pa... 722 76 722 77 here 182 4796917312 static/js/tinymce/jscripts/tiny_mce/plugins/pa... 722 78 722 79 here or manually from the shell: # pick up the struct_id_4055: 0:$ grep "static.*Character ''' is repeated \[here\]" tables/messages.csv 180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,72,722,73,"Character ''' is repeated [here](1) in the same character class. # and find relatedLocations: 0:$ grep 4927448704 tables/relatedLocations.csv 180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,74,722,75,here 181,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,76,722,77,here 182,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,78,722,79,here Changes: - Introduce scli-dyys, a random id string for later identification and removal of dummy table rows. - Keep the struct_id_4055 column to join tables as needed. - Output is now written to a directory as there are always multiple files.	2022-02-16 17:03:58 -08:00
Michael Hohn	f246f06d4e	sarif-extract-tables: interim commit: form tables. Tables are now formed and kept in the Typegraph instance. These will be tested using pandas operations to form one of the previous outputs.	2022-02-04 23:56:01 -08:00
Michael Hohn	7a517fa06c	sarif-extract-tables: interim commit. Internal destructuring and array aggregation run but still need to be tested. Tables need to be formed, and pandas selections/joins/etc. used for custom table output.	2022-02-04 14:44:55 -08:00
Michael Hohn	cf8096446b	sarif-to-dot: cleanup and preparation for sarif table extraction	2022-02-01 22:42:25 -08:00
Michael Hohn	119f9a5c18	sarif-to-dot: add more support for --fill-structure option. Expand ('Struct4827', ('struct', ('physicalLocation', 'Struct4963'))) to have the fields ( 'Struct2683', ( 'struct', ('id', 'Int'), ('message', 'Struct2774'), ('physicalLocation', 'Struct4963'))) and avoid a redundant table.	2022-01-27 18:55:02 -08:00
Michael Hohn	eb53ede8b1	sarif-to-dot: add more support for --fill-structure option. Common to all: ('locations', 'Array008'), ('message', 'Struct009'), ('partialFingerprints', 'Struct010'), ('rule', 'Struct011'), ('ruleId', 'String'), ('ruleIndex', 'Int'). Only some problems and flow problems have ('relatedLocations', 'Array014'). Add a dummy value for relatedLocations to reduce to two result categories, @kind flow problem and @kind problem.	2022-01-27 18:18:43 -08:00
Michael Hohn	80b22001ce	sarif-to-dot: make signature names order-independent. To create entire subtrees conforming to a signature, first make the signature names order-independent. Use hashes to name the signatures.	2022-01-27 17:53:14 -08:00
Michael Hohn	0b13a297a5	sarif-to-dot: add more support for --fill-structure option. Ensure ('Array003', ('array', (0, 'String'))) is always present, and collapse the following into one: ( 'Struct032', ( 'struct', ('artifacts', 'Array002'), ('columnKind', 'String'), ('newlineSequences', 'Array003'), ('properties', 'Struct004'), ('results', 'Array023'), ('tool', 'Struct029'), ('versionControlProvenance', 'Array031'))), ( 'Struct033', ( 'struct', ('artifacts', 'Array002'), ('columnKind', 'String'), ('properties', 'Struct004'), ('results', 'Array023'), ('tool', 'Struct029'), ('versionControlProvenance', 'Array031')))	2022-01-26 22:27:07 -08:00
Michael Hohn	2adf0dfa21	sarif-to-dot: increase graph ranksep to get intelligible edges	2022-01-26 16:15:42 -08:00
Michael Hohn	2c98cf0d41	sarif-to-dot: add more support for --fill-structure option. When both ('message', 'Struct009') and ('physicalLocation', 'Struct006') are present, ensure ('id', 'Int') is also present.	2022-01-26 16:06:15 -08:00
Michael Hohn	2b75988b9a	sarif-to-dot: add more support for --fill-structure option. Expand all 'properties' objects to a common signature; instead of the 3 entries ( 'struct', ('kind', 'String'), ('precision', 'String'), ('severity', 'String'), ('tags', 'Array003')), ( 'struct', ('kind', 'String'), ('precision', 'String'), ('security-severity', 'String'), ('severity', 'String'), ('tags', 'Array003')), and ( 'struct', ('kind', 'String'), ('precision', 'String'), ('severity', 'String'), ('sub-severity', 'String'), ('tags', 'Array003')), get one.	2022-01-26 15:41:26 -08:00
Michael Hohn	153eba8346	sarif-to-dot: to reduce graph clutter, add option --no-edges-to-scalars	2022-01-26 00:41:31 -08:00
Michael Hohn	d7d566c5db	sarif-to-dot: add more support for --fill-structure option. Collapse multiple 'physicalLocation's into one, from ( 'Struct006', ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005'))), ('Struct036', ('struct', ('artifactLocation', 'Struct000'))) to ( 'Struct006', ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005'))).	2022-01-25 23:43:43 -08:00
Michael Hohn	b816705574	sarif-to-dot: add --fill-structure option and initial library support. This collapses the rightmost column of the signature output from `../../bin/sarif-to-dot -u -t -d -f results.sarif | dot -Tpdf`, which has multiple distinct entries ('Struct030', ('struct', ('endColumn', 'Int'), ('startLine', 'Int'))), ( 'Struct016', ( 'struct', ('endColumn', 'Int'), ('startColumn', 'Int'), ('startLine', 'Int'))), ( 'Struct025', ( 'struct', ('endColumn', 'Int'), ('endLine', 'Int'), ('startColumn', 'Int'), ('startLine', 'Int'))), ('Struct030', ('struct', ('endColumn', 'Int'), ('startLine', 'Int'))), to a single entry, ( 'Struct005', ( 'struct', ('endColumn', 'Int'), ('endLine', 'Int'), ('startColumn', 'Int'), ('startLine', 'Int'))), when using `../../bin/sarif-to-dot results.sarif -u -t -f`.	2022-01-25 23:18:20 -08:00
Michael Hohn	edfe1f3363	sarif-to-dot: move signature functions into their own module	2022-01-25 17:57:44 -08:00
Michael Hohn	113fa483ca	traverse: add file header	2022-01-16 13:23:33 -08:00
Michael Hohn	ef08825b43	Processing in stages: Move the initial sarif_cli code to sarif_cli/traverse	2021-12-22 18:03:34 -08:00
Michael Hohn	9590d0a677	Add newline after dbg(message) output	2021-12-18 14:19:38 -08:00
Michael Hohn	f1d21e4a43	Fix missing 'region' key in relatedLocations: use whole-file output The goal is fixed-structure output formatting, so whole-file output uses -1,-1,-1,-1 for line, column information.	2021-12-08 16:02:31 -08:00
Michael Hohn	1271589bc4	Fix class NoFile: comment	2021-12-06 15:34:03 -08:00
Michael Hohn	92d904ee10	Add quick check to verify that input is sarif. An occasional output from LGTM is {"code":404,"error":"The specified analysis could not be found"}. With this patch, the csv output is now "ERROR","invalid json contents %s","some-file.json" and the plain text output becomes ERROR: invalid json contents in some-file.json	2021-12-06 14:24:08 -08:00
Michael Hohn	120e673424	Fix: handle `relatedLocation`s without `physicalLocation`s (files). Problem: the `artifact = get(related_location, 'physicalLocation', 'artifactLocation')` requested by `message, artifact, region = S.get_relatedlocation_message_info(location)` is incomplete: ipdb> p related_location {'message': {'text': 'request'}}. Fix: introduce the NoFile class to propagate this and handle it where needed. Now simply report <NoFile> as appropriate. For plain text output: RESULT: src/optionsparser/ .. FLOW STEP 0: <NoFile>: request FLOW STEP 1: <NoFile>: request_mp FLOW STEP 2: src/.... For csv output: "result","src/optionsparser/...","116","26","116","34","`& ...` used as ..." "flow_step","0","<NoFile>","-1","-1","-1","-1","request" "flow_step","1","<NoFile>","-1","-1","-1","-1","request_mp" "flow_step","2","src/foo.cpp","119","97","119","104","request"	2021-12-06 12:37:35 -08:00
Michael Hohn	2c3ca3c0eb	Fix for KeyError: 'region', caused by `result` without `region`. Region / line / column information is present in most messages. The one that caused this error refers to the whole file: ipdb> p sarif_struct {'ruleId': 'com.lgtm/cpp-queries:cpp/missing-header-guard', 'ruleIndex': 12, 'message': {'text': 'This header file should contain a header guard to prevent multiple inclusion.'}, 'locations': [{'physicalLocation': {'artifactLocation': {'uri': 'diff/cmpbuf.h', 'uriBaseId': '%SRCROOT%', 'index': 13}}}], 'partialFingerprints': {'primaryLocationLineHash': 'd04cb834fa64727d:1', 'primaryLocationStartColumnFingerprint': '0'}}. The goal is fixed-structure output formatting, so whole-file output uses -1,-1,-1,-1 for line and column information.	2021-12-06 11:48:53 -08:00
Michael Hohn	ffcacec630	sarif-results-summary: add csv output option	2021-12-06 11:48:53 -08:00
Michael Hohn	f0aa815a9a	Fix encoding read error. When `with open(fname, 'r') as file:` hits the accented letter á in Vrána in the file data/wxWidgets-small/src/stc/scintilla/lexers/LexCSS.cxx, it results in `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 119: invalid continuation byte`. We are reading source code, so we likely don't care about garbling non-ASCII characters; using `with codecs.open(fname, 'r', encoding="latin-1") as file:` avoids this error, since latin-1 decodes every byte.	2021-12-06 11:48:53 -08:00
Michael Hohn	85ddaaafe1	sarif-results-summary: add codeFlow (path-problem) output, remove meta-data. The per-language result counts are removed; they belong in a separate sarif-info script.	2021-12-06 11:48:53 -08:00
Michael Hohn	6147e57260	Introduce get_relatedlocation_message_info to co-locate tree information	2021-11-17 16:34:20 -08:00
Michael Hohn	1f7e78b049	refactor: introduce get_location_message_info	2021-11-17 16:28:43 -08:00
Michael Hohn	90758f769f	factor common code into `display_underlined`	2021-11-17 15:56:43 -08:00
Michael Hohn	9f3be7bcb0	Log missing files, but try to continue execution	2021-11-16 21:45:54 -08:00
Michael Hohn	e36874cb54	sarif-results-summary: underline affected code region. Using `sarif-results-summary -s data/linux-small data/torvalds_linux__2021-10-21_10_07_00__export.sarif | less` now underlines the indicated regions, e.g. tools/cgroup/iocost_monitor.py:64:5:64:27: Normal methods should have 'self', rather than 'blkcg', as their first parameter. def blkcg_name(blkcg): ^^^^^^^^^^^^^^^^^^^^^^	2021-11-15 14:16:23 -08:00
Michael Hohn	a756abbb09	Consistency with tabs in Python source code. In load_lines, use 1 space for each tab	2021-11-15 14:00:18 -08:00
Michael Hohn	912f75c52a	fix load_lines: only strip newlines	2021-11-15 13:41:51 -08:00
Michael Hohn	b69eec404d	sarif-results-summary -s: include source file lines in output	2021-11-09 16:10:12 -08:00
Michael Hohn	ab1d7c27ef	Use sensible values for start/end line/columns for empty entries in the sarif 'region' structure.	2021-11-09 15:04:36 -08:00
Michael Hohn	a0af2c8c59	fix: traverse all languages	2021-11-09 14:29:31 -08:00
Michael Hohn	3032fe3fcd	pre-alpha versions of bin/sarif-{digest,labeled,list-files,results-summary}	2021-11-09 12:21:12 -08:00
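Commit 82a8e7a6dc sets the `id` and `scan_id` types to uint64 "to suppress float conversion" but does not say where the conversion came from. The sketch below shows one common pandas cause, a missing value widening an int64 column to float64, and how declaring the dtype when reading a CSV keeps exact 64-bit ids. It is an illustration under that assumption, not the project's code.

```python
import io

import pandas as pd

# Reindexing (like an outer merge or concat) leaves a hole -> NaN -> float64.
ids = pd.Series([9007199254740993, 1], dtype="int64")
widened = ids.reindex([0, 1, 2])  # label 2 has no value
# float64 cannot represent 2**53 + 1, so the id silently changes.
print(widened.dtype, int(widened[0]))

# Declaring the column types up front keeps the exact values.
csv_text = "id,scan_id\n9007199254740993,1\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "uint64", "scan_id": "uint64"})
print(df["id"].dtype, df["id"][0])
```

Note that casting back to uint64 after the float step cannot recover the lost digits; the type has to be fixed before any widening happens.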
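Commit 0f070a6ae4 describes `sarif-extract-multi` loading each listed sarif file and putting it in place of `sarif_content`. A minimal sketch of that loading step, assuming the spec is a JSON list of entries and that `sarif_file_name` is relative to the spec file's directory (the commit runs the command from that directory, so this is an assumption). `load_scan_spec` is a hypothetical name.

```python
import json
import os


def load_scan_spec(spec_path):
    """Hypothetical loader: attach parsed SARIF to each entry of a scan spec.

    Assumes `sarif_file_name` paths are relative to the spec file's directory.
    """
    with open(spec_path, encoding="utf-8") as f:
        entries = json.load(f)
    base = os.path.dirname(os.path.abspath(spec_path))
    for entry in entries:
        sarif_path = os.path.join(base, entry["sarif_file_name"])
        with open(sarif_path, encoding="utf-8") as f:
            # Replace the {} placeholder with the actual SARIF document.
            entry["sarif_content"] = json.load(f)
    return entries
```

Keeping the metadata and the SARIF tree in one list of dicts lets a single signature describe the whole collection.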
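Commit ad738abed3 says messages.csv and relatedLocations.csv can be joined on the struct_id_4055 key. The sketch below rebuilds small frames from the values shown in that commit (column names taken from its `Out[25]` listing) and does the join with `DataFrame.merge` instead of the filter used in the IPython session. With real output you would read `tables/messages.csv` and `tables/relatedLocations.csv` with `pd.read_csv`.

```python
import pandas as pd

uri = "static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js"
messages = pd.DataFrame({
    "struct_id_4055": [4796917312],
    "uri": [uri],
    "startLine": [722], "startColumn": [72], "endLine": [722], "endColumn": [73],
    "message": ["Character ''' is repeated [here](1) in the same character class."],
})
related = pd.DataFrame({
    "struct_id_4055": [4796917312] * 3,
    "uri": [uri] * 3,
    "startLine": [722] * 3,
    "startColumn": [74, 76, 78],
    "endLine": [722] * 3,
    "endColumn": [75, 77, 79],
    "message": ["here"] * 3,
})

# Pick the result of interest, then attach all of its related locations.
msg = messages[messages.message.str.startswith("Character ''' is repeated [here]")]
joined = msg.merge(related, on="struct_id_4055", suffixes=("_result", "_related"))
print(joined[["struct_id_4055", "startColumn_result", "startColumn_related", "message_related"]])
```

The suffixes keep the result's own location columns apart from each related location's columns after the join.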
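Commit 80b22001ce makes signature names order-independent by naming signatures with hashes. One way to do that, sketched here as an assumption rather than the project's scheme: sort the (field, type) pairs, hash a canonical string with SHA-1, and keep four decimal digits so names look like the `Struct4827` seen in later commits. `struct_signature_name` is hypothetical.

```python
import hashlib


def struct_signature_name(fields):
    """Hypothetical: name a struct signature from its (key, type) pairs.

    Sorting first makes the name independent of key order; SHA-1 (unlike the
    built-in hash() on str) gives the same value in every Python process.
    """
    canonical = repr(("struct", tuple(sorted(fields))))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return f"Struct{int(digest, 16) % 10000:04d}"


a = struct_signature_name([("id", "Int"), ("message", "Struct2774"), ("physicalLocation", "Struct4963")])
b = struct_signature_name([("physicalLocation", "Struct4963"), ("id", "Int"), ("message", "Struct2774")])
print(a, a == b)
```

Reducing the digest to four digits keeps names short but allows collisions between different signatures, so a real implementation would need to detect or avoid them.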
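Commit b816705574 collapses several `region` signatures that differ only in which optional fields are present into the single full signature `Struct005`. At the signature level this amounts to taking the union of the fields. The sketch assumes that a field always has the same type wherever it appears, and it raises an error otherwise. `fill_structure` is a hypothetical name here, not the option's implementation.

```python
def fill_structure(signatures):
    """Hypothetical: merge struct signatures (lists of (key, type) pairs)
    into one signature containing every key, sorted by key name."""
    merged = {}
    for sig in signatures:
        for key, typ in sig:
            if merged.setdefault(key, typ) != typ:
                raise ValueError(f"conflicting types for {key!r}: {merged[key]} vs {typ}")
    return tuple(sorted(merged.items()))


# The region variants listed in commit b816705574.
variants = [
    [("endColumn", "Int"), ("startLine", "Int")],
    [("endColumn", "Int"), ("startColumn", "Int"), ("startLine", "Int")],
    [("endColumn", "Int"), ("endLine", "Int"), ("startColumn", "Int"), ("startLine", "Int")],
]
print(fill_structure(variants))
```

The data itself then has to be filled to match, for example with dummy values such as the -1 line and column entries used elsewhere in this log.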
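Commit 92d904ee10 adds a quick check that the input is sarif, because LGTM sometimes returns `{"code":404,...}` instead. The commit does not show the check itself. The sketch assumes one plausible test: the file parses as JSON and has a top-level `runs` list, which the SARIF 2.1.0 format requires. Both function names are hypothetical. The error text matches the plain-text output quoted in the commit.

```python
import json


def looks_like_sarif(text):
    """Hypothetical cheap check: valid JSON object with a top-level "runs" list."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("runs"), list)


def invalid_input_message(fname):
    # Plain-text form of the error reported by the commit.
    return f"ERROR: invalid json contents in {fname}"


lgtm_404 = '{"code":404,"error":"The specified analysis could not be found"}'
print(looks_like_sarif(lgtm_404), invalid_input_message("some-file.json"))
```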
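Commits 2c3ca3c0eb and 120e673424 handle two gaps in SARIF locations: a whole-file result with no `region` (reported as -1,-1,-1,-1) and a related location with no `physicalLocation` (reported as `<NoFile>`). The sketch below combines both. The `NoFile` class mirrors only the name from the commit. `location_info` and `csv_row` are hypothetical. As a simplification, missing individual region fields also become -1, though the project's own defaults for partial regions (commit ab1d7c27ef) may differ.

```python
import csv
import io


class NoFile:
    """Stand-in for a location that names no file; prints as <NoFile>."""
    def __str__(self):
        return "<NoFile>"


def location_info(location):
    """Return (file, startLine, startColumn, endLine, endColumn) for a SARIF location."""
    phys = location.get("physicalLocation")
    if phys is None:                      # e.g. {'message': {'text': 'request'}}
        return (NoFile(), -1, -1, -1, -1)
    uri = phys["artifactLocation"]["uri"]
    region = phys.get("region")
    if region is None:                    # whole-file result
        return (uri, -1, -1, -1, -1)
    return (uri,) + tuple(region.get(k, -1) for k in
                          ("startLine", "startColumn", "endLine", "endColumn"))


def csv_row(*fields):
    # Quote every field, as in the csv output shown in commit 120e673424.
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="").writerow([str(f) for f in fields])
    return buf.getvalue()


related = {"message": {"text": "request"}}
print(csv_row("flow_step", 0, *location_info(related), "request"))
```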
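Commit f0aa815a9a switches to latin-1 when reading source files. The sketch reproduces the failure with a tiny file containing byte 0xe1 (á in Vrána) and reads it with the built-in `open(..., encoding="latin-1")`, which behaves like the commit's `codecs.open` call for this purpose. `read_source_lines` is a hypothetical helper.

```python
import os
import tempfile


def read_source_lines(fname):
    # latin-1 maps every byte 0x00-0xff to a character, so decoding never fails;
    # UTF-8 multi-byte characters would show up as several latin-1 characters.
    with open(fname, "r", encoding="latin-1") as file:
        return file.read().splitlines()


path = os.path.join(tempfile.mkdtemp(), "LexCSS.cxx")
with open(path, "wb") as out:
    out.write(b"// Vr\xe1na\nint x;\n")  # 0xe1 is 'á' in latin-1

try:
    with open(path, "r", encoding="utf-8") as file:
        file.read()
    utf8_failed = False
except UnicodeDecodeError as err:
    utf8_failed = True
    print("utf-8:", err.reason)

print(read_source_lines(path))
```

An alternative that keeps UTF-8 files intact is `encoding="utf-8", errors="replace"`, at the cost of replacing undecodable bytes with U+FFFD.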
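Commits e36874cb54 and a756abbb09 underline the reported region under the source line and count each tab as one space. A minimal sketch, assuming SARIF's 1-based columns with an exclusive `endColumn`: the example from e36874cb54, columns 5 to 27 of an indented `def blkcg_name(blkcg):`, then yields 22 carets. `underline_region` is a hypothetical name, not the project's `display_underlined`.

```python
def underline_region(line, start_column, end_column):
    """Return the source line and a caret line covering [start_column, end_column)."""
    # One space per tab keeps one character per column, as in load_lines.
    line = line.replace("\t", " ")
    carets = " " * (start_column - 1) + "^" * (end_column - start_column)
    return line + "\n" + carets


print(underline_region("    def blkcg_name(blkcg):", 5, 27))
```

For whole-file results (all -1) both multipliers are non-positive, so the caret line is simply empty.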


99 Commits