sarif-cli

mirror of https://github.com/hohn/sarif-cli.git synced 2025-12-16 17:23:03 +01:00

Author	SHA1	Message	Date
Kristen Newbury	c51dbba577	Add fake date ranges to scan default values	2022-10-26 11:28:06 -04:00
Kristen Newbury	3b3999cfd7	Add kind, precision, severity to scan table for path-problem	2022-10-13 16:44:20 -04:00
Kristen Newbury	3385d9a10a	Add kind, precision, severity to scan table	2022-10-13 13:54:32 -04:00
Michael Hohn	2b42a7d306	scan table change: results.query_id is now the @id from the CodeQL query. Before, results.csv declared `query_id STRING, -- git commit id of the ql query set`; now it declares `query_id STRING, -- @id from the CodeQL query`.	2022-08-11 16:56:20 -07:00
Michael Hohn	8ad69a503b	Reduce zero results from error to warning	2022-08-11 16:26:07 -07:00
Michael Hohn	38af30ead9	Switch numpy.datetime64() to numpy.dtype('M') to get working equality comparison	2022-08-10 17:33:44 -07:00
Michael Hohn	1754c6c9ca	Export codeflows column types for scan-related pandas tables	2022-08-08 16:49:13 -07:00
Michael Hohn	505ee8ea66	Export column types for scan-related pandas tables	2022-08-08 16:48:17 -07:00
Michael Hohn	560b9ecf35	Enforce types when forming the scan tables (internal and output formatting). Force all column types to ensure appropriate formatting for writing; in particular, no character data in place of integers, no floats, and no objects in place of strings. Table formation in the functions st.joins_for_results, st.joins_for_scans, and st.joins_for_projects enforces types.	2022-08-07 19:04:13 -07:00
Michael Hohn	0e7a941be3	Include all typegraph samples, from raw to refined	2022-07-14 18:29:21 -07:00
Michael Hohn	741be0cfe1	Include project table in output of sarif-extract-scans; add commit_id to scans table	2022-06-02 16:45:04 -07:00
Michael Hohn	fd55969b76	fix: special concatenation case for empty tables	2022-06-01 17:44:50 -07:00
Michael Hohn	32413984e2	fix: only concatenate non-empty tables to suppress float conversion	2022-06-01 17:34:56 -07:00
Michael Hohn	82a8e7a6dc	fix: set id and scan_id type to uint64 to suppress float conversion	2022-06-01 13:00:37 -07:00
Michael Hohn	0fc6eb3cce	Improve error reporting in sarif destructuring routines	2022-05-30 00:09:13 -07:00
Michael Hohn	f5e258de52	Enhance the fillsig() routines to account for signature differences between lgtm.com and LGTM Enterprise output	2022-05-30 00:08:09 -07:00
Michael Hohn	eb8e2f18e9	Initial version of sarif-extract-scans, to be tested. Running `sarif-extract-scans scan-spec-0.json test-scan` in ~/local/sarif-cli/data/treeio produces the 2 derived tables and one sarif-based table (codeflows.csv); `ls test-scan/` lists codeflows.csv, results.csv, scans.csv. Adding -r, via `sarif-extract-scans -r scan-spec-0.json test-scan`, writes all tables: artifacts.csv, codeflows.csv, kind_pathproblem.csv, kind_problem.csv, project.csv, relatedLocations.csv, results.csv, rules.csv, scans.csv.	2022-05-16 18:58:53 -07:00
Michael Hohn	154b0bdc56	WIP: assemble derived 'results' table	2022-05-13 17:01:18 -07:00
Michael Hohn	b212423907	WIP: sarif-extract-scans: back to single sarif file handling, incorporate multi-file libraries	2022-05-10 19:01:38 -07:00
Michael Hohn	8e5d9c464b	Add snowflake implementation	2022-04-11 19:24:12 -07:00
Michael Hohn	d5390bb87e	Full revision of the base tables derived from multiple sarif input files. The new base tables produced by `sarif-extract-multi` are artifacts, codeflows, kind_pathproblem, kind_problem, project, relatedLocations, and rules. The revised table overview is in the Jupyter notebook scripts/multi-table-overview.ipynb. The file notes/typegraph-multi-with-tables.pdf illustrates which original (sarif) tables are used to form the base (derived) tables.	2022-03-23 16:37:41 -07:00
Michael Hohn	db00f17137	Some cleanup based on pyflakes output	2022-03-17 17:23:53 -07:00
Michael Hohn	b82c620a1e	Add overview of the base tables derived from multi-sarif input; add rules.csv. The table overview is in the Jupyter notebook scripts/multi-table-overview.ipynb and uses some formatting customizations to actually get an overview. The initial `projects` table had far too many entries; the `rules` part is now in a separate `rules` table.	2022-03-16 16:54:14 -07:00
Michael Hohn	926e083991	Added field to multi-file signature; the steps are documented in adding-to-typegraph.org	2022-03-15 12:30:05 -07:00
Michael Hohn	0f070a6ae4	sarif-extract-multi: extract combined tables from multiple sarif files. This command introduces a new tree structure that pulls in a collection of sarif files. In yaml format, an example entry has creation_date: '2021-12-09' (repository creation date), primary_language: javascript (by lines of code), project_name: treeio/treeio (repo name, short name), query_commit_id: fa9571646c (commit id for custom, non-library queries), sarif_content: {} (the sarif content will be attached here), sarif_file_name: 2021-12-09/results.sarif (path to sarif file), scan_start_date: '2021-12-09' (beginning date/time of scan), scan_stop_date: '2021-12-10' (end date/time of scan), tool_name: codeql, and tool_version: v1.27; the next entry starts with creation_date: '2022-02-25', primary_language: javascript, ... At run time, running `sarif-extract-multi multi-sarif-01.json test-multi-table` in ~/local/sarif-cli/data/treeio will load the specified sarif files and put them in place of `sarif_content`, then build tables against the new signature found in sarif_cli/signature_multi.py, and merge those into 6 larger tables. The exported tables are artifacts.csv, codeflows.csv, path-problem.csv, problem.csv, project.csv, and related-locations.csv, and they have join keys for further operations. The new typegraph is rendered in notes/typegraph-multi.pdf using the instructions in sarif_cli/signature_multi.py.	2022-03-11 23:00:53 -08:00
Michael Hohn	ad738abed3	sarif-extract-tables: also output relatedLocations table. With --related-locations, `../../bin/sarif-results-summary -r results.sarif` produces the details RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:72:722:73: Character ''' is repeated [here](1) in the same character class. Character ''' is repeated [here](2) in the same character class. Character ''' is repeated [here](3) in the same character class. REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:74:722:75: here REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:76:722:77: here REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:78:722:79: here. Via `../../bin/sarif-extract-tables results.sarif tables`, sarif-extract-tables now produces two output tables, tables/messages.csv and tables/relatedLocations.csv, that contain the relevant information and can be joined or otherwise combined on the struct_id_4055 key. For example, add `import IPython; IPython.embed()` to the end of sarif-extract-tables and run `msg = d2[d2.message.str.startswith("Character ''' is repeated [here]")]`. This shows one row in msg (index 180, struct_id_4055 4796917312, message "Character ''' is repeated [here](1) in the sam...", 1 rows x 7 columns). Then `dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]` shows the matching rows 180, 181, and 182 (struct_id_4055 4796917312, uri static/js/tinymce/jscripts/tiny_mce/plugins/pa..., startLine 722, startColumn 74/76/78, endLine 722, endColumn 75/77/79, message here). The same lookup also works manually from the shell. First, `grep "static.*Character ''' is repeated \[here\]" tables/messages.csv` picks up the struct_id_4055 in 180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,72,722,73,"Character ''' is repeated [here](1) in the same character class. Then `grep 4927448704 tables/relatedLocations.csv` finds the relatedLocations 180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,74,722,75,here 181,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,76,722,77,here 182,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,78,722,79,here. Changes: introduce scli-dyys, a random id string for later identification and removal of dummy table rows; keep the struct_id_4055 column to join tables as needed; output is now written to a directory, as there are always multiple files.	2022-02-16 17:03:58 -08:00
Michael Hohn	f246f06d4e	sarif-extract-tables: interim commit: form tables. Tables are now formed and kept in the Typegraph instance. These will be tested using pandas operations to form one of the previous outputs.	2022-02-04 23:56:01 -08:00
Michael Hohn	7a517fa06c	sarif-extract-tables: interim commit. Internal destructuring and array aggregation run, but need to be tested. Tables need to be formed, and pandas selections/joins/etc. used for custom table output.	2022-02-04 14:44:55 -08:00
Michael Hohn	cf8096446b	sarif-to-dot: cleanup and preparation for sarif table extraction	2022-02-01 22:42:25 -08:00
Michael Hohn	119f9a5c18	sarif-to-dot: add more support for --fill-structure option. Expand ('Struct4827', ('struct', ('physicalLocation', 'Struct4963'))) to have fields ('Struct2683', ('struct', ('id', 'Int'), ('message', 'Struct2774'), ('physicalLocation', 'Struct4963'))) and avoid a redundant table.	2022-01-27 18:55:02 -08:00
Michael Hohn	eb53ede8b1	sarif-to-dot: add more support for --fill-structure option. Common to all: ('locations', 'Array008'), ('message', 'Struct009'), ('partialFingerprints', 'Struct010'), ('rule', 'Struct011'), ('ruleId', 'String'), ('ruleIndex', 'Int'). Only some problems and flow problems have ('relatedLocations', 'Array014'). Add a dummy value for relatedLocations to reduce to two result categories, @kind flow problem and @kind problem.	2022-01-27 18:18:43 -08:00
Michael Hohn	80b22001ce	sarif-to-dot: make signature names order-independent. To create entire subtrees conforming to a signature, first make the signature names order-independent; use hashes to name the signatures.	2022-01-27 17:53:14 -08:00
Michael Hohn	0b13a297a5	sarif-to-dot: add more support for --fill-structure option. Ensure ('Array003', ('array', (0, 'String'))) is always present, and collapse the following into one: ('Struct032', ('struct', ('artifacts', 'Array002'), ('columnKind', 'String'), ('newlineSequences', 'Array003'), ('properties', 'Struct004'), ('results', 'Array023'), ('tool', 'Struct029'), ('versionControlProvenance', 'Array031'))) and ('Struct033', ('struct', ('artifacts', 'Array002'), ('columnKind', 'String'), ('properties', 'Struct004'), ('results', 'Array023'), ('tool', 'Struct029'), ('versionControlProvenance', 'Array031'))).	2022-01-26 22:27:07 -08:00
Michael Hohn	2adf0dfa21	sarif-to-dot: increase graph ranksep to get intelligible edges	2022-01-26 16:15:42 -08:00
Michael Hohn	2c98cf0d41	sarif-to-dot: add more support for --fill-structure option. When both ('message', 'Struct009') and ('physicalLocation', 'Struct006') are present, ensure ('id', 'Int') is also present.	2022-01-26 16:06:15 -08:00
Michael Hohn	2b75988b9a	sarif-to-dot: add more support for --fill-structure option. Expand all 'properties' objects to a common signature; instead of the 3 entries ('struct', ('kind', 'String'), ('precision', 'String'), ('severity', 'String'), ('tags', 'Array003')), ('struct', ('kind', 'String'), ('precision', 'String'), ('security-severity', 'String'), ('severity', 'String'), ('tags', 'Array003')), and ('struct', ('kind', 'String'), ('precision', 'String'), ('severity', 'String'), ('sub-severity', 'String'), ('tags', 'Array003')), get one.	2022-01-26 15:41:26 -08:00
Michael Hohn	153eba8346	sarif-to-dot: to reduce graph clutter, add option --no-edges-to-scalars	2022-01-26 00:41:31 -08:00
Michael Hohn	d7d566c5db	sarif-to-dot: add more support for --fill-structure option. Collapse multiple 'physicalLocation's into one, from ('Struct006', ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005'))) and ('Struct036', ('struct', ('artifactLocation', 'Struct000'))) to ('Struct006', ('struct', ('artifactLocation', 'Struct000'), ('region', 'Struct005'))).	2022-01-25 23:43:43 -08:00
Michael Hohn	b816705574	sarif-to-dot: add --fill-structure option and initial library support. This collapses the rightmost column of the signature output from `../../bin/sarif-to-dot -u -t -d -f results.sarif | dot -Tpdf`, which has multiple distinct entries ('Struct030', ('struct', ('endColumn', 'Int'), ('startLine', 'Int'))), ('Struct016', ('struct', ('endColumn', 'Int'), ('startColumn', 'Int'), ('startLine', 'Int'))), ('Struct025', ('struct', ('endColumn', 'Int'), ('endLine', 'Int'), ('startColumn', 'Int'), ('startLine', 'Int'))), ('Struct030', ('struct', ('endColumn', 'Int'), ('startLine', 'Int'))), to a single entry, ('Struct005', ('struct', ('endColumn', 'Int'), ('endLine', 'Int'), ('startColumn', 'Int'), ('startLine', 'Int'))), when using `../../bin/sarif-to-dot results.sarif -u -t -f`.	2022-01-25 23:18:20 -08:00
Michael Hohn	edfe1f3363	sarif-to-dot: move signature functions into their own module	2022-01-25 17:57:44 -08:00
Michael Hohn	113fa483ca	traverse: add file header	2022-01-16 13:23:33 -08:00
Michael Hohn	ef08825b43	Processing in stages: Move the initial sarif_cli code to sarif_cli/traverse	2021-12-22 18:03:34 -08:00
Michael Hohn	9590d0a677	Add newline after dbg(message) output	2021-12-18 14:19:38 -08:00
Michael Hohn	f1d21e4a43	Fix missing 'region' key in relatedLocations: use whole-file output. The goal is fixed-structure output formatting, so whole-file output uses -1,-1,-1,-1 for the line and column information.	2021-12-08 16:02:31 -08:00
Michael Hohn	1271589bc4	Fix class NoFile: comment	2021-12-06 15:34:03 -08:00
Michael Hohn	92d904ee10	Add quick check to verify that input is sarif. An occasional output from LGTM is {"code":404,"error":"The specified analysis could not be found"}. With this patch, the csv output is now "ERROR","invalid json contents %s","some-file.json" and the plain text output becomes ERROR: invalid json contents in some-file.json	2021-12-06 14:24:08 -08:00
Michael Hohn	120e673424	Fix: handle `relatedLocation`s without `physicalLocation`s (files). Problem: the lookup `artifact = get(related_location, 'physicalLocation', 'artifactLocation')` requested by `message, artifact, region = S.get_relatedlocation_message_info(location)` hits incomplete data; `ipdb> p related_location` shows {'message': {'text': 'request'}}. Fix: introduce the NoFile class to propagate this and handle it where needed; now simply report <NoFile> as appropriate. For plain text output: RESULT: src/optionsparser/ .. FLOW STEP 0: <NoFile>: request FLOW STEP 1: <NoFile>: request_mp FLOW STEP 2: src/.... For csv output: "result","src/optionsparser/...","116","26","116","34","`& ...` used as ..." "flow_step","0","<NoFile>","-1","-1","-1","-1","request" "flow_step","1","<NoFile>","-1","-1","-1","-1","request_mp" "flow_step","2","src/foo.cpp","119","97","119","104","request"	2021-12-06 12:37:35 -08:00
Michael Hohn	2c3ca3c0eb	Fix for KeyError: 'region', caused by a `result` without a `region`. Region, line, and column information is present in most messages. The one that caused this error refers to the whole file: `ipdb> p sarif_struct` shows {'ruleId': 'com.lgtm/cpp-queries:cpp/missing-header-guard', 'ruleIndex': 12, 'message': {'text': 'This header file should contain a header guard to prevent multiple inclusion.'}, 'locations': [{'physicalLocation': {'artifactLocation': {'uri': 'diff/cmpbuf.h', 'uriBaseId': '%SRCROOT%', 'index': 13}}}], 'partialFingerprints': {'primaryLocationLineHash': 'd04cb834fa64727d:1', 'primaryLocationStartColumnFingerprint': '0'}}. The goal is fixed-structure output formatting, so whole-file output uses -1,-1,-1,-1 for the line and column information.	2021-12-06 11:48:53 -08:00
Michael Hohn	ffcacec630	sarif-results-summary: add csv output option	2021-12-06 11:48:53 -08:00
Michael Hohn	f0aa815a9a	Fix encoding read error. When `with open(fname, 'r') as file:` hits the accented letter á in Vrána in the file data/wxWidgets-small/src/stc/scintilla/lexers/LexCSS.cxx, it raises `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 119: invalid continuation byte`. We are reading source code, so we likely don't care whether non-ASCII characters are decoded exactly; using `with codecs.open(fname, 'r', encoding="latin-1") as file:` sidesteps this problem.	2021-12-06 11:48:53 -08:00


62 Commits
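The encoding fix in f0aa815a9a works because latin-1 assigns a character to every byte value 0x00 to 0xFF. Decoding with it can therefore never raise UnicodeDecodeError. UTF-8, by contrast, rejects the byte 0xe1 when it is not followed by continuation bytes. The following is a minimal sketch, not the project's code. `read_source_lines` is a hypothetical helper. It uses the built-in `open(..., encoding="latin-1")`, which decodes bytes the same way as the `codecs.open` call quoted in the commit.

```python
import os
import tempfile


def read_source_lines(fname):
    """Read a source file without failing on non-UTF-8 bytes (hypothetical helper).

    latin-1 maps every byte 0x00-0xFF to a character, so decoding cannot raise
    UnicodeDecodeError; non-ASCII text may be mis-decoded, but ASCII content and
    line structure are preserved.
    """
    with open(fname, "r", encoding="latin-1") as file:
        return file.readlines()


# "Vrána" encoded as latin-1: the 0xe1 byte is followed by 'n', not a UTF-8 continuation byte.
raw = b"// Author: Vr\xe1na\nint x;\n"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as err:
    utf8_error = err.reason
print("utf-8 decode error:", utf8_error)

with tempfile.NamedTemporaryFile("wb", suffix=".cxx", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

lines = read_source_lines(path)
os.remove(path)
print(lines[0].rstrip())
```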
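Commit 92d904ee10 adds a quick check that the input really is SARIF, because LGTM occasionally returned a JSON error object instead. Here is a minimal sketch of such a check. It assumes only that a SARIF log is a JSON object with top-level `version` and `runs` properties, as in SARIF 2.1.0. `looks_like_sarif` and `check_sarif_text` are hypothetical names, and the sketch reproduces only the plain-text error format quoted in the commit.

```python
import json


def looks_like_sarif(obj):
    """Cheap structural test: a SARIF log is a JSON object with 'version' and a 'runs' array."""
    return isinstance(obj, dict) and "version" in obj and isinstance(obj.get("runs"), list)


def check_sarif_text(fname, text):
    """Return None for plausible SARIF, otherwise a plain-text error line."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        obj = None  # not JSON at all
    if looks_like_sarif(obj):
        return None
    return "ERROR: invalid json contents in {}".format(fname)


lgtm_404 = '{"code":404,"error":"The specified analysis could not be found"}'
print(check_sarif_text("some-file.json", lgtm_404))
print(check_sarif_text("ok.sarif", '{"version": "2.1.0", "runs": []}'))
```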
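Commits 2c3ca3c0eb, 120e673424, and f1d21e4a43 keep the output fixed-structure. A missing `region` is reported as -1,-1,-1,-1, and a missing `physicalLocation` as <NoFile>. The sketch below applies that fallback logic to inputs taken from those commits. `location_info` and this `NoFile` class are hypothetical stand-ins, not the project's implementations. The sketch also does not apply SARIF's own default rules for individual absent region fields.

```python
class NoFile:
    """Hypothetical placeholder for a location that names no file."""

    def __str__(self):
        return "<NoFile>"


# Fixed-structure placeholder for (startLine, startColumn, endLine, endColumn).
WHOLE_FILE = (-1, -1, -1, -1)


def location_info(location):
    """Return (uri or NoFile(), (startLine, startColumn, endLine, endColumn))."""
    phys = location.get("physicalLocation")
    if phys is None:
        # e.g. a relatedLocation that only carries a message
        return NoFile(), WHOLE_FILE
    uri = phys["artifactLocation"]["uri"]
    region = phys.get("region")
    if region is None:
        # the result refers to the whole file
        return uri, WHOLE_FILE
    # Fields absent from a present region are reported as -1 here.
    keys = ("startLine", "startColumn", "endLine", "endColumn")
    return uri, tuple(region.get(k, -1) for k in keys)


related_location = {"message": {"text": "request"}}
guard_location = {"physicalLocation": {"artifactLocation": {
    "uri": "diff/cmpbuf.h", "uriBaseId": "%SRCROOT%", "index": 13}}}
flow_location = {"physicalLocation": {
    "artifactLocation": {"uri": "src/foo.cpp"},
    "region": {"startLine": 119, "startColumn": 97, "endLine": 119, "endColumn": 104}}}

for loc in (related_location, guard_location, flow_location):
    uri, pos = location_info(loc)
    print(str(uri), *pos)
```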
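Commit 80b22001ce names signatures by hashing them, so that names do not depend on the order in which structures are encountered. Here is a minimal sketch of the idea under stated assumptions. The signature of a JSON value is a nested tuple with struct fields sorted by key. The name is a prefix plus four digits taken from a SHA-256 digest of the tuple's repr. The four-digit suffix only imitates names such as Struct4827 in the log and is an assumption; a real implementation would need to handle hash collisions. `signature` and `signature_name` are hypothetical names.

```python
import hashlib


def signature(value):
    """Structural signature of a JSON value; struct fields are sorted by key."""
    if isinstance(value, dict):
        return ("struct",) + tuple((k, signature(v)) for k, v in sorted(value.items()))
    if isinstance(value, list):
        # Distinct element signatures, sorted by repr so the result is deterministic.
        return ("array",) + tuple(sorted({signature(v) for v in value}, key=repr))
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Bool"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Number"
    if isinstance(value, str):
        return "String"
    return "Null"


def signature_name(sig):
    """Name a struct/array signature from a stable hash of its contents."""
    if not isinstance(sig, tuple):
        return sig  # scalar signatures are already names
    digest = hashlib.sha256(repr(sig).encode("utf-8")).hexdigest()
    prefix = "Struct" if sig[0] == "struct" else "Array"
    return "%s%04d" % (prefix, int(digest, 16) % 10000)


a = {"startLine": 722, "endColumn": 73}
b = {"endColumn": 75, "startLine": 1}  # same keys, different order and values
print(signature(a))
print(signature_name(signature(a)), signature_name(signature(b)))
```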
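Commits 82a8e7a6dc, 32413984e2, fd55969b76, and 560b9ecf35 deal with pandas turning integer columns into floats, or otherwise changing column types, when tables are combined. The sketch below shows the general pattern those messages describe. It concatenates only non-empty tables and special-cases the all-empty input, because `pd.concat` raises ValueError on an empty list. It then forces the declared column types. The helper `concat_typed` and the `SCAN_DTYPES` mapping are hypothetical and are not the project's actual schema.

```python
import pandas as pd

# Hypothetical column types for a scans-like table (illustration only).
SCAN_DTYPES = {"id": "uint64", "scan_id": "uint64", "tool_name": "string"}


def concat_typed(tables, dtypes):
    """Concatenate the non-empty tables, then force the declared column types."""
    nonempty = [t for t in tables if not t.empty]
    if not nonempty:
        # pd.concat([]) raises ValueError, so build an empty frame with the right types.
        return pd.DataFrame({col: pd.Series(dtype=dt) for col, dt in dtypes.items()})
    return pd.concat(nonempty, ignore_index=True).astype(dtypes)


part = pd.DataFrame({"id": [1, 2], "scan_id": [10, 10], "tool_name": ["codeql", "codeql"]})
empty = pd.DataFrame(columns=["id", "scan_id", "tool_name"])

combined = concat_typed([part, empty], SCAN_DTYPES)
print(combined.dtypes)
nothing = concat_typed([empty], SCAN_DTYPES)
print(nothing.dtypes)
```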
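Commit ad738abed3 combines messages.csv and relatedLocations.csv on the struct_id_4055 key. The same lookup can be written as a pandas merge. The sketch below rebuilds tiny versions of both tables from the values quoted in that commit, with the message text abbreviated to its first sentence. It is only an illustration, not the output of sarif-extract-tables.

```python
import pandas as pd

uri = "static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js"
messages = pd.DataFrame({
    "struct_id_4055": [4927448704],
    "uri": [uri], "startLine": [722], "startColumn": [72],
    "endLine": [722], "endColumn": [73],
    "message": ["Character ''' is repeated [here](1) in the same character class."],
})
related = pd.DataFrame({
    "struct_id_4055": [4927448704] * 3,
    "uri": [uri] * 3, "startLine": [722] * 3, "startColumn": [74, 76, 78],
    "endLine": [722] * 3, "endColumn": [75, 77, 79],
    "message": ["here"] * 3,
})

# Select the result of interest, then attach all of its related locations.
msg = messages[messages.message.str.startswith("Character ''' is repeated [here]")]
joined = msg.merge(related, on="struct_id_4055", suffixes=("", "_related"))
print(joined[["startColumn", "startColumn_related", "endColumn_related", "message_related"]])
```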
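Commit 0f070a6ae4 describes sarif-extract-multi loading each listed SARIF file into the entry's `sarif_content` field. Here is a minimal sketch of that loading step. It assumes the spec is a JSON list of entries like the YAML example in the commit. It also assumes that each `sarif_file_name` is relative to a given base directory. `load_multi_spec` is a hypothetical name, and table building is not shown.

```python
import json
import os
import tempfile


def load_multi_spec(spec_path, base_dir="."):
    """Load a multi-sarif spec and attach each SARIF file as 'sarif_content'.

    Assumption: each entry's sarif_file_name is relative to base_dir.
    """
    with open(spec_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    for entry in entries:
        path = os.path.join(base_dir, entry["sarif_file_name"])
        with open(path, "r", encoding="utf-8") as f:
            entry["sarif_content"] = json.load(f)
    return entries


with tempfile.TemporaryDirectory() as d:
    # Minimal stand-in SARIF file at the path named in the example entry.
    os.makedirs(os.path.join(d, "2021-12-09"))
    with open(os.path.join(d, "2021-12-09", "results.sarif"), "w", encoding="utf-8") as f:
        json.dump({"version": "2.1.0", "runs": []}, f)
    spec = [{
        "creation_date": "2021-12-09",
        "primary_language": "javascript",
        "project_name": "treeio/treeio",
        "query_commit_id": "fa9571646c",
        "sarif_content": {},
        "sarif_file_name": "2021-12-09/results.sarif",
        "scan_start_date": "2021-12-09",
        "scan_stop_date": "2021-12-10",
        "tool_name": "codeql",
        "tool_version": "v1.27",
    }]
    spec_path = os.path.join(d, "multi-sarif-01.json")
    with open(spec_path, "w", encoding="utf-8") as f:
        json.dump(spec, f)
    entries = load_multi_spec(spec_path, base_dir=d)

print(entries[0]["project_name"], entries[0]["sarif_content"]["version"])
```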