7 Commits

Author SHA1 Message Date
Kristen Newbury
04a5aae14d Add CLI support
Enabled by the -f flag with CLI value.
Tested on sarif from CodeQL CLI versions 2.6.3, 2.9.4, and 2.11.4.
The sarif input MUST contain the versionControlProvenance property, however.
2022-12-15 19:12:58 -05:00
Michael Hohn
235acf6b93 Quote all non-numeric CSV output 2022-08-10 17:44:29 -07:00
Michael Hohn
30e3dd3a37 Replace internal ids with snowflake ids before writing tables 2022-04-29 22:39:25 -07:00
Michael Hohn
d5390bb87e Full revision of the base tables derived from multiple sarif input files
The new base tables produced by `sarif-extract-multi` are
    artifacts
    codeflows
    kind_pathproblem
    kind_problem
    project
    relatedLocations
    rules

The revised table overview is in the jupyter notebook
scripts/multi-table-overview.ipynb

The file notes/typegraph-multi-with-tables.pdf illustrates which original (sarif)
tables are used to form the base (derived) tables.
2022-03-23 16:37:41 -07:00
Michael Hohn
db00f17137 Some cleanup based on pyflakes output 2022-03-17 17:23:53 -07:00
Michael Hohn
b82c620a1e Add overview of the base tables derived from multi-sarif input; add rules.csv
The table overview is in the jupyter notebook
scripts/multi-table-overview.ipynb and makes use of some formatting
customizations to actually get an overview.

The initial `projects` table had far too many entries; the `rules` part
is now in a separate `rules` table.
2022-03-16 16:54:14 -07:00
Michael Hohn
0f070a6ae4 sarif-extract-multi: extract combined tables from multiple sarif files
This command introduces a new tree structure that pulls in a collection
of sarif files.  In yaml format, an example is

    - creation_date: '2021-12-09'   # Repository creation date
      primary_language: javascript  # By lines of code
      project_name: treeio/treeio   # Repo name-short name
      query_commit_id: fa9571646c   # Commit id for custom (non-library) queries
      sarif_content: {}             # The sarif content will be attached here
      sarif_file_name: 2021-12-09/results.sarif # Path to sarif file
      scan_start_date: '2021-12-09'             # Beginning date/time of scan
      scan_stop_date:  '2021-12-10'             # End date/time of scan
      tool_name: codeql
      tool_version: v1.27

    - creation_date: '2022-02-25'
      primary_language: javascript
      ...

At run time,

    cd ~/local/sarif-cli/data/treeio
    sarif-extract-multi multi-sarif-01.json test-multi-table

will load the specified sarif files and put them in place of
`sarif_content`, then build tables against the new signature found in
sarif_cli/signature_multi.py, and merge those into 6 larger tables.  The
exported tables are

    artifacts.csv  path-problem.csv  project.csv
    codeflows.csv  problem.csv       related-locations.csv

and they have join keys for further operations.
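
As a rough illustration of the loading step described above, the following
sketch fills in `sarif_content` for each entry of the index.  It is not the
code in sarif-extract-multi; the function name `attach_sarif_content` and the
`base_dir` parameter are hypothetical, and it assumes the index is a list of
entries shaped like the yaml example, with `sarif_file_name` given relative to
a base directory.

```python
import json
from pathlib import Path


def attach_sarif_content(entries, base_dir):
    """Return copies of the index entries with `sarif_content` filled in.

    Hypothetical helper: each entry is a dict shaped like the yaml example,
    and `sarif_file_name` is resolved relative to `base_dir` (an assumption).
    """
    base = Path(base_dir)
    filled = []
    for entry in entries:
        # Sarif files are JSON documents, so json.load is sufficient here.
        with open(base / entry["sarif_file_name"], encoding="utf-8") as fh:
            content = json.load(fh)
        # Copy the entry so the caller's index is left unmodified.
        filled.append({**entry, "sarif_content": content})
    return filled
```

With an index file that holds such a list, the entries could be read with
`json.load` and passed to this helper before any tables are built.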

The new typegraph is rendered in

    notes/typegraph-multi.pdf

using the instructions in

    sarif_cli/signature_multi.py
2022-03-11 23:00:53 -08:00