12 Commits

Author SHA1 Message Date
Kristen Newbury
04a5aae14d Add CLI support
enabled by -f flag with CLI value
tested on sarif from CodeQL CLIs:
2.6.3, 2.9.4, 2.11.4
The input sarif MUST contain the versionControlProvenance property, however.
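
A minimal sketch of the kind of pre-check this requirement suggests, assuming the SARIF 2.1.0 layout in which `versionControlProvenance` is an array on each entry of `runs`. The function name `runs_missing_provenance` and the sample documents are hypothetical and are not part of sarif-cli.

```python
import json


def runs_missing_provenance(sarif: dict) -> list:
    """Return the indices of runs with no (or an empty) versionControlProvenance."""
    missing = []
    for i, run in enumerate(sarif.get("runs", [])):
        if not run.get("versionControlProvenance"):
            missing.append(i)
    return missing


# Hypothetical, heavily trimmed SARIF documents for illustration only.
with_vcp = json.loads("""
{"version": "2.1.0",
 "runs": [{"tool": {"driver": {"name": "CodeQL"}},
           "versionControlProvenance": [
               {"repositoryUri": "https://example.com/org/repo",
                "revisionId": "0000000"}],
           "results": []}]}
""")
without_vcp = {"version": "2.1.0", "runs": [{"results": []}]}

print(runs_missing_provenance(with_vcp))     # []
print(runs_missing_provenance(without_vcp))  # [0]
```

A check like this could run before extraction so that a file without the property is rejected with a clear message rather than failing later.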
2022-12-15 19:12:58 -05:00
Michael Hohn
235acf6b93 Quote all non-numeric CSV output 2022-08-10 17:44:29 -07:00
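
The commit does not show how the quoting is done. As an illustration only, the standard library's `csv.QUOTE_NONNUMERIC` has this effect; pandas' `DataFrame.to_csv` accepts the same constant through its `quoting` parameter. The sample row is taken from the sarif-extract-tables output shown in a later entry on this page.

```python
import csv
import io

rows = [
    [0, "static/js/fileuploader.js", 1214, 13, 1214, 17, "Unused variable size."],
]

buf = io.StringIO()
# QUOTE_NONNUMERIC quotes every field that is not a number.
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerows(rows)
text = buf.getvalue()

print(text, end="")
# 0,"static/js/fileuploader.js",1214,13,1214,17,"Unused variable size."
```

Quoting every string field keeps messages that contain commas or newlines inside a single CSV cell.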
Michael Hohn
ef00559408 Bring sarif-extract-tables up to date with sarif-extract-scans 2022-07-19 15:42:26 -07:00
Michael Hohn
db00f17137 Some cleanup based on pyflakes output 2022-03-17 17:23:53 -07:00
Michael Hohn
0f070a6ae4 sarif-extract-multi: extract combined tables from multiple sarif files
This command introduces a new tree structure that pulls in a collection
of sarif files.  In yaml format, an example is

    - creation_date: '2021-12-09'   # Repository creation date
      primary_language: javascript  # By lines of code
      project_name: treeio/treeio   # Repo name-short name
      query_commit_id: fa9571646c   # Commit id for custom (non-library) queries
      sarif_content: {}             # The sarif content will be attached here
      sarif_file_name: 2021-12-09/results.sarif # Path to sarif file
      scan_start_date: '2021-12-09'             # Beginning date/time of scan
      scan_stop_date:  '2021-12-10'             # End date/time of scan
      tool_name: codeql
      tool_version: v1.27

    - creation_date: '2022-02-25'
      primary_language: javascript
      ...

At run time,

    cd ~/local/sarif-cli/data/treeio
    sarif-extract-multi multi-sarif-01.json test-multi-table

will load the specified sarif files and put them in place of
`sarif_content`, then build tables against the new signature found in
sarif_cli/signature_multi.py, and merge those into 6 larger tables.  The
exported tables are

    artifacts.csv  path-problem.csv  project.csv
    codeflows.csv  problem.csv       related-locations.csv

and they have join keys for further operations.
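
The loading step described above can be sketched as follows. This is an assumption-laden illustration, not the code in sarif-extract-multi: the helper name `attach_sarif_content` is hypothetical, and paths in `sarif_file_name` are assumed to be relative to a base directory.

```python
import json
import os
import tempfile


def attach_sarif_content(entries, base_dir):
    """Return copies of entries with sarif_content loaded from sarif_file_name."""
    loaded = []
    for entry in entries:
        path = os.path.join(base_dir, entry["sarif_file_name"])
        with open(path, encoding="utf-8") as fh:
            content = json.load(fh)
        # Copy the entry so the caller's list is left untouched.
        loaded.append({**entry, "sarif_content": content})
    return loaded


# Self-contained demo using a throwaway directory and a stub sarif file.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "2021-12-09"))
    stub = os.path.join(base, "2021-12-09", "results.sarif")
    with open(stub, "w", encoding="utf-8") as fh:
        json.dump({"version": "2.1.0", "runs": []}, fh)

    entries = [{"project_name": "treeio/treeio",
                "sarif_file_name": "2021-12-09/results.sarif",
                "sarif_content": {}}]
    combined = attach_sarif_content(entries, base)

print(combined[0]["sarif_content"]["version"])
```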

The new typegraph is rendered in

    notes/typegraph-multi.pdf

using the instructions in

    sarif_cli/signature_multi.py
2022-03-11 23:00:53 -08:00
Michael Hohn
9c151e295b sarif-extract-tables: include relatedLocations from both sources
With the addition of the path-problem output, include both as sources (left joins)
for relatedLocations:

    pd.concat([sf(4055)[['relatedLocations', 'struct_id']],
              sf(9699)[['relatedLocations', 'struct_id']]])
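
A runnable sketch of the same concatenation, with small hand-made frames standing in for `sf(4055)` and `sf(9699)`. The frame contents and the names `problem` and `path_problem` are hypothetical; `ignore_index=True` is an addition here and does not appear in the snippet above.

```python
import pandas as pd

# Stand-ins for sf(4055) (problem results) and sf(9699) (path-problem results).
problem = pd.DataFrame({
    "struct_id": [1, 2],
    "relatedLocations": [10, 11],
    "message": ["a", "b"],
})
path_problem = pd.DataFrame({
    "struct_id": [3],
    "relatedLocations": [12],
    "codeFlows": [20],
})

# Select the two shared columns from each source, then stack the rows.
cols = ["relatedLocations", "struct_id"]
related_sources = pd.concat([problem[cols], path_problem[cols]],
                            ignore_index=True)
print(related_sources)
```

Selecting the same two columns from both sources before concatenating keeps the result free of NaN-filled columns that exist in only one of the tables.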
2022-02-22 17:35:39 -08:00
Michael Hohn
1dbd240b5b sarif-extract-tables: Form the codeFlows dataframe and write it out
One of the shorter multi-path results from

    cd ~/local/sarif-cli/data/treeio
    ../../bin/sarif-results-summary -r results.sarif |less

follows; the dataframe formed here starts with the codeFlows-containing table 9699
and has the content of the PATH * output below.

    RESULT: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: [DOM text](1) is reinterpreted as HTML without escaping meta-characters.
    [DOM text](2) is reinterpreted as HTML without escaping meta-characters.
    [DOM text](3) is reinterpreted as HTML without escaping meta-characters.
    REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:90:17:90:27: DOM text
    REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:91:17:91:28: DOM text
    REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:92:17:92:31: DOM text
    PATH 0
    FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:90:17:90:27: name.val()
    FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>"  ... "</tr>"
    PATH 1
    FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:91:17:91:28: email.val()
    FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>"  ... "</tr>"
    PATH 2
    FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:92:17:92:31: password.val()
    FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>"  ... "</tr>"
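
For orientation, a sketch of how PATH / FLOW STEP rows like those above can be flattened out of a single SARIF result. It assumes the SARIF 2.1.0 nesting `codeFlows[].threadFlows[].locations[].location`, that every region carries all four line/column fields, and that each threadFlow counts as one PATH. The helpers `flow_rows` and `make_step` are hypothetical and unrelated to the typegraph-based code in sarif-extract-tables.

```python
def flow_rows(result):
    """Flatten result['codeFlows'] into one dict per flow step."""
    rows = []
    path_no = 0
    for code_flow in result.get("codeFlows", []):
        for thread_flow in code_flow.get("threadFlows", []):
            for step_no, step in enumerate(thread_flow.get("locations", [])):
                location = step["location"]
                physical = location["physicalLocation"]
                region = physical["region"]
                rows.append({
                    "path": path_no,
                    "step": step_no,
                    "uri": physical["artifactLocation"]["uri"],
                    "startLine": region["startLine"],
                    "startColumn": region["startColumn"],
                    "endLine": region["endLine"],
                    "endColumn": region["endColumn"],
                    "message": location["message"]["text"],
                })
            path_no += 1
    return rows


def make_step(uri, sl, sc, el, ec, text):
    """Build one threadFlow location entry (illustrative helper)."""
    return {"location": {
        "physicalLocation": {
            "artifactLocation": {"uri": uri},
            "region": {"startLine": sl, "startColumn": sc,
                       "endLine": el, "endColumn": ec}},
        "message": {"text": text}}}


uri = "static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html"
result = {"codeFlows": [{"threadFlows": [{"locations": [
    make_step(uri, 90, 17, 90, 27, "name.val()"),
    make_step(uri, 89, 35, 93, 14, '"<tr>"  ... "</tr>"'),
]}]}]}

rows = flow_rows(result)
for r in rows:
    if r["step"] == 0:
        print(f"PATH {r['path']}")
    print(f"FLOW STEP {r['step']}: {r['uri']}:{r['startLine']}:{r['startColumn']}:"
          f"{r['endLine']}:{r['endColumn']}: {r['message']}")
```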
2022-02-22 16:50:44 -08:00
Michael Hohn
ad738abed3 sarif-extract-tables: also output relatedLocations table
With --related-locations,

    ../../bin/sarif-results-summary -r results.sarif

produces the details

    RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:72:722:73: Character ''' is repeated [here](1) in the same character class.
    Character ''' is repeated [here](2) in the same character class.
    Character ''' is repeated [here](3) in the same character class.
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:74:722:75: here
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:76:722:77: here
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:78:722:79: here

Via
    ../../bin/sarif-extract-tables results.sarif tables

sarif-extract-tables now produces two output tables,

    tables/
    ├── messages.csv
    └── relatedLocations.csv

that contain the relevant information and can be joined or otherwise combined on
the struct_id_4055 key.

For example, adding to the end of sarif-extract-tables:
    import IPython
    IPython.embed()

    msg = d2[d2.message.str.startswith("Character ''' is repeated [here]")]
    dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]

    In [24]: msg
    Out[24]:
         struct_id_4055  ...                                            message
    180      4796917312  ...  Character ''' is repeated [here](1) in the sam...

    [1 rows x 7 columns]

    In [25]: dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]
    Out[25]:
         struct_id_4055                                                uri  startLine  startColumn  endLine  endColumn message
    180      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           74      722         75    here
    181      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           76      722         77    here
    182      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           78      722         79    here

or manually from the shell:

    # pick up the struct_id_4055:
    0:$ grep "static.*Character ''' is repeated \[here\]" tables/messages.csv
    180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,72,722,73,"Character ''' is repeated [here](1) in the same character class.

    # and find relatedLocations:
    0:$ grep 4927448704 tables/relatedLocations.csv
    180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,74,722,75,here
    181,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,76,722,77,here
    182,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,78,722,79,here
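
The same lookup can also be written as a pandas join on struct_id_4055. The sketch below builds the two tables in memory from the rows shown above instead of reading the CSV files, because the files' header layout is not shown here; the second messages row uses a made-up id (1) purely as a non-matching row.

```python
import pandas as pd

uri = "static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js"
columns = ["struct_id_4055", "uri", "startLine", "startColumn",
           "endLine", "endColumn", "message"]

messages = pd.DataFrame([
    [4927448704, uri, 722, 72, 722, 73,
     "Character ''' is repeated [here](1) in the same character class."],
    [1, "static/js/fileuploader.js", 1214, 13, 1214, 17,
     "Unused variable size."],  # hypothetical id, present only as a non-match
], columns=columns)

related = pd.DataFrame([
    [4927448704, uri, 722, 74, 722, 75, "here"],
    [4927448704, uri, 722, 76, 722, 77, "here"],
    [4927448704, uri, 722, 78, 722, 79, "here"],
], columns=columns)

# Pick the message of interest, then pull in its related locations.
hit = messages[messages.message.str.startswith("Character ''' is repeated [here]")]
joined = hit.merge(related, on="struct_id_4055", suffixes=("", "_related"))
print(joined[["startColumn", "startColumn_related", "message_related"]])
```

The `suffixes` argument keeps the result's own columns unchanged and marks the related-location columns with `_related`.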

Changes:
- Introduce scli-dyys, a random id string for later identification and removal of
  dummy table rows.

- Keep the struct_id_4055 column to join tables as needed.

- Output is now written to a directory as there are always multiple files.
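
How the marker is applied inside sarif-extract-tables is not shown here. As a rough sketch of the idea only, assume placeholder rows carry the marker string in one of their fields and are filtered out before the tables are written; the row contents and the helper `drop_dummy_rows` are hypothetical.

```python
DUMMY = "scli-dyys"  # marker string named in the commit; its use below is assumed

rows = [
    {"struct_id_4055": 1, "message": "Unused variable size."},
    {"struct_id_4055": 2, "message": DUMMY},  # hypothetical placeholder row
]


def drop_dummy_rows(rows, marker=DUMMY):
    """Keep only rows in which no field equals the marker string."""
    return [r for r in rows if marker not in r.values()]


cleaned = drop_dummy_rows(rows)
print(cleaned)
```

A random string is a reasonable choice for such a marker because it is very unlikely to collide with real message text or paths.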
2022-02-16 17:03:58 -08:00
Michael Hohn
ec9a0b5590 sarif-extract-tables: initial version, reproduces known output as table
Reproduce the

    file:line:col:line:col: message

output from

    ../../bin/sarif-results-summary results.sarif | grep size

as test/example.

Original sample output is

    RESULT: static/js/fileuploader.js:1214:13:1214:17: Unused variable size.
    RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/media/js/media.js:438:30:438:34: Unused variable size.

The table result here is

    0:$ ../../bin/sarif-extract-tables results.sarif | grep size
    0,static/js/fileuploader.js,1214,13,1214,17,Unused variable size.
    34,static/js/tinymce/jscripts/tiny_mce/plugins/media/js/media.js,438,30,438,34,Unused variable size.
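
To confirm that the table rows carry the same information as the original lines, they can be formatted back into the `file:line:col:line:col: message` form. This sketch assumes the column order seen in the output above (row index, uri, start line, start column, end line, end column, message); `summary_lines` is a hypothetical helper.

```python
import csv
import io

table_output = (
    "0,static/js/fileuploader.js,1214,13,1214,17,Unused variable size.\n"
    "34,static/js/tinymce/jscripts/tiny_mce/plugins/media/js/media.js,"
    "438,30,438,34,Unused variable size.\n"
)


def summary_lines(csv_text):
    """Rebuild 'RESULT: file:line:col:line:col: message' lines from CSV rows."""
    lines = []
    for _index, uri, sl, sc, el, ec, message in csv.reader(io.StringIO(csv_text)):
        lines.append(f"RESULT: {uri}:{sl}:{sc}:{el}:{ec}: {message}")
    return lines


for line in summary_lines(table_output):
    print(line)
```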
2022-02-08 20:04:28 -08:00
Michael Hohn
f5e73e90ba sarif-extract-tables: interim commit: first joins
These joins construct the table needed for sarif-results-summary output
2022-02-07 17:11:55 -08:00
Michael Hohn
f246f06d4e sarif-extract-tables: interim commit: form tables
Tables are now formed and kept in the Typegraph instance.
These will be tested using pandas operations to form one of the previous outputs.
2022-02-04 23:56:01 -08:00
Michael Hohn
7a517fa06c sarif-extract-tables: interim commit
Internal destructuring and array aggregation run, but need to be tested.
Tables need to be formed, and pandas selections/joins/etc. used for custom table output.
2022-02-04 14:44:55 -08:00