Commit Graph

210 Commits

Author SHA1 Message Date
Michael Hohn
ef00559408 Bring sarif-extract-tables up to date with sarif-extract-scans 2022-07-19 15:42:26 -07:00
Michael Hohn
da7d669eb9 Resize logo font 2022-07-15 09:33:16 -07:00
Michael Hohn
c9f14a538b Add logo 2022-07-15 09:30:20 -07:00
Michael Hohn
0e7a941be3 Include all typegraph samples, from raw to refined 2022-07-14 18:29:21 -07:00
Michael Hohn
ef51c3d84f remove git-lfs 2022-07-12 19:46:33 -07:00
Michael Hohn
5cce2ed4d1 Better status updates for sarif-combine-tables 2022-06-03 00:08:23 -07:00
Michael Hohn
69f02cf99a Add sarif-combine-tables to combine output from sarif-runner 2022-06-02 18:55:22 -07:00
Michael Hohn
741be0cfe1 Include project table in output of sarif-extract-scans; add commit_id to scans table 2022-06-02 16:45:04 -07:00
Michael Hohn
fd55969b76 fix: special concatenation case for empty tables 2022-06-01 17:44:50 -07:00
Michael Hohn
32413984e2 fix: only concatenate non-empty tables to suppress float conversion 2022-06-01 17:34:56 -07:00
Michael Hohn
82a8e7a6dc fix: set id and scan_id type to uint64 to suppress float conversion 2022-06-01 13:00:37 -07:00
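The three fixes above (uint64 id columns, skipping empty tables, and the empty-table special case) can be illustrated together. This is a minimal sketch, not the code in sarif-cli: `concat_nonempty` and its `id_columns` parameter are hypothetical names, and only the column names `id` and `scan_id` come from the commit message.

```python
import pandas as pd


def concat_nonempty(frames, id_columns=("id", "scan_id")):
    """Concatenate only the non-empty frames and keep id columns integral.

    Hypothetical helper: empty frames are skipped so that their default
    column dtypes cannot influence the dtype of the combined id columns.
    """
    frames = list(frames)
    nonempty = [frame for frame in frames if not frame.empty]
    if not nonempty:
        # Special case: every input is empty, so there is nothing to concatenate.
        return frames[0].copy() if frames else pd.DataFrame()
    combined = pd.concat(nonempty, ignore_index=True)
    for column in id_columns:
        if column in combined.columns:
            # Ids are non-negative integers; store them as uint64.
            combined[column] = combined[column].astype("uint64")
    return combined


scans = pd.DataFrame({"id": [1, 2], "scan_id": [10, 10]})
no_rows = pd.DataFrame({"id": [], "scan_id": []})
combined = concat_nonempty([scans, no_rows])
print(combined.dtypes)
```

Filtering before `pd.concat` sidesteps the question of which dtype an empty column carries, which differs between pandas versions.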
Michael Hohn
0fc6eb3cce Improve error reporting in sarif destructuring routines 2022-05-30 00:09:13 -07:00
Michael Hohn
f5e258de52 Enhance the fillsig() routines to supplement lgtm.com/lgtm enterprise signature differences 2022-05-30 00:08:09 -07:00
Michael Hohn
b7cd96ea72 Add sarif-runner.py to drive sarif-extract-scans for sarif file collections
The input file format is just a list of organization/project entries
2022-05-30 00:04:40 -07:00
Michael Hohn
eb8e2f18e9 Initial version of sarif-extract-scans, to be tested
Running

    cd ~/local/sarif-cli/data/treeio
    sarif-extract-scans scan-spec-0.json test-scan

produces the two derived tables and one sarif-based table (codeflows.csv):

    ls test-scan/
    codeflows.csv  results.csv  scans.csv

Adding -r via

    sarif-extract-scans -r scan-spec-0.json test-scan

writes all tables:

    ls test-scan/
    artifacts.csv  kind_pathproblem.csv  project.csv           results.csv  scans.csv
    codeflows.csv  kind_problem.csv      relatedLocations.csv  rules.csv
2022-05-16 18:58:53 -07:00
Michael Hohn
3dd8522b7f Add simple timing run information 2022-05-16 11:43:05 -07:00
Michael Hohn
154b0bdc56 WIP: assemble derived 'results' table 2022-05-13 17:01:18 -07:00
Michael Hohn
b212423907 WIP: sarif-extract-scans: back to single sarif file handling, incorporate multi-file libraries 2022-05-10 19:01:38 -07:00
Michael Hohn
675a5a4008 Add svg snapshot of derived-tables.drawio 2022-05-02 10:45:26 -07:00
Michael Hohn
cbf129b49f Indent the json input file 2022-05-02 10:44:43 -07:00
Michael Hohn
30e3dd3a37 Replace internal ids with snowflake ids before writing tables 2022-04-29 22:39:25 -07:00
Michael Hohn
51f0505f5e Add non-sarif-metadata/, an overview of the metric and diagnostic queries 2022-04-28 16:11:35 -07:00
Michael Hohn
44f1d2f179 Description of current and upcoming tables and their information sources 2022-04-20 15:22:20 -07:00
Michael Hohn
1f2daab51e Re-run of table overview 2022-04-20 15:13:46 -07:00
Michael Hohn
046a152ae2 Expand current and planned table description 2022-04-19 12:00:54 -07:00
Michael Hohn
6cef65338a explore parts of the github API via distinct connection layers. 2022-04-18 21:20:43 -07:00
Michael Hohn
8e5d9c464b Add snowflake implementation 2022-04-11 19:24:12 -07:00
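The log does not show the snowflake implementation itself. As a rough sketch of the general technique only, the class below packs a millisecond timestamp, a worker id, and a per-millisecond sequence number into one 64-bit integer. The 41/10/12 bit layout, the class name `SnowflakeSketch`, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time


class SnowflakeSketch:
    """Hypothetical id generator: 41-bit ms timestamp, 10-bit worker, 12-bit sequence."""

    def __init__(self, worker_id, clock=None, epoch_ms=0):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        # The clock is injectable so the generator can be tested deterministically.
        self.clock = clock or (lambda: int(time.time() * 1000))
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        now = self.clock()
        if now < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:
                # Sequence exhausted for this millisecond: wait for the next one.
                while now <= self.last_ms:
                    now = self.clock()
        else:
            self.sequence = 0
        self.last_ms = now
        return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.sequence


generator = SnowflakeSketch(worker_id=3, clock=lambda: 1000)
first = generator.next_id()
second = generator.next_id()
```

Ids built this way sort by creation time and need no coordination between workers, which is what makes them usable as join keys across separately written tables.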
Michael Hohn
8b3710a51b interim: sarif-extract-multi table outputs and future table diagrams 2022-04-08 14:13:24 -07:00
Michael Hohn
d5390bb87e Full revision of the base tables derived from multiple sarif input files
The new base tables produced by `sarif-extract-multi` are
    artifacts
    codeflows
    kind_pathproblem
    kind_problem
    project
    relatedLocations
    rules

The revised table overview is in the jupyter notebook
scripts/multi-table-overview.ipynb

The file notes/typegraph-multi-with-tables.pdf illustrates which original (sarif)
tables are used to form the base (derived) tables.
2022-03-23 16:37:41 -07:00
Michael Hohn
db00f17137 Some cleanup based on pyflakes output 2022-03-17 17:23:53 -07:00
Michael Hohn
bdf85eafc8 Add a collection of commands to run static python checkers 2022-03-17 17:21:58 -07:00
Michael Hohn
b82c620a1e Add overview of the base tables derived from multi-sarif input; add rules.csv
The table overview is in the jupyter notebook
scripts/multi-table-overview.ipynb and makes use of some formatting
customizations to actually get an overview.

The initial `projects` table had far too many entries; the `rules` part
is now in a separate `rules` table.
2022-03-16 16:54:14 -07:00
Michael Hohn
926e083991 Added field to multi-file signature; the steps are documented in adding-to-typegraph.org 2022-03-15 12:30:05 -07:00
Michael Hohn
0f070a6ae4 sarif-extract-multi: extract combined tables from multiple sarif files
This command introduces a new tree structure that pulls in a collection
of sarif files.  In yaml format, an example is

    - creation_date: '2021-12-09'   # Repository creation date
      primary_language: javascript  # By lines of code
      project_name: treeio/treeio   # Repo name-short name
      query_commit_id: fa9571646c   # Commit id for custom (non-library) queries
      sarif_content: {}             # The sarif content will be attached here
      sarif_file_name: 2021-12-09/results.sarif # Path to sarif file
      scan_start_date: '2021-12-09'             # Beginning date/time of scan
      scan_stop_date:  '2021-12-10'             # End date/time of scan
      tool_name: codeql
      tool_version: v1.27

    - creation_date: '2022-02-25'
      primary_language: javascript
      ...

At run time,

    cd ~/local/sarif-cli/data/treeio
    sarif-extract-multi multi-sarif-01.json test-multi-table

will load the specified sarif files and put them in place of
`sarif_content`, then build tables against the new signature found in
sarif_cli/signature_multi.py, and merge those into 6 larger tables.  The
exported tables are

    artifacts.csv  path-problem.csv  project.csv
    codeflows.csv  problem.csv       related-locations.csv

and they have join keys for further operations.

The new typegraph is rendered in

    notes/typegraph-multi.pdf

using the instructions in

    sarif_cli/signature_multi.py
2022-03-11 23:00:53 -08:00
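The loading step described above (read the listed sarif files and put them in place of `sarif_content`) might look like the following. This is a sketch under stated assumptions, not the code of sarif-extract-multi: the function name `attach_sarif_content` is hypothetical, the spec is assumed to be a JSON list of entries, and `sarif_file_name` is assumed to be relative to the directory holding the spec file.

```python
import json
from pathlib import Path


def attach_sarif_content(spec_path):
    """Load a multi-sarif spec and fill each entry's sarif_content.

    Assumption: sarif_file_name is resolved relative to the spec file's directory.
    """
    spec_path = Path(spec_path)
    entries = json.loads(spec_path.read_text())
    for entry in entries:
        sarif_path = spec_path.parent / entry["sarif_file_name"]
        # Replace the empty placeholder with the parsed sarif document.
        entry["sarif_content"] = json.loads(sarif_path.read_text())
    return entries
```

After this step every entry carries its own metadata next to the sarif tree, so a single signature can describe the whole collection.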
Michael Hohn
9c151e295b sarif-extract-tables: include relatedLocations from both sources
With the addition of the path-problem output, include both as sources (left joins)
for relatedLocations:

    pd.concat([sf(4055)[['relatedLocations', 'struct_id']],
              sf(9699)[['relatedLocations', 'struct_id']]])
2022-02-22 17:35:39 -08:00
Michael Hohn
1dbd240b5b sarif-extract-tables: Form the codeFlows dataframe and write it out
One of the shorter multi-path results from
     cd ~/local/sarif-cli/data/treeio
     ../../bin/sarif-results-summary -r results.sarif |less
follows; the dataframe formed here starts with the codeFlows-containing table 9699
and has the content of the PATH * output below.

    RESULT: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: [DOM text](1) is reinterpreted as HTML without escaping meta-characters.
    [DOM text](2) is reinterpreted as HTML without escaping meta-characters.
    [DOM text](3) is reinterpreted as HTML without escaping meta-characters.
    REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:90:17:90:27: DOM text
    REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:91:17:91:28: DOM text
    REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:92:17:92:31: DOM text
    PATH 0
    FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:90:17:90:27: name.val()
    FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>"  ... "</tr>"
    PATH 1
    FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:91:17:91:28: email.val()
    FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>"  ... "</tr>"
    PATH 2
    FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:92:17:92:31: password.val()
    FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>"  ... "</tr>"
2022-02-22 16:50:44 -08:00
Michael Hohn
ad738abed3 sarif-extract-tables: also output relatedLocations table
With --related-locations,

    ../../bin/sarif-results-summary -r results.sarif

produces the details

    RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:72:722:73: Character ''' is repeated [here](1) in the same character class.
    Character ''' is repeated [here](2) in the same character class.
    Character ''' is repeated [here](3) in the same character class.
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:74:722:75: here
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:76:722:77: here
    REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:78:722:79: here

Via
    ../../bin/sarif-extract-tables results.sarif tables

sarif-extract-tables now produces two output tables,

    tables/
    ├── messages.csv
    └── relatedLocations.csv

that contain the relevant information and can be joined or otherwise combined on
the struct_id_4055 key.

For example, adding to the end of sarif-extract-tables:
    import IPython
    IPython.embed()

    msg = d2[d2.message.str.startswith("Character ''' is repeated [here]")]
    dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]

    In [24]: msg
    Out[24]:
         struct_id_4055  ...                                            message
    180      4796917312  ...  Character ''' is repeated [here](1) in the sam...

    [1 rows x 7 columns]

    In [25]: dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]
    Out[25]:
         struct_id_4055                                                uri  startLine  startColumn  endLine  endColumn message
    180      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           74      722         75    here
    181      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           76      722         77    here
    182      4796917312  static/js/tinymce/jscripts/tiny_mce/plugins/pa...        722           78      722         79    here

or manually from the shell:

    # pick up the struct_id_4055:
    0:$ grep "static.*Character ''' is repeated \[here\]" tables/messages.csv
    180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,72,722,73,"Character ''' is repeated [here](1) in the same character class.

    # and find relatedLocations:
    0:$ grep 4927448704 tables/relatedLocations.csv
    180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,74,722,75,here
    181,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,76,722,77,here
    182,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,78,722,79,here

Changes:
- Introduce scli-dyys, a random id string for later identification and removal of
  dummy table rows.

- Keep the struct_id_4055 column to join tables as needed.

- Output is now written to a directory as there are always multiple files.
2022-02-16 17:03:58 -08:00
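The join on struct_id_4055 described above can also be written as a pandas merge. The sketch below rebuilds the two tables in memory from the rows shown in the shell example; the column names are taken from the IPython output, the result message is shortened to its first line, and the `_result` / `_related` suffixes are choices made here for illustration.

```python
import pandas as pd

URI = "static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js"
COLUMNS = ["struct_id_4055", "uri", "startLine", "startColumn",
           "endLine", "endColumn", "message"]

# One result row, as in tables/messages.csv (message shortened to its first line).
messages = pd.DataFrame(
    [[4927448704, URI, 722, 72, 722, 73,
      "Character ''' is repeated [here](1) in the same character class."]],
    columns=COLUMNS)

# The three matching rows from tables/relatedLocations.csv.
related = pd.DataFrame(
    [[4927448704, URI, 722, 74, 722, 75, "here"],
     [4927448704, URI, 722, 76, 722, 77, "here"],
     [4927448704, URI, 722, 78, 722, 79, "here"]],
    columns=COLUMNS)

# Inner join on the shared key; overlapping column names get a suffix per side.
joined = messages.merge(related, on="struct_id_4055",
                        suffixes=("_result", "_related"))
print(joined[["startColumn_result", "startColumn_related", "message_related"]])
```

Each result row is repeated once per related location, which is the same pairing the two grep commands produce by hand.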
Michael Hohn
ec9a0b5590 sarif-extract-tables: initial version, reproduces known output as table
Reproduce the

    file:line:col:line:col: message

output from

    ../../bin/sarif-results-summary results.sarif | grep size

as test/example.

Original sample output is

    RESULT: static/js/fileuploader.js:1214:13:1214:17: Unused variable size.
    RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/media/js/media.js:438:30:438:34: Unused variable size.

The table result here is

    0:$ ../../bin/sarif-extract-tables results.sarif | grep size
    0,static/js/fileuploader.js,1214,13,1214,17,Unused variable size.
    34,static/js/tinymce/jscripts/tiny_mce/plugins/media/js/media.js,438,30,438,34,Unused variable size.
2022-02-08 20:04:28 -08:00
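For comparison, the same file:line:col:line:col: message rows can be pulled straight from the nested sarif structure. This is a simplified sketch, not the typegraph-based extraction that sarif-extract-tables performs: `result_rows` is a hypothetical name, the sample document contains only the first result shown above, and no index column is written.

```python
import csv
import sys


def result_rows(sarif):
    """Yield (uri, startLine, startColumn, endLine, endColumn, message) per location."""
    rows = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            message = result.get("message", {}).get("text", "")
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "")
                region = physical.get("region", {})
                start_line = region.get("startLine")
                # sarif allows endLine to be omitted when it equals startLine.
                end_line = region.get("endLine", start_line)
                rows.append((uri, start_line, region.get("startColumn"),
                             end_line, region.get("endColumn"), message))
    return rows


sample = {"runs": [{"results": [{
    "message": {"text": "Unused variable size."},
    "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "static/js/fileuploader.js"},
        "region": {"startLine": 1214, "startColumn": 13, "endColumn": 17},
    }}],
}]}]}

rows = result_rows(sample)
csv.writer(sys.stdout).writerows(rows)
```

The nested loops are what the table-based approach replaces with joins between the results, locations, and message tables.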
Michael Hohn
f5e73e90ba sarif-extract-tables: interim commit: first joins
These joins construct the table needed for sarif-results-summary output
2022-02-07 17:11:55 -08:00
Michael Hohn
f246f06d4e sarif-extract-tables: interim commit: form tables
Tables are now formed and kept in the Typegraph instance.
These will be tested using pandas operations to form one of the previous outputs.
2022-02-04 23:56:01 -08:00
Michael Hohn
7a517fa06c sarif-extract-tables: interim commit
Internal destructuring and array aggregation run, but need to be tested.
Tables need to be formed, and pandas selections/joins/etc. used for custom table output.
2022-02-04 14:44:55 -08:00
Michael Hohn
cf8096446b sarif-to-dot: cleanup and preparation for sarif table extraction 2022-02-01 22:42:25 -08:00
Michael Hohn
c664ae2f8f .gitignore: ignore temporary files 2022-02-01 22:31:10 -08:00
Michael Hohn
119f9a5c18 sarif-to-dot: add more support for --fill-structure option
Expand

  ('Struct4827', ('struct', ('physicalLocation', 'Struct4963'))),

to have fields

  ( 'Struct2683',
    ( 'struct',
      ('id', 'Int'),
      ('message', 'Struct2774'),
      ('physicalLocation', 'Struct4963')))

and avoid a redundant table.
2022-01-27 18:55:02 -08:00
Michael Hohn
eb53ede8b1 sarif-to-dot: add more support for --fill-structure option
Common to all:
| ('locations', 'Array008')            |
| ('message', 'Struct009')             |
| ('partialFingerprints', 'Struct010') |
| ('rule', 'Struct011')                |
| ('ruleId', 'String'),                |
| ('ruleIndex', 'Int')))               |

Only some problems and flow problems have
| ('relatedLocations', 'Array014') |

Add dummy value for relatedLocations to reduce to two result categories,
@kind flow problem and @kind problem.
2022-01-27 18:18:43 -08:00
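Filling in a dummy relatedLocations value, as described above, gives every result the same set of fields. The sketch below shows the idea on plain dictionaries; `fill_related_locations` and the shape of `DUMMY_RELATED_LOCATIONS` are hypothetical and do not reproduce the --fill-structure code.

```python
import copy

# Hypothetical placeholder entry; the real dummy value is not shown in the log.
DUMMY_RELATED_LOCATIONS = [
    {"id": -1, "message": {"text": "dummy"}, "physicalLocation": {}}
]


def fill_related_locations(result, dummy=DUMMY_RELATED_LOCATIONS):
    """Return a copy of result that always has a relatedLocations field."""
    filled = dict(result)
    if "relatedLocations" not in filled:
        # Deep copy so later edits to one result never touch the shared dummy.
        filled["relatedLocations"] = copy.deepcopy(dummy)
    return filled


plain = {"ruleId": "js/unused-local-variable", "message": {"text": "m"}}
with_related = {"ruleId": "js/example", "message": {"text": "m"},
                "relatedLocations": [{"id": 1}]}

filled_plain = fill_related_locations(plain)
filled_related = fill_related_locations(with_related)
```

Once both kinds of result carry the same keys, they map to one signature, and the dummy rows can be dropped again after the tables are formed.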
Michael Hohn
80b22001ce sarif-to-dot: make signature names order-independent
To create entire subtrees conforming to a signature, first make the
signature names order-independent.  Use hashes to name the signatures.
2022-01-27 17:53:14 -08:00
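One way to get order-independent signature names is to sort the fields before hashing them. The following sketch assumes this approach; `signature_name`, the use of SHA-1, and the four-digit suffix are illustrative choices and may differ from what sarif-to-dot does.

```python
import hashlib


def signature_name(kind, fields):
    """Name a signature by a hash of its sorted (field name, type name) pairs."""
    # Sorting makes the name independent of the order in which fields were seen.
    canonical = repr((kind, tuple(sorted(fields))))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return f"{kind.capitalize()}{int(digest, 16) % 10000:04d}"


forward = signature_name("struct", [("id", "Int"), ("message", "Struct2774")])
backward = signature_name("struct", [("message", "Struct2774"), ("id", "Int")])
print(forward == backward)
```

A short numeric suffix keeps names readable in the rendered graph, at the cost of a small chance that two different signatures share a name.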
Michael Hohn
3e5d3ff5de Added interesting sarif structure diagram to notes/ 2022-01-26 23:25:30 -08:00
Michael Hohn
0b13a297a5 sarif-to-dot: add more support for --fill-structure option
Ensure

    ('Array003', ('array', (0, 'String'))),

is always present, which collapses the following two structures into one:

( 'Struct032',
  ( 'struct',
    ('artifacts', 'Array002'),
    ('columnKind', 'String'),
    ('newlineSequences', 'Array003'),
    ('properties', 'Struct004'),
    ('results', 'Array023'),
    ('tool', 'Struct029'),
    ('versionControlProvenance', 'Array031'))),

( 'Struct033',
  ( 'struct',
    ('artifacts', 'Array002'),
    ('columnKind', 'String'),
    ('properties', 'Struct004'),
    ('results', 'Array023'),
    ('tool', 'Struct029'),
    ('versionControlProvenance', 'Array031')))
2022-01-26 22:27:07 -08:00
Michael Hohn
2adf0dfa21 sarif-to-dot: increase graph ranksep to get intelligible edges 2022-01-26 16:15:42 -08:00
Michael Hohn
2c98cf0d41 sarif-to-dot: add more support for --fill-structure option
When both

   ('message', 'Struct009'),
   ('physicalLocation', 'Struct006'))),

are present, ensure

      ('id', 'Int'),

is also present.
2022-01-26 16:06:15 -08:00