The new base tables produced by `sarif-extract-multi` are
artifacts
codeflows
kind_pathproblem
kind_problem
project
relatedLocations
rules
The revised table overview is in the jupyter notebook
scripts/multi-table-overview.ipynb
The file notes/typegraph-multi-with-tables.pdf illustrates what original (sarif)
tables are used to form the base (derived) tables.
The table overview is in the jupyter notebook
scripts/multi-table-overview.ipynb and makes use of some formatting
customizations to actually get an overview.
The initial `projects` table had far too many entries; the `rules` part
is now in a separate `rules` table.
This command introduces a new tree structure that pulls in a collection
of sarif files. In yaml format, an example is
- creation_date: '2021-12-09' # Repository creation date
primary_language: javascript # By lines of code
project_name: treeio/treeio # Repo name-short name
query_commit_id: fa9571646c # Commit id for custom (non-library) queries
sarif_content: {} # The sarif content will be attached here
sarif_file_name: 2021-12-09/results.sarif # Path to sarif file
scan_start_date: '2021-12-09' # Beginning date/time of scan
scan_stop_date: '2021-12-10' # End date/time of scan
tool_name: codeql
tool_version: v1.27
- creation_date: '2022-02-25'
primary_language: javascript
...
At run time,
cd ~/local/sarif-cli/data/treeio
sarif-extract-multi multi-sarif-01.json test-multi-table
will load the specified sarif files and put them in place of
`sarif_content`, then build tables against the new signature found in
sarif_cli/signature_multi.py, and merge those into 6 larger tables. The
exported tables are
artifacts.csv path-problem.csv project.csv
codeflows.csv problem.csv related-locations.csv
and they have join keys for further operations.
The new typegraph is rendered in
notes/typegraph-multi.pdf
using the instructions in
sarif_cli/signature_multi.py
With the addition of the path-problem output, include both as sources (left joins)
for relatedLocations:
pd.concat([sf(4055)[['relatedLocations', 'struct_id']],
sf(9699)[['relatedLocations', 'struct_id']]])
One of the shorter multi-path results from
cd ~/local/sarif-cli/data/treeio
../../bin/sarif-results-summary -r results.sarif |less
follows; the dataframe formed here starts with the codeFlows-containing table 9699
and has the content of the PATH * output below.
RESULT: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: [DOM text](1) is reinte
rpreted as HTML without escaping meta-characters.
[DOM text](2) is reinterpreted as HTML without escaping meta-characters.
[DOM text](3) is reinterpreted as HTML without escaping meta-characters.
REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:90:17:90:27: DOM text
REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:91:17:91:28: DOM text
REFERENCE: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:92:17:92:31: DOM text
PATH 0
FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:90:17:90:27: name.val()
FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>" ... "</tr>
"
PATH 1
FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:91:17:91:28: email.val()
FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>" ... "</tr>"
PATH 2
FLOW STEP 0: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:92:17:92:31: password.val()
FLOW STEP 1: static/js/jquery-ui-1.10.3/demos/dialog/modal-form.html:89:35:93:14: "<tr>" ... "</tr>"
With --related-locations,
../../bin/sarif-results-summary -r results.sarif
produces the details
RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:
72:722:73: Character ''' is repeated [here](1) in the same character class.
Character ''' is repeated [here](2) in the same character class.
Character ''' is repeated [here](3) in the same character class.
REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:74:722:75: here
REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:76:722:77: here
REFERENCE: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js:722:78:722:79: here
Via
../../bin/sarif-extract-tables results.sarif tables
sarif-extract-tables now produces two output tables,
tables/
├── messages.csv
└── relatedLocations.csv
that contain the relevant information and can be joined or otherwise combined on
the struct_id_4055 key.
For example, adding to the end of sarif-extract-tables:
import IPython
IPython.embed()
msg = d2[d2.message.str.startswith("Character ''' is repeated [here]")]
dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]
In [24]: msg
Out[24]:
struct_id_4055 ... message
180 4796917312 ... Character ''' is repeated [here](1) in the sam...
[1 rows x 7 columns]
In [25]: dr3[dr3.struct_id_4055 == msg.struct_id_4055.values[0]]
Out[25]:
struct_id_4055 uri startLine startColumn endLine endColumn message
180 4796917312 static/js/tinymce/jscripts/tiny_mce/plugins/pa... 722 74 722 75 here
181 4796917312 static/js/tinymce/jscripts/tiny_mce/plugins/pa... 722 76 722 77 here
182 4796917312 static/js/tinymce/jscripts/tiny_mce/plugins/pa... 722 78 722 79 here
or manually from the shell:
# pick up the struct_id_4055:
0:$ grep "static.*Character ''' is repeated \[here\]" tables/messages.csv
180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,72,722,73,"Character ''' is repeated [here](1) in the same character class.
# and find relatedLocations:
0:$ grep 4927448704 tables/relatedLocations.csv
180,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,74,722,75,here
181,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,76,722,77,here
182,4927448704,static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js,722,78,722,79,here
Changes:
- Introduce scli-dyys, a random id string for later identification and removal of
dummy table rows.
- Keep the struct_id_4055 column to join tables as needed.
- Output is now written to a directory as there are always multiple files.
Reproduce the
file:line:col:line:col: message
output from
../../bin/sarif-results-summary results.sarif | grep size
as test/example.
Original sample output is
RESULT: static/js/fileuploader.js:1214:13:1214:17: Unused variable size.
RESULT: static/js/tinymce/jscripts/tiny_mce/plugins/media/js/media.js:438:30:438:34: Unused variable size.
The table result here is
0:$ ../../bin/sarif-extract-tables results.sarif | grep size
0,static/js/fileuploader.js,1214,13,1214,17,Unused variable size.
34,static/js/tinymce/jscripts/tiny_mce/plugins/media/js/media.js,438,30,438,34,Unused variable size.
Internal destructuring and array aggregration run, but need to be tested.
Tables need to be formed, and pandas selections/joins/etc. used for custom table output.
Common to all:
| ('locations', 'Array008') |
| ('message', 'Struct009') |
| ('partialFingerprints', 'Struct010') |
| ('rule', 'Struct011') |
| ('ruleId', 'String'), |
| ('ruleIndex', 'Int'))) |
Only some problems and flow problems have
| ('relatedLocations', 'Array014') |
Add dummy value for relatedLocations to reduce to two result categories,
@kind flow problem and @kind problem.
The command
../../bin/sarif-to-dot results.sarif -u -t -d | dot -Tpdf > raw-nested-types.pdf
produces a good illustration of the problems arising when optional values are absent.
To clean this up, structures missing fields have to be supplemented with those fields,
from right to left in the graph.
This is basically what sarif-results-summary does on the fly, it just has to be applied
to the input tree before collecting the signatures and producing this graph.
Once that is done, the types collected here can be used in SQL table export.