
# -*- coding: utf-8 -*-
* Output of multi-value results
** Multiple message values, no flow path
   Results of the query https://lgtm.com/query/rule:1790078/lang:javascript/ are
   reported via the =select=
   #+BEGIN_SRC text
     select first, "Character '" + first +
     "' is repeated $@ in the same character class.", repeat, "here"
   #+END_SRC
   and the JSON/YAML file has the entries
   #+BEGIN_SRC text
     message:
       text: |-
         Character ''' is repeated [here](1) in the same character class.
         Character ''' is repeated [here](2) in the same character class.
         Character ''' is repeated [here](3) in the same character class.
   #+END_SRC

   Their display in lgtm is [[https://lgtm.com/projects/g/treeio/treeio/snapshot/6b914d98b0a86ae9996945bd501e133d0f73ec6e/files/static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js#x7820a043f81b48cd:1][here]].

   Multiple values of =first= produce multiple distinct results, whereas multiple
   values of =repeat= produce multiple =relatedLocations= within one =results=
   array entry.

   #+BEGIN_SRC text
     relatedLocations:
       - id: 1
         physicalLocation:
           artifactLocation:
             uri: static/js/tinymce/jscripts/tiny_mce/plugins/paste/editor_plugin_src.js
             uriBaseId: '%SRCROOT%'
             index: 41
           region:
             startLine: 722
             startColumn: 74
             endColumn: 75
         message:
           text: here
       - id: 2
         ...
       - id: 3
         ...
   #+END_SRC

   This is consistent with the use of =first= as an anchor for alerts and for path
   problems.
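
   The link between message lines and =relatedLocations= can be sketched in
   Python.  This is a minimal illustration, not part of sarif-cli: the function
   name =pair_message_lines= and the sample data are hypothetical, and it assumes
   one embedded link of the form =[text](id)= per message line, as in the
   entries above.

   #+BEGIN_SRC python
     import re

     # Matches an embedded link of the form [text](id) in a SARIF message.
     LINK = re.compile(r"\[([^\]]*)\]\((\d+)\)")


     def pair_message_lines(result):
         """Yield (message_line, related_location) for each message line.

         Assumption: each line carries at most one embedded link whose
         target is the integer id of a relatedLocations entry.
         """
         by_id = {loc["id"]: loc for loc in result.get("relatedLocations", [])}
         for line in result["message"]["text"].splitlines():
             match = LINK.search(line)
             related = by_id.get(int(match.group(2))) if match else None
             yield line, related


     # Hypothetical, heavily trimmed result for illustration only.
     result = {
         "message": {
             "text": "Character ''' is repeated [here](1) in the same character class.\n"
                     "Character ''' is repeated [here](2) in the same character class."
         },
         "relatedLocations": [
             {"id": 1, "message": {"text": "here"}},
             {"id": 2, "message": {"text": "here"}},
         ],
     }
     pairs = list(pair_message_lines(result))
   #+END_SRC

   Each element of =pairs= holds one message line together with the
   =relatedLocations= entry it refers to, or =None= when the line has no link.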

   However, things get more complicated when there are flow paths.  Thus, the
   approach of section [[*Multiple message values and flow paths][Multiple message values and flow paths]] should also be used
   here for consistency.

   See also
   - Full results:  [[../data/treeio/results.yaml]]
   - Trimmed test set:  [[../data/treeio/test_set_1.yaml]]

** Multiple message values and flow paths
   The query =com.lgtm/javascript-queries:js/unsafe-jquery-plugin=
   (full version [[https://github.com/github/codeql/blob/codeql-cli/v2.7.3/javascript/ql/src/Security/CWE-079/UnsafeJQueryPlugin.ql][CWE-079/UnsafeJQueryPlugin.ql]], lgtm.com results [[https://lgtm.com/projects/g/treeio/treeio?mode=list&id=js%2Funsafe-jquery-plugin][here]])
   has =select=
   #+begin_src javascript
     select sink.getNode(), source, sink, "Potential XSS vulnerability in the $@.", plugin,
       "'$.fn." + plugin.getPluginName() + "' plugin"
   #+end_src

   The full results are found in [[file:../data/treeio/results.yaml::Potential XSS vulnerability in the \['$.fn.datepicker' plugin\](1).][results.yaml]], with a testing subset in [[file:../data/treeio/test_set_1.yaml::Potential XSS vulnerability in the \['$.fn.datepicker'
 plugin\](1).][test_set_1.yaml]]; the results for this query are
   #+BEGIN_SRC text
     message:
       text: |-
         Potential XSS vulnerability in the ['$.fn.datepicker' plugin](1).
         Potential XSS vulnerability in the ['$.fn.datepicker' plugin](2).
         Potential XSS vulnerability in the ['$.fn.datepicker' plugin](3).
   #+END_SRC
   with 3 =relatedLocations= and 6 =threadFlows=.

   The original query's first column is a sink (=sink.getNode()=), so the
   =threadFlows= should terminate there -- and they do.
   #+BEGIN_SRC text
     locations:
       - physicalLocation:
           artifactLocation:
             uri: static/js/jquery-ui-1.10.3/ui/jquery.ui.datepicker.js
             uriBaseId: '%SRCROOT%'
             index: 61
           region:
             startLine: 1027
             startColumn: 6
             endColumn: 14
   #+END_SRC

   In the above query, the =source= is connected to the =plugin= (possibly
   restricting the result set).  For this particular result, the first locations
   of the first two =threadFlows= (indices 0 and 1, counting from zero) are
   contained in the line range of the first =relatedLocation=.
   Similarly, =threadFlows= 2 & 3 are contained in the second =relatedLocation=.

   This grouping need not be apparent from the output alone, but we can
   assume the results are a straight nested product:
   $$  1\ result
   \times 3\ {relatedLocations \over result}
   \times 2\ {threadFlows \over relatedLocation}
   = 6\ threadFlows
   $$

   This way, we can group each =relatedLocation= with one or more =threadFlows=
   and split each of these clusters into its own result for cleaner
   exporting / viewing.

   Instead of
   #+BEGIN_SRC yaml
     - message
       - text: |-
         Potential XSS vulnerability in the ['$.fn.datepicker' plugin](1).
         Potential XSS vulnerability in the ['$.fn.datepicker' plugin](2).
         Potential XSS vulnerability in the ['$.fn.datepicker' plugin](3).
     - relatedLocations
       - id 1
       - id 2
       - id 3
     - codeFlows
       - threadFlows
       - threadFlows
       - threadFlows
       - threadFlows
       - threadFlows
       - threadFlows
   #+END_SRC

   this becomes three separate results, the first of which is:

   #+BEGIN_SRC yaml
     - message
       - text: |-
         Potential XSS vulnerability in the ['$.fn.datepicker' plugin](1).
     - relatedLocations
       - id 1
     - codeFlows
       - threadFlows
       - threadFlows
   #+END_SRC
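
   The split illustrated above can be sketched as follows.  This is a minimal
   sketch under the nested-product assumption, not the sarif-cli
   implementation: the name =split_result= is hypothetical, and it assumes that
   message lines, =relatedLocations=, and =codeFlows= entries (each holding one
   =threadFlows= list) appear in matching order and divide evenly.

   #+BEGIN_SRC python
     def split_result(result):
         """Split one result into one result per relatedLocation.

         Assumption: the i-th message line, the i-th relatedLocation and
         the i-th equal-sized slice of codeFlows belong together.  If the
         counts do not divide evenly, the result is returned unchanged.
         """
         related = result.get("relatedLocations", [])
         flows = result.get("codeFlows", [])
         lines = result["message"]["text"].splitlines()
         if not related or len(lines) != len(related) or len(flows) % len(related):
             return [result]
         per_location = len(flows) // len(related)
         parts = []
         for i, (line, location) in enumerate(zip(lines, related)):
             part = dict(result)  # shallow copy; shared keys such as ruleId are kept
             part["message"] = {"text": line}
             part["relatedLocations"] = [location]
             part["codeFlows"] = flows[i * per_location:(i + 1) * per_location]
             parts.append(part)
         return parts


     # Hypothetical trimmed result: 3 relatedLocations and 6 flows.  The
     # threadFlow locations are omitted for brevity; the ids are made up.
     example = {
         "ruleId": "com.lgtm/javascript-queries:js/unsafe-jquery-plugin",
         "message": {"text": "\n".join(
             f"Potential XSS vulnerability in the ['$.fn.datepicker' plugin]({i})."
             for i in (1, 2, 3))},
         "relatedLocations": [{"id": i} for i in (1, 2, 3)],
         "codeFlows": [{"threadFlows": [{"id": f"flow-{n}"}]} for n in range(6)],
     }
     parts = split_result(example)
   #+END_SRC

   Returning the input unchanged when the counts do not match keeps the sketch
   conservative: a result is only split when the assumed structure is actually
   present.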

   As a note, the standard's [[https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317744][3.37 threadFlow object]] entry does not connect
   =threadFlows= to =relatedLocations=, and a query may or may not connect them.
   Even if there is a logical connection, there need not be a physical
   (location) connection, so a =threadFlow='s region may or may not overlap with
   a =relatedLocation='s.
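
   Whether such a physical connection exists for a given pair can be tested
   directly.  The following is a sketch only: the helper names =line_range= and
   =contained_in= and the sample locations are hypothetical, and it assumes
   line-based regions that carry =startLine= (a missing =endLine= defaults to
   =startLine=, as in the SARIF standard).

   #+BEGIN_SRC python
     def line_range(physical_location):
         """Return (startLine, endLine) of a physicalLocation's region."""
         region = physical_location["region"]
         start = region["startLine"]
         return start, region.get("endLine", start)


     def contained_in(inner, outer):
         """True if `inner` lies within the line range of `outer`.

         Both arguments are physicalLocation objects; locations in
         different files never match.
         """
         if inner["artifactLocation"]["uri"] != outer["artifactLocation"]["uri"]:
             return False
         inner_start, inner_end = line_range(inner)
         outer_start, outer_end = line_range(outer)
         return outer_start <= inner_start and inner_end <= outer_end


     # Hypothetical locations for illustration only.
     related = {"artifactLocation": {"uri": "a.js"},
                "region": {"startLine": 10, "endLine": 20}}
     flow_start = {"artifactLocation": {"uri": "a.js"},
                   "region": {"startLine": 12, "startColumn": 4}}
     elsewhere = {"artifactLocation": {"uri": "b.js"},
                  "region": {"startLine": 12}}

     inside = contained_in(flow_start, related)
     outside = contained_in(elsewhere, related)
   #+END_SRC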

   Using
   #+BEGIN_SRC sh
     sarif-results-summary \
         -s data/treeio/treeio \
         -r data/treeio/results.sarif | \
         sed -n "/modal-form.html:89:35:93:14/,/RESULT/p" |less
   #+END_SRC
   we see a query result with 3 =relatedLocations= and 3 =threadFlows= with
   clear connections between them.  More importantly, the ordering is
   consistent.

** Multiple message values and source/sink pairs
   As a special case of [[*Multiple message values and flow paths][Multiple message values and flow paths]], we can report only
   the (source, sink) pairs and drop the flow paths.  This is useful in result
   reports spanning many repositories and multiple tools.

   Considering
   #+BEGIN_SRC text
     Potential XSS vulnerability in the ['$.fn.datepicker' plugin](1).
   #+END_SRC
   found in [[file:../data/treeio/test_set_1.yaml::Potential XSS vulnerability in the \['$.fn.datepicker'    plugin\](1).][test_set_1.yaml]], stripping the intermediate steps from the
   =threadFlows= and looking at the first two =threadFlows= gives the following
   simplified structure.
   Note that without the flow paths, the first two =threadFlows= reduce to
   identical =(source, sink)= pairs; the same holds for =threadFlows= 2,3 and 4,5.

   #+BEGIN_SRC yaml
     - ruleId: com.lgtm/javascript-queries:js/unsafe-jquery-plugin
       codeFlows:
         - threadFlows:
             - locations:
                 - location:
                     physicalLocation:
                       artifactLocation:
                         uri: static/js/jquery-ui-1.10.3/ui/jquery-ui.js
                         uriBaseId: '%SRCROOT%'
                         index: 72
                       region:
                         startLine: 9598
                         startColumn: 28
                         endColumn: 35
                     message:
                       text: options
                 - location:
                     physicalLocation:
                       artifactLocation:
                         uri: static/js/jquery-ui-1.10.3/ui/jquery.ui.datepicker.js
                         uriBaseId: '%SRCROOT%'
                         index: 61
                       region:
                         startLine: 1027
                         startColumn: 6
                         endColumn: 14
                     message:
                       text: altField
         - threadFlows:
             - locations:
                 - location:
                     physicalLocation:
                       artifactLocation:
                         uri: static/js/jquery-ui-1.10.3/ui/jquery-ui.js
                         uriBaseId: '%SRCROOT%'
                         index: 72
                       region:
                         startLine: 9598
                         startColumn: 28
                         endColumn: 35
                     message:
                       text: options
                 - location:
                     physicalLocation:
                       artifactLocation:
                         uri: static/js/jquery-ui-1.10.3/ui/jquery.ui.datepicker.js
                         uriBaseId: '%SRCROOT%'
                         index: 61
                       region:
                         startLine: 1027
                         startColumn: 6
                         endColumn: 14
                     message:
                       text: altField

   #+END_SRC
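
   Reducing a result to its =(source, sink)= pairs can be sketched as follows.
   The names =location_key= and =source_sink_pairs= are hypothetical, and
   dropping identical pairs is an assumption of this sketch rather than
   documented sarif-cli behavior.  The sample data repeats the source and sink
   locations shown above.

   #+BEGIN_SRC python
     def location_key(step):
         """Reduce a threadFlowLocation to a hashable (uri, region) tuple."""
         physical = step["location"]["physicalLocation"]
         region = physical["region"]
         return (physical["artifactLocation"]["uri"],
                 region["startLine"], region.get("startColumn"),
                 region.get("endLine", region["startLine"]),
                 region.get("endColumn"))


     def source_sink_pairs(result):
         """Return the distinct (source, sink) keys in first-seen order."""
         pairs = []
         for code_flow in result.get("codeFlows", []):
             for thread_flow in code_flow.get("threadFlows", []):
                 steps = thread_flow.get("locations", [])
                 if not steps:
                     continue
                 # The first step is the source, the last one the sink.
                 pair = (location_key(steps[0]), location_key(steps[-1]))
                 if pair not in pairs:
                     pairs.append(pair)
         return pairs


     def step(uri, line, start_col, end_col):
         """Build a trimmed threadFlowLocation (helper for this example)."""
         return {"location": {"physicalLocation": {
             "artifactLocation": {"uri": uri},
             "region": {"startLine": line, "startColumn": start_col,
                        "endColumn": end_col}}}}


     source = step("static/js/jquery-ui-1.10.3/ui/jquery-ui.js", 9598, 28, 35)
     sink = step("static/js/jquery-ui-1.10.3/ui/jquery.ui.datepicker.js", 1027, 6, 14)
     result = {"codeFlows": [{"threadFlows": [{"locations": [source, sink]}]},
                             {"threadFlows": [{"locations": [source, sink]}]}]}
     pairs = source_sink_pairs(result)
   #+END_SRC

   For the two flows above, =pairs= contains a single entry, since both flows
   share the same source and sink.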

#
#+OPTIONS: ^:{}