Files
sarif-cli/notes/README.org
2023-12-06 14:09:51 -08:00

320 lines
12 KiB
Org Mode
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# -*- mode: org; org-confirm-babel-evaluate: nil; coding: utf-8 -*-
#+OPTIONS: org-confirm-babel-evaluate:nil
#+LANGUAGE: en
#+TEXT:
#+OPTIONS: ^:{} H:3 num:t \n:nil @:t ::t |:t ^:nil f:t *:t TeX:t LaTeX:t skip:nil p:nil
#+OPTIONS: toc:nil
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="./l3style.css"/>
#+HTML: <div id="toc">
#+TOC: headlines 3 insert TOC here, with two headline levels
#+HTML: </div>
#
#+HTML: <div id="org-content">
* The notes directory
This directory is for notes that may be useful, but aren't complete enough to
serve as documentation in their current state.
Think of it as staging for [[../docs]].
Short notes start as sections in this README. They will be moved if separate
file make more sense.
** The typegraphs
The type graph files are derived from a sarif input file, with various options
controlling output.
To produce dot maps of a sarif file type graph, from raw (largest) to fully
filled (most compact):
#+BEGIN_SRC sh
cd ../data/treeio/2022-02-25
# Everything:
../../../bin/sarif-to-dot -t -d results.sarif | dot -Tpdf > typegraph-td.pdf
# Suppress edges to int/bool/string types in dot graph
../../../bin/sarif-to-dot -td -n results.sarif | dot -Tpdf > typegraph-tdn.pdf
# Additionally, only report unique array entry signatures
../../../bin/sarif-to-dot -td -nu results.sarif | dot -Tpdf > typegraph-tdnu.pdf
# Additionally, fill in missing (optional) entries in sarif input before other steps.
../../../bin/sarif-to-dot -td -nuf results.sarif | dot -Tpdf > typegraph-tdnuf.pdf
#+END_SRC
** Debugging the absence of automationDetails.id
The =automationDetails.id= entry is produced by CodeQL when using the
=--sarif-category= flag.
The prerequisites for tracing its flow through the tools is started in
[[../data/build-multiple-sarifs.sh]]
For testing the following is injected into =sqlidb-1.1.sarif=.
#+BEGIN_SRC text
: '
"automationDetails" : {
"id" : "mast-issue/"
},
'
#+END_SRC
*** Add repl as appropriate, then examine.
Make sure the input is correct
#+BEGIN_SRC sh :session shared :results output :eval never-export
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
grep -A2 automationDetails sqlidb-1.1.sarif
#+END_SRC
#+RESULTS:
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection
: "automationDetails" : {
: "id" : "mast-issue/"
: },
:
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection
*** Create the CSV
#+BEGIN_SRC sh :session shared :results output :eval never-export
source ~/local/sarif-cli/.venv/bin/activate
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
sarif-extract-scans-runner --input-signature CLI - > /dev/null <<EOF
sqlidb-1.1.sarif
EOF
#+END_SRC
#+RESULTS:
#+begin_example
hohn@gh-hohn ~/local/sarif-cli/notes
(.venv)
hohn@gh-hohn ~/local/sarif-cli/notes
(.venv)
hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection
> > (.venv)
hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection
#+end_example
#+BEGIN_SRC sh :session shared :results output :eval never-export
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
ls -la sqlidb-1.1*
find sqlidb-1.1.sarif.scantables -print
#+END_SRC
#+RESULTS:
#+begin_example
hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection
-rw-r--r-- 1 hohn staff 8.2K Jul 11 19:25 sqlidb-1.1.sarif
-rw-r--r-- 1 hohn staff 326 Jul 12 16:39 sqlidb-1.1.sarif.csv
-rw-r--r-- 1 hohn staff 72 Jul 12 16:39 sqlidb-1.1.sarif.scanspec
sqlidb-1.1.sarif.scantables:
total 16K
drwxr-xr-x 6 hohn staff 192 Jul 12 16:39 ./
drwxr-xr-x 43 hohn staff 1.4K Jul 12 16:39 ../
-rw-r--r-- 1 hohn staff 622 Jul 12 16:39 codeflows.csv
-rw-r--r-- 1 hohn staff 165 Jul 12 16:39 projects.csv
-rw-r--r-- 1 hohn staff 589 Jul 12 16:39 results.csv
-rw-r--r-- 1 hohn staff 343 Jul 12 16:39 scans.csv
(.venv)
hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection
sqlidb-1.1.sarif.scantables
sqlidb-1.1.sarif.scantables/codeflows.csv
sqlidb-1.1.sarif.scantables/scans.csv
sqlidb-1.1.sarif.scantables/results.csv
sqlidb-1.1.sarif.scantables/projects.csv
(.venv)
hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection
#+end_example
*** Check if =automationDetails= or its value is in output
#+BEGIN_SRC sh :session shared :results output :eval never-export
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
ag automationDetails | cat
#+END_SRC
#+RESULTS:
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
: projects.csv:1:"id","project_name","creation_date","repo_url","primary_language","languages_analyzed","automationDetails"
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
#+RESULTS:
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
#+RESULTS:
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
See if the magic value is present
#+BEGIN_SRC sh :session shared :results output :eval never-export
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
ag mast-issue |cat
#+END_SRC
#+RESULTS:
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
: projects.csv:2:490227419655596076,"vcp-no-uri","1970-01-01","vcp-no-uri","unknown","unknown","mast-issue/"
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
#+RESULTS:
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
*** Nothing is in the output, so trace execution to see where it's dropped
#+BEGIN_SRC sh :session shared :results output :eval never-export
cd ~/local/sarif-cli/notes && ag -l automationDetails ../sarif_cli |cat
#+END_SRC
#+RESULTS:
: ../sarif_cli/scan_tables.py
: ../sarif_cli/signature_single_CLI.py
: ../sarif_cli/table_joins_CLI.py
: ../sarif_cli/signature.py
: (.venv)
: hohn@gh-hohn ~/local/sarif-cli/notes
*** Trace the call chain
Trace the call chain to one of
: ../sarif_cli/scan_tables.py
: ../sarif_cli/table_joins_CLI.py
: ../sarif_cli/signature.py
Entry is
#+BEGIN_SRC sh :session shared :results output :eval never-export
sarif-extract-scans-runner --input-signature CLI - > /dev/null <<EOF
sqlidb-1.1.sarif
EOF
#+END_SRC
1. sarif-extract-scans-runner
1. calls [[file:~/local/sarif-cli/bin/sarif-extract-scans-runner::runstats = subprocess.run(\['sarif-extract-scans', scan_spec_file, output_dir, csv_outfile, "-f", args.input_signature\],]]
The following will drop into the inserted repls:
#+BEGIN_SRC sh :session shared :results output :eval never-export
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
sarif-extract-scans \
sqlidb-1.1.sarif.scanspec \
sqlidb-1.1.sarif.scantables \
sqlidb-1.1.sarif.csv \
-f CLI
#+END_SRC
1. calls [[file:~/local/sarif-cli/bin/sarif-extract-scans::sarif_struct = load(scan_spec\['sarif_file_name'\])]]
2. uses [[file:~/local/sarif-cli/bin/sarif-extract-scans::location_info = tj.joins_for_location_info(tgraph)]]
*** Run using embedded repls
The following will drop into the inserted repls:
#+BEGIN_SRC sh :session shared :results output :eval never-export
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
sarif-extract-scans \
sqlidb-1.1.sarif.scanspec \
sqlidb-1.1.sarif.scantables \
sqlidb-1.1.sarif.csv \
-f CLI
#+END_SRC
The line
: .rename(columns={"id": "automationDetails"})
has the right effect:
#+BEGIN_SRC text
In [3]: project_df_temp1.T
Out[3]:
0
struct_id_5521 4796854592
$schema https://json.schemastore.org/sarif-2.1.0.json
version_5521 2.1.0
value_index_1273 0
artifacts 4797197888
columnKind utf16CodeUnits
newlineSequences 4797197568
properties 4797244480
results 4797198208
tool 4797244672
versionControlProvenance 4797218944
automationDetails mast-issue/
#+END_SRC
The line
: extra = b.project.automationDetails[0]
also works:
#+BEGIN_SRC text
In [1]: extra
Out[1]: 'mast-issue/'
#+END_SRC
but
: extra
is only used in
: e.project_id = hash.hash_unique((repoUri+extra).encode())
when
#+BEGIN_SRC text
In [5]: "repositoryUri" in b.project
Out[5]: True
#+END_SRC
For reference:
#+BEGIN_SRC text
In [8]: b.project.automationDetails
Out[8]:
0 mast-issue/
Name: automationDetails, dtype: object
#+END_SRC
This is in joins_for_projects, called from
: scantabs.projects = st.joins_for_projects(bt, external_info)
Add
: "automationDetails" : extra,
to the
: # Projects table
And repeat the [[*Check if =automationDetails= or its value is in output][Check if =automationDetails= or its value is in output]]
Still missing. Must be dropped between dataframe creation and output.
Use project_name to search.
: class ScanTablesTypes:
has no entry for
: automationDetails
Add
: "automationDetails" : pd.StringDtype(),
Similar for
: File: sarif_cli/columns.py
And repeat [[*Run using embedded repls][Run using embedded repls]], then
[[*Check if =automationDetails= or its value is in output][Check if =automationDetails= or its value is in output]]
* SARIF and Signatures
signature here is e.g., struct_graph_LGTM in ./sarif_cli/signature_single.py
The signatures are those produced by codeql in the past. They are not meant to
be updated frequently; they arose and are used as follows.
1. The SARIF standard is quite loose, with many optional fields.
2. For producing CSV tabular output (and for internal table processing), the
sarif-cli tool needed an exact signature. Using existing SARIF files was a
straightforward way to get a signature.
3. When a SARIF file contains extra keys, a warning is issued but processing
continues.
4. When a sarif file is missing an entry thats in the signature, a fatal error
is issued.
The only time you need to update the signature is when you get fatal errors —
there will be a detailed message about expected vs. found fields.
* Footnotes
#+HTML: </div>