
The notes directory

This directory is for notes that may be useful, but aren't complete enough to serve as documentation in their current state.

Think of it as staging for ../docs.

Short notes start as sections in this README. They will be moved if a separate file makes more sense.

The typegraphs

The type graph files are derived from a sarif input file, with various options controlling output.

To produce dot maps of a sarif file type graph, from raw (largest) to fully filled (most compact):

  cd ../data/treeio/2022-02-25

  # Everything:
  ../../../bin/sarif-to-dot -t -d  results.sarif | dot -Tpdf > typegraph-td.pdf

  # Suppress edges to int/bool/string types in dot graph
  ../../../bin/sarif-to-dot -td -n results.sarif | dot -Tpdf > typegraph-tdn.pdf

  # Additionally, only report unique array entry signatures
  ../../../bin/sarif-to-dot -td -nu results.sarif | dot -Tpdf > typegraph-tdnu.pdf

  # Additionally, fill in missing (optional) entries in sarif input before other steps.
  ../../../bin/sarif-to-dot -td -nuf results.sarif | dot -Tpdf > typegraph-tdnuf.pdf

Debugging the absence of automationDetails.id

The automationDetails.id entry is produced by CodeQL when using the --sarif-category flag.

The prerequisites for tracing its flow through the tools are set up in ../data/build-multiple-sarifs.sh

For testing, the following is injected into sqlidb-1.1.sarif:

  : '
  "automationDetails" : {
  "id" : "mast-issue/"
  },
  '
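
The injection above can also be scripted. The following is a minimal sketch, assuming the entry belongs on each object in the SARIF `runs` array; the helper name `inject_automation_details` and the sample dictionary are illustrative and not part of sarif-cli.

```python
import copy
import json


def inject_automation_details(sarif, category="mast-issue/"):
    """Return a copy of a SARIF log with automationDetails.id set on every run."""
    patched = copy.deepcopy(sarif)
    for run in patched.get("runs", []):
        run["automationDetails"] = {"id": category}
    return patched


# Minimal stand-in for a SARIF log; a real file has many more fields.
sample = {
    "version": "2.1.0",
    "runs": [{"tool": {"driver": {"name": "CodeQL"}}, "results": []}],
}

patched = inject_automation_details(sample)
print(json.dumps(patched["runs"][0]["automationDetails"]))
```

For a real file, load it with `json.load`, apply the helper, and write the result back with `json.dump`.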

Insert REPLs into the code as appropriate, then examine.

Make sure the input is correct

  cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
  grep -A2 automationDetails sqlidb-1.1.sarif
"automationDetails" : {
      "id" : "mast-issue/"
    },


Create the CSV

  source ~/local/sarif-cli/.venv/bin/activate
  cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection 
  sarif-extract-scans-runner --input-signature CLI - > /dev/null <<EOF
  sqlidb-1.1.sarif
  EOF
  cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection 
  ls -la sqlidb-1.1*
  find sqlidb-1.1.sarif.scantables -print
-rw-r--r-- 1 hohn staff 8.2K Jul 11 19:25 sqlidb-1.1.sarif
-rw-r--r-- 1 hohn staff  326 Jul 12 16:39 sqlidb-1.1.sarif.csv
-rw-r--r-- 1 hohn staff   72 Jul 12 16:39 sqlidb-1.1.sarif.scanspec

sqlidb-1.1.sarif.scantables:
total 16K
drwxr-xr-x  6 hohn staff  192 Jul 12 16:39 ./
drwxr-xr-x 43 hohn staff 1.4K Jul 12 16:39 ../
-rw-r--r--  1 hohn staff  622 Jul 12 16:39 codeflows.csv
-rw-r--r--  1 hohn staff  165 Jul 12 16:39 projects.csv
-rw-r--r--  1 hohn staff  589 Jul 12 16:39 results.csv
-rw-r--r--  1 hohn staff  343 Jul 12 16:39 scans.csv
sqlidb-1.1.sarif.scantables
sqlidb-1.1.sarif.scantables/codeflows.csv
sqlidb-1.1.sarif.scantables/scans.csv
sqlidb-1.1.sarif.scantables/results.csv
sqlidb-1.1.sarif.scantables/projects.csv

Check if automationDetails or its value is in output

  cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
  ag automationDetails | cat
projects.csv:1:"id","project_name","creation_date","repo_url","primary_language","languages_analyzed","automationDetails"

See if the magic value is present

  cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
  ag mast-issue |cat
projects.csv:2:490227419655596076,"vcp-no-uri","1970-01-01","vcp-no-uri","unknown","unknown","mast-issue/"
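
The two `ag` checks above can be approximated in Python when `ag` is not available. This is a rough sketch: the function name `find_in_tables` is hypothetical, and the demonstration uses a throwaway directory with abbreviated rows, not the real scantables output.

```python
import csv
import tempfile
from pathlib import Path


def find_in_tables(table_dir, needle):
    """Yield (file name, row number, row) for every CSV row containing needle."""
    for path in sorted(Path(table_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            for row_number, row in enumerate(csv.reader(handle), start=1):
                if any(needle in cell for cell in row):
                    yield path.name, row_number, row


# Demonstration on a temporary directory standing in for
# sqlidb-1.1.sarif.scantables.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "projects.csv").write_text(
        '"id","project_name","automationDetails"\n'
        '1,"vcp-no-uri","mast-issue/"\n'
    )
    Path(tmp, "scans.csv").write_text('"id","scan_id"\n1,2\n')
    header_hits = list(find_in_tables(tmp, "automationDetails"))
    value_hits = list(find_in_tables(tmp, "mast-issue"))

for name, row_number, row in header_hits + value_hits:
    print(f"{name}:{row_number}:{row}")
```

An empty result for both searches corresponds to the "nothing in the output" case that triggers the tracing below.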

If nothing is in the output, trace execution to see where it's dropped

  cd ~/local/sarif-cli/notes && ag -l automationDetails ../sarif_cli  |cat
../sarif_cli/scan_tables.py
../sarif_cli/signature_single_CLI.py
../sarif_cli/table_joins_CLI.py
../sarif_cli/signature.py

Trace the call chain

Trace the call chain to one of

../sarif_cli/scan_tables.py
../sarif_cli/table_joins_CLI.py
../sarif_cli/signature.py

Entry is

  sarif-extract-scans-runner --input-signature CLI - > /dev/null <<EOF
  sqlidb-1.1.sarif
  EOF
  1. sarif-extract-scans-runner

    1. calls ~/local/sarif-cli/bin/sarif-extract-scans-runner::runstats = subprocess.run(['sarif-extract-scans', scan_spec_file, output_dir, csv_outfile, "-f", args.input_signature],

      The following will drop into the inserted repls:

        cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection 
        sarif-extract-scans \
            sqlidb-1.sarif.scanspec \
            sqlidb-1.sarif.scantables \
            sqlidb-1.sarif.csv \
            -f CLI
      1. calls ~/local/sarif-cli/bin/sarif-extract-scans::sarif_struct = load(scan_spec['sarif_file_name'])
      2. uses ~/local/sarif-cli/bin/sarif-extract-scans::location_info = tj.joins_for_location_info(tgraph)

Run using embedded repls

The following will drop into the inserted repls:

  cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection 
  sarif-extract-scans \
      sqlidb-1.1.sarif.scanspec \
      sqlidb-1.1.sarif.scantables \
      sqlidb-1.1.sarif.csv \
      -f CLI

The line

.rename(columns={"id": "automationDetails"})

has the right effect:

  In [3]: project_df_temp1.T
  Out[3]: 
                                                                        0
  struct_id_5521                                               4796854592
  $schema                   https://json.schemastore.org/sarif-2.1.0.json
  version_5521                                                      2.1.0
  value_index_1273                                                      0
  artifacts                                                    4797197888
  columnKind                                               utf16CodeUnits
  newlineSequences                                             4797197568
  properties                                                   4797244480
  results                                                      4797198208
  tool                                                         4797244672
  versionControlProvenance                                     4797218944
  automationDetails                                           mast-issue/

The line

        extra = b.project.automationDetails[0]

also works:

In [1]: extra
Out[1]: 'mast-issue/'

but

extra

is only used in

        e.project_id = hash.hash_unique((repoUri+extra).encode())

when

In [5]: "repositoryUri" in b.project
Out[5]: True
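
The consequence is that the category only changes the project id; it never becomes a column of its own. A small sketch of that effect follows. The implementation of sarif-cli's `hash.hash_unique` is not shown in these notes, so `hash_unique_stand_in` below is a hypothetical replacement built on `hashlib`, and the repository URI is a placeholder.

```python
import hashlib


def hash_unique_stand_in(data: bytes) -> int:
    """Hypothetical stand-in for hash.hash_unique: map bytes to a 64-bit integer."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


repo_uri = "https://example.com/org/repo"  # placeholder, not from the SARIF file
extra = "mast-issue/"

plain_id = hash_unique_stand_in(repo_uri.encode())
category_id = hash_unique_stand_in((repo_uri + extra).encode())

# The category changes the id, but the id alone does not reveal the category.
print(plain_id != category_id)
```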

For reference:

In [8]: b.project.automationDetails
Out[8]: 
0    mast-issue/
Name: automationDetails, dtype: object

This is in joins_for_projects, called from

scantabs.projects = st.joins_for_projects(bt, external_info)

Add

        "automationDetails"  : extra,

to the

# Projects table

Then repeat the step "Check if automationDetails or its value is in output". Still missing, so the value must be dropped between dataframe creation and output.

Use project_name to search.

class ScanTablesTypes:

has no entry for

automationDetails

Add

"automationDetails"  : pd.StringDtype(),

Similar for

File: sarif_cli/columns.py
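
Why a missing type entry can make a column vanish is sketched below. This is an assumption about the mechanism, not a copy of the sarif-cli code: the `conform` helper and the abbreviated `projects_types` mapping are hypothetical, and they only illustrate how restricting a dataframe to declared columns drops anything undeclared.

```python
import pandas as pd

# Hypothetical, abbreviated column-type mapping in the spirit of ScanTablesTypes.
projects_types = {
    "id": pd.UInt64Dtype(),
    "project_name": pd.StringDtype(),
}


def conform(df, column_types):
    """Keep only the declared columns and cast them to the declared dtypes."""
    return df.reindex(columns=list(column_types)).astype(column_types)


df = pd.DataFrame(
    {"id": [1], "project_name": ["vcp-no-uri"], "automationDetails": ["mast-issue/"]}
)

before = conform(df, projects_types)  # automationDetails is silently dropped

# Declaring the column, as described above, lets it through.
projects_types["automationDetails"] = pd.StringDtype()
after = conform(df, projects_types)

print(list(before.columns))
print(list(after.columns))
```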

Then repeat "Run using embedded repls", followed by "Check if automationDetails or its value is in output".

SARIF and Signatures

A signature here is, e.g., struct_graph_LGTM in ./sarif_cli/signature_single.py

The signatures are those produced by codeql in the past. They are not meant to be updated frequently; they arose and are used as follows.

  1. The SARIF standard is quite loose, with many optional fields.
  2. For producing CSV tabular output (and for internal table processing), the sarif-cli tool needed an exact signature. Using existing SARIF files was a straightforward way to get a signature.
  3. When a SARIF file contains extra keys, a warning is issued but processing continues.
  4. When a SARIF file is missing an entry that's in the signature, a fatal error is issued.

The only time you need to update the signature is when you get fatal errors; there will be a detailed message about expected vs. found fields.
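
The warn-on-extra, fail-on-missing policy in items 3 and 4 can be illustrated with a flat set of keys. This is only a sketch of the policy: the real signatures are nested type graphs, and `check_against_signature`, `SignatureError`, and the sample `signature` are hypothetical names used for illustration.

```python
import warnings


class SignatureError(Exception):
    """Raised when an entry required by the signature is missing."""


def check_against_signature(entry, signature):
    """Warn about extra keys, fail on missing ones, and return the extra keys."""
    found, expected = set(entry), set(signature)
    extra = sorted(found - expected)
    missing = sorted(expected - found)
    if extra:
        warnings.warn(f"extra keys ignored: {extra}")
    if missing:
        raise SignatureError(
            f"expected {sorted(expected)}, found {sorted(found)}, missing {missing}"
        )
    return extra


signature = {"version", "runs"}  # flat stand-in for a real signature

# Extra key: a warning is issued, processing continues.
check_against_signature({"version": "2.1.0", "runs": [], "properties": {}}, signature)

# Missing key: fatal, with expected vs. found fields in the message.
try:
    check_against_signature({"version": "2.1.0"}, signature)
except SignatureError as error:
    print(f"fatal: {error}")
```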
