The notes directory
This directory is for notes that may be useful, but aren't complete enough to serve as documentation in their current state.
Think of it as staging for ../docs.
Short notes start as sections in this README. They will be moved to separate files if that makes more sense.
The typegraphs
The type graph files are derived from a SARIF input file, with various options controlling the output.
To produce dot maps of a sarif file type graph, from raw (largest) to fully filled (most compact):
cd ../data/treeio/2022-02-25
# Everything:
../../../bin/sarif-to-dot -t -d results.sarif | dot -Tpdf > typegraph-td.pdf
# Suppress edges to int/bool/string types in dot graph
../../../bin/sarif-to-dot -td -n results.sarif | dot -Tpdf > typegraph-tdn.pdf
# Additionally, only report unique array entry signatures
../../../bin/sarif-to-dot -td -nu results.sarif | dot -Tpdf > typegraph-tdnu.pdf
# Additionally, fill in missing (optional) entries in sarif input before other steps.
../../../bin/sarif-to-dot -td -nuf results.sarif | dot -Tpdf > typegraph-tdnuf.pdf
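As a rough illustration of why "unique array entry signatures" makes the graph more compact, the sketch below derives a structural signature from a JSON value and keeps only the distinct signatures found in each array. It is an assumption-laden toy, not the sarif-to-dot implementation; the function name signature and the sample document are made up for this note.

```python
import json

def signature(value):
    """Return a hashable structural type signature for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Keep only distinct entry signatures, in first-seen order.
        unique = []
        for entry in value:
            sig = signature(entry)
            if sig not in unique:
                unique.append(sig)
        return ("array", tuple(unique))
    if isinstance(value, dict):
        fields = sorted(value.items(), key=lambda kv: kv[0])
        return ("struct", tuple((key, signature(val)) for key, val in fields))
    raise TypeError(f"not a JSON value: {value!r}")

# Made-up input: three results, but only two distinct shapes.
doc = json.loads("""
{"results": [{"ruleId": "a", "rank": 1},
             {"ruleId": "b", "rank": 2},
             {"ruleId": "c"}]}
""")
results_sig = dict(signature(doc)[1])["results"]
print(len(doc["results"]), len(results_sig[1]))  # 3 2
```

Three array entries collapse to two signatures here, which is the kind of reduction the -u option is meant to give on real input.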
Debugging the absence of automationDetails.id
The automationDetails.id entry is produced by CodeQL when using the
--sarif-category flag.
The prerequisites for tracing its flow through the tools are started in ../data/build-multiple-sarifs.sh.
For testing, the following is injected into sqlidb-1.1.sarif.
: '
"automationDetails" : {
"id" : "mast-issue/"
},
'
Add repl as appropriate, then examine.
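The injection can also be scripted instead of edited by hand. The sketch below assumes the entry belongs in each run object, which is where SARIF 2.1.0 places automationDetails; the helper names inject_automation_id and automation_ids, and the tiny sample document, are hypothetical.

```python
import copy

def inject_automation_id(sarif, automation_id):
    """Return a copy of a SARIF document with automationDetails.id set on every run."""
    patched = copy.deepcopy(sarif)
    for run in patched.get("runs", []):
        run.setdefault("automationDetails", {})["id"] = automation_id
    return patched

def automation_ids(sarif):
    """List automationDetails.id for each run; None where it is absent."""
    return [run.get("automationDetails", {}).get("id")
            for run in sarif.get("runs", [])]

# Made-up minimal document; a real file would come from json.load().
sample = {"version": "2.1.0",
          "runs": [{"tool": {"driver": {"name": "CodeQL"}}, "results": []}]}
patched = inject_automation_id(sample, "mast-issue/")

print(automation_ids(sample))   # [None]
print(automation_ids(patched))  # ['mast-issue/']
```

Working on a deep copy keeps the original document untouched, so the same input can be reused for a run without the category.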
Make sure the input is correct
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
grep -A2 automationDetails sqlidb-1.1.sarif
"automationDetails" : {
"id" : "mast-issue/"
},
Create the CSV
source ~/local/sarif-cli/.venv/bin/activate
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
sarif-extract-scans-runner --input-signature CLI - > /dev/null <<EOF
sqlidb-1.1.sarif
EOF
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
ls -la sqlidb-1.1*
find sqlidb-1.1.sarif.scantables -print
-rw-r--r-- 1 hohn staff 8.2K Jul 11 19:25 sqlidb-1.1.sarif
-rw-r--r-- 1 hohn staff 326 Jul 12 16:39 sqlidb-1.1.sarif.csv
-rw-r--r-- 1 hohn staff 72 Jul 12 16:39 sqlidb-1.1.sarif.scanspec
sqlidb-1.1.sarif.scantables:
total 16K
drwxr-xr-x 6 hohn staff 192 Jul 12 16:39 ./
drwxr-xr-x 43 hohn staff 1.4K Jul 12 16:39 ../
-rw-r--r-- 1 hohn staff 622 Jul 12 16:39 codeflows.csv
-rw-r--r-- 1 hohn staff 165 Jul 12 16:39 projects.csv
-rw-r--r-- 1 hohn staff 589 Jul 12 16:39 results.csv
-rw-r--r-- 1 hohn staff 343 Jul 12 16:39 scans.csv
sqlidb-1.1.sarif.scantables
sqlidb-1.1.sarif.scantables/codeflows.csv
sqlidb-1.1.sarif.scantables/scans.csv
sqlidb-1.1.sarif.scantables/results.csv
sqlidb-1.1.sarif.scantables/projects.csv
Check if automationDetails or its value is in output
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
ag automationDetails | cat
projects.csv:1:"id","project_name","creation_date","repo_url","primary_language","languages_analyzed","automationDetails"
See if the magic value is present
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection/sqlidb-1.1.sarif.scantables
ag mast-issue |cat
projects.csv:2:490227419655596076,"vcp-no-uri","1970-01-01","vcp-no-uri","unknown","unknown","mast-issue/"
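The two ag checks above can be repeated from Python when ag is not installed. This is a minimal sketch using only the standard library; find_in_tables is a hypothetical helper, and the demo tables are small made-up stand-ins for the real scantables directory.

```python
import csv
import tempfile
from pathlib import Path

def find_in_tables(table_dir, needle):
    """Return (file name, row number) for each CSV row with needle in any cell."""
    hits = []
    for path in sorted(Path(table_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            # Row numbers match line numbers as long as no cell spans lines.
            for row_number, row in enumerate(csv.reader(handle), start=1):
                if any(needle in cell for cell in row):
                    hits.append((path.name, row_number))
    return hits

# Demo on a throwaway directory holding two small, made-up tables.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "projects.csv").write_text(
        '"id","project_name","automationDetails"\n'
        '1,"vcp-no-uri","mast-issue/"\n'
    )
    Path(tmp, "scans.csv").write_text('"id","scan_status"\n1,"ok"\n')
    header_hits = find_in_tables(tmp, "automationDetails")
    value_hits = find_in_tables(tmp, "mast-issue")

print(header_hits)  # [('projects.csv', 1)]
print(value_hits)   # [('projects.csv', 2)]
```

An empty list from both calls corresponds to the "nothing in the output" case.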
Nothing is in the output, so trace execution to see where it's dropped
cd ~/local/sarif-cli/notes && ag -l automationDetails ../sarif_cli |cat
../sarif_cli/scan_tables.py
../sarif_cli/signature_single_CLI.py
../sarif_cli/table_joins_CLI.py
../sarif_cli/signature.py
Trace the call chain
Trace the call chain to one of
../sarif_cli/scan_tables.py
../sarif_cli/table_joins_CLI.py
../sarif_cli/signature.py
Entry is
sarif-extract-scans-runner --input-signature CLI - > /dev/null <<EOF
sqlidb-1.1.sarif
EOF
- sarif-extract-scans-runner
- The following will drop into the inserted repls:
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
sarif-extract-scans \
sqlidb-1.sarif.scanspec \
sqlidb-1.sarif.scantables \
sqlidb-1.sarif.csv \
-f CLI
Run using embedded repls
The following will drop into the inserted repls:
cd ~/local/sarif-cli/data/codeql-dataflow-sql-injection
sarif-extract-scans \
sqlidb-1.1.sarif.scanspec \
sqlidb-1.1.sarif.scantables \
sqlidb-1.1.sarif.csv \
-f CLI
The line
.rename(columns={"id": "automationDetails"})
has the right effect:
In [3]: project_df_temp1.T
Out[3]:
0
struct_id_5521 4796854592
$schema https://json.schemastore.org/sarif-2.1.0.json
version_5521 2.1.0
value_index_1273 0
artifacts 4797197888
columnKind utf16CodeUnits
newlineSequences 4797197568
properties 4797244480
results 4797198208
tool 4797244672
versionControlProvenance 4797218944
automationDetails mast-issue/
The line
extra = b.project.automationDetails[0]
also works:
In [1]: extra
Out[1]: 'mast-issue/'
but
extra
is only used in
e.project_id = hash.hash_unique((repoUri+extra).encode())
when
In [5]: "repositoryUri" in b.project
Out[5]: True
For reference:
In [8]: b.project.automationDetails
Out[8]:
0 mast-issue/
Name: automationDetails, dtype: object
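This explains why the value never shows up as text: it only feeds the project id hash. The sketch below illustrates that with a stand-in hash. hash_unique_stand_in is hypothetical and built on hashlib.blake2b; it is not sarif_cli's hash.hash_unique, and the repository URI is a made-up example.

```python
import hashlib

def hash_unique_stand_in(data: bytes) -> int:
    """Stand-in for hash.hash_unique: map bytes to a stable 64-bit integer."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

repo_uri = "https://example.com/org/repo"  # made-up repositoryUri
extra = "mast-issue/"                      # the injected automationDetails.id

id_without = hash_unique_stand_in(repo_uri.encode())
id_with = hash_unique_stand_in((repo_uri + extra).encode())

# The category changes the project id, but only as an opaque integer:
# the string "mast-issue/" itself cannot be found in the resulting tables.
print(id_without != id_with)        # True
print("mast-issue" in str(id_with)) # False
```

So a text search of the output for the category cannot succeed unless the value is also written to a column of its own.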
This is in joins_for_projects, called from
scantabs.projects = st.joins_for_projects(bt, external_info)
Add
"automationDetails" : extra,
to the
# Projects table
And repeat the "Check if automationDetails or its value is in output" step.
Still missing, so it must be dropped between dataframe creation and output.
Use project_name to search.
class ScanTablesTypes:
has no entry for
automationDetails
Add
"automationDetails" : pd.StringDtype(),
Similar for
File: sarif_cli/columns.py
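The symptom is consistent with output columns being selected from a column/type table, so that a column present in the data but absent from the table is silently dropped. The sketch below shows that failure mode with the standard csv module only. It is an assumption about the mechanism, not sarif-cli's code; write_table and both type tables are hypothetical.

```python
import csv
import io

def write_table(rows, column_types):
    """Write rows as CSV, keeping only the columns listed in column_types."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_types),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

rows = [{"project_name": "vcp-no-uri", "automationDetails": "mast-issue/"}]

types_before = {"project_name": str}                           # no entry yet
types_after = {"project_name": str, "automationDetails": str}  # entry added

before = write_table(rows, types_before)
after = write_table(rows, types_after)

print(before)  # header and row contain project_name only
print(after)   # header and row also contain automationDetails
```

The data row is identical in both calls; only the type table decides whether the column reaches the file.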
And repeat "Run using embedded repls", then "Check if automationDetails or its value is in output".
SARIF and Signatures
A "signature" here is a structure such as struct_graph_LGTM in ./sarif_cli/signature_single.py.
The signatures are those produced by CodeQL in the past. They are not meant to be updated frequently; they arose and are used as follows.
- The SARIF standard is quite loose, with many optional fields.
- For producing CSV tabular output (and for internal table processing), the sarif-cli tool needed an exact signature. Using existing SARIF files was a straightforward way to get a signature.
- When a SARIF file contains extra keys, a warning is issued but processing continues.
- When a SARIF file is missing an entry that's in the signature, a fatal error is issued.
The only time you need to update the signature is when you get fatal errors; the error comes with a detailed message about expected vs. found fields.
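The warning and fatal-error rules above can be sketched as a small check. This is a simplified illustration that compares top-level keys only, whereas the real signatures are nested; check_against_signature, SignatureError and the field names are hypothetical and not part of sarif-cli.

```python
import warnings

class SignatureError(Exception):
    """Raised when the input lacks a field that the signature requires."""

def check_against_signature(entry, expected_fields):
    """Warn about extra keys, fail on missing ones; return the extra keys."""
    found = set(entry)
    expected = set(expected_fields)
    extra = sorted(found - expected)
    missing = sorted(expected - found)
    if extra:
        warnings.warn(f"ignoring fields not in signature: {extra}")
    if missing:
        raise SignatureError(
            f"expected {sorted(expected)}, found {sorted(found)}, missing {missing}")
    return extra

expected = ["ruleId", "message", "locations"]

# Extra key: a warning is recorded and processing continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    extras = check_against_signature(
        {"ruleId": "r", "message": "m", "locations": [], "rank": 1}, expected)

# Missing keys: a fatal error naming expected vs. found fields.
try:
    check_against_signature({"ruleId": "r"}, expected)
    failed = False
except SignatureError as err:
    failed = True
    print(err)
```

Only the second case would call for a signature update.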