Add 'Repository Selection'
committed by
=Michael Hohn
parent
f60b55f181
commit
195dda9fd7
372
notes/cli-end-to-end-demo.org
Normal file
@@ -0,0 +1,372 @@
# -*- coding: utf-8 -*-

* End-to-end example of CLI use
This document describes a complete cycle of the MRVA workflow. The steps
included are
1. acquiring CodeQL databases
2. selection of databases
3. configuration and use of the command-line client
4. server startup
5. submission of the jobs
6. retrieval of the results
7. examination of the results

* Database Acquisition
General database acquisition is beyond the scope of this document, as it is very
specific to an organization's environment.

For this demo, the data is preloaded via a container. To inspect it:

#+BEGIN_SRC sh
# On host, run
docker exec -it dbstore /bin/bash

# In the container
ls -la /data/dbstore-data/
ls /data/dbstore-data/qldb/ | wc -l
#+END_SRC
Here we use a small sample of open-source repositories, 23 in all.

* Repository Selection
When using the full MRVA system, you select a small subset of the repositories
made available in [[*Database Acquisition][Database Acquisition]]. For this demo we include a small
collection of 23 repositories, and here we further narrow the selection to 11.

The full list:
#+BEGIN_SRC text
ls -1 /data/dbstore-data/qldb/
'BoomingTech$Piccoloctsj6d7177.zip'
'KhronosGroup$OpenXR-SDKctsj984ee6.zip'
'OpenRCT2$OpenRCT2ctsj975d7c.zip'
'StanfordLegion$legionctsj39cbe4.zip'
'USCiLab$cerealctsj264953.zip'
'WinMerge$winmergectsj101305.zip'
'draios$sysdigctsj12c02d.zip'
'gildor2$UEViewerctsjfefdd8.zip'
'git-for-windows$gitctsjb7c2bd.zip'
'google$orbitctsj9bbeaf.zip'
'libfuse$libfusectsj7a66a4.zip'
'luigirizzo$netmapctsj6417fa.zip'
'mawww$kakounectsjc54fab.zip'
'microsoft$node-native-keymapctsj4cc9a2.zip'
'nem0$LumixEnginectsjfab756.zip'
'pocoproject$pococtsj26b932.zip'
'quickfix$quickfixctsjebfd13.zip'
'rui314$moldctsjfec16a.zip'
'swig$swigctsj78bcd3.zip'
'tdlib$telegram-bot-apictsj8529d9.zip'
'timescale$timescaledbctsjf617cf.zip'
'xoreaxeaxeax$movfuscatorctsj8f7e5b.zip'
'xrootd$xrootdctsje4b745.zip'
#+END_SRC

The selection of 11 repositories from an initial collection of 6000 was made
using a set of Python/pandas scripts written for the purpose, the [[https://github.com/hohn/mrvacommander/blob/hohn-0.1.21.2-improve-structure-and-docs/client/qldbtools/README.md#installation][qldbtools]]
package. The resulting selection, in the format expected by the VS Code
extension, follows.
#+BEGIN_SRC text
cat /data/qldbtools/scratch/vscode-selection.json
{
    "version": 1,
    "databases": {
        "variantAnalysis": {
            "repositoryLists": [
                {
                    "name": "mirva-list",
                    "repositories": [
                        "xoreaxeaxeax/movfuscatorctsj8f7e5b",
                        "microsoft/node-native-keymapctsj4cc9a2",
                        "BoomingTech/Piccoloctsj6d7177",
                        "USCiLab/cerealctsj264953",
                        "KhronosGroup/OpenXR-SDKctsj984ee6",
                        "tdlib/telegram-bot-apictsj8529d9",
                        "WinMerge/winmergectsj101305",
                        "timescale/timescaledbctsjf617cf",
                        "pocoproject/pococtsj26b932",
                        "quickfix/quickfixctsjebfd13",
                        "libfuse/libfusectsj7a66a4"
                    ]
                }
            ],
            "owners": [],
            "repositories": []
        }
    },
    "selected": {
        "kind": "variantAnalysisUserDefinedList",
        "listName": "mirva-list"
    }
}
#+END_SRC

This selection is deceptively simple. For a full explanation, see
[[file:cli-end-to-end-detailed.org::*Repository Selection][Repository Selection]] in the detailed version of this document.
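
To make the relation between the two listings concrete, here is a minimal
Python sketch that maps dbstore file names of the form =owner$repo.zip= to the
=owner/repo= entries of the selection file. The function names =nwo_from_zip=
and =vscode_selection= are hypothetical and are not part of qldbtools; only the
JSON layout is taken from the example above.

#+BEGIN_SRC python
import json


def nwo_from_zip(filename):
    """Turn 'owner$repo.zip' into 'owner/repo' (hypothetical helper)."""
    stem = filename[:-len(".zip")] if filename.endswith(".zip") else filename
    owner, repo = stem.split("$", 1)
    return f"{owner}/{repo}"


def vscode_selection(filenames, list_name="mirva-list"):
    """Build a dict with the same layout as vscode-selection.json."""
    return {
        "version": 1,
        "databases": {
            "variantAnalysis": {
                "repositoryLists": [
                    {
                        "name": list_name,
                        "repositories": [nwo_from_zip(f) for f in filenames],
                    }
                ],
                "owners": [],
                "repositories": [],
            }
        },
        "selected": {
            "kind": "variantAnalysisUserDefinedList",
            "listName": list_name,
        },
    }


if __name__ == "__main__":
    names = ["libfuse$libfusectsj7a66a4.zip", "quickfix$quickfixctsjebfd13.zip"]
    print(json.dumps(vscode_selection(names), indent=4))
#+END_SRC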

** The meaning of the names
This section is optional reading for the demonstration.

The repository names all end with =ctsj= followed by 6 hex digits like =ctsj4cc9a2=.
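
The naming convention can be checked mechanically. The following sketch splits
a name into owner, repository name, and the 6-digit suffix. The regular
expression and the function name =split_db_name= are illustrative assumptions
based only on the names shown in this document.

#+BEGIN_SRC python
import re

# owner, then '$' (file name) or '/' (selection entry), then the repository
# name, the literal 'ctsj', six lowercase hex digits, and an optional '.zip'.
NAME_RE = re.compile(
    r"^(?P<owner>[^/$]+)[/$](?P<name>.+)ctsj(?P<cid>[0-9a-f]{6})(?:\.zip)?$"
)


def split_db_name(text):
    """Return (owner, name, cid) or raise ValueError if the form differs."""
    match = NAME_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected database name: {text!r}")
    return match.group("owner"), match.group("name"), match.group("cid")


if __name__ == "__main__":
    print(split_db_name("microsoft$node-native-keymapctsj4cc9a2.zip"))
    print(split_db_name("xoreaxeaxeax/movfuscatorctsj8f7e5b"))
#+END_SRC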

The information critical for selecting databases is in the columns
1. owner
2. name
3. language
4. "sha"
5. "cliVersion"
6. "creationTime"

There are others that may be useful, but they are not strictly required.

The critical ones deserve more explanation:
1. "sha": The =git= commit SHA of the repository the CodeQL database was
   created from. Required to distinguish query results over the evolution of
   a code base.
2. "cliVersion": The version of the CodeQL CLI used to create the database.
   Required to identify advances/regressions originating from the CodeQL binary.
3. "creationTime": The time the database was created. Required (or at least
   very handy) for following the evolution of query results over time.

There is a computed column, CID. The CID column combines
- cliVersion
- creationTime
- language
- sha
into a single 6-character string via hashing. Together with (owner, repo) it
provides a unique index for every DB.
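
As an illustration of the idea only, the sketch below derives a 6-character
identifier from the four fields. The choice of SHA-256, the =|= separator, the
field order, and the function name =cid_hash= are all assumptions; the hashing
actually used by qldbtools may differ, so the values produced here are not
expected to match the suffixes shown in this document. The sample field values
are made up.

#+BEGIN_SRC python
import hashlib


def cid_hash(cli_version, creation_time, language, sha):
    """Combine four fields into a 6-character hex string (assumed scheme)."""
    key = "|".join([cli_version, creation_time, language, sha])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:6]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    cid = cid_hash("2.17.0", "2024-05-13T12:04:01", "cpp", "0123456789abcdef")
    print(f"owner/repoctsj{cid}")
#+END_SRC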

For this document, we simply use a pseudo-random selection of 11 databases via
#+BEGIN_SRC sh
./bin/mc-db-generate-selection -n 11 \
  scratch/vscode-selection.json \
  scratch/gh-mrva-selection.json \
  < scratch/db-info-3.csv
#+END_SRC

Note that the selection uses pseudo-random numbers, so it is in fact
deterministic.
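
The point about determinism can be illustrated with Python's standard =random=
module: a generator created with a fixed seed yields the same sample on every
run. Whether =mc-db-generate-selection= seeds its generator this way is an
assumption; the function name =pick= and the seed value are hypothetical.

#+BEGIN_SRC python
import random


def pick(rows, n, seed=4242):
    """Return n distinct rows, chosen reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(rows, n)


if __name__ == "__main__":
    rows = [f"repo{i}" for i in range(23)]
    first = pick(rows, 11)
    second = pick(rows, 11)
    print(first == second)
#+END_SRC

Running the script twice, or calling =pick= twice with the same seed, gives the
same 11 entries; changing the seed gives a different but equally reproducible
selection.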

* Starting the server
The full instructions for building and running the server are in [[../README.md]] under
'Steps to build and run the server'.

With docker-compose set up and this repository cloned as previously described,
we just run
#+BEGIN_SRC sh
cd ~/work-gh/mrva/mrvacommander
docker-compose up --build
#+END_SRC
and wait until the log output no longer changes.

Then, use the following command to populate the mrvacommander database storage:
#+BEGIN_SRC sh
cd ~/work-gh/mrva/mrvacommander/client/qldbtools && \
  ./bin/mc-db-populate-minio -n 11 < scratch/db-info-3.csv
#+END_SRC

* Running the gh-mrva command-line client
The first run uses the test query to verify basic functionality, but it returns
no results.
** Run MRVA from command line
1. Install the MRVA CLI
#+BEGIN_SRC sh
mkdir -p ~/work-gh/mrva && cd ~/work-gh/mrva
git clone https://github.com/hohn/gh-mrva.git
cd ~/work-gh/mrva/gh-mrva && git checkout mrvacommander-end-to-end

# Build it
go mod edit -replace="github.com/GitHubSecurityLab/gh-mrva=$HOME/work-gh/mrva/gh-mrva"
go build .

# Sanity check
./gh-mrva -h
#+END_SRC

2. Set up the configuration
#+BEGIN_SRC sh
mkdir -p ~/.config/gh-mrva
cat > ~/.config/gh-mrva/config.yml <<eof
# The following options are supported
# codeql_path: Path to CodeQL distribution (checkout of codeql repo)
# controller: NWO of the MRVA controller to use. Not used here.
# list_file: Path to the JSON file containing the target repos

# XX:
codeql_path: $HOME/work-gh/not-used
controller: not-used/mirva-controller
list_file: $HOME/work-gh/mrva/gh-mrva/gh-mrva-selection.json
eof
#+END_SRC

3. Submit the mrva job
#+BEGIN_SRC sh
cp ~/work-gh/mrva/mrvacommander/client/qldbtools/scratch/gh-mrva-selection.json \
  ~/work-gh/mrva/gh-mrva/gh-mrva-selection.json

cd ~/work-gh/mrva/gh-mrva/
./gh-mrva submit --language cpp --session mirva-session-1360 \
  --list mirva-list \
  --query ~/work-gh/mrva/gh-mrva/FlatBuffersFunc.ql
#+END_SRC

4. Check the status
#+BEGIN_SRC sh
cd ~/work-gh/mrva/gh-mrva/

# Check the status
./gh-mrva status --session mirva-session-1360
#+END_SRC

5. Download the SARIF files and, optionally, the databases. For the current
   query/database combination there are zero results, hence no downloads.
#+BEGIN_SRC sh
cd ~/work-gh/mrva/gh-mrva/
# Just download the sarif files
./gh-mrva download --session mirva-session-1360 \
  --output-dir mirva-session-1360

# Download the sarif files and CodeQL dbs
./gh-mrva download --session mirva-session-1360 \
  --download-dbs \
  --output-dir mirva-session-1360
#+END_SRC

** Write a query that has some results
First, get the list of paths corresponding to the previously selected
databases.
#+BEGIN_SRC sh
cd ~/work-gh/mrva/mrvacommander/client/qldbtools
./bin/mc-rows-from-mrva-list scratch/gh-mrva-selection.json \
  scratch/db-info-3.csv > scratch/selection-full-info
csvcut -c path scratch/selection-full-info
#+END_SRC
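
If =csvcut= is not at hand, the same column can be extracted with Python's
standard =csv= module. This is a minimal sketch: the function name =column= is
hypothetical, and apart from =path= the column names and values in the sample
data are made up for illustration.

#+BEGIN_SRC python
import csv
import io


def column(csv_text, name):
    """Return the values of one named column from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[name] for row in reader]


if __name__ == "__main__":
    # Made-up sample; the real file is scratch/selection-full-info.
    sample = (
        "owner,name,path\n"
        "libfuse,libfuse,/some/dir/libfuse.zip\n"
        "quickfix,quickfix,/some/dir/quickfix.zip\n"
    )
    for path in column(sample, "path"):
        print(path)
#+END_SRC

For the real file, read its contents first, for example with
=open("scratch/selection-full-info", newline="").read()=.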

Use one of these databases to write a query. It need not produce results.
#+BEGIN_SRC sh
cd ~/work-gh/mrva/gh-mrva/
code gh-mrva.code-workspace
#+END_SRC
In this case, the trivial =findPrintf=:
#+BEGIN_SRC java
/**
,* @name findPrintf
,* @description find calls to plain fprintf
,* @kind problem
,* @id cpp-fprintf-call
,* @problem.severity warning
,*/

import cpp

from FunctionCall fc
where
  fc.getTarget().getName() = "fprintf"
select fc, "call of fprintf"
#+END_SRC

Repeat the submit steps with this query:
1. --
2. --
3. Submit the mrva job
#+BEGIN_SRC sh
cp ~/work-gh/mrva/mrvacommander/client/qldbtools/scratch/gh-mrva-selection.json \
  ~/work-gh/mrva/gh-mrva/gh-mrva-selection.json

cd ~/work-gh/mrva/gh-mrva/
./gh-mrva submit --language cpp --session mirva-session-1480 \
  --list mirva-list \
  --query ~/work-gh/mrva/gh-mrva/Fprintf.ql
#+END_SRC
4. Check the status
#+BEGIN_SRC sh
cd ~/work-gh/mrva/gh-mrva/
./gh-mrva status --session mirva-session-1480
#+END_SRC

This time we have results:
#+BEGIN_SRC text
...
Run name: mirva-session-1480
Status: succeeded
Total runs: 1
Total successful scans: 11
Total failed scans: 0
Total skipped repositories: 0
Total skipped repositories due to access mismatch: 0
Total skipped repositories due to not found: 0
Total skipped repositories due to no database: 0
Total skipped repositories due to over limit: 0
Total repositories with findings: 7
Total findings: 618
Repositories with findings:
quickfix/quickfixctsjebfd13 (cpp-fprintf-call): 5
libfuse/libfusectsj7a66a4 (cpp-fprintf-call): 146
xoreaxeaxeax/movfuscatorctsj8f7e5b (cpp-fprintf-call): 80
pocoproject/pococtsj26b932 (cpp-fprintf-call): 17
BoomingTech/Piccoloctsj6d7177 (cpp-fprintf-call): 10
tdlib/telegram-bot-apictsj8529d9 (cpp-fprintf-call): 247
WinMerge/winmergectsj101305 (cpp-fprintf-call): 113
#+END_SRC
5. Download the SARIF files and, optionally, the databases.
#+BEGIN_SRC sh
cd ~/work-gh/mrva/gh-mrva/
# Just download the sarif files
./gh-mrva download --session mirva-session-1480 \
  --output-dir mirva-session-1480

# Download the sarif files and CodeQL dbs
./gh-mrva download --session mirva-session-1480 \
  --download-dbs \
  --output-dir mirva-session-1480

# And list them:
\ls -la *1480*
-rwxr-xr-x@ 1 hohn staff 1915857 Aug 16 14:10 BoomingTech_Piccoloctsj6d7177_1.sarif
drwxr-xr-x@ 3 hohn staff 96 Aug 16 14:15 BoomingTech_Piccoloctsj6d7177_1_db
-rwxr-xr-x@ 1 hohn staff 89857056 Aug 16 14:11 BoomingTech_Piccoloctsj6d7177_1_db.zip
-rwxr-xr-x@ 1 hohn staff 3105663 Aug 16 14:10 WinMerge_winmergectsj101305_1.sarif
-rwxr-xr-x@ 1 hohn staff 227812131 Aug 16 14:12 WinMerge_winmergectsj101305_1_db.zip
-rwxr-xr-x@ 1 hohn staff 193976 Aug 16 14:10 libfuse_libfusectsj7a66a4_1.sarif
-rwxr-xr-x@ 1 hohn staff 12930693 Aug 16 14:10 libfuse_libfusectsj7a66a4_1_db.zip
-rwxr-xr-x@ 1 hohn staff 1240694 Aug 16 14:10 pocoproject_pococtsj26b932_1.sarif
-rwxr-xr-x@ 1 hohn staff 158924920 Aug 16 14:12 pocoproject_pococtsj26b932_1_db.zip
-rwxr-xr-x@ 1 hohn staff 888494 Aug 16 14:10 quickfix_quickfixctsjebfd13_1.sarif
-rwxr-xr-x@ 1 hohn staff 75023303 Aug 16 14:11 quickfix_quickfixctsjebfd13_1_db.zip
-rwxr-xr-x@ 1 hohn staff 1487363 Aug 16 14:10 tdlib_telegram-bot-apictsj8529d9_1.sarif
-rwxr-xr-x@ 1 hohn staff 373477635 Aug 16 14:14 tdlib_telegram-bot-apictsj8529d9_1_db.zip
-rwxr-xr-x@ 1 hohn staff 103657 Aug 16 14:10 xoreaxeaxeax_movfuscatorctsj8f7e5b_1.sarif
-rwxr-xr-x@ 1 hohn staff 9464225 Aug 16 14:10 xoreaxeaxeax_movfuscatorctsj8f7e5b_1_db.zip
#+END_SRC

6. Use the [[https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer][SARIF Viewer]] plugin in VS Code to open and review the results.

Prepare the source directory so the viewer can be pointed at it:
#+BEGIN_SRC sh
cd ~/work-gh/mrva/gh-mrva/mirva-session-1480

unzip -qd BoomingTech_Piccoloctsj6d7177_1_db BoomingTech_Piccoloctsj6d7177_1_db.zip

cd BoomingTech_Piccoloctsj6d7177_1_db/codeql_db/
unzip -qd src src.zip
#+END_SRC

Use the viewer:
#+BEGIN_SRC sh
code BoomingTech_Piccoloctsj6d7177_1.sarif

# For lauxlib.c, point the source viewer to
find ~/work-gh/mrva/gh-mrva/mirva-session-1480/BoomingTech_Piccoloctsj6d7177_1_db/codeql_db/src/home/runner/work/bulk-builder/bulk-builder -name lauxlib.c

# Here: ~/work-gh/mrva/gh-mrva/mirva-session-1480/BoomingTech_Piccoloctsj6d7177_1_db/codeql_db/src/home/runner/work/bulk-builder/bulk-builder/engine/3rdparty/lua-5.4.4/lauxlib.c
#+END_SRC

7. (optional) Large result sets are more easily filtered via
   dataframes or spreadsheets. Convert the SARIF to CSV if needed; see [[https://github.com/hohn/sarif-cli/][sarif-cli]].
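
For a quick look without extra tooling, the following sketch flattens SARIF
results into CSV rows using only the Python standard library. It is not how
sarif-cli works; it only reads a few common SARIF 2.1.0 fields (=ruleId=,
=message.text=, and the first location's =artifactLocation.uri= and
=region.startLine=). The function names and the inline sample, including its
line number, are made up for illustration.

#+BEGIN_SRC python
import csv
import io

FIELDS = ["rule", "message", "file", "line"]


def sarif_rows(sarif):
    """Flatten the results of every run in a SARIF document into dicts."""
    rows = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            phys = locations[0].get("physicalLocation", {})
            rows.append({
                "rule": result.get("ruleId", ""),
                "message": result.get("message", {}).get("text", ""),
                "file": phys.get("artifactLocation", {}).get("uri", ""),
                "line": phys.get("region", {}).get("startLine", ""),
            })
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up, minimal SARIF document; real files are read with json.load().
    sample = {"runs": [{"results": [{
        "ruleId": "cpp-fprintf-call",
        "message": {"text": "call of fprintf"},
        "locations": [{"physicalLocation": {
            "artifactLocation": {"uri": "engine/3rdparty/lua-5.4.4/lauxlib.c"},
            "region": {"startLine": 10},
        }}],
    }]}]}
    print(to_csv(sarif_rows(sample)), end="")
#+END_SRC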

* Footnotes
[fn:1] The =csvkit= package can be installed into the same Python virtual
environment as =qldbtools=.