- End-to-end example of CLI use
- Database Acquisition
- Repository Selection
- Starting the server
- Running the gh-mrva command-line client
- Footnotes
End-to-end example of CLI use
This document describes a complete cycle of the MRVA workflow. The steps included are
- acquiring CodeQL databases
- selection of databases
- configuration and use of the command-line client
- server startup
- submission of the jobs
- retrieval of the results
- examination of the results
Database Acquisition
General database acquisition is beyond the scope of this document, as it is very specific to an organization's environment.
For this demo, the data is preloaded via container. To inspect it:
# On host, run
docker exec -it dbstore /bin/bash
# In the container
ls -la /data/dbstore-data/
ls /data/dbstore-data/qldb/ | wc -l
Here we use a small sample of open-source repositories, 23 in all.
Repository Selection
When using the full MRVA system, you select a small subset of the repositories made available in Database Acquisition. For this demo we include a small collection of 23 repositories, and here we further narrow the selection to 11.
The full list
ls -1 /data/dbstore-data/qldb/
'BoomingTech$Piccoloctsj6d7177.zip'
'KhronosGroup$OpenXR-SDKctsj984ee6.zip'
'OpenRCT2$OpenRCT2ctsj975d7c.zip'
'StanfordLegion$legionctsj39cbe4.zip'
'USCiLab$cerealctsj264953.zip'
'WinMerge$winmergectsj101305.zip'
'draios$sysdigctsj12c02d.zip'
'gildor2$UEViewerctsjfefdd8.zip'
'git-for-windows$gitctsjb7c2bd.zip'
'google$orbitctsj9bbeaf.zip'
'libfuse$libfusectsj7a66a4.zip'
'luigirizzo$netmapctsj6417fa.zip'
'mawww$kakounectsjc54fab.zip'
'microsoft$node-native-keymapctsj4cc9a2.zip'
'nem0$LumixEnginectsjfab756.zip'
'pocoproject$pococtsj26b932.zip'
'quickfix$quickfixctsjebfd13.zip'
'rui314$moldctsjfec16a.zip'
'swig$swigctsj78bcd3.zip'
'tdlib$telegram-bot-apictsj8529d9.zip'
'timescale$timescaledbctsjf617cf.zip'
'xoreaxeaxeax$movfuscatorctsj8f7e5b.zip'
'xrootd$xrootdctsje4b745.zip'
The selection of 11 repositories from an initial collection of 6000 was made using the qldbtools package, a collection of Python/pandas scripts written for this purpose. The resulting selection, in the format expected by the VS Code extension, follows.
cat /data/qldbtools/scratch/vscode-selection.json
{
"version": 1,
"databases": {
"variantAnalysis": {
"repositoryLists": [
{
"name": "mirva-list",
"repositories": [
"xoreaxeaxeax/movfuscatorctsj8f7e5b",
"microsoft/node-native-keymapctsj4cc9a2",
"BoomingTech/Piccoloctsj6d7177",
"USCiLab/cerealctsj264953",
"KhronosGroup/OpenXR-SDKctsj984ee6",
"tdlib/telegram-bot-apictsj8529d9",
"WinMerge/winmergectsj101305",
"timescale/timescaledbctsjf617cf",
"pocoproject/pococtsj26b932",
"quickfix/quickfixctsjebfd13",
"libfuse/libfusectsj7a66a4"
]
}
],
"owners": [],
"repositories": []
}
},
"selected": {
"kind": "variantAnalysisUserDefinedList",
"listName": "mirva-list"
}
}
This selection is deceptively simple. For a full explanation, see Repository Selection in the detailed version of this document.
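To illustrate how this file is organized, the following Python sketch reads a selection in the format shown above and returns the repositories of the currently selected list. The helper name `selected_repositories` and the shortened sample are illustrative assumptions; they are not part of qldbtools or the VS Code extension.

```python
import json

# Shortened sample in the same shape as vscode-selection.json above.
SAMPLE = """
{
  "version": 1,
  "databases": {
    "variantAnalysis": {
      "repositoryLists": [
        {
          "name": "mirva-list",
          "repositories": [
            "xoreaxeaxeax/movfuscatorctsj8f7e5b",
            "libfuse/libfusectsj7a66a4"
          ]
        }
      ],
      "owners": [],
      "repositories": []
    }
  },
  "selected": {
    "kind": "variantAnalysisUserDefinedList",
    "listName": "mirva-list"
  }
}
"""


def selected_repositories(selection: dict) -> list:
    """Return the repositories of the list named in selection["selected"]."""
    wanted = selection["selected"]["listName"]
    lists = selection["databases"]["variantAnalysis"]["repositoryLists"]
    for repo_list in lists:
        if repo_list["name"] == wanted:
            return list(repo_list["repositories"])
    raise KeyError(f"no repository list named {wanted!r}")


repos = selected_repositories(json.loads(SAMPLE))
print(len(repos), repos[0])
```

Pointing the same function at the real file (via `json.load`) should yield the 11 entries of mirva-list.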
Optional: The meaning of the names
The repository names all end with ctsj followed by 6 hex digits, like ctsj4cc9a2.
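The naming pattern can be taken apart mechanically. The sketch below assumes the file layout seen in the listing earlier (owner, a `$` separator, the repository name, `ctsj`, six hex digits, `.zip`); the regular expression and the function names are illustrative and not taken from qldbtools.

```python
import re

# Assumed layout: owner + "$" + name + "ctsj" + 6 hex digits + ".zip"
DB_FILE = re.compile(
    r"^(?P<owner>[^$]+)\$(?P<name>.+)ctsj(?P<cid>[0-9a-f]{6})\.zip$"
)


def split_db_filename(filename: str) -> dict:
    """Split a database file name into owner, name, and the 6-digit suffix."""
    match = DB_FILE.match(filename)
    if match is None:
        raise ValueError(f"unexpected database file name: {filename!r}")
    return match.groupdict()


def selection_name(filename: str) -> str:
    """Build the owner/name form used in the selection JSON."""
    parts = split_db_filename(filename)
    return f"{parts['owner']}/{parts['name']}ctsj{parts['cid']}"


parts = split_db_filename("microsoft$node-native-keymapctsj4cc9a2.zip")
print(parts)
print(selection_name("microsoft$node-native-keymapctsj4cc9a2.zip"))
```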
The information critical for selecting databases is in the columns
- owner
- name
- language
- "sha"
- "cliVersion"
- "creationTime"
There are others that may be useful, but they are not strictly required.
The critical ones deserve more explanation:
- "sha": The git commit SHA of the repository the CodeQL database was created from. Required to distinguish query results over the evolution of a code base.
- "cliVersion": The version of the CodeQL CLI used to create the database. Required to identify advances/regressions originating from the CodeQL binary.
- "creationTime": The time the database was created. Required (or at least very handy) for following the evolution of query results over time.
There is a computed column, CID. The CID column combines
- cliVersion
- creationTime
- language
- sha
into a single 6-character string via hashing. Together with (owner, repo) it provides a unique index for every DB.
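As a rough illustration of such a computed column, the sketch below hashes the four fields and keeps the first six hex digits. The hash function (SHA-256), the field order, the separator, and the sample values are all assumptions made for this example; the actual qldbtools implementation may differ.

```python
import hashlib


def compute_cid(cli_version: str, creation_time: str,
                language: str, sha: str) -> str:
    """Combine the four fields into a 6-character hex string.

    Assumption: SHA-256 over the space-joined fields, truncated to 6 hex
    digits. This only illustrates the idea of a hashed composite key.
    """
    key = " ".join([cli_version, creation_time, language, sha])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:6]


# Hypothetical metadata values, for illustration only.
cid = compute_cid("2.17.0", "2024-05-01T12:00:00", "cpp", "0123abcd")
print(f"ctsj{cid}")
```

Because the CID is derived only from the metadata, recomputing it for the same database always gives the same suffix.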
For this document, we simply use a pseudo-random selection of 11 databases via
./bin/mc-db-generate-selection -n 11 \
scratch/vscode-selection.json \
scratch/gh-mrva-selection.json \
< scratch/db-info-3.csv
Note that the selection uses pseudo-random numbers, so it is in fact deterministic.
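The determinism can be seen in a small Python sketch. It assumes a fixed seed; whether and how mc-db-generate-selection seeds its generator is not stated here, and the seed value, function name, and repository names below are hypothetical.

```python
import random


def pick_databases(names, n, seed=4242):
    """Pick n names pseudo-randomly; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)          # private generator, seeded explicitly
    return rng.sample(sorted(names), n)


# 23 hypothetical repository names standing in for the demo collection.
all_names = [f"owner{i}/repo{i}" for i in range(23)]

first = pick_databases(all_names, 11)
second = pick_databases(all_names, 11)
print(first == second)
```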
Starting the server
Clone the full repository before continuing:
mkdir -p ~/work-gh/mrva/
cd ~/work-gh/mrva/
git clone git@github.com:hohn/mrvacommander.git
Make sure Docker is installed and running. With docker-compose set up and this repository cloned, we just run
cd ~/work-gh/mrva/mrvacommander
docker-compose -f docker-compose-demo.yml up -d
and wait until the log output no longer changes. The output should look like
docker-compose -f docker-compose-demo.yml up -d
[+] Running 27/6
✔ dbstore Pulled 1.1s
✔ artifactstore Pulled 1.1s
✔ mrvadata 3 layers [⣿⣿⣿] 0B/0B Pulled 263.8s
✔ server 2 layers [⣿⣿] 0B/0B Pulled 25.2s
✔ agent 5 layers [⣿⣿⣿⣿⣿] 0B/0B Pulled 24.9s
✔ client-qldbtools 11 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿] 0B/0B Pulled 20.8s
[+] Running 9/9
✔ Container mrvadata Started 0.3s
✔ Container mrvacommander-client-qldbtools-1 Started 0.3s
✔ Container mrvacommander-client-ghmrva-1 Running 0.0s
✔ Container mrvacommander-code-server-1 Running 0.0s
✔ Container artifactstore Running 0.0s
✔ Container rabbitmq Running 0.0s
✔ Container dbstore Started 0.4s
✔ Container agent Started 0.5s
✔ Container server Started 0.5s
The content is prepopulated in the dbstore container.
Optional: Inspect the Backing Store
As a completely optional step, you can inspect the backing store:
docker exec -it dbstore /bin/bash
ls /data/qldb/
# 'BoomingTech$Piccoloctsj6d7177.zip' 'mawww$kakounectsjc54fab.zip'
# 'KhronosGroup$OpenXR-SDKctsj984ee6.zip' 'microsoft$node-native-keymapctsj4cc9a2.zip'
# ...
Optional: Inspect the MinIO DB
As another completely optional step, you can inspect the MinIO DB contents if you have the MinIO client (mc) installed:
# Configuration
MINIO_ALIAS="qldbminio"
MINIO_URL="http://localhost:9000"
MINIO_ROOT_USER="user"
MINIO_ROOT_PASSWORD="mmusty8432"
QL_DB_BUCKET_NAME="qldb"
# Check for MinIO client
if ! command -v mc &> /dev/null
then
echo "MinIO client (mc) not found."
fi
# Configure MinIO client
mc alias set "$MINIO_ALIAS" "$MINIO_URL" "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
# Show contents
mc ls "$MINIO_ALIAS/$QL_DB_BUCKET_NAME"
Running the gh-mrva command-line client
The first run uses the test query to verify basic functionality, but it returns no results.
XX: mrvacommander-client-ghmrva-1
Run MRVA from command line
- Install mrva cli
mkdir -p ~/work-gh/mrva && cd ~/work-gh/mrva
git clone https://github.com/hohn/gh-mrva.git
cd ~/work-gh/mrva/gh-mrva && git checkout mrvacommander-end-to-end
# Build it
go mod edit -replace="github.com/GitHubSecurityLab/gh-mrva=$HOME/work-gh/mrva/gh-mrva"
go build .
# Sanity check
./gh-mrva -h
- Set up the configuration
mkdir -p ~/.config/gh-mrva
cat > ~/.config/gh-mrva/config.yml <<eof
# The following options are supported
# codeql_path: Path to CodeQL distribution (checkout of codeql repo)
# controller: NWO of the MRVA controller to use. Not used here.
# list_file: Path to the JSON file containing the target repos
# XX:
codeql_path: $HOME/work-gh/not-used
controller: not-used/mirva-controller
list_file: $HOME/work-gh/mrva/gh-mrva/gh-mrva-selection.json
eof
- Submit the mrva job
cp ~/work-gh/mrva/mrvacommander/client/qldbtools/scratch/gh-mrva-selection.json \
   ~/work-gh/mrva/gh-mrva/gh-mrva-selection.json
cd ~/work-gh/mrva/gh-mrva/
./gh-mrva submit --language cpp --session mirva-session-1360 \
    --list mirva-list \
    --query ~/work-gh/mrva/gh-mrva/FlatBuffersFunc.ql
- Check the status
cd ~/work-gh/mrva/gh-mrva/
# Check the status
./gh-mrva status --session mirva-session-1360
- Download the sarif files, optionally also get databases. For the current query / database combination there are zero results, hence no downloads.
cd ~/work-gh/mrva/gh-mrva/
# Just download the sarif files
./gh-mrva download --session mirva-session-1360 \
    --output-dir mirva-session-1360
# Download the sarif files and CodeQL dbs
./gh-mrva download --session mirva-session-1360 \
    --download-dbs \
    --output-dir mirva-session-1360
Write a query that has some results
First, get the list of paths corresponding to the previously selected databases.
cd ~/work-gh/mrva/mrvacommander/client/qldbtools
./bin/mc-rows-from-mrva-list scratch/gh-mrva-selection.json \
scratch/db-info-3.csv > scratch/selection-full-info
csvcut -c path scratch/selection-full-info
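If csvkit is not at hand, the same column can be pulled out with Python's standard csv module. The sketch below assumes only that the file has a header row containing a `path` column, as the csvcut call implies; the other columns and all sample values are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for scratch/selection-full-info.
SAMPLE_CSV = """owner,name,language,path
libfuse,libfuse,cpp,/hypothetical/dbs/libfuse.zip
quickfix,quickfix,cpp,/hypothetical/dbs/quickfix.zip
"""


def column(csv_text: str, name: str) -> list:
    """Return the values of one named column, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[name] for row in reader]


paths = column(SAMPLE_CSV, "path")
for path in paths:
    print(path)
```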
Use one of these databases to write a query. It need not produce results.
cd ~/work-gh/mrva/gh-mrva/
code gh-mrva.code-workspace
In this case, the trivial findPrintf:
/**
 * @name findPrintf
 * @description find calls to plain fprintf
 * @kind problem
 * @id cpp-fprintf-call
 * @problem.severity warning
 */
import cpp
from FunctionCall fc
where
fc.getTarget().getName() = "fprintf"
select fc, "call of fprintf"
Repeat the submit steps with this query
- Submit the mrva job
cp ~/work-gh/mrva/mrvacommander/client/qldbtools/scratch/gh-mrva-selection.json \
   ~/work-gh/mrva/gh-mrva/gh-mrva-selection.json
cd ~/work-gh/mrva/gh-mrva/
./gh-mrva submit --language cpp --session mirva-session-1480 \
    --list mirva-list \
    --query ~/work-gh/mrva/gh-mrva/Fprintf.ql
- Check the status
cd ~/work-gh/mrva/gh-mrva/
./gh-mrva status --session mirva-session-1480
This time we have results
...
Run name: mirva-session-1480
Status: succeeded
Total runs: 1
Total successful scans: 11
Total failed scans: 0
Total skipped repositories: 0
Total skipped repositories due to access mismatch: 0
Total skipped repositories due to not found: 0
Total skipped repositories due to no database: 0
Total skipped repositories due to over limit: 0
Total repositories with findings: 7
Total findings: 618
Repositories with findings:
quickfix/quickfixctsjebfd13 (cpp-fprintf-call): 5
libfuse/libfusectsj7a66a4 (cpp-fprintf-call): 146
xoreaxeaxeax/movfuscatorctsj8f7e5b (cpp-fprintf-call): 80
pocoproject/pococtsj26b932 (cpp-fprintf-call): 17
BoomingTech/Piccoloctsj6d7177 (cpp-fprintf-call): 10
tdlib/telegram-bot-apictsj8529d9 (cpp-fprintf-call): 247
WinMerge/winmergectsj101305 (cpp-fprintf-call): 113
- Download the sarif files, optionally also get databases.
cd ~/work-gh/mrva/gh-mrva/
# Just download the sarif files
./gh-mrva download --session mirva-session-1480 \
    --output-dir mirva-session-1480
# Download the sarif files and CodeQL dbs
./gh-mrva download --session mirva-session-1480 \
    --download-dbs \
    --output-dir mirva-session-1480
# And list them:
\ls -la *1480*
-rwxr-xr-x@ 1 hohn staff   1915857 Aug 16 14:10 BoomingTech_Piccoloctsj6d7177_1.sarif
drwxr-xr-x@ 3 hohn staff        96 Aug 16 14:15 BoomingTech_Piccoloctsj6d7177_1_db
-rwxr-xr-x@ 1 hohn staff  89857056 Aug 16 14:11 BoomingTech_Piccoloctsj6d7177_1_db.zip
-rwxr-xr-x@ 1 hohn staff   3105663 Aug 16 14:10 WinMerge_winmergectsj101305_1.sarif
-rwxr-xr-x@ 1 hohn staff 227812131 Aug 16 14:12 WinMerge_winmergectsj101305_1_db.zip
-rwxr-xr-x@ 1 hohn staff    193976 Aug 16 14:10 libfuse_libfusectsj7a66a4_1.sarif
-rwxr-xr-x@ 1 hohn staff  12930693 Aug 16 14:10 libfuse_libfusectsj7a66a4_1_db.zip
-rwxr-xr-x@ 1 hohn staff   1240694 Aug 16 14:10 pocoproject_pococtsj26b932_1.sarif
-rwxr-xr-x@ 1 hohn staff 158924920 Aug 16 14:12 pocoproject_pococtsj26b932_1_db.zip
-rwxr-xr-x@ 1 hohn staff    888494 Aug 16 14:10 quickfix_quickfixctsjebfd13_1.sarif
-rwxr-xr-x@ 1 hohn staff  75023303 Aug 16 14:11 quickfix_quickfixctsjebfd13_1_db.zip
-rwxr-xr-x@ 1 hohn staff   1487363 Aug 16 14:10 tdlib_telegram-bot-apictsj8529d9_1.sarif
-rwxr-xr-x@ 1 hohn staff 373477635 Aug 16 14:14 tdlib_telegram-bot-apictsj8529d9_1_db.zip
-rwxr-xr-x@ 1 hohn staff    103657 Aug 16 14:10 xoreaxeaxeax_movfuscatorctsj8f7e5b_1.sarif
-rwxr-xr-x@ 1 hohn staff   9464225 Aug 16 14:10 xoreaxeaxeax_movfuscatorctsj8f7e5b_1_db.zip
- Use the SARIF Viewer plugin in VS Code to open and review the results.
Prepare the source directory so the viewer can be pointed at it
cd ~/work-gh/mrva/gh-mrva/mirva-session-1480
unzip -qd BoomingTech_Piccoloctsj6d7177_1_db BoomingTech_Piccoloctsj6d7177_1_db.zip
cd BoomingTech_Piccoloctsj6d7177_1_db/codeql_db/
unzip -qd src src.zip
Use the viewer
# Return to the session directory, where the sarif files are
cd ~/work-gh/mrva/gh-mrva/mirva-session-1480
code BoomingTech_Piccoloctsj6d7177_1.sarif
# For lauxlib.c, point the source viewer to
find ~/work-gh/mrva/gh-mrva/mirva-session-1480/BoomingTech_Piccoloctsj6d7177_1_db/codeql_db/src/home/runner/work/bulk-builder/bulk-builder -name lauxlib.c
# Here: ~/work-gh/mrva/gh-mrva/mirva-session-1480/BoomingTech_Piccoloctsj6d7177_1_db/codeql_db/src/home/runner/work/bulk-builder/bulk-builder/engine/3rdparty/lua-5.4.4/lauxlib.c
- (optional) Large result sets are more easily filtered via dataframes or spreadsheets. Convert the SARIF to CSV if needed; see sarif-cli.
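For a quick look without extra tooling, a SARIF file can also be flattened with a few lines of Python. The sketch below is not sarif-cli and makes no claim about its output format. It relies only on the standard SARIF 2.1.0 layout (runs, results, ruleId, message.text, and the first physical location); the column names, function names, and the sample document are assumptions made for this example.

```python
import csv
import io
import json

# Minimal hypothetical SARIF document with a single result.
SAMPLE_SARIF = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "results": [
        {
          "ruleId": "cpp-fprintf-call",
          "message": {"text": "call of fprintf"},
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {"uri": "hypothetical/dir/example.c"},
                "region": {"startLine": 42}
              }
            }
          ]
        }
      ]
    }
  ]
}
""")

FIELDS = ["rule_id", "message", "file", "line"]


def sarif_rows(sarif: dict) -> list:
    """Flatten every result of every run into one dict per finding."""
    rows = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Use the first location only; results without one get blanks.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            rows.append({
                "rule_id": result.get("ruleId", ""),
                "message": result.get("message", {}).get("text", ""),
                "file": physical.get("artifactLocation", {}).get("uri", ""),
                "line": physical.get("region", {}).get("startLine", ""),
            })
    return rows


def rows_to_csv(rows: list) -> str:
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = sarif_rows(SAMPLE_SARIF)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To use it on a downloaded file, replace the sample with `json.load` on one of the `.sarif` files listed above.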
Footnotes
1. csvkit, which provides csvcut, can be installed into the same Python virtual environment as qldbtools.