The problem was missing fields in the SARIF output. After the debugging
below, the cause were conversions from JSON to Go to JSON; the Go ->
JSON conversion only output the fields defined in the Go struct.
Because SARIF has so many optional fields, no attempt is made to enforce
a statically-defined structure. Instead, the JSON -> Go conversion is
now to a fully dynamic structure; unused fields are simply passed through
Debugging:
Comparing two SARIF files shows
{
"$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
"version" : "2.1.0",
"runs" : [ {...
} ]
}
and
{
"runs": [...
]
}
so there are missing fields.
The Problem
1. Problem origin
// Modify the sarif: start by extracting
var sarif Sarif
if err := json.Unmarshal(sarifData, &sarif); err != nil {
return nil, fmt.Errorf("failed to unmarshal SARIF: %v", err)
}
...
// now inject version control info
...
// and write it back
sarifBytes, err := json.Marshal(sarif)
if err != nil {
return nil, fmt.Errorf("failed to marshal SARIF: %v", err)
}
2. But the struct only has one of the needed fields
type Sarif struct {
Runs []SarifRun `json:"runs"`
}
3. From the docs:
// To unmarshal JSON into a struct, Unmarshal matches incoming object
// keys to the keys used by [Marshal] (either the struct field name or its tag),
// preferring an exact match but also accepting a case-insensitive match. By
// default, object keys which don't have a corresponding struct field are
// ignored (see [Decoder.DisallowUnknownFields] for an alternative).
End-to-end testing contained an unhandled CodeQL database download
request. The handlers are added in this patch. Debugging info is below
for reference.
The mrvacommander *server* fails with the following. The source code is
: func setupEndpoints(c CommanderAPI)
See mrvacommander/pkg/server/server.go, endpoints for getting a URL to download artifacts.
Original
Downloading artifacts for tdlib_telegram-bot-apictsj8529d9_2
...
Downloading database tdlib/telegram-bot-apictsj8529d9 cpp mirva-session-1400 tdlib_telegram-bot-apictsj8529d9_2
...
2024/08/13 12:31:38 >> GET http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
...
2024/08/13 12:31:38 << 404 http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
...
-rwxr-xr-x@ 1 hohn staff 169488 Aug 13 12:29 tdlib_telegram-bot-apictsj8529d9_2.sarif*
-rwxr-xr-x@ 1 hohn staff 10 Aug 13 12:31 tdlib_telegram-bot-apictsj8529d9_2_db.zip*
Server log
server | 2024/08/13 19:31:38 ERROR Unhandled endpoint method=GET uri=/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
Try a manual download from the server
8:$ wget http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
--2024-08-13 12:56:05-- http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:8080... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-08-13 12:56:05 ERROR 404: Not Found.
server | 2024/08/13 19:56:05 ERROR Unhandled endpoint method=GET uri=/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
The full info for the DB
tdlib,telegram-bot-api,8529d9,2.17.0,2024-05-09 08:02:49.545174+00:00,cpp,f95d406da67adb8ac13d9c562291aa57c65398e0,306106.0,/Users/hohn/work-gh/mrva/mrva-open-source-download/repos-2024-04-29/tdlib/telegram-bot-api/code-scanning/codeql/databases/cpp/db.zip,cpp,C/C++,1244.0,306106.0,2024-05-13T15:54:54.749093,cpp,True,3375,373477635
The gh-mrva *client* sends the following. The source is
gh-mrva/utils/utils.go,
client.Get(fmt.Sprintf("http://localhost:8080/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))
We have
cd /Users/hohn/work-gh/mrva/gh-mrva
0:$ rg 'repos/.*/code-scanning/codeql/databases'
...
utils/utils.go
625: // resp, err := client.Get(fmt.Sprintf("https://api.github.com/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))
626: resp, err := client.Get(fmt.Sprintf("http://localhost:8080/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))
And
resp, err := client.Get(fmt.Sprintf("http://localhost:8080/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))
The original DB upload was
cd ~/work-gh/mrva/mrvacommander/client/qldbtools && \
./bin/mc-db-populate-minio -n 11 < scratch/db-info-3.csv
...
2024-08-14 09:29:19 [INFO] Uploaded /Users/hohn/work-gh/mrva/mrva-open-source-download/repos-2024-04-29/tdlib/telegram-bot-api/code-scanning/codeql/databases/cpp/db.zip as tdlib$telegram-bot-apictsj8529d9.zip to bucket qldb
...
This patch fixes the following
- [X] Wrong db metadata path. Fixed via
: globRecursively(databasePath, "codeql-database.yml")
The log output for reference:
agent | 2024/08/09 21:16:40 DEBUG XX:getDataBaseMetadata databasePath=/tmp/ce523549-a217-4b54-a118-7224ce444870/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
agent | 2024/08/09 21:16:40 DEBUG XX:getDataBaseMetadata databasePath=/tmp/bc24fe72-b520-4e72-9634-a98d630cb75e/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
agent | 2024/08/09 21:16:40 DEBUG Received signal: %s "user defined signal 1"=<nil>
agent | 2024/08/09 21:16:40 DEBUG XX:getDataBaseMetadata databasePath=/tmp/41fcf5cc-e151-4a11-bccc-481d599aa426/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
From
func getDatabaseMetadata(databasePath string) (*DatabaseMetadata, error) {
data, err := os.ReadFile(filepath.Join(databasePath, "codeql-database.yml"))
...}
And some inspection:
root@3fa4b8013336:~# find /tmp |grep ql-datab
/tmp/27f09b9f-254f-4ef5-abf5-9a1a2927906b/db/cpp/codeql-database.yml
/tmp/d7e14cd4-8789-4176-81bc-2ac1957ed9fd/db/codeql_db/codeql-database.yml
/tmp/41fcf5cc-e151-4a11-bccc-481d599aa426/db/codeql_db/codeql-database.yml
/tmp/bc24fe72-b520-4e72-9634-a98d630cb75e/db/codeql_db/codeql-database.yml
/tmp/ce523549-a217-4b54-a118-7224ce444870/db/codeql_db/codeql-database.yml
- [X] Wrong db path. Fixed via
: findDBDir(databasePath)
The log output for reference:
agent | 2024/08/09 21:51:09 ERROR Failed to run analysis job error="failed to run analysis: failed to run queries: exit status 2\nOutput: A fatal error occurred: /tmp/91c61e0b-dfd9-4dd3-a3ad-cb77dbc1cbfd/db is not a recognized CodeQL database.\n"
agent | 2024/08/09 21:51:09 INFO Running analysis job job="{Spec:{SessionID:1 NameWithOwner:{Owner:USCiLab Repo:cerealctsj264953}} QueryPackLocation:{Key:1 Bucket:packs} QueryLanguage:cpp}"
agent | 2024/08/09 21:51:09 ERROR Failed to run analysis job error="failed to run analysis: failed to run queries: exit status 2\nOutput: A fatal error occurred: /tmp/1b8ffeba-8ad1-465e-8ec7-36cda449a5f5/db is not a recognized CodeQL database.\n"
...
This is easily confirmed:
root@171b5417e05f:~# /opt/codeql/codeql database upgrade /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/
A fatal error occurred: /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2 is not a recognized CodeQL database.
Another try:
root@171b5417e05f:~# /opt/codeql/codeql database upgrade /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/database.zip
A fatal error occurred: Database root /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/database.zip is not a directory.
This one is correct:
root@171b5417e05f:~# /opt/codeql/codeql database upgrade /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/db/codeql_db
/tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/db/codeql_db/db-cpp is up to date.
- [X] Wrong database source prefix. Also fixed via
: findDBDir(databasePath)
Similar log entries:
agent | 2024/08/13 15:40:14 ERROR Failed to run analysis job error="failed to run analysis: failed to get source location prefix: failed to resolve database: exit status 2\nOutput: A fatal error occurred: /tmp/da420844-a284-4d82-9470-fa189a5b4ee6/db is not a recognized CodeQL database.\n"
agent | 2024/08/13 15:40:14 INFO Worker stopping due to reduction in worker count
agent | 2024/08/13 15:40:18 ERROR Failed to run analysis job error="failed to run analysis: failed to get source location prefix: failed to resolve database: exit status 2\nOutput: A fatal error occurred: /tmp/eebfc52c-3ecf-490d-bbf4-23c305d6ba18/db is not a recognized CodeQL database.\n"
and
agent | 2024/08/13 15:49:33 ERROR Failed to resolve database err="exit status 2" output="A fatal error occurred: /tmp/b5c4941a-5692-4640-aa79-9810bcab39f4/db is not a recognized CodeQL database.\n"
agent | 2024/08/13 15:49:33 DEBUG XX: RunQuery failed to get source location prefixdatabasePath=/tmp/b5c4941a-5692-4640-aa79-9810bcab39f4/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
agent | 2024/08/13 15:49:35 INFO Modifying worker count current=3 new=2
agent | 2024/08/13 15:49:35 ERROR Failed to resolve database err="exit status 2" output="A fatal error occurred: /tmp/eda30582-81a3-4582-8897-65f8904e8501/db is not a recognized CodeQL database.\n"
agent | 2024/08/13 15:49:35 DEBUG XX: RunQuery failed to get source location prefixdatabasePath=/tmp/eda30582-81a3-4582-8897-65f8904e8501/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
And this fails
root@51464985499f:~# /opt/codeql/codeql resolve database /tmp/eda30582-81a3-4582-8897-65f8904e8501/db/
A fatal error occurred: /tmp/eda30582-81a3-4582-8897-65f8904e8501/db is not a recognized CodeQL database.
But this works:
root@51464985499f:~# /opt/codeql/codeql resolve database /tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/
{
"sourceLocationPrefix" : "/home/runner/work/bulk-builder/bulk-builder",
"columnKind" : "utf8",
"unicodeNewlines" : false,
"sourceArchiveZip" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/src.zip",
"sourceArchiveRoot" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/src",
"datasetFolder" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/db-cpp",
"logsFolder" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/log",
"languages" : [
"cpp"
],
"scratchDir" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/working"
}
Previously:
- There is confusion between nameWithOwner and queryLanguage. Both are strings.
Between
runResult, err := codeql.RunQuery(databasePath, job.QueryLanguage, queryPackPath, tempDir)
(agent.go l205)
and
func RunQuery(database string, nwo string, queryPackPath string, tempDir string) (*RunQueryResult, error)
QueryLanguage is suddenly name with owner in the code.
Added some debugging, the value is the query language in the two places it gets used:
server | 2024/07/03 18:30:15 DEBUG Processed request info location="{Data:map[bucket:packs key:1]}" language=cpp
...
agent | 2024/07/03 18:30:15 DEBUG XX: is nwo a name/owner, or the original callers' queryLanguage? nwo=cpp
...
agent | 2024/07/03 18:30:19 DEBUG XX: 2: is nwo a name/owner, or the original callers' queryLanguage? nwo=cpp
Changes:
- Introduce explicit type QueryLanguage = string and update code to clarify
- inline trivial function
For full test, we cannot have
ERROR codeql database analyze failed: error="exec:
\"codeql\": executable file not found in $PATH" job="{MirvaRequestID:0
QueryPackId:54674 QueryLanguage:cpp ORepo:{Owner:psycopg Repo:psycopg2}}"
For linux/arm64, use a Dockerfile that:
- uses ubuntu 22.04 base image
- adds the 1.17 version of the codeql bundle
- extracts the bundle
- adds a recent version of the JRE
- extracts it
- sets the CODEQL_JAVA_HOME environment variable to point to the JRE
The instructions are updated