Commit Graph

53 Commits

Author SHA1 Message Date
Michael Hohn
a5bb232af2 Use full repository path name in place of mrvacommander 2024-12-13 10:54:35 -08:00
Michael Hohn
4d52176c5a Add Publisher Confirms and Consumer Acknowledgements to rabbitmq channels
Also updated the end-to-end workflow

The confirmation channel size is intentionally very large to prevent
blocking the server or agents.
2024-11-14 12:04:18 -08:00
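
The sizing remark in the commit above can be illustrated with a minimal, standard-library-only sketch. It does not use a RabbitMQ client; the `confirmation` type and `deliverConfirmations` function are hypothetical stand-ins. The sketch only shows that a listener channel with too little capacity cannot accept every confirmation while nobody is reading, whereas a generously sized one can.

```go
package main

import "fmt"

// confirmation is a hypothetical stand-in for a broker's publish confirmation.
type confirmation struct {
	DeliveryTag uint64
	Ack         bool
}

// deliverConfirmations simulates a client library handing n confirmations to
// a listener channel that nobody is reading yet. It never blocks: when the
// channel is full it counts the confirmation as undeliverable instead.
// A real client would typically block at that point.
func deliverConfirmations(confirms chan<- confirmation, n int) (undeliverable int) {
	for tag := 1; tag <= n; tag++ {
		select {
		case confirms <- confirmation{DeliveryTag: uint64(tag), Ack: true}:
		default:
			undeliverable++
		}
	}
	return undeliverable
}

func main() {
	small := make(chan confirmation, 2)
	large := make(chan confirmation, 10000)
	fmt.Println("small buffer, undeliverable:", deliverConfirmations(small, 100))
	fmt.Println("large buffer, undeliverable:", deliverConfirmations(large, 100))
}
```

With 100 pending confirmations, the 2-slot channel leaves 98 undelivered and the 10000-slot channel leaves none. The capacities here are illustrative and are not the values used in the commit.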
Michael Hohn
4e93929943 Code generalization: cleanup 2024-10-30 11:10:28 -07:00
Michael Hohn
e7d32861e5 Code generalization: request db info from other source: remove unused constants 2024-10-28 18:45:21 -07:00
Michael Hohn
52aafd6fc9 Code generalization: request db info from other source: remove unnecessary types 2024-10-28 14:34:07 -07:00
Michael Hohn
d956f47db3 Fix: Produce complete SARIF output in agent
The problem was missing fields in the SARIF output.  The debugging below
showed the cause: the SARIF was converted from JSON to Go and back to JSON,
and the Go -> JSON conversion only outputs the fields defined in the Go struct.

Because SARIF has so many optional fields, no attempt is made to enforce
a statically defined structure.  Instead, the JSON -> Go conversion now
targets a fully dynamic structure; unused fields are simply passed through.

Debugging:

       Comparing two SARIF files shows

           {
             "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
             "version" : "2.1.0",
             "runs" : [ {...
             } ]
           }

       and

           {
             "runs": [...
             ]
           }

       so there are missing fields.

    The Problem
     1. Problem origin

          // Modify the sarif: start by extracting
          var sarif Sarif
          if err := json.Unmarshal(sarifData, &sarif); err != nil {
              return nil, fmt.Errorf("failed to unmarshal SARIF: %v", err)
          }
          ...
          // now inject version control info
          ...
          // and write it back
          sarifBytes, err := json.Marshal(sarif)
          if err != nil {
              return nil, fmt.Errorf("failed to marshal SARIF: %v", err)
          }

     2. But the struct only has one of the needed fields

          type Sarif struct {
              Runs []SarifRun `json:"runs"`
          }

     3. From the docs:

          // To unmarshal JSON into a struct, Unmarshal matches incoming object
          // keys to the keys used by [Marshal] (either the struct field name or its tag),
          // preferring an exact match but also accepting a case-insensitive match. By
          // default, object keys which don't have a corresponding struct field are
          // ignored (see [Decoder.DisallowUnknownFields] for an alternative).
2024-08-16 14:27:46 -07:00
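
The fix described in the commit above can be sketched as follows. This is a minimal illustration, not the agent's actual code: `roundTripStatic`, `injectVersionControl`, and `topLevelKeys` are hypothetical names, and the injected `versionControlProvenance` entry is only an assumed example of "version control info". The sketch contrasts a round trip through a one-field struct with a round trip through a dynamic map.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// staticSarif mirrors the one-field struct from the commit message.
type staticSarif struct {
	Runs []json.RawMessage `json:"runs"`
}

// roundTripStatic shows the lossy path: keys without a struct field are dropped.
func roundTripStatic(sarifData []byte) ([]byte, error) {
	var s staticSarif
	if err := json.Unmarshal(sarifData, &s); err != nil {
		return nil, fmt.Errorf("failed to unmarshal SARIF: %w", err)
	}
	return json.Marshal(s)
}

// injectVersionControl decodes into a dynamic map, adds an (assumed)
// versionControlProvenance entry to each run, and re-encodes. Keys the
// code never touches are passed through unchanged.
func injectVersionControl(sarifData []byte, repoURI string) ([]byte, error) {
	var sarif map[string]any
	if err := json.Unmarshal(sarifData, &sarif); err != nil {
		return nil, fmt.Errorf("failed to unmarshal SARIF: %w", err)
	}
	runs, _ := sarif["runs"].([]any)
	for _, r := range runs {
		if run, ok := r.(map[string]any); ok {
			run["versionControlProvenance"] = []any{
				map[string]any{"repositoryUri": repoURI},
			}
		}
	}
	return json.Marshal(sarif)
}

// topLevelKeys returns the sorted top-level keys of a JSON object.
func topLevelKeys(data []byte) ([]string, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(doc))
	for k := range doc {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	in := []byte(`{"$schema":"https://json.schemastore.org/sarif-2.1.0.json","version":"2.1.0","runs":[{}]}`)
	lossy, _ := roundTripStatic(in)
	full, _ := injectVersionControl(in, "https://example.com/owner/repo")
	lossyKeys, _ := topLevelKeys(lossy)
	fullKeys, _ := topLevelKeys(full)
	fmt.Println("static struct keeps:", lossyKeys)
	fmt.Println("dynamic map keeps:  ", fullKeys)
}
```

One trade-off of the dynamic approach: `encoding/json` decodes JSON numbers into `float64` when the target is `any`, so very large integers may lose precision unless a `json.Decoder` with `UseNumber` is used.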
Michael Hohn
0a52b729cd Expand the codeql db download response
End-to-end testing contained an unhandled CodeQL database download
request.  The handlers are added in this patch.  Debugging info is below
for reference.

The mrvacommander *server* fails with the following.  The source code is
: func setupEndpoints(c CommanderAPI)
See mrvacommander/pkg/server/server.go, endpoints for getting a URL to download artifacts.

  Original

          Downloading artifacts for tdlib_telegram-bot-apictsj8529d9_2
          ...
          Downloading database tdlib/telegram-bot-apictsj8529d9 cpp mirva-session-1400 tdlib_telegram-bot-apictsj8529d9_2
          ...
          2024/08/13 12:31:38 >> GET http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
          ...
          2024/08/13 12:31:38 << 404 http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
          ...
          -rwxr-xr-x@  1 hohn  staff  169488 Aug 13 12:29 tdlib_telegram-bot-apictsj8529d9_2.sarif*
          -rwxr-xr-x@  1 hohn  staff      10 Aug 13 12:31 tdlib_telegram-bot-apictsj8529d9_2_db.zip*

  Server log

          server         | 2024/08/13 19:31:38 ERROR Unhandled endpoint method=GET uri=/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp

  Try a manual download from the server

          8:$ wget http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
          --2024-08-13 12:56:05--  http://localhost:8080/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp
          Resolving localhost (localhost)... ::1, 127.0.0.1
          Connecting to localhost (localhost)|::1|:8080... connected.
          HTTP request sent, awaiting response... 404 Not Found
          2024-08-13 12:56:05 ERROR 404: Not Found.

          server         | 2024/08/13 19:56:05 ERROR Unhandled endpoint method=GET uri=/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp

  The full info for the DB
          tdlib,telegram-bot-api,8529d9,2.17.0,2024-05-09 08:02:49.545174+00:00,cpp,f95d406da67adb8ac13d9c562291aa57c65398e0,306106.0,/Users/hohn/work-gh/mrva/mrva-open-source-download/repos-2024-04-29/tdlib/telegram-bot-api/code-scanning/codeql/databases/cpp/db.zip,cpp,C/C++,1244.0,306106.0,2024-05-13T15:54:54.749093,cpp,True,3375,373477635

The gh-mrva *client* sends the following.  The source is
gh-mrva/utils/utils.go,
    client.Get(fmt.Sprintf("http://localhost:8080/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))

We have
  cd /Users/hohn/work-gh/mrva/gh-mrva
  0:$ rg 'repos/.*/code-scanning/codeql/databases'

          ...
          utils/utils.go
          625:	// resp, err := client.Get(fmt.Sprintf("https://api.github.com/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))
          626:	resp, err := client.Get(fmt.Sprintf("http://localhost:8080/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))

  And
          resp, err := client.Get(fmt.Sprintf("http://localhost:8080/repos/%s/code-scanning/codeql/databases/%s", task.Nwo, task.Language))

The original DB upload was
  cd ~/work-gh/mrva/mrvacommander/client/qldbtools && \
      ./bin/mc-db-populate-minio -n 11 < scratch/db-info-3.csv

  ...
  2024-08-14 09:29:19 [INFO] Uploaded /Users/hohn/work-gh/mrva/mrva-open-source-download/repos-2024-04-29/tdlib/telegram-bot-api/code-scanning/codeql/databases/cpp/db.zip as tdlib$telegram-bot-apictsj8529d9.zip to bucket qldb
  ...
2024-08-14 13:01:15 -07:00
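
A handler for the URI that produced the 404 above could look roughly like the sketch below. It uses only `net/http` from the standard library and is not the code added in the patch: `parseDatabaseURI`, `databaseHandler`, and the `lookup` callback are hypothetical, and the real server may route requests and locate the database quite differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// parseDatabaseURI splits a path of the form
// /repos/{owner}/{repo}/code-scanning/codeql/databases/{language}.
func parseDatabaseURI(path string) (owner, repo, language string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 7 || parts[0] != "repos" || parts[3] != "code-scanning" ||
		parts[4] != "codeql" || parts[5] != "databases" {
		return "", "", "", false
	}
	if parts[1] == "" || parts[2] == "" || parts[6] == "" {
		return "", "", "", false
	}
	return parts[1], parts[2], parts[6], true
}

// databaseHandler answers GET requests for a CodeQL database. The lookup
// callback is a hypothetical stand-in for whatever storage the server uses.
func databaseHandler(lookup func(owner, repo, language string) ([]byte, bool)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		owner, repo, language, ok := parseDatabaseURI(r.URL.Path)
		if r.Method != http.MethodGet || !ok {
			http.NotFound(w, r)
			return
		}
		data, found := lookup(owner, repo, language)
		if !found {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/zip")
		_, _ = w.Write(data)
	}
}

func main() {
	// In-memory lookup used only for this demonstration.
	lookup := func(owner, repo, language string) ([]byte, bool) {
		if owner == "tdlib" && language == "cpp" {
			return []byte("zip bytes"), true
		}
		return nil, false
	}
	req := httptest.NewRequest(http.MethodGet,
		"/repos/tdlib/telegram-bot-apictsj8529d9/code-scanning/codeql/databases/cpp", nil)
	rec := httptest.NewRecorder()
	databaseHandler(lookup)(rec, req)
	fmt.Println("status:", rec.Code, "bytes:", rec.Body.Len())
}
```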
Michael Hohn
6bebf4abfc Remove interactive debug statements 2024-08-13 09:27:13 -07:00
Michael Hohn
9d60489908 wip: Handle varying CodeQL DB formats. This code contains debugging features
This patch fixes the following

     - [X] Wrong db metadata path.  Fixed via
       : globRecursively(databasePath, "codeql-database.yml")

       The log output for reference:

                 agent          | 2024/08/09 21:16:40 DEBUG XX:getDataBaseMetadata databasePath=/tmp/ce523549-a217-4b54-a118-7224ce444870/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
                 agent          | 2024/08/09 21:16:40 DEBUG XX:getDataBaseMetadata databasePath=/tmp/bc24fe72-b520-4e72-9634-a98d630cb75e/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
                 agent          | 2024/08/09 21:16:40 DEBUG Received signal: %s "user defined signal 1"=<nil>
                 agent          | 2024/08/09 21:16:40 DEBUG XX:getDataBaseMetadata databasePath=/tmp/41fcf5cc-e151-4a11-bccc-481d599aa426/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>

            From

                 func getDatabaseMetadata(databasePath string) (*DatabaseMetadata, error) {
                 data, err := os.ReadFile(filepath.Join(databasePath, "codeql-database.yml"))
                 ...}

            And some inspection:

                 root@3fa4b8013336:~# find /tmp |grep ql-datab
                 /tmp/27f09b9f-254f-4ef5-abf5-9a1a2927906b/db/cpp/codeql-database.yml
                 /tmp/d7e14cd4-8789-4176-81bc-2ac1957ed9fd/db/codeql_db/codeql-database.yml
                 /tmp/41fcf5cc-e151-4a11-bccc-481d599aa426/db/codeql_db/codeql-database.yml
                 /tmp/bc24fe72-b520-4e72-9634-a98d630cb75e/db/codeql_db/codeql-database.yml
                 /tmp/ce523549-a217-4b54-a118-7224ce444870/db/codeql_db/codeql-database.yml

     - [X] Wrong db path.  Fixed via
       : findDBDir(databasePath)

       The log output for reference:

                 agent          | 2024/08/09 21:51:09 ERROR Failed to run analysis job error="failed to run analysis: failed to run queries: exit status 2\nOutput: A fatal error occurred: /tmp/91c61e0b-dfd9-4dd3-a3ad-cb77dbc1cbfd/db is not a recognized CodeQL database.\n"
                 agent          | 2024/08/09 21:51:09 INFO Running analysis job job="{Spec:{SessionID:1 NameWithOwner:{Owner:USCiLab Repo:cerealctsj264953}} QueryPackLocation:{Key:1 Bucket:packs} QueryLanguage:cpp}"
                 agent          | 2024/08/09 21:51:09 ERROR Failed to run analysis job error="failed to run analysis: failed to run queries: exit status 2\nOutput: A fatal error occurred: /tmp/1b8ffeba-8ad1-465e-8ec7-36cda449a5f5/db is not a recognized CodeQL database.\n"
                 ...

            This is easily confirmed:

                 root@171b5417e05f:~# /opt/codeql/codeql database upgrade  /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/
                 A fatal error occurred: /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2 is not a recognized CodeQL database.

            Another try:

                 root@171b5417e05f:~# /opt/codeql/codeql database upgrade  /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/database.zip
                 A fatal error occurred: Database root /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/database.zip is not a directory.

             This one is correct:

                 root@171b5417e05f:~# /opt/codeql/codeql database upgrade /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/db/codeql_db
                 /tmp/7ed27578-d7ea-42e0-902a-effbc4df05f2/db/codeql_db/db-cpp is up to date.

     - [X] Wrong database source prefix.  Also fixed via
       : findDBDir(databasePath)

       Similar log entries:

                 agent          | 2024/08/13 15:40:14 ERROR Failed to run analysis job error="failed to run analysis: failed to get source location prefix: failed to resolve database: exit status 2\nOutput: A fatal error occurred: /tmp/da420844-a284-4d82-9470-fa189a5b4ee6/db is not a recognized CodeQL database.\n"
                 agent          | 2024/08/13 15:40:14 INFO Worker stopping due to reduction in worker count
                 agent          | 2024/08/13 15:40:18 ERROR Failed to run analysis job error="failed to run analysis: failed to get source location prefix: failed to resolve database: exit status 2\nOutput: A fatal error occurred: /tmp/eebfc52c-3ecf-490d-bbf4-23c305d6ba18/db is not a recognized CodeQL database.\n"

            and
                 agent          | 2024/08/13 15:49:33 ERROR Failed to resolve database err="exit status 2" output="A fatal error occurred: /tmp/b5c4941a-5692-4640-aa79-9810bcab39f4/db is not a recognized CodeQL database.\n"
                 agent          | 2024/08/13 15:49:33 DEBUG XX: RunQuery failed to get source location prefixdatabasePath=/tmp/b5c4941a-5692-4640-aa79-9810bcab39f4/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>
                 agent          | 2024/08/13 15:49:35 INFO Modifying worker count current=3 new=2
                 agent          | 2024/08/13 15:49:35 ERROR Failed to resolve database err="exit status 2" output="A fatal error occurred: /tmp/eda30582-81a3-4582-8897-65f8904e8501/db is not a recognized CodeQL database.\n"
                 agent          | 2024/08/13 15:49:35 DEBUG XX: RunQuery failed to get source location prefixdatabasePath=/tmp/eda30582-81a3-4582-8897-65f8904e8501/db "Waiting for SIGUSR1 or SIGUSR2..."=<nil>

            And this fails

                 root@51464985499f:~# /opt/codeql/codeql resolve database /tmp/eda30582-81a3-4582-8897-65f8904e8501/db/
                 A fatal error occurred: /tmp/eda30582-81a3-4582-8897-65f8904e8501/db is not a recognized CodeQL database.

            But this works:

                 root@51464985499f:~# /opt/codeql/codeql resolve database /tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/
                 {
                   "sourceLocationPrefix" : "/home/runner/work/bulk-builder/bulk-builder",
                   "columnKind" : "utf8",
                   "unicodeNewlines" : false,
                   "sourceArchiveZip" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/src.zip",
                   "sourceArchiveRoot" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/src",
                   "datasetFolder" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/db-cpp",
                   "logsFolder" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/log",
                   "languages" : [
                     "cpp"
                   ],
                   "scratchDir" : "/tmp/eda30582-81a3-4582-8897-65f8904e8501/db/codeql_db/working"
                }
2024-08-13 09:22:24 -07:00
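
The three fixes above share one idea: search below the extraction directory for `codeql-database.yml` instead of assuming a fixed layout. The sketch below illustrates that idea with the standard library only. It is not the `globRecursively` or `findDBDir` from the patch, whose signatures are not shown here; `findDatabaseRoot` and `demo` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

const metadataName = "codeql-database.yml"

// errStop ends the directory walk once the metadata file has been found.
var errStop = errors.New("metadata found")

// findDatabaseRoot walks root and returns the first directory that contains
// codeql-database.yml, which is the directory the CodeQL CLI expects.
func findDatabaseRoot(root string) (string, error) {
	found := ""
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && d.Name() == metadataName {
			found = filepath.Dir(path)
			return errStop
		}
		return nil
	})
	if err != nil && !errors.Is(err, errStop) {
		return "", err
	}
	if found == "" {
		return "", fmt.Errorf("no %s under %s", metadataName, root)
	}
	return found, nil
}

// demo builds <tmp>/db/<sub>/codeql-database.yml, searches from <tmp>/db,
// and returns the located directory relative to <tmp>/db.
func demo(sub string) (string, error) {
	tmp, err := os.MkdirTemp("", "dbroot")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(tmp)
	dbPath := filepath.Join(tmp, "db")
	if err := os.MkdirAll(filepath.Join(dbPath, sub), 0o755); err != nil {
		return "", err
	}
	file := filepath.Join(dbPath, sub, metadataName)
	if err := os.WriteFile(file, []byte("# placeholder\n"), 0o644); err != nil {
		return "", err
	}
	root, err := findDatabaseRoot(dbPath)
	if err != nil {
		return "", err
	}
	return filepath.Rel(dbPath, root)
}

func main() {
	// The two layouts seen in the `find /tmp` output above.
	for _, sub := range []string{"codeql_db", "cpp"} {
		rel, err := demo(sub)
		fmt.Println(rel, err)
	}
}
```

The returned directory is what would be passed to commands such as `codeql resolve database`, in place of the bare `.../db` path that failed in the logs.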
Michael Hohn
8965725e42 Replaced the dynamic table type ArtifactLocation with struct keys
The original is present in comment form for reference
2024-07-10 13:08:40 -07:00
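
A rough illustration of the change named in the commit above, under stated assumptions: the field names `Data`, `Key`, and `Bucket` are inferred from log lines elsewhere in this history (`{Data:map[bucket:packs key:1]}` and `{Key:1 Bucket:packs}`); the type names `dynamicLocation` and `describe*` helpers are hypothetical, and the real definitions may differ.

```go
package main

import "fmt"

// dynamicLocation approximates the earlier "dynamic table" form:
// any key is accepted, so typos are only caught at run time.
type dynamicLocation struct {
	Data map[string]string
}

// ArtifactLocation approximates the struct-key form: the compiler
// rejects unknown or misspelled fields.
type ArtifactLocation struct {
	Key    string
	Bucket string
}

// describeDynamic formats the map-based form as it appears in logs.
func describeDynamic(loc dynamicLocation) string {
	return fmt.Sprintf("%+v", loc)
}

// describeStatic formats the struct-based form as it appears in logs.
func describeStatic(loc ArtifactLocation) string {
	return fmt.Sprintf("%+v", loc)
}

func main() {
	old := dynamicLocation{Data: map[string]string{"bucket": "packs", "key": "1"}}
	cur := ArtifactLocation{Key: "1", Bucket: "packs"}
	fmt.Println(describeDynamic(old))
	fmt.Println(describeStatic(cur))
}
```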
Michael Hohn
e3f4d9f012 Use QL_DB_BUCKET_NAME in shell and go 2024-07-08 14:23:25 -07:00
Michael Hohn
3566f5169e Type checking fix: Restrict the keys / values for ArtifactLocation and centralize the common ones 2024-07-08 12:07:46 -07:00
Michael Hohn
b3cf7a4f65 Introduce explicit type QueryLanguage = string and update code to clarify
Previously:
- There is confusion between nameWithOwner and queryLanguage.  Both are strings.
  Between

        runResult, err := codeql.RunQuery(databasePath, job.QueryLanguage, queryPackPath, tempDir)
    (agent.go l205)

  and

        func RunQuery(database string, nwo string, queryPackPath string, tempDir string) (*RunQueryResult, error)

  QueryLanguage is suddenly name with owner in the code.

  Added debugging shows that the value is the query language in both places it gets used:

        server         | 2024/07/03 18:30:15 DEBUG Processed request info location="{Data:map[bucket:packs key:1]}" language=cpp
        ...
        agent          | 2024/07/03 18:30:15 DEBUG XX: is nwo a name/owner, or the original callers' queryLanguage? nwo=cpp
        ...
        agent          | 2024/07/03 18:30:19 DEBUG XX: 2: is nwo a name/owner, or the original callers' queryLanguage? nwo=cpp

Changes:
- Introduce explicit type QueryLanguage = string and update code to clarify
- inline trivial function
2024-07-03 13:30:02 -07:00
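
A short sketch of what `type QueryLanguage = string` does and does not provide. In Go this form is a type alias: it documents intent in signatures, but the compiler still treats it as `string`. The `NameWithOwner` defined type and the `describeJob` function below are hypothetical, added only for contrast, and are not taken from the commit.

```go
package main

import "fmt"

// QueryLanguage is an alias, as in the commit: it is identical to string
// for the compiler and serves as documentation in signatures.
type QueryLanguage = string

// NameWithOwner is a hypothetical defined type (no "="): string variables
// are not assignable to it without an explicit conversion.
type NameWithOwner string

// describeJob makes the role of each argument visible in the signature.
func describeJob(language QueryLanguage, nwo NameWithOwner) string {
	return fmt.Sprintf("language=%s nwo=%s", language, nwo)
}

func main() {
	raw := "cpp"               // plain string
	var lang QueryLanguage = raw // allowed: the alias is the same type

	// A plain string variable is accepted where QueryLanguage is expected.
	fmt.Println(describeJob(raw, "tdlib/telegram-bot-api"))
	fmt.Println(describeJob(lang, NameWithOwner("tdlib/telegram-bot-api")))

	// The next line would not compile, because raw is a string variable
	// and NameWithOwner is a distinct type:
	//   describeJob(lang, raw)
}
```

If compile-time separation between the language and the name-with-owner is wanted, a defined type on both sides is the stricter option; the alias keeps existing call sites unchanged.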
Michael Hohn
380e90135a Add the submitEmptyStatusResponse special case 2024-07-01 10:54:46 -07:00
Michael Hohn
1642894ccf Added note about querypackurl 2024-06-27 14:53:52 -07:00
Michael Hohn
c54bda8432 fix regression from 0cffb3c8 2024-06-27 14:22:52 -07:00
Michael Hohn
d145731c4b WIP: marked special case of 0 jobs 2024-06-26 09:27:27 -07:00
Michael Hohn
0cffb3c849 Simplify struct SessionInfo and adjoining code 2024-06-25 18:57:27 -07:00
Nicolas Will
b4d9833da3 Resolve status logic error and refactor server.go 2024-06-24 22:31:19 -04:00
Nicolas Will
e0cbc01d21 Fully implement local and container MRVA 2024-06-24 01:31:28 -04:00
Nicolas Will
fc9fcc7ae6 Add server queue logic and refactor 2024-06-17 11:30:46 +02:00
Michael Hohn
8b310e43ad Fix storage modules types and interfaces to compile server 2024-06-16 20:16:26 -07:00
Michael Hohn
6229c08900 Remove postgres and references to it 2024-06-16 19:43:29 -07:00
Michael Hohn
b756668e70 Fix merge so server compiles 2024-06-16 19:36:31 -07:00
Michael Hohn
2c5ecd3a1e Merge the agent-impl branch into the server branch 2024-06-16 19:21:42 -07:00
Michael Hohn
cd0647836e Combine New/Setup functions 2024-06-16 19:09:32 -07:00
Michael Hohn
8df9673897 wip: Mark update slots with XX:, add pkg/server/container.go 2024-06-16 19:09:30 -07:00
Nicolas Will
903ca5673e Add dynamic worker management 2024-06-16 17:07:13 +02:00
Nicolas Will
7ea45cb176 Separate queue and agent logic and refactor 2024-06-16 11:18:22 +02:00
Nicolas Will
3b06e2061f Add RabbitMQ agent and containers 2024-06-15 00:23:14 +02:00
Nicolas Will
c29daab045 Standardize NameWithOwner and Visible naming
Acronyms are now "NWO" and "Vis" respectively
2024-06-14 12:55:45 +02:00
Nicolas Will
3218f64bcf Move archive functions into utils package 2024-06-14 12:48:33 +02:00
Michael Hohn
5730c330f4 Add codeql to server container for standalone testing
For full test, we cannot have

       ERROR codeql database analyze failed: error="exec:
       \"codeql\": executable file not found in $PATH" job="{MirvaRequestID:0
       QueryPackId:54674 QueryLanguage:cpp ORepo:{Owner:psycopg Repo:psycopg2}}"

For linux/arm64, use a Dockerfile that:
       - uses ubuntu 22.04 base image
       - adds the 1.17 version of the codeql bundle
       - extracts the bundle
       - adds a recent version of the JRE
       - extracts it
       - sets the CODEQL_JAVA_HOME environment variable to point to the JRE

The instructions are updated
2024-06-12 11:28:37 -07:00
Michael Hohn
765a76f75a Provide MRVA_SERVER_ROOT via environment variable 2024-06-11 20:13:13 -07:00
Michael Hohn
9c0cdb1fe4 Simplify naming, don't restate package name 2024-06-11 16:55:10 -07:00
Michael Hohn
2d88b351ff Introduce structs/interfaces for new storage units
This commit simply splits the interfaces but introduces no new structs

     - Introduce the QueryPackStore, mrvacommander/pkg/qpstore
     - Introduce the CodeQL database store, pkg/qldbstore/interfaces.go
2024-06-11 14:16:41 -07:00
Michael Hohn
fc29fc5653 wip: update passing Queue to Commander
- Add minio to docker-compose
- Fix use of server.NewCommanderSingle
2024-06-11 13:19:05 -07:00
Michael Hohn
7e0d6909da wip: Make cross-module visibility explicit via Visibles structs
All access is/will be through interfaces accessed through these structs.

This introduces several distinct storage units:
+ DB for server state
+ DB for codeql databases
+ query pack store

The steps for manually creating needed databases are in the README
2024-06-07 13:14:41 -07:00
Michael Hohn
25cab583c1 wip: storage using postgres / gorm using partial json
Several approaches of normalizing json were tried and ultimately found
impractical at this point.

Using a hybrid of tables and json is the current approach; this may be
further normalized later.
2024-06-06 13:19:00 -07:00
Michael Hohn
593644ca2e wip: rename ID to JobId 2024-06-04 13:04:51 -07:00
Michael Hohn
0349961360 wip: start container version of server 2024-06-04 12:24:42 -07:00
Michael Hohn
b9081b1945 wip: convert run-analysis.sh to golang version 2024-05-31 08:24:09 -07:00
Michael Hohn
ba44db04da wip: server is now fully functional, some FIXMEs remain 2024-05-26 12:22:36 -07:00
Michael Hohn
f7155eba50 wip: add analysis runner / agent, separate Server/Queue/Agent, use New* initializers 2024-05-23 15:46:55 -07:00
Michael Hohn
2ab596bf1d wip: Move all references to github.com/hohn/ghes-mirva-server 2024-05-22 14:39:12 -07:00
Michael Hohn
4269bacf2a wip: update store. references to storage. in server.go 2024-05-21 11:45:47 -07:00
Michael Hohn
873339ff06 wip: port submit_response() 2024-05-21 10:51:01 -07:00
Michael Hohn
8cd4f4d809 wip: port queue.StartAnalyses 2024-05-20 20:07:39 -07:00
Michael Hohn
cf595f338a wip: port FileDownload 2024-05-20 14:28:33 -07:00
Michael Hohn
ccf064fe6c wip: replace some references to the old prototype 2024-05-20 14:01:19 -07:00