michael hohn 76333d0092 Add hepc-serve-global and stabilize container startup
- Introduce hepc-serve-global to serve global MRVA values from
  hohnlab.org/mrva/values without local DB provisioning.
- Keep schema initialization symmetric across server and agent, while
  serializing PostgreSQL DDL via a global advisory lock to prevent
  concurrent CREATE TABLE races.
- Pin RabbitMQ image to rabbitmq:3.13.7-management to avoid credential
  incompatibilities introduced by upstream image changes.
- Remove pre-hashed RabbitMQ credentials and return to deterministic
  user/password initialization.
- Eliminate reliance on implicit container state to ensure reproducible
  startup.

The primary purpose of this change is integration of global MRVA values;
the remaining fixes are required to make the new startup path reliable.
2026-01-08 16:17:34 -08:00

Introduction to hepc (HTTP End Point for CodeQL)

Note on Global Data

hepc requires a pre-existing, complete, self-consistent metadata set. Creation and maintenance of this metadata is intentionally outside the scope of hepc itself.

There are several supported ways to obtain such a metadata set:

  1. The sections "Usage Sample for Containers" and "Form link to DBs from filesystem, create metadata.sql" illustrate how to scan local file systems and generate a compatible metadata.sql. This approach is appropriate if you maintain your own CodeQL databases; in that case, host-hepc-init and host-hepc-serve can be used without significant changes.
  2. hepc-serve-global references a curated collection of open-source projects' CodeQL databases. This requires a network connection but no local setup. The collection is currently around 70 GB and is expected to grow beyond 160 GB; it is therefore unsuitable for inclusion in a Docker image. At present, it contains approximately 3000 databases. For experimentation at medium scale, this is the recommended approach.
  3. host-hepc-serve implements a simple HTTP API used by the other MRVA containers. You may replace this service entirely by implementing a compatible server and adjusting the container wiring accordingly.

In all cases, hepc operates strictly as a reader of metadata and does not create, modify, or repair metadata sets.

Usage Sample for Containers

This is for container preparation; all operations produce full, standalone copies. For a host service with full storage, see "Usage Sample for Full Machines".

  cd ~/work-gh/mrva/mrvahepc 
  uv sync  # one-time install; uv-run shebangs avoid manual activation

  # Collect DBs from filesystem
  cd ~/work-gh/mrva/mrvahepc && rm -fR db-collection.tmp/
  export MRVA_HEPC_ENDPOINT=http://hepc
  ./bin/mc-hepc-init --db_collection_dir db-collection.tmp \
                     --starting_path ~/work-gh/mrva/mrva-open-source-download/ \
                     --max_dbs 17

  # Serve collected DBs plus metadata
  cd ~/work-gh/mrva/mrvahepc 
  ./bin/mc-hepc-serve --codeql-db-dir db-collection.tmp

  # Test server
  curl 127.0.0.1:8070/index -o - 2>/dev/null | wc -l

  curl 127.0.0.1:8070/api/v1/latest_results/codeql-all \
       -o - 2>/dev/null | wc -l

  url=$(curl 127.0.0.1:8070/api/v1/latest_results/codeql-all \
             -o - 2>/dev/null | head -1 | jq -r .result_url)
  echo $url
  # http://hepc/db/db-collection.tmp/aircrack-ng-aircrack-ng-ctsj-41ebbe.zip

  wget $(echo $url|sed 's|http://hepc|http://127.0.0.1:8070|g;')
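The sed rewrite above maps the internal container hostname to the locally published port. The same step can be sketched in Python; the sample line below uses the example result_url shown in the comment above, and the replacement mirrors the sed expression:

```python
import json

# Sample line as returned by /api/v1/latest_results/codeql-all
# (the URL is the example shown earlier in this README).
line = ('{"result_url": "http://hepc/db/db-collection.tmp/'
        'aircrack-ng-aircrack-ng-ctsj-41ebbe.zip"}')

record = json.loads(line)
# Rewrite the internal hostname to the locally published port,
# mirroring: sed 's|http://hepc|http://127.0.0.1:8070|g;'
local_url = record["result_url"].replace("http://hepc", "http://127.0.0.1:8070")
print(local_url)
```

This is the same transformation the wget line performs before downloading the database archive.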

Usage Sample for Full Machines

This is for providing a DB service from a complete machine, with DBs already in place. All operations produce links to existing DBs, not copies.

Form link to DBs from filesystem, create metadata.sql

  cd ~/work-gh/mrva/mrvahepc 
  uv sync

  # Form link to DBs from filesystem, create metadata.sql
  cd ~/work-gh/mrva/mrvahepc && rm -fR db-collection-host.tmp/
  workers=$(( $(nproc) * 2 ))
  ./bin/host-hepc-init --db_collection_dir db-collection-host.tmp \
                       --starting_path ~/work-gh/mrva/mrva-open-source-download/ \
                       --max_dbs 8000 \
                       --max_workers=$workers \
                       > db-collection-host.log 2>&1
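The worker sizing above, twice the logical CPU count, can be expressed in Python, with os.cpu_count() playing the role of nproc:

```python
import os

# Twice the logical CPU count, mirroring: workers=$(( $(nproc) * 2 ))
# os.cpu_count() can return None on some platforms; fall back to 1.
workers = (os.cpu_count() or 1) * 2
print(workers)
```

Oversubscribing by a factor of two is a common choice for I/O-bound scanning work, where workers spend much of their time waiting on the filesystem.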

Inspect metadata.sql

  cd ~/work-gh/mrva/mrvahepc 
  uv sync

  # Inspect metadata.sql
  cd ~/work-gh/mrva/mrvahepc 
  sqlite3 db-collection-host.tmp/metadata.sql <<eoo
  .tables
  select count(*) from metadata;
  .mode column
  .width 13 13 13 13 13 13 13 13 13 13 13
  select * from metadata limit 1;
  eoo
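The same inspection can be done from Python's sqlite3 module. The sketch below builds a throwaway in-memory database with a hypothetical metadata table (the real column layout may differ) so that the queries are runnable; point the connection at db-collection-host.tmp/metadata.sql to inspect a real collection:

```python
import sqlite3

# In-memory stand-in; replace ":memory:" with
# "db-collection-host.tmp/metadata.sql" to inspect a real collection.
con = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only; the real table
# has more columns (the .width line above suggests eleven).
con.execute("CREATE TABLE metadata (name TEXT, language TEXT)")
con.execute("INSERT INTO metadata VALUES ('aircrack-ng', 'cpp')")

# Equivalent of the .tables dot command
tables = [r[0] for r in
          con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)

# Equivalent of: select count(*) from metadata;
(count,) = con.execute("SELECT count(*) FROM metadata").fetchone()
print(count)
```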

Use metadata.sql to provide DB selection UI

Here, just run

  cd ~/work-gh/mrva/mrvahepc 
  uv sync

  # Use metadata.sql to provide DB selection UI
  cd ~/work-gh/mrva/mrvahepc 
  ./bin/db-selector-gui --metadata_db_path db-collection-host.tmp/metadata.sql

The docker compose file mounts the host paths recorded in db-collection-host.tmp/metadata.sql as hepc volumes.
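A sketch of what such a volume mount might look like in the compose file; the service name and host path below are illustrative placeholders, not the actual wiring:

```yaml
# Hypothetical fragment: "hepc" and the host path are placeholders.
services:
  hepc:
    volumes:
      # Mount the host DB location read-only at the same path inside
      # the container, so paths stored in metadata.sql resolve unchanged.
      - /host/path/to/codeql-dbs:/host/path/to/codeql-dbs:ro
```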

TODO Serve collected DBs plus metadata

  # Serve collected DBs plus metadata
  cd ~/work-gh/mrva/mrvahepc 
  ./bin/mc-hepc-serve --codeql-db-dir db-collection.tmp

  # Test server
  curl 127.0.0.1:8070/index -o - 2>/dev/null | wc -l

  curl 127.0.0.1:8070/api/v1/latest_results/codeql-all \
       -o - 2>/dev/null | wc -l

  url=$(curl 127.0.0.1:8070/api/v1/latest_results/codeql-all \
             -o - 2>/dev/null | head -1 | jq -r .result_url)
  echo $url
  # http://hepc/db/db-collection.tmp/aircrack-ng-aircrack-ng-ctsj-41ebbe.zip

  wget $(echo $url|sed 's|http://hepc|http://127.0.0.1:8070|g;')

Installation

  • Use uv to install dependencies without manually activating a venv

      cd ~/work-gh/mrva/mrvahepc
      uv sync
    

    uv sync installs everything declared in pyproject.toml into a managed .venv and caches the wheels; the bin/ scripts already use uv run in their shebangs, so you can execute them directly.

  • Local development

      cd ~/work-gh/mrva/mrvahepc
      uv sync
      uv pip install --editable .
    

    The --editable install is optional; use ./bin/* directly to avoid relying on entry points in your shell.

  • Full installation

    pip install mrvahepc
    

Use as library

The best way to examine the code is to start from the high-level scripts in bin/.

License: Apache-2.0