Michael Hohn b7b4839fe0 Enforce CID uniqueness and save raw refined info immediately
Previously, the refined info was collected and the CID computed before saving.
This was a major development time sink, so the CID is now computed in the
following step (bin/mc-db-unique).

The columns previously chosen for the CID are not enough.  If these columns are
empty for any reason, the CID repeats.  Just including the owner/name won't help,
because those are duplicates.

Some possibilities considered and rejected:
1. Could use a random number for missing columns.  But this makes
   the CID nondeterministic.
2. Switch to the file system ctime?  Not unique across owner/repo pairs,
   but unique within one.  Also, this could be changed externally and cause
   *very* subtle bugs.
3. Use the file system path?  It has to be unique at ingestion time, but
   repo collections can move.

Instead, this patch
4. Drops rows that don't have the
   | cliVersion   |
   | creationTime |
   | language     |
   | sha          |
   columns.  There are very few (16 out of 6000) and their DBs are
   quesionable.
2024-08-01 11:09:04 -07:00
2024-07-03 08:55:27 -07:00
2024-07-10 15:38:59 -07:00
2024-06-26 10:35:10 -07:00
2024-06-26 10:35:10 -07:00
2024-06-16 19:43:29 -07:00
2024-06-16 19:43:29 -07:00
2024-05-07 15:38:59 -07:00
2024-06-26 10:21:52 -07:00
2024-07-10 15:38:59 -07:00

Overview

TODO diagram

TODO Style notes

  • NO package init() functions
  • Dynamic behaviour must be explicit

Client CodeQL Database Selector

Separate from the server's downloading of databases, a client-side interface is needed to generate the databases.json file. This

  1. must be usable from the shell
  2. must be interactive (Python, Jupyter)
  3. is session based to allow iterations on selection / narrowing
  4. must be queryable. There is no need to reinvent sql / dataframes

Python with dataframes is ideal for this; the project is in client/.

Reverse proxy

For testing, replay flows using mitmweb. This is faster and simpler than using gh-mrva or the VS Code plugin.

  • Set up the virtual environment and install tools

    python3.11 -m venv venv
    source venv/bin/activate
    pip install mitmproxy
    

For intercepting requests:

  1. Start mitmproxy to listen on port 8080 and forward requests to port 8081, with web interface

    mitmweb --mode reverse:http://localhost:8081 -p 8080
    
  2. Change server ports in docker-compose.yml to

    ports:
    - "8081:8080" # host:container
    
  3. Start the containers.

  4. Submit requests.

  5. Save the flows for later replay.

One such session is in tools/mitmweb-flows; it can be loaded to replay the requests:

  1. start mitmweb --mode reverse:http://localhost:8081 -p 8080
  2. file > open > tools/mitmweb-flows
  3. replay at least the submit, status, and download requests

Cross-compile server on host, run it in container

These are simple steps using a single container.

  1. build server on host

    GOOS=linux GOARCH=arm64 go build
    
  2. build docker image

    cd cmd/server
    docker build -t server-image .
    
  3. Start container with shared directory

    docker run -it \
           -v   /Users/hohn/work-gh/mrva/mrvacommander:/mrva/mrvacommander \
           server-image
    
  4. Run server in container

    cd /mrva/mrvacommander/cmd/server/ && ./server
    

Using docker-compose

Steps to build and run the server

Steps to build and run the server in a multi-container environment set up by docker-compose.

  1. Built the server-image, above

  2. Build server on host

    cd ~/work-gh/mrva/mrvacommander/cmd/server/
    GOOS=linux GOARCH=arm64 go build
    
  3. Start the containers

    cd ~/work-gh/mrva/mrvacommander/
    docker-compose down
    docker-compose up -d
    
  4. Run server in its container

    cd ~/work-gh/mrva/mrvacommander/
    docker exec -it server bash
    cd /mrva/mrvacommander/cmd/server/ 
    ./server -loglevel=debug -mode=container
    
  5. Test server from the host via

    cd ~/work-gh/mrva/mrvacommander-1/tools
    sh ./request_16-Jun-2024_11-33-16.curl
    
  6. Follow server logging via

    cd ~/work-gh/mrva/mrvacommander-1
    docker-compose up -d
    docker-compose logs -f server
    
  7. Completely rebuild all containers. Useful when running into docker errors

    cd ~/work-gh/mrva/mrvacommander-1
    docker-compose up --build
    
  8. Test server via remote client by following the steps in gh-mrva

Some general docker-compose commands

  1. Get service status

    docker-compose ps
    
  2. Stop services

    docker-compose down
    
  3. View all logs

    docker-compose logs
    
  4. check containers from server container

    docker exec -it server bash
    curl -I http://rabbitmq:15672
    

Use the minio ql database db

  1. Web access via

    open http://localhost:9001/login
    

    username / password are in docker-compose.yml for now. The ql db listing will be at

    http://localhost:9001/browser/qldb
    
  2. Populate the database by running

    ./populate-dbstore.sh
    

    from the host.

  3. The names in the bucket use the owner_repo format for now, e.g. google_flatbuffers_db.zip. TODO This will be enhanced to include other data later

  4. Test Go's access to the dbstore -- from the host -- via

    cd ./test
    go test -v
    

    This should produce

    === RUN   TestDBListing
    dbstore_test.go:44: Object Key: google_flatbuffers_db.zip
    dbstore_test.go:44: Object Key: psycopg_psycopg2_db.zip
    

Use the minio query pack db

  1. Web access via

    open http://localhost:19001/login
    

    username / password are in docker-compose.yml for now. The ql db listing will be at

    http://localhost:19001/browser/qpstore
    

To run Use the minio query pack db

Description
No description provided
Readme Apache-2.0 1.2 GiB
Languages
Go 47.7%
Jupyter Notebook 21.2%
Python 17.4%
CSS 6.3%
Dockerfile 3.6%
Other 3.7%