mrvacommander/client/qldbtools
Michael Hohn b7b4839fe0 Enforce CID uniqueness and save raw refined info immediately
Previously, the refined info was collected and the CID computed before saving.
This was a major development time sink, so the CID is now computed in the
following step (bin/mc-db-unique).

The columns previously chosen for the CID are not enough.  If these columns are
empty for any reason, the CID repeats.  Just including the owner/name won't help,
because those are duplicates.

Some possibilities considered and rejected:
1. Could use a random number for missing columns.  But this makes
   the CID nondeterministic.
2. Switch to the file system ctime?  Not unique across owner/repo pairs,
   but unique within one.  Also, this could be changed externally and cause
   *very* subtle bugs.
3. Use the file system path?  It has to be unique at ingestion time, but
   repo collections can move.

Instead, this patch
4. Drops rows that don't have the
   | cliVersion   |
   | creationTime |
   | language     |
   | sha          |
   columns.  There are very few (16 out of 6000) and their DBs are
   questionable.
2024-08-01 11:09:04 -07:00

qldbtools

qldbtools is a Python package for working with CodeQL databases.

Installation

  • Set up the virtual environment and install tools

            cd ~/work-gh/mrva/mrvacommander/client/qldbtools/
            python3.11 -m venv venv
            source venv/bin/activate
            pip install --upgrade pip
    
            # From requirements.txt
            pip install -r requirements.txt
            # Or explicitly
            pip install jupyterlab pandas ipython
            pip install lckr-jupyterlab-variableinspector
    
  • Run jupyterlab

            cd ~/work-gh/mrva/mrvacommander/client
            source qldbtools/venv/bin/activate
            jupyter lab &
    
    The variable inspector is available via right-click on an open console or notebook.
    
    The `jupyter` command produces output including
    
            Jupyter Server 2.14.1 is running at:
            http://127.0.0.1:8888/lab?token=4c91308819786fe00a33b76e60f3321840283486457516a1
    
    Use this URL to connect multiple front ends.
    
  • Local development

    ```bash
    cd ~/work-gh/mrva/mrvacommander/client/qldbtools
    source venv/bin/activate
    pip install --editable .
    ```
    
    The `--editable` install *should* use symlinks for all scripts; run them via `./bin/*` to be sure.
    
  • Full installation

    ```bash
    pip install qldbtools
    ```
    

Use as library

    import qldbtools as ql

Command-line use

Initial information collection requires a unique file path, so that it can be run repeatedly over DB collections that share the same (owner, name) but differ in one or more of

  • creationTime
  • sha
  • cliVersion
  • language

Those fields are collected and a single name addendum is formed in bin/mc-db-refine-info.
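
As an illustration of how four fields can be folded into one short, deterministic name addendum, here is a minimal sketch. It is not the code in bin/mc-db-refine-info; the helper `name_addendum`, the use of SHA-256, and the default length of 6 hex characters are all assumptions made for this example.

```python
import hashlib


def name_addendum(cli_version, creation_time, language, sha, length=6):
    """Hypothetical helper: derive a short hex id from the four fields.

    The same inputs always give the same id, so reruns are reproducible.
    """
    # Join with a separator that does not occur in the sample field values.
    key = "|".join([cli_version, creation_time, language, sha])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:length]


# Field values taken from the two sample rows in the Notes section.
first_fields = ("2.17.0", "2024-05-08 12:13:10.038007+00:00", "cpp",
                "5ca7f4e59c2d7b0a93fb801a31138477f7b4a761")
second_fields = ("2.18.0", "2024-07-18 11:13:01.673231+00:00", "cpp",
                 "db3435138781937e9e0e999abbaa53f1d3afb5b7")

first_id = name_addendum(*first_fields)
second_id = name_addendum(*second_fields)
print(first_id, second_id)
```

A short id like this can still collide, and it repeats whenever the input fields are empty, which is why uniqueness has to be checked in a separate step.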

The command sequence, grouped by data files, is

    cd ~/work-gh/mrva/mrvacommander/client/qldbtools
    ./bin/mc-db-initial-info ~/work-gh/mrva/mrva-open-source-download > db-info-1.csv
    ./bin/mc-db-refine-info < db-info-1.csv > db-info-2.csv
   
    ./bin/mc-db-view-info < db-info-2.csv &
    ./bin/mc-db-unique < db-info-2.csv > db-info-3.csv
    ./bin/mc-db-view-info < db-info-3.csv &

    ./bin/mc-db-populate-minio -n 23 < db-info-3.csv
    ./bin/mc-db-generate-selection -n 23 vscode-selection.json gh-mrva-selection.json < db-info-3.csv 
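
The uniqueness step between db-info-2.csv and db-info-3.csv can be pictured with the following sketch, which follows the policy described in the commit message: rows missing any of the four fields are dropped, and exact repeats are kept only once. This is not the implementation of bin/mc-db-unique; the function name, the `owner` and `name` column headers, and the sample data are assumptions for illustration.

```python
import csv
import io

# Column names as listed in this README.
REQUIRED = ("cliVersion", "creationTime", "language", "sha")


def drop_incomplete_and_duplicate(rows, key_columns=REQUIRED):
    """Hypothetical filter: skip rows with an empty key column, then
    keep only the first row for each (owner, name, key columns) tuple."""
    seen = set()
    kept = []
    for row in rows:
        values = tuple((row.get(col) or "").strip() for col in key_columns)
        if not all(values):
            continue  # at least one required field is empty
        key = (row.get("owner"), row.get("name")) + values
        if key in seen:
            continue  # exact repeat of an earlier row
        seen.add(key)
        kept.append(row)
    return kept


# Made-up sample data, not taken from a real db-info-2.csv.
sample = """owner,name,cliVersion,creationTime,language,sha
o1,r1,2.17.0,2024-05-08,cpp,aaa
o1,r1,2.17.0,2024-05-08,cpp,aaa
o1,r1,,,cpp,
o1,r1,2.18.0,2024-07-18,cpp,bbb
"""
rows = list(csv.DictReader(io.StringIO(sample)))
kept = drop_incomplete_and_duplicate(rows)
print(len(rows), "rows in,", len(kept), "rows kept")
```

Dropping incomplete rows keeps the result deterministic, unlike filling the gaps with random values.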

Notes

The preview-data plugin for VS Code has a bug; it displays 0 instead of 0e3379 for the first of the following rows. Other entries show the same problem.

    CleverRaven,Cataclysm-DDA,0e3379,2.17.0,2024-05-08 12:13:10.038007+00:00,cpp,5ca7f4e59c2d7b0a93fb801a31138477f7b4a761,578098.0,/Users/hohn/work-gh/mrva/mrva-open-source-download/repos-2024-04-29/CleverRaven/Cataclysm-DDA/code-scanning/codeql/databases/cpp/db.zip,cpp,C/C++,1228.0,578098.0,2024-05-13T12:14:54.650648,cpp,True,4245,563435469
    CleverRaven,Cataclysm-DDA,3231f7,2.18.0,2024-07-18 11:13:01.673231+00:00,cpp,db3435138781937e9e0e999abbaa53f1d3afb5b7,579532.0,/Users/hohn/work-gh/mrva/mrva-open-source-download/repos/CleverRaven/Cataclysm-DDA/code-scanning/codeql/databases/cpp/db.zip,cpp,C/C++,1239.0,579532.0,2024-07-24T02:33:23.900885,cpp,True,1245,573213726
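
One plausible explanation, stated here as an assumption and not verified against the plugin, is that `0e3379` is read as a number in scientific notation (0 times 10 to the power 3379), which is zero. The sketch below shows that Python's `float` accepts the string in the same way, and that the standard `csv` module leaves the field as text. The helper `looks_numeric` is hypothetical.

```python
import csv
import io


def looks_numeric(text):
    """Hypothetical check: would a numeric parser accept this id?"""
    try:
        float(text)
    except ValueError:
        return False
    return True


# "0e3379" is valid scientific notation and evaluates to zero.
as_number = float("0e3379")

# The first four fields of the first sample row above.
line = "CleverRaven,Cataclysm-DDA,0e3379,2.17.0\n"
fields = next(csv.reader(io.StringIO(line)))

print(as_number, fields[2])
print(looks_numeric("0e3379"), looks_numeric("3231f7"))
```

Ids made up only of digits are exposed to the same risk, for example by losing leading zeros. When loading these files with pandas, passing `dtype=str` to `read_csv` keeps such columns as text.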