Include custom id (CID) to distinguish CodeQL databases
The current API (<2024-07-26 Fri>) is keyed only on (owner,name), which is
insufficient for distinguishing CodeQL databases.
Other differences must be considered; this patch combines the fields
| cliVersion |
| creationTime |
| language |
| sha |
into one called CID. The CID field is a hash of these fields and can therefore be
changed in the future without affecting workflows or the server.
The CID is combined with the owner/name to form one
identifier. This requires no changes to server or client -- the db
selection's interface is separate from VS Code and gh-mrva in any case.
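
A minimal sketch of how such a CID could be derived and attached to owner/name. The four field names come from the table above; the hash function (SHA-256), the `|` join, the 6-character truncation, and the `owner/name/cid` layout are all assumptions for illustration, and `make_cid` and `make_db_key` are hypothetical helper names, not the ones used in this patch.

```python
import hashlib

# Field names are taken from the commit message. Everything else here
# (SHA-256, '|' separator, 6 hex characters) is an illustrative assumption.
CID_FIELDS = ("cliVersion", "creationTime", "language", "sha")


def make_cid(row: dict, fields=CID_FIELDS, length: int = 6) -> str:
    """Hash the selected metadata fields into a short hex id."""
    joined = "|".join(str(row[f]) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:length]


def make_db_key(owner: str, name: str, cid: str) -> str:
    """Hypothetical combined identifier; the real separator is not stated."""
    return f"{owner}/{name}/{cid}"
```

Because callers only see the resulting string, the set of hashed fields can be changed inside `make_cid` without touching code that consumes the combined identifier.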
To test this, the patch imports multiple versions of the same owner/repo pairs from multiple directories, here from
~/work-gh/mrva/mrva-open-source-download/repos
and
~/work-gh/mrva/mrva-open-source-download/repos-2024-04-29/
The unique database count increases from 3000 to 5360 -- see README.md,
./bin/mc-db-view-info < db-info-3.csv &
Other code modifications:
- Push (owner,repo,CID) names to MinIO
- Generate databases.json for use in the VS Code extension
- Generate list-databases.json for use by the gh-mrva client
Committed by Michael Hohn
parent b4f1a2b8a6
commit 1e1daf9330
@@ -1,7 +1,8 @@
 #!/usr/bin/env python
 """ Read a table of CodeQL DB information,
-group entries by (owner,name), sort each group by
-creationTime and keep only the top (newest) element.
+group entries by (owner,name,CID),
+sort each group by creationTime,
+and keep only the top (newest) element.
 """
 import argparse
 import logging
@@ -32,8 +33,8 @@ import sys
 
 df0 = pd.read_csv(sys.stdin)
 
-df_sorted = df0.sort_values(by=['owner', 'name', 'creationTime'])
-df_unique = df_sorted.groupby(['owner', 'name']).first().reset_index()
+df_sorted = df0.sort_values(by=['owner', 'name', 'CID', 'creationTime'])
+df_unique = df_sorted.groupby(['owner', 'name', 'CID']).first().reset_index()
 
 df_unique.to_csv(sys.stdout, index=False)
 
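
The effect of the changed grouping key can be checked on a small table. The sketch below reuses the two `sort_values`/`groupby` expressions from the diff. The rows are made up for illustration, and only the column names are taken from the script.

```python
import io

import pandas as pd

# Made-up rows for illustration; only the column names come from the script.
csv_text = """owner,name,CID,creationTime
acme,widget,aaa111,2024-01-10
acme,widget,aaa111,2024-03-05
acme,widget,bbb222,2024-04-29
other,tool,ccc333,2024-02-01
"""
df0 = pd.read_csv(io.StringIO(csv_text))

# Old grouping: one row per (owner, name).
df_old = (df0.sort_values(by=['owner', 'name', 'creationTime'])
             .groupby(['owner', 'name']).first().reset_index())

# New grouping: one row per (owner, name, CID).
df_new = (df0.sort_values(by=['owner', 'name', 'CID', 'creationTime'])
             .groupby(['owner', 'name', 'CID']).first().reset_index())

print(len(df_old), len(df_new))  # prints "2 3"
```

With the old key the two `acme/widget` databases collapse into one row; with the new key both CIDs survive. Note that `sort_values` sorts ascending by default, so `first()` here returns the row with the earliest `creationTime` in each group. If the newest entry is intended, as the docstring says, `last()` or a descending sort on `creationTime` would select it.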