mrvacommander

Author	SHA1	Message	Date
Michael Hohn	2d92ad51c3	Migrate entries from global Makefile to local	2024-09-26 12:36:31 -07:00
Michael Hohn	f60b55f181	Storage container simplification. Only one container is really needed for large storage: the dbstore container. The demo containers can contain their own data -- it's small and the containers are made for demonstration anyway.	2024-09-13 12:04:30 -07:00
Michael Hohn	a35fc619e6	Use mk. prefix for Makefile time stamps and make git ignore them	2024-09-13 09:44:08 -07:00
Michael Hohn	8dd6c94918	Set up and push fully configured vs code container	2024-09-12 14:05:59 -07:00
Michael Hohn	34958e4cf4	WIP: Working individual containers and docker compose demo	2024-09-12 09:49:25 -07:00
Michael Hohn	1e2df515e3	Set up and push Docker containers for demonstration purposes. These containers take the place of a desktop install.	2024-09-04 15:52:18 -07:00
Michael Hohn	681fcdab8c	Add new containers to streamline setup	2024-08-29 13:22:59 -07:00
Michael Hohn	5021fc824b	Fix: include minio in requirements.txt	2024-08-23 08:17:27 -07:00
Michael Hohn	7d27b910cd	Fix: include CID when filtering in mc-rows-from-mrva-list	2024-08-22 13:58:07 -07:00
Michael Hohn	0d3f4c5e40	Updated requirements for container	2024-08-21 16:25:13 -07:00
Michael Hohn	742b059a49	Add script to list full details for a mrva-list file	2024-08-09 08:37:31 -07:00
Michael Hohn	d1f56ae196	Add explicit language selection	2024-08-09 08:36:48 -07:00
Michael Hohn	b183cee78d	Reformat / rearrange comments	2024-08-02 14:10:54 -07:00
Michael Hohn	5a95f0ea08	Add module comment	2024-08-02 14:04:06 -07:00
Michael Hohn	349d758c14	Move session scripts to separate directory	2024-08-02 13:56:47 -07:00
Michael Hohn	582d933130	Improve example data layout and README	2024-08-01 14:30:40 -07:00
Michael Hohn	b7b4839fe0	Enforce CID uniqueness and save raw refined info immediately. Previously, the refined info was collected and the CID computed before saving. This was a major development time sink, so the CID is now computed in the following step (bin/mc-db-unique). The columns previously chosen for the CID are not enough: if these columns are empty for any reason, the CID repeats. Just including the owner/name won't help, because those are duplicates. Some possibilities considered and rejected: 1. Use a random number for missing columns. But this makes the CID nondeterministic. 2. Switch to the file system ctime. It is not unique across owner/repo pairs, only within one. Also, it could be changed externally and cause very subtle bugs. 3. Use the file system path. It has to be unique at ingestion time, but repo collections can move. Instead, this patch 4. drops rows that don't have the cliVersion, creationTime, language, and sha columns. There are very few (16 out of 6000) and their DBs are questionable.	2024-08-01 11:09:04 -07:00
Michael Hohn	06dcf50728	Sort utils.cid_hash() entries for legibility	2024-07-31 15:20:43 -07:00
Michael Hohn	8f151ab002	Comment update	2024-07-30 16:08:05 -07:00
Michael Hohn	1e1daf9330	Include custom id (CID) to distinguish CodeQL databases. The current api (<2024-07-26 Fri>) is set up only for (owner,name). This is insufficient for distinguishing CodeQL databases. Other differences must be considered; this patch combines the fields cliVersion, creationTime, language, and sha into one called CID. The CID field is a hash of these others and therefore can be changed in the future without affecting workflows or the server. The CID is combined with the owner/name to form one identifier. This requires no changes to server or client -- the db selection's interface is separate from VS Code and gh-mrva in any case. To test this, this version imports multiple versions of the same owner/repo pairs from multiple directories. In this case, from ~/work-gh/mrva/mrva-open-source-download/repos and ~/work-gh/mrva/mrva-open-source-download/repos-2024-04-29/ The unique database count increases from 3000 to 5360 -- see README.md, ./bin/mc-db-view-info < db-info-3.csv & Other code modifications: - Push (owner,repo,cid) names to minio - Generate databases.json for use in vs code extension - Generate list-databases.json for use by gh-mrva client	2024-07-30 10:47:29 -07:00
Michael Hohn	b4f1a2b8a6	Minor comment fix	2024-07-29 13:53:12 -07:00
Michael Hohn	f652a6719c	Comment fix	2024-07-29 13:41:15 -07:00
Michael Hohn	81c44ab14a	Add mc-db-unique as default single-(owner,repo) selector	2024-07-26 14:18:14 -07:00
Michael Hohn	92ca709458	Add mc-db-view-info to view available DBs	2024-07-26 08:40:41 -07:00
Michael Hohn	242ba3fc1e	Add script to populate minio using dataframe previously chosen	2024-07-25 15:14:37 -07:00
Michael Hohn	26dd69c976	minor doc update	2024-07-23 15:18:32 -07:00
Michael Hohn	731b44b187	Add scripts for automatic CodeQL DB data and metadata collection: - updated instructions - CLI scripts mirror the interactive session*.py files	2024-07-23 15:05:03 -07:00
Michael Hohn	aaeafa9e88	Automate metadata collection for all DBs. Several errors are handled. On extraction: ExtractNotZipfile, ExtractNoCQLDB. On detail extraction: DetailsMissing.	2024-07-22 19:12:12 -07:00
Michael Hohn	129b8cc302	interim: collect metadata from one DB zip file	2024-07-22 12:54:57 -07:00
Michael Hohn	d64522d168	Collect CodeQL database information from the file system and save as CSV. This collection already provides significant meta-information: ctime : str = '2024-05-13T12:04:01.593586'; language : str = 'cpp'; name : str = 'nanobind'; owner : str = 'wjakob'; path : Path = Path('/Users/hohn/work-gh/mrva/mrva-open-source-download/repos/wjakob/nanobind/code-scanning/codeql/databases/cpp/db.zip'); size : int = 63083064. There is some more in the db.zip files, to be added.	2024-07-22 11:07:00 -07:00
Michael Hohn	6b4e753e69	Experiment with formats for saving/loading the database index. The .csv.gz format is the simplest and most universal. It's also the smallest on disk. The comparison of saved/reloaded dataframe shows no difference. The ctime_raw column caused serialization problems, so only ctime (in iso-8601 format) is used.	2024-07-12 14:41:05 -07:00
Michael Hohn	3df1cac5ae	Clean up package info	2024-07-10 15:38:59 -07:00
Michael Hohn	dcc32ea8ab	Add documentation style sheet and Makefile entry	2024-07-10 15:27:09 -07:00
Michael Hohn	3c8db9cbe4	Put the DB code into a package	2024-07-10 15:04:09 -07:00
Michael Hohn	2df48b9f98	Collect DB information from file system and render it	2024-07-10 09:11:21 -07:00
Michael Hohn	8d80272922	Add client/ setup and plan	2024-07-09 10:37:41 -07:00
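
Commits 1e1daf9330 and b7b4839fe0 describe the CID as a hash over the cliVersion, creationTime, language, and sha fields, combined with owner/name, with rows lacking those fields dropped. The following is only a rough sketch of that idea, not the repository's code: the function names, the choice of SHA-256, the 12-character truncation, and the dict-based rows are all assumptions made for illustration, and the real utils.cid_hash() may work differently.

```python
import hashlib

# Field names taken from the commit messages; everything else is hypothetical.
CID_FIELDS = ("cliVersion", "creationTime", "language", "sha")


def cid_sketch(row, length=12):
    """Hypothetical stand-in for utils.cid_hash(): hash the CID fields of one row."""
    # Iterate over the fields in sorted order so the result never depends on
    # how the row dict happens to be ordered.
    joined = "\x1f".join(str(row[field]) for field in sorted(CID_FIELDS))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:length]


def has_cid_fields(row):
    """True when every CID field is present and non-empty."""
    return all(row.get(field) not in (None, "") for field in CID_FIELDS)


def unique_rows(rows):
    """Drop rows lacking CID fields, then keep one row per (owner, name, cid)."""
    selected = {}
    for row in rows:
        if not has_cid_fields(row):
            continue  # mirrors "drops rows that don't have the ... columns"
        key = (row["owner"], row["name"], cid_sketch(row))
        selected.setdefault(key, row)  # first occurrence wins
    return selected
```

Because the hash uses only stored metadata, it is deterministic, unlike the random-number alternative rejected in b7b4839fe0, and it does not depend on where the database sits on disk.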

36 Commits
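
Commit 6b4e753e69 settles on .csv.gz for the database index and keeps ctime as an ISO-8601 string. A minimal sketch of such a save/reload comparison follows, using only the Python standard library rather than the dataframe library the project uses; the function names, the column subset, and the file name db-info.csv.gz are assumptions for illustration. The example row reuses the values quoted in commit d64522d168.

```python
import csv
import gzip
import os
import tempfile
from datetime import datetime

# Subset of the columns listed in commit d64522d168 (path omitted for brevity).
FIELDS = ["ctime", "language", "name", "owner", "size"]


def save_index(rows, path):
    """Write rows to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_index(path):
    """Read rows back, restoring the integer type of the size column."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    for row in rows:
        row["size"] = int(row["size"])  # CSV stores everything as text
    return rows


def roundtrip_ok(rows):
    """Save and reload in a temporary directory; True if nothing changed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "db-info.csv.gz")  # hypothetical file name
        save_index(rows, path)
        return load_index(path) == rows


example = [{
    # Storing ctime as ISO-8601 text sidesteps serializing a raw timestamp.
    "ctime": datetime(2024, 5, 13, 12, 4, 1, 593586).isoformat(),
    "language": "cpp",
    "name": "nanobind",
    "owner": "wjakob",
    "size": 63083064,
}]
```

Calling roundtrip_ok(example) performs the same kind of saved/reloaded comparison the commit mentions, on a single row.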