improve list formatting

2025-04-09 18:23:20 -07:00
parent 47a021d84a
commit 01ddf38069
2 changed files with 242 additions and 214 deletions
--- a/doc/mrva-overview.pdf
+++ b/doc/mrva-overview.pdf
--- a/doc/mrva-overview.tex
+++ b/doc/mrva-overview.tex
@@ -3,6 +3,34 @@
 % Load the geometry package to set margins
 \usepackage[lmargin=2cm,rmargin=2cm,tmargin=1.8cm,bmargin=1.8cm]{geometry}
 % increase nesting depth
 \usepackage{enumitem}
 \setlistdepth{9}
 % 
 \renewlist{itemize}{itemize}{9}
 \setlist[itemize,1]{label=\textbullet}
 \setlist[itemize,2]{label=--}
 \setlist[itemize,3]{label=*}
 \setlist[itemize,4]{label=•}
 \setlist[itemize,5]{label=–}
 \setlist[itemize,6]{label=>}
 \setlist[itemize,7]{label=»}
 \setlist[itemize,8]{label=›}
 \setlist[itemize,9]{label=·}
 %
 \renewlist{enumerate}{enumerate}{9}
 \setlist[enumerate,1]{label=\arabic*.,ref=\arabic*}
 \setlist[enumerate,2]{label=\alph*.),ref=\theenumi\alph*}
 \setlist[enumerate,3]{label=\roman*.),ref=\theenumii\roman*}
 \setlist[enumerate,4]{label=\Alph*.),ref=\theenumiii\Alph*}
 \setlist[enumerate,5]{label=\Roman*.),ref=\theenumiv\Roman*}
 \setlist[enumerate,6]{label=\arabic*),ref=\theenumv\arabic*}
 \setlist[enumerate,7]{label=\alph*),ref=\theenumvi\alph*}
 \setlist[enumerate,8]{label=\roman*),ref=\theenumvii\roman*}
 \setlist[enumerate,9]{label=\Alph*),ref=\theenumviii\Alph*}
 % Load CM Bright for math
 \usepackage{amsmath}  % Standard math package
 \usepackage{amssymb}  % Additional math symbols
@@ -45,9 +73,9 @@
 % Define 'listing' as a floating environment
 \DeclareFloatingEnvironment[
-    fileext=lol,
+fileext=lol,
-    listname=List of Listings,
+listname=List of Listings,
-    name=Listing
+name=Listing
 ]{listing}
 % To prevent floats from moving past a section boundary but still allow some floating:
@@ -91,17 +119,17 @@
 The MRVA system is organized as a collection of services. On the server side, the
 system is containerized using Docker and comprises several key components:
 \begin{itemize}
-    \item {\textbf{Server}}: Acts as the central coordinator.
+\item {\textbf{Server}}: Acts as the central coordinator.
-    \item \textbf{Agents}: One or more agents that execute tasks.
+\item \textbf{Agents}: One or more agents that execute tasks.
-    \item \textbf{RabbitMQ}: Handles messaging between components.
+\item \textbf{RabbitMQ}: Handles messaging between components.
-    \item \textbf{MinIO}: Provides storage for both queries and results.
+\item \textbf{MinIO}: Provides storage for both queries and results.
-    \item \textbf{HEPC}: An HTTP endpoint that hosts and serves CodeQL databases.
+\item \textbf{HEPC}: An HTTP endpoint that hosts and serves CodeQL databases.
 \end{itemize}
 On the client side, users can interact with the system in two ways:
 \begin{itemize}
-    \item {\textbf{VSCode-CodeQL}}: A graphical interface integrated with Visual Studio Code.
+\item {\textbf{VSCode-CodeQL}}: A graphical interface integrated with Visual Studio Code.
-    \item \textbf{gh-mrva CLI}: A command-line interface that connects to the server in a similar way.
+\item \textbf{gh-mrva CLI}: A command-line interface that connects to the server in a similar way.
 \end{itemize}
 This architecture enables a robust and flexible workflow for code analysis, combining a containerized back-end with both graphical and CLI front-end tools.
@@ -114,15 +142,15 @@ overview.
 \subsection{Execution Overview}
 The \textit{MRVA system} is a distributed platform for executing \textit{CodeQL
-queries} across multiple repositories using a set of worker agents. The system is
+  queries} across multiple repositories using a set of worker agents. The system is
 {containerized} and built around a set of core services:
 \begin{itemize}
-    \item \textbf{Server}: Coordinates job distribution and result aggregation.
+\item \textbf{Server}: Coordinates job distribution and result aggregation.
-    \item \textbf{Agents}: Execute queries independently and return results.
+\item \textbf{Agents}: Execute queries independently and return results.
-    \item \textbf{RabbitMQ}: Handles messaging between system components.
+\item \textbf{RabbitMQ}: Handles messaging between system components.
-    \item \textbf{MinIO}: Stores query inputs and execution results.
+\item \textbf{MinIO}: Stores query inputs and execution results.
-    \item \textbf{HEPC}: Serves CodeQL databases over HTTP.
+\item \textbf{HEPC}: Serves CodeQL databases over HTTP.
 \end{itemize}
 Clients interact with MRVA via \texttt{VSCode-CodeQL} (a graphical interface) or
@@ -132,11 +160,11 @@ server.
 The execution process follows a structured workflow:
 \begin{enumerate}
-    \item A client submits a set of queries $\mathcal{Q}$ targeting a repository
+\item A client submits a set of queries $\mathcal{Q}$ targeting a repository
  set $\mathcal{R}$.
-    \item The server enqueues jobs and distributes them to available agents.
+\item The server enqueues jobs and distributes them to available agents.
-    \item Each agent retrieves a job, executes queries against its assigned repository, and accumulates results.
+\item Each agent retrieves a job, executes queries against its assigned repository, and accumulates results.
-    \item The agent sends results back to the server, which then forwards them to the client.
+\item The agent sends results back to the server, which then forwards them to the client.
 \end{enumerate}
 This full round-trip can be expressed as:
@@ -181,8 +209,8 @@ is that both setups follow the same structural approach:
 Thus:
 \begin{itemize}
-    \item The {functional architecture is identical} between the single-machine and cluster setups.
+\item The {functional architecture is identical} between the single-machine and cluster setups.
-    \item The {primary difference} is in \textit{scale}:
+\item The {primary difference} is in \textit{scale}:
  \begin{itemize}
  \item A single machine is limited by \textit{local CPU and RAM}.
  \item A cluster is constrained by \textit{network and inter-node coordination overhead} but allows for higher overall compute capacity.
@@ -195,84 +223,84 @@ Thus:
 The following table enumerates the types (messages) passed from Client to Server.
 \begin{longtable}{|p{5cm}|p{5cm}|p{5cm}|}
-\hline
+  \hline
-\rowcolor{gray!20} \textbf{Type Name} & \textbf{Field} & \textbf{Type} \\
+  \rowcolor{gray!20} \textbf{Type Name} & \textbf{Field} & \textbf{Type} \\
-\hline
+  \hline
-\endfirsthead
+  \endfirsthead
-\hline
+  \hline
-\rowcolor{gray!20} \textbf{Type Name} & \textbf{Field} & \textbf{Type} \\
+  \rowcolor{gray!20} \textbf{Type Name} & \textbf{Field} & \textbf{Type} \\
-\hline
+  \hline
-\endhead
+  \endhead
-\hline
+  \hline
-\endfoot
+  \endfoot
-\hline
+  \hline
-\endlastfoot
+  \endlastfoot
-ServerState & NextID & () $\rightarrow$ int \\
+  ServerState & NextID & () $\rightarrow$ int \\
-& GetResult & JobSpec $\rightarrow$ IO (Either Error AnalyzeResult) \\
+                                        & GetResult & JobSpec $\rightarrow$ IO (Either Error AnalyzeResult) \\
-& GetJobSpecByRepoId & (int, int) $\rightarrow$ IO (Either Error JobSpec) \\
+                                        & GetJobSpecByRepoId & (int, int) $\rightarrow$ IO (Either Error JobSpec) \\
-& SetResult & (JobSpec, AnalyzeResult) $\rightarrow$ IO () \\
+                                        & SetResult & (JobSpec, AnalyzeResult) $\rightarrow$ IO () \\
-& GetJobList & int $\rightarrow$ IO (Either Error \textbf{[AnalyzeJob]}) \\
+                                        & GetJobList & int $\rightarrow$ IO (Either Error \textbf{[AnalyzeJob]}) \\
-& GetJobInfo & JobSpec $\rightarrow$ IO (Either Error JobInfo) \\
+                                        & GetJobInfo & JobSpec $\rightarrow$ IO (Either Error JobInfo) \\
-& SetJobInfo & (JobSpec, JobInfo) $\rightarrow$ IO () \\
+                                        & SetJobInfo & (JobSpec, JobInfo) $\rightarrow$ IO () \\
-& GetStatus & JobSpec $\rightarrow$ IO (Either Error Status) \\
+                                        & GetStatus & JobSpec $\rightarrow$ IO (Either Error Status) \\
-& SetStatus & (JobSpec, Status) $\rightarrow$ IO () \\
+                                        & SetStatus & (JobSpec, Status) $\rightarrow$ IO () \\
-& AddJob & AnalyzeJob $\rightarrow$ IO () \\
+                                        & AddJob & AnalyzeJob $\rightarrow$ IO () \\
-\hline
+  \hline
-JobSpec & sessionID & int \\
+  JobSpec & sessionID & int \\
-& nameWithOwner & string \\
+                                        & nameWithOwner & string \\
-\hline
+  \hline
-AnalyzeResult & spec & JobSpec \\
+  AnalyzeResult & spec & JobSpec \\
-& status & Status \\
+                                        & status & Status \\
-& resultCount & int \\
+                                        & resultCount & int \\
-& resultLocation & ArtifactLocation \\
+                                        & resultLocation & ArtifactLocation \\
-& sourceLocationPrefix & string \\
+                                        & sourceLocationPrefix & string \\
-& databaseSHA & string \\
+                                        & databaseSHA & string \\
-\hline
+  \hline
-ArtifactLocation & Key & string \\
+  ArtifactLocation & Key & string \\
-& Bucket & string \\
+                                        & Bucket & string \\
-\hline
+  \hline
-AnalyzeJob & Spec & JobSpec \\
+  AnalyzeJob & Spec & JobSpec \\
-& QueryPackLocation & ArtifactLocation \\
+                                        & QueryPackLocation & ArtifactLocation \\
-& QueryLanguage & QueryLanguage \\
+                                        & QueryLanguage & QueryLanguage \\
-\hline
+  \hline
-QueryLanguage &  & string \\
+  QueryLanguage &  & string \\
-\hline
+  \hline
-JobInfo & QueryLanguage & string \\
+  JobInfo & QueryLanguage & string \\
-& CreatedAt & string \\
+                                        & CreatedAt & string \\
-& UpdatedAt & string \\
+                                        & UpdatedAt & string \\
-& SkippedRepositories & SkippedRepositories \\
+                                        & SkippedRepositories & SkippedRepositories \\
-\hline
+  \hline
-SkippedRepositories & AccessMismatchRepos & AccessMismatchRepos \\
+  SkippedRepositories & AccessMismatchRepos & AccessMismatchRepos \\
-& NotFoundRepos & NotFoundRepos \\
+                                        & NotFoundRepos & NotFoundRepos \\
-& NoCodeqlDBRepos & NoCodeqlDBRepos \\
+                                        & NoCodeqlDBRepos & NoCodeqlDBRepos \\
-& OverLimitRepos & OverLimitRepos \\
+                                        & OverLimitRepos & OverLimitRepos \\
-\hline
+  \hline
-AccessMismatchRepos & RepositoryCount & int \\
+  AccessMismatchRepos & RepositoryCount & int \\
-& Repositories & \textbf{[Repository]} \\
+                                        & Repositories & \textbf{[Repository]} \\
-\hline
+  \hline
-NotFoundRepos & RepositoryCount & int \\
+  NotFoundRepos & RepositoryCount & int \\
-& RepositoryFullNames & \textbf{[string]} \\
+                                        & RepositoryFullNames & \textbf{[string]} \\
-\hline
+  \hline
-Repository & ID & int \\
+  Repository & ID & int \\
-& Name & string \\
+                                        & Name & string \\
-& FullName & string \\
+                                        & FullName & string \\
-& Private & bool \\
+                                        & Private & bool \\
-& StargazersCount & int \\
+                                        & StargazersCount & int \\
-& UpdatedAt & string \\
+                                        & UpdatedAt & string \\
 \end{longtable}
@@ -313,11 +341,11 @@ The full round-trip execution, from query submission to result delivery, can be
 \]
 \begin{itemize}
-    \item \(C \to S\): Client submits a query suite \(\mathcal{Q}\) to the server.
+\item \(C \to S\): Client submits a query suite \(\mathcal{Q}\) to the server.
-    \item \(S \to Q\): Server enqueues the query suite \((\mathcal{Q}, \mathcal{R}_i)\) for each repository.
+\item \(S \to Q\): Server enqueues the query suite \((\mathcal{Q}, \mathcal{R}_i)\) for each repository.
-    \item \(Q \to \alpha\): Agent \(\alpha\) polls the queue and retrieves a job.
+\item \(Q \to \alpha\): Agent \(\alpha\) polls the queue and retrieves a job.
-    \item \(\alpha \to S\): Agent executes the queries and returns the accumulated results \(\mathcal{R}_i^{\mathcal{Q}}\) to the server.
+\item \(\alpha \to S\): Agent executes the queries and returns the accumulated results \(\mathcal{R}_i^{\mathcal{Q}}\) to the server.
-    \item \(S \to C\): Server sends the complete result set \(\mathcal{R}_i^{\mathcal{Q}}\) for each repository back to the client.
+\item \(S \to C\): Server sends the complete result set \(\mathcal{R}_i^{\mathcal{Q}}\) for each repository back to the client.
 \end{itemize}
 \section{Result Representation}
@@ -330,9 +358,9 @@ For the complete collection of results across all repositories and queries:
 where:
 \begin{itemize}
-    \item \(N\) is the total number of repositories.
+\item \(N\) is the total number of repositories.
-    \item \(M\) is the total number of queries in \(\mathcal{Q}\).
+\item \(M\) is the total number of queries in \(\mathcal{Q}\).
-    \item \(k_{i,j}\) is the number of results from executing query
+\item \(k_{i,j}\) is the number of results from executing query
  \(\mathcal{Q}_j\)
  on repository \(\mathcal{R}_i\).
 \end{itemize}
@@ -354,7 +382,7 @@ Each result can be further indexed to track multiple repositories and result set
 \begin{listing}[H] % h = here, t = top, b = bottom, p = page of floats
  \caption{Distributed Query Execution Algorithm}
-    \begin{lstlisting}[language=Python]
+\begin{lstlisting}[language=Python]
 # Distributed Query Execution with Agent Polling and Accumulated Results
 # Initialization
@@ -519,52 +547,52 @@ $\mathcal{R}_{\text{results}}$ = execute_queries(A, Q, $\mathcal{R}_{\text{resul
 \begin{enumerate}
 \item \textbf{\textbf{Initialization}}
-\begin{itemize}
+  \begin{itemize}
-\item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
+  \item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
-\begin{itemize}
+    \begin{itemize}
-\item Initialize result sets: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
+    \item Initialize result sets: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
-\end{itemize}
+    \end{itemize}
-\item Initialize an empty job queue: \(Q \gets \{\}\).
+  \item Initialize an empty job queue: \(Q \gets \{\}\).
-\end{itemize}
+  \end{itemize}
 \item \textbf{\textbf{Enqueue Queries}}
-\begin{itemize}
+  \begin{itemize}
-\item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
+  \item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
-\begin{itemize}
+    \begin{itemize}
-\item Enqueue the entire query suite: \(S \xrightarrow{\text{enqueue}(\mathcal{Q}, \mathcal{R}_i)} Q\).
+    \item Enqueue the entire query suite: \(S \xrightarrow{\text{enqueue}(\mathcal{Q}, \mathcal{R}_i)} Q\).
-\end{itemize}
+    \end{itemize}
-\end{itemize}
+  \end{itemize}
 \item \textbf{\textbf{Execution Loop}}
-\begin{itemize}
+  \begin{itemize}
-\item While \(Q \neq \emptyset\): (agents poll the queue for available jobs)
+  \item While \(Q \neq \emptyset\): (agents poll the queue for available jobs)
-\begin{itemize}
+    \begin{itemize}
-\item For each available agent \(\alpha \in A\):
+    \item For each available agent \(\alpha \in A\):
-\begin{itemize}
+      \begin{itemize}
-\item Agent autonomously retrieves a job: \(\alpha \xleftarrow{\text{poll}(Q)}\).
+      \item Agent autonomously retrieves a job: \(\alpha \xleftarrow{\text{poll}(Q)}\).
-\item \textbf{\textbf{Agent Execution Block}}
+      \item \textbf{\textbf{Agent Execution Block}}
-\begin{itemize}
+        \begin{itemize}
-\item Initialize result set for this repository: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
+        \item Initialize result set for this repository: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
-\item For each query \(\mathcal{Q}_j \in \mathcal{Q}\):
+        \item For each query \(\mathcal{Q}_j \in \mathcal{Q}\):
-\begin{itemize}
+          \begin{itemize}
-\item Collect results:  
+          \item Collect results:  
-\(\mathcal{R}_i^{\mathcal{Q}_j} \gets \{ r_{i,j,1}, r_{i,j,2}, \dots, r_{i,j,k_{i,j}} \}\).
+            \(\mathcal{R}_i^{\mathcal{Q}_j} \gets \{ r_{i,j,1}, r_{i,j,2}, \dots, r_{i,j,k_{i,j}} \}\).
-\item Accumulate results:  
+          \item Accumulate results:  
-\(\mathcal{R}_i^{\mathcal{Q}} \gets \mathcal{R}_i^{\mathcal{Q}} \cup \mathcal{R}_i^{\mathcal{Q}_j}\).
+            \(\mathcal{R}_i^{\mathcal{Q}} \gets \mathcal{R}_i^{\mathcal{Q}} \cup \mathcal{R}_i^{\mathcal{Q}_j}\).
-\end{itemize}
+          \end{itemize}
-\item Agent sends all accumulated results back to the server:  
+        \item Agent sends all accumulated results back to the server:  
-\(\alpha \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} S\).
+          \(\alpha \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} S\).
-\end{itemize}
+        \end{itemize}
-\end{itemize}
+      \end{itemize}
-\end{itemize}
+    \end{itemize}
-\end{itemize}
+  \end{itemize}
 \item \textbf{\textbf{Server Sends Results}}
-\begin{itemize}
+  \begin{itemize}
-\item Server sends results for repository \(i\) back to the client:  
+  \item Server sends results for repository \(i\) back to the client:  
-\(S \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} C\).
+    \(S \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} C\).
-\end{itemize}
+  \end{itemize}
 \end{enumerate}
 \end{document}
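
Note on the final hunk: the four steps it re-indents (initialization, enqueue, execution loop, result delivery) can be sketched as a small single-process Python simulation. This is an illustrative sketch only, not the MRVA implementation. The names `execute_suite` and `fake_run_query`, the `n_agents` parameter, and the sample query and repository names are assumptions made for the example; a real agent would invoke CodeQL and communicate through RabbitMQ and MinIO.

```python
from collections import deque


def fake_run_query(query, repo):
    """Hypothetical stand-in for a CodeQL run: returns k_{i,j} result strings."""
    return [f"{repo}:{query}:{k}" for k in range(len(query) % 3)]


def execute_suite(queries, repos, n_agents=2, run_query=fake_run_query):
    # Initialization: an empty result set per repository and an empty job queue.
    results = {repo: [] for repo in repos}
    queue = deque()

    # Enqueue: one job per repository, each carrying the whole query suite.
    for repo in repos:
        queue.append((queries, repo))

    # Execution loop: agents poll the queue until it is empty.
    while queue:
        for _agent in range(n_agents):
            if not queue:
                break
            suite, repo = queue.popleft()  # agent retrieves a job
            accumulated = []
            for query in suite:
                # Collect the results of one query, then accumulate them.
                accumulated.extend(run_query(query, repo))
            results[repo] = accumulated  # agent returns results to the server

    # The server forwards the per-repository result sets to the client.
    return results
```

Calling `execute_suite(["q1", "abc"], ["r1", "r2"])` returns one accumulated result list per repository, mirroring the per-repository result set \(\mathcal{R}_i^{\mathcal{Q}}\) in the text. Passing a different `run_query` callable swaps in another query runner without changing the queueing logic.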