improve list formatting

Michael Hohn
2025-04-09 18:23:20 -07:00
committed by =Michael Hohn
parent 47a021d84a
commit 01ddf38069
2 changed files with 242 additions and 214 deletions

@@ -3,6 +3,34 @@
% Load the geometry package to set margins
\usepackage[lmargin=2cm,rmargin=2cm,tmargin=1.8cm,bmargin=1.8cm]{geometry}
% increase nesting depth
\usepackage{enumitem}
\setlistdepth{9}
%
\renewlist{itemize}{itemize}{9}
\setlist[itemize,1]{label=\textbullet}
\setlist[itemize,2]{label=--}
\setlist[itemize,3]{label=*}
\setlist[itemize,4]{label=•}
\setlist[itemize,5]{label=}
\setlist[itemize,6]{label=>}
\setlist[itemize,7]{label=»}
\setlist[itemize,8]{label=}
\setlist[itemize,9]{label=·}
%
\renewlist{enumerate}{enumerate}{9}
\setlist[enumerate,1]{label=\arabic*.,ref=\arabic*}
\setlist[enumerate,2]{label=\alph*.),ref=\theenumi\alph*}
\setlist[enumerate,3]{label=\roman*.),ref=\theenumii\roman*}
\setlist[enumerate,4]{label=\Alph*.),ref=\theenumiii\Alph*}
\setlist[enumerate,5]{label=\Roman*.),ref=\theenumiv\Roman*}
\setlist[enumerate,6]{label=\arabic*),ref=\theenumv\arabic*}
\setlist[enumerate,7]{label=\alph*),ref=\theenumvi\alph*}
\setlist[enumerate,8]{label=\roman*),ref=\theenumvii\roman*}
\setlist[enumerate,9]{label=\Alph*),ref=\theenumviii\Alph*}
% Load CM Bright for math
\usepackage{amsmath} % Standard math package
\usepackage{amssymb} % Additional math symbols
@@ -45,9 +73,9 @@
% Define 'listing' as a floating environment
\DeclareFloatingEnvironment[
  fileext=lol,
  listname=List of Listings,
  name=Listing
]{listing}
% To prevent floats from moving past a section boundary but still allow some floating:
@@ -91,17 +119,17 @@
The MRVA system is organized as a collection of services. On the server side, the
system is containerized using Docker and comprises several key components:
\begin{itemize}
  \item \textbf{Server}: Acts as the central coordinator.
  \item \textbf{Agents}: One or more agents that execute tasks.
  \item \textbf{RabbitMQ}: Handles messaging between components.
  \item \textbf{MinIO}: Provides storage for both queries and results.
  \item \textbf{HEPC}: An HTTP endpoint that hosts and serves CodeQL databases.
\end{itemize}
On the client side, users can interact with the system in two ways:
\begin{itemize}
  \item \textbf{VSCode-CodeQL}: A graphical interface integrated with Visual Studio Code.
  \item \textbf{gh-mrva CLI}: A command-line interface that connects to the server in much the same way as VSCode-CodeQL.
\end{itemize}
This architecture pairs a containerized back end with two front ends, one graphical and one command-line, that both talk to the same server.
@@ -114,15 +142,15 @@ overview.
\subsection{Execution Overview}
The \textit{MRVA system} is a distributed platform for executing \textit{CodeQL
queries} across multiple repositories using a set of worker agents. The system is
containerized and built around a set of core services:
\begin{itemize}
  \item \textbf{Server}: Coordinates job distribution and result aggregation.
  \item \textbf{Agents}: Execute queries independently and return results.
  \item \textbf{RabbitMQ}: Handles messaging between system components.
  \item \textbf{MinIO}: Stores query inputs and execution results.
  \item \textbf{HEPC}: Serves CodeQL databases over HTTP.
\end{itemize}
Clients interact with MRVA via \texttt{VSCode-CodeQL} (a graphical interface) or
@@ -132,11 +160,11 @@ server.
The execution process follows a structured workflow:
\begin{enumerate}
  \item A client submits a set of queries $\mathcal{Q}$ targeting a repository
  set $\mathcal{R}$.
  \item The server enqueues jobs and distributes them to available agents.
  \item Each agent retrieves a job, executes queries against its assigned repository, and accumulates results.
  \item The agent sends results back to the server, which then forwards them to the client.
\end{enumerate}
This full round-trip can be expressed as:
@@ -181,8 +209,8 @@ is that both setups follow the same structural approach:
Thus:
\begin{itemize}
  \item The functional architecture is identical between the single-machine and cluster setups.
  \item The primary difference is in \textit{scale}:
  \begin{itemize}
    \item A single machine is limited by \textit{local CPU and RAM}.
    \item A cluster is constrained by \textit{network and inter-node coordination overhead} but allows for higher overall compute capacity.
@@ -195,84 +223,84 @@ Thus:
The following table lists the \texttt{ServerState} interface and the types (messages) exchanged between client and server.
\begin{longtable}{|p{5cm}|p{5cm}|p{5cm}|}
\hline
\rowcolor{gray!20} \textbf{Type Name} & \textbf{Field or Method} & \textbf{Type} \\
\hline
\endfirsthead
\hline
\rowcolor{gray!20} \textbf{Type Name} & \textbf{Field or Method} & \textbf{Type} \\
\hline
\endhead
\hline
\endfoot
\hline
\endlastfoot
ServerState & NextID & () $\rightarrow$ int \\
& GetResult & JobSpec $\rightarrow$ IO (Either Error AnalyzeResult) \\
& GetJobSpecByRepoId & (int, int) $\rightarrow$ IO (Either Error JobSpec) \\
& SetResult & (JobSpec, AnalyzeResult) $\rightarrow$ IO () \\
& GetJobList & int $\rightarrow$ IO (Either Error \textbf{[AnalyzeJob]}) \\
& GetJobInfo & JobSpec $\rightarrow$ IO (Either Error JobInfo) \\
& SetJobInfo & (JobSpec, JobInfo) $\rightarrow$ IO () \\
& GetStatus & JobSpec $\rightarrow$ IO (Either Error Status) \\
& SetStatus & (JobSpec, Status) $\rightarrow$ IO () \\
& AddJob & AnalyzeJob $\rightarrow$ IO () \\
\hline
JobSpec & sessionID & int \\
& nameWithOwner & string \\
\hline
AnalyzeResult & spec & JobSpec \\
& status & Status \\
& resultCount & int \\
& resultLocation & ArtifactLocation \\
& sourceLocationPrefix & string \\
& databaseSHA & string \\
\hline
ArtifactLocation & Key & string \\
& Bucket & string \\
\hline
AnalyzeJob & Spec & JobSpec \\
& QueryPackLocation & ArtifactLocation \\
& QueryLanguage & QueryLanguage \\
\hline
QueryLanguage & & string \\
\hline
JobInfo & QueryLanguage & string \\
& CreatedAt & string \\
& UpdatedAt & string \\
& SkippedRepositories & SkippedRepositories \\
\hline
SkippedRepositories & AccessMismatchRepos & AccessMismatchRepos \\
& NotFoundRepos & NotFoundRepos \\
& NoCodeqlDBRepos & NoCodeqlDBRepos \\
& OverLimitRepos & OverLimitRepos \\
\hline
AccessMismatchRepos & RepositoryCount & int \\
& Repositories & \textbf{[Repository]} \\
\hline
NotFoundRepos & RepositoryCount & int \\
& RepositoryFullNames & \textbf{[string]} \\
\hline
Repository & ID & int \\
& Name & string \\
& FullName & string \\
& Private & bool \\
& StargazersCount & int \\
& UpdatedAt & string \\
\end{longtable}
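As an illustration only, part of the table can be mirrored in Python. This sketch rests on several assumptions. \texttt{Status} is taken to be a plain string. The integer argument of \texttt{GetJobList} is taken to be a session ID. \texttt{Either Error a} is modeled by returning either an \texttt{Error} instance or the success value. The class \texttt{InMemoryServerState} is a hypothetical, dictionary-backed stand-in, not the MRVA server's actual storage layer, and it implements only a subset of the \texttt{ServerState} methods.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class JobSpec:
    sessionID: int
    nameWithOwner: str


@dataclass(frozen=True)
class ArtifactLocation:
    Key: str
    Bucket: str


@dataclass
class AnalyzeJob:
    Spec: JobSpec
    QueryPackLocation: ArtifactLocation
    QueryLanguage: str  # QueryLanguage is a string alias in the table


@dataclass
class AnalyzeResult:
    spec: JobSpec
    status: str  # assumption: Status modeled as a plain string
    resultCount: int
    resultLocation: ArtifactLocation
    sourceLocationPrefix: str
    databaseSHA: str


class Error(Exception):
    """Left branch of the table's Either Error a (returned, not raised)."""


class InMemoryServerState:
    """Hypothetical dictionary-backed ServerState; a subset of methods only."""

    def __init__(self) -> None:
        self._last_id = 0
        self._jobs: Dict[int, List[AnalyzeJob]] = {}
        self._results: Dict[JobSpec, AnalyzeResult] = {}
        self._status: Dict[JobSpec, str] = {}

    def NextID(self) -> int:
        # () -> int: hand out increasing session IDs, starting at 1
        self._last_id += 1
        return self._last_id

    def AddJob(self, job: AnalyzeJob) -> None:
        # AnalyzeJob -> IO (): group jobs by their session ID
        self._jobs.setdefault(job.Spec.sessionID, []).append(job)

    def GetJobList(self, session_id: int) -> Union[Error, List[AnalyzeJob]]:
        if session_id not in self._jobs:
            return Error(f"no jobs for session {session_id}")
        return list(self._jobs[session_id])

    def SetResult(self, spec: JobSpec, result: AnalyzeResult) -> None:
        self._results[spec] = result

    def GetResult(self, spec: JobSpec) -> Union[Error, AnalyzeResult]:
        return self._results.get(spec, Error(f"no result for {spec}"))

    def SetStatus(self, spec: JobSpec, status: str) -> None:
        self._status[spec] = status

    def GetStatus(self, spec: JobSpec) -> Union[Error, str]:
        return self._status.get(spec, Error(f"no status for {spec}"))
\end{lstlisting}

A caller distinguishes the two outcomes with \texttt{isinstance(value, Error)}, which plays the role of pattern matching on \texttt{Either}. \texttt{JobSpec} is frozen so that it can serve as a dictionary key.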
@@ -313,11 +341,11 @@ The full round-trip execution, from query submission to result delivery, can be
\]
\begin{itemize}
  \item \(C \to S\): Client submits a query suite \(\mathcal{Q}\), targeting the repository set \(\mathcal{R}\), to the server.
  \item \(S \to Q\): Server enqueues one job \((\mathcal{Q}, \mathcal{R}_i)\) for each repository \(\mathcal{R}_i \in \mathcal{R}\).
  \item \(Q \to \alpha\): Agent \(\alpha\) polls the queue and retrieves a job.
  \item \(\alpha \to S\): Agent executes the queries and returns the accumulated results \(\mathcal{R}_i^{\mathcal{Q}}\) to the server.
  \item \(S \to C\): Server sends the complete result set \(\mathcal{R}_i^{\mathcal{Q}}\) for each repository back to the client.
\end{itemize}
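The five message steps above can be simulated with Python's standard \texttt{queue} and \texttt{threading} modules. This is a minimal sketch under stated assumptions. A single \texttt{queue.Queue} stands in for RabbitMQ, threads stand in for agents, and the hypothetical function \texttt{run\_query} replaces actual CodeQL execution. Agents poll with a timeout and stop once the queue stays empty, which simplifies the behavior of long-running MRVA agents.

\begin{lstlisting}[language=Python]
import queue
import threading
from typing import Dict, List


def run_query(query: str, repo: str) -> List[str]:
    # Hypothetical stand-in for executing one CodeQL query on one repository.
    return [f"{query}@{repo}"]


def agent(jobs: queue.Queue, results: queue.Queue) -> None:
    while True:
        try:
            # Q -> alpha: poll the queue for the next job (Q, R_i).
            queries, repo = jobs.get(timeout=0.1)
        except queue.Empty:
            return  # no job arrived within the timeout; this agent stops
        accumulated: List[str] = []
        for q in queries:
            accumulated.extend(run_query(q, repo))
        # alpha -> S: return all accumulated results for this repository.
        results.put((repo, accumulated))


def round_trip(queries: List[str], repos: List[str], n_agents: int = 3) -> Dict[str, List[str]]:
    # C -> S: calling this function hands the query suite and repository set to the server.
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    for repo in repos:
        # S -> Q: one job per repository, each carrying the whole query suite.
        jobs.put((queries, repo))
    workers = [threading.Thread(target=agent, args=(jobs, results)) for _ in range(n_agents)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # S -> C: gather the per-repository result sets and return them to the client.
    delivered: Dict[str, List[str]] = {}
    while not results.empty():
        repo, found = results.get()
        delivered[repo] = found
    return delivered
\end{lstlisting}

Calling \texttt{round\_trip} with two queries and three repositories yields one entry per repository, each holding two results, regardless of which agent handled which job. All jobs are enqueued before the agents start, so an agent's timeout expires only after the queue has drained.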
\section{Result Representation}
@@ -330,9 +358,9 @@ For the complete collection of results across all repositories and queries:
where:
\begin{itemize}
  \item \(N\) is the total number of repositories.
  \item \(M\) is the total number of queries in \(\mathcal{Q}\).
  \item \(k_{i,j}\) is the number of results from executing query \(\mathcal{Q}_j\) on repository \(\mathcal{R}_i\).
\end{itemize}
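When each result is identified by its index triple \((i, j, l)\) with \(1 \le l \le k_{i,j}\), the complete collection holds \(\sum_{i=1}^{N} \sum_{j=1}^{M} k_{i,j}\) results. The Python sketch below illustrates this indexing. The dictionary layout and the names \texttt{collect} and \texttt{results\_for} are hypothetical, and the string values are placeholders for actual query results.

\begin{lstlisting}[language=Python]
from typing import Dict, List, Tuple

Index = Tuple[int, int, int]  # (i, j, l): repository, query, result number


def collect(k: List[List[int]]) -> Dict[Index, str]:
    # k[i - 1][j - 1] holds k_{i,j}; N = len(k) repositories, M = len(k[0]) queries.
    results: Dict[Index, str] = {}
    for i, row in enumerate(k, start=1):
        for j, k_ij in enumerate(row, start=1):
            for l in range(1, k_ij + 1):
                results[(i, j, l)] = f"r_{i},{j},{l}"  # placeholder result
    return results


def results_for(results: Dict[Index, str], i: int, j: int) -> List[str]:
    # R_i^{Q_j}: the results of query Q_j on repository R_i, ordered by l.
    return [results[key] for key in sorted(results) if key[0] == i and key[1] == j]
\end{lstlisting}

For example, take \(N = 2\) and \(M = 3\), with \(k_{1,\cdot} = (2, 0, 1)\) and \(k_{2,\cdot} = (1, 3, 0)\). In that case \texttt{collect} produces \(2+0+1+1+3+0 = 7\) entries. A zero \(k_{i,j}\) simply yields an empty \(\mathcal{R}_i^{\mathcal{Q}_j}\).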
@@ -354,7 +382,7 @@ Each result can be further indexed to track multiple repositories and result set
\begin{listing}[H] % H = exactly here (float package); h = here, t = top, b = bottom, p = page of floats
\caption{Distributed Query Execution Algorithm}
\begin{lstlisting}[language=Python]
# Distributed Query Execution with Agent Polling and Accumulated Results
# Initialization
@@ -519,52 +547,52 @@ $\mathcal{R}_{\text{results}}$ = execute_queries(A, Q, $\mathcal{R}_{\text{resul
\begin{enumerate}
  \item \textbf{Initialization}
  \begin{itemize}
    \item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
    \begin{itemize}
      \item Initialize result sets: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
    \end{itemize}
    \item Initialize an empty job queue: \(Q \gets \{\}\).
  \end{itemize}
  \item \textbf{Enqueue Queries}
  \begin{itemize}
    \item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
    \begin{itemize}
      \item Enqueue the entire query suite: \(S \xrightarrow{\text{enqueue}(\mathcal{Q}, \mathcal{R}_i)} Q\).
    \end{itemize}
  \end{itemize}
  \item \textbf{Execution Loop}
  \begin{itemize}
    \item While \(Q \neq \emptyset\): (agents poll the queue for available jobs)
    \begin{itemize}
      \item For each available agent \(\alpha \in A\):
      \begin{itemize}
        \item Agent autonomously retrieves a job \((\mathcal{Q}, \mathcal{R}_i)\): \(\alpha \xleftarrow{\text{poll}} Q\).
        \item \textbf{Agent Execution Block}
        \begin{itemize}
          \item Initialize result set for this repository: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
          \item For each query \(\mathcal{Q}_j \in \mathcal{Q}\):
          \begin{itemize}
            \item Collect results:
            \(\mathcal{R}_i^{\mathcal{Q}_j} \gets \{ r_{i,j,1}, r_{i,j,2}, \dots, r_{i,j,k_{i,j}} \}\).
            \item Accumulate results:
            \(\mathcal{R}_i^{\mathcal{Q}} \gets \mathcal{R}_i^{\mathcal{Q}} \cup \mathcal{R}_i^{\mathcal{Q}_j}\).
          \end{itemize}
          \item Agent sends all accumulated results back to the server:
          \(\alpha \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} S\).
        \end{itemize}
      \end{itemize}
    \end{itemize}
  \end{itemize}
  \item \textbf{Server Returns Results}
  \begin{itemize}
    \item Server sends results for repository \(\mathcal{R}_i\) back to the client:
    \(S \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} C\).
  \end{itemize}
\end{enumerate}
\end{document}