improve list formatting

Michael Hohn
2025-04-09 18:23:20 -07:00
committed by Michael Hohn
parent 47a021d84a
commit 01ddf38069
2 changed files with 242 additions and 214 deletions

Binary file not shown.


@@ -3,6 +3,34 @@
% Load the geometry package to set margins
\usepackage[lmargin=2cm,rmargin=2cm,tmargin=1.8cm,bmargin=1.8cm]{geometry}
% increase nesting depth
\usepackage{enumitem}
\setlistdepth{9}
%
\renewlist{itemize}{itemize}{9}
\setlist[itemize,1]{label=\textbullet}
\setlist[itemize,2]{label=--}
\setlist[itemize,3]{label=*}
\setlist[itemize,4]{label=•}
\setlist[itemize,5]{label=}
\setlist[itemize,6]{label=>}
\setlist[itemize,7]{label=»}
\setlist[itemize,8]{label=}
\setlist[itemize,9]{label=·}
%
\renewlist{enumerate}{enumerate}{9}
\setlist[enumerate,1]{label=\arabic*.,ref=\arabic*}
\setlist[enumerate,2]{label=\alph*.),ref=\theenumi\alph*}
\setlist[enumerate,3]{label=\roman*.),ref=\theenumii\roman*}
\setlist[enumerate,4]{label=\Alph*.),ref=\theenumiii\Alph*}
\setlist[enumerate,5]{label=\Roman*.),ref=\theenumiv\Roman*}
\setlist[enumerate,6]{label=\arabic*),ref=\theenumv\arabic*}
\setlist[enumerate,7]{label=\alph*),ref=\theenumvi\alph*}
\setlist[enumerate,8]{label=\roman*),ref=\theenumvii\roman*}
\setlist[enumerate,9]{label=\Alph*),ref=\theenumviii\Alph*}
% Load CM Bright for math
\usepackage{amsmath} % Standard math package
\usepackage{amssymb} % Additional math symbols
@@ -45,9 +73,9 @@
% Define 'listing' as a floating environment
\DeclareFloatingEnvironment[
fileext=lol,
listname=List of Listings,
name=Listing
]{listing}
% To prevent floats from moving past a section boundary but still allow some floating:
@@ -91,17 +119,17 @@
The MRVA system is organized as a collection of services. On the server side, the
system is containerized using Docker and comprises several key components:
\begin{itemize}
\item \textbf{Server}: Acts as the central coordinator.
\item \textbf{Agents}: One or more agents that execute tasks.
\item \textbf{RabbitMQ}: Handles messaging between components.
\item \textbf{MinIO}: Provides storage for both queries and results.
\item \textbf{HEPC}: An HTTP endpoint that hosts and serves CodeQL databases.
\end{itemize}
On the client side, users can interact with the system in two ways:
\begin{itemize}
\item \textbf{VSCode-CodeQL}: A graphical interface integrated with Visual Studio Code.
\item \textbf{gh-mrva CLI}: A command-line interface that connects to the server in a similar way to VSCode-CodeQL.
\end{itemize}
This architecture enables a robust and flexible workflow for code analysis, combining a containerized back-end with both graphical and CLI front-end tools.
@@ -114,15 +142,15 @@ overview.
\subsection{Execution Overview}
The \textit{MRVA system} is a distributed platform for executing \textit{CodeQL
queries} across multiple repositories using a set of worker agents. The system is
{containerized} and built around a set of core services:
\begin{itemize}
\item \textbf{Server}: Coordinates job distribution and result aggregation.
\item \textbf{Agents}: Execute queries independently and return results.
\item \textbf{RabbitMQ}: Handles messaging between system components.
\item \textbf{MinIO}: Stores query inputs and execution results.
\item \textbf{HEPC}: Serves CodeQL databases over HTTP.
\end{itemize}
Clients interact with MRVA via \texttt{VSCode-CodeQL} (a graphical interface) or
@@ -132,11 +160,11 @@ server.
The execution process follows a structured workflow:
\begin{enumerate}
\item A client submits a set of queries $\mathcal{Q}$ targeting a repository
set $\mathcal{R}$.
\item The server enqueues jobs and distributes them to available agents.
\item Each agent retrieves a job, executes queries against its assigned repository, and accumulates results.
\item The agent sends results back to the server, which then forwards them to the client.
\end{enumerate}
This full round-trip can be expressed as:
@@ -181,8 +209,8 @@ is that both setups follow the same structural approach:
Thus:
\begin{itemize}
\item The {functional architecture is identical} between the single-machine and cluster setups.
\item The {primary difference} is in \textit{scale}:
\begin{itemize}
\item A single machine is limited by \textit{local CPU and RAM}.
\item A cluster is constrained by \textit{network and inter-node coordination overhead} but allows for higher overall compute capacity.
@@ -195,84 +223,84 @@ Thus:
The following table enumerates the types (messages) passed from Client to Server.
\begin{longtable}{|p{5cm}|p{5cm}|p{5cm}|}
\hline
\rowcolor{gray!20} \textbf{Type Name} & \textbf{Field} & \textbf{Type} \\
\hline
\endfirsthead
\hline
\rowcolor{gray!20} \textbf{Type Name} & \textbf{Field} & \textbf{Type} \\
\hline
\endhead
\hline
\endfoot
\hline
\endlastfoot
ServerState & NextID & () $\rightarrow$ int \\
& GetResult & JobSpec $\rightarrow$ IO (Either Error AnalyzeResult) \\
& GetJobSpecByRepoId & (int, int) $\rightarrow$ IO (Either Error JobSpec) \\
& SetResult & (JobSpec, AnalyzeResult) $\rightarrow$ IO () \\
& GetJobList & int $\rightarrow$ IO (Either Error \textbf{[AnalyzeJob]}) \\
& GetJobInfo & JobSpec $\rightarrow$ IO (Either Error JobInfo) \\
& SetJobInfo & (JobSpec, JobInfo) $\rightarrow$ IO () \\
& GetStatus & JobSpec $\rightarrow$ IO (Either Error Status) \\
& SetStatus & (JobSpec, Status) $\rightarrow$ IO () \\
& AddJob & AnalyzeJob $\rightarrow$ IO () \\
\hline
JobSpec & sessionID & int \\
& nameWithOwner & string \\
\hline
AnalyzeResult & spec & JobSpec \\
& status & Status \\
& resultCount & int \\
& resultLocation & ArtifactLocation \\
& sourceLocationPrefix & string \\
& databaseSHA & string \\
\hline
ArtifactLocation & Key & string \\
& Bucket & string \\
\hline
AnalyzeJob & Spec & JobSpec \\
& QueryPackLocation & ArtifactLocation \\
& QueryLanguage & QueryLanguage \\
\hline
QueryLanguage & & string \\
\hline
JobInfo & QueryLanguage & string \\
& CreatedAt & string \\
& UpdatedAt & string \\
& SkippedRepositories & SkippedRepositories \\
\hline
SkippedRepositories & AccessMismatchRepos & AccessMismatchRepos \\
& NotFoundRepos & NotFoundRepos \\
& NoCodeqlDBRepos & NoCodeqlDBRepos \\
& OverLimitRepos & OverLimitRepos \\
\hline
AccessMismatchRepos & RepositoryCount & int \\
& Repositories & \textbf{[Repository]} \\
\hline
NotFoundRepos & RepositoryCount & int \\
& RepositoryFullNames & \textbf{[string]} \\
\hline
Repository & ID & int \\
& Name & string \\
& FullName & string \\
& Private & bool \\
& StargazersCount & int \\
& UpdatedAt & string \\
\end{longtable}
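The record types in the table map directly onto Python dataclasses. The listing below is a minimal in-memory sketch of a subset of these types and of the \texttt{ServerState} operations, assuming Python~3.9 or later. Field names are converted to Python's snake\_case. The \texttt{Status} values are hypothetical because the table does not define \texttt{Status}. A missing entry raises \texttt{KeyError} in place of the \texttt{Left} branch of \texttt{Either Error}. The dictionary-based storage is an illustrative assumption, not the MRVA server's actual state handling.
\begin{listing}[H]
\caption{Sketch of the message types and an in-memory \texttt{ServerState} (illustrative)}
\begin{lstlisting}[language=Python]
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class Status(Enum):
    # Hypothetical status values; the table leaves Status undefined.
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class JobSpec:
    session_id: int        # sessionID
    name_with_owner: str   # nameWithOwner, e.g. "owner/repo"


@dataclass(frozen=True)
class ArtifactLocation:
    key: str               # Key
    bucket: str            # Bucket


@dataclass(frozen=True)
class AnalyzeJob:
    spec: JobSpec
    query_pack_location: ArtifactLocation
    query_language: str    # QueryLanguage is a plain string


@dataclass(frozen=True)
class AnalyzeResult:
    spec: JobSpec
    status: Status
    result_count: int
    result_location: ArtifactLocation
    source_location_prefix: str
    database_sha: str


@dataclass
class ServerState:
    """In-memory stand-in for a subset of the ServerState operations.

    Lookups typed as IO (Either Error X) in the table raise KeyError here.
    """
    _ids: count = field(default_factory=lambda: count(1))
    _jobs: dict[int, list[AnalyzeJob]] = field(default_factory=dict)
    _status: dict[JobSpec, Status] = field(default_factory=dict)
    _results: dict[JobSpec, AnalyzeResult] = field(default_factory=dict)

    def next_id(self) -> int:                       # NextID
        return next(self._ids)

    def add_job(self, job: AnalyzeJob) -> None:     # AddJob
        self._jobs.setdefault(job.spec.session_id, []).append(job)
        self._status[job.spec] = Status.PENDING

    def get_job_list(self, session_id: int) -> list[AnalyzeJob]:  # GetJobList
        return self._jobs[session_id]

    def set_status(self, spec: JobSpec, status: Status) -> None:  # SetStatus
        self._status[spec] = status

    def get_status(self, spec: JobSpec) -> Status:  # GetStatus
        return self._status[spec]

    def set_result(self, spec: JobSpec, result: AnalyzeResult) -> None:  # SetResult
        self._results[spec] = result

    def get_result(self, spec: JobSpec) -> AnalyzeResult:  # GetResult
        return self._results[spec]
\end{lstlisting}
\end{listing}
A plausible call sequence is \texttt{add\_job}, followed by \texttt{set\_status} and \texttt{set\_result} once an agent reports back. Keying status and results by the frozen, and therefore hashable, \texttt{JobSpec} mirrors the table, where every lookup takes a \texttt{JobSpec}.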
@@ -313,11 +341,11 @@ The full round-trip execution, from query submission to result delivery, can be
\]
\begin{itemize}
\item \(C \to S\): Client submits a query suite \(\mathcal{Q}\) to the server.
\item \(S \to Q\): Server enqueues the query suite \((\mathcal{Q}, \mathcal{R}_i)\) for each repository.
\item \(Q \to \alpha\): Agent \(\alpha\) polls the queue and retrieves a job.
\item \(\alpha \to S\): Agent executes the queries and returns the accumulated results \(\mathcal{R}_i^{\mathcal{Q}}\) to the server.
\item \(S \to C\): Server sends the complete result set \(\mathcal{R}_i^{\mathcal{Q}}\) for each repository back to the client.
\end{itemize}
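The five hops can be traced in a single process to make the data flow concrete. The sketch below assumes Python~3.9 or later. It uses \texttt{collections.deque} as a stand-in for the RabbitMQ queue \(Q\) and takes a caller-supplied \texttt{run\_query} function in place of CodeQL execution. The names \texttt{round\_trip} and \texttt{run\_query}, and the string-valued results, are hypothetical. In the deployed system, each hop crosses a process or network boundary.
\begin{listing}[H]
\caption{Sequential trace of one round-trip (illustrative)}
\begin{lstlisting}[language=Python]
from collections import deque
from typing import Callable

# Hypothetical stand-in for CodeQL: run_query(query, repo) -> list of results.
RunQuery = Callable[[str, str], list[str]]


def round_trip(queries: list[str], repos: list[str],
               run_query: RunQuery) -> tuple[dict[str, list[str]], list[str]]:
    """Simulate one MRVA round-trip sequentially and record each hop."""
    trace: list[str] = []

    # C -> S: the client submits the query suite Q for the repository set R.
    trace.append("C->S")

    # S -> Q: the server enqueues one job (Q, R_i) per repository.
    jobs = deque((queries, repo) for repo in repos)
    trace.append("S->Q")

    server_results: dict[str, list[str]] = {}
    while jobs:
        # Q -> alpha: an agent polls the queue and retrieves the next job.
        suite, repo = jobs.popleft()
        trace.append(f"Q->agent:{repo}")

        # The agent runs every query in the suite and accumulates R_i^Q.
        accumulated: list[str] = []
        for query in suite:
            accumulated.extend(run_query(query, repo))

        # alpha -> S: the agent returns the accumulated results to the server.
        server_results[repo] = accumulated
        trace.append(f"agent->S:{repo}")

    # S -> C: the server sends each repository's result set to the client.
    client_results: dict[str, list[str]] = {}
    for repo in repos:
        client_results[repo] = server_results[repo]
        trace.append(f"S->C:{repo}")

    return client_results, trace
\end{lstlisting}
\end{listing}
The returned trace lists the hops in the order of the itemization above, with one \(Q \to \alpha\) and \(\alpha \to S\) pair per repository.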
\section{Result Representation}
@@ -330,9 +358,9 @@ For the complete collection of results across all repositories and queries:
where:
\begin{itemize}
\item \(N\) is the total number of repositories.
\item \(M\) is the total number of queries in \(\mathcal{Q}\).
\item \(k_{i,j}\) is the number of results from executing query
\(\mathcal{Q}_j\)
on repository \(\mathcal{R}_i\).
\end{itemize}
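Under these definitions, each per-repository set \(\mathcal{R}_i^{\mathcal{Q}}\) is the union of its per-query sets \(\mathcal{R}_i^{\mathcal{Q}_j}\). Because each result \(r_{i,j,l}\) carries its own indices, the total number of results is \(\sum_{i=1}^{N}\sum_{j=1}^{M} k_{i,j}\). The sketch below checks this bookkeeping, representing \(r_{i,j,l}\) by the tuple \((i, j, l)\). The function name \texttt{build\_results} and the counts \texttt{k} are hypothetical, illustrative values, not measurements.
\begin{listing}[H]
\caption{Indexing results by repository and query (illustrative)}
\begin{lstlisting}[language=Python]
def build_results(k: list[list[int]]) -> dict[int, set[tuple[int, int, int]]]:
    """Return R_i^Q for each repository i, given k[i-1][j-1] = k_{i,j}.

    Each result r_{i,j,idx} is represented by the tuple (i, j, idx), using
    1-based indices to match the notation in the text.
    """
    per_repo: dict[int, set[tuple[int, int, int]]] = {}
    for i, row in enumerate(k, start=1):
        acc: set[tuple[int, int, int]] = set()       # R_i^Q <- {}
        for j, k_ij in enumerate(row, start=1):
            # R_i^{Q_j} = {r_{i,j,1}, ..., r_{i,j,k_{i,j}}}
            r_ij = {(i, j, idx) for idx in range(1, k_ij + 1)}
            acc |= r_ij                               # R_i^Q <- R_i^Q union R_i^{Q_j}
        per_repo[i] = acc
    return per_repo


# Hypothetical counts for N = 2 repositories and M = 3 queries.
k = [[2, 0, 1],
     [4, 1, 0]]
results = build_results(k)
total = sum(len(r) for r in results.values())
assert total == sum(map(sum, k))  # sum over i and j of k_{i,j}
\end{lstlisting}
\end{listing}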
@@ -354,7 +382,7 @@ Each result can be further indexed to track multiple repositories and result set
\begin{listing}[H] % H = exactly here; h = here, t = top, b = bottom, p = page of floats
\caption{Distributed Query Execution Algorithm}
\begin{lstlisting}[language=Python]
# Distributed Query Execution with Agent Polling and Accumulated Results
# Initialization
@@ -519,52 +547,52 @@ $\mathcal{R}_{\text{results}}$ = execute_queries(A, Q, $\mathcal{R}_{\text{resul
\begin{enumerate}
\item \textbf{Initialization}
\begin{itemize}
\item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
\begin{itemize}
\item Initialize its result set: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
\end{itemize}
\item Initialize an empty job queue: \(Q \gets \{\}\).
\end{itemize}
\item \textbf{Enqueue Queries}
\begin{itemize}
\item For each repository \(\mathcal{R}_i \in \mathcal{R}\):
\begin{itemize}
\item Enqueue the entire query suite: \(S \xrightarrow{\text{enqueue}(\mathcal{Q}, \mathcal{R}_i)} Q\).
\end{itemize}
\end{itemize}
\item \textbf{Execution Loop}
\begin{itemize}
\item While \(Q \neq \emptyset\) (agents poll the queue for available jobs):
\begin{itemize}
\item For each available agent \(\alpha \in A\):
\begin{itemize}
\item Agent autonomously retrieves a job \((\mathcal{Q}, \mathcal{R}_i)\): \(\alpha \xleftarrow{\text{poll}} Q\).
\item \textbf{Agent Execution Block}
\begin{itemize}
\item Initialize result set for this repository: \(\mathcal{R}_i^{\mathcal{Q}} \gets \{\}\).
\item For each query \(\mathcal{Q}_j \in \mathcal{Q}\):
\begin{itemize}
\item Collect results:
\(\mathcal{R}_i^{\mathcal{Q}_j} \gets \{ r_{i,j,1}, r_{i,j,2}, \dots, r_{i,j,k_{i,j}} \}\).
\item Accumulate results:
\(\mathcal{R}_i^{\mathcal{Q}} \gets \mathcal{R}_i^{\mathcal{Q}} \cup \mathcal{R}_i^{\mathcal{Q}_j}\).
\end{itemize}
\item Agent sends all accumulated results back to the server:
\(\alpha \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} S\).
\end{itemize}
\end{itemize}
\end{itemize}
\end{itemize}
\item \textbf{Server Returns Results}
\begin{itemize}
\item Server sends results for repository \(i\) back to the client:
\(S \xrightarrow{(\mathcal{Q}, \mathcal{R}_i, \mathcal{R}_i^{\mathcal{Q}})} C\).
\end{itemize}
\end{enumerate}
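The Execution Loop lets agents poll the queue independently. One way to sketch that concurrency in Python is with \texttt{threading} and \texttt{queue.Queue}: each agent runs in its own thread and stops when it receives a sentinel. The function names \texttt{distributed\_execute} and \texttt{run\_query} are hypothetical. The sentinel protocol and the lock-protected dictionary that stands in for the server are assumptions for illustration. MRVA itself uses RabbitMQ for messaging and MinIO for storage.
\begin{listing}[H]
\caption{Polling agents as threads (illustrative sketch)}
\begin{lstlisting}[language=Python]
import queue
import threading
from typing import Callable

STOP = None  # sentinel telling an agent that no more jobs will arrive


def distributed_execute(queries: list[str], repos: list[str],
                        run_query: Callable[[str, str], list[str]],
                        num_agents: int = 3) -> dict[str, list[str]]:
    """Run the query suite on every repository using polling agent threads."""
    if num_agents < 1:
        raise ValueError("need at least one agent")

    # Initialization: an empty job queue and an empty result store on the server.
    jobs: queue.Queue = queue.Queue()
    server_results: dict[str, list[str]] = {}
    lock = threading.Lock()

    # Enqueue Queries: the whole suite is enqueued once per repository,
    # followed by one sentinel per agent.
    for repo in repos:
        jobs.put((queries, repo))
    for _ in range(num_agents):
        jobs.put(STOP)

    def agent() -> None:
        # Execution Loop: poll until a sentinel arrives.
        while True:
            job = jobs.get()  # blocks until a job or sentinel is available
            if job is STOP:
                return
            suite, repo = job
            accumulated: list[str] = []  # R_i^Q <- {}
            for query in suite:
                # Collect R_i^{Q_j} and accumulate it into R_i^Q.
                accumulated.extend(run_query(query, repo))
            with lock:  # the agent sends its accumulated results to the server
                server_results[repo] = accumulated

    agents = [threading.Thread(target=agent) for _ in range(num_agents)]
    for t in agents:
        t.start()
    for t in agents:
        t.join()

    # Server Returns Results: results for every repository go to the client.
    return {repo: server_results[repo] for repo in repos}
\end{lstlisting}
\end{listing}
The order of enqueueing matters in this sketch. Every job is enqueued before any sentinel, and \texttt{queue.Queue} is FIFO, so no agent can receive a sentinel while jobs are still waiting. As a result, each repository is processed exactly once.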
\end{document}