Files
mrvacommander/doc/mrva-interconnect.ltx

332 lines
12 KiB
TeX
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

\documentclass[11pt]{article}
% Load the geometry package to set margins
\usepackage[lmargin=2cm,rmargin=2cm,tmargin=1.8cm,bmargin=1.8cm]{geometry}
% increase nesting depth
\usepackage{enumitem}
\setlistdepth{9}
%
\renewlist{itemize}{itemize}{9}
\setlist[itemize,1]{label=\textbullet}
\setlist[itemize,2]{label=--}
\setlist[itemize,3]{label=*}
\setlist[itemize,4]{label=•}
\setlist[itemize,5]{label=}
\setlist[itemize,6]{label=>}
\setlist[itemize,7]{label=»}
\setlist[itemize,8]{label=}
\setlist[itemize,9]{label=·}
%
\renewlist{enumerate}{enumerate}{9}
\setlist[enumerate,1]{label=\arabic*.,ref=\arabic*}
\setlist[enumerate,2]{label=\alph*.),ref=\theenumi\alph*}
\setlist[enumerate,3]{label=\roman*.),ref=\theenumii\roman*}
\setlist[enumerate,4]{label=\Alph*.),ref=\theenumiii\Alph*}
\setlist[enumerate,5]{label=\Roman*.),ref=\theenumiv\Roman*}
\setlist[enumerate,6]{label=\arabic*),ref=\theenumv\arabic*}
\setlist[enumerate,7]{label=\alph*),ref=\theenumvi\alph*}
\setlist[enumerate,8]{label=\roman*),ref=\theenumvii\roman*}
\setlist[enumerate,9]{label=\Alph*),ref=\theenumviii\Alph*}
% Load CM Bright for math
\usepackage{amsmath} % Standard math package
\usepackage{amssymb} % Additional math symbols
\usepackage{cmbright} % Sans-serif math font that complements Fira Sans
\usepackage{fourier}
% Font configuration
% \usepackage{bera}
% or
% Load Fira Sans for text
\usepackage{fontspec}
\setmainfont{Fira Sans} % System-installed Fira Sans
\renewcommand{\familydefault}{\sfdefault} % Set sans-serif as default
% pseudo-code with math
\usepackage{listings}
\usepackage{float}
\usepackage{xcolor}
\usepackage{colortbl}
% Set TT font
% \usepackage{inconsolata}
% or
\setmonofont{IBMPlexMono-Light}
% Define custom settings for listings
\lstset{
language=Python,
basicstyle=\ttfamily\small, % Monospaced font
commentstyle=\itshape\color{gray}, % Italic and gray for comments
keywordstyle=\color{blue}, % Keywords in blue
stringstyle=\color{red}, % Strings in red
mathescape=true, % Enable math in comments
breaklines=true, % Break long lines
numbers=left, % Add line numbers
numberstyle=\tiny\color{gray}, % Style for line numbers
frame=single, % Add a frame around the code
}
\usepackage{newfloat} % Allows creating custom float types
% Define 'listing' as a floating environment
\DeclareFloatingEnvironment[
fileext=lol,
listname=List of Listings,
name=Listing
]{listing}
% To prevent floats from moving past a section boundary but still allow some floating:
\usepackage{placeins}
% used with \FloatBarrier
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\usepackage{longtable}
\usepackage{wrapfig}
\usepackage{rotating}
\usepackage[normalem]{ulem}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{capt-of}
\usepackage{hyperref}
\usepackage{algorithm}
\usepackage{algpseudocode}
% Title, Author, and Date (or Report Number)
\title{MRVA component interconnections}
\author{Michael Hohn}
\date{Technical Report 20250524}
\hypersetup{
pdfauthor={Michael Hohn},
pdftitle={MRVA component interconnections},
pdfkeywords={},
pdfsubject={},
pdfcreator={Emacs 29.1},
pdflang={English}}
\begin{document}
\maketitle
\tableofcontents
\section{Overview}
\label{sec:overview}
The MRVA system is organized as a collection of services. On the server side, the
system is containerized using Docker and comprises several key components:
\begin{itemize}
\item {\textbf{Server}}: Acts as the central coordinator.
\item \textbf{Agents}: One or more agents that execute tasks.
\item \textbf{RabbitMQ}: Handles messaging between components.
\item \textbf{MinIO}: Provides storage for both queries and results.
\item \textbf{HEPC}: An HTTP endpoint that hosts and serves CodeQL databases.
\end{itemize}
The execution process follows a structured workflow:
\begin{enumerate}
\item A client submits a set of queries $\mathcal{Q}$ targeting a repository
set $\mathcal{R}$.
\item The server enqueues jobs and distributes them to available agents.
\item Each agent retrieves a job, executes queries against its assigned repository, and accumulates results.
\item The agent sends results back to the server, which then forwards them to the client.
\end{enumerate}
This full round-trip can be expressed as:
\begin{equation}
\text{Client} \xrightarrow{\mathcal{Q}} \text{Server}
\xrightarrow{\text{enqueue}}
\text{Queue} \xrightarrow{\text{dispatch}} \text{Agent}
\xrightarrow{\mathcal{Q}(\mathcal{R}_i)}
\text{Server} \xrightarrow{\mathcal{Q}(\mathcal{R}_i} \text{Client}
\end{equation}
\section{Symbols and Notation}
\label{sec:orgb695d5a}
We define the following symbols for entities in the system:
\begin{center}
\begin{tabular}{lll}
Concept & Symbol & Description \\[0pt]
\hline
Client & \(C\) & The source of the query submission \\[0pt]
Server & \(S\) & Manages job queue and communicates results back to the client \\[0pt]
Job Queue & \(Q\) & Queue for managing submitted jobs \\[0pt]
Agent & \(\alpha\) & Independently polls, executes jobs, and accumulates results \\[0pt]
Agent Set & \(A\) & The set of all available agents \\[0pt]
Query Suite & \(\mathcal{Q}\) & Collection of queries submitted by the client \\[0pt]
Repository List & \(\mathcal{R}\) & Collection of repositories \\[0pt]
\(i\)-th Repository & \(\mathcal{R}_i\) & Specific repository indexed by \(i\) \\[0pt]
\(j\)-th Query & \(\mathcal{Q}_j\) & Specific query from the suite indexed by \(j\) \\[0pt]
Query Result & \(r_{i,j,k_{i,j}}\) & \(k_{i,j}\)-th result from query \(j\) executed on repository \(i\) \\[0pt]
Query Result Set & \(\mathcal{R}_i^{\mathcal{Q}_j}\) & Set of all results for query \(j\) on repository \(i\) \\[0pt]
Accumulated Results & \(\mathcal{R}_i^{\mathcal{Q}}\) & All results from executing all queries on \(\mathcal{R}_i\) \\[0pt]
\end{tabular}
\end{center}
\section{Full Round-Trip Representation}
\label{sec:full-round-trip}
The full round-trip execution, from query submission to result delivery, can be summarized as:
\[
C \xrightarrow{\mathcal{Q}} S \xrightarrow{\text{enqueue}} Q
\xrightarrow{\text{poll}}
\alpha \xrightarrow{\mathcal{Q}(\mathcal{R}_i)} S \xrightarrow{\mathcal{R}_i^{\mathcal{Q}}} C
\]
\begin{itemize}
\item \(C \to S\): Client submits a query suite \(\mathcal{Q}\) to the server.
\item \(S \to Q\): Server enqueues the query suite \((\mathcal{Q}, \mathcal{R}_i)\) for each repository.
\item \(Q \to \alpha\): Agent \(\alpha\) polls the queue and retrieves a job.
\item \(\alpha \to S\): Agent executes the queries and returns the accumulated results \(\mathcal{R}_i^{\mathcal{Q}}\) to the server.
\item \(S \to C\): Server sends the complete result set \(\mathcal{R}_i^{\mathcal{Q}}\) for each repository back to the client.
\end{itemize}
\section{Result Representation}
For the complete collection of results across all repositories and queries:
\[
\mathcal{R}^{\mathcal{Q}} = \bigcup_{i=1}^{N} \bigcup_{j=1}^{M}
\left\{ r_{i,j,1}, r_{i,j,2}, \dots, r_{i,j,k_{i,j}} \right\}
\]
where:
\begin{itemize}
\item \(N\) is the total number of repositories.
\item \(M\) is the total number of queries in \(\mathcal{Q}\).
\item \(k_{i,j}\) is the number of results from executing query
\(\mathcal{Q}_j\)
on repository \(\mathcal{R}_i\).
\end{itemize}
An individual result from the \(i\)-th repository, \(j\)-th query, and \(k\)-th result is:
\[
r_{i,j,k}
\]
\[
C \xrightarrow{\mathcal{Q}} S \xrightarrow{\text{enqueue}} Q \xrightarrow{\text{dispatch}} \alpha \xrightarrow{\mathcal{Q}(\mathcal{R}_i)} S \xrightarrow{r_{i,j}} C
\]
Each result can be further indexed to track multiple repositories and result sets.
\section{Graph Extraction from Log Table}
Assume we have a structured event log represented as a set of tuples.
\subsection*{Event Log Structure}
Let
\[
\mathcal{T} = \{ t_1, t_2, \dots, t_n \}
\]
be the set of all events, where each event
\[
t_i = (\mathit{id}_i, \tau_i, a_i, e_i, q_i, r_i, c_i)
\]
consists of:
\begin{itemize}
\item \(\mathit{id}_i\): unique event ID
\item \(\tau_i\): timestamp
\item \(a_i\): actor (e.g., ``agent\_alpha1'')
\item \(e_i\): event type (e.g., ``enqueue'', ``execute'')
\item \(q_i\): query ID
\item \(r_i\): repository ID
\item \(c_i\): result count (may be \(\bot\) if not applicable)
\end{itemize}
Let
\[
\mathcal{G} = (V, E)
\]
be a directed graph constructed from \(\mathcal{T}\), with vertices \(V\) and edges \(E\).
\subsection*{Graph Definition}
\begin{align*}
V &= \{ \mathit{id}_i \mid t_i \in \mathcal{T} \} \\
E &\subseteq V \times V
\end{align*}
Edges capture temporal or semantic relationships between events.
\subsection*{Construction Steps}
\paragraph{1. Partition by Job Identity}
Define the set of job identifiers:
\[
J = \{ (q, r) \mid \exists i: q_i = q \land r_i = r \}
\]
Then for each \((q, r) \in J\), define:
\[
\mathcal{T}_{q,r} = \{ t_i \in \mathcal{T} \mid q_i = q \land r_i = r \}
\]
\paragraph{2. Sort by Time}
Order each \(\mathcal{T}_{q,r}\) as a list:
\[
\mathcal{T}_{q,r} = [ t_{i_1}, t_{i_2}, \dots, t_{i_k} ]
\quad \text{such that } \tau_{i_j} < \tau_{i_{j+1}}
\]
\paragraph{3. Causal Edges}
Define within-job edges:
\[
E_{q,r} = \{ (\mathit{id}_{i_j}, \mathit{id}_{i_{j+1}}) \mid 1 \leq j < k \}
\]
\paragraph{4. Global Causal Graph}
Take the union:
\[
E_{\text{causal}} = \bigcup_{(q, r) \in J} E_{q,r}
\]
\paragraph{5. Semantic Edges (Optional)}
Define semantic predicates such as:
\[
\mathsf{pulls}(i, j) \iff e_i = \text{enqueue} \land e_j = \text{pull} \land
q_i = q_j \land r_i = r_j \land \tau_i < \tau_j \land a_i = \text{server} \land a_j = \text{agent}
\]
Then:
\[
E_{\text{semantic}} = \{ (\mathit{id}_i, \mathit{id}_j) \mid \mathsf{pulls}(i, j) \}
\]
\subsection*{Final Graph}
\begin{align*}
V &= \{ \mathit{id}_i \mid t_i \in \mathcal{T} \} \\
E &= E_{\text{causal}} \cup E_{\text{semantic}}
\end{align*}
\subsection*{Notes}
\begin{itemize}
\item This construction is generic: the log store \(\mathcal{T}\) may come from a database, file, or tuple-indexed dictionary.
\item Each semantic edge rule corresponds to a logical filter/join over \(\mathcal{T}\).
\item The construction is schema-free on the graph side and can be recomputed on demand with different edge logic.
\end{itemize}
\end{document}
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: nil
%%% TeX-engine: luatex
%%% TeX-command-extra-options: "-synctex=1 -shell-escape -interaction=nonstopmode"
%%% End: