\documentclass[11pt]{article}

% Load the geometry package to set margins
\usepackage[lmargin=2cm,rmargin=2cm,tmargin=1.8cm,bmargin=1.8cm]{geometry}

% increase nesting depth
\usepackage{enumitem}
\setlistdepth{9}
% \renewlist{itemize}{itemize}{9}
\setlist[itemize,1]{label=\textbullet}
\setlist[itemize,2]{label=--}
\setlist[itemize,3]{label=*}
\setlist[itemize,4]{label=•}
\setlist[itemize,5]{label=–}
\setlist[itemize,6]{label=>}
\setlist[itemize,7]{label=»}
\setlist[itemize,8]{label=›}
\setlist[itemize,9]{label=·}
% \renewlist{enumerate}{enumerate}{9}
\setlist[enumerate,1]{label=\arabic*.,ref=\arabic*}
\setlist[enumerate,2]{label=\alph*.),ref=\theenumi\alph*}
\setlist[enumerate,3]{label=\roman*.),ref=\theenumii\roman*}
\setlist[enumerate,4]{label=\Alph*.),ref=\theenumiii\Alph*}
\setlist[enumerate,5]{label=\Roman*.),ref=\theenumiv\Roman*}
\setlist[enumerate,6]{label=\arabic*),ref=\theenumv\arabic*}
\setlist[enumerate,7]{label=\alph*),ref=\theenumvi\alph*}
\setlist[enumerate,8]{label=\roman*),ref=\theenumvii\roman*}
\setlist[enumerate,9]{label=\Alph*),ref=\theenumviii\Alph*}

% Load CM Bright for math
\usepackage{amsmath}  % Standard math package
\usepackage{amssymb}  % Additional math symbols
\usepackage{cmbright} % Sans-serif math font that complements Fira Sans
\usepackage{fourier}

% Font configuration
% \usepackage{bera}
% or
% Load Fira Sans for text
\usepackage{fontspec}
\setmainfont{Fira Sans} % System-installed Fira Sans
\renewcommand{\familydefault}{\sfdefault} % Set sans-serif as default

% pseudo-code with math
\usepackage{listings}
\usepackage{float}
\usepackage{xcolor}
\usepackage{colortbl}

% Set TT font
% \usepackage{inconsolata}
% or
\setmonofont{IBMPlexMono-Light}

% Define custom settings for listings
\lstset{
  language=Python,
  basicstyle=\ttfamily\small,          % Monospaced font
  commentstyle=\itshape\color{gray},   % Italic and gray for comments
  keywordstyle=\color{blue},           % Keywords in blue
  stringstyle=\color{red},             % Strings in red
  mathescape=true,                     % Enable math in comments
  breaklines=true,                     % Break long lines
  numbers=left,                        % Add line numbers
  numberstyle=\tiny\color{gray},       % Style for line numbers
  frame=single,                        % Add a frame around the code
}

\usepackage{newfloat} % Allows creating custom float types

% Define 'listing' as a floating environment
\DeclareFloatingEnvironment[
  fileext=lol,
  listname=List of Listings,
  name=Listing
]{listing}

% To prevent floats from moving past a section boundary but still allow some floating:
\usepackage{placeins} % used with \FloatBarrier

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\usepackage{longtable}
\usepackage{wrapfig}
\usepackage{rotating}
\usepackage[normalem]{ulem}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{capt-of}
\usepackage{hyperref}
\usepackage{algorithm}
\usepackage{algpseudocode}

% Title, Author, and Date (or Report Number)
\title{MRVA component interconnections}
\author{Michael Hohn}
\date{Technical Report 20250524}
\hypersetup{
  pdfauthor={Michael Hohn},
  pdftitle={MRVA component interconnections},
  pdfkeywords={},
  pdfsubject={},
  pdfcreator={Emacs 29.1},
  pdflang={English}}

\begin{document}

\maketitle
\tableofcontents

\section{Overview}
\label{sec:overview}
The MRVA system is organized as a collection of services. On the server side, the system is containerized using Docker and comprises several key components:
\begin{itemize}
\item \textbf{Server}: Acts as the central coordinator.
\item \textbf{Agents}: One or more agents that execute tasks.
\item \textbf{RabbitMQ}: Handles messaging between components.
\item \textbf{MinIO}: Provides storage for both queries and results.
\item \textbf{HEPC}: An HTTP endpoint that hosts and serves CodeQL databases.
\end{itemize}

The execution process follows a structured workflow:
\begin{enumerate}
\item A client submits a set of queries $\mathcal{Q}$ targeting a repository set $\mathcal{R}$.
\item The server enqueues jobs and distributes them to available agents.
\item Each agent retrieves a job, executes queries against its assigned repository, and accumulates results.
\item The agent sends results back to the server, which then forwards them to the client.
\end{enumerate}

This full round-trip can be expressed as:
\begin{equation}
\text{Client} \xrightarrow{\mathcal{Q}} \text{Server}
\xrightarrow{\text{enqueue}} \text{Queue}
\xrightarrow{\text{dispatch}} \text{Agent}
\xrightarrow{\mathcal{Q}(\mathcal{R}_i)} \text{Server}
\xrightarrow{\mathcal{Q}(\mathcal{R}_i)} \text{Client}
\end{equation}

\section{Symbols and Notation}
\label{sec:orgb695d5a}
We define the following symbols for entities in the system:
\begin{center}
\begin{tabular}{lll}
Concept & Symbol & Description \\[0pt]
\hline
Client & \(C\) & The source of the query submission \\[0pt]
Server & \(S\) & Manages the job queue and communicates results back to the client \\[0pt]
Job Queue & \(Q\) & Queue for managing submitted jobs \\[0pt]
Agent & \(\alpha\) & Independently polls the queue, executes jobs, and accumulates results \\[0pt]
Agent Set & \(A\) & The set of all available agents \\[0pt]
Query Suite & \(\mathcal{Q}\) & Collection of queries submitted by the client \\[0pt]
Repository List & \(\mathcal{R}\) & Collection of repositories \\[0pt]
\(i\)-th Repository & \(\mathcal{R}_i\) & Specific repository indexed by \(i\) \\[0pt]
\(j\)-th Query & \(\mathcal{Q}_j\) & Specific query from the suite indexed by \(j\) \\[0pt]
Query Result & \(r_{i,j,k_{i,j}}\) & \(k_{i,j}\)-th result from query \(j\) executed on repository \(i\) \\[0pt]
Query Result Set & \(\mathcal{R}_i^{\mathcal{Q}_j}\) & Set of all results for query \(j\) on repository \(i\) \\[0pt]
Accumulated Results & \(\mathcal{R}_i^{\mathcal{Q}}\) & All results from executing all queries on \(\mathcal{R}_i\) \\[0pt]
\end{tabular}
\end{center}

\section{Full Round-Trip Representation}
\label{sec:full-round-trip}
The full round-trip execution, from query submission to result delivery, can be summarized as:
\[
C \xrightarrow{\mathcal{Q}} S
\xrightarrow{\text{enqueue}} Q
\xrightarrow{\text{poll}} \alpha
\xrightarrow{\mathcal{Q}(\mathcal{R}_i)} S
\xrightarrow{\mathcal{R}_i^{\mathcal{Q}}} C
\]
\begin{itemize}
\item \(C \to S\): Client submits a query suite \(\mathcal{Q}\) to the server.
\item \(S \to Q\): Server enqueues one job \((\mathcal{Q}, \mathcal{R}_i)\) per repository \(\mathcal{R}_i\).
\item \(Q \to \alpha\): Agent \(\alpha\) polls the queue and retrieves a job.
\item \(\alpha \to S\): Agent executes the queries on its repository, written \(\mathcal{Q}(\mathcal{R}_i)\), and returns the accumulated results \(\mathcal{R}_i^{\mathcal{Q}}\) to the server.
\item \(S \to C\): Server sends the complete result set \(\mathcal{R}_i^{\mathcal{Q}}\) for each repository back to the client.
\end{itemize}

\section{Result Representation}
The complete collection of results across all repositories and queries is:
\[
\mathcal{R}^{\mathcal{Q}} = \bigcup_{i=1}^{N} \bigcup_{j=1}^{M} \left\{ r_{i,j,1}, r_{i,j,2}, \dots, r_{i,j,k_{i,j}} \right\}
\]
where:
\begin{itemize}
\item \(N\) is the total number of repositories.
\item \(M\) is the total number of queries in \(\mathcal{Q}\).
\item \(k_{i,j}\) is the number of results from executing query \(\mathcal{Q}_j\) on repository \(\mathcal{R}_i\).
\end{itemize}
An individual result, the \(k\)-th result of the \(j\)-th query on the \(i\)-th repository, is written:
\[
r_{i,j,k}
\]
Viewed at the level of an individual result, the round trip reads:
\[
C \xrightarrow{\mathcal{Q}} S
\xrightarrow{\text{enqueue}} Q
\xrightarrow{\text{poll}} \alpha
\xrightarrow{\mathcal{Q}(\mathcal{R}_i)} S
\xrightarrow{r_{i,j,k}} C
\]
The indices \(i\), \(j\), and \(k\) identify the repository, the query, and the position within that query's result set, so each result can be traced across multiple repositories and result sets.

\section{Graph Extraction from Log Table}
Assume we have a structured event log represented as a set of tuples.
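
As a concrete illustration of the construction developed in the following subsections, consider a small log of six events. The values are illustrative assumptions only (the timestamps, the identifiers \texttt{Q1}, \texttt{R1}, \texttt{R2}, and the actor \texttt{agent\_alpha2} are hypothetical), not data from an actual MRVA run. The columns are the tuple components \((\mathit{id}, \tau, a, e, q, r, c)\) defined in the next subsection.
\begin{center}
\begin{tabular}{lllllll}
\(\mathit{id}\) & \(\tau\) & \(a\) & \(e\) & \(q\) & \(r\) & \(c\) \\
\hline
1 & 10 & \texttt{server} & enqueue & \texttt{Q1} & \texttt{R1} & \(\bot\) \\
2 & 11 & \texttt{server} & enqueue & \texttt{Q1} & \texttt{R2} & \(\bot\) \\
3 & 12 & \texttt{agent\_alpha1} & pull & \texttt{Q1} & \texttt{R1} & \(\bot\) \\
4 & 13 & \texttt{agent\_alpha2} & pull & \texttt{Q1} & \texttt{R2} & \(\bot\) \\
5 & 15 & \texttt{agent\_alpha1} & execute & \texttt{Q1} & \texttt{R1} & 3 \\
6 & 17 & \texttt{agent\_alpha2} & execute & \texttt{Q1} & \texttt{R2} & 0 \\
\end{tabular}
\end{center}
Partitioning by job identity, sorting each partition by time, and linking consecutive events would give:
\begin{align*}
J &= \{ (\texttt{Q1}, \texttt{R1}),\ (\texttt{Q1}, \texttt{R2}) \} \\
\mathcal{T}_{\texttt{Q1},\texttt{R1}} &= [ t_1, t_3, t_5 ] \\
\mathcal{T}_{\texttt{Q1},\texttt{R2}} &= [ t_2, t_4, t_6 ] \\
E_{\text{causal}} &= \{ (1,3), (3,5) \} \cup \{ (2,4), (4,6) \} \\
E_{\text{semantic}} &= \{ (1,3), (2,4) \}
\end{align*}
The semantic edges come from the optional \(\mathsf{pulls}\) rule given at the end of this section, which links each server-side enqueue to a later agent-side pull of the same job. In this small example they coincide with two of the causal edges, so the final edge set \(E = E_{\text{causal}} \cup E_{\text{semantic}}\) has four edges forming two disjoint chains, \(1 \to 3 \to 5\) and \(2 \to 4 \to 6\), one per job.
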
\subsection*{Event Log Structure}
Let
\[
\mathcal{T} = \{ t_1, t_2, \dots, t_n \}
\]
be the set of all events, where each event
\[
t_i = (\mathit{id}_i, \tau_i, a_i, e_i, q_i, r_i, c_i)
\]
consists of:
\begin{itemize}
\item \(\mathit{id}_i\): unique event ID
\item \(\tau_i\): timestamp
\item \(a_i\): actor (e.g., ``agent\_alpha1'')
\item \(e_i\): event type (e.g., ``enqueue'', ``execute'')
\item \(q_i\): query ID
\item \(r_i\): repository ID (not to be confused with the query results \(r_{i,j,k}\) of the previous section)
\item \(c_i\): result count (may be \(\bot\) if not applicable)
\end{itemize}
Let
\[
\mathcal{G} = (V, E)
\]
be a directed graph constructed from \(\mathcal{T}\), with vertices \(V\) and edges \(E\).

\subsection*{Graph Definition}
\begin{align*}
V &= \{ \mathit{id}_i \mid t_i \in \mathcal{T} \} \\
E &\subseteq V \times V
\end{align*}
Edges capture temporal or semantic relationships between events.

\subsection*{Construction Steps}
\paragraph{1. Partition by Job Identity}
Define the set of job identifiers:
\[
J = \{ (q, r) \mid \exists i: q_i = q \land r_i = r \}
\]
Then for each \((q, r) \in J\), define:
\[
\mathcal{T}_{q,r} = \{ t_i \in \mathcal{T} \mid q_i = q \land r_i = r \}
\]

\paragraph{2. Sort by Time}
Order each \(\mathcal{T}_{q,r}\) as a list:
\[
\mathcal{T}_{q,r} = [ t_{i_1}, t_{i_2}, \dots, t_{i_k} ] \quad \text{such that } \tau_{i_j} < \tau_{i_{j+1}} \text{ for } 1 \leq j < k
\]
The strict inequality assumes that timestamps within a job are distinct.

\paragraph{3. Causal Edges}
Define within-job edges, linking each event to its immediate successor in the same job:
\[
E_{q,r} = \{ (\mathit{id}_{i_j}, \mathit{id}_{i_{j+1}}) \mid 1 \leq j < k \}
\]

\paragraph{4. Global Causal Graph}
Take the union over all jobs:
\[
E_{\text{causal}} = \bigcup_{(q, r) \in J} E_{q,r}
\]

\paragraph{5. Semantic Edges (Optional)}
Define semantic predicates such as:
\begin{align*}
\mathsf{pulls}(i, j) \iff{} & e_i = \text{enqueue} \land e_j = \text{pull} \\
& \land q_i = q_j \land r_i = r_j \\
& \land \tau_i < \tau_j \\
& \land a_i = \text{server} \land a_j = \text{agent}
\end{align*}
Here \(a_i = \text{server}\) and \(a_j = \text{agent}\) constrain the actor roles: the enqueue must come from the server and the pull from an agent (e.g., ``agent\_alpha1''). Then:
\[
E_{\text{semantic}} = \{ (\mathit{id}_i, \mathit{id}_j) \mid \mathsf{pulls}(i, j) \}
\]

\subsection*{Final Graph}
\begin{align*}
V &= \{ \mathit{id}_i \mid t_i \in \mathcal{T} \} \\
E &= E_{\text{causal}} \cup E_{\text{semantic}}
\end{align*}

\subsection*{Notes}
\begin{itemize}
\item This construction is generic: the log store \(\mathcal{T}\) may come from a database, file, or tuple-indexed dictionary.
\item Each semantic edge rule corresponds to a logical filter/join over \(\mathcal{T}\).
\item The construction is schema-free on the graph side and can be recomputed on demand with different edge logic.
\end{itemize}

\end{document}

%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: nil
%%% TeX-engine: luatex
%%% TeX-command-extra-options: "-synctex=1 -shell-escape -interaction=nonstopmode"
%%% End: