Merge pull request #3953 from RasmusWL/python-more-call-graph-tracing

Approved by tausbn
2025-12-16 16:53:25 +01:00 · 2020-09-07 17:34:14 +01:00
parent d3f19721e6 61998afc56
commit 22b3b0a5f1
48 changed files with 2092 additions and 308 deletions
--- a/python/tools/recorded-call-graph-metrics/.flake8
+++ b/python/tools/recorded-call-graph-metrics/.flake8
@@ -0,0 +1,6 @@
+# As described in https://github.com/psf/black/blob/master/docs/compatible_configs.md#flake8
+# and https://black.readthedocs.io/en/stable/the_black_code_style.html#line-length
+[flake8]
+max-line-length = 88
+select = C,E,F,W,B,B950
+ignore = E203, E501, W503
--- a/python/tools/recorded-call-graph-metrics/.gitignore
+++ b/python/tools/recorded-call-graph-metrics/.gitignore
@@ -0,0 +1,13 @@
+# Example DB
+cg-trace-example-db/
+
+# Test artifacts
+tests/python-traces/
+tests/cg-trace-test-db
+
+# Artifact from running `pip install -e .`
+src/cg_trace.egg-info/
+
+projects/
+
+venv/
--- a/python/tools/recorded-call-graph-metrics/.isort.cfg
+++ b/python/tools/recorded-call-graph-metrics/.isort.cfg
@@ -0,0 +1,6 @@
+[settings]
+multi_line_output = 3
+include_trailing_comma = True
+force_grid_wrap = 0
+use_parentheses = True
+line_length = 88
--- a/python/tools/recorded-call-graph-metrics/README.md
+++ b/python/tools/recorded-call-graph-metrics/README.md
@@ -4,14 +4,113 @@ also known as _call graph tracing_.

 Execute a python program and for each call being made, record the call and callee. This allows us to compare call graph resolution from static analysis with actual data -- that is, can we statically determine the target of each actual call correctly.

-This is still in the early stages, and currently only supports a very minimal working example (to show that this approach might work).
+Using the call graph tracer incurs a heavy performance cost: expect the traced program to take roughly 10x longer to execute.

-The next hurdle is being able to handle multiple calls on the same line, such as
+The number of calls recorded varies a little from run to run; I have not been able to pinpoint why.

- `foo(); bar()`
- `foo(bar())`
- `foo().bar()`
+## Running against real projects

-## How do I give it a spin?
+Currently it's possible to gather metrics from traced runs of the standard test suite of a few projects (defined in [projects.json](./projects.json)): `youtube-dl`, `wcwidth`, and `flask`.

-Run the `recreate-db.sh` script to create the database `cg-trace-example-db`, which will include the `example/simple.xml` trace from executing the `example/simple.py` code. Then run the queries inside the `ql/` directory.
+To run against all projects, use
+
+```bash
+$ ./helper.sh all $(./helper.sh projects)
+```
+
+To view the results, use
+
+```bash
+$ head -n 100 projects/*/Metrics.txt
+```
+
+### Expanding set of projects
+
+It should be fairly straightforward to expand the set of projects. Most projects use `tox` to run their tests against multiple Python versions. I didn't look into integrating with `tox`; instead, I manually picked out the commands required to get going.
+
+As an example, compare the [`tox.ini`](https://github.com/pallets/flask/blob/21c3df31de4bc2f838c945bd37d185210d9bab1a/tox.ini) file from flask with the corresponding configuration in `projects.json`:
+
+```json
+    "flask": {
+        "repo": "https://github.com/pallets/flask.git",
+        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
+        "module_command": "pytest -c /dev/null tests examples",
+        "setup": [
+            "pip install -r requirements/tests.txt",
+            "pip install -q -e examples/tutorial[test]",
+            "pip install -q -e examples/javascript[test]"
+        ]
+    }
+```
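A minimal sketch of how an entry like this turns into the `cg-trace` invocation that `helper.sh` runs in its `trace` step (`cg-trace --xml <trace.xml> --module $MODULE_COMMAND`). The `build_trace_command` helper and the `REQUIRED_KEYS` check are hypothetical illustrations, not part of this tool, and `shlex.split` stands in for the shell's word splitting of `$MODULE_COMMAND`.

```python
import json
import shlex

# Keys each project entry is assumed to have (based on projects.json)
REQUIRED_KEYS = {"repo", "sha", "module_command", "setup"}


def build_trace_command(projects: dict, name: str, trace_xml: str) -> list:
    """Hypothetical helper: build the argv used to trace one project."""
    entry = projects[name]
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise ValueError(f"{name!r} is missing keys: {sorted(missing)}")
    # Mirrors `cg-trace --xml <trace.xml> --module $MODULE_COMMAND` in helper.sh
    return ["cg-trace", "--xml", trace_xml, "--module",
            *shlex.split(entry["module_command"])]


config = json.loads("""
{
    "flask": {
        "repo": "https://github.com/pallets/flask.git",
        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
        "module_command": "pytest -c /dev/null tests examples",
        "setup": ["pip install -r requirements/tests.txt"]
    }
}
""")
print(build_trace_command(config, "flask", "traces/trace.xml"))
# ['cg-trace', '--xml', 'traces/trace.xml', '--module', 'pytest', '-c', '/dev/null', 'tests', 'examples']
```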
+
+## Local development
+
+### Setup
+
+1. Ensure you have at least Python 3.7
+
+2. Create a virtual environment with `python3 -m venv venv` and activate it
+
+3. Install dependencies with `pip install --upgrade -r requirements.txt`
+
+4. Install this codebase as an editable package `pip install -e .`
+
+5. Set up your editor. If you're using VS Code, create a new project for this folder, and
+   use these settings for correct autoformatting of code on save:
+   ```json
+   {
+       "python.pythonPath": "venv/bin/python",
+       "python.linting.enabled": true,
+       "python.linting.flake8Enabled": true,
+       "python.formatting.provider": "black",
+       "editor.formatOnSave": true,
+       "[python]": {
+           "editor.codeActionsOnSave": {
+               "source.organizeImports": true
+           }
+       },
+       "python.autoComplete.extraPaths": [
+           "src"
+       ]
+   }
+   ```
+
+6. Enjoy writing code, and being able to run `cg-trace` on your command line :tada:
+
+### Using it
+
+After following the setup instructions above, you should be able to reproduce the example trace by running
+
+```bash
+cg-trace --xml example/simple.xml example/simple.py
+```
+
+You can also run traces for all tests and build a database by running `tests/create-test-db.sh`. Then run the queries inside the `ql/` directory.
+
+## Tracing Limitations
+
+### Multi-threading
+
+Tracing multi-threaded programs should be possible using [`threading.setprofile`](https://docs.python.org/3.8/library/threading.html#threading.setprofile), but this has not been implemented yet.
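A minimal sketch of the idea, not something `cg-trace` currently does: `threading.setprofile` registers a profile function that is installed in threads started afterwards through the `threading` module, while `sys.setprofile` only covers the current thread. The `record` function and `calls` list are illustrative names; a lock guards the shared list since several threads append to it.

```python
import sys
import threading

calls = []  # (thread name, callee name) pairs
lock = threading.Lock()


def record(frame, event, arg):
    # Only record entries into Python functions
    if event == "call":
        with lock:
            calls.append((threading.current_thread().name, frame.f_code.co_name))


def work():
    return sum(range(3))


threading.setprofile(record)  # used by threads started after this point
sys.setprofile(record)        # used by the current thread
try:
    t = threading.Thread(target=work, name="worker")
    t.start()
    t.join()
    work()
finally:
    sys.setprofile(None)
    threading.setprofile(None)

print(sorted({c for c in calls if c[1] == "work"}))
```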
+
+### Code that uses `sys.setprofile`
+
+Since `sys.setprofile` is our mechanism for recording calls, any code that itself uses `sys.setprofile` will not work together with the call-graph tracer.
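To illustrate both the mechanism and the limitation, here is a minimal sketch (not the actual `cg-trace` implementation) that records `(caller, callee)` pairs from `"call"` events, then shows that a second `sys.setprofile` call silently replaces the first profile function, since each thread has a single profile slot. The names `recorder`, `other_profiler`, `foo`, and `bar` are illustrative.

```python
import sys

recorded = []  # (caller name, callee name) for every Python-level call seen


def recorder(frame, event, arg):
    # "call" fires on entry to a Python function; f_back is the caller's frame
    if event == "call" and frame.f_back is not None:
        recorded.append((frame.f_back.f_code.co_name, frame.f_code.co_name))


def other_profiler(frame, event, arg):
    pass  # stands in for profiling done by the traced program itself


def bar():
    return 1


def foo():
    return bar()


sys.setprofile(recorder)
try:
    foo()  # recorded: <caller> -> foo, foo -> bar
    sys.setprofile(other_profiler)  # the traced code installs its own profiler...
    replaced = sys.getprofile() is other_profiler
    foo()  # ...so this second call is invisible to `recorder`
finally:
    sys.setprofile(None)

print(replaced, recorded.count(("foo", "bar")))  # True 1
```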
+
+### Class instantiation
+
+Class instantiation does not always fire an event for the `sys.setprofile` function (nor for `sys.settrace`), so it is not recorded. Example:
+
+```python
+r = range(10)
+```
+
+When disassembled (`python -m dis <file>`), the bytecode does contain a `CALL_FUNCTION` instruction:
+
+```
+  9          48 LOAD_NAME                7 (range)
+             50 LOAD_CONST               5 (10)
+             52 CALL_FUNCTION            1
+             54 STORE_NAME               8 (r)
+```
+
+but no event is fired :disappointed:
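The effect can be observed with a small experiment (a sketch; the exact set of events may differ between Python implementations and CPython versions): a profile function sees a `"c_call"` event for a builtin function such as `len`, but nothing for the `range(10)` instantiation, because `range` is a type rather than a builtin function object. The `profiler` and `example` names are illustrative.

```python
import sys

c_calls = []  # names of C-level callables reported via "c_call" events


def profiler(frame, event, arg):
    # For "c_call" events, `arg` is the builtin function object being called
    if event == "c_call":
        c_calls.append(getattr(arg, "__name__", repr(arg)))


def example():
    r = range(10)  # class instantiation: no event expected
    return len(r)  # builtin function call: fires "c_call"


sys.setprofile(profiler)
try:
    example()
finally:
    sys.setprofile(None)

print("len" in c_calls, "range" in c_calls)
```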
--- a/python/tools/recorded-call-graph-metrics/cg_trace.py
+++ b/python/tools/recorded-call-graph-metrics/cg_trace.py
@@ -1,222 +0,0 @@
-#!/usr/bin/env python3
-
-"""Call Graph tracing.
-
-Execute a python program and for each call being made, record the call and callee. This
-allows us to compare call graph resolution from static analysis with actual data -- that
-is, can we statically determine the target of each actual call correctly.
-
-If there is 100% code coverage from the Python execution, it would also be possible to
-look at the precision of the call graph resolutions -- that is, do we expect a function to
-be able to be called in a place where it is not? Currently not something we're looking at.
-"""
-
-# read: https://eli.thegreenplace.net/2012/03/23/python-internals-how-callables-work/
-
-# TODO: Know that a call to a C-function was made. See
-# https://docs.python.org/3/library/bdb.html#bdb.Bdb.trace_dispatch. Maybe use `lxml` as
-# test
-
-# For inspiration, look at these projects:
-# - https://github.com/joerick/pyinstrument (capture call-stack every <n> ms for profiling)
-# - https://github.com/gak/pycallgraph (display call-graph with graphviz after python execution)
-
-import argparse
-import bdb
-from io import StringIO
-import sys
-import os
-import dis
-import dataclasses
-import csv
-import xml.etree.ElementTree as ET
-
-# Copy-Paste and uncomment for interactive ipython sessions
-# import IPython; IPython.embed(); sys.exit()
-
-
-@dataclasses.dataclass(frozen=True)
-class Call():
-    """A call
-    """
-    filename: str
-    linenum: int
-    inst_index: int
-
-    @classmethod
-    def from_frame(cls, frame, debugger: bdb.Bdb):
-        code = frame.f_code
-
-        # Uncomment to see the bytecode
-        # b = dis.Bytecode(frame.f_code, current_offset=frame.f_lasti)
-        # print(b.dis(), file=sys.__stderr__)
-
-        return cls(
-            filename = debugger.canonic(code.co_filename),
-            linenum = frame.f_lineno,
-            inst_index = frame.f_lasti,
-        )
-
-
-@dataclasses.dataclass(frozen=True)
-class Callee():
-    """A callee (Function/Lambda/???)
-
-    should (hopefully) be uniquely identified by its name and location (filename+line
-    number)
-    """
-    funcname: str
-    filename: str
-    linenum: int
-
-    @classmethod
-    def from_frame(cls, frame, debugger: bdb.Bdb):
-        code = frame.f_code
-        return cls(
-            funcname = code.co_name,
-            filename = debugger.canonic(code.co_filename),
-            linenum = frame.f_lineno,
-        )
-
-
-class CallGraphTracer(bdb.Bdb):
-    """Tracer that records calls being made
-
-    It would seem obvious that this should have extended `trace` library
-    (https://docs.python.org/3/library/trace.html), but that part is not extensible --
-    however, the basic debugger (bdb) is, and provides maybe a bit more help than just
-    using `sys.settrace` directly.
-    """
-
-    recorded_calls: set
-
-    def __init__(self):
-        self.recorded_calls = set()
-        super().__init__()
-
-    def user_call(self, frame, argument_list):
-        call = Call.from_frame(frame.f_back, self)
-        callee = Callee.from_frame(frame, self)
-
-        # _print(f'{call}  -> {callee}')
-        self.recorded_calls.add((call, callee))
-
-
-################################################################################
-# Export
-################################################################################
-
-
-class Exporter:
-
-    @staticmethod
-    def export(recorded_calls, outfile_path):
-        raise NotImplementedError()
-
-    @staticmethod
-    def dataclass_to_dict(obj):
-        d = dataclasses.asdict(obj)
-        prefix = obj.__class__.__name__.lower()
-        return {f"{prefix}_{key}": val for (key, val) in d.items()}
-
-
-class CSVExporter(Exporter):
-
-    @staticmethod
-    def export(recorded_calls, outfile_path):
-        with open(outfile_path, 'w', newline='') as csv_file:
-            writer = None
-            for (call, callee) in recorded_calls:
-                data = {
-                    **Exporter.dataclass_to_dict(call),
-                    **Exporter.dataclass_to_dict(callee)
-                }
-
-                if writer is None:
-                    writer = csv.DictWriter(csv_file, fieldnames=data.keys())
-                    writer.writeheader()
-
-                writer.writerow(data)
-
-
-        print(f'output written to {outfile_path}')
-
-        # embed(); sys.exit()
-
-
-class XMLExporter(Exporter):
-
-    @staticmethod
-    def export(recorded_calls, outfile_path):
-
-        root = ET.Element('root')
-
-        for (call, callee) in recorded_calls:
-            data = {
-                **Exporter.dataclass_to_dict(call),
-                **Exporter.dataclass_to_dict(callee)
-            }
-
-            rc = ET.SubElement(root, 'recorded_call')
-            # this xml library only supports serializing attributes that have string values
-            rc.attrib = {k: str(v) for k, v in data.items()}
-
-        tree = ET.ElementTree(root)
-        tree.write(outfile_path, encoding='utf-8')
-
-
-################################################################################
-# __main__
-################################################################################
-
-
-if __name__ == "__main__":
-
-
-    parser = argparse.ArgumentParser()
-
-
-    parser.add_argument('--csv')
-    parser.add_argument('--xml')
-
-    parser.add_argument('progname', help='file to run as main program')
-    parser.add_argument('arguments', nargs=argparse.REMAINDER,
-            help='arguments to the program')
-
-    opts = parser.parse_args()
-
-    # These details of setting up the program to be run is very much inspired by `trace`
-    # from the standard library
-    sys.argv = [opts.progname, *opts.arguments]
-    sys.path[0] = os.path.dirname(opts.progname)
-
-    with open(opts.progname) as fp:
-        code = compile(fp.read(), opts.progname, 'exec')
-
-    # try to emulate __main__ namespace as much as possible
-    globs = {
-        '__file__': opts.progname,
-        '__name__': '__main__',
-        '__package__': None,
-        '__cached__': None,
-    }
-
-    real_stdout = sys.stdout
-    real_stderr = sys.stderr
-    captured_stdout = StringIO()
-
-    sys.stdout = captured_stdout
-    cgt = CallGraphTracer()
-    cgt.run(code, globs, globs)
-    sys.stdout = real_stdout
-
-    if opts.csv:
-        CSVExporter.export(cgt.recorded_calls, opts.csv)
-    elif opts.xml:
-        XMLExporter.export(cgt.recorded_calls, opts.xml)
-    else:
-        for (call, callee) in cgt.recorded_calls:
-            print(f'{call}  -> {callee}')
-
-    print('--- captured stdout ---')
-    print(captured_stdout.getvalue(), end='')
--- a/python/tools/recorded-call-graph-metrics/example/simple.xml
+++ b/python/tools/recorded-call-graph-metrics/example/simple.xml
@@ -1,6 +1,137 @@
 <root>
-    <recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="7" call_inst_index="18" callee_funcname="foo" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="1" />
-    <recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="8" call_inst_index="24" callee_funcname="bar" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="4" />
-    <recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="10" call_inst_index="30" callee_funcname="foo" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="1" />
-    <recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="10" call_inst_index="36" callee_funcname="bar" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="4" />
+  <info>
+    <cg_trace_version>0.0.2</cg_trace_version>
+    <args>--xml example/simple.xml example/simple.py</args>
+    <exit_status>completed</exit_status>
+    <elapsed>0.00 seconds</elapsed>
+    <utctimestamp>2020-07-22T12:14:02</utctimestamp>
+  </info>
+  <recorded_calls>
+    <recorded_call>
+      <Call>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>2</linenum>
+        <inst_index>4</inst_index>
+        <bytecode_expr>
+          <BytecodeCall>
+            <function>
+              <BytecodeVariableName>
+                <name>print</name>
+              </BytecodeVariableName>
+            </function>
+          </BytecodeCall>
+        </bytecode_expr>
+      </Call>
+      <ExternalCallee>
+        <module>builtins</module>
+        <qualname>print</qualname>
+        <is_builtin>True</is_builtin>
+      </ExternalCallee>
+    </recorded_call>
+    <recorded_call>
+      <Call>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>5</linenum>
+        <inst_index>4</inst_index>
+        <bytecode_expr>
+          <BytecodeCall>
+            <function>
+              <BytecodeVariableName>
+                <name>print</name>
+              </BytecodeVariableName>
+            </function>
+          </BytecodeCall>
+        </bytecode_expr>
+      </Call>
+      <ExternalCallee>
+        <module>builtins</module>
+        <qualname>print</qualname>
+        <is_builtin>True</is_builtin>
+      </ExternalCallee>
+    </recorded_call>
+    <recorded_call>
+      <Call>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>7</linenum>
+        <inst_index>18</inst_index>
+        <bytecode_expr>
+          <BytecodeCall>
+            <function>
+              <BytecodeVariableName>
+                <name>foo</name>
+              </BytecodeVariableName>
+            </function>
+          </BytecodeCall>
+        </bytecode_expr>
+      </Call>
+      <PythonCallee>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>1</linenum>
+        <funcname>foo</funcname>
+      </PythonCallee>
+    </recorded_call>
+    <recorded_call>
+      <Call>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>8</linenum>
+        <inst_index>24</inst_index>
+        <bytecode_expr>
+          <BytecodeCall>
+            <function>
+              <BytecodeVariableName>
+                <name>bar</name>
+              </BytecodeVariableName>
+            </function>
+          </BytecodeCall>
+        </bytecode_expr>
+      </Call>
+      <PythonCallee>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>4</linenum>
+        <funcname>bar</funcname>
+      </PythonCallee>
+    </recorded_call>
+    <recorded_call>
+      <Call>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>10</linenum>
+        <inst_index>30</inst_index>
+        <bytecode_expr>
+          <BytecodeCall>
+            <function>
+              <BytecodeVariableName>
+                <name>foo</name>
+              </BytecodeVariableName>
+            </function>
+          </BytecodeCall>
+        </bytecode_expr>
+      </Call>
+      <PythonCallee>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>1</linenum>
+        <funcname>foo</funcname>
+      </PythonCallee>
+    </recorded_call>
+    <recorded_call>
+      <Call>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>10</linenum>
+        <inst_index>36</inst_index>
+        <bytecode_expr>
+          <BytecodeCall>
+            <function>
+              <BytecodeVariableName>
+                <name>bar</name>
+              </BytecodeVariableName>
+            </function>
+          </BytecodeCall>
+        </bytecode_expr>
+      </Call>
+      <PythonCallee>
+        <filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
+        <linenum>4</linenum>
+        <funcname>bar</funcname>
+      </PythonCallee>
+    </recorded_call>
+  </recorded_calls>
 </root>
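With this change the XML trace stores each recorded call as nested elements instead of flat attributes. As a rough sketch of consuming it outside CodeQL, the following reads such a tree with Python's `xml.etree.ElementTree`. It assumes only the element names visible in `example/simple.xml` above (the `bytecode_expr` part is left out for brevity), and `read_recorded_calls` is a hypothetical helper, not part of `cg-trace`.

```python
import xml.etree.ElementTree as ET


def read_recorded_calls(root: ET.Element) -> list:
    """Return (call location, callee description) pairs from a cg-trace XML tree."""
    pairs = []
    for rc in root.iter("recorded_call"):
        call = rc.find("Call")
        location = (call.findtext("filename"), int(call.findtext("linenum")),
                    int(call.findtext("inst_index")))
        python_callee = rc.find("PythonCallee")
        external_callee = rc.find("ExternalCallee")
        if python_callee is not None:
            callee = ("python", python_callee.findtext("funcname"),
                      int(python_callee.findtext("linenum")))
        elif external_callee is not None:
            callee = ("external", external_callee.findtext("module"),
                      external_callee.findtext("qualname"))
        else:
            callee = ("unknown",)
        pairs.append((location, callee))
    return pairs


sample = """
<root>
  <recorded_calls>
    <recorded_call>
      <Call><filename>simple.py</filename><linenum>7</linenum><inst_index>18</inst_index></Call>
      <PythonCallee><filename>simple.py</filename><linenum>1</linenum><funcname>foo</funcname></PythonCallee>
    </recorded_call>
    <recorded_call>
      <Call><filename>simple.py</filename><linenum>2</linenum><inst_index>4</inst_index></Call>
      <ExternalCallee><module>builtins</module><qualname>print</qualname><is_builtin>True</is_builtin></ExternalCallee>
    </recorded_call>
  </recorded_calls>
</root>
"""
for pair in read_recorded_calls(ET.fromstring(sample.strip())):
    print(pair)
```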
--- a/python/tools/recorded-call-graph-metrics/helper.sh
+++ b/python/tools/recorded-call-graph-metrics/helper.sh
@@ -0,0 +1,191 @@
+#!/bin/bash
+
+set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
+
+SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+PROJECTS_FILE="$SCRIPTDIR/projects.json"
+
+METRICS_QUERY="ql/query/Metrics.ql"
+
+PROJECTS_BASE_DIR="$SCRIPTDIR/projects"
+
+repo_dir() {
+    echo "$PROJECTS_BASE_DIR/$1/repo"
+}
+
+venv_dir() {
+    echo "$PROJECTS_BASE_DIR/$1/venv"
+}
+
+trace_dir() {
+    echo "$PROJECTS_BASE_DIR/$1/traces"
+}
+
+db_path() {
+    echo "$PROJECTS_BASE_DIR/$1/$1-db"
+}
+
+query_result_base_path() {
+    echo "$PROJECTS_BASE_DIR/$1/$2"
+}
+
+help() {
+    echo -n """\
+$0 help                 This message
+$0 projects             List projects
+$0 repo <projects>      Fetch repo for projects
+$0 setup <projects>     Perform setup steps for projects (install dependencies)
+$0 trace <projects>     Trace projects
+$0 db <projects>        Build databases for projects
+$0 metrics <projects>   Run $METRICS_QUERY on projects
+$0 all <projects>       Perform all the above steps for projects
+"""
+}
+
+projects() {
+    jq -r 'keys[]' "$PROJECTS_FILE"
+}
+
+check_project_exists() {
+    if ! jq -e ".\"$1\"" "$PROJECTS_FILE" &>/dev/null; then
+        echo "ERROR: '$1' not a known project, see '$0 projects'"
+        exit 1
+    fi
+}
+
+repo() {
+    for project in "$@"; do
+        check_project_exists $project
+
+        echo "Cloning repo for '$project'"
+
+        REPO_DIR=$(repo_dir $project)
+
+        if [[ -d "$REPO_DIR" ]]; then
+            echo "Repo already cloned in $REPO_DIR"
+            continue;
+        fi
+
+        REPO_URL=$(jq -e -r ".\"$project\".repo" "$PROJECTS_FILE")
+        SHA=$(jq -e -r ".\"$project\".sha" "$PROJECTS_FILE")
+
+        mkdir -p "$REPO_DIR"
+        cd "$REPO_DIR"
+        git init
+        git remote add origin $REPO_URL
+        git fetch --depth 1 origin $SHA
+        git -c advice.detachedHead=False checkout FETCH_HEAD
+    done
+}
+
+setup() {
+    for project in "$@"; do
+        check_project_exists $project
+
+        echo "Setting up '$project'"
+
+        python3 -m venv $(venv_dir $project)
+        source $(venv_dir $project)/bin/activate
+
+        cd $(repo_dir $project)
+        pip install -e "$SCRIPTDIR"
+
+        IFS=$'\n'
+        setup_commands=($(jq -r ".\"$project\".setup[]" $PROJECTS_FILE))
+        unset IFS
+        for setup_command in "${setup_commands[@]}"; do
+            echo "Running '$setup_command'"
+            $setup_command
+        done
+
+        # deactivate venv again
+        deactivate
+    done
+}
+
+trace() {
+    for project in "$@"; do
+        check_project_exists $project
+
+        echo "Tracing '$project'"
+
+        source $(venv_dir $project)/bin/activate
+
+        REPO_DIR=$(repo_dir $project)
+        cd "$REPO_DIR"
+
+        rm -rf $(trace_dir $project)
+        mkdir -p $(trace_dir $project)
+
+        MODULE_COMMAND=$(jq -r ".\"$project\".module_command" $PROJECTS_FILE)
+
+        cg-trace --xml $(trace_dir $project)/trace.xml --module $MODULE_COMMAND
+
+        # deactivate venv again
+        deactivate
+    done
+}
+
+db() {
+    for project in "$@"; do
+        check_project_exists $project
+
+        echo "Creating CodeQL database for '$project'"
+
+        DB=$(db_path $project)
+        SRC=$(repo_dir $project)
+        PYTHON_EXTRACTOR=$(codeql resolve extractor --language=python)
+
+        # Source venv so we can extract dependencies
+        source $(venv_dir $project)/bin/activate
+
+        rm -rf "$DB"
+
+        codeql database init --source-root="$SRC" --language=python "$DB"
+        codeql database trace-command --working-dir="$SRC" "$DB" "$PYTHON_EXTRACTOR/tools/autobuild.sh"
+        codeql database index-files --language xml --include-extension .xml --working-dir="$(trace_dir $project)" "$DB"
+        codeql database finalize "$DB"
+
+        echo "Created database in '$DB'"
+
+        # deactivate venv again
+        deactivate
+    done
+}
+
+metrics() {
+    for project in "$@"; do
+        check_project_exists $project
+
+        echo "Running $METRICS_QUERY on '$project'"
+
+        RESULTS_BASE=$(query_result_base_path $project Metrics)
+        DB=$(db_path $project)
+
+        codeql query run "$SCRIPTDIR/$METRICS_QUERY" --database "$DB" --output "${RESULTS_BASE}.bqrs"
+        codeql bqrs decode "${RESULTS_BASE}.bqrs" --format text --output "${RESULTS_BASE}.txt"
+
+        echo "Results available in '${RESULTS_BASE}.txt'"
+    done
+}
+
+all() {
+    for project in "$@"; do
+        check_project_exists $project
+
+        repo $project
+        setup $project
+        trace $project
+        db $project
+        metrics $project
+    done
+}
+
+
+COMMAND=${1:-"help"}
+
+if [[ $# -ge 1 ]]; then
+    shift
+fi
+
+$COMMAND "$@"
--- a/python/tools/recorded-call-graph-metrics/projects.json
+++ b/python/tools/recorded-call-graph-metrics/projects.json
@@ -0,0 +1,28 @@
+{
+    "wcwidth": {
+        "repo": "https://github.com/jquast/wcwidth.git",
+        "sha": "b29897e5a1b403a0e36f7fc991614981cbc42475",
+        "module_command": "pytest -c /dev/null",
+        "setup": [
+            "pip install pytest"
+        ]
+    },
+    "youtube-dl": {
+        "repo": "https://github.com/ytdl-org/youtube-dl.git",
+        "sha": "a115e07594ccb7749ca108c889978510c7df126e",
+        "module_command": "nose -v test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_socks.py",
+        "setup": [
+            "pip install nose"
+        ]
+    },
+    "flask": {
+        "repo": "https://github.com/pallets/flask.git",
+        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
+        "module_command": "pytest -c /dev/null tests examples",
+        "setup": [
+            "pip install -r requirements/tests.txt",
+            "pip install -q -e examples/tutorial[test]",
+            "pip install -q -e examples/javascript[test]"
+        ]
+    }
+}
--- a/python/tools/recorded-call-graph-metrics/ql/PointsToFound.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/PointsToFound.ql
@@ -1,9 +0,0 @@
-import RecordedCalls
-
-from ValidRecordedCall rc, Call call, Function callee, CallableValue calleeValue
-where
-  call = rc.getCall() and
-  callee = rc.getCallee() and
-  calleeValue.getScope() = callee and
-  calleeValue.getACall() = call.getAFlowNode()
-select call, "-->", callee
--- a/python/tools/recorded-call-graph-metrics/ql/RecordedCalls.qll
+++ b/python/tools/recorded-call-graph-metrics/ql/RecordedCalls.qll
@@ -1,36 +0,0 @@
-import python
-
-class RecordedCall extends XMLElement {
-  RecordedCall() { this.hasName("recorded_call") }
-
-  string call_filename() { result = this.getAttributeValue("call_filename") }
-
-  int call_linenum() { result = this.getAttributeValue("call_linenum").toInt() }
-
-  int call_inst_index() { result = this.getAttributeValue("call_inst_index").toInt() }
-
-  Call getCall() {
-    // TODO: handle calls spanning multiple lines
-    result.getLocation().hasLocationInfo(this.call_filename(), this.call_linenum(), _, _, _)
-  }
-
-  string callee_filename() { result = this.getAttributeValue("callee_filename") }
-
-  int callee_linenum() { result = this.getAttributeValue("callee_linenum").toInt() }
-
-  string callee_funcname() { result = this.getAttributeValue("callee_funcname") }
-
-  Function getCallee() {
-    result.getLocation().hasLocationInfo(this.callee_filename(), this.callee_linenum(), _, _, _)
-  }
-}
-
-/**
- * Class of recorded calls where we can uniquely identify both the `call` and the `callee`.
- */
-class ValidRecordedCall extends RecordedCall {
-  ValidRecordedCall() {
-    strictcount(this.getCall()) = 1 and
-    strictcount(this.getCallee()) = 1
-  }
-}
--- a/python/tools/recorded-call-graph-metrics/ql/UnidentifiedRecordedCalls.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/UnidentifiedRecordedCalls.ql
@@ -1,7 +0,0 @@
-import RecordedCalls
-
-from RecordedCall rc
-where not rc instanceof ValidRecordedCall
-select "Could not uniquely identify this recorded call (either call or callee was not uniquely identified)",
-  rc.call_filename(), rc.call_linenum(), rc.call_inst_index(), "-->", rc.callee_filename(),
-  rc.callee_linenum(), rc.callee_funcname()
--- a/python/tools/recorded-call-graph-metrics/ql/lib/BytecodeExpr.qll
+++ b/python/tools/recorded-call-graph-metrics/ql/lib/BytecodeExpr.qll
@@ -0,0 +1,73 @@
+import python
+
+abstract class XMLBytecodeExpr extends XMLElement { }
+
+class XMLBytecodeConst extends XMLBytecodeExpr {
+  XMLBytecodeConst() { this.hasName("BytecodeConst") }
+
+  string get_value_data_raw() { result = this.getAChild("value").getTextValue() }
+}
+
+class XMLBytecodeVariableName extends XMLBytecodeExpr {
+  XMLBytecodeVariableName() { this.hasName("BytecodeVariableName") }
+
+  string get_name_data() { result = this.getAChild("name").getTextValue() }
+}
+
+class XMLBytecodeAttribute extends XMLBytecodeExpr {
+  XMLBytecodeAttribute() { this.hasName("BytecodeAttribute") }
+
+  string get_attr_name_data() { result = this.getAChild("attr_name").getTextValue() }
+
+  XMLBytecodeExpr get_object_data() { result.getParent() = this.getAChild("object") }
+}
+
+class XMLBytecodeSubscript extends XMLBytecodeExpr {
+  XMLBytecodeSubscript() { this.hasName("BytecodeSubscript") }
+
+  XMLBytecodeExpr get_key_data() { result.getParent() = this.getAChild("key") }
+
+  XMLBytecodeExpr get_object_data() { result.getParent() = this.getAChild("object") }
+}
+
+class XMLBytecodeTuple extends XMLBytecodeExpr {
+  XMLBytecodeTuple() { this.hasName("BytecodeTuple") }
+
+  XMLBytecodeExpr get_elements_data(int index) {
+    result = this.getAChild("elements").getChild(index)
+  }
+}
+
+class XMLBytecodeList extends XMLBytecodeExpr {
+  XMLBytecodeList() { this.hasName("BytecodeList") }
+
+  XMLBytecodeExpr get_elements_data(int index) {
+    result = this.getAChild("elements").getChild(index)
+  }
+}
+
+class XMLBytecodeCall extends XMLBytecodeExpr {
+  XMLBytecodeCall() { this.hasName("BytecodeCall") }
+
+  XMLBytecodeExpr get_function_data() { result.getParent() = this.getAChild("function") }
+}
+
+class XMLBytecodeUnknown extends XMLBytecodeExpr {
+  XMLBytecodeUnknown() { this.hasName("BytecodeUnknown") }
+
+  string get_opname_data() { result = this.getAChild("opname").getTextValue() }
+}
+
+class XMLBytecodeMakeFunction extends XMLBytecodeExpr {
+  XMLBytecodeMakeFunction() { this.hasName("BytecodeMakeFunction") }
+
+  XMLBytecodeExpr get_qualified_name_data() {
+    result.getParent() = this.getAChild("qualified_name")
+  }
+}
+
+class XMLSomethingInvolvingScaryBytecodeJump extends XMLBytecodeExpr {
+  XMLSomethingInvolvingScaryBytecodeJump() { this.hasName("SomethingInvolvingScaryBytecodeJump") }
+
+  string get_opname_data() { result = this.getAChild("opname").getTextValue() }
+}
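The `Bytecode*` elements form a small expression tree, which is roughly what `matchBytecodeExpr` in `RecordedCalls.qll` compares against the AST. To make that shape concrete, here is an illustrative Python sketch (not part of the tool) that renders such a subtree back into approximate source text. It follows only the child element names the QL classes above read (`name`, `value`, `object`, `attr_name`, `key`, `elements`, `function`, `qualified_name`, `opname`); the `render` function and its `<...>` notation for unknown opcodes are assumptions.

```python
import xml.etree.ElementTree as ET


def render(expr: ET.Element) -> str:
    """Render a Bytecode* element as an approximate Python expression (sketch)."""

    def sub(tag: str) -> str:
        # Wrapper elements such as <function> or <object> hold one expression
        return render(expr.find(tag)[0])

    kind = expr.tag
    if kind == "BytecodeVariableName":
        return expr.findtext("name")
    if kind == "BytecodeConst":
        return expr.findtext("value")
    if kind == "BytecodeAttribute":
        return f"{sub('object')}.{expr.findtext('attr_name')}"
    if kind == "BytecodeSubscript":
        return f"{sub('object')}[{sub('key')}]"
    if kind in ("BytecodeTuple", "BytecodeList"):
        items = [render(e) for e in expr.find("elements")]
        if kind == "BytecodeList":
            return "[" + ", ".join(items) + "]"
        return "(" + ", ".join(items) + ("," if len(items) == 1 else "") + ")"
    if kind == "BytecodeCall":
        return f"{sub('function')}()"
    if kind == "BytecodeMakeFunction":
        return f"<function {sub('qualified_name')}>"
    # BytecodeUnknown, SomethingInvolvingScaryBytecodeJump, or anything unexpected
    return f"<{expr.findtext('opname') or kind}>"


sample = """
<BytecodeCall>
  <function>
    <BytecodeAttribute>
      <object><BytecodeVariableName><name>foo</name></BytecodeVariableName></object>
      <attr_name>bar</attr_name>
    </BytecodeAttribute>
  </function>
</BytecodeCall>
"""
print(render(ET.fromstring(sample.strip())))  # foo.bar()
```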
--- a/python/tools/recorded-call-graph-metrics/ql/lib/RecordedCalls.qll
+++ b/python/tools/recorded-call-graph-metrics/ql/lib/RecordedCalls.qll
@@ -0,0 +1,269 @@
+import python
+import semmle.python.types.Builtins
+import semmle.python.objects.Callables
+import lib.BytecodeExpr
+
+/** The XML data for a recorded call (includes all data). */
+class XMLRecordedCall extends XMLElement {
+  XMLRecordedCall() { this.hasName("recorded_call") }
+
+  /** Gets the XML data for the call. */
+  XMLCall getXMLCall() { result.getParent() = this }
+
+  /** Gets a call matching the recorded information. */
+  Call getACall() { result = this.getXMLCall().getACall() }
+
+  /** Gets the XML data for the callee. */
+  XMLCallee getXMLCallee() { result.getParent() = this }
+
+  /** Gets a python function matching the recorded information of the callee. */
+  Function getAPythonCallee() { result = this.getXMLCallee().(XMLPythonCallee).getACallee() }
+
+  /** Gets a builtin function matching the recorded information of the callee. */
+  Builtin getABuiltinCallee() { result = this.getXMLCallee().(XMLExternalCallee).getACallee() }
+
+  /** Gets a different `XMLRecordedCall` with the same result-set for `getACall`. */
+  XMLRecordedCall getOtherWithSameSetOfCalls() {
+    // `result` is for a different bytecode instruction on the same line
+    not result.getXMLCall().get_inst_index_data() = this.getXMLCall().get_inst_index_data() and
+    result.getXMLCall().get_filename_data() = this.getXMLCall().get_filename_data() and
+    result.getXMLCall().get_linenum_data() = this.getXMLCall().get_linenum_data() and
+    // set of calls are equal
+    forall(Call call | call = this.getACall() or call = result.getACall() |
+      call = this.getACall() and call = result.getACall()
+    )
+  }
+
+  override string toString() {
+    exists(string path |
+      path =
+        any(File file | file.getAbsolutePath() = this.getXMLCall().get_filename_data())
+            .getRelativePath()
+      or
+      not exists(File file |
+        file.getAbsolutePath() = this.getXMLCall().get_filename_data() and
+        exists(file.getRelativePath())
+      ) and
+      path = this.getXMLCall().get_filename_data()
+    |
+      result = this.getName() + ": " + path + ":" + this.getXMLCall().get_linenum_data()
+    )
+  }
+}
+
+/** The XML data for the call part of a recorded call. */
+class XMLCall extends XMLElement {
+  XMLCall() { this.hasName("Call") }
+
+  string get_filename_data() { result = this.getAChild("filename").getTextValue() }
+
+  int get_linenum_data() { result = this.getAChild("linenum").getTextValue().toInt() }
+
+  int get_inst_index_data() { result = this.getAChild("inst_index").getTextValue().toInt() }
+
+  /** Gets a call that matches the recorded information. */
+  Call getACall() {
+    // TODO: do we handle calls spanning multiple lines?
+    this.matchBytecodeExpr(result, this.getAChild("bytecode_expr").getAChild())
+  }
+
+  /** Holds if `expr` can be fully matched with `bytecode`. */
+  private predicate matchBytecodeExpr(Expr expr, XMLBytecodeExpr bytecode) {
+    exists(Call parent_call, XMLBytecodeCall parent_bytecode_call |
+      parent_call
+          .getLocation()
+          .hasLocationInfo(this.get_filename_data(), this.get_linenum_data(), _, _, _) and
+      parent_call.getAChildNode*() = expr and
+      parent_bytecode_call.getParent() = this.getAChild("bytecode_expr") and
+      parent_bytecode_call.getAChild*() = bytecode
+    ) and
+    (
+      expr.(Name).getId() = bytecode.(XMLBytecodeVariableName).get_name_data()
+      or
+      expr.(Attribute).getName() = bytecode.(XMLBytecodeAttribute).get_attr_name_data() and
+      matchBytecodeExpr(expr.(Attribute).getObject(),
+        bytecode.(XMLBytecodeAttribute).get_object_data())
+      or
+      matchBytecodeExpr(expr.(Call).getFunc(), bytecode.(XMLBytecodeCall).get_function_data())
+      //
+      // I considered allowing a partial match as well. That is, if the bytecode
+      // expression information only tells us `<unknown>.foo()`, and we find an AST
+      // expression that matches on `.foo()`, that is good enough.
+      //
+      // However, we cannot assume that all calls are recorded (such as `range(10)`),
+      // and we cannot assume that for all recorded calls there exists a corresponding
+      // AST call (such as for list-comprehensions).
+      //
+      // So allowing partial matches is not safe, since we might end up matching a
+      // recorded call not in the AST together with an unrecorded call visible in the
+      // AST.
+    )
+  }
+}
+
+/** The XML data for the callee part of a recorded call. */
+abstract class XMLCallee extends XMLElement { }
+
+/** The XML data for the callee part of a recorded call, when the callee is a Python function. */
+class XMLPythonCallee extends XMLCallee {
+  XMLPythonCallee() { this.hasName("PythonCallee") }
+
+  string get_filename_data() { result = this.getAChild("filename").getTextValue() }
+
+  int get_linenum_data() { result = this.getAChild("linenum").getTextValue().toInt() }
+
+  string get_funcname_data() { result = this.getAChild("funcname").getTextValue() }
+
+  Function getACallee() {
+    result.getLocation().hasLocationInfo(this.get_filename_data(), this.get_linenum_data(), _, _, _)
+    or
+    // if the function has decorators, the call is recorded as going to the first decorator
+    result
+        .getADecorator()
+        .getLocation()
+        .hasLocationInfo(this.get_filename_data(), this.get_linenum_data(), _, _, _)
+  }
+}
+
+/** The XML data for the callee part of a recorded call, when the callee is a C function or builtin. */
+class XMLExternalCallee extends XMLCallee {
+  XMLExternalCallee() { this.hasName("ExternalCallee") }
+
+  string get_module_data() { result = this.getAChild("module").getTextValue() }
+
+  string get_qualname_data() { result = this.getAChild("qualname").getTextValue() }
+
+  Builtin getACallee() {
+    exists(Builtin mod |
+      mod.isModule() and
+      mod.getName() = this.get_module_data()
+    |
+      result = traverse_qualname(mod, this.get_qualname_data())
+    )
+  }
+}
+
+/**
+ * Helper predicate. If `parent` = `builtins` and `qualname` = `list.append`, the
+ * result is `builtins.list.append`.
+ */
+private Builtin traverse_qualname(Builtin parent, string qualname) {
+  not qualname = "__objclass__" and
+  not qualname.matches("%.%") and
+  result = parent.getMember(qualname)
+  or
+  qualname.matches("%.%") and
+  exists(string before_dot, string after_dot, Builtin intermediate_parent |
+    qualname = before_dot + "." + after_dot and
+    not before_dot = "__objclass__" and
+    intermediate_parent = parent.getMember(before_dot) and
+    result = traverse_qualname(intermediate_parent, after_dot)
+  )
+}
+
+/**
+ * Class of recorded calls where we can identify both the `call` and the `callee` uniquely.
+ */
+class IdentifiedRecordedCall extends XMLRecordedCall {
+  IdentifiedRecordedCall() {
+    strictcount(this.getACall()) = 1 and
+    (
+      strictcount(this.getAPythonCallee()) = 1
+      or
+      strictcount(this.getABuiltinCallee()) = 1
+    )
+    or
+    // Handle the case where the same function is called multiple times on one line,
+    // for example `func(); func()`. This only works if:
+    // - all the callees for these calls are the same
+    // - all these calls were recorded
+    //
+    // Without this `strictcount`, in the case `func(); func(); func()`, if one of the
+    // calls was not recorded, we would still mark the other two recorded calls as
+    // identified (which violates the rules above). The `+ 1` counts `this` as well.
+    strictcount(this.getACall()) = strictcount(this.getOtherWithSameSetOfCalls()) + 1 and
+    forex(XMLRecordedCall rc | rc = this.getOtherWithSameSetOfCalls() |
+      unique(Function f | f = this.getAPythonCallee()) =
+        unique(Function f | f = rc.getAPythonCallee())
+      or
+      unique(Builtin b | b = this.getABuiltinCallee()) =
+        unique(Builtin b | b = rc.getABuiltinCallee())
+    )
+  }
+
+  override string toString() {
+    exists(string callee_str |
+      exists(Function callee, string path | callee = this.getAPythonCallee() |
+        (
+          path = callee.getLocation().getFile().getRelativePath()
+          or
+          not exists(callee.getLocation().getFile().getRelativePath()) and
+          path = callee.getLocation().getFile().getAbsolutePath()
+        ) and
+        callee_str =
+          callee.toString() + " (" + path + ":" + callee.getLocation().getStartLine() + ")"
+      )
+      or
+      callee_str = this.getABuiltinCallee().toString()
+    |
+      result = super.toString() + " --> " + callee_str
+    )
+  }
+}
+
+/**
+ * Class of recorded calls where we cannot identify both the `call` and the `callee` uniquely.
+ */
+class UnidentifiedRecordedCall extends XMLRecordedCall {
+  UnidentifiedRecordedCall() { not this instanceof IdentifiedRecordedCall }
+}
+
+/**
+ * Recorded calls made from outside project folder, that can be ignored when evaluating
+ * call-graph quality.
+ */
+class IgnoredRecordedCall extends XMLRecordedCall {
+  IgnoredRecordedCall() {
+    not exists(
+      any(File file | file.getAbsolutePath() = this.getXMLCall().get_filename_data())
+          .getRelativePath()
+    )
+  }
+}
+
+/** Provides classes for call-graph resolution by using points-to. */
+module PointsToBasedCallGraph {
+  /** An `IdentifiedRecordedCall` that can be resolved with points-to. */
+  class ResolvableRecordedCall extends IdentifiedRecordedCall {
+    Value calleeValue;
+
+    ResolvableRecordedCall() {
+      exists(Call call, XMLCallee xmlCallee |
+        call = this.getACall() and
+        calleeValue.getACall() = call.getAFlowNode() and
+        xmlCallee = this.getXMLCallee() and
+        (
+          xmlCallee instanceof XMLPythonCallee and
+          (
+            // normal function
+            calleeValue.(PythonFunctionValue).getScope() = xmlCallee.(XMLPythonCallee).getACallee()
+            or
+            // class instantiation -- points-to says the call goes to the class
+            calleeValue.(ClassValue).lookup("__init__").(PythonFunctionValue).getScope() =
+              xmlCallee.(XMLPythonCallee).getACallee()
+          )
+          or
+          xmlCallee instanceof XMLExternalCallee and
+          calleeValue.(BuiltinFunctionObjectInternal).getBuiltin() =
+            xmlCallee.(XMLExternalCallee).getACallee()
+          or
+          xmlCallee instanceof XMLExternalCallee and
+          calleeValue.(BuiltinMethodObjectInternal).getBuiltin() =
+            xmlCallee.(XMLExternalCallee).getACallee()
+        )
+      )
+    }
+
+    Value getCalleeValue() { result = calleeValue }
+  }
+}
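The QL helper `traverse_qualname` resolves a dotted qualified name one component at a time, starting from a module. As a rough illustration only, here is a Python analogue that walks live objects with `getattr` instead of CodeQL `Builtin` entities. It splits at the first dot and refuses `__objclass__` the same way the QL predicate does; it is not part of `cg_trace`.

```python
import builtins
from typing import Any


def traverse_qualname(parent: Any, qualname: str) -> Any:
    """Resolve a dotted `qualname` (e.g. "list.append") starting from `parent`.

    Illustrative analogue of the QL helper: each dotted component is looked up
    as an attribute of the previous one, and `__objclass__` is never traversed.
    """
    head, sep, rest = qualname.partition(".")
    if head == "__objclass__":
        raise LookupError("refusing to traverse __objclass__")
    member = getattr(parent, head)  # raises AttributeError if the member is missing
    # Recurse on the remainder if there was a dot, otherwise we are done
    return traverse_qualname(member, rest) if sep else member


if __name__ == "__main__":
    # `builtins` + "list.append" resolves to `builtins.list.append`
    print(traverse_qualname(builtins, "list.append"))
```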
--- a/python/tools/recorded-call-graph-metrics/ql/query/InternalMetrics.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/query/InternalMetrics.ql
@@ -0,0 +1,17 @@
+/**
+ * Metrics for evaluating how good we are at interpreting results from the cg_trace program.
+ * See Metrics.ql for call-graph quality metrics.
+ */
+
+import lib.RecordedCalls
+
+from string text, float number, float ratio
+where
+  exists(int all_rcs | all_rcs = count(XMLRecordedCall rc) and ratio = number / all_rcs |
+    text = "XMLRecordedCall" and number = all_rcs
+    or
+    text = "IdentifiedRecordedCall" and number = count(IdentifiedRecordedCall rc)
+    or
+    text = "UnidentifiedRecordedCall" and number = count(UnidentifiedRecordedCall rc)
+  )
+select text, number, ratio * 100 + "%" as percent
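The identification rule behind the `IdentifiedRecordedCall` / `UnidentifiedRecordedCall` split is easiest to check on a concrete line such as `func(); func()`. Below is a simplified Python model of it, assuming a hypothetical `RecordedCall` record in which matched AST calls and callees are plain strings. Unlike the QL class, it merges Python and builtin callees into one set, so treat it as a sketch of the logic rather than a faithful port.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class RecordedCall:
    """Hypothetical, simplified stand-in for XMLRecordedCall."""

    filename: str
    linenum: int
    inst_index: int
    calls: FrozenSet[str]  # AST calls matched by the recorded bytecode expression
    callees: FrozenSet[str]  # functions/builtins matched by the recorded callee


def others_with_same_calls(rc: RecordedCall, all_rcs: List[RecordedCall]) -> List[RecordedCall]:
    """Mirror of getOtherWithSameSetOfCalls: same line, other instruction, same call set."""
    return [
        o
        for o in all_rcs
        if o.inst_index != rc.inst_index
        and o.filename == rc.filename
        and o.linenum == rc.linenum
        and o.calls == rc.calls
    ]


def is_identified(rc: RecordedCall, all_rcs: List[RecordedCall]) -> bool:
    # Simple case: exactly one matching AST call and exactly one callee
    if len(rc.calls) == 1 and len(rc.callees) == 1:
        return True
    # `func(); func()` case: every AST call on the line must have been recorded
    # (the `+ 1` counts `rc` itself), and all recordings share one unique callee
    others = others_with_same_calls(rc, all_rcs)
    return (
        len(others) > 0  # `forex` requires at least one other recorded call
        and len(rc.calls) == len(others) + 1
        and len(rc.callees) == 1
        and all(o.callees == rc.callees for o in others)
    )
```

With `func(); func(); func()` and only two of the three calls recorded, `len(rc.calls)` is 3 but only two recordings exist, so neither is identified.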
--- a/python/tools/recorded-call-graph-metrics/ql/query/Metrics.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/query/Metrics.ql
@@ -0,0 +1,56 @@
+import lib.RecordedCalls
+
+// column i is just used for sorting
+from string text, float number, float ratio, int i
+where
+  exists(int all_rcs | all_rcs = count(XMLRecordedCall rc) and ratio = number / all_rcs |
+    text = "XMLRecordedCall" and number = all_rcs and i = 0
+    or
+    text = "IgnoredRecordedCall" and number = count(IgnoredRecordedCall rc) and i = 1
+    or
+    text = "not IgnoredRecordedCall" and number = all_rcs - count(IgnoredRecordedCall rc) and i = 2
+  )
+  or
+  text = "----------" and
+  number = 0 and
+  ratio = 0 and
+  i = 10
+  or
+  exists(int all_not_ignored_rcs |
+    all_not_ignored_rcs = count(XMLRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
+    ratio = number / all_not_ignored_rcs
+  |
+    text = "IdentifiedRecordedCall" and
+    number = count(IdentifiedRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
+    i = 11
+    or
+    text = "UnidentifiedRecordedCall" and
+    number = count(UnidentifiedRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
+    i = 12
+  )
+  or
+  text = "----------" and
+  number = 0 and
+  ratio = 0 and
+  i = 20
+  or
+  exists(int all_identified_rcs |
+    all_identified_rcs = count(IdentifiedRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
+    ratio = number / all_identified_rcs
+  |
+    text = "points-to ResolvableRecordedCall" and
+    number =
+      count(PointsToBasedCallGraph::ResolvableRecordedCall rc |
+        not rc instanceof IgnoredRecordedCall
+      ) and
+    i = 21
+    or
+    text = "points-to not ResolvableRecordedCall" and
+    number =
+      all_identified_rcs -
+        count(PointsToBasedCallGraph::ResolvableRecordedCall rc |
+          not rc instanceof IgnoredRecordedCall
+        ) and
+    i = 22
+  )
+select i, text, number, ratio * 100 + "%" as percent order by i
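Each block of `Metrics.ql` uses a different denominator: all recorded calls, then the non-ignored ones, then the identified non-ignored ones. A small Python sketch that recomputes the same rows from precomputed counts (separator rows omitted) makes the denominators explicit. It assumes the identified, unidentified and resolvable counts already exclude ignored calls, as the query does; `metrics_rows` is a hypothetical helper, not part of the tool.

```python
from typing import List, Tuple


def metrics_rows(
    n_all: int, n_ignored: int, n_identified: int, n_unidentified: int, n_resolvable: int
) -> List[Tuple[int, str, int, float]]:
    """Return (sort key, label, count, percent) rows mirroring Metrics.ql."""
    n_not_ignored = n_all - n_ignored
    sections = [
        # (denominator, [(sort key, label, count), ...])
        (n_all, [
            (0, "XMLRecordedCall", n_all),
            (1, "IgnoredRecordedCall", n_ignored),
            (2, "not IgnoredRecordedCall", n_not_ignored),
        ]),
        (n_not_ignored, [
            (11, "IdentifiedRecordedCall", n_identified),
            (12, "UnidentifiedRecordedCall", n_unidentified),
        ]),
        (n_identified, [
            (21, "points-to ResolvableRecordedCall", n_resolvable),
            (22, "points-to not ResolvableRecordedCall", n_identified - n_resolvable),
        ]),
    ]
    rows = []
    for denominator, entries in sections:
        if denominator == 0:
            continue  # skip the section rather than divide by zero
        for i, label, count in entries:
            rows.append((i, label, count, 100.0 * count / denominator))
    return sorted(rows)
```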
--- a/python/tools/recorded-call-graph-metrics/ql/query/PointsToFound.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/query/PointsToFound.ql
@@ -0,0 +1,4 @@
+import lib.RecordedCalls
+
+from PointsToBasedCallGraph::ResolvableRecordedCall rc
+select rc.getACall(), "-->", rc.getCalleeValue()
--- a/python/tools/recorded-call-graph-metrics/ql/query/PointsToNotFound.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/query/PointsToNotFound.ql
@@ -0,0 +1,5 @@
+import lib.RecordedCalls
+
+from IdentifiedRecordedCall rc
+where not rc instanceof PointsToBasedCallGraph::ResolvableRecordedCall
+select rc, rc.getACall()
--- a/python/tools/recorded-call-graph-metrics/ql/query/UnidentifiedRecordedCalls.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/query/UnidentifiedRecordedCalls.ql
@@ -0,0 +1,23 @@
+import lib.RecordedCalls
+
+from UnidentifiedRecordedCall rc, string reason
+where
+  not rc instanceof IgnoredRecordedCall and
+  (
+    not exists(rc.getACall()) and
+    reason = "no call"
+    or
+    count(rc.getACall()) > 1 and
+    reason = "more than 1 call"
+    or
+    not exists(rc.getAPythonCallee()) and
+    not exists(rc.getABuiltinCallee()) and
+    reason = "no callee"
+    or
+    count(rc.getAPythonCallee()) > 1 and
+    reason = "more than 1 Python callee"
+    or
+    count(rc.getABuiltinCallee()) > 1 and
+    reason = "more than 1 Builtin callee"
+  )
+select rc, reason
--- a/python/tools/recorded-call-graph-metrics/ql/query/UnknownOpcode.ql
+++ b/python/tools/recorded-call-graph-metrics/ql/query/UnknownOpcode.ql
@@ -0,0 +1,9 @@
+import python
+import lib.RecordedCalls
+
+// Could be useful for deciding which new opcodes to support
+from string op_name, int c
+where
+  exists(XMLBytecodeUnknown unknown | unknown.get_opname_data() = op_name) and
+  c = count(XMLBytecodeUnknown unknown | unknown.get_opname_data() = op_name | 1)
+select op_name, c order by c
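The same opcode histogram can be computed directly from a trace file, without building a database, because the exporter writes each `BytecodeUnknown` as an element named after the dataclass with an `opname` child. A minimal sketch with the standard library's `xml.etree.ElementTree`, assuming that element layout:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_unknown_opcodes(xml_data) -> Counter:
    """Count the opnames of all <BytecodeUnknown> elements in cg_trace XML output.

    `xml_data` is the file content as `str` or `bytes`.
    """
    root = ET.fromstring(xml_data)
    # iter() walks the whole tree, so nested bytecode expressions are included
    return Counter(elem.findtext("opname") for elem in root.iter("BytecodeUnknown"))
```

Note that `Counter.most_common()` sorts by descending count, the reverse of the query's `order by c`.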
--- a/python/tools/recorded-call-graph-metrics/recreate-db.sh
+++ b/python/tools/recorded-call-graph-metrics/recreate-db.sh
@@ -1,23 +0,0 @@
-#!/bin/bash
-
-set -e
-set -x
-
-DB="cg-trace-example-db"
-SRC="example/"
-XMLDIR="$SRC"
-PYTHON_EXTRACTOR=$(codeql resolve extractor --language=python)
-
-
-./cg_trace.py --xml example/simple.xml example/simple.py
-
-rm -rf "$DB"
-
-
-codeql database init --source-root="$SRC" --language=python "$DB"
-codeql database trace-command --working-dir="$SRC" "$DB" "$PYTHON_EXTRACTOR/tools/autobuild.sh"
-codeql database index-files --language xml --include-extension .xml --working-dir="$XMLDIR" "$DB"
-codeql database finalize "$DB"
-
-set +x
-echo "Created database '$DB'"
--- a/python/tools/recorded-call-graph-metrics/requirements.txt
+++ b/python/tools/recorded-call-graph-metrics/requirements.txt
@@ -0,0 +1,5 @@
+lxml
+# dev
+black
+flake8
+flake8-bugbear
--- a/python/tools/recorded-call-graph-metrics/setup.py
+++ b/python/tools/recorded-call-graph-metrics/setup.py
@@ -0,0 +1,14 @@
+from setuptools import find_packages, setup
+
+# using src/ folder as recommended in: https://blog.ionelmc.ro/2014/05/25/python-packaging/
+
+setup(
+    name="cg_trace",
+    version="0.0.2",  # Remember to update src/cg_trace/__init__.py
+    description="Call graph tracing",
+    packages=find_packages("src"),
+    package_dir={"": "src"},
+    install_requires=["lxml"],
+    entry_points={"console_scripts": ["cg-trace = cg_trace.main:main"]},
+    python_requires=">=3.7",
+)
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/__init__.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/__init__.py
@@ -0,0 +1,15 @@
+import sys
+
+__version__ = "0.0.2"  # remember to update setup.py
+
+# The virtual machine opcodes changed in Python 3.6, so we do not attempt to support
+# anything before that. We also use dataclasses, which were added in Python 3.7.
+MIN_PYTHON_VERSION = (3, 7)
+MIN_PYTHON_VERSION_FORMATTED = ".".join(str(i) for i in MIN_PYTHON_VERSION)
+
+if sys.version_info[:2] < MIN_PYTHON_VERSION:
+    sys.exit(
+        "You need at least Python {} to use 'cg_trace'".format(
+            MIN_PYTHON_VERSION_FORMATTED
+        )
+    )
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/__main__.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/__main__.py
@@ -0,0 +1,5 @@
+import sys
+
+from cg_trace.main import main
+
+sys.exit(main())
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/bytecode_reconstructor.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/bytecode_reconstructor.py
@@ -0,0 +1,275 @@
+import dataclasses
+import dis
+import logging
+from dis import Instruction
+from types import FrameType
+from typing import Any, List
+
+from cg_trace.settings import DEBUG, FAIL_ON_UNKNOWN_BYTECODE
+from cg_trace.utils import better_compare_for_dataclass
+
+LOGGER = logging.getLogger(__name__)
+
+# See https://docs.python.org/3/library/dis.html#python-bytecode-instructions for
+# details on the bytecode instructions
+
+# TODO: read https://opensource.com/article/18/4/introduction-python-bytecode
+
+
+class BytecodeExpr:
+    """An expression reconstructed from Python bytecode
+    """
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeConst(BytecodeExpr):
+    """For LOAD_CONST"""
+
+    value: Any
+
+    def __str__(self):
+        return repr(self.value)
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeVariableName(BytecodeExpr):
+    name: str
+
+    def __str__(self):
+        return self.name
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeAttribute(BytecodeExpr):
+    attr_name: str
+    object: BytecodeExpr
+
+    def __str__(self):
+        return f"{self.object}.{self.attr_name}"
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeSubscript(BytecodeExpr):
+    key: BytecodeExpr
+    object: BytecodeExpr
+
+    def __str__(self):
+        return f"{self.object}[{self.key}]"
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeTuple(BytecodeExpr):
+    elements: List[BytecodeExpr]
+
+    def __str__(self):
+        # a 1-tuple needs a trailing comma; this also handles the empty tuple
+        elements_formatted = (
+            f"{self.elements[0]},"
+            if len(self.elements) == 1
+            else ", ".join(str(e) for e in self.elements)
+        )
+        return f"({elements_formatted})"
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeList(BytecodeExpr):
+    elements: List[BytecodeExpr]
+
+    def __str__(self):
+        # lists need no trailing comma, and this also handles the empty list
+        elements_formatted = ", ".join(str(e) for e in self.elements)
+        return f"[{elements_formatted}]"
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeCall(BytecodeExpr):
+    function: BytecodeExpr
+
+    def __str__(self):
+        return f"{self.function}()"
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeUnknown(BytecodeExpr):
+    opname: str
+
+    def __str__(self):
+        return f"<{self.opname}>"
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class BytecodeMakeFunction(BytecodeExpr):
+    """For MAKE_FUNCTION opcode"""
+
+    qualified_name: BytecodeExpr
+
+    def __str__(self):
+        return f"<MAKE_FUNCTION>(qualified_name={self.qualified_name})"
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class SomethingInvolvingScaryBytecodeJump(BytecodeExpr):
+    opname: str
+
+    def __str__(self):
+        return "<SomethingInvolvingScaryBytecodeJump>"
+
+
+def expr_that_added_elem_to_stack(
+    instructions: List[Instruction], start_index: int, stack_pos: int
+):
+    """Backwards traverse instructions
+
+    Backwards traverse the instructions starting at `start_index` until we find the
+    instruction that added the element at stack position `stack_pos` (where 0 means top
+    of stack). For example, if the instructions are:
+
+    ```
+    0: LOAD_GLOBAL              0 (func)
+    1: LOAD_CONST               1 (42)
+    2: CALL_FUNCTION            1
+    ```
+
+    We can look for the function that is called by invoking this function with
+    `start_index = 1` and `stack_pos = 1`. It will see that `LOAD_CONST` added the top
+    element of the stack, and then find that `LOAD_GLOBAL` was the instruction that
+    added the element at stack position 1 -- so `expr_from_instruction(instructions, 0)`
+    is returned.
+
+    It is assumed that if `stack_pos == 0` then the instruction you are looking for is
+    the one at `instructions[start_index]`. This might not hold if `NOP` instructions
+    are used.
+
+    If any jump instruction is found, `SomethingInvolvingScaryBytecodeJump` is returned
+    immediately (since correctly processing the bytecode in the presence of jumps is
+    not as straightforward).
+    """
+    if DEBUG:
+        LOGGER.debug(
+            f"expr_that_added_elem_to_stack start_index={start_index} stack_pos={stack_pos}"
+        )
+    assert stack_pos >= 0
+    for inst in reversed(instructions[: start_index + 1]):
+        # Return immediately if faced with a jump
+        if inst.opcode in dis.hasjabs or inst.opcode in dis.hasjrel:
+            return SomethingInvolvingScaryBytecodeJump(inst.opname)
+
+        if stack_pos == 0:
+            if DEBUG:
+                LOGGER.debug(f"Found it: {inst}")
+            found_index = instructions.index(inst)
+            break
+        old = stack_pos
+        stack_pos -= dis.stack_effect(inst.opcode, inst.arg)
+        new = stack_pos
+        if DEBUG:
+            LOGGER.debug(f"Skipping ({old} -> {new}) {inst}")
+    else:
+        raise Exception("expr_that_added_elem_to_stack failed")
+
+    return expr_from_instruction(instructions, found_index)
+
+
+def expr_from_instruction(instructions: List[Instruction], index: int) -> BytecodeExpr:
+    inst = instructions[index]
+
+    if DEBUG:
+        LOGGER.debug(f"expr_from_instruction: {inst} index={index}")
+
+    if inst.opname in ["LOAD_GLOBAL", "LOAD_FAST", "LOAD_NAME", "LOAD_DEREF"]:
+        return BytecodeVariableName(inst.argval)
+
+    # elif inst.opname in ["LOAD_CONST"]:
+    #     return BytecodeConst(inst.argval)
+
+    # https://docs.python.org/3/library/dis.html#opcode-LOAD_METHOD
+    # https://docs.python.org/3/library/dis.html#opcode-LOAD_ATTR
+    elif inst.opname in ["LOAD_METHOD", "LOAD_ATTR"]:
+        attr_name = inst.argval
+        obj_expr = expr_that_added_elem_to_stack(instructions, index - 1, 0)
+        return BytecodeAttribute(attr_name=attr_name, object=obj_expr)
+
+    # elif inst.opname in ["BINARY_SUBSCR"]:
+    #     key_expr = expr_that_added_elem_to_stack(instructions, index - 1, 0)
+    #     obj_expr = expr_that_added_elem_to_stack(instructions, index - 1, 1)
+    #     return BytecodeSubscript(key=key_expr, object=obj_expr)
+
+    # elif inst.opname in ["BUILD_TUPLE", "BUILD_LIST"]:
+    #     elements = []
+    #     for i in range(inst.arg):
+    #         element_expr = expr_that_added_elem_to_stack(instructions, index - 1, i)
+    #         elements.append(element_expr)
+    #     elements.reverse()
+    #     klass = {"BUILD_TUPLE": BytecodeTuple, "BUILD_LIST": BytecodeList}[inst.opname]
+    #     return klass(elements=elements)
+
+    # https://docs.python.org/3/library/dis.html#opcode-CALL_FUNCTION
+    elif inst.opname in [
+        "CALL_FUNCTION",
+        "CALL_METHOD",
+        "CALL_FUNCTION_KW",
+        "CALL_FUNCTION_EX",
+    ]:
+        assert index > 0
+        assert isinstance(inst.arg, int)
+        if inst.opname in ["CALL_FUNCTION", "CALL_METHOD"]:
+            num_stack_elems = inst.arg
+        elif inst.opname == "CALL_FUNCTION_KW":
+            num_stack_elems = inst.arg + 1
+        elif inst.opname == "CALL_FUNCTION_EX":
+            # top of stack _can_ be a keyword-argument mapping (indicated by the lowest
+            # bit being set), always followed by the iterable of positional arguments
+            # (present even if there are none).
+            num_stack_elems = (1 if inst.arg & 1 == 1 else 0) + 1
+
+        func_expr = expr_that_added_elem_to_stack(
+            instructions, index - 1, num_stack_elems
+        )
+        return BytecodeCall(function=func_expr)
+
+    # elif inst.opname in ["MAKE_FUNCTION"]:
+    #     name_expr = expr_that_added_elem_to_stack(instructions, index - 1, 0)
+    #     assert isinstance(name_expr, BytecodeConst)
+    #     return BytecodeMakeFunction(qualified_name=name_expr)
+
+    # TODO: handle with statements (https://docs.python.org/3/library/dis.html#opcode-SETUP_WITH)
+    WITH_OPNAMES = ["SETUP_WITH", "WITH_CLEANUP_START", "WITH_CLEANUP_FINISH"]
+
+    # Special cases ignored for now:
+    #
+    # - LOAD_BUILD_CLASS: Called when constructing a class.
+    # - IMPORT_NAME: Observed to result in a call to filename='<frozen
+    #   importlib._bootstrap>', linenum=389, funcname='parent'
+    if FAIL_ON_UNKNOWN_BYTECODE:
+        if inst.opname not in ["LOAD_BUILD_CLASS", "IMPORT_NAME"] + WITH_OPNAMES:
+            LOGGER.warning(
+                f"Don't know how to handle this type of instruction: {inst.opname}"
+            )
+            raise BaseException()
+
+    return BytecodeUnknown(inst.opname)
+
+
+def expr_from_frame(frame: FrameType) -> BytecodeExpr:
+    bytecode = dis.Bytecode(frame.f_code, current_offset=frame.f_lasti)
+
+    if DEBUG:
+        LOGGER.debug(
+            f"{frame.f_code.co_filename}:{frame.f_lineno}: bytecode: \n{bytecode.dis()}"
+        )
+
+    instructions = list(bytecode)
+    last_instruction_index = [inst.offset for inst in instructions].index(frame.f_lasti)
+    return expr_from_instruction(instructions, last_instruction_index)
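The backward walk in `expr_that_added_elem_to_stack`, combined with the per-opcode arithmetic in `expr_from_instruction`, is what locates the callable of a call instruction. The sketch below replays that logic on a hypothetical `Instr` tuple that carries a precomputed net stack effect, instead of `dis.Instruction` plus `dis.stack_effect`, so it does not depend on the running interpreter's bytecode. Like the original, it only covers the pre-3.11 call opcodes, and it ignores jumps rather than bailing out on them.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    """Hypothetical, simplified instruction: name, argument and net stack effect."""

    opname: str
    arg: int
    stack_effect: int


def index_that_added_elem(instrs: List[Instr], start_index: int, stack_pos: int) -> int:
    """Walk backwards from `start_index` to the instruction that pushed the element
    at `stack_pos` (0 = top of stack), undoing each skipped instruction's effect."""
    assert stack_pos >= 0
    for i in range(start_index, -1, -1):
        if stack_pos == 0:
            return i
        stack_pos -= instrs[i].stack_effect
    raise ValueError("no instruction pushed that stack element")


def num_stack_elems_above_callable(inst: Instr) -> int:
    """Number of stack entries above the callable when the call executes."""
    if inst.opname in ("CALL_FUNCTION", "CALL_METHOD"):
        return inst.arg  # one entry per argument
    if inst.opname == "CALL_FUNCTION_KW":
        return inst.arg + 1  # all arguments plus the tuple of keyword names
    if inst.opname == "CALL_FUNCTION_EX":
        return (1 if inst.arg & 1 else 0) + 1  # optional kwargs mapping + args iterable
    raise ValueError(f"not a call instruction: {inst.opname}")


def callable_index(instrs: List[Instr], call_index: int) -> int:
    """Index of the instruction that pushed the callable for `instrs[call_index]`."""
    above = num_stack_elems_above_callable(instrs[call_index])
    return index_that_added_elem(instrs, call_index - 1, above)


if __name__ == "__main__":
    # The docstring example: LOAD_GLOBAL func; LOAD_CONST 42; CALL_FUNCTION 1
    example = [Instr("LOAD_GLOBAL", 0, 1), Instr("LOAD_CONST", 0, 1), Instr("CALL_FUNCTION", 1, -1)]
    print(callable_index(example, 2))
```

Instructions that pop values (such as an inner `CALL_FUNCTION` in `f(g(x))`) have a negative effect, so undoing them raises `stack_pos`, which is why the walk still lands on `f`.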
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/cmdline.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/cmdline.py
@@ -0,0 +1,22 @@
+import argparse
+
+
+def parse(args):
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument(
+        "--debug", action="store_true", default=False, help="Enable debug logging"
+    )
+
+    parser.add_argument("--xml", help="write the recorded calls to this XML file")
+
+    parser.add_argument(
+        "--module", action="store_true", default=False, help="Trace a module"
+    )
+
+    parser.add_argument("progname", help="file to run as main program")
+    parser.add_argument(
+        "arguments", nargs=argparse.REMAINDER, help="arguments to the program"
+    )
+
+    return parser.parse_args(args)
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/exporter.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/exporter.py
@@ -0,0 +1,46 @@
+import dataclasses
+from typing import Dict
+
+from lxml import etree
+
+
+def dataclass_to_xml(obj, parent):
+    obj_elem = etree.SubElement(parent, obj.__class__.__name__)
+    for field in dataclasses.fields(obj):
+        field_elem = etree.SubElement(obj_elem, field.name)
+        value = getattr(obj, field.name)
+        if isinstance(value, (str, int)) or value is None:
+            field_elem.text = str(value)
+        elif isinstance(value, list):
+            for list_elem in value:
+                assert dataclasses.is_dataclass(list_elem)
+                dataclass_to_xml(list_elem, field_elem)
+        elif dataclasses.is_dataclass(value):
+            dataclass_to_xml(value, field_elem)
+        else:
+            raise ValueError(
+                f"Can't export key {field.name!r} with value {value!r} (type {type(value)})"
+            )
+
+
+class XMLExporter:
+    @staticmethod
+    def export(outfile_path, recorded_calls, info: Dict[str, str]):
+
+        root = etree.Element("root")
+
+        info_elem = etree.SubElement(root, "info")
+        for k, v in info.items():
+            etree.SubElement(info_elem, k).text = v
+
+        rcs = etree.SubElement(root, "recorded_calls")
+
+        for (call, callee) in sorted(recorded_calls):
+            rc = etree.SubElement(rcs, "recorded_call")
+            dataclass_to_xml(call, rc)
+            dataclass_to_xml(callee, rc)
+
+        tree = etree.ElementTree(root)
+        tree.write(outfile_path, encoding="utf-8", pretty_print=True)
+
+        print(f"Wrote {len(recorded_calls)} recorded calls to {outfile_path}")
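The exporter turns each dataclass into an element named after its class, with one child element per field; this is the layout that the QL classes in `RecordedCalls.qll` navigate (for example `getAChild("filename").getTextValue()`). A stdlib-only sketch of the same recursive mapping, swapping lxml for `xml.etree.ElementTree`, with small hypothetical dataclasses standing in for the tracer's own (their definitions are not shown in this diff):

```python
import dataclasses
import xml.etree.ElementTree as ET


def dataclass_to_xml(obj, parent: ET.Element) -> None:
    """Append `obj` to `parent` as <ClassName><field>...</field>...</ClassName>."""
    obj_elem = ET.SubElement(parent, type(obj).__name__)
    for field in dataclasses.fields(obj):
        field_elem = ET.SubElement(obj_elem, field.name)
        value = getattr(obj, field.name)
        if isinstance(value, (str, int)) or value is None:
            field_elem.text = str(value)  # scalars become text content
        elif isinstance(value, list):
            for item in value:
                assert dataclasses.is_dataclass(item)
                dataclass_to_xml(item, field_elem)
        elif dataclasses.is_dataclass(value):
            dataclass_to_xml(value, field_elem)  # nested expression trees
        else:
            raise ValueError(f"Can't export {field.name!r}={value!r} ({type(value)})")


# Hypothetical dataclasses, shaped like what XMLCall and XMLBytecodeExpr read
@dataclasses.dataclass(frozen=True)
class BytecodeVariableName:
    name: str


@dataclasses.dataclass(frozen=True)
class BytecodeCall:
    function: object


@dataclasses.dataclass(frozen=True)
class Call:
    filename: str
    linenum: int
    inst_index: int
    bytecode_expr: object


if __name__ == "__main__":
    rc = ET.Element("recorded_call")
    dataclass_to_xml(Call("simple.py", 3, 6, BytecodeCall(BytecodeVariableName("foo"))), rc)
    print(ET.tostring(rc, encoding="unicode"))
```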
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/generate_bytecode_expr_qll.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/generate_bytecode_expr_qll.py
@@ -0,0 +1,46 @@
+import dataclasses
+from typing import Any, List
+
+from cg_trace.bytecode_reconstructor import BytecodeExpr
+
+PREAMBLE = """\
+import python
+
+abstract class XMLBytecodeExpr extends XMLElement { }
+"""
+
+CLASS_PREAMBLE = """\
+class XML{class_name} extends XMLBytecodeExpr {{
+  XML{class_name}() {{ this.hasName("{class_name}") }}
+"""
+
+CLASS_AFTER = """\
+}
+"""
+
+ATTR_TEMPLATES = {
+    str: 'string get_{name}_data() {{ result = this.getAChild("{name}").getTextValue() }}',
+    int: 'int get_{name}_data() {{ result = this.getAChild("{name}").getTextValue().toInt() }}',
+    BytecodeExpr: 'XMLBytecodeExpr get_{name}_data() {{ result.getParent() = this.getAChild("{name}") }}',
+    List[
+        BytecodeExpr
+    ]: 'XMLBytecodeExpr get_{name}_data(int index) {{ result = this.getAChild("{name}").getChild(index) }}',
+    Any: 'string get_{name}_data_raw() {{ result = this.getAChild("{name}").getTextValue() }}',
+}
+
+if __name__ == "__main__":
+
+    print(PREAMBLE)
+
+    for sc in BytecodeExpr.__subclasses__():
+        print(CLASS_PREAMBLE.format(class_name=sc.__name__))
+
+        for f in dataclasses.fields(sc):
+            field_template = ATTR_TEMPLATES.get(f.type)
+            if field_template:
+                generated = field_template.format(name=f.name)
+                print(f"  {generated}")
+            else:
+                raise Exception("no template for", f.type)
+
+        print(CLASS_AFTER)
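To see what `generate_bytecode_expr_qll.py` emits, here is a trimmed, self-contained sketch that applies two of its templates to a stand-in `BytecodeAttribute`. The generated getters, `get_attr_name_data` and `get_object_data`, are the ones `RecordedCalls.qll` calls when matching attribute accesses. `generate_class` is a hypothetical helper; the real script prints straight to stdout.

```python
import dataclasses

CLASS_PREAMBLE = """\
class XML{class_name} extends XMLBytecodeExpr {{
  XML{class_name}() {{ this.hasName("{class_name}") }}
"""

ATTR_TEMPLATES = {}  # filled in below, once BytecodeExpr exists


class BytecodeExpr:
    """Stand-in for cg_trace.bytecode_reconstructor.BytecodeExpr."""


@dataclasses.dataclass(frozen=True)
class BytecodeAttribute(BytecodeExpr):
    attr_name: str
    object: BytecodeExpr


ATTR_TEMPLATES.update({
    str: 'string get_{name}_data() {{ result = this.getAChild("{name}").getTextValue() }}',
    BytecodeExpr: 'XMLBytecodeExpr get_{name}_data() {{ result.getParent() = this.getAChild("{name}") }}',
})


def generate_class(cls) -> str:
    """Return the QL class text for one BytecodeExpr dataclass."""
    lines = [CLASS_PREAMBLE.format(class_name=cls.__name__).rstrip("\n")]
    for f in dataclasses.fields(cls):
        # choose the getter template by the field's declared type
        lines.append("  " + ATTR_TEMPLATES[f.type].format(name=f.name))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_class(BytecodeAttribute))
```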
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/main.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/main.py
@@ -0,0 +1,118 @@
+import itertools
+import logging
+import os
+import sys
+import time
+from datetime import datetime
+from io import StringIO
+
+from cg_trace import __version__, cmdline, settings, tracer
+from cg_trace.exporter import XMLExporter
+
+
+def record_calls(code, globals):
+    real_stdout = sys.stdout
+    real_stderr = sys.stderr
+    captured_stdout = StringIO()
+    captured_stderr = StringIO()
+
+    sys.stdout = captured_stdout
+    sys.stderr = captured_stderr
+
+    cgt = tracer.CallGraphTracer()
+    exit_status = cgt.run(code, globals, globals)
+    sys.stdout = real_stdout
+    sys.stderr = real_stderr
+
+    all_calls_sorted = sorted(
+        itertools.chain(cgt.python_calls.values(), cgt.external_calls.values())
+    )
+
+    return all_calls_sorted, captured_stdout, captured_stderr, exit_status
+
+
+def setup_logging(debug):
+    # code we run can also set up logging, so we need to set the level directly on our
+    # own package
+    sh = logging.StreamHandler(stream=sys.stderr)
+
+    pkg_logger = logging.getLogger("cg_trace")
+    pkg_logger.addHandler(sh)
+    pkg_logger.setLevel(logging.DEBUG if debug else logging.INFO)
+
+
+def main(args=None) -> int:
+
+    # from . import bytecode_reconstructor
+    # logging.getLogger(bytecode_reconstructor.__name__).setLevel(logging.INFO)
+
+    if args is None:
+        # first element in argv is program name
+        args = sys.argv[1:]
+
+    opts = cmdline.parse(args)
+
+    settings.DEBUG = opts.debug
+    setup_logging(opts.debug)
+
+    # These details of setting up the program to be run are very much inspired by
+    # `trace` from the standard library
+    if opts.module:
+        import runpy
+
+        module_name = opts.progname
+        _mod_name, mod_spec, code = runpy._get_module_details(module_name)
+        sys.argv = [code.co_filename, *opts.arguments]
+        globs = {
+            "__name__": "__main__",
+            "__file__": code.co_filename,
+            "__package__": mod_spec.parent,
+            "__loader__": mod_spec.loader,
+            "__spec__": mod_spec,
+            "__cached__": None,
+        }
+    else:
+        sys.argv = [opts.progname, *opts.arguments]
+        sys.path[0] = os.path.dirname(opts.progname)
+
+        with open(opts.progname) as fp:
+            code = compile(fp.read(), opts.progname, "exec")
+
+        # try to emulate __main__ namespace as much as possible
+        globs = {
+            "__file__": opts.progname,
+            "__name__": "__main__",
+            "__package__": None,
+            "__cached__": None,
+        }
+
+    start = time.time()
+    recorded_calls, captured_stdout, captured_stderr, exit_status = record_calls(
+        code, globs
+    )
+    end = time.time()
+    elapsed_formatted = f"{end-start:.2f} seconds"
+
+    if opts.xml:
+        XMLExporter.export(
+            opts.xml,
+            recorded_calls,
+            info={
+                "cg_trace_version": __version__,
+                "args": " ".join(args),
+                "exit_status": exit_status,
+                "elapsed": elapsed_formatted,
+                "utctimestamp": datetime.utcnow().replace(microsecond=0).isoformat(),
+            },
+        )
+    else:
+        print(f"--- Recorded calls (in {elapsed_formatted}) ---")
+        for (call, callee) in recorded_calls:
+            print(f"{call} --> {callee}")
+
+    print("--- captured stdout ---")
+    print(captured_stdout.getvalue(), end="")
+    print("--- captured stderr ---")
+    print(captured_stderr.getvalue(), end="")
+
+    return 0
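For a script path (the non-`-m` branch), `main` compiles the file and runs it in a fresh dict that mimics the `__main__` namespace, much like the standard library's `trace` module. The standalone sketch below isolates that step. `run_as_main` is a hypothetical helper name, and the traced "program" is a throwaway file written to a temporary directory.

```python
import os
import tempfile


def run_as_main(path: str) -> dict:
    """Hypothetical helper: run the script at `path` in a __main__-like namespace."""
    with open(path) as fp:
        code = compile(fp.read(), path, "exec")

    # Same keys that `main` sets up when given a script path.
    globs = {
        "__file__": path,
        "__name__": "__main__",
        "__package__": None,
        "__cached__": None,
    }
    # One dict serves as both globals and locals, as module-level code expects.
    exec(code, globs, globs)
    return globs


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.py")
    with open(script, "w") as fp:
        fp.write("ran_as_main = __name__ == '__main__'\n")
    namespace = run_as_main(script)

print(namespace["ran_as_main"])  # True
```

Because `__name__` is `"__main__"`, any `if __name__ == "__main__":` block in the traced script runs just as it would under a plain `python script.py`.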
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/settings.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/settings.py
@@ -0,0 +1,6 @@
+# Whether to run the call graph tracer with debugging enabled. Skipping the
+# `LOGGER.debug()` calls entirely via `if DEBUG:` guards gave massive speedups.
+DEBUG = False
+
+
+FAIL_ON_UNKNOWN_BYTECODE = False
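One plausible reason the `if DEBUG:` guard pays off: an f-string passed to `LOGGER.debug()` is formatted before the call is made, even when the logger then discards the record. The sketch below counts how often the argument is stringified. The `Expensive` class and the logger name are illustrative assumptions, not part of `cg_trace`.

```python
import logging

LOGGER = logging.getLogger("debug_guard_demo")
LOGGER.setLevel(logging.INFO)  # DEBUG records are dropped

DEBUG = False


class Expensive:
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive"


obj = Expensive()

# Unguarded: the f-string is built before `debug` discards the message.
LOGGER.debug(f"value={obj}")

# Guarded: with DEBUG False, neither the f-string nor the call is evaluated.
if DEBUG:
    LOGGER.debug(f"value={obj}")

print(obj.str_calls)  # 1
```

In a profile callback that fires on every call, avoiding that eager formatting (and the method call itself) adds up quickly.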
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/tracer.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/tracer.py
@@ -0,0 +1,333 @@
+import dataclasses
+import inspect
+import logging
+import os
+import sys
+from types import FrameType
+from typing import Any, Optional, Tuple
+
+from cg_trace.bytecode_reconstructor import BytecodeExpr, expr_from_frame
+from cg_trace.settings import DEBUG
+from cg_trace.utils import better_compare_for_dataclass
+
+LOGGER = logging.getLogger(__name__)
+
+
+# Copy-paste for interactive IPython sessions:
+# import IPython; sys.stdout = sys.__stdout__; IPython.embed(); sys.exit()
+
+
+_canonic_filename_cache = dict()
+
+
+def canonic_filename(filename):
+    """Return canonical form of filename. (same as Bdb.canonic)
+
+    For real filenames, the canonical form is a case-normalized (on
+    case insensitive filesystems) absolute path.  'Filenames' with
+    angle brackets, such as "<stdin>", generated in interactive
+    mode, are returned unchanged.
+    """
+    if filename == "<" + filename[1:-1] + ">":
+        return filename
+    canonic = _canonic_filename_cache.get(filename)
+    if not canonic:
+        canonic = os.path.abspath(filename)
+        canonic = os.path.normcase(canonic)
+        _canonic_filename_cache[filename] = canonic
+    return canonic
+
+
+_call_cache = dict()
+
+
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class Call:
+    """A call site, identified by filename, line number, and bytecode instruction index
+    """
+
+    filename: str
+    linenum: int
+    inst_index: int
+    bytecode_expr: BytecodeExpr
+
+    def __str__(self):
+        d = dataclasses.asdict(self)
+        del d["bytecode_expr"]
+        normal_fields = ", ".join(f"{k}={v!r}" for k, v in d.items())
+
+        return f"{type(self).__name__}({normal_fields}, bytecode_expr≈{str(self.bytecode_expr)})"
+
+    @classmethod
+    def from_frame(cls, frame: FrameType):
+        global _call_cache
+        key = cls.hash_key(frame)
+        if key in _call_cache:
+            return _call_cache[key]
+
+        code = frame.f_code
+
+        bytecode_expr = expr_from_frame(frame)
+
+        call = cls(
+            filename=canonic_filename(code.co_filename),
+            linenum=frame.f_lineno,
+            inst_index=frame.f_lasti,
+            bytecode_expr=bytecode_expr,
+        )
+
+        _call_cache[key] = call
+
+        return call
+
+    @staticmethod
+    def hash_key(frame: FrameType) -> Tuple[str, int, int]:
+        code = frame.f_code
+        return (
+            canonic_filename(code.co_filename),
+            frame.f_lineno,
+            frame.f_lasti,
+        )
+
+
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class Callee:
+    pass
+
+
+BUILTIN_FUNCTION_OR_METHOD = type(print)
+METHOD_DESCRIPTOR_TYPE = type(dict.get)
+
+
+_unknown_module_fixup_cache = dict()
+
+
+def _unkown_module_fixup(func):
+    # TODO: Doesn't work for everything (for example: `OrderedDict.fromkeys`, `object.__new__`)
+
+    # TODO: Can make this logic easier by using `func.__self__`. For `f = dict().get`, `f.__self__.__class__ == dict`
+    # and `dict.__new__.__self__ == dict`
+
+    module = func.__module__
+    qualname = func.__qualname__
+    cls_name, method_name = qualname.split(".")
+
+    key = (module, qualname)
+    if key in _unknown_module_fixup_cache:
+        return _unknown_module_fixup_cache[key]
+
+    matching_classes = list()
+    for klass in object.__subclasses__():
+
+        if inspect.isabstract(klass):
+            continue
+
+        try:
+            # type(dict.get) == METHOD_DESCRIPTOR_TYPE
+            # type(dict.__new__) == BUILTIN_FUNCTION_OR_METHOD
+            if klass.__qualname__ == cls_name and type(
+                getattr(klass, method_name, None)
+            ) in [BUILTIN_FUNCTION_OR_METHOD, METHOD_DESCRIPTOR_TYPE]:
+                matching_classes.append(klass)
+        # For flask, observed to give `ValueError: Namespace class is abstract`, even with the isabstract above
+        except ValueError:
+            pass
+
+    if len(matching_classes) == 1:
+        klass = matching_classes[0]
+        ret = klass.__module__
+    else:
+        if DEBUG:
+            LOGGER.debug(f"Found {len(matching_classes)} matching classes for {module} {qualname}")
+        ret = None
+    _unknown_module_fixup_cache[key] = ret
+    return ret
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True)
+class ExternalCallee(Callee):
+    # Some bound methods have `None` as their `__module__`: for example,
+    # `list().append.__module__ is None`
+    module: Optional[str]
+    qualname: str
+    #
+    is_builtin: bool
+
+    @classmethod
+    def from_arg(cls, func):
+        # builtin bound methods seem to always return `None` for __module__, but we
+        # might be able to recover the lost information by looking through all classes.
+        # For example, `dict().get.__module__ is None` and `dict().get.__qualname__ ==
+        # "dict.get"`
+
+        module = func.__module__
+        qualname = func.__qualname__
+        if module is None and qualname.count(".") == 1:
+            module = _unkown_module_fixup(func)
+
+        return cls(
+            module=module,
+            qualname=qualname,
+            is_builtin=type(func) == BUILTIN_FUNCTION_OR_METHOD,
+        )
+
+    def __lt__(self, other):
+        if not isinstance(other, ExternalCallee):
+            raise TypeError()
+
+        for field in dataclasses.fields(self):
+            s_a = getattr(self, field.name)
+            o_a = getattr(other, field.name)
+
+            # Equal fields (including `None` vs `None`, where `None < None` raises
+            # TypeError) do not decide the order, so move on to the next field.
+            if s_a == o_a:
+                continue
+
+            if type(s_a) != type(o_a):
+                return type(s_a).__name__ < type(o_a).__name__
+
+            return s_a < o_a
+
+        return False
+
+    def __gt__(self, other):
+        return other < self
+
+    def __ge__(self, other):
+        return self > other or self == other
+
+    def __le__(self, other):
+        return self < other or self == other
+
+
+@better_compare_for_dataclass
+@dataclasses.dataclass(frozen=True, eq=True, order=True)
+class PythonCallee(Callee):
+    """A callee (Function/Lambda/???)
+
+    should (hopefully) be uniquely identified by its name and location (filename+line
+    number)
+    """
+
+    filename: str
+    linenum: int
+    funcname: str
+
+    @classmethod
+    def from_frame(cls, frame: FrameType):
+        code = frame.f_code
+        return cls(
+            filename=canonic_filename(code.co_filename),
+            linenum=frame.f_lineno,
+            funcname=code.co_name,
+        )
+
+
+class CallGraphTracer:
+    """Tracer that records calls being made
+
+    It would seem obvious that this should have extended the `trace` library
+    (https://docs.python.org/3/library/trace.html), but that part is not extensible.
+
+    You might think that we can just use `sys.settrace`
+    (https://docs.python.org/3.8/library/sys.html#sys.settrace) like the basic debugger
+    (bdb) does, but that isn't invoked on calls to C code, which we need in general, and
+    need for handling builtins specifically.
+
+    Luckily, `sys.setprofile`
+    (https://docs.python.org/3.8/library/sys.html#sys.setprofile) provides all that we
+    need. You might be scared by reading the following bit of the documentation:
+
+    > The function is thread-specific, but there is no way for the profiler to know about
+    > context switches between threads, so it does not make sense to use this in the
+    > presence of multiple threads.
+
+    but that is to be understood in the context of making a profiler (you can't reliably
+    measure function execution time if you don't know about context switches). For our
+    use-case, this is not a problem.
+    """
+
+    def __init__(self):
+        # Performing `Call.from_frame` can be expensive, so we cache (call, callee)
+        # pairs we have already seen to avoid processing them twice.
+        self.python_calls = dict()
+        self.external_calls = dict()
+
+    def run(self, code, globals, locals):
+        self.exec_call_seen = False
+        self.ignore_rest = False
+        try:
+            sys.setprofile(self.profilefunc)
+            exec(code, globals, locals)
+            return "completed"
+        except SystemExit:
+            return "completed (SystemExit)"
+        except Exception:
+            sys.setprofile(None)
+            LOGGER.info("Exception occurred while running program:", exc_info=True)
+            return "exception occurred"
+        finally:
+            sys.setprofile(None)
+
+    def profilefunc(self, frame: FrameType, event: str, arg):
+        # ignore everything up to and including the first Python-level call, since that
+        # is the code object being run by the `exec` in the `run` method above
+        if not self.exec_call_seen:
+            if event == "call":
+                self.exec_call_seen = True
+            return
+
+        # if we're going out of the exec, we should ignore anything else (for example the
+        # call to `sys.setprofile(None)`)
+        if event == "c_return":
+            if arg == exec and frame.f_code.co_filename == __file__:
+                self.ignore_rest = True
+
+        if self.ignore_rest:
+            return
+
+        if event not in ["call", "c_call"]:
+            return
+
+        if DEBUG:
+            LOGGER.debug(f"profilefunc event={event}")
+        if event == "call":
+            # in call, the `frame` argument is the new frame of the callee being entered
+            assert frame.f_back is not None
+
+            callee = PythonCallee.from_frame(frame)
+
+            key = (Call.hash_key(frame.f_back), callee)
+            if key in self.python_calls:
+                if DEBUG:
+                    LOGGER.debug(f"ignoring already seen call {key[0]} --> {callee}")
+                return
+
+            if DEBUG:
+                LOGGER.debug(f"callee={callee}")
+            call = Call.from_frame(frame.f_back)
+
+            self.python_calls[key] = (call, callee)
+
+        if event == "c_call":
+            # in c_call, the `frame` argument is the frame where the call happens, and the
+            # `arg` argument is the C function object.
+
+            callee = ExternalCallee.from_arg(arg)
+
+            key = (Call.hash_key(frame), callee)
+            if key in self.external_calls:
+                if DEBUG:
+                    LOGGER.debug(f"ignoring already seen call {key[0]} --> {callee}")
+                return
+
+            if DEBUG:
+                LOGGER.debug(f"callee={callee}")
+            call = Call.from_frame(frame)
+
+            self.external_calls[key] = (call, callee)
+
+        if DEBUG:
+            LOGGER.debug(f"{call} --> {callee}")
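The `CallGraphTracer` docstring relies on `sys.setprofile` reporting both Python-level `call` events and `c_call` events for C functions, with a different meaning of `frame` for each. A minimal sketch of that behaviour, assuming CPython. The `record` callback is illustrative and much simpler than `profilefunc`.

```python
import sys

events = []


def record(frame, event, arg):
    # "call": `frame` is the new frame of the Python function being entered.
    # "c_call": `frame` is the caller's frame and `arg` is the C function object.
    if event == "call":
        events.append(("call", frame.f_code.co_name))
    elif event == "c_call":
        events.append(("c_call", getattr(arg, "__qualname__", repr(arg))))


def python_func():
    return len("abc")


sys.setprofile(record)
try:
    python_func()
finally:
    sys.setprofile(None)

print(events)
```

`events` should contain `("call", "python_func")` followed by `("c_call", "len")`. The final `sys.setprofile(None)` is typically reported as a `c_call` too, which is the kind of trailing noise `profilefunc` suppresses with `ignore_rest`.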
--- a/python/tools/recorded-call-graph-metrics/src/cg_trace/utils.py
+++ b/python/tools/recorded-call-graph-metrics/src/cg_trace/utils.py
@@ -0,0 +1,20 @@
+def better_compare_for_dataclass(cls):
+    """When dataclass is used with `order=True`, the comparison methods are only implemented for
+    objects of the same class. This decorator extends them to compare by class name when
+    used against objects of other classes.
+    """
+    for op in [
+        "__lt__",
+        "__le__",
+        "__gt__",
+        "__ge__",
+    ]:
+        old = getattr(cls, op)
+
+        def new(self, other, op=op, old=old):
+            if type(self) == type(other):
+                return old(self, other)
+            return getattr(str, op)(self.__class__.__name__, other.__class__.__name__)
+
+        setattr(cls, op, new)
+    return cls
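Without the decorator, the ordering methods generated by `@dataclasses.dataclass(order=True)` return `NotImplemented` for instances of a different class, so sorting a mixed list raises `TypeError`. Below is a small usage sketch. `Apple` and `Banana` are made-up classes, and the decorator is repeated so the snippet runs on its own.

```python
import dataclasses


def better_compare_for_dataclass(cls):
    # Same logic as the decorator above: compare normally within a class,
    # otherwise fall back to comparing the class names.
    for op in ["__lt__", "__le__", "__gt__", "__ge__"]:
        old = getattr(cls, op)

        def new(self, other, op=op, old=old):
            if type(self) == type(other):
                return old(self, other)
            return getattr(str, op)(self.__class__.__name__, other.__class__.__name__)

        setattr(cls, op, new)
    return cls


@better_compare_for_dataclass
@dataclasses.dataclass(frozen=True, order=True)
class Apple:
    weight: int


@better_compare_for_dataclass
@dataclasses.dataclass(frozen=True, order=True)
class Banana:
    length: float


items = [Banana(2.0), Apple(3), Banana(1.0), Apple(1)]
print(sorted(items))
# [Apple(weight=1), Apple(weight=3), Banana(length=1.0), Banana(length=2.0)]
```

Instances are grouped by class name first and then ordered by their fields. This lets the tracer sort `PythonCallee` and `ExternalCallee` values together.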
--- a/python/tools/recorded-call-graph-metrics/tests/create-test-db.sh
+++ b/python/tools/recorded-call-graph-metrics/tests/create-test-db.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
+
+SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+
+if ! pip show cg_trace &>/dev/null; then
+    echo "You need to follow setup instructions in README"
+    exit 1
+fi
+
+DB="$SCRIPTDIR/cg-trace-test-db"
+SRC="$SCRIPTDIR/python-src/"
+XMLDIR="$SCRIPTDIR/python-traces/"
+PYTHON_EXTRACTOR=$(codeql resolve extractor --language=python)
+
+rm -rf "$DB"
+rm -rf "$XMLDIR"
+
+mkdir -p "$XMLDIR"
+
+for f in $(ls $SRC); do
+    echo "Tracing $f"
+    cg-trace --xml "$XMLDIR/${f%.py}.xml" "$SRC/$f"
+done
+
+codeql database init --source-root="$SRC" --language=python "$DB"
+codeql database trace-command --working-dir="$SRC" "$DB" "$PYTHON_EXTRACTOR/tools/autobuild.sh"
+codeql database index-files --language xml --include-extension .xml --working-dir="$XMLDIR" "$DB"
+codeql database finalize "$DB"
+
+echo "Created database '$DB'"
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/BUILD_LIST.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/BUILD_LIST.py
@@ -0,0 +1,9 @@
+def foo():
+    print("foo")
+
+
+def bar():
+    print("bar")
+
+
+[foo, bar][0]()
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/BUILD_TUPLE.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/BUILD_TUPLE.py
@@ -0,0 +1,9 @@
+def foo():
+    print("foo")
+
+
+def bar():
+    print("bar")
+
+
+(foo, bar)[0]()
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/CALL_FUNCTION_EX.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/CALL_FUNCTION_EX.py
@@ -0,0 +1,15 @@
+def func(*args, **kwargs):
+    print("func", args, kwargs)
+
+
+args = [1, 2, 3]
+kwargs = {"a": 1, "b": 2}
+
+# These give rise to a CALL_FUNCTION_EX
+func(*args)
+func(**kwargs)
+func(*args, **kwargs)
+
+
+func(*args, foo="foo")
+func(*args, foo="foo", **kwargs)
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/getitem.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/getitem.py
@@ -0,0 +1,7 @@
+class Foo:
+    def __getitem__(self, key):
+        print("__getitem__")
+
+
+foo = Foo()
+foo["key"]  # this is recorded as a call :)
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/builtins.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/builtins.py
@@ -0,0 +1,9 @@
+print("builtins test")
+len("bar")
+l = list()
+l.append(42)
+
+import sys
+sys.getdefaultencoding()
+
+r = range(10)
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/class-simple.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/class-simple.py
@@ -0,0 +1,44 @@
+def func(self, arg):
+    print("func", self, arg)
+
+
+class Foo(object):
+    def __init__(self, arg):
+        print("Foo.__init__", self, arg)
+
+    def some_method(self):
+        print("Foo.some_method", self)
+        return self
+
+    f = func
+
+    @staticmethod
+    def some_staticmethod():
+        print("Foo.some_staticmethod")
+
+    @classmethod
+    def some_classmethod(cls):
+        print("Foo.some_classmethod", cls)
+
+
+foo = Foo(42)
+foo.some_method()
+foo.f(10)
+foo.some_staticmethod()
+foo.some_classmethod()
+foo.some_method().some_method().some_method()
+
+
+Foo.some_staticmethod()
+Foo.some_classmethod()
+
+
+class Bar(object):
+    def wat(self):
+        print("Bar.wat")
+
+
+# these calls to Bar() are not recorded (since no __init__ function)
+bar = Bar()
+bar.wat()
+Bar().wat()
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/dict-get.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/dict-get.py
@@ -0,0 +1,3 @@
+d = dict()
+
+d.get("foo") or d.get("bar")
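The `d.get(...)` calls in this test are bound builtin methods, the case `_unkown_module_fixup` exists for: their `__module__` is `None`, while `__qualname__` still reads `dict.get`. The TODO in that function suggests using `__self__` instead. Here is a hedged sketch of that idea. `module_via_self` is a hypothetical helper, and it only makes sense for methods, not for module-level builtins such as `len`.

```python
d = dict()
bound = d.get

# Bound builtin methods lose their module but keep a qualified name.
print(bound.__module__)
print(bound.__qualname__)  # dict.get


def module_via_self(func):
    """Hypothetical helper: recover a builtin method's module from `__self__`."""
    owner = func.__self__
    # `dict.fromkeys.__self__` is the class itself; `d.get.__self__` is an instance.
    cls = owner if isinstance(owner, type) else type(owner)
    return cls.__module__


print(module_via_self(bound))  # builtins
print(module_via_self(dict.fromkeys))  # builtins
```

Unlike the scan over `object.__subclasses__()`, this needs no search and cannot find several matching classes. It does, however, rely on `__self__` being set.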
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/getsockname.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/getsockname.py
@@ -0,0 +1,4 @@
+import socket
+
+sock = socket.socket()
+print(sock.getsockname())
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/io-builtin.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/io-builtin.py
@@ -0,0 +1,4 @@
+import io
+
+# the `io.open` is just an alias for `_io.open`, but we record the external callee as `io.open` :|
+io.open("foo")
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/iteration.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/iteration.py
@@ -0,0 +1,7 @@
+for i in range(10):
+    print(i)
+
+[i + 1 for i in range(10)]
+l = list(range(10))
+[i + 1 for i in l]
+[i + 1 for i in l]
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/multiple-on-one-line.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/multiple-on-one-line.py
@@ -0,0 +1,37 @@
+def one(*args, **kwargs):
+    print("one")
+    return 1
+
+def two(*args, **kwargs):
+    print("two")
+    return 2
+
+def three(*args, **kwargs):
+    print("three")
+    return 3
+
+one(); two()
+print("---")
+
+one(); one()
+print("---")
+
+alias_one = one
+alias_one(); two()
+print("---")
+
+three(one(), two())
+print("---")
+
+three(one(), two=two())
+print("---")
+
+def f():
+    print("f")
+
+    def g():
+        print("g")
+
+    return g
+
+f()()
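Calls that share a line, as in this test, get the same `f_lineno`. That is why `Call` also records `inst_index`, the caller's `frame.f_lasti`. Below is a minimal sketch that checks two calls on one line are told apart by their instruction offset. The concrete offsets depend on the CPython version, so the sketch relies only on their being distinct.

```python
import sys

seen = []


def one():
    return 1


def profilefunc(frame, event, arg):
    # For a "call" event, `frame.f_back` is the caller's frame; its f_lasti is the
    # offset of the bytecode instruction that was executing when the call happened.
    if event == "call" and frame.f_code.co_name == "one":
        caller_frame = frame.f_back
        seen.append((caller_frame.f_lineno, caller_frame.f_lasti))


def caller():
    one(); one()  # two calls on the same line


sys.setprofile(profilefunc)
try:
    caller()
finally:
    sys.setprofile(None)

print(seen)
```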
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/problem-1.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/problem-1.py
@@ -0,0 +1,26 @@
+class Foo:
+    def __init__(self):
+        self.list = []
+
+    def func(self, kwargs=None, result_callback=None):
+        self.list.append((kwargs or {}, result_callback))
+
+
+foo = Foo()
+foo.func()
+
+"""
+This has problematic bytecode: to find out which method is called by instruction 16, we
+need to traverse the JUMP_IF_TRUE_OR_POP, which requires some more sophistication.
+
+Disassembly of <code object func at 0x7f98f64ee030, file "example/problem-1.py", line 5>:
+  6           0 LOAD_FAST                0 (self)
+              2 LOAD_ATTR                0 (list)
+              4 LOAD_METHOD              1 (append)
+              6 LOAD_FAST                1 (kwargs)
+              8 JUMP_IF_TRUE_OR_POP     12
+             10 BUILD_MAP                0
+        >>   12 LOAD_FAST                2 (result_callback)
+             14 BUILD_TUPLE              2
+             16 CALL_METHOD              1
+"""
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/problem-2.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/problem-2.py
@@ -0,0 +1,25 @@
+def func(func_arg):
+    print("func")
+
+    def func2():
+        print("func2")
+        return func_arg()
+
+    func2()
+
+
+def nop():
+    print("nop")
+    pass
+
+
+func(nop)
+
+
+"""
+Needs handling of LOAD_DEREF. Disassembled bytecode looks like:
+
+  6           8 LOAD_DEREF               0 (func_arg)
+             10 CALL_FUNCTION            0
+             12 RETURN_VALUE
+"""
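In `problem-2.py`, `func_arg` is a free variable of `func2`, so the callee is loaded with `LOAD_DEREF` rather than `LOAD_FAST` or `LOAD_GLOBAL`. The standard `dis` module can confirm which opcodes a function uses. The sketch below mirrors the test's function names and passes `print` as an arbitrary callable. The surrounding opcodes (for example the call instruction) differ between CPython versions.

```python
import dis


def func(func_arg):
    def func2():
        # `func_arg` is a free variable here: it lives in a cell created by `func`.
        return func_arg()

    return func2


inner = func(print)
opnames = [ins.opname for ins in dis.get_instructions(inner)]
print("LOAD_DEREF" in opnames)  # True
```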
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/simple.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/simple.py
@@ -0,0 +1,10 @@
+def foo():
+    print('foo')
+
+def bar():
+    print('bar')
+
+foo()
+bar()
+
+foo(); bar()
--- a/python/tools/recorded-call-graph-metrics/tests/python-src/with-exit.py
+++ b/python/tools/recorded-call-graph-metrics/tests/python-src/with-exit.py
@@ -0,0 +1,5 @@
+import sys
+
+print("will exit now")
+
+sys.exit()