mirror of
https://github.com/github/codeql.git
synced 2025-12-16 16:53:25 +01:00
Merge pull request #3953 from RasmusWL/python-more-call-graph-tracing
Approved by tausbn
6
python/tools/recorded-call-graph-metrics/.flake8
Normal file
@@ -0,0 +1,6 @@
# As described in https://github.com/psf/black/blob/master/docs/compatible_configs.md#flake8
# and https://black.readthedocs.io/en/stable/the_black_code_style.html#line-length
[flake8]
max-line-length = 88
select = C,E,F,W,B,B950
ignore = E203, E501, W503
13
python/tools/recorded-call-graph-metrics/.gitignore
vendored
Normal file
@@ -0,0 +1,13 @@
# Example DB
cg-trace-example-db/

# Tests artifacts
tests/python-traces/
tests/cg-trace-test-db

# Artifact from building `pip install -e .`
src/cg_trace.egg-info/

projects/

venv/
6
python/tools/recorded-call-graph-metrics/.isort.cfg
Normal file
@@ -0,0 +1,6 @@
[settings]
multi_line_output = 3
include_trailing_comma = True
force_grid_wrap = 0
use_parentheses = True
line_length = 88
@@ -4,14 +4,113 @@ also known as _call graph tracing_.
|
||||
|
||||
Execute a Python program and, for each call made, record the call and the callee. This allows us to compare call graph resolution from static analysis with actual data, that is, to check whether we can statically determine the target of each actual call correctly.
|
||||
|
||||
This is still in the early stages, and currently only supports a very minimal working example (to show that this approach might work).
|
||||
Using the call graph tracer takes a heavy toll on performance: expect the program to take roughly 10x longer to execute.
|
||||
|
||||
The next hurdle is being able to handle multiple calls on the same line, such as
|
||||
The number of calls recorded varies a little from run to run. I have not been able to pinpoint why.
|
||||
|
||||
- `foo(); bar()`
|
||||
- `foo(bar())`
|
||||
- `foo().bar()`
|
||||
## Running against real projects
|
||||
|
||||
## How do I give it a spin?
|
||||
Currently it's possible to gather metrics from traced runs of the standard test suite of a few projects (defined in [projects.json](./projects.json)): `youtube-dl`, `wcwidth`, and `flask`.
|
||||
|
||||
Run the `recreate-db.sh` script to create the database `cg-trace-example-db`, which will include the `example/simple.xml` trace from executing the `example/simple.py` code. Then run the queries inside the `ql/` directory.
|
||||
To run against all projects, use
|
||||
|
||||
```bash
|
||||
$ ./helper.sh all $(./helper.sh projects)
|
||||
```
|
||||
|
||||
To view the results, use
|
||||
```
|
||||
$ head -n 100 projects/*/Metrics.txt
|
||||
```
|
||||
|
||||
### Expanding set of projects
|
||||
|
||||
It should be fairly straightforward to expand the set of projects. Most projects use `tox` for running their tests against multiple python versions. I didn't look into any kind of integration, but have manually picked out the instructions required to get going.
|
||||
|
||||
As an example, compare the [`tox.ini`](https://github.com/pallets/flask/blob/21c3df31de4bc2f838c945bd37d185210d9bab1a/tox.ini) file from flask with the configuration
|
||||
|
||||
```json
|
||||
"flask": {
|
||||
"repo": "https://github.com/pallets/flask.git",
|
||||
"sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
|
||||
"module_command": "pytest -c /dev/null tests examples",
|
||||
"setup": [
|
||||
"pip install -r requirements/tests.txt",
|
||||
"pip install -q -e examples/tutorial[test]",
|
||||
"pip install -q -e examples/javascript[test]"
|
||||
]
|
||||
}
|
||||
```
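Before adding a new entry, it can help to sanity-check its shape. The following is a minimal sketch, not part of this tool: `check_project_entry` is a hypothetical helper, the key names are taken from the configuration shown above, and the rule that `sha` must be a full 40-character commit hash is an assumption.

```python
import json

# Keys used by the entries in projects.json as shown above.
REQUIRED_KEYS = ("repo", "sha", "module_command", "setup")


def check_project_entry(name, entry):
    """Return a list of problems found in one project entry (hypothetical helper)."""
    problems = [f"{name}: missing '{key}'" for key in REQUIRED_KEYS if key not in entry]
    sha = entry.get("sha", "")
    # Assumption: entries pin a full git commit hash (40 lowercase hex characters).
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        problems.append(f"{name}: 'sha' should be a full 40-character commit hash")
    if not isinstance(entry.get("setup", []), list):
        problems.append(f"{name}: 'setup' should be a list of shell commands")
    return problems


# A shortened copy of the flask entry, embedded so the sketch is self-contained.
EXAMPLE = json.loads("""
{
  "flask": {
    "repo": "https://github.com/pallets/flask.git",
    "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
    "module_command": "pytest -c /dev/null tests examples",
    "setup": ["pip install -r requirements/tests.txt"]
  }
}
""")

for project_name, project_entry in EXAMPLE.items():
    print(project_name, check_project_entry(project_name, project_entry) or "ok")
```

Running the same check over the real `projects.json` would only require replacing `EXAMPLE` with `json.load(open("projects.json"))`.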
|
||||
|
||||
## Local development
|
||||
|
||||
### Setup
|
||||
|
||||
1. Ensure you have at least Python 3.7
|
||||
|
||||
2. Create a virtual environment with `python3 -m venv venv` and activate it
|
||||
|
||||
3. Install dependencies with `pip install --upgrade -r requirements.txt`
|
||||
|
||||
4. Install this codebase as an editable package `pip install -e .`
|
||||
|
||||
5. Set up your editor. If you're using VS Code, create a new project for this folder, and
   use these settings for correct autoformatting of code on save:
|
||||
```
|
||||
{
|
||||
"python.pythonPath": "venv/bin/python",
|
||||
"python.linting.enabled": true,
|
||||
"python.linting.flake8Enabled": true,
|
||||
"python.formatting.provider": "black",
|
||||
"editor.formatOnSave": true,
|
||||
"[python]": {
|
||||
"editor.codeActionsOnSave": {
|
||||
"source.organizeImports": true
|
||||
}
|
||||
},
|
||||
"python.autoComplete.extraPaths": [
|
||||
"src"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
6. Enjoy writing code, and being able to run `cg-trace` on your command line :tada:
|
||||
|
||||
### Using it
|
||||
|
||||
After following the setup instructions above, you should be able to reproduce the example trace by running
|
||||
|
||||
```
|
||||
cg-trace --xml example/simple.xml example/simple.py
|
||||
```
|
||||
|
||||
You can also run traces for all tests and build a database by running `tests/create-test-db.sh`. Then run the queries inside the `ql/` directory.
|
||||
|
||||
## Tracing Limitations
|
||||
|
||||
### Multi-threading
|
||||
|
||||
Calls made in other threads are not traced yet. Supporting them should be possible by using [`threading.setprofile`](https://docs.python.org/3.8/library/threading.html#threading.setprofile), but that hasn't been done yet.
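As a rough illustration of what that could look like, here is a minimal sketch that installs the same profile hook for the current thread and for threads started afterwards. It is not how `cg-trace` is implemented; `profile_hook`, `worker` and `run_traced` are hypothetical names used only for this example.

```python
import sys
import threading

recorded = []            # (thread name, function name) pairs seen by the hook
lock = threading.Lock()  # the hook can run concurrently in several threads


def profile_hook(frame, event, arg):
    # Only Python-level calls are recorded in this sketch.
    if event == "call":
        with lock:
            recorded.append((threading.current_thread().name, frame.f_code.co_name))


def worker():
    return 42


def run_traced():
    threading.setprofile(profile_hook)  # applies to threads started after this point
    sys.setprofile(profile_hook)        # applies to the current thread only
    try:
        t = threading.Thread(target=worker, name="worker-thread")
        t.start()
        t.join()
        worker()
    finally:
        sys.setprofile(None)
        threading.setprofile(None)


run_traced()
print([entry for entry in recorded if entry[1] == "worker"])
```

Threads that were already running before `threading.setprofile` was called are not covered by this approach.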
|
||||
|
||||
### Code that uses `sys.setprofile`
|
||||
|
||||
Since that is our mechanism for recording calls, any code that uses `sys.setprofile` will not work together with the call-graph tracer.
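The conflict comes from there being only one profile hook per thread: installing a new one replaces the old one. The sketch below illustrates this with hypothetical names (`tracer_hook` stands in for the tracer, `other_hook` for the traced program's own profiler).

```python
import sys

seen_by_tracer = []


def tracer_hook(frame, event, arg):
    # Stand-in for the call-graph tracer: remember every Python-level call.
    if event == "call":
        seen_by_tracer.append(frame.f_code.co_name)


def other_hook(frame, event, arg):
    # Stand-in for a profiler installed by the traced program itself.
    pass


def traced_function():
    return 1


def program_that_profiles_itself():
    sys.setprofile(other_hook)  # silently replaces tracer_hook
    traced_function()           # this call is no longer seen by tracer_hook


sys.setprofile(tracer_hook)
try:
    traced_function()
    program_that_profiles_itself()
finally:
    sys.setprofile(None)

print(seen_by_tracer)
```

The second call to `traced_function` never reaches `tracer_hook`, so a trace recorded this way would be silently incomplete.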
|
||||
|
||||
### Class instantiation
|
||||
|
||||
Class instantiation does not always trigger an event for the `sys.setprofile` hook (nor for `sys.settrace`), so it is not recorded. Example:
|
||||
|
||||
```
|
||||
r = range(10)
|
||||
```
|
||||
|
||||
when disassembled (`python -m dis <file>`):
|
||||
|
||||
```
|
||||
9 48 LOAD_NAME 7 (range)
|
||||
50 LOAD_CONST 5 (10)
|
||||
52 CALL_FUNCTION 1
|
||||
54 STORE_NAME 8 (r)
|
||||
```
|
||||
|
||||
but no event :disappointed:
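To check this on a given interpreter, a small experiment along the following lines can be used. It is a sketch with hypothetical names (`hook`, `body`): it records the `c_call` events that `sys.setprofile` reports and prints whether `len` (a builtin function) and `range` (a builtin type) showed up. The result for `range` may depend on the Python version, so no expected output is claimed here.

```python
import sys

c_calls = []


def hook(frame, event, arg):
    # For "c_call" events, `arg` is the C function object being called.
    if event == "c_call":
        c_calls.append(getattr(arg, "__name__", repr(arg)))


def body():
    n = len("abc")  # call to a builtin function
    r = range(10)   # instantiation of a builtin type
    return n, r


sys.setprofile(hook)
try:
    body()
finally:
    sys.setprofile(None)

print("len recorded:  ", "len" in c_calls)
print("range recorded:", "range" in c_calls)
```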
|
||||
|
||||
@@ -1,222 +0,0 @@
#!/usr/bin/env python3

"""Call Graph tracing.

Execute a python program and for each call being made, record the call and callee. This
allows us to compare call graph resolution from static analysis with actual data -- that
is, can we statically determine the target of each actual call correctly.

If there is 100% code coverage from the Python execution, it would also be possible to
look at the precision of the call graph resolutions -- that is, do we expect a function to
be able to be called in a place where it is not? Currently not something we're looking at.
"""

# read: https://eli.thegreenplace.net/2012/03/23/python-internals-how-callables-work/

# TODO: Know that a call to a C-function was made. See
# https://docs.python.org/3/library/bdb.html#bdb.Bdb.trace_dispatch. Maybe use `lxml` as
# test

# For inspiration, look at these projects:
# - https://github.com/joerick/pyinstrument (capture call-stack every <n> ms for profiling)
# - https://github.com/gak/pycallgraph (display call-graph with graphviz after python execution)

import argparse
import bdb
from io import StringIO
import sys
import os
import dis
import dataclasses
import csv
import xml.etree.ElementTree as ET

# Copy-Paste and uncomment for interactive ipython sessions
# import IPython; IPython.embed(); sys.exit()


@dataclasses.dataclass(frozen=True)
class Call():
    """A call
    """
    filename: str
    linenum: int
    inst_index: int

    @classmethod
    def from_frame(cls, frame, debugger: bdb.Bdb):
        code = frame.f_code

        # Uncomment to see the bytecode
        # b = dis.Bytecode(frame.f_code, current_offset=frame.f_lasti)
        # print(b.dis(), file=sys.__stderr__)

        return cls(
            filename = debugger.canonic(code.co_filename),
            linenum = frame.f_lineno,
            inst_index = frame.f_lasti,
        )


@dataclasses.dataclass(frozen=True)
class Callee():
    """A callee (Function/Lambda/???)

    should (hopefully) be uniquely identified by its name and location (filename+line
    number)
    """
    funcname: str
    filename: str
    linenum: int

    @classmethod
    def from_frame(cls, frame, debugger: bdb.Bdb):
        code = frame.f_code
        return cls(
            funcname = code.co_name,
            filename = debugger.canonic(code.co_filename),
            linenum = frame.f_lineno,
        )


class CallGraphTracer(bdb.Bdb):
    """Tracer that records calls being made

    It would seem obvious that this should have extended `trace` library
    (https://docs.python.org/3/library/trace.html), but that part is not extensible --
    however, the basic debugger (bdb) is, and provides maybe a bit more help than just
    using `sys.settrace` directly.
    """

    recorded_calls: set

    def __init__(self):
        self.recorded_calls = set()
        super().__init__()

    def user_call(self, frame, argument_list):
        call = Call.from_frame(frame.f_back, self)
        callee = Callee.from_frame(frame, self)

        # _print(f'{call} -> {callee}')
        self.recorded_calls.add((call, callee))


################################################################################
# Export
################################################################################


class Exporter:

    @staticmethod
    def export(recorded_calls, outfile_path):
        raise NotImplementedError()

    @staticmethod
    def dataclass_to_dict(obj):
        d = dataclasses.asdict(obj)
        prefix = obj.__class__.__name__.lower()
        return {f"{prefix}_{key}": val for (key, val) in d.items()}


class CSVExporter(Exporter):

    @staticmethod
    def export(recorded_calls, outfile_path):
        with open(outfile_path, 'w', newline='') as csv_file:
            writer = None
            for (call, callee) in recorded_calls:
                data = {
                    **Exporter.dataclass_to_dict(call),
                    **Exporter.dataclass_to_dict(callee)
                }

                if writer is None:
                    writer = csv.DictWriter(csv_file, fieldnames=data.keys())
                    writer.writeheader()

                writer.writerow(data)


        print(f'output written to {outfile_path}')

        # embed(); sys.exit()


class XMLExporter(Exporter):

    @staticmethod
    def export(recorded_calls, outfile_path):

        root = ET.Element('root')

        for (call, callee) in recorded_calls:
            data = {
                **Exporter.dataclass_to_dict(call),
                **Exporter.dataclass_to_dict(callee)
            }

            rc = ET.SubElement(root, 'recorded_call')
            # this xml library only supports serializing attributes that have string values
            rc.attrib = {k: str(v) for k, v in data.items()}

        tree = ET.ElementTree(root)
        tree.write(outfile_path, encoding='utf-8')


################################################################################
# __main__
################################################################################


if __name__ == "__main__":


    parser = argparse.ArgumentParser()


    parser.add_argument('--csv')
    parser.add_argument('--xml')

    parser.add_argument('progname', help='file to run as main program')
    parser.add_argument('arguments', nargs=argparse.REMAINDER,
                        help='arguments to the program')

    opts = parser.parse_args()

    # These details of setting up the program to be run is very much inspired by `trace`
    # from the standard library
    sys.argv = [opts.progname, *opts.arguments]
    sys.path[0] = os.path.dirname(opts.progname)

    with open(opts.progname) as fp:
        code = compile(fp.read(), opts.progname, 'exec')

    # try to emulate __main__ namespace as much as possible
    globs = {
        '__file__': opts.progname,
        '__name__': '__main__',
        '__package__': None,
        '__cached__': None,
    }

    real_stdout = sys.stdout
    real_stderr = sys.stderr
    captured_stdout = StringIO()

    sys.stdout = captured_stdout
    cgt = CallGraphTracer()
    cgt.run(code, globs, globs)
    sys.stdout = real_stdout

    if opts.csv:
        CSVExporter.export(cgt.recorded_calls, opts.csv)
    elif opts.xml:
        XMLExporter.export(cgt.recorded_calls, opts.xml)
    else:
        for (call, callee) in cgt.recorded_calls:
            print(f'{call} -> {callee}')

    print('--- captured stdout ---')
    print(captured_stdout.getvalue(), end='')
||||
@@ -1,6 +1,137 @@
|
||||
<root>
|
||||
<recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="7" call_inst_index="18" callee_funcname="foo" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="1" />
|
||||
<recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="8" call_inst_index="24" callee_funcname="bar" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="4" />
|
||||
<recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="10" call_inst_index="30" callee_funcname="foo" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="1" />
|
||||
<recorded_call call_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" call_linenum="10" call_inst_index="36" callee_funcname="bar" callee_filename="/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py" callee_linenum="4" />
|
||||
<info>
|
||||
<cg_trace_version>0.0.2</cg_trace_version>
|
||||
<args>--xml example/simple.xml example/simple.py</args>
|
||||
<exit_status>completed</exit_status>
|
||||
<elapsed>0.00 seconds</elapsed>
|
||||
<utctimestamp>2020-07-22T12:14:02</utctimestamp>
|
||||
</info>
|
||||
<recorded_calls>
|
||||
<recorded_call>
|
||||
<Call>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>2</linenum>
|
||||
<inst_index>4</inst_index>
|
||||
<bytecode_expr>
|
||||
<BytecodeCall>
|
||||
<function>
|
||||
<BytecodeVariableName>
|
||||
<name>print</name>
|
||||
</BytecodeVariableName>
|
||||
</function>
|
||||
</BytecodeCall>
|
||||
</bytecode_expr>
|
||||
</Call>
|
||||
<ExternalCallee>
|
||||
<module>builtins</module>
|
||||
<qualname>print</qualname>
|
||||
<is_builtin>True</is_builtin>
|
||||
</ExternalCallee>
|
||||
</recorded_call>
|
||||
<recorded_call>
|
||||
<Call>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>5</linenum>
|
||||
<inst_index>4</inst_index>
|
||||
<bytecode_expr>
|
||||
<BytecodeCall>
|
||||
<function>
|
||||
<BytecodeVariableName>
|
||||
<name>print</name>
|
||||
</BytecodeVariableName>
|
||||
</function>
|
||||
</BytecodeCall>
|
||||
</bytecode_expr>
|
||||
</Call>
|
||||
<ExternalCallee>
|
||||
<module>builtins</module>
|
||||
<qualname>print</qualname>
|
||||
<is_builtin>True</is_builtin>
|
||||
</ExternalCallee>
|
||||
</recorded_call>
|
||||
<recorded_call>
|
||||
<Call>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>7</linenum>
|
||||
<inst_index>18</inst_index>
|
||||
<bytecode_expr>
|
||||
<BytecodeCall>
|
||||
<function>
|
||||
<BytecodeVariableName>
|
||||
<name>foo</name>
|
||||
</BytecodeVariableName>
|
||||
</function>
|
||||
</BytecodeCall>
|
||||
</bytecode_expr>
|
||||
</Call>
|
||||
<PythonCallee>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>1</linenum>
|
||||
<funcname>foo</funcname>
|
||||
</PythonCallee>
|
||||
</recorded_call>
|
||||
<recorded_call>
|
||||
<Call>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>8</linenum>
|
||||
<inst_index>24</inst_index>
|
||||
<bytecode_expr>
|
||||
<BytecodeCall>
|
||||
<function>
|
||||
<BytecodeVariableName>
|
||||
<name>bar</name>
|
||||
</BytecodeVariableName>
|
||||
</function>
|
||||
</BytecodeCall>
|
||||
</bytecode_expr>
|
||||
</Call>
|
||||
<PythonCallee>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>4</linenum>
|
||||
<funcname>bar</funcname>
|
||||
</PythonCallee>
|
||||
</recorded_call>
|
||||
<recorded_call>
|
||||
<Call>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>10</linenum>
|
||||
<inst_index>30</inst_index>
|
||||
<bytecode_expr>
|
||||
<BytecodeCall>
|
||||
<function>
|
||||
<BytecodeVariableName>
|
||||
<name>foo</name>
|
||||
</BytecodeVariableName>
|
||||
</function>
|
||||
</BytecodeCall>
|
||||
</bytecode_expr>
|
||||
</Call>
|
||||
<PythonCallee>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>1</linenum>
|
||||
<funcname>foo</funcname>
|
||||
</PythonCallee>
|
||||
</recorded_call>
|
||||
<recorded_call>
|
||||
<Call>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>10</linenum>
|
||||
<inst_index>36</inst_index>
|
||||
<bytecode_expr>
|
||||
<BytecodeCall>
|
||||
<function>
|
||||
<BytecodeVariableName>
|
||||
<name>bar</name>
|
||||
</BytecodeVariableName>
|
||||
</function>
|
||||
</BytecodeCall>
|
||||
</bytecode_expr>
|
||||
</Call>
|
||||
<PythonCallee>
|
||||
<filename>/home/rasmus/code/ql/python/tools/recorded-call-graph-metrics/example/simple.py</filename>
|
||||
<linenum>4</linenum>
|
||||
<funcname>bar</funcname>
|
||||
</PythonCallee>
|
||||
</recorded_call>
|
||||
</recorded_calls>
|
||||
</root>
|
||||
|
||||
191
python/tools/recorded-call-graph-metrics/helper.sh
Executable file
@@ -0,0 +1,191 @@
#!/bin/bash

set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/

SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
PROJECTS_FILE="$SCRIPTDIR/projects.json"

METRICS_QUERY="ql/query/Metrics.ql"

PROJECTS_BASE_DIR="$SCRIPTDIR/projects"

repo_dir() {
    echo "$PROJECTS_BASE_DIR/$1/repo"
}

venv_dir() {
    echo "$PROJECTS_BASE_DIR/$1/venv"
}

trace_dir() {
    echo "$PROJECTS_BASE_DIR/$1/traces"
}

db_path() {
    echo "$PROJECTS_BASE_DIR/$1/$1-db"
}

query_result_base_path() {
    echo "$PROJECTS_BASE_DIR/$1/$2"
}

help() {
    echo -n """\
$0 help                This message
$0 projects            List projects
$0 repo <projects>     Fetch repo for projects
$0 setup <projects>    Perform setup steps for projects (install dependencies)
$0 trace <projects>    Trace projects
$0 db <projects>       Build databases for projects
$0 metrics <projects>  Run $METRICS_QUERY on projects
$0 all <projects>      Perform all the above steps for projects
"""
}

projects() {
    jq -r 'keys[]' "$PROJECTS_FILE"
}

check_project_exists() {
    if ! jq -e ".\"$1\"" "$PROJECTS_FILE" &>/dev/null; then
        echo "ERROR: '$1' not a known project, see '$0 projects'"
        exit 1
    fi
}

repo() {
    for project in $@; do
        check_project_exists $project

        echo "Cloning repo for '$project'"

        REPO_DIR=$(repo_dir $project)

        if [[ -d "$REPO_DIR" ]]; then
            echo "Repo already cloned in $REPO_DIR"
            continue;
        fi

        REPO_URL=$(jq -e -r ".\"$project\".repo" "$PROJECTS_FILE")
        SHA=$(jq -e -r ".\"$project\".sha" "$PROJECTS_FILE")

        mkdir -p "$REPO_DIR"
        cd "$REPO_DIR"
        git init
        git remote add origin $REPO_URL
        git fetch --depth 1 origin $SHA
        git -c advice.detachedHead=False checkout FETCH_HEAD
    done
}

setup() {
    for project in $@; do
        check_project_exists $project

        echo "Setting up '$project'"

        python3 -m venv $(venv_dir $project)
        source $(venv_dir $project)/bin/activate

        cd $(repo_dir $project)
        pip install -e "$SCRIPTDIR"

        IFS=$'\n'
        setup_commands=($(jq -r ".\"$project\".setup[]" $PROJECTS_FILE))
        unset IFS
        for setup_command in "${setup_commands[@]}"; do
            echo "Running '$setup_command'"
            $setup_command
        done

        # deactivate venv again
        deactivate
    done
}

trace() {
    for project in $@; do
        check_project_exists $project

        echo "Tracing '$project'"

        source $(venv_dir $project)/bin/activate

        REPO_DIR=$(repo_dir $project)
        cd "$REPO_DIR"

        rm -rf $(trace_dir $project)
        mkdir -p $(trace_dir $project)

        MODULE_COMMAND=$(jq -r ".\"$project\".module_command" $PROJECTS_FILE)

        cg-trace --xml $(trace_dir $project)/trace.xml --module $MODULE_COMMAND

        # deactivate venv again
        deactivate
    done
}

db() {
    for project in $@; do
        check_project_exists $project

        echo "Creating CodeQL database for '$project'"

        DB=$(db_path $project)
        SRC=$(repo_dir $project)
        PYTHON_EXTRACTOR=$(codeql resolve extractor --language=python)

        # Source venv so we can extract dependencies
        source $(venv_dir $project)/bin/activate

        rm -rf "$DB"

        codeql database init --source-root="$SRC" --language=python "$DB"
        codeql database trace-command --working-dir="$SRC" "$DB" "$PYTHON_EXTRACTOR/tools/autobuild.sh"
        codeql database index-files --language xml --include-extension .xml --working-dir="$(trace_dir $project)" "$DB"
        codeql database finalize "$DB"

        echo "Created database in '$DB'"

        # deactivate venv again
        deactivate
    done
}

metrics() {
    for project in $@; do
        check_project_exists $project

        echo "Running $METRICS_QUERY on '$project'"

        RESULTS_BASE=$(query_result_base_path $project Metrics)
        DB=$(db_path $project)

        codeql query run "$SCRIPTDIR/$METRICS_QUERY" --database "$DB" --output "${RESULTS_BASE}.bqrs"
        codeql bqrs decode "${RESULTS_BASE}.bqrs" --format text --output "${RESULTS_BASE}.txt"

        echo "Results available in '${RESULTS_BASE}.txt'"
    done
}

all() {
    for project in $@; do
        check_project_exists $project

        repo $project
        setup $project
        trace $project
        db $project
        metrics $project
    done
}


COMMAND=${1:-"help"}

if [[ $# -ge 2 ]]; then
    shift
fi

$COMMAND $@
||||
28
python/tools/recorded-call-graph-metrics/projects.json
Normal file
@@ -0,0 +1,28 @@
{
    "wcwidth": {
        "repo": "https://github.com/jquast/wcwidth.git",
        "sha": "b29897e5a1b403a0e36f7fc991614981cbc42475",
        "module_command": "pytest -c /dev/null",
        "setup": [
            "pip install pytest"
        ]
    },
    "youtube-dl": {
        "repo": "https://github.com/ytdl-org/youtube-dl.git",
        "sha": "a115e07594ccb7749ca108c889978510c7df126e",
        "module_command": "nose -v test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_socks.py",
        "setup": [
            "pip install nose"
        ]
    },
    "flask": {
        "repo": "https://github.com/pallets/flask.git",
        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
        "module_command": "pytest -c /dev/null tests examples",
        "setup": [
            "pip install -r requirements/tests.txt",
            "pip install -q -e examples/tutorial[test]",
            "pip install -q -e examples/javascript[test]"
        ]
    }
}
@@ -1,9 +0,0 @@
|
||||
import RecordedCalls
|
||||
|
||||
from ValidRecordedCall rc, Call call, Function callee, CallableValue calleeValue
|
||||
where
|
||||
call = rc.getCall() and
|
||||
callee = rc.getCallee() and
|
||||
calleeValue.getScope() = callee and
|
||||
calleeValue.getACall() = call.getAFlowNode()
|
||||
select call, "-->", callee
|
||||
@@ -1,36 +0,0 @@
|
||||
import python
|
||||
|
||||
class RecordedCall extends XMLElement {
|
||||
RecordedCall() { this.hasName("recorded_call") }
|
||||
|
||||
string call_filename() { result = this.getAttributeValue("call_filename") }
|
||||
|
||||
int call_linenum() { result = this.getAttributeValue("call_linenum").toInt() }
|
||||
|
||||
int call_inst_index() { result = this.getAttributeValue("call_inst_index").toInt() }
|
||||
|
||||
Call getCall() {
|
||||
// TODO: handle calls spanning multiple lines
|
||||
result.getLocation().hasLocationInfo(this.call_filename(), this.call_linenum(), _, _, _)
|
||||
}
|
||||
|
||||
string callee_filename() { result = this.getAttributeValue("callee_filename") }
|
||||
|
||||
int callee_linenum() { result = this.getAttributeValue("callee_linenum").toInt() }
|
||||
|
||||
string callee_funcname() { result = this.getAttributeValue("callee_funcname") }
|
||||
|
||||
Function getCallee() {
|
||||
result.getLocation().hasLocationInfo(this.callee_filename(), this.callee_linenum(), _, _, _)
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Class of recorded calls where we can uniquely identify both the `call` and the `callee`.
|
||||
*/
|
||||
class ValidRecordedCall extends RecordedCall {
|
||||
ValidRecordedCall() {
|
||||
strictcount(this.getCall()) = 1 and
|
||||
strictcount(this.getCallee()) = 1
|
||||
}
|
||||
}
|
||||
@@ -1,7 +0,0 @@
|
||||
import RecordedCalls
|
||||
|
||||
from RecordedCall rc
|
||||
where not rc instanceof ValidRecordedCall
|
||||
select "Could not uniquely identify this recorded call (either call or callee was not uniquely identified)",
|
||||
rc.call_filename(), rc.call_linenum(), rc.call_inst_index(), "-->", rc.callee_filename(),
|
||||
rc.callee_linenum(), rc.callee_funcname()
|
||||
@@ -0,0 +1,73 @@
|
||||
import python
|
||||
|
||||
abstract class XMLBytecodeExpr extends XMLElement { }
|
||||
|
||||
class XMLBytecodeConst extends XMLBytecodeExpr {
|
||||
XMLBytecodeConst() { this.hasName("BytecodeConst") }
|
||||
|
||||
string get_value_data_raw() { result = this.getAChild("value").getTextValue() }
|
||||
}
|
||||
|
||||
class XMLBytecodeVariableName extends XMLBytecodeExpr {
|
||||
XMLBytecodeVariableName() { this.hasName("BytecodeVariableName") }
|
||||
|
||||
string get_name_data() { result = this.getAChild("name").getTextValue() }
|
||||
}
|
||||
|
||||
class XMLBytecodeAttribute extends XMLBytecodeExpr {
|
||||
XMLBytecodeAttribute() { this.hasName("BytecodeAttribute") }
|
||||
|
||||
string get_attr_name_data() { result = this.getAChild("attr_name").getTextValue() }
|
||||
|
||||
XMLBytecodeExpr get_object_data() { result.getParent() = this.getAChild("object") }
|
||||
}
|
||||
|
||||
class XMLBytecodeSubscript extends XMLBytecodeExpr {
|
||||
XMLBytecodeSubscript() { this.hasName("BytecodeSubscript") }
|
||||
|
||||
XMLBytecodeExpr get_key_data() { result.getParent() = this.getAChild("key") }
|
||||
|
||||
XMLBytecodeExpr get_object_data() { result.getParent() = this.getAChild("object") }
|
||||
}
|
||||
|
||||
class XMLBytecodeTuple extends XMLBytecodeExpr {
|
||||
XMLBytecodeTuple() { this.hasName("BytecodeTuple") }
|
||||
|
||||
XMLBytecodeExpr get_elements_data(int index) {
|
||||
result = this.getAChild("elements").getChild(index)
|
||||
}
|
||||
}
|
||||
|
||||
class XMLBytecodeList extends XMLBytecodeExpr {
|
||||
XMLBytecodeList() { this.hasName("BytecodeList") }
|
||||
|
||||
XMLBytecodeExpr get_elements_data(int index) {
|
||||
result = this.getAChild("elements").getChild(index)
|
||||
}
|
||||
}
|
||||
|
||||
class XMLBytecodeCall extends XMLBytecodeExpr {
|
||||
XMLBytecodeCall() { this.hasName("BytecodeCall") }
|
||||
|
||||
XMLBytecodeExpr get_function_data() { result.getParent() = this.getAChild("function") }
|
||||
}
|
||||
|
||||
class XMLBytecodeUnknown extends XMLBytecodeExpr {
|
||||
XMLBytecodeUnknown() { this.hasName("BytecodeUnknown") }
|
||||
|
||||
string get_opname_data() { result = this.getAChild("opname").getTextValue() }
|
||||
}
|
||||
|
||||
class XMLBytecodeMakeFunction extends XMLBytecodeExpr {
|
||||
XMLBytecodeMakeFunction() { this.hasName("BytecodeMakeFunction") }
|
||||
|
||||
XMLBytecodeExpr get_qualified_name_data() {
|
||||
result.getParent() = this.getAChild("qualified_name")
|
||||
}
|
||||
}
|
||||
|
||||
class XMLSomethingInvolvingScaryBytecodeJump extends XMLBytecodeExpr {
|
||||
XMLSomethingInvolvingScaryBytecodeJump() { this.hasName("SomethingInvolvingScaryBytecodeJump") }
|
||||
|
||||
string get_opname_data() { result = this.getAChild("opname").getTextValue() }
|
||||
}
|
||||
@@ -0,0 +1,269 @@
|
||||
import python
|
||||
import semmle.python.types.Builtins
|
||||
import semmle.python.objects.Callables
|
||||
import lib.BytecodeExpr
|
||||
|
||||
/** The XML data for a recorded call (includes all data). */
|
||||
class XMLRecordedCall extends XMLElement {
|
||||
XMLRecordedCall() { this.hasName("recorded_call") }
|
||||
|
||||
/** Gets the XML data for the call. */
|
||||
XMLCall getXMLCall() { result.getParent() = this }
|
||||
|
||||
/** Gets a call matching the recorded information. */
|
||||
Call getACall() { result = this.getXMLCall().getACall() }
|
||||
|
||||
/** Gets the XML data for the callee. */
|
||||
XMLCallee getXMLCallee() { result.getParent() = this }
|
||||
|
||||
/** Gets a python function matching the recorded information of the callee. */
|
||||
Function getAPythonCallee() { result = this.getXMLCallee().(XMLPythonCallee).getACallee() }
|
||||
|
||||
/** Gets a builtin function matching the recorded information of the callee. */
|
||||
Builtin getABuiltinCallee() { result = this.getXMLCallee().(XMLExternalCallee).getACallee() }
|
||||
|
||||
/** Get a different `XMLRecordedCall` with the same result-set for `getACall`. */
|
||||
XMLRecordedCall getOtherWithSameSetOfCalls() {
|
||||
// `rc` is for a different bytecode instruction on same line
|
||||
not result.getXMLCall().get_inst_index_data() = this.getXMLCall().get_inst_index_data() and
|
||||
result.getXMLCall().get_filename_data() = this.getXMLCall().get_filename_data() and
|
||||
result.getXMLCall().get_linenum_data() = this.getXMLCall().get_linenum_data() and
|
||||
// set of calls are equal
|
||||
forall(Call call | call = this.getACall() or call = result.getACall() |
|
||||
call = this.getACall() and call = result.getACall()
|
||||
)
|
||||
}
|
||||
|
||||
override string toString() {
|
||||
exists(string path |
|
||||
path =
|
||||
any(File file | file.getAbsolutePath() = this.getXMLCall().get_filename_data())
|
||||
.getRelativePath()
|
||||
or
|
||||
not exists(File file |
|
||||
file.getAbsolutePath() = this.getXMLCall().get_filename_data() and
|
||||
exists(file.getRelativePath())
|
||||
) and
|
||||
path = this.getXMLCall().get_filename_data()
|
||||
|
|
||||
result = this.getName() + ": " + path + ":" + this.getXMLCall().get_linenum_data()
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/** The XML data for the call part of a recorded call. */
|
||||
class XMLCall extends XMLElement {
|
||||
XMLCall() { this.hasName("Call") }
|
||||
|
||||
string get_filename_data() { result = this.getAChild("filename").getTextValue() }
|
||||
|
||||
int get_linenum_data() { result = this.getAChild("linenum").getTextValue().toInt() }
|
||||
|
||||
int get_inst_index_data() { result = this.getAChild("inst_index").getTextValue().toInt() }
|
||||
|
||||
/** Gets a call that matches the recorded information. */
|
||||
Call getACall() {
|
||||
// TODO: do we handle calls spanning multiple lines?
|
||||
this.matchBytecodeExpr(result, this.getAChild("bytecode_expr").getAChild())
|
||||
}
|
||||
|
||||
/** Holds if `expr` can be fully matched with `bytecode`. */
|
||||
private predicate matchBytecodeExpr(Expr expr, XMLBytecodeExpr bytecode) {
|
||||
exists(Call parent_call, XMLBytecodeCall parent_bytecode_call |
|
||||
parent_call
|
||||
.getLocation()
|
||||
.hasLocationInfo(this.get_filename_data(), this.get_linenum_data(), _, _, _) and
|
||||
parent_call.getAChildNode*() = expr and
|
||||
parent_bytecode_call.getParent() = this.getAChild("bytecode_expr") and
|
||||
parent_bytecode_call.getAChild*() = bytecode
|
||||
) and
|
||||
(
|
||||
expr.(Name).getId() = bytecode.(XMLBytecodeVariableName).get_name_data()
|
||||
or
|
||||
expr.(Attribute).getName() = bytecode.(XMLBytecodeAttribute).get_attr_name_data() and
|
||||
matchBytecodeExpr(expr.(Attribute).getObject(),
|
||||
bytecode.(XMLBytecodeAttribute).get_object_data())
|
||||
or
|
||||
matchBytecodeExpr(expr.(Call).getFunc(), bytecode.(XMLBytecodeCall).get_function_data())
|
||||
//
|
||||
// I considered allowing a partial match as well. That is, if the bytecode
|
||||
// expression information only tells us `<unknown>.foo()`, and we find an AST
|
||||
// expression that matches on `.foo()`, that is good enough.
|
||||
//
|
||||
// However, we cannot assume that all calls are recorded (such as `range(10)`),
|
||||
// and we cannot assume that for all recorded calls there exists a corresponding
|
||||
// AST call (such as for list-comprehensions).
|
||||
//
|
||||
// So allowing partial matches is not safe, since we might end up matching a
|
||||
// recorded call not in the AST together with an unrecorded call visible in the
|
||||
// AST.
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/** The XML data for the callee part of a recorded call. */
|
||||
abstract class XMLCallee extends XMLElement { }
|
||||
|
||||
/** The XML data for the callee part of a recorded call, when the callee is a Python function. */
|
||||
class XMLPythonCallee extends XMLCallee {
|
||||
XMLPythonCallee() { this.hasName("PythonCallee") }
|
||||
|
||||
string get_filename_data() { result = this.getAChild("filename").getTextValue() }
|
||||
|
||||
int get_linenum_data() { result = this.getAChild("linenum").getTextValue().toInt() }
|
||||
|
||||
string get_funcname_data() { result = this.getAChild("funcname").getTextValue() }
|
||||
|
||||
Function getACallee() {
|
||||
result.getLocation().hasLocationInfo(this.get_filename_data(), this.get_linenum_data(), _, _, _)
|
||||
or
|
||||
// if the function has a decorator, the call will be recorded as going to the line of the first decorator
|
||||
result
|
||||
.getADecorator()
|
||||
.getLocation()
|
||||
.hasLocationInfo(this.get_filename_data(), this.get_linenum_data(), _, _, _)
|
||||
}
|
||||
}
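The decorator case handled in `getACallee` can be checked from the Python side. This is a minimal sketch, assuming the tracer reports a callee by its code object's `co_firstlineno` (the tracer itself is not part of this hunk); `deco` and `func` are hypothetical names used only for illustration.

```python
# Hypothetical module source: the decorator is on line 4, the `def` on line 5.
source = (
    "def deco(f):\n"
    "    return f\n"
    "\n"
    "@deco\n"
    "def func():\n"
    "    pass\n"
)

namespace = {}
exec(compile(source, "<example>", "exec"), namespace)

# For a decorated function, the code object's first line is the line of the
# first decorator, not the line of the `def` keyword.
first_line = namespace["func"].__code__.co_firstlineno
print(first_line)
```

This is why a location match on the function alone is not enough, and the QL predicate also accepts a match on a decorator's location.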
|
||||
|
||||
/** The XML data for the callee part of a recorded call, when the callee is a C function or builtin. */
|
||||
class XMLExternalCallee extends XMLCallee {
|
||||
XMLExternalCallee() { this.hasName("ExternalCallee") }
|
||||
|
||||
string get_module_data() { result = this.getAChild("module").getTextValue() }
|
||||
|
||||
string get_qualname_data() { result = this.getAChild("qualname").getTextValue() }
|
||||
|
||||
Builtin getACallee() {
|
||||
exists(Builtin mod |
|
||||
mod.isModule() and
|
||||
mod.getName() = this.get_module_data()
|
||||
|
|
||||
result = traverse_qualname(mod, this.get_qualname_data())
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Helper predicate. If parent = `builtins` and qualname = `list.append`, it will
|
||||
* return the result of `builtins.list.append`.
|
||||
*/
|
||||
private Builtin traverse_qualname(Builtin parent, string qualname) {
|
||||
not qualname = "__objclass__" and
|
||||
not qualname.matches("%.%") and
|
||||
result = parent.getMember(qualname)
|
||||
or
|
||||
qualname.matches("%.%") and
|
||||
exists(string before_dot, string after_dot, Builtin intermediate_parent |
|
||||
qualname = before_dot + "." + after_dot and
|
||||
not before_dot = "__objclass__" and
|
||||
intermediate_parent = parent.getMember(before_dot) and
|
||||
result = traverse_qualname(intermediate_parent, after_dot)
|
||||
)
|
||||
}
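The recursion in `traverse_qualname` mirrors how a dotted qualified name is resolved at runtime. A rough Python analogue, offered only as a sketch: it walks attributes with `getattr` and does not replicate the `__objclass__` exclusion from the QL predicate.

```python
import builtins


def traverse_qualname(parent, qualname):
    """Resolve a dotted name such as "list.append" starting from `parent`."""
    obj = parent
    for part in qualname.split("."):
        # Each step corresponds to one `parent.getMember(...)` in the QL helper.
        obj = getattr(obj, part)
    return obj


resolved = traverse_qualname(builtins, "list.append")
print(resolved)
```

A name without a dot, such as `"len"`, is resolved in a single step, which matches the first disjunct of the QL predicate.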
|
||||
|
||||
/**
|
||||
* Class of recorded calls where we can identify both the `call` and the `callee` uniquely.
|
||||
*/
|
||||
class IdentifiedRecordedCall extends XMLRecordedCall {
|
||||
IdentifiedRecordedCall() {
|
||||
strictcount(this.getACall()) = 1 and
|
||||
(
|
||||
strictcount(this.getAPythonCallee()) = 1
|
||||
or
|
||||
strictcount(this.getABuiltinCallee()) = 1
|
||||
)
|
||||
or
|
||||
// Handle case where the same function is called multiple times in one line, for
|
||||
// example `func(); func()`. This only works if:
|
||||
// - all the callees for these calls are the same
|
||||
// - all these calls were recorded
|
||||
//
|
||||
// without this `strictcount`, in the case `func(); func(); func()`, if 1 of the calls
|
||||
// is not recorded, we would still mark the other two recorded calls as valid
|
||||
// (which is not following the rules above). + 1 to count `this` as well.
|
||||
strictcount(this.getACall()) = strictcount(this.getOtherWithSameSetOfCalls()) + 1 and
|
||||
forex(XMLRecordedCall rc | rc = this.getOtherWithSameSetOfCalls() |
|
||||
unique(Function f | f = this.getAPythonCallee()) =
|
||||
unique(Function f | f = rc.getAPythonCallee())
|
||||
or
|
||||
unique(Builtin b | b = this.getABuiltinCallee()) =
|
||||
unique(Builtin b | b = rc.getABuiltinCallee())
|
||||
)
|
||||
}
|
||||
|
||||
override string toString() {
|
||||
exists(string callee_str |
|
||||
exists(Function callee, string path | callee = this.getAPythonCallee() |
|
||||
(
|
||||
path = callee.getLocation().getFile().getRelativePath()
|
||||
or
|
||||
not exists(callee.getLocation().getFile().getRelativePath()) and
|
||||
path = callee.getLocation().getFile().getAbsolutePath()
|
||||
) and
|
||||
callee_str =
|
||||
callee.toString() + " (" + path + ":" + callee.getLocation().getStartLine() + ")"
|
||||
)
|
||||
or
|
||||
callee_str = this.getABuiltinCallee().toString()
|
||||
|
|
||||
result = super.toString() + " --> " + callee_str
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Class of recorded calls where we cannot identify both the `call` and the `callee` uniquely.
|
||||
*/
|
||||
class UnidentifiedRecordedCall extends XMLRecordedCall {
|
||||
UnidentifiedRecordedCall() { not this instanceof IdentifiedRecordedCall }
|
||||
}
|
||||
|
||||
/**
|
||||
* Recorded calls made from outside project folder, that can be ignored when evaluating
|
||||
* call-graph quality.
|
||||
*/
|
||||
class IgnoredRecordedCall extends XMLRecordedCall {
|
||||
IgnoredRecordedCall() {
|
||||
not exists(
|
||||
any(File file | file.getAbsolutePath() = this.getXMLCall().get_filename_data())
|
||||
.getRelativePath()
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/** Provides classes for call-graph resolution by using points-to. */
|
||||
module PointsToBasedCallGraph {
|
||||
/** An `IdentifiedRecordedCall` that can be resolved with points-to. */
|
||||
class ResolvableRecordedCall extends IdentifiedRecordedCall {
|
||||
Value calleeValue;
|
||||
|
||||
ResolvableRecordedCall() {
|
||||
exists(Call call, XMLCallee xmlCallee |
|
||||
call = this.getACall() and
|
||||
calleeValue.getACall() = call.getAFlowNode() and
|
||||
xmlCallee = this.getXMLCallee() and
|
||||
(
|
||||
xmlCallee instanceof XMLPythonCallee and
|
||||
(
|
||||
// normal function
|
||||
calleeValue.(PythonFunctionValue).getScope() = xmlCallee.(XMLPythonCallee).getACallee()
|
||||
or
|
||||
// class instantiation -- points-to says the call goes to the class
|
||||
calleeValue.(ClassValue).lookup("__init__").(PythonFunctionValue).getScope() =
|
||||
xmlCallee.(XMLPythonCallee).getACallee()
|
||||
)
|
||||
or
|
||||
xmlCallee instanceof XMLExternalCallee and
|
||||
calleeValue.(BuiltinFunctionObjectInternal).getBuiltin() =
|
||||
xmlCallee.(XMLExternalCallee).getACallee()
|
||||
or
|
||||
xmlCallee instanceof XMLExternalCallee and
|
||||
calleeValue.(BuiltinMethodObjectInternal).getBuiltin() =
|
||||
xmlCallee.(XMLExternalCallee).getACallee()
|
||||
)
|
||||
)
|
||||
}
|
||||
|
||||
Value getCalleeValue() { result = calleeValue }
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,17 @@
|
||||
/**
|
||||
* Metrics for evaluating how good we are at interpreting results from the cg_trace program.
|
||||
* See Metrics.ql for call-graph quality metrics.
|
||||
*/
|
||||
|
||||
import lib.RecordedCalls
|
||||
|
||||
from string text, float number, float ratio
|
||||
where
|
||||
exists(int all_rcs | all_rcs = count(XMLRecordedCall rc) and ratio = number / all_rcs |
|
||||
text = "XMLRecordedCall" and number = all_rcs
|
||||
or
|
||||
text = "IdentifiedRecordedCall" and number = count(IdentifiedRecordedCall rc)
|
||||
or
|
||||
text = "UnidentifiedRecordedCall" and number = count(UnidentifiedRecordedCall rc)
|
||||
)
|
||||
select text, number, ratio * 100 + "%" as percent
|
||||
56
python/tools/recorded-call-graph-metrics/ql/query/Metrics.ql
Normal file
@@ -0,0 +1,56 @@
|
||||
import lib.RecordedCalls
|
||||
|
||||
// column i is just used for sorting
|
||||
from string text, float number, float ratio, int i
|
||||
where
|
||||
exists(int all_rcs | all_rcs = count(XMLRecordedCall rc) and ratio = number / all_rcs |
|
||||
text = "XMLRecordedCall" and number = all_rcs and i = 0
|
||||
or
|
||||
text = "IgnoredRecordedCall" and number = count(IgnoredRecordedCall rc) and i = 1
|
||||
or
|
||||
text = "not IgnoredRecordedCall" and number = all_rcs - count(IgnoredRecordedCall rc) and i = 2
|
||||
)
|
||||
or
|
||||
text = "----------" and
|
||||
number = 0 and
|
||||
ratio = 0 and
|
||||
i = 10
|
||||
or
|
||||
exists(int all_not_ignored_rcs |
|
||||
all_not_ignored_rcs = count(XMLRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
|
||||
ratio = number / all_not_ignored_rcs
|
||||
|
|
||||
text = "IdentifiedRecordedCall" and
|
||||
number = count(IdentifiedRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
|
||||
i = 11
|
||||
or
|
||||
text = "UnidentifiedRecordedCall" and
|
||||
number = count(UnidentifiedRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
|
||||
i = 12
|
||||
)
|
||||
or
|
||||
text = "----------" and
|
||||
number = 0 and
|
||||
ratio = 0 and
|
||||
i = 20
|
||||
or
|
||||
exists(int all_identified_rcs |
|
||||
all_identified_rcs = count(IdentifiedRecordedCall rc | not rc instanceof IgnoredRecordedCall) and
|
||||
ratio = number / all_identified_rcs
|
||||
|
|
||||
text = "points-to ResolvableRecordedCall" and
|
||||
number =
|
||||
count(PointsToBasedCallGraph::ResolvableRecordedCall rc |
|
||||
not rc instanceof IgnoredRecordedCall
|
||||
) and
|
||||
i = 21
|
||||
or
|
||||
text = "points-to not ResolvableRecordedCall" and
|
||||
number =
|
||||
all_identified_rcs -
|
||||
count(PointsToBasedCallGraph::ResolvableRecordedCall rc |
|
||||
not rc instanceof IgnoredRecordedCall
|
||||
) and
|
||||
i = 22
|
||||
)
|
||||
select i, text, number, ratio * 100 + "%" as percent order by i
|
||||
@@ -0,0 +1,4 @@
|
||||
import lib.RecordedCalls
|
||||
|
||||
from PointsToBasedCallGraph::ResolvableRecordedCall rc
|
||||
select rc.getACall(), "-->", rc.getCalleeValue()
|
||||
@@ -0,0 +1,5 @@
|
||||
import lib.RecordedCalls
|
||||
|
||||
from IdentifiedRecordedCall rc
|
||||
where not rc instanceof PointsToBasedCallGraph::ResolvableRecordedCall
|
||||
select rc, rc.getACall()
|
||||
@@ -0,0 +1,23 @@
|
||||
import lib.RecordedCalls
|
||||
|
||||
from UnidentifiedRecordedCall rc, string reason
|
||||
where
|
||||
not rc instanceof IgnoredRecordedCall and
|
||||
(
|
||||
not exists(rc.getACall()) and
|
||||
reason = "no call"
|
||||
or
|
||||
count(rc.getACall()) > 1 and
|
||||
reason = "more than 1 call"
|
||||
or
|
||||
not exists(rc.getAPythonCallee()) and
|
||||
not exists(rc.getABuiltinCallee()) and
|
||||
reason = "no callee"
|
||||
or
|
||||
count(rc.getAPythonCallee()) > 1 and
|
||||
reason = "more than 1 Python callee"
|
||||
or
|
||||
count(rc.getABuiltinCallee()) > 1 and
|
||||
reason = "more than 1 Builtin callee"
|
||||
)
|
||||
select rc, reason
|
||||
@@ -0,0 +1,9 @@
|
||||
import python
|
||||
import lib.RecordedCalls
|
||||
|
||||
// Could be useful for deciding which new opcodes to support
|
||||
from string op_name, int c
|
||||
where
|
||||
exists(XMLBytecodeUnknown unknown | unknown.get_opname_data() = op_name) and
|
||||
c = count(XMLBytecodeUnknown unknown | unknown.get_opname_data() = op_name | 1)
|
||||
select op_name, c order by c
|
||||
@@ -1,23 +0,0 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -e
|
||||
set -x
|
||||
|
||||
DB="cg-trace-example-db"
|
||||
SRC="example/"
|
||||
XMLDIR="$SRC"
|
||||
PYTHON_EXTRACTOR=$(codeql resolve extractor --language=python)
|
||||
|
||||
|
||||
./cg_trace.py --xml example/simple.xml example/simple.py
|
||||
|
||||
rm -rf "$DB"
|
||||
|
||||
|
||||
codeql database init --source-root="$SRC" --language=python "$DB"
|
||||
codeql database trace-command --working-dir="$SRC" "$DB" "$PYTHON_EXTRACTOR/tools/autobuild.sh"
|
||||
codeql database index-files --language xml --include-extension .xml --working-dir="$XMLDIR" "$DB"
|
||||
codeql database finalize "$DB"
|
||||
|
||||
set +x
|
||||
echo "Created database '$DB'"
|
||||
@@ -0,0 +1,5 @@
|
||||
lxml
|
||||
# dev
|
||||
black
|
||||
flake8
|
||||
flake8-bugbear
|
||||
14
python/tools/recorded-call-graph-metrics/setup.py
Normal file
@@ -0,0 +1,14 @@
|
||||
from setuptools import find_packages, setup
|
||||
|
||||
# using src/ folder as recommended in: https://blog.ionelmc.ro/2014/05/25/python-packaging/
|
||||
|
||||
setup(
|
||||
name="cg_trace",
|
||||
version="0.0.2", # Remember to update src/cg_trace/__init__.py
|
||||
description="Call graph tracing",
|
||||
packages=find_packages("src"),
|
||||
package_dir={"": "src"},
|
||||
install_requires=["lxml"],
|
||||
entry_points={"console_scripts": ["cg-trace = cg_trace.main:main"]},
|
||||
python_requires=">=3.7",
|
||||
)
|
||||
@@ -0,0 +1,15 @@
|
||||
import sys
|
||||
|
||||
__version__ = "0.0.2" # remember to update setup.py
|
||||
|
||||
# Since the virtual machine opcodes changed in 3.6, not going to attempt to support
|
||||
# anything before that. Using dataclasses, which is a new feature in Python 3.7
|
||||
MIN_PYTHON_VERSION = (3, 7)
|
||||
MIN_PYTHON_VERSION_FORMATTED = ".".join(str(i) for i in MIN_PYTHON_VERSION)
|
||||
|
||||
if not sys.version_info[:2] >= MIN_PYTHON_VERSION:
|
||||
sys.exit(
|
||||
"You need at least Python {} to use 'cg_trace'".format(
|
||||
MIN_PYTHON_VERSION_FORMATTED
|
||||
)
|
||||
)
|
||||
@@ -0,0 +1,5 @@
|
||||
import sys
|
||||
|
||||
from cg_trace.main import main
|
||||
|
||||
sys.exit(main())
|
||||
@@ -0,0 +1,275 @@
|
||||
import dataclasses
|
||||
import dis
|
||||
import logging
|
||||
from dis import Instruction
|
||||
from types import FrameType
|
||||
from typing import Any, List
|
||||
|
||||
from cg_trace.settings import DEBUG, FAIL_ON_UNKNOWN_BYTECODE
|
||||
from cg_trace.utils import better_compare_for_dataclass
|
||||
|
||||
LOGGER = logging.getLogger(__name__)
|
||||
|
||||
# See https://docs.python.org/3/library/dis.html#python-bytecode-instructions for
|
||||
# details on the bytecode instructions
|
||||
|
||||
# TODO: read https://opensource.com/article/18/4/introduction-python-bytecode
|
||||
|
||||
|
||||
class BytecodeExpr:
|
||||
"""An expression reconstructed from Python bytecode
|
||||
"""
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeConst(BytecodeExpr):
|
||||
"""FOR LOAD_CONST"""
|
||||
|
||||
value: Any
|
||||
|
||||
def __str__(self):
|
||||
return repr(self.value)
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeVariableName(BytecodeExpr):
|
||||
name: str
|
||||
|
||||
def __str__(self):
|
||||
return self.name
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeAttribute(BytecodeExpr):
|
||||
attr_name: str
|
||||
object: BytecodeExpr
|
||||
|
||||
def __str__(self):
|
||||
return f"{self.object}.{self.attr_name}"
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeSubscript(BytecodeExpr):
|
||||
key: BytecodeExpr
|
||||
object: BytecodeExpr
|
||||
|
||||
def __str__(self):
|
||||
return f"{self.object}[{self.key}]"
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeTuple(BytecodeExpr):
|
||||
elements: List[BytecodeExpr]
|
||||
|
||||
def __str__(self):
|
||||
elements_formatted = (
|
||||
", ".join(str(e) for e in self.elements)
|
||||
if len(self.elements) > 1
|
||||
else f"{self.elements[0]},"
|
||||
)
|
||||
return f"({elements_formatted})"
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeList(BytecodeExpr):
|
||||
elements: List[BytecodeExpr]
|
||||
|
||||
def __str__(self):
|
||||
elements_formatted = (
|
||||
", ".join(str(e) for e in self.elements)
|
||||
if len(self.elements) > 1
|
||||
else f"{self.elements[0]},"
|
||||
)
|
||||
return f"[{elements_formatted}]"
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeCall(BytecodeExpr):
|
||||
function: BytecodeExpr
|
||||
|
||||
def __str__(self):
|
||||
return f"{self.function}()"
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeUnknown(BytecodeExpr):
|
||||
opname: str
|
||||
|
||||
def __str__(self):
|
||||
return f"<{self.opname}>"
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class BytecodeMakeFunction(BytecodeExpr):
|
||||
"""For MAKE_FUNCTION opcode"""
|
||||
|
||||
qualified_name: BytecodeExpr
|
||||
|
||||
def __str__(self):
|
||||
return f"<MAKE_FUNCTION(qualified_name={self.qualified_name})>"
|
||||
|
||||
|
||||
@better_compare_for_dataclass
|
||||
@dataclasses.dataclass(frozen=True, eq=True, order=True)
|
||||
class SomethingInvolvingScaryBytecodeJump(BytecodeExpr):
|
||||
opname: str
|
||||
|
||||
def __str__(self):
|
||||
return "<SomethingInvolvingScaryBytecodeJump>"
|
||||
|
||||
|
||||
def expr_that_added_elem_to_stack(
|
||||
instructions: List[Instruction], start_index: int, stack_pos: int
|
||||
):
|
||||
"""Backwards traverse instructions
|
||||
|
||||
Backwards traverse the instructions starting at `start_index` until we find the
|
||||
instruction that added the element at stack position `stack_pos` (where 0 means top
|
||||
of stack). For example, if the instructions are:
|
||||
|
||||
```
|
||||
0: LOAD_GLOBAL 0 (func)
|
||||
1: LOAD_CONST 1 (42)
|
||||
2: CALL_FUNCTION 1
|
||||
```
|
||||
|
||||
We can look for the function that is called by invoking this function with
|
||||
`start_index = 1` and `stack_pos = 1`. It will see that `LOAD_CONST` added the top
|
||||
element to the stack, and find that `LOAD_GLOBAL` was the instruction to add element
|
||||
in stack position 1 to the stack -- so `expr_from_instruction(instructions, 0)` is
|
||||
returned.
|
||||
|
||||
It is assumed that if `stack_pos == 0` then the instruction you are looking for is
|
||||
the one at `instructions[start_index]`. This might not hold, in case of using `NOP`
|
||||
instructions.
|
||||
|
||||
If any jump instruction is found, `SomethingInvolvingScaryBytecodeJump` is returned
|
||||
immediately, since correctly processing the bytecode when faced with jumps is not
|
||||
straightforward.
|
||||
"""
|
||||
if DEBUG:
|
||||
LOGGER.debug(
|
||||
f"expr_that_added_elem_to_stack start_index={start_index} stack_pos={stack_pos}"
|
||||
)
|
||||
assert stack_pos >= 0
|
||||
for inst in reversed(instructions[: start_index + 1]):
|
||||
# Return immediately if faced with a jump
|
||||
if inst.opcode in dis.hasjabs or inst.opcode in dis.hasjrel:
|
||||
return SomethingInvolvingScaryBytecodeJump(inst.opname)
|
||||
|
||||
if stack_pos == 0:
|
||||
if DEBUG:
|
||||
LOGGER.debug(f"Found it: {inst}")
|
||||
found_index = instructions.index(inst)
|
||||
break
|
||||
old = stack_pos
|
||||
stack_pos -= dis.stack_effect(inst.opcode, inst.arg)
|
||||
new = stack_pos
|
||||
if DEBUG:
|
||||
LOGGER.debug(f"Skipping ({old} -> {new}) {inst}")
|
||||
else:
|
||||
raise Exception("expr_that_added_elem_to_stack failed")
|
||||
|
||||
return expr_from_instruction(instructions, found_index)
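The backwards walk described in the docstring can be illustrated in isolation. The sketch below is not the implementation above: it uses hand-written `(opname, stack_effect)` pairs instead of `dis.stack_effect`, so that it does not depend on the opcodes of a particular interpreter version, and `index_of_producer` is a hypothetical helper name.

```python
from typing import List, Tuple

# (opname, net stack effect) pairs for the docstring example `func(42)`.
INSTRUCTIONS: List[Tuple[str, int]] = [
    ("LOAD_GLOBAL", +1),    # pushes `func`
    ("LOAD_CONST", +1),     # pushes `42`
    ("CALL_FUNCTION", -1),  # pops callable and argument, pushes the result
]


def index_of_producer(
    instructions: List[Tuple[str, int]], start_index: int, stack_pos: int
) -> int:
    """Walk backwards from `start_index` to the instruction that pushed the
    element at `stack_pos` (0 means top of stack)."""
    for index in range(start_index, -1, -1):
        if stack_pos == 0:
            return index
        # Undo this instruction's effect on the stack depth and keep walking.
        stack_pos -= instructions[index][1]
    raise ValueError("no instruction found for the requested stack position")


callee_index = index_of_producer(INSTRUCTIONS, start_index=1, stack_pos=1)
print(INSTRUCTIONS[callee_index][0])
```

Like the original, this assumes every instruction on the path pushes at most one element and that no jumps occur between the producer and the call.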
|
||||
|
||||
|
||||
def expr_from_instruction(instructions: List[Instruction], index: int) -> BytecodeExpr:
|
||||
inst = instructions[index]
|
||||
|
||||
if DEBUG:
|
||||
LOGGER.debug(f"expr_from_instruction: {inst} index={index}")
|
||||
|
||||
if inst.opname in ["LOAD_GLOBAL", "LOAD_FAST", "LOAD_NAME", "LOAD_DEREF"]:
|
||||
return BytecodeVariableName(inst.argval)
|
||||
|
||||
# elif inst.opname in ["LOAD_CONST"]:
|
||||
# return BytecodeConst(inst.argval)
|
||||
|
||||
# https://docs.python.org/3/library/dis.html#opcode-LOAD_METHOD
|
||||
# https://docs.python.org/3/library/dis.html#opcode-LOAD_ATTR
|
||||
elif inst.opname in ["LOAD_METHOD", "LOAD_ATTR"]:
|
||||
attr_name = inst.argval
|
||||
obj_expr = expr_that_added_elem_to_stack(instructions, index - 1, 0)
|
||||
return BytecodeAttribute(attr_name=attr_name, object=obj_expr)
|
||||
|
||||
# elif inst.opname in ["BINARY_SUBSCR"]:
|
||||
# key_expr = expr_that_added_elem_to_stack(instructions, index - 1, 0)
|
||||
# obj_expr = expr_that_added_elem_to_stack(instructions, index - 1, 1)
|
||||
# return BytecodeSubscript(key=key_expr, object=obj_expr)
|
||||
|
||||
# elif inst.opname in ["BUILD_TUPLE", "BUILD_LIST"]:
|
||||
# elements = []
|
||||
# for i in range(inst.arg):
|
||||
# element_expr = expr_that_added_elem_to_stack(instructions, index - 1, i)
|
||||
# elements.append(element_expr)
|
||||
# elements.reverse()
|
||||
# klass = {"BUILD_TUPLE": BytecodeTuple, "BUILD_LIST": BytecodeList}[inst.opname]
|
||||
# return klass(elements=elements)
|
||||
|
||||
# https://docs.python.org/3/library/dis.html#opcode-CALL_FUNCTION
|
||||
elif inst.opname in [
|
||||
"CALL_FUNCTION",
|
||||
"CALL_METHOD",
|
||||
"CALL_FUNCTION_KW",
|
||||
"CALL_FUNCTION_EX",
|
||||
]:
|
||||
assert index > 0
|
||||
assert isinstance(inst.arg, int)
|
||||
if inst.opname in ["CALL_FUNCTION", "CALL_METHOD"]:
|
||||
num_stack_elems = inst.arg
|
||||
elif inst.opname == "CALL_FUNCTION_KW":
|
||||
num_stack_elems = inst.arg + 1
|
||||
elif inst.opname == "CALL_FUNCTION_EX":
|
||||
# top of stack _can_ be keyword argument dictionary (indicated by lowest bit
|
||||
# set), always followed by the positional arguments (also if there are not
|
||||
# any).
|
||||
num_stack_elems = (1 if inst.arg & 1 == 1 else 0) + 1
|
||||
|
||||
func_expr = expr_that_added_elem_to_stack(
|
||||
instructions, index - 1, num_stack_elems
|
||||
)
|
||||
return BytecodeCall(function=func_expr)
|
||||
|
||||
# elif inst.opname in ["MAKE_FUNCTION"]:
|
||||
# name_expr = expr_that_added_elem_to_stack(instructions, index - 1, 0)
|
||||
# assert isinstance(name_expr, BytecodeConst)
|
||||
# return BytecodeMakeFunction(qualified_name=name_expr)
|
||||
|
||||
# TODO: handle with statements (https://docs.python.org/3/library/dis.html#opcode-SETUP_WITH)
|
||||
WITH_OPNAMES = ["SETUP_WITH", "WITH_CLEANUP_START", "WITH_CLEANUP_FINISH"]
|
||||
|
||||
# Special cases ignored for now:
|
||||
#
|
||||
# - LOAD_BUILD_CLASS: Called when constructing a class.
|
||||
# - IMPORT_NAME: Observed to result in a call to filename='<frozen
|
||||
# importlib._bootstrap>', linenum=389, funcname='parent'
|
||||
if FAIL_ON_UNKNOWN_BYTECODE:
|
||||
if inst.opname not in ["LOAD_BUILD_CLASS", "IMPORT_NAME"] + WITH_OPNAMES:
|
||||
LOGGER.warning(
|
||||
f"Don't know how to handle this type of instruction: {inst.opname}"
|
||||
)
|
||||
raise BaseException()
|
||||
|
||||
return BytecodeUnknown(inst.opname)
|
||||
|
||||
|
||||
def expr_from_frame(frame: FrameType) -> BytecodeExpr:
|
||||
bytecode = dis.Bytecode(frame.f_code, current_offset=frame.f_lasti)
|
||||
|
||||
if DEBUG:
|
||||
LOGGER.debug(
|
||||
f"{frame.f_code.co_filename}:{frame.f_lineno}: bytecode: \n{bytecode.dis()}"
|
||||
)
|
||||
|
||||
instructions = list(iter(bytecode))
|
||||
last_instruction_index = [inst.offset for inst in instructions].index(frame.f_lasti)
|
||||
return expr_from_instruction(instructions, last_instruction_index)
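`expr_from_frame` maps the frame's `f_lasti` (a byte offset) to a position in the instruction list. A small sketch of that lookup, assuming only the standard `dis` module; `instruction_index_at_offset` is a hypothetical helper, and the last instruction's offset stands in for a real `frame.f_lasti`.

```python
import dis


def instruction_index_at_offset(instructions, offset):
    """Return the list index of the instruction located at byte `offset`."""
    return [inst.offset for inst in instructions].index(offset)


code = compile("print(len('abc'))", "<example>", "exec")
instructions = list(dis.get_instructions(code))

# Pretend the frame stopped at the last instruction of the code object.
last = instructions[-1]
index = instruction_index_at_offset(instructions, last.offset)
print(index, instructions[index].opname)
```

Offsets are byte positions and list indices are not, so the two generally differ; that is why the lookup is needed before calling `expr_from_instruction`.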
|
||||
@@ -0,0 +1,22 @@
|
||||
import argparse
|
||||
|
||||
|
||||
def parse(args):
|
||||
parser = argparse.ArgumentParser()
|
||||
|
||||
parser.add_argument(
|
||||
"--debug", action="store_true", default=False, help="Enable debug logging"
|
||||
)
|
||||
|
||||
parser.add_argument("--xml")
|
||||
|
||||
parser.add_argument(
|
||||
"--module", action="store_true", default=False, help="Trace a module"
|
||||
)
|
||||
|
||||
parser.add_argument("progname", help="file to run as main program")
|
||||
parser.add_argument(
|
||||
"arguments", nargs=argparse.REMAINDER, help="arguments to the program"
|
||||
)
|
||||
|
||||
return parser.parse_args(args)
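To show how `argparse.REMAINDER` keeps the traced program's own options apart from the tracer's options, here is a trimmed-down copy of the parser with a sample invocation. The argument values (`out.xml`, `--flag`) are made up for the example.

```python
import argparse


def parse(args):
    parser = argparse.ArgumentParser()
    parser.add_argument("--xml")
    parser.add_argument("progname", help="file to run as main program")
    # Everything after `progname` is passed through untouched, including
    # strings that look like options.
    parser.add_argument(
        "arguments", nargs=argparse.REMAINDER, help="arguments to the program"
    )
    return parser.parse_args(args)


opts = parse(["--xml", "out.xml", "example/simple.py", "--flag", "1"])
print(opts.xml, opts.progname, opts.arguments)
```

Options for the tracer therefore have to come before the program name; anything after it belongs to the traced program.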
|
||||
@@ -0,0 +1,46 @@
|
||||
import dataclasses
|
||||
from typing import Dict
|
||||
|
||||
from lxml import etree
|
||||
|
||||
|
||||
def dataclass_to_xml(obj, parent):
|
||||
obj_elem = etree.SubElement(parent, obj.__class__.__name__)
|
||||
for field in dataclasses.fields(obj):
|
||||
field_elem = etree.SubElement(obj_elem, field.name)
|
||||
value = getattr(obj, field.name)
|
||||
if isinstance(value, (str, int)) or value is None:
|
||||
field_elem.text = str(value)
|
||||
elif isinstance(value, list):
|
||||
for list_elem in value:
|
||||
assert dataclasses.is_dataclass(list_elem)
|
||||
dataclass_to_xml(list_elem, field_elem)
|
||||
elif dataclasses.is_dataclass(value):
|
||||
dataclass_to_xml(value, field_elem)
|
||||
else:
|
||||
raise ValueError(
|
||||
f"Can't export key {field.name!r} with value {value!r} (type {type(value)})"
|
||||
)
|
||||
|
||||
|
||||
class XMLExporter:
|
||||
@staticmethod
|
||||
def export(outfile_path, recorded_calls, info: Dict[str, str]):
|
||||
|
||||
root = etree.Element("root")
|
||||
|
||||
info_elem = etree.SubElement(root, "info")
|
||||
for k, v in info.items():
|
||||
etree.SubElement(info_elem, k).text = v
|
||||
|
||||
rcs = etree.SubElement(root, "recorded_calls")
|
||||
|
||||
for (call, callee) in sorted(recorded_calls):
|
||||
rc = etree.SubElement(rcs, "recorded_call")
|
||||
dataclass_to_xml(call, rc)
|
||||
dataclass_to_xml(callee, rc)
|
||||
|
||||
tree = etree.ElementTree(root)
|
||||
tree.write(outfile_path, encoding="utf-8", pretty_print=True)
|
||||
|
||||
print(f"Wrote {len(recorded_calls)} recorded calls to {outfile_path}")
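The shape of the XML produced by `dataclass_to_xml` is what the QL classes rely on (`getAChild("filename")` and so on). The sketch below reproduces the core idea with the standard library's `xml.etree.ElementTree` instead of `lxml`, so it runs without extra dependencies; `ExampleCall` is a hypothetical dataclass, not one from the tracer, and list handling is left out.

```python
import dataclasses
import xml.etree.ElementTree as ET


@dataclasses.dataclass(frozen=True)
class ExampleCall:  # hypothetical stand-in for a recorded call
    filename: str
    linenum: int


def dataclass_to_xml(obj, parent):
    # One element named after the class, with one child element per field.
    obj_elem = ET.SubElement(parent, obj.__class__.__name__)
    for field in dataclasses.fields(obj):
        field_elem = ET.SubElement(obj_elem, field.name)
        value = getattr(obj, field.name)
        if dataclasses.is_dataclass(value):
            dataclass_to_xml(value, field_elem)
        else:
            field_elem.text = str(value)
    return obj_elem


root = ET.Element("recorded_call")
dataclass_to_xml(ExampleCall("example/simple.py", 3), root)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because every field becomes a child element named after the field, adding a field to a dataclass only requires a matching `get_<name>_data` predicate on the QL side.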
|
||||
@@ -0,0 +1,46 @@
|
||||
import dataclasses
|
||||
from typing import Any, List
|
||||
|
||||
from cg_trace.bytecode_reconstructor import BytecodeExpr
|
||||
|
||||
PREAMBLE = """\
|
||||
import python
|
||||
|
||||
abstract class XMLBytecodeExpr extends XMLElement { }
|
||||
"""
|
||||
|
||||
CLASS_PREAMBLE = """\
|
||||
class XML{class_name} extends XMLBytecodeExpr {{
|
||||
XML{class_name}() {{ this.hasName("{class_name}") }}
|
||||
"""
|
||||
|
||||
CLASS_AFTER = """\
|
||||
}
|
||||
"""
|
||||
|
||||
ATTR_TEMPLATES = {
|
||||
str: 'string get_{name}_data() {{ result = this.getAChild("{name}").getTextValue() }}',
|
||||
int: 'int get_{name}_data() {{ result = this.getAChild("{name}").getTextValue().toInt() }}',
|
||||
BytecodeExpr: 'XMLBytecodeExpr get_{name}_data() {{ result.getParent() = this.getAChild("{name}") }}',
|
||||
List[
|
||||
BytecodeExpr
|
||||
]: 'XMLBytecodeExpr get_{name}_data(int index) {{ result = this.getAChild("{name}").getChild(index) }}',
|
||||
Any: 'string get_{name}_data_raw() {{ result = this.getAChild("{name}").getTextValue() }}',
|
||||
}
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
print(PREAMBLE)
|
||||
|
||||
for sc in BytecodeExpr.__subclasses__():
|
||||
print(CLASS_PREAMBLE.format(class_name=sc.__name__))
|
||||
|
||||
for f in dataclasses.fields(sc):
|
||||
field_template = ATTR_TEMPLATES.get(f.type)
|
||||
if field_template:
|
||||
generated = field_template.format(name=f.name)
|
||||
print(f" {generated}")
|
||||
else:
|
||||
raise Exception("no template for", f.type)
|
||||
|
||||
print(CLASS_AFTER)
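As a quick illustration of what the generator emits, the `str` template can be expanded for a single field. This is only a sketch of one template expansion, using the `opname` field of `BytecodeUnknown` as the example.

```python
# Same text as the `str` entry of ATTR_TEMPLATES; doubled braces are literal
# braces in the generated QL.
STR_TEMPLATE = (
    'string get_{name}_data() {{ result = this.getAChild("{name}").getTextValue() }}'
)

generated = STR_TEMPLATE.format(name="opname")
print(generated)
```

The result is the `get_opname_data` predicate that appears in the generated `BytecodeExpr` QL library.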
|
||||
118
python/tools/recorded-call-graph-metrics/src/cg_trace/main.py
Normal file
@@ -0,0 +1,118 @@
|
||||
import itertools
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
from datetime import datetime
|
||||
from io import StringIO
|
||||
|
||||
from cg_trace import __version__, cmdline, settings, tracer
|
||||
from cg_trace.exporter import XMLExporter
|
||||
|
||||
|
||||
def record_calls(code, globals):
|
||||
real_stdout = sys.stdout
|
||||
real_stderr = sys.stderr
|
||||
captured_stdout = StringIO()
|
||||
captured_stderr = StringIO()
|
||||
|
||||
sys.stdout = captured_stdout
|
||||
sys.stderr = captured_stderr
|
||||
|
||||
cgt = tracer.CallGraphTracer()
|
||||
exit_status = cgt.run(code, globals, globals)
|
||||
sys.stdout = real_stdout
|
||||
sys.stderr = real_stderr
|
||||
|
||||
all_calls_sorted = sorted(
|
||||
itertools.chain(cgt.python_calls.values(), cgt.external_calls.values())
|
||||
)
|
||||
|
||||
return all_calls_sorted, captured_stdout, captured_stderr, exit_status
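`record_calls` swaps `sys.stdout` and `sys.stderr` by hand. An alternative worth considering, shown here only as a sketch and not as what the tool does, is `contextlib.redirect_stdout` and `contextlib.redirect_stderr`, which restore the streams even if the traced code raises. `run_captured` is a hypothetical helper, and plain `exec` stands in for the tracer.

```python
import contextlib
import io


def run_captured(code, globs):
    """Run `code` and return what it wrote to stdout and stderr."""
    out, err = io.StringIO(), io.StringIO()
    # Both streams are restored when the `with` block exits, also on errors.
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        exec(code, globs, globs)
    return out.getvalue(), err.getvalue()


code = compile("print('hi')", "<example>", "exec")
captured_out, captured_err = run_captured(code, {"__name__": "__main__"})
print(repr(captured_out), repr(captured_err))
```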
|
||||
|
||||
|
||||
def setup_logging(debug):
|
||||
# code we run can also set up logging, so we need to set the level directly on our
|
||||
# own package
|
||||
sh = logging.StreamHandler(stream=sys.stderr)
|
||||
|
||||
pkg_logger = logging.getLogger("cg_trace")
|
||||
pkg_logger.addHandler(sh)
|
||||
pkg_logger.setLevel(logging.DEBUG if debug else logging.INFO)
|
||||
|
||||
|
||||
def main(args=None) -> int:
|
||||
|
||||
# from . import bytecode_reconstructor
|
||||
# logging.getLogger(bytecode_reconstructor.__name__).setLevel(logging.INFO)
|
||||
|
||||
if args is None:
|
||||
# first element in argv is program name
|
||||
args = sys.argv[1:]
|
||||
|
||||
opts = cmdline.parse(args)
|
||||
|
||||
settings.DEBUG = opts.debug
|
||||
setup_logging(opts.debug)
|
||||
|
||||
# The details of setting up the program to be run are very much inspired by `trace`
|
||||
# from the standard library
|
||||
if opts.module:
|
||||
import runpy
|
||||
|
||||
module_name = opts.progname
|
||||
_mod_name, mod_spec, code = runpy._get_module_details(module_name)
|
||||
sys.argv = [code.co_filename, *opts.arguments]
|
||||
globs = {
|
||||
"__name__": "__main__",
|
||||
"__file__": code.co_filename,
|
||||
"__package__": mod_spec.parent,
|
||||
"__loader__": mod_spec.loader,
|
||||
"__spec__": mod_spec,
|
||||
"__cached__": None,
|
||||
}
|
||||
else:
|
||||
sys.argv = [opts.progname, *opts.arguments]
|
||||
sys.path[0] = os.path.dirname(opts.progname)
|
||||
|
||||
with open(opts.progname) as fp:
|
||||
code = compile(fp.read(), opts.progname, "exec")
|
||||
|
||||
# try to emulate __main__ namespace as much as possible
|
||||
globs = {
|
||||
"__file__": opts.progname,
|
||||
"__name__": "__main__",
|
||||
"__package__": None,
|
||||
"__cached__": None,
|
||||
}
|
||||
|
||||
start = time.time()
|
||||
recorded_calls, captured_stdout, captured_stderr, exit_status = record_calls(
|
||||
code, globs
|
||||
)
|
||||
end = time.time()
|
||||
elapsed_formatted = f"{end-start:.2f} seconds"
|
||||
|
||||
if opts.xml:
|
||||
XMLExporter.export(
|
||||
opts.xml,
|
||||
recorded_calls,
|
||||
info={
|
||||
"cg_trace_version": __version__,
|
||||
"args": " ".join(args),
|
||||
"exit_status": exit_status,
|
||||
"elapsed": elapsed_formatted,
|
||||
"utctimestamp": datetime.utcnow().replace(microsecond=0).isoformat(),
|
||||
},
|
||||
)
|
||||
else:
|
||||
print(f"--- Recorded calls (in {elapsed_formatted}) ---")
|
||||
for (call, callee) in recorded_calls:
|
||||
print(f"{call} --> {callee}")
|
||||
|
||||
print("--- captured stdout ---")
|
||||
print(captured_stdout.getvalue(), end="")
|
||||
print("--- captured stderr ---")
|
||||
print(captured_stderr.getvalue(), end="")
|
||||
|
||||
return 0
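
Note on `record_calls` in this file: it swaps `sys.stdout` and `sys.stderr` by hand and restores them once `cgt.run` has returned. As a minimal sketch only (not part of this commit), the same capture can be written with the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr`, which put the original streams back even if the wrapped call raises. `run_captured` and `noisy` are hypothetical names used purely for illustration.

```python
import contextlib
import io
import sys


def run_captured(func):
    """Run `func()` and return (result, captured stdout, captured stderr).

    Hypothetical helper, not part of cg_trace. The context managers restore
    sys.stdout and sys.stderr when the block exits, including on exceptions.
    """
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = func()
    return result, out.getvalue(), err.getvalue()


def noisy():
    # Writes to both streams; sys.stderr is looked up at call time, so the
    # redirected stream is the one that receives the text.
    print("hello")
    print("oops", file=sys.stderr)
    return 42


result, out, err = run_captured(noisy)
print(repr(result), repr(out), repr(err))
```

Whether this is preferable here is a judgment call: the tracer's own `exec` bookkeeping depends on which calls happen while the profile function is installed, so any change of this kind would need to be checked against the recorded output.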
@@ -0,0 +1,6 @@
# Whether to run the call graph tracer with debugging enabled. Turning off
# `if DEBUG: LOGGER.debug()` code completely yielded massive performance improvements.
DEBUG = False


FAIL_ON_UNKNOWN_BYTECODE = False
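
One thing that may be worth checking against these settings: `main.py` assigns `settings.DEBUG = opts.debug` at run time, while `tracer.py` reads the flag through `from cg_trace.settings import DEBUG`. A `from ... import` copies the value that exists at import time, so a later assignment on the module is not seen through the copied name. The sketch below shows that Python behaviour with a stand-in module built via `types.ModuleType`; it does not import `cg_trace`, and whether it affects the tracer in practice depends on the import order.

```python
import types

# Stand-in for a settings module (hypothetical; mirrors `settings.DEBUG`).
settings = types.ModuleType("settings")
settings.DEBUG = False

# Equivalent of `from settings import DEBUG`: binds the current value to a
# new name in the importing namespace.
DEBUG = settings.DEBUG

# Rebinding the module attribute later does not touch the copied name.
settings.DEBUG = True

print("copied name:", DEBUG)
print("module attribute:", settings.DEBUG)
```

Reading the flag as `settings.DEBUG` at the point of use would pick up the updated value, at the cost of one attribute lookup per check.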
333
python/tools/recorded-call-graph-metrics/src/cg_trace/tracer.py
Normal file
@@ -0,0 +1,333 @@
import dataclasses
import inspect
import logging
import os
import sys
from types import FrameType
from typing import Any, Optional, Tuple

from cg_trace.bytecode_reconstructor import BytecodeExpr, expr_from_frame
from cg_trace.settings import DEBUG
from cg_trace.utils import better_compare_for_dataclass

LOGGER = logging.getLogger(__name__)


# copy-paste for interactive ipython sessions
# import IPython; sys.stdout = sys.__stdout__; IPython.embed(); sys.exit()


_canonic_filename_cache = dict()


def canonic_filename(filename):
    """Return canonical form of filename. (same as Bdb.canonic)

    For real filenames, the canonical form is a case-normalized (on
    case insensitive filesystems) absolute path. 'Filenames' with
    angle brackets, such as "<stdin>", generated in interactive
    mode, are returned unchanged.
    """
    if filename == "<" + filename[1:-1] + ">":
        return filename
    canonic = _canonic_filename_cache.get(filename)
    if not canonic:
        canonic = os.path.abspath(filename)
        canonic = os.path.normcase(canonic)
        _canonic_filename_cache[filename] = canonic
    return canonic


_call_cache = dict()


@dataclasses.dataclass(frozen=True, eq=True, order=True)
class Call:
    """A call
    """

    filename: str
    linenum: int
    inst_index: int
    bytecode_expr: BytecodeExpr

    def __str__(self):
        d = dataclasses.asdict(self)
        del d["bytecode_expr"]
        normal_fields = ", ".join(f"{k}={v!r}" for k, v in d.items())

        return f"{type(self).__name__}({normal_fields}, bytecode_expr≈{str(self.bytecode_expr)})"

    @classmethod
    def from_frame(cls, frame: FrameType):
        global _call_cache
        key = cls.hash_key(frame)
        if key in _call_cache:
            return _call_cache[key]

        code = frame.f_code

        bytecode_expr = expr_from_frame(frame)

        call = cls(
            filename=canonic_filename(code.co_filename),
            linenum=frame.f_lineno,
            inst_index=frame.f_lasti,
            bytecode_expr=bytecode_expr,
        )

        _call_cache[key] = call

        return call

    @staticmethod
    def hash_key(frame: FrameType) -> Tuple[str, int, int]:
        code = frame.f_code
        return (
            canonic_filename(code.co_filename),
            frame.f_lineno,
            frame.f_lasti,
        )


@dataclasses.dataclass(frozen=True, eq=True, order=True)
class Callee:
    pass


BUILTIN_FUNCTION_OR_METHOD = type(print)
METHOD_DESCRIPTOR_TYPE = type(dict.get)


_unknown_module_fixup_cache = dict()


def _unkown_module_fixup(func):
    # TODO: Doesn't work for everything (for example: `OrderedDict.fromkeys`, `object.__new__`)

    # TODO: Can make this logic easier by using `func.__self__`. For `f = dict().get`, `f.__self__.__class__ == dict`
    # and `dict.__new__.__self__ == dict`

    module = func.__module__
    qualname = func.__qualname__
    cls_name, method_name = qualname.split(".")

    key = (module, qualname)
    if key in _unknown_module_fixup_cache:
        return _unknown_module_fixup_cache[key]

    matching_classes = list()
    for klass in object.__subclasses__():

        if inspect.isabstract(klass):
            continue

        try:
            # type(dict.get) == METHOD_DESCRIPTOR_TYPE
            # type(dict.__new__) == BUILTIN_FUNCTION_OR_METHOD
            if klass.__qualname__ == cls_name and type(
                getattr(klass, method_name, None)
            ) in [BUILTIN_FUNCTION_OR_METHOD, METHOD_DESCRIPTOR_TYPE]:
                matching_classes.append(klass)
        # For flask, observed to give `ValueError: Namespace class is abstract`, even with the isabstract above
        except ValueError:
            pass

    if len(matching_classes) == 1:
        klass = matching_classes[0]
        ret = klass.__module__
    else:
        if DEBUG:
            LOGGER.debug(
                f"Did not find exactly one matching class for {module} {qualname}"
            )
        ret = None
    _unknown_module_fixup_cache[key] = ret
    return ret


@better_compare_for_dataclass
@dataclasses.dataclass(frozen=True, eq=True)
class ExternalCallee(Callee):
    # Some bound methods might not have __module__ attribute: for example,
    # `list().append.__module__ is None`
    module: Optional[str]
    qualname: str
    #
    is_builtin: bool

    @classmethod
    def from_arg(cls, func):
        # builtin bound methods seem to always return `None` for __module__, but we
        # might be able to recover the lost information by looking through all classes.
        # For example, `dict().get.__module__ is None` and `dict().get.__qualname__ ==
        # "dict.get"`

        module = func.__module__
        qualname = func.__qualname__
        if module is None and qualname.count(".") == 1:
            module = _unkown_module_fixup(func)

        return cls(
            module=module,
            qualname=qualname,
            is_builtin=type(func) == BUILTIN_FUNCTION_OR_METHOD,
        )

    def __lt__(self, other):
        if not isinstance(other, ExternalCallee):
            raise TypeError()

        for field in dataclasses.fields(self):
            s_a = getattr(self, field.name)
            o_a = getattr(other, field.name)

            # `None < None` gives TypeError
            if s_a is None and o_a is None:
                return False

            if type(s_a) != type(o_a):
                return type(s_a).__name__ < type(o_a).__name__

            if not s_a < o_a:
                return False

        return True

    def __gt__(self, other):
        return other < self

    def __ge__(self, other):
        return self > other or self == other

    def __le__(self, other):
        return self < other or self == other


@better_compare_for_dataclass
@dataclasses.dataclass(frozen=True, eq=True, order=True)
class PythonCallee(Callee):
    """A callee (Function/Lambda/???)

    should (hopefully) be uniquely identified by its name and location (filename+line
    number)
    """

    filename: str
    linenum: int
    funcname: str

    @classmethod
    def from_frame(cls, frame: FrameType):
        code = frame.f_code
        return cls(
            filename=canonic_filename(code.co_filename),
            linenum=frame.f_lineno,
            funcname=code.co_name,
        )


class CallGraphTracer:
    """Tracer that records calls being made

    It would seem obvious that this should have extended `trace` library
    (https://docs.python.org/3/library/trace.html), but that part is not extensible.

    You might think that we can just use `sys.settrace`
    (https://docs.python.org/3.8/library/sys.html#sys.settrace) like the basic debugger
    (bdb) does, but that isn't invoked on calls to C code, which we need in general, and
    need for handling builtins specifically.

    Luckily, `sys.setprofile`
    (https://docs.python.org/3.8/library/sys.html#sys.setprofile) provides all that we
    need. You might be scared by reading the following bit of the documentation

    > The function is thread-specific, but there is no way for the profiler to know about
    > context switches between threads, so it does not make sense to use this in the
    > presence of multiple threads.

    but that is to be understood in the context of making a profiler (you can't reliably
    measure function execution time if you don't know about context switches). For our
    use-case, this is not a problem.
    """

    def __init__(self):
        # Performing `Call.from_frame` can be expensive, so we cache (call, callee)
        # pairs we have already seen to avoid double processing.
        self.python_calls = dict()
        self.external_calls = dict()

    def run(self, code, globals, locals):
        self.exec_call_seen = False
        self.ignore_rest = False
        try:
            sys.setprofile(self.profilefunc)
            exec(code, globals, locals)
            return "completed"
        except SystemExit:
            return "completed (SystemExit)"
        except Exception:
            sys.setprofile(None)
            LOGGER.info("Exception occurred while running program:", exc_info=True)
            return "exception occurred"
        finally:
            sys.setprofile(None)

    def profilefunc(self, frame: FrameType, event: str, arg):
        # ignore everything until the first call, since that is `exec` from the `run`
        # method above
        if not self.exec_call_seen:
            if event == "call":
                self.exec_call_seen = True
            return

        # if we're going out of the exec, we should ignore anything else (for example the
        # call to `sys.setprofile(None)`)
        if event == "c_return":
            if arg == exec and frame.f_code.co_filename == __file__:
                self.ignore_rest = True

        if self.ignore_rest:
            return

        if event not in ["call", "c_call"]:
            return

        if DEBUG:
            LOGGER.debug(f"profilefunc event={event}")
        if event == "call":
            # in call, the `frame` argument is the new frame for entering the callee
            assert frame.f_back is not None

            callee = PythonCallee.from_frame(frame)

            key = (Call.hash_key(frame.f_back), callee)
            if key in self.python_calls:
                if DEBUG:
                    LOGGER.debug(f"ignoring already seen call {key[0]} --> {callee}")
                return

            if DEBUG:
                LOGGER.debug(f"callee={callee}")
            call = Call.from_frame(frame.f_back)

            self.python_calls[key] = (call, callee)

        if event == "c_call":
            # in c_call, the `frame` argument is the frame where the call happens, and
            # the `arg` argument is the C function object.

            callee = ExternalCallee.from_arg(arg)

            key = (Call.hash_key(frame), callee)
            if key in self.external_calls:
                if DEBUG:
                    LOGGER.debug(f"ignoring already seen call {key[0]} --> {callee}")
                return

            if DEBUG:
                LOGGER.debug(f"callee={callee}")
            call = Call.from_frame(frame)

            self.external_calls[key] = (call, callee)

        if DEBUG:
            LOGGER.debug(f"{call} --> {callee}")
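
The `CallGraphTracer` docstring explains why `sys.setprofile` is used instead of `sys.settrace`: the profile hook also fires for calls into C code. A stripped-down sketch of that mechanism, independent of `cg_trace`, might look like the following. The `events` list and `greet` are illustrative names only, and the hook here records plain names rather than `Call`/`Callee` objects.

```python
import sys

events = []


def profilefunc(frame, event, arg):
    # "call" fires when a Python function is entered; `frame` is the
    # callee's frame and `frame.f_back` is the caller's frame.
    if event == "call":
        events.append(("call", frame.f_code.co_name))
    # "c_call" fires before a C function runs; `frame` is the caller's
    # frame and `arg` is the C function object.
    elif event == "c_call":
        events.append(("c_call", arg.__name__))


def greet():
    return len("hello")


sys.setprofile(profilefunc)
try:
    greet()
finally:
    sys.setprofile(None)

for kind, name in events:
    print(kind, name)
```

Because the hook is still installed when `sys.setprofile(None)` is called, that call can itself show up as a `c_call` event, which is the situation the `ignore_rest` flag in `profilefunc` above is there to filter out.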
@@ -0,0 +1,20 @@
def better_compare_for_dataclass(cls):
    """When dataclass is used with `order=True`, the comparison methods are only implemented for
    objects of the same class. This decorator extends the functionality to compare class
    name if used against other objects.
    """
    for op in [
        "__lt__",
        "__le__",
        "__gt__",
        "__ge__",
    ]:
        old = getattr(cls, op)

        def new(self, other, op=op, old=old):
            if type(self) == type(other):
                return old(self, other)
            return getattr(str, op)(self.__class__.__name__, other.__class__.__name__)

        setattr(cls, op, new)
    return cls
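
To see why `better_compare_for_dataclass` is needed before `main.py` can sort Python and external callees together, the sketch below first shows the default behaviour (ordering two different `order=True` dataclasses raises `TypeError`) and then a simplified, hypothetical variant of the decorator that patches only `__lt__`. All class and function names in it are made up for the example.

```python
import dataclasses


@dataclasses.dataclass(frozen=True, order=True)
class Alpha:
    x: int


@dataclasses.dataclass(frozen=True, order=True)
class Beta:
    x: int


try:
    sorted([Beta(1), Alpha(2)])
    mixed_sort_ok = True
except TypeError:
    # The generated __lt__ returns NotImplemented for a different class,
    # so Python ends up raising TypeError.
    mixed_sort_ok = False


def compare_by_class_name(cls):
    """Simplified stand-in for the decorator: patches __lt__ only."""
    old_lt = cls.__lt__

    def new_lt(self, other):
        if type(self) is type(other):
            return old_lt(self, other)
        # Fall back to ordering by class name across different classes.
        return type(self).__name__ < type(other).__name__

    cls.__lt__ = new_lt
    return cls


@compare_by_class_name
@dataclasses.dataclass(frozen=True, order=True)
class PatchedAlpha:
    x: int


@compare_by_class_name
@dataclasses.dataclass(frozen=True, order=True)
class PatchedBeta:
    x: int


print(mixed_sort_ok)
print(sorted([PatchedBeta(1), PatchedAlpha(2)]))
```

`sorted` only relies on `<`, which is why patching `__lt__` is enough for this sketch; the decorator in the commit covers all four ordering operators.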
32
python/tools/recorded-call-graph-metrics/tests/create-test-db.sh
Executable file
@@ -0,0 +1,32 @@
#!/bin/bash

set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/

SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

if ! pip show cg_trace &>/dev/null; then
    echo "You need to follow setup instructions in README"
    exit 1
fi

DB="$SCRIPTDIR/cg-trace-test-db"
SRC="$SCRIPTDIR/python-src/"
XMLDIR="$SCRIPTDIR/python-traces/"
PYTHON_EXTRACTOR=$(codeql resolve extractor --language=python)

rm -rf "$DB"
rm -rf "$XMLDIR"

mkdir -p "$XMLDIR"

for f in $(ls "$SRC"); do
    echo "Tracing $f"
    cg-trace --xml "$XMLDIR/${f%.py}.xml" "$SRC/$f"
done

codeql database init --source-root="$SRC" --language=python "$DB"
codeql database trace-command --working-dir="$SRC" "$DB" "$PYTHON_EXTRACTOR/tools/autobuild.sh"
codeql database index-files --language xml --include-extension .xml --working-dir="$XMLDIR" "$DB"
codeql database finalize "$DB"

echo "Created database '$DB'"
@@ -0,0 +1,9 @@
def foo():
    print("foo")


def bar():
    print("bar")


[foo, bar][0]()
@@ -0,0 +1,9 @@
def foo():
    print("foo")


def bar():
    print("bar")


(foo, bar)[0]()
@@ -0,0 +1,15 @@
def func(*args, **kwargs):
    print("func", args, kwargs)


args = [1, 2, 3]
kwargs = {"a": 1, "b": 2}

# These give rise to a CALL_FUNCTION_EX
func(*args)
func(**kwargs)
func(*args, **kwargs)


func(*args, foo="foo")
func(*args, foo="foo", **kwargs)
@@ -0,0 +1,7 @@
class Foo:
    def __getitem__(self, key):
        print("__getitem__")


foo = Foo()
foo["key"]  # this is recorded as a call :)
@@ -0,0 +1,9 @@
print("builtins test")
len("bar")
l = list()
l.append(42)

import sys
sys.getdefaultencoding()

r = range(10)
@@ -0,0 +1,44 @@
def func(self, arg):
    print("func", self, arg)


class Foo(object):
    def __init__(self, arg):
        print("Foo.__init__", self, arg)

    def some_method(self):
        print("Foo.some_method", self)
        return self

    f = func

    @staticmethod
    def some_staticmethod():
        print("Foo.some_staticmethod")

    @classmethod
    def some_classmethod(cls):
        print("Foo.some_classmethod", cls)


foo = Foo(42)
foo.some_method()
foo.f(10)
foo.some_staticmethod()
foo.some_classmethod()
foo.some_method().some_method().some_method()


Foo.some_staticmethod()
Foo.some_classmethod()


class Bar(object):
    def wat(self):
        print("Bar.wat")


# these calls to Bar() are not recorded (since no __init__ function)
bar = Bar()
bar.wat()
Bar().wat()
@@ -0,0 +1,3 @@
d = dict()

d.get("foo") or d.get("bar")
@@ -0,0 +1,4 @@
import socket

sock = socket.socket()
print(sock.getsockname())
@@ -0,0 +1,4 @@
import io

# the `io.open` is just an alias for `_io.open`, but we record the external callee as `io.open` :|
io.open("foo")
@@ -0,0 +1,7 @@
for i in range(10):
    print(i)

[i + 1 for i in range(10)]
l = list(range(10))
[i + 1 for i in l]
[i + 1 for i in l]
@@ -0,0 +1,37 @@
def one(*args, **kwargs):
    print("one")
    return 1

def two(*args, **kwargs):
    print("two")
    return 2

def three(*args, **kwargs):
    print("three")
    return 3

one(); two()
print("---")

one(); one()
print("---")

alias_one = one
alias_one(); two()
print("---")

three(one(), two())
print("---")

three(one(), two=two())
print("---")

def f():
    print("f")

    def g():
        print("g")

    return g

f()()
@@ -0,0 +1,26 @@
class Foo:
    def __init__(self):
        self.list = []

    def func(self, kwargs=None, result_callback=None):
        self.list.append((kwargs or {}, result_callback))


foo = Foo()
foo.func()

"""
Has problematic bytecode, since to find out what method is called from instruction 16, we need
to traverse the JUMP_IF_TRUE_OR_POP which requires some more sophistication.

Disassembly of <code object func at 0x7f98f64ee030, file "example/problem-1.py", line 5>:
  6           0 LOAD_FAST                0 (self)
              2 LOAD_ATTR                0 (list)
              4 LOAD_METHOD              1 (append)
              6 LOAD_FAST                1 (kwargs)
              8 JUMP_IF_TRUE_OR_POP     12
             10 BUILD_MAP                0
        >>   12 LOAD_FAST                2 (result_callback)
             14 BUILD_TUPLE              2
             16 CALL_METHOD              1
"""
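
The disassembly quoted in the test file above comes from the CPython version the author ran. To reproduce it locally, a small sketch using the standard `dis` module is enough; note that opcode names and offsets differ between CPython releases (for example, `CALL_METHOD` and `JUMP_IF_TRUE_OR_POP` do not exist in newer versions), so the listing is printed rather than compared against the text above.

```python
import dis


class Foo:
    def __init__(self):
        self.list = []

    def func(self, kwargs=None, result_callback=None):
        self.list.append((kwargs or {}, result_callback))


# Collect the opcode names for `Foo.func`; the exact names depend on the
# interpreter version, so they are only listed here.
opnames = [inst.opname for inst in dis.get_instructions(Foo.func)]
print(opnames)

# Full human-readable listing, in the same layout as the quoted docstring.
dis.dis(Foo.func)
```

The `kwargs or {}` expression is what introduces the conditional jump between loading the method and calling it, which is the part the bytecode reconstructor has to traverse.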
@@ -0,0 +1,25 @@
def func(func_arg):
    print("func")

    def func2():
        print("func2")
        return func_arg()

    func2()


def nop():
    print("nop")
    pass


func(nop)


"""
Needs handling of LOAD_DEREF. Disassembled bytecode looks like:

  6           8 LOAD_DEREF               0 (func_arg)
             10 CALL_FUNCTION            0
             12 RETURN_VALUE
"""
@@ -0,0 +1,10 @@
def foo():
    print('foo')

def bar():
    print('bar')

foo()
bar()

foo(); bar()
@@ -0,0 +1,5 @@
import sys

print("will exit now")

sys.exit()