mirror of https://github.com/github/codeql.git synced 2026-06-03 04:40:14 +02:00


Recorded Call Graph Metrics

Also known as call graph tracing.

Execute a Python program and, for each call made, record the call and its callee. This allows us to compare the call graph resolved by static analysis with actual runtime data, that is, to check whether we can statically determine the correct target of each call that actually happens.
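The core idea can be sketched with `sys.setprofile`, the hook this tool uses to record calls. This is a hypothetical illustration, not the cg-trace implementation. The names `trace_calls` and `CallRecord` and the (filename, line, callee name) record format are made up here, and the sketch only captures Python-level calls.

```python
import sys
from typing import Callable, List, Tuple

# Hypothetical record format: (call-site filename, call-site line, callee name).
CallRecord = Tuple[str, int, str]


def trace_calls(func: Callable[[], object]) -> List[CallRecord]:
    """Run func() and record every Python-level call made while it runs."""
    records: List[CallRecord] = []

    def profiler(frame, event, arg):
        # A "call" event fires when a Python function starts executing;
        # frame is the callee's frame and frame.f_back is the caller's.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back
            records.append(
                (caller.f_code.co_filename, caller.f_lineno, frame.f_code.co_name)
            )

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)
    return records


def helper(x):
    return x + 1


def main():
    return helper(1) + helper(2)


records = trace_calls(main)
for record in records:
    print(record)
```

Each record pairs a call site in the caller with the function that was entered. Data of this kind could then be compared against the targets resolved statically.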

Using the call graph tracer incurs a heavy performance cost: expect the program to take about 10x longer to execute.
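Part of this cost comes from the profile function being invoked for every call and return event. One rough way to see this is to time a call-heavy function with and without a profiler that does nothing. In this sketch, `fib`, `noop_profiler` and `overhead_ratio` are illustrative names. The measured ratio depends heavily on the machine and Python version. A tracer that actually records data does more work per event, so its slowdown will be larger.

```python
import sys
import time


def fib(n: int) -> int:
    # Deliberately call-heavy: fib(20) makes tens of thousands of calls.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def noop_profiler(frame, event, arg):
    # Does nothing, but is still invoked for every call and return event.
    pass


def timed(func, *args) -> float:
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def overhead_ratio(n: int = 20) -> float:
    """Runtime with a no-op profiler divided by runtime without one."""
    baseline = timed(fib, n)
    sys.setprofile(noop_profiler)
    try:
        profiled = timed(fib, n)
    finally:
        sys.setprofile(None)
    return profiled / baseline


print(f"no-op profiler slowdown: {overhead_ratio():.1f}x")
```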

The number of recorded calls varies slightly from run to run. I have not been able to pinpoint why.

Running against real projects

Currently, it's possible to gather metrics from traced runs of the standard test suites of a few projects (defined in projects.json): youtube-dl, wcwidth, and flask.

To run against all projects, use

$ ./helper.sh all $(./helper.sh projects)

To view the results, use

$ head -n 100 projects/*/Metrics.txt

Expanding set of projects

It should be fairly straightforward to expand the set of projects. Most projects use tox to run their tests against multiple Python versions. I didn't look into integrating with tox. Instead, I manually picked out the instructions required to get going.

As an example, compare flask's tox.ini file with its entry in projects.json:

    "flask": {
        "repo": "https://github.com/pallets/flask.git",
        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
        "module_command": "pytest -c /dev/null tests examples",
        "setup": [
            "pip install -r requirements/tests.txt",
            "pip install -q -e examples/tutorial[test]",
            "pip install -q -e examples/javascript[test]"
        ]
    }
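To make the shape of an entry concrete, here is a hypothetical helper that parses projects.json content into typed records. `Project` and `parse_projects` are not part of the tool. It assumes the file is a single JSON object keyed by project name, as the flask fragment suggests. The field descriptions are inferred from that entry. Treating `setup` as optional is also an assumption.

```python
import json
from typing import Dict, List, NamedTuple


class Project(NamedTuple):
    repo: str              # git repository URL
    sha: str               # pinned commit to test against
    module_command: str    # command that runs the test suite
    setup: List[str]       # shell commands that install test dependencies


def parse_projects(text: str) -> Dict[str, Project]:
    """Parse projects.json content (assumed: an object keyed by project name)."""
    projects: Dict[str, Project] = {}
    for name, entry in json.loads(text).items():
        missing = {"repo", "sha", "module_command"} - set(entry)
        if missing:
            raise ValueError(f"{name}: missing keys {sorted(missing)}")
        projects[name] = Project(
            repo=entry["repo"],
            sha=entry["sha"],
            module_command=entry["module_command"],
            setup=list(entry.get("setup", [])),  # assumption: setup may be omitted
        )
    return projects


example = """
{
    "flask": {
        "repo": "https://github.com/pallets/flask.git",
        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
        "module_command": "pytest -c /dev/null tests examples",
        "setup": ["pip install -r requirements/tests.txt"]
    }
}
"""
flask = parse_projects(example)["flask"]
print(flask.module_command)  # pytest -c /dev/null tests examples
```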

Local development

Setup

Ensure you have at least Python 3.7.
Create a virtual environment with `python3 -m venv venv` and activate it.
Install dependencies with `pip install --upgrade -r requirements.txt`.
Install this codebase as an editable package with `pip install -e .`.
Set up your editor. If you're using VS Code, create a new project for this folder and use these settings for correct autoformatting of code on save:

{
    "python.pythonPath": "venv/bin/python",
    "python.linting.enabled": true,
    "python.linting.flake8Enabled": true,
    "python.formatting.provider": "black",
    "editor.formatOnSave": true,
    "[python]": {
        "editor.codeActionsOnSave": {
            "source.organizeImports": true
        }
    },
    "python.autoComplete.extraPaths": [
        "src"
    ]
}

Enjoy writing code and being able to run `cg-trace` on your command line 🎉
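To confirm that the interpreter and virtual environment steps took effect, a check like the following can be run with the interpreter you intend to use. `check_environment` is a hypothetical helper. It relies only on the standard `sys.version_info`, `sys.prefix` and `sys.base_prefix` attributes.

```python
import sys
from typing import List


def check_environment() -> List[str]:
    """Return problems with the current interpreter setup (empty list if none)."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or newer is required")
    # When running a venv's interpreter, sys.prefix points at the venv
    # directory, while sys.base_prefix still points at the base installation.
    if sys.prefix == sys.base_prefix:
        problems.append("not running inside a virtual environment")
    return problems


for problem in check_environment():
    print("setup problem:", problem)
```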

Using it

After following the setup instructions above, you should be able to reproduce the example trace by running

cg-trace --xml example/simple.xml example/simple.py

You can also run traces for all tests and build a database by running tests/create-test-db.sh. Then run the queries inside the ql/ directory.

Tracing Limitations

Multi-threading

Tracing multi-threaded programs should be possible using `threading.setprofile`, but this hasn't been implemented yet.
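Here is a minimal sketch of how `threading.setprofile` could be combined with `sys.setprofile` so that calls made in new threads are also seen. This is not what cg-trace currently does. The names `profiler`, `work` and `run_traced` are hypothetical. `threading.setprofile` only affects threads started through the `threading` module after it is called, so the current thread still needs `sys.setprofile`.

```python
import sys
import threading
from typing import List, Tuple

# (thread name, name of the Python function that was called)
events: List[Tuple[str, str]] = []
events_lock = threading.Lock()


def profiler(frame, event, arg):
    if event == "call":
        with events_lock:
            events.append((threading.current_thread().name, frame.f_code.co_name))


def work():
    return sum(range(10))


def run_traced():
    threading.setprofile(profiler)  # installed in threads started after this call
    sys.setprofile(profiler)        # installed in the current thread only
    try:
        worker = threading.Thread(target=work, name="worker")
        worker.start()
        worker.join()
    finally:
        sys.setprofile(None)
        threading.setprofile(None)


run_traced()
print(("worker", "work") in events)  # True
```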

Code that uses `sys.setprofile`

Since `sys.setprofile` is our mechanism for recording calls, any code that itself uses `sys.setprofile` will not work together with the call graph tracer.
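The conflict arises because each thread has a single profile function, so whichever code calls `sys.setprofile` last wins. The sketch below shows the tracer's hook being silently replaced, with `sys.getprofile` reporting which function is currently installed. The functions `tracer_profile` and `program_profile` are hypothetical stand-ins.

```python
import sys


def tracer_profile(frame, event, arg):
    """Stand-in for the call graph tracer's profile function (hypothetical)."""


def program_profile(frame, event, arg):
    """Profile function installed by the code being traced (hypothetical)."""


def active_profiler_after_program_runs():
    sys.setprofile(tracer_profile)    # the tracer starts recording calls
    sys.setprofile(program_profile)   # the traced code installs its own hook
    try:
        # Only one profile function exists per thread, so ours is gone.
        return sys.getprofile()
    finally:
        sys.setprofile(None)


print(active_profiler_after_program_runs() is program_profile)  # True
```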

Class instantiation

Class instantiation does not always fire an event in the `sys.setprofile` function (nor in `sys.settrace`), so it is not always recorded. For example:

r = range(10)

When disassembled with `python -m dis <file>`, this clearly shows a call instruction:

  9          48 LOAD_NAME                7 (range)
             50 LOAD_CONST               5 (10)
             52 CALL_FUNCTION            1
             54 STORE_NAME               8 (r)

yet no profiling event is fired for it 😞
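A small experiment reproduces this behaviour, as observed in CPython (details may vary between versions). Calling a builtin function such as `len` produces a `c_call` event whose `arg` is the function. Instantiating the builtin class `range` produces no event at all. A class with a Python-level `__init__` is different: it is still visible indirectly through the `call` event for `__init__`. `record_c_calls` and `workload` are hypothetical helpers written for this illustration.

```python
import sys
from typing import Callable, List


def record_c_calls(func: Callable[[], object]) -> List[object]:
    """Run func() under sys.setprofile and collect the callables from c_call events."""
    seen: List[object] = []

    def profiler(frame, event, arg):
        # For "c_call" events, arg is the C function object about to be called.
        if event == "c_call":
            seen.append(arg)

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)
    return seen


def workload():
    n = len([1, 2, 3])  # builtin function call: reported as a c_call event
    r = range(10)       # builtin class instantiation: no event
    return n, r


calls = record_c_calls(workload)
print(len in calls, any(c is range for c in calls))  # True False
```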
