Python: Copy Python extractor to codeql repo

Taus
2024-02-28 15:15:21 +00:00
parent 297a17975d
commit 6dec323cfc
369 changed files with 165346 additions and 0 deletions


@@ -0,0 +1,49 @@
load("//:dist.bzl", "pack_zip")
py_binary(
name = "make-zips-py",
srcs = [
"make_zips.py",
"python_tracer.py",
"unparse.py",
],
data = [
"LICENSE-PSF.md",
"__main__.py",
"imp.py",
] + glob([
"blib2to3/**",
"buildtools/**",
"lark/**",
"semmle/**",
]),
# On @criemen's machine, without this, make-zips.py can't find its imports from
# python_tracer. The problem didn't show for some reason on Windows CI machines, though.
imports = ["."],
main = "make_zips.py",
)
genrule(
name = "python3src",
outs = [
"python3src.zip",
],
cmd = "PYTHON_INSTALLER_OUTPUT=\"$(RULEDIR)\" $(location :make-zips-py)",
tools = [":make-zips-py"],
)
pack_zip(
name = "extractor-python",
srcs = [
"LICENSE-PSF.md", # because we distribute imp.py
"convert_setup.py",
"get_venv_lib.py",
"imp.py",
"index.py",
"python_tracer.py",
"setup.py",
":python3src",
] + glob(["data/**"]),
prefix = "tools",
visibility = ["//visibility:public"],
)


@@ -0,0 +1,257 @@
Parts of the Python extractor are derived from code in the CPython
distribution. Its license is reproduced below.
A. HISTORY OF THE SOFTWARE
==========================
Python was created in the early 1990s by Guido van Rossum at Stichting
Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands
as a successor of a language called ABC. Guido remains Python's
principal author, although it includes many contributions from others.
In 1995, Guido continued his work on Python at the Corporation for
National Research Initiatives (CNRI, see http://www.cnri.reston.va.us)
in Reston, Virginia where he released several versions of the
software.
In May 2000, Guido and the Python core development team moved to
BeOpen.com to form the BeOpen PythonLabs team. In October of the same
year, the PythonLabs team moved to Digital Creations, which became
Zope Corporation. In 2001, the Python Software Foundation (PSF, see
https://www.python.org/psf/) was formed, a non-profit organization
created specifically to own Python-related Intellectual Property.
Zope Corporation was a sponsoring member of the PSF.
All Python releases are Open Source (see http://www.opensource.org for
the Open Source Definition). Historically, most, but not all, Python
releases have also been GPL-compatible; the table below summarizes
the various releases.
    Release         Derived     Year        Owner       GPL-
                    from                                compatible? (1)

    0.9.0 thru 1.2              1991-1995   CWI         yes
    1.3 thru 1.5.2  1.2         1995-1999   CNRI        yes
    1.6             1.5.2       2000        CNRI        no
    2.0             1.6         2000        BeOpen.com  no
    1.6.1           1.6         2001        CNRI        yes (2)
    2.1             2.0+1.6.1   2001        PSF         no
    2.0.1           2.0+1.6.1   2001        PSF         yes
    2.1.1           2.1+2.0.1   2001        PSF         yes
    2.1.2           2.1.1       2002        PSF         yes
    2.1.3           2.1.2       2002        PSF         yes
    2.2 and above   2.1.1       2001-now    PSF         yes
Footnotes:
(1) GPL-compatible doesn't mean that we're distributing Python under
the GPL. All Python licenses, unlike the GPL, let you distribute
a modified version without making your changes open source. The
GPL-compatible licenses make it possible to combine Python with
other software that is released under the GPL; the others don't.
(2) According to Richard Stallman, 1.6.1 is not GPL-compatible,
because its license has a choice of law clause. According to
CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1
is "not incompatible" with the GPL.
Thanks to the many outside volunteers who have worked under Guido's
direction to make these releases possible.
B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
===============================================================
PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
--------------------------------------------
1. This LICENSE AGREEMENT is between the Python Software Foundation
("PSF"), and the Individual or Organization ("Licensee") accessing and
otherwise using this software ("Python") in source or binary form and
its associated documentation.
2. Subject to the terms and conditions of this License Agreement, PSF hereby
grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce,
analyze, test, perform and/or display publicly, prepare derivative works,
distribute, and otherwise use Python alone or in any derivative version,
provided, however, that PSF's License Agreement and PSF's notice of copyright,
i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019 Python Software Foundation;
All Rights Reserved" are retained in Python alone or in any derivative version
prepared by Licensee.
3. In the event Licensee prepares a derivative work that is based on
or incorporates Python or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python.
4. PSF is making Python available to Licensee on an "AS IS"
basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.
5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.
7. Nothing in this License Agreement shall be deemed to create any
relationship of agency, partnership, or joint venture between PSF and
Licensee. This License Agreement does not grant permission to use PSF
trademarks or trade name in a trademark sense to endorse or promote
products or services of Licensee, or any third party.
8. By copying, installing or otherwise using Python, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.
BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0
-------------------------------------------
BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1
1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an
office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the
Individual or Organization ("Licensee") accessing and otherwise using
this software in source or binary form and its associated
documentation ("the Software").
2. Subject to the terms and conditions of this BeOpen Python License
Agreement, BeOpen hereby grants Licensee a non-exclusive,
royalty-free, world-wide license to reproduce, analyze, test, perform
and/or display publicly, prepare derivative works, distribute, and
otherwise use the Software alone or in any derivative version,
provided, however, that the BeOpen Python License is retained in the
Software, alone or in any derivative version prepared by Licensee.
3. BeOpen is making the Software available to Licensee on an "AS IS"
basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.
4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE
SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS
AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY
DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
5. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.
6. This License Agreement shall be governed by and interpreted in all
respects by the law of the State of California, excluding conflict of
law provisions. Nothing in this License Agreement shall be deemed to
create any relationship of agency, partnership, or joint venture
between BeOpen and Licensee. This License Agreement does not grant
permission to use BeOpen trademarks or trade names in a trademark
sense to endorse or promote products or services of Licensee, or any
third party. As an exception, the "BeOpen Python" logos available at
http://www.pythonlabs.com/logos.html may be used according to the
permissions granted on that web page.
7. By copying, installing or otherwise using the software, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.
CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1
---------------------------------------
1. This LICENSE AGREEMENT is between the Corporation for National
Research Initiatives, having an office at 1895 Preston White Drive,
Reston, VA 20191 ("CNRI"), and the Individual or Organization
("Licensee") accessing and otherwise using Python 1.6.1 software in
source or binary form and its associated documentation.
2. Subject to the terms and conditions of this License Agreement, CNRI
hereby grants Licensee a nonexclusive, royalty-free, world-wide
license to reproduce, analyze, test, perform and/or display publicly,
prepare derivative works, distribute, and otherwise use Python 1.6.1
alone or in any derivative version, provided, however, that CNRI's
License Agreement and CNRI's notice of copyright, i.e., "Copyright (c)
1995-2001 Corporation for National Research Initiatives; All Rights
Reserved" are retained in Python 1.6.1 alone or in any derivative
version prepared by Licensee. Alternately, in lieu of CNRI's License
Agreement, Licensee may substitute the following text (omitting the
quotes): "Python 1.6.1 is made available subject to the terms and
conditions in CNRI's License Agreement. This Agreement together with
Python 1.6.1 may be located on the Internet using the following
unique, persistent identifier (known as a handle): 1895.22/1013. This
Agreement may also be obtained from a proxy server on the Internet
using the following URL: http://hdl.handle.net/1895.22/1013".
3. In the event Licensee prepares a derivative work that is based on
or incorporates Python 1.6.1 or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python 1.6.1.
4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS"
basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.
5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.
7. This License Agreement shall be governed by the federal
intellectual property law of the United States, including without
limitation the federal copyright law, and, to the extent such
U.S. federal law does not apply, by the law of the Commonwealth of
Virginia, excluding Virginia's conflict of law provisions.
Notwithstanding the foregoing, with regard to derivative works based
on Python 1.6.1 that incorporate non-separable material that was
previously distributed under the GNU General Public License (GPL), the
law of the Commonwealth of Virginia shall govern this License
Agreement only as to issues arising under or with respect to
Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this
License Agreement shall be deemed to create any relationship of
agency, partnership, or joint venture between CNRI and Licensee. This
License Agreement does not grant permission to use CNRI trademarks or
trade name in a trademark sense to endorse or promote products or
services of Licensee, or any third party.
8. By clicking on the "ACCEPT" button where indicated, or by copying,
installing or otherwise using Python 1.6.1, Licensee agrees to be
bound by the terms and conditions of this License Agreement.
ACCEPT
CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2
--------------------------------------------------
Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam,
The Netherlands. All rights reserved.
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Stichting Mathematisch
Centrum or CWI not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.
STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

python/extractor/Makefile

@@ -0,0 +1,61 @@
.PHONY: all
.DEFAULT: all
all:

OS = $(shell uname)
GIT_ROOT = $(shell git rev-parse --show-toplevel)

TOKENIZER_FILE = semmle/python/parser/tokenizer.py
TOKENIZER_DEPS = tokenizer_generator/state_transition.txt tokenizer_generator/tokenizer_template.py
# Must use the same Python version as on Jenkins, since the output differs per version.
# However, the output is unstable on Python 3.5 (which Jenkins uses).
TOKENIZER_CMD = python3 -m tokenizer_generator.gen_state_machine $(TOKENIZER_DEPS)

.PHONY: tokenizer
tokenizer: $(TOKENIZER_FILE)

$(TOKENIZER_FILE): $(TOKENIZER_DEPS)
	$(TOKENIZER_CMD) > $@

MASTER_FILE = semmle/python/master.py
DBSCHEME_FILE = $(GIT_ROOT)/ql/python/ql/lib/semmlecode.python.dbscheme

.PHONY: dbscheme
dbscheme: $(MASTER_FILE)
	python3 -m semmle.dbscheme_gen $(DBSCHEME_FILE)

AST_GENERATED_DIR = $(GIT_ROOT)/ql/python/ql/lib/semmle/python/
AST_GENERATED_FILE = $(AST_GENERATED_DIR)AstGenerated.qll

.PHONY: ast
ast: $(MASTER_FILE)
	python3 -m semmle.query_gen $(AST_GENERATED_DIR)
	$(GIT_ROOT)/target/intree/codeql/codeql query format --in-place $(AST_GENERATED_FILE)

################################################################################
# Tests
################################################################################

.PHONY: test-all
test-all: test-3

.PHONY: test-3
test-3: pytest-3 test-tokenizer

.PHONY: test-tokenizer
test-tokenizer: SHELL:=/bin/bash
test-tokenizer:
	@echo Not running test-tokenizer as Jenkins uses Python 3.5
# TODO: Enable again once we run Python > 3.5 on Jenkins
#	diff -u $(TOKENIZER_FILE) <($(TOKENIZER_CMD))

.PHONY: pytest-3
pytest-3:
	poetry run pytest

.PHONY: pytest-3-deprecation-error
pytest-3-deprecation-error:
	PYTHONWARNINGS='error::DeprecationWarning' poetry run pytest

python/extractor/README.md

@@ -0,0 +1,211 @@
# Python extraction
Python extraction happens in two phases:
1. [Setup](#1-Setup-Phase)
   - determine which Python version to analyze the project as
   - create a virtual environment (only LGTM.com)
   - determine the Python import path
   - invoke the actual Python extractor
2. [The actual Python extractor](#2-The-actual-Python-extractor)
   - walks files and folders, and performs extraction

The `pack_zip` rule `extractor-python` in the Bazel `build` file defines which files are included in a distribution and in the CodeQL CLI. After building the CodeQL CLI locally, the files are in `target/intree/codeql/python/tools`.
## Local development
This project uses
- [poetry](https://python-poetry.org/) as the package manager
- [tox](https://tox.wiki/en) together with [pytest](https://docs.pytest.org/en/) to run tests across multiple versions
You can install both tools with [`pipx`](https://pypa.github.io/pipx/), like so:
```sh
pipx install poetry
pipx inject poetry virtualenv-pyenv # to allow poetry to find python versions from pyenv
pipx install tox
pipx inject tox virtualenv-pyenv # to allow tox to find python versions from pyenv
```
Once you've installed poetry, you can do this:
```sh
# install required packages
$ poetry install
# to run tests against the Python version used by poetry
$ poetry run pytest
# or
$ poetry shell # activate poetry environment
$ pytest # so now pytest is available
# to run tests against all supported Python versions
$ tox
# to run against a specific version (Python 3.9)
$ tox -e py39
```
To install multiple python versions locally, we recommend you use [`pyenv`](https://github.com/pyenv/pyenv)
_(don't try to use `tox run-parallel`; our tests are not set up for this to work 😅)_
### Zip files
Currently we distribute our code in an obfuscated way, by packaging the code from the subfolders into a zip file that is imported at run time (by the Python files at the top level of this directory).
The one exception is the `data` directory (used for stubs), which is included directly in the `tools` folder.
The zip creation is managed by [`make_zips.py`](./make_zips.py). Currently we produce one zip file for Python 2 (which is byte-compiled) and one for Python 3 (which contains source files, stripped of comments and docstrings).
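For a rough idea of what the Python 3 stripping step involves, here is a minimal sketch (not the actual `make_zips.py` logic; the file names are made up, and `ast.unparse` requires Python 3.9+):
```python
import ast
import zipfile

def strip_source(source: str) -> str:
    """Drop docstrings by re-parsing and unparsing the module
    (comments are lost as a side effect of parsing)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                node.body = body[1:] or [ast.Pass()]
    return ast.unparse(tree)

with zipfile.ZipFile("python3src.zip", "w") as zf:
    with open("semmle/example.py") as f:  # hypothetical input file
        zf.writestr("semmle/example.py", strip_source(f.read()))
```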
### A note about Python versions
We expect to be able to run our tools (setup phase) with either Python 2 or Python 3, and after determining which version to analyze the code as, we run the extractor with that version. So we must support:

- Setup tools run using Python 2:
  - Extracting code using Python 2
  - Extracting code using Python 3
- Setup tools run using Python 3:
  - Extracting code using Python 2
  - Extracting code using Python 3
# 1. Setup phase
**For extraction with the CodeQL CLI locally** (`codeql database create --language python`)
- Runs [`language-packs/python/tools/autobuild.sh`](/language-packs/python/tools/autobuild.sh), which in turn runs [`index.py`](./index.py)
### Overview of control flow for [`setup.py`](./setup.py)
The representation of the code in the figure below has in some cases been altered slightly, but is accurate as of 2020-03-20.
<details open>
<!-- This file can be opened with diagrams.net directly -->
![python extraction overview](./docs/extractor-python-setup.svg)
</details>
### Overview of control flow for [`index.py`](./index.py)
The representation of the code in the figure below has in some cases been altered slightly, but is accurate as of 2020-03-20.
<details open>
<!-- This file can be opened with diagrams.net directly -->
![python extraction overview](./docs/extractor-python-index.svg)
</details>
# 2. The actual Python extractor
## Overview
The entrypoint of the actual Python extractor is [`python_tracer.py`](./python_tracer.py).
The usual way to invoke the extractor is to pass a directory of Python files to the launcher. The extractor extracts code from those files and their dependencies, producing TRAP files, and copies the source code to a source archive.
Alternatively, for highly distributed systems, it is possible to pass a single file per extractor invocation, invoking the extractor many times.
The extractor recognizes Python source code files and Thrift IDL files.
Other types of files can be added to the database by passing the `--filter` option to the extractor, but they will be stored as text blobs.
The extractor expects the `CODEQL_EXTRACTOR_PYTHON_TRAP_DIR` and
`CODEQL_EXTRACTOR_PYTHON_SOURCE_ARCHIVE_DIR` environment variables to be set (which determine,
respectively, where it puts TRAP files and the source archive). However, the location of the TRAP
folder and source archive can be specified on the command-line instead.
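For example, a direct invocation might look like the following (a sketch only; normally the CodeQL CLI sets these variables, and the paths and arguments here are illustrative):
```python
import os
import subprocess

env = dict(
    os.environ,
    CODEQL_EXTRACTOR_PYTHON_TRAP_DIR="/tmp/db/trap",
    CODEQL_EXTRACTOR_PYTHON_SOURCE_ARCHIVE_DIR="/tmp/db/src",
)
# Pass the directory of Python files to extract to the launcher.
subprocess.run(["python3", "python_tracer.py", "path/to/project"],
               env=env, check=True)
```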
The extractor outputs the following information as TRAP files:
- A file containing per-interpreter data, such as version information and the contents of the `builtins` module.
- One file per extractor process containing the file and folder information for all processed files and all enclosing folders.
- Per Python or template file:
- The AST.
- Scopes and variables, attached to the AST.
- The control-flow graph, selectively split when repeated tests are seen.
## How it works
### Overall Architecture
Once started, the extractor consists of three sets of communicating processes.
1. The front-end: a single process that walks the files and folders specified on the command line, enqueuing those files plus any additional modules requested by the extractor processes.
2. The extractors: typically one process per CPU. They take file and module descriptions from the queue, producing TRAP files and copies of the source.
3. The logging process: to avoid message interleaving and deadlock, all log messages are queued up and sent to a single logging process, which formats and prints them.
The front-end -> worker message queue has quite limited capacity (2 per process) to ensure rapid shutdown when interrupted. The capacity of the worker -> front-end message queue must be at least twice that size to prevent deadlock, and is in fact much larger to prevent workers from being blocked on the queue.
Experiments suggest that the extractor scales almost linearly to at least 20 processes (on Linux).
The component that walks the file system is known as the "traverser" and is designed to be pluggable.
Its interface is simply an iterable of file descriptions. See `semmle/traverser.py`.
### Lifetime of the extractor
1. Parse the command-line options and read environment variables.
2. The main process creates:
1. the logging queue and process,
2. the message queues, and
3. the extractor processes.
3. The main process, now the front-end, starts traversing the file system, by iterating over the traverser.
4. Until it has exhausted the traverser, it concurrently:
- Adds module descriptions from the traverser to the message queue
- Reads the reply queue and for any `"IMPORT"` message received adds the module to the message queue if that module has not been seen before.
5. Until a `"SUCCESS"` message has been received on the reply queue for each module description that has been enqueued:
- Reads the reply queue and adds those module descriptions it hasn't seen before to the message queue.
6. Add one `None` message to the message queue for each extractor.
7. Wait for all extractors to halt.
8. Stop the logging process and halt.
### Lifetime of an extractor process
1. Read messages from the message queue until a `None` message is received. For each message:
1. Parse the file or module.
2. Send an "IMPORT" message for all modules imported by the module being processed.
3. Write out TRAP and source archive for the file.
4. Send a "SUCCESS" message for the file.
2. Emit file and folder TRAP for all files and modules processed.
3. Halt.
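The following is a minimal, self-contained sketch of this protocol (all names are hypothetical; the real implementation in `semmle/` reads replies concurrently while traversing and differs in many details):
```python
import multiprocessing as mp

def parse_and_extract(module):
    """Hypothetical stand-in for per-module extraction (writes TRAP and
    the source archive); returns the modules imported by `module`."""
    return []

def worker(tasks, replies):
    while True:
        module = tasks.get()
        if module is None:                     # shutdown sentinel
            break
        for imported in parse_and_extract(module):
            replies.put(("IMPORT", imported))  # possibly new work
        replies.put(("SUCCESS", module))

def front_end(roots, n_workers=4):
    tasks = mp.Queue(maxsize=2 * n_workers)    # small: rapid shutdown
    replies = mp.Queue()                       # larger, to avoid deadlock
    procs = [mp.Process(target=worker, args=(tasks, replies))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    seen, pending = set(roots), 0
    for module in roots:                       # drain the traverser
        tasks.put(module)
        pending += 1
    while pending:                             # one SUCCESS per enqueued task
        kind, payload = replies.get()
        if kind == "IMPORT" and payload not in seen:
            seen.add(payload)
            tasks.put(payload)
            pending += 1
        elif kind == "SUCCESS":
            pending -= 1
    for _ in procs:                            # one sentinel per extractor
        tasks.put(None)
    for p in procs:
        p.join()

if __name__ == "__main__":
    front_end(["a.py", "b.py"])
```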
### TRAP caching
An important consequence of local extraction is that, except for the file path information, the contents of the TRAP file are functionally determined by:
- The contents of the file.
- Some command-line options (those determining name hashing and CFG splitting).
- The extractor version.
Caching of TRAP files can reduce the time to extract a large project with few changes by an order of magnitude.
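A sketch of what such a cache key could look like (hypothetical; the actual caching scheme is not part of this file):
```python
import hashlib

def trap_cache_key(file_bytes, relevant_options, extractor_version):
    """The TRAP output is (file paths aside) a pure function of these
    inputs, so together they can key a content-addressed TRAP cache."""
    h = hashlib.sha256()
    for part in (file_bytes,
                 relevant_options.encode("utf8"),
                 extractor_version.encode("utf8")):
        h.update(len(part).to_bytes(8, "big"))  # length-prefix each part
        h.update(part)
    return h.hexdigest()
```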
### Extraction
Each extractor process runs a loop which extracts files or modules from the queue, one at a time.
Each file or module description is passed, in turn, to one of the extractor objects, which will either extract it or reject it, leaving it for the next extractor object to try.
Currently the default extractors are:
- Builtin module extractor: Extracts built-in modules like `sys`.
- Thrift extractor: Extracts Thrift IDL files.
- Python extractor: Extracts Python source code files.
- Package extractor: Extracts minimal information for package folders.
- General file extractor: Any files rejected by the above passes are added to the database as a text blob.
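Schematically, this dispatch is a chain of responsibility, along these lines (the names and the rejection mechanism are made up for illustration):
```python
class RejectFile(Exception):
    """Raised by an extractor that does not handle a given file."""

def extract_one(description, extractors):
    for extractor in extractors:
        try:
            return extractor.extract(description)  # handled: TRAP written
        except RejectFile:
            continue  # leave the file for the next extractor to try
    # The general file extractor at the end of the chain accepts anything,
    # so in practice this point is never reached.
    raise AssertionError(description)
```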
#### Python extraction
The Python extractor is the most interesting of the extractors listed above.
The Python extractor takes a path to a Python file. It emits TRAP to the specified folder and a UTF-8 encoded version of the source to the source archive.
It consists of the following passes:
1. Ingestion and decoding: Read the contents of the file as bytes, determine its encoding, and decode it to text.
2. Tokenizing: Tokenize the source text, including whitespace and comment tokens.
3. Parsing: Create a concrete parse tree from the list of tokens.
4. Rewriting: Rewrite the concrete parse tree to an AST, annotated with scope, variable information, and locations.
5. Write out lexical and AST information as TRAP.
6. Generate and emit TRAP for control-flow graphs. This is done one scope at a time to minimize memory consumption.
7. Emit ancillary information, like TRAP for comments.
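Pass 1 can be illustrated with the standard library, which implements the PEP 263 encoding detection this step needs (a sketch, not the extractor's own code):
```python
import tokenize

def read_source(path):
    """Ingestion and decoding: sniff the declared encoding from the
    first two lines, then decode the whole file with it."""
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
        f.seek(0)
        return f.read().decode(encoding)
```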
#### Template file extraction
Most Python template languages work either by translating the template into Python or by fairly closely mimicking the behavior of Python. This means that we can extract template files by converting them to the same AST used internally by the Python extractor, and then passing that AST to the backend of the Python extractor to determine imports and generate TRAP files, including control-flow information.


@@ -0,0 +1,4 @@
import semmle.populator

if __name__ == "__main__":
    semmle.populator.main()


@@ -0,0 +1,224 @@
# Grammar for 2to3. This grammar supports Python 2.x and 3.x.
# NOTE WELL: You should also follow all the steps listed at
# https://devguide.python.org/grammar/
# Start symbols for the grammar:
# file_input is a module or sequence of commands read from an input file;
# single_input is a single interactive statement;
# eval_input is the input for the eval() and input() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
file_input: (NEWLINE | stmt)* ENDMARKER
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' namedexpr_test NEWLINE
decorators: decorator+
decorated: decorators (classdef | funcdef | async_funcdef)
async_funcdef: 'async' funcdef
funcdef: 'def' NAME parameters ['->' test] ':' suite
parameters: '(' [typedargslist] ')'
# The following definition for typedarglist is equivalent to this set of rules:
#
# arguments = argument (',' argument)*
# argument = tfpdef ['=' test]
# kwargs = '**' tname [',']
# args = '*' [tname]
# kwonly_kwargs = (',' argument)* [',' [kwargs]]
# args_kwonly_kwargs = args kwonly_kwargs | kwargs
# poskeyword_args_kwonly_kwargs = arguments [',' [args_kwonly_kwargs]]
# typedargslist_no_posonly = poskeyword_args_kwonly_kwargs | args_kwonly_kwargs
# typedarglist = arguments ',' '/' [',' [typedargslist_no_posonly]])|(typedargslist_no_posonly)"
#
# It needs to be fully expanded to allow our LL(1) parser to work on it.
typedargslist: tfpdef ['=' test] (',' tfpdef ['=' test])* ',' '/' [
',' [((tfpdef ['=' test] ',')* ('*' [tname] (',' tname ['=' test])*
[',' ['**' tname [',']]] | '**' tname [','])
| tfpdef ['=' test] (',' tfpdef ['=' test])* [','])]
] | ((tfpdef ['=' test] ',')* ('*' [tname] (',' tname ['=' test])*
[',' ['**' tname [',']]] | '**' tname [','])
| tfpdef ['=' test] (',' tfpdef ['=' test])* [','])
tname: NAME [':' test]
tfpdef: tname | '(' tfplist ')'
tfplist: tfpdef (',' tfpdef)* [',']
# The following definition for varargslist is equivalent to this set of rules:
#
# arguments = argument (',' argument )*
# argument = vfpdef ['=' test]
# kwargs = '**' vname [',']
# args = '*' [vname]
# kwonly_kwargs = (',' argument )* [',' [kwargs]]
# args_kwonly_kwargs = args kwonly_kwargs | kwargs
# poskeyword_args_kwonly_kwargs = arguments [',' [args_kwonly_kwargs]]
# vararglist_no_posonly = poskeyword_args_kwonly_kwargs | args_kwonly_kwargs
# varargslist = arguments ',' '/' [','[(vararglist_no_posonly)]] | (vararglist_no_posonly)
#
# It needs to be fully expanded to allow our LL(1) parser to work on it.
varargslist: vfpdef ['=' test ](',' vfpdef ['=' test])* ',' '/' [',' [
((vfpdef ['=' test] ',')* ('*' [vname] (',' vname ['=' test])*
[',' ['**' vname [',']]] | '**' vname [','])
| vfpdef ['=' test] (',' vfpdef ['=' test])* [','])
]] | ((vfpdef ['=' test] ',')*
('*' [vname] (',' vname ['=' test])* [',' ['**' vname [',']]]| '**' vname [','])
| vfpdef ['=' test] (',' vfpdef ['=' test])* [','])
vname: NAME
vfpdef: vname | '(' vfplist ')'
vfplist: vfpdef (',' vfpdef)* [',']
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | exec_stmt | assert_stmt)
expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*)
annassign: ':' test ['=' test]
testlist_star_expr: (test|star_expr) (',' (test|star_expr))* [',']
augassign: ('+=' | '-=' | '*=' | '@=' | '/=' | '%=' | '&=' | '|=' | '^=' |
'<<=' | '>>=' | '**=' | '//=')
# For normal and annotated assignments, additional restrictions enforced by the interpreter
print_stmt: 'print' ( [ test (',' test)* [','] ] |
'>>' test [ (',' test)+ [','] ] )
del_stmt: 'del' del_list
del_list: (expr|star_expr) (',' (expr|star_expr))* [',']
pass_stmt: 'pass'
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
break_stmt: 'break'
continue_stmt: 'continue'
return_stmt: 'return' [testlist_star_expr]
yield_stmt: yield_expr
raise_stmt: 'raise' [test ['from' test | ',' test [',' test]]]
import_stmt: import_name | import_from
import_name: 'import' dotted_as_names
import_from: ('from' ('.'* dotted_name | '.'+)
'import' ('*' | '(' import_as_names ')' | import_as_names))
import_as_name: NAME ['as' NAME]
dotted_as_name: dotted_name ['as' NAME]
import_as_names: import_as_name (',' import_as_name)* [',']
dotted_as_names: dotted_as_name (',' dotted_as_name)*
dotted_name: NAME ('.' NAME)*
global_stmt: ('global' | 'nonlocal') NAME (',' NAME)*
exec_stmt: 'exec' expr ['in' test [',' test]]
assert_stmt: 'assert' test [',' test]
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated | async_stmt
async_stmt: 'async' (funcdef | with_stmt | for_stmt)
if_stmt: 'if' namedexpr_test ':' suite ('elif' namedexpr_test ':' suite)* ['else' ':' suite]
while_stmt: 'while' namedexpr_test ':' suite ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
try_stmt: ('try' ':' suite
((except_clause ':' suite)+
['else' ':' suite]
['finally' ':' suite] |
'finally' ':' suite))
with_stmt: 'with' with_item (',' with_item)* ':' suite
with_item: test ['as' expr]
with_var: 'as' expr
# NB compile.c makes sure that the default except clause is last
except_clause: 'except' [test [(',' | 'as') test]]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
# Backward compatibility cruft to support:
# [ x for x in lambda: True, lambda: False if x() ]
# even while also allowing:
# lambda x: 5 if x else 2
# (But not a mix of the two)
testlist_safe: old_test [(',' old_test)+ [',']]
old_test: or_test | old_lambdef
old_lambdef: 'lambda' [varargslist] ':' old_test
namedexpr_test: test [':=' test]
test: or_test ['if' or_test 'else' test] | lambdef
or_test: and_test ('or' and_test)*
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
star_expr: '*' expr
expr: xor_expr ('|' xor_expr)*
xor_expr: and_expr ('^' and_expr)*
and_expr: shift_expr ('&' shift_expr)*
shift_expr: arith_expr (('<<'|'>>') arith_expr)*
arith_expr: term (('+'|'-') term)*
term: factor (('*'|'@'|'/'|'%'|'//') factor)*
factor: ('+'|'-'|'~') factor | power
power: ['await'] atom trailer* ['**' factor]
atom: ('(' [yield_expr|testlist_gexp] ')' |
'[' [listmaker] ']' |
'{' [dictsetmaker] '}' |
'`' testlist1 '`' |
NAME | NUMBER | string | '.' '.' '.' | special_operation
)
string: (fstring_part | STRING)+
fstring_part: FSTRING_START testlist ['='] [ CONVERSION ] [ format_specifier ] (FSTRING_MID testlist ['='] [ CONVERSION ] [ format_specifier ] )* FSTRING_END
format_specifier: ':' (FSTRING_SPEC test [ CONVERSION ] [ format_specifier ] )* FSTRING_SPEC
listmaker: (namedexpr_test|star_expr) ( old_comp_for | (',' (namedexpr_test|star_expr))* [','] )
testlist_gexp: (namedexpr_test|star_expr) ( old_comp_for | (',' (namedexpr_test|star_expr))* [','] )
lambdef: 'lambda' [varargslist] ':' test
trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
subscriptlist: subscript (',' subscript)* [',']
subscript: test | [test] ':' [test] [ ':' [test] ]
exprlist: (expr|star_expr) (',' (expr|star_expr))* [',']
testlist: test (',' test)* [',']
dictsetmaker: ( ((test ':' test | '**' expr)
(comp_for | (',' (test ':' test | '**' expr))* [','])) |
((test | star_expr)
(comp_for | (',' (test | star_expr))* [','])) )
classdef: 'class' NAME ['(' [arglist] ')'] ':' suite
arglist: argument (',' argument)* [',']
# "test '=' test" is really "keyword '=' test", but we have no such token.
# These need to be in a single rule to avoid grammar that is ambiguous
# to our LL(1) parser. Even though 'test' includes '*expr' in star_expr,
# we explicitly match '*' here, too, to give it proper precedence.
# Illegal combinations and orderings are blocked in ast.c:
# multiple (test comp_for) arguments are blocked; keyword unpackings
# that precede iterable unpackings are blocked; etc.
argument: ( test [comp_for] |
test ':=' test |
test '=' test |
'**' test |
'*' test )
comp_iter: comp_for | comp_if
comp_for: ['async'] 'for' exprlist 'in' or_test [comp_iter]
comp_if: 'if' old_test [comp_iter]
# As noted above, testlist_safe extends the syntax allowed in list
# comprehensions and generators. We can't use it indiscriminately in all
# derivations using a comp_for-like pattern because the testlist_safe derivation
# contains comma which clashes with trailing comma in arglist.
#
# This was an issue because the parser would not follow the correct derivation
# when parsing syntactically valid Python code. Since testlist_safe was created
# specifically to handle list comprehensions and generator expressions enclosed
# with parentheses, it's safe to only use it in those. That avoids the issue; we
# can parse code like set(x for x in [],).
#
# The syntax supported by this set of rules is not a valid Python 3 syntax,
# hence the prefix "old".
#
# See https://bugs.python.org/issue27494
old_comp_iter: old_comp_for | old_comp_if
old_comp_for: ['async'] 'for' exprlist 'in' testlist_safe [old_comp_iter]
old_comp_if: 'if' old_test [old_comp_iter]
testlist1: test (',' test)*
# not used in grammar, but may appear in "node" passed from Parser to Compiler
encoding_decl: NAME
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist_star_expr
special_operation: DOLLARNAME '(' [testlist] ')'

@@ -0,0 +1,254 @@
(PSF license text; a near-verbatim duplicate of the LICENSE-PSF.md reproduced above.)


@@ -0,0 +1,20 @@
This code is derived from the black code formatter,
which itself was derived from the lib2to3 package in the Python standard library.
We (Semmle) have modified this further to ease conversion to our multi-version AST.
Original README from black:
A subset of lib2to3 taken from Python 3.7.0b2.
Commit hash: 9c17e3a1987004b8bcfbe423953aad84493a7984
Reasons for forking:
- consistent handling of f-strings for users of Python < 3.6.2
- backport of BPO-33064 that fixes parsing files with trailing commas after
*args and **kwargs
- backport of GH-6143 that restores the ability to reformat legacy usage of
`async`
- support all types of string literals
- better ability to debug (better reprs)
- INDENT and DEDENT don't hold whitespace and comment prefixes
- ability to Cythonize


@@ -0,0 +1,67 @@
# Building Concrete Parse Trees using the Python grammar
This grammar mostly reuses existing code:
- `lib2to3` is a part of the `2to3` utility (included in the CPython
distribution) aimed at automatically converting Python 2 code to equivalent
Python 3 code. Because it needs to be idempotent when applied to Python 3
code, this grammar must be capable of parsing both Python 2 and 3 (with
certain restrictions).
- `blib2to3` is part of the `black` formatter for Python. It adds a few
extensions on top of `lib2to3`.
- Finally, we extend this grammar even further, in order to support things like
f-strings even when the extractor is run using Python 2. (In this respect,
`blib2to3` "cheats" by requiring Python 3 if you want to parse Python 3 code.
We do not have this luxury.)
The grammar of Python is described in `Grammar.txt` in the style of an EBNF:
- Rules have the form `nonterminal_name: production` (where traditionally, one
would use `::=` instead of `:`)
- Productions can contain
- Literal strings, enclosed in single quotes.
- Alternation, indicated by an infix `|`.
- Repetition, indicated by a postfixed `*` for "zero or more" and `+` for
"one or more".
- Optional parts, indicated by these being surrounded by square brackets.
- Parentheses to indicate grouping, and to allow productions to span several lines.
>Note: You may wonder: How is `Grammar.txt` parsed? The answer to this is that
>it is used to parse itself. In particular, it uses the same tokenizer as that
>for Python, and hence every symbol appearing in the grammar must be a valid
>Python token. This is why rules use `:` instead of `::=`. This also explains
>why parentheses must be used when a production spans multiple lines, as the
>presence of parentheses affects the tokenization.
The concrete parse tree built based on these rules has a simple form: Each node
has a `name` attribute, equal to that of the corresponding nonterminal, and a
`children` attribute, which contains a list of all of the children of the node.
These come directly from the production on the right hand side of the rule for
the given nonterminal. Thus, something like
```
testlist: test (',' test)* [',']
```
will result in a node with name `testlist`, and its attribute `children` will be
a list where the first element is a `test` node, the second (if any) is a node
for `','`, etc. Note in particular that _every_ part of the production is
included in the children, even parts that are just static tokens.
The leaves of the concrete parse tree (corresponding to the terminals of the
grammar) will have an associated `value` attribute. This contains the underlying
string for this token (in particular, for a `NAME` token, its value will be the
underlying identifier).
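For example, parsing `a, b` against the `testlist` rule could yield a tree shaped like this (a schematic illustration; the class names are made up):
```python
class Node:
    def __init__(self, name, children):
        self.name, self.children = name, children

class Leaf:
    def __init__(self, name, value):
        self.name, self.value = name, value

# testlist: test (',' test)* [','] applied to "a, b"
tree = Node("testlist", [
    Leaf("NAME", "a"),   # first `test`, collapsed to its single token
    Leaf("COMMA", ","),  # the static ',' token is a child too
    Leaf("NAME", "b"),   # second `test`
])
```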
## From Concrete to Abstract
To turn the concrete parse tree into an abstract parse tree, we _walk_ the tree
using the visitor pattern. Thus, for every nonterminal (e.g. `testlist`) we have
a method (in this case `visit_testlist`) that takes care of visiting nodes of
this type in the concrete parse tree. In doing so, we build up the abstract
parse tree, eliding any nodes that are not relevant in terms of the abstract
syntax.
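Schematically, the visitor dispatch looks like this (reusing the node shape sketched above; all names are illustrative rather than the extractor's actual ones):
```python
class AstBuilder:
    def visit(self, node):
        # Dispatch on the node's name: visit_testlist, visit_NAME, ...
        method = getattr(self, "visit_" + node.name, self.generic_visit)
        return method(node)

    def visit_testlist(self, node):
        # Elide the static ',' tokens; keep only the `test` children.
        return ("Tuple", [self.visit(c) for c in node.children
                          if c.name != "COMMA"])

    def visit_NAME(self, leaf):
        return ("Name", leaf.value)

    def generic_visit(self, node):
        return [self.visit(c) for c in getattr(node, "children", [])]
```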
>TO DO:
>- Why we parse everything four times (`async` et al.)


@@ -0,0 +1 @@
#empty


@@ -0,0 +1,37 @@
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# Modifications:
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Parser driver.
This provides a high-level interface to parse a file into a syntax tree.
"""
__author__ = "Guido van Rossum <guido@python.org>"
__all__ = ["load_grammar"]
# Python imports
import os
import logging
import pkgutil
import sys
# Pgen imports
from . import grammar, pgen
if sys.version < "3":
from cStringIO import StringIO
else:
from io import StringIO
def load_grammar(package, grammar):
"""Load the grammar (maybe from a pickle)."""
data = pkgutil.get_data(package, grammar)
stream = StringIO(data.decode("utf8"))
g = pgen.generate_grammar(grammar, stream)
return g
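
# Hypothetical usage sketch (the package and resource names below are
# assumptions for illustration):
#
#     g = load_grammar("blib2to3", "Grammar.txt")
#     # `g` then supplies the parse tables consumed by parse.Parser.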


@@ -0,0 +1,188 @@
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""This module defines the data structures used to represent a grammar.
These are a bit arcane because they are derived from the data
structures used by Python's 'pgen' parser generator.
There's also a table here mapping operators to their names in the
token module; the Python tokenize module reports all operators as the
fallback token code OP, but the parser needs the actual token code.
"""
# Python imports
import pickle
# Local imports
from . import token
class Grammar(object):
"""Pgen parsing tables conversion class.
Once initialized, this class supplies the grammar tables for the
parsing engine implemented by parse.py. The parsing engine
accesses the instance variables directly. The class here does not
provide initialization of the tables; several subclasses exist to
do this (see the conv and pgen modules).
The load() method reads the tables from a pickle file, which is
much faster than the other ways offered by subclasses. The pickle
file is written by calling dump() (after loading the grammar
tables using a subclass). The report() method prints a readable
representation of the tables to stdout, for debugging.
The instance variables are as follows:
symbol2number -- a dict mapping symbol names to numbers. Symbol
numbers are always 256 or higher, to distinguish
them from token numbers, which are between 0 and
255 (inclusive).
number2symbol -- a dict mapping numbers to symbol names;
these two are each other's inverse.
states -- a list of DFAs, where each DFA is a list of
states, each state is a list of arcs, and each
arc is a (i, j) pair where i is a label and j is
a state number. The DFA number is the index into
this list. (This name is slightly confusing.)
Final states are represented by a special arc of
the form (0, j) where j is its own state number.
dfas -- a dict mapping symbol numbers to (DFA, first)
pairs, where DFA is an item from the states list
above, and first is a set of tokens that can
begin this grammar rule (represented by a dict
whose values are always 1).
labels -- a list of (x, y) pairs where x is either a token
number or a symbol number, and y is either None
or a string; the strings are keywords. The label
number is the index in this list; label numbers
are used to mark state transitions (arcs) in the
DFAs.
start -- the number of the grammar's start symbol.
keywords -- a dict mapping keyword strings to arc labels.
tokens -- a dict mapping token numbers to arc labels.
"""
def __init__(self):
self.symbol2number = {}
self.number2symbol = {}
self.states = []
self.dfas = {}
self.labels = [(0, "EMPTY")]
self.keywords = {}
self.tokens = {}
self.symbol2label = {}
self.start = 256
def dump(self, filename):
"""Dump the grammar tables to a pickle file."""
with open(filename, "wb") as f:
pickle.dump(self.__dict__, f, pickle.HIGHEST_PROTOCOL)
def load(self, filename):
"""Load the grammar tables from a pickle file."""
with open(filename, "rb") as f:
d = pickle.load(f)
self.__dict__.update(d)
def loads(self, pkl):
"""Load the grammar tables from a pickle bytes object."""
self.__dict__.update(pickle.loads(pkl))
def copy(self):
"""
Copy the grammar.
"""
new = self.__class__()
for dict_attr in ("symbol2number", "number2symbol", "dfas", "keywords",
"tokens", "symbol2label"):
setattr(new, dict_attr, getattr(self, dict_attr).copy())
new.labels = self.labels[:]
new.states = self.states[:]
new.start = self.start
return new
def report(self):
"""Dump the grammar tables to standard output, for debugging."""
from pprint import pprint
print("s2n")
pprint(self.symbol2number)
print("n2s")
pprint(self.number2symbol)
print("states")
pprint(self.states)
print("dfas")
pprint(self.dfas)
print("labels")
pprint(self.labels)
print("start", self.start)
# Map from operator to number (since tokenize doesn't do this)
opmap_raw = """
( LPAR
) RPAR
[ LSQB
] RSQB
: COLON
, COMMA
; SEMI
+ PLUS
- MINUS
* STAR
/ SLASH
| VBAR
& AMPER
< LESS
> GREATER
= EQUAL
. DOT
% PERCENT
` BACKQUOTE
{ LBRACE
} RBRACE
@ AT
@= ATEQUAL
== EQEQUAL
!= NOTEQUAL
<> NOTEQUAL
<= LESSEQUAL
>= GREATEREQUAL
~ TILDE
^ CIRCUMFLEX
<< LEFTSHIFT
>> RIGHTSHIFT
** DOUBLESTAR
+= PLUSEQUAL
-= MINEQUAL
*= STAREQUAL
/= SLASHEQUAL
%= PERCENTEQUAL
&= AMPEREQUAL
|= VBAREQUAL
^= CIRCUMFLEXEQUAL
<<= LEFTSHIFTEQUAL
>>= RIGHTSHIFTEQUAL
**= DOUBLESTAREQUAL
// DOUBLESLASH
//= DOUBLESLASHEQUAL
-> RARROW
:= COLONEQUAL
"""
opmap = {}
for line in opmap_raw.splitlines():
if line:
op, name = line.split()
opmap[op] = getattr(token, name)
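
# Example: the tokenizer reports every operator with the generic OP token
# code; the parser then uses opmap to recover the specific label, e.g.
#
#     opmap["->"] == token.RARROW
#     opmap[":="] == token.COLONEQUAL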

View File

@@ -0,0 +1,201 @@
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Parser engine for the grammar tables generated by pgen.
The grammar table must be loaded first.
See Parser/parser.c in the Python distribution for additional info on
how this parsing engine works.
"""
# Local imports
from . import token
class ParseError(Exception):
"""Exception to signal the parser is stuck."""
def __init__(self, msg, type, value, context):
Exception.__init__(self, "%s: type=%r, value=%r, context=%r" %
(msg, type, value, context))
self.msg = msg
self.type = type
self.value = value
self.context = context
class Parser(object):
"""Parser engine.
The proper usage sequence is:
p = Parser(grammar, [converter]) # create instance
p.setup([start]) # prepare for parsing
<for each input token>:
if p.addtoken(...): # parse a token; may raise ParseError
break
root = p.rootnode # root of abstract syntax tree
A Parser instance may be reused by calling setup() repeatedly.
A Parser instance contains state pertaining to the current token
sequence, and should not be used concurrently by different threads
to parse separate token sequences.
See driver.py for how to get input tokens by tokenizing a file or
string.
Parsing is complete when addtoken() returns True; the root of the
abstract syntax tree can then be retrieved from the rootnode
instance variable. When a syntax error occurs, addtoken() raises
the ParseError exception. There is no error recovery; the parser
cannot be used after a syntax error was reported (but it can be
reinitialized by calling setup()).
"""
def __init__(self, grammar, convert=None):
"""Constructor.
The grammar argument is a grammar.Grammar instance; see the
grammar module for more information.
The parser is not ready yet for parsing; you must call the
setup() method to get it started.
The optional convert argument is a function mapping concrete
syntax tree nodes to abstract syntax tree nodes. If not
given, no conversion is done and the syntax tree produced is
the concrete syntax tree. If given, it must be a function of
two arguments, the first being the grammar (a grammar.Grammar
instance), and the second being the concrete syntax tree node
to be converted. The syntax tree is converted from the bottom
up.
A concrete syntax tree node is a (type, value, context, nodes)
tuple, where type is the node type (a token or symbol number),
value is None for symbols and a string for tokens, context is
None or an opaque value used for error reporting (typically a
(lineno, offset) pair), and nodes is a list of children for
symbols, and None for tokens.
An abstract syntax tree node may be anything; this is entirely
up to the converter function.
"""
self.grammar = grammar
self.convert = convert or (lambda grammar, node: node)
def setup(self, start=None):
"""Prepare for parsing.
This *must* be called before starting to parse.
The optional argument is an alternative start symbol; it
defaults to the grammar's start symbol.
You can use a Parser instance to parse any number of programs;
each time you call setup() the parser is reset to an initial
state determined by the (implicit or explicit) start symbol.
"""
if start is None:
start = self.grammar.start
# Each stack entry is a tuple: (dfa, state, node).
# A node is a tuple: (type, value, context, children),
# where children is a list of nodes or None, and context may be None.
newnode = (start, None, None, [])
stackentry = (self.grammar.dfas[start], 0, newnode)
self.stack = [stackentry]
self.rootnode = None
self.used_names = set() # Aliased to self.rootnode.used_names in pop()
def addtoken(self, type, value, context):
"""Add a token; return True iff this is the end of the program."""
# Map from token to label
ilabel = self.classify(type, value, context)
# Loop until the token is shifted; may raise exceptions
while True:
dfa, state, node = self.stack[-1]
states, first = dfa
arcs = states[state]
# Look for a state with this label
for i, newstate in arcs:
t, v = self.grammar.labels[i]
if ilabel == i:
# Look it up in the list of labels
assert t < 256
# Shift a token; we're done with it
self.shift(type, value, newstate, context)
# Pop while we are in an accept-only state
state = newstate
while states[state] == [(0, state)]:
self.pop()
if not self.stack:
# Done parsing!
return True
dfa, state, node = self.stack[-1]
states, first = dfa
# Done with this token
return False
elif t >= 256:
# See if it's a symbol and if we're in its first set
itsdfa = self.grammar.dfas[t]
itsstates, itsfirst = itsdfa
if ilabel in itsfirst:
# Push a symbol
self.push(t, self.grammar.dfas[t], newstate, context)
break # To continue the outer while loop
else:
if (0, state) in arcs:
# An accepting state, pop it and try something else
self.pop()
if not self.stack:
# Done parsing, but another token is input
raise ParseError("too much input",
type, value, context)
else:
# No success finding a transition
raise ParseError("bad input", type, value, context)
def classify(self, type, value, context):
"""Turn a token into a label. (Internal)"""
if type == token.NAME:
# Keep a listing of all used names
self.used_names.add(value)
# Check for reserved words
ilabel = self.grammar.keywords.get(value)
if ilabel is not None:
return ilabel
ilabel = self.grammar.tokens.get(type)
if ilabel is None:
raise ParseError("bad token", type, value, context)
return ilabel
def shift(self, type, value, newstate, context):
"""Shift a token. (Internal)"""
dfa, state, node = self.stack[-1]
newnode = (type, value, context, None)
newnode = self.convert(self.grammar, newnode)
if newnode is not None:
node[-1].append(newnode)
self.stack[-1] = (dfa, newstate, node)
def push(self, type, newdfa, newstate, context):
"""Push a nonterminal. (Internal)"""
dfa, state, node = self.stack[-1]
newnode = (type, None, context, [])
self.stack[-1] = (dfa, newstate, node)
self.stack.append((newdfa, 0, newnode))
def pop(self):
"""Pop a nonterminal. (Internal)"""
popdfa, popstate, popnode = self.stack.pop()
newnode = self.convert(self.grammar, popnode)
if newnode is not None:
if self.stack:
dfa, state, node = self.stack[-1]
node[-1].append(newnode)
else:
self.rootnode = newnode
self.rootnode.used_names = self.used_names
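def _example_parse_tokens(grammar, tokens):
    # A minimal sketch (added for illustration; not part of upstream
    # lib2to3) of the usage sequence documented on the Parser class above.
    # `grammar` is assumed to be a grammar.Grammar instance and `tokens` an
    # iterable of (type, value, context) triples from the tokenizer.
    p = Parser(grammar)            # create instance
    p.setup()                      # prepare for parsing
    for type, value, context in tokens:
        if p.addtoken(type, value, context):
            break                  # True: end of program reached
    else:
        raise ParseError("incomplete input", None, None, None)
    return p.rootnode              # root of the (concrete) syntax tree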

View File

@@ -0,0 +1,386 @@
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# Pgen imports
from . import grammar, token, tokenize
class PgenGrammar(grammar.Grammar):
pass
class ParserGenerator(object):
def __init__(self, filename, stream=None):
close_stream = None
if stream is None:
stream = open(filename)
close_stream = stream.close
self.filename = filename
self.stream = stream
self.generator = tokenize.generate_tokens(stream.readline)
self.gettoken() # Initialize lookahead
self.dfas, self.startsymbol = self.parse()
if close_stream is not None:
close_stream()
self.first = {} # map from symbol name to set of tokens
self.addfirstsets()
def make_grammar(self):
c = PgenGrammar()
names = list(self.dfas.keys())
names.sort()
names.remove(self.startsymbol)
names.insert(0, self.startsymbol)
for name in names:
i = 256 + len(c.symbol2number)
c.symbol2number[name] = i
c.number2symbol[i] = name
for name in names:
dfa = self.dfas[name]
states = []
for state in dfa:
arcs = []
for label, next in sorted(state.arcs.items()):
arcs.append((self.make_label(c, label), dfa.index(next)))
if state.isfinal:
arcs.append((0, dfa.index(state)))
states.append(arcs)
c.states.append(states)
c.dfas[c.symbol2number[name]] = (states, self.make_first(c, name))
c.start = c.symbol2number[self.startsymbol]
return c
def make_first(self, c, name):
rawfirst = self.first[name]
first = {}
for label in sorted(rawfirst):
ilabel = self.make_label(c, label)
##assert ilabel not in first # XXX failed on <> ... !=
first[ilabel] = 1
return first
def make_label(self, c, label):
# XXX Maybe this should be a method on a subclass of converter?
ilabel = len(c.labels)
if label[0].isalpha():
# Either a symbol name or a named token
if label in c.symbol2number:
# A symbol name (a non-terminal)
if label in c.symbol2label:
return c.symbol2label[label]
else:
c.labels.append((c.symbol2number[label], None))
c.symbol2label[label] = ilabel
return ilabel
else:
# A named token (NAME, NUMBER, STRING)
itoken = getattr(token, label, None)
assert isinstance(itoken, int), label
assert itoken in token.tok_name, label
if itoken in c.tokens:
return c.tokens[itoken]
else:
c.labels.append((itoken, None))
c.tokens[itoken] = ilabel
return ilabel
else:
# Either a keyword or an operator
assert label[0] in ('"', "'"), label
value = eval(label)
if value[0].isalpha():
# A keyword
if value in c.keywords:
return c.keywords[value]
else:
c.labels.append((token.NAME, value))
c.keywords[value] = ilabel
return ilabel
else:
# An operator (any non-numeric token)
itoken = grammar.opmap[value] # Fails if unknown token
if itoken in c.tokens:
return c.tokens[itoken]
else:
c.labels.append((itoken, None))
c.tokens[itoken] = ilabel
return ilabel
def addfirstsets(self):
names = list(self.dfas.keys())
names.sort()
for name in names:
if name not in self.first:
self.calcfirst(name)
#print name, self.first[name].keys()
def calcfirst(self, name):
dfa = self.dfas[name]
self.first[name] = None # dummy to detect left recursion
state = dfa[0]
totalset = {}
overlapcheck = {}
for label, next in state.arcs.items():
if label in self.dfas:
if label in self.first:
fset = self.first[label]
if fset is None:
raise ValueError("recursion for rule %r" % name)
else:
self.calcfirst(label)
fset = self.first[label]
totalset.update(fset)
overlapcheck[label] = fset
else:
totalset[label] = 1
overlapcheck[label] = {label: 1}
inverse = {}
for label, itsfirst in overlapcheck.items():
for symbol in itsfirst:
if symbol in inverse:
raise ValueError("rule %s is ambiguous; %s is in the"
" first sets of %s as well as %s" %
(name, symbol, label, inverse[symbol]))
inverse[symbol] = label
self.first[name] = totalset
def parse(self):
dfas = {}
startsymbol = None
# MSTART: (NEWLINE | RULE)* ENDMARKER
while self.type != token.ENDMARKER:
while self.type == token.NEWLINE:
self.gettoken()
# RULE: NAME ':' RHS NEWLINE
name = self.expect(token.NAME)
self.expect(token.OP, ":")
a, z = self.parse_rhs()
self.expect(token.NEWLINE)
#self.dump_nfa(name, a, z)
dfa = self.make_dfa(a, z)
#self.dump_dfa(name, dfa)
oldlen = len(dfa)
self.simplify_dfa(dfa)
newlen = len(dfa)
dfas[name] = dfa
#print name, oldlen, newlen
if startsymbol is None:
startsymbol = name
return dfas, startsymbol
def make_dfa(self, start, finish):
# To turn an NFA into a DFA, we define the states of the DFA
# to correspond to *sets* of states of the NFA. Then do some
# state reduction. Let's represent sets as dicts with 1 for
# values.
assert isinstance(start, NFAState)
assert isinstance(finish, NFAState)
def closure(state):
base = {}
addclosure(state, base)
return base
def addclosure(state, base):
assert isinstance(state, NFAState)
if state in base:
return
base[state] = 1
for label, next in state.arcs:
if label is None:
addclosure(next, base)
states = [DFAState(closure(start), finish)]
for state in states: # NB states grows while we're iterating
arcs = {}
for nfastate in state.nfaset:
for label, next in nfastate.arcs:
if label is not None:
addclosure(next, arcs.setdefault(label, {}))
for label, nfaset in sorted(arcs.items()):
for st in states:
if st.nfaset == nfaset:
break
else:
st = DFAState(nfaset, finish)
states.append(st)
state.addarc(st, label)
return states # List of DFAState instances; first one is start
def dump_nfa(self, name, start, finish):
print("Dump of NFA for", name)
todo = [start]
for i, state in enumerate(todo):
print(" State", i, state is finish and "(final)" or "")
for label, next in state.arcs:
if next in todo:
j = todo.index(next)
else:
j = len(todo)
todo.append(next)
if label is None:
print(" -> %d" % j)
else:
print(" %s -> %d" % (label, j))
def dump_dfa(self, name, dfa):
print("Dump of DFA for", name)
for i, state in enumerate(dfa):
print(" State", i, state.isfinal and "(final)" or "")
for label, next in sorted(state.arcs.items()):
print(" %s -> %d" % (label, dfa.index(next)))
def simplify_dfa(self, dfa):
# This is not theoretically optimal, but works well enough.
# Algorithm: repeatedly look for two states that have the same
# set of arcs (same labels pointing to the same nodes) and
# unify them, until things stop changing.
# dfa is a list of DFAState instances
changes = True
while changes:
changes = False
for i, state_i in enumerate(dfa):
for j in range(i+1, len(dfa)):
state_j = dfa[j]
if state_i == state_j:
#print " unify", i, j
del dfa[j]
for state in dfa:
state.unifystate(state_j, state_i)
changes = True
break
def parse_rhs(self):
# RHS: ALT ('|' ALT)*
a, z = self.parse_alt()
if self.value != "|":
return a, z
else:
aa = NFAState()
zz = NFAState()
aa.addarc(a)
z.addarc(zz)
while self.value == "|":
self.gettoken()
a, z = self.parse_alt()
aa.addarc(a)
z.addarc(zz)
return aa, zz
def parse_alt(self):
# ALT: ITEM+
a, b = self.parse_item()
while (self.value in ("(", "[") or
self.type in (token.NAME, token.STRING)):
c, d = self.parse_item()
b.addarc(c)
b = d
return a, b
def parse_item(self):
# ITEM: '[' RHS ']' | ATOM ['+' | '*']
if self.value == "[":
self.gettoken()
a, z = self.parse_rhs()
self.expect(token.OP, "]")
a.addarc(z)
return a, z
else:
a, z = self.parse_atom()
value = self.value
if value not in ("+", "*"):
return a, z
self.gettoken()
z.addarc(a)
if value == "+":
return a, z
else:
return a, a
def parse_atom(self):
# ATOM: '(' RHS ')' | NAME | STRING
if self.value == "(":
self.gettoken()
a, z = self.parse_rhs()
self.expect(token.OP, ")")
return a, z
elif self.type in (token.NAME, token.STRING):
a = NFAState()
z = NFAState()
a.addarc(z, self.value)
self.gettoken()
return a, z
else:
self.raise_error("expected (...) or NAME or STRING, got %s/%s",
self.type, self.value)
def expect(self, type, value=None):
if self.type != type or (value is not None and self.value != value):
self.raise_error("expected %s/%s, got %s/%s",
type, value, self.type, self.value)
value = self.value
self.gettoken()
return value
def gettoken(self):
tup = next(self.generator)
while tup[0] in (tokenize.COMMENT, tokenize.NL):
tup = next(self.generator)
self.type, self.value, self.begin, self.end, self.line = tup
#print token.tok_name[self.type], repr(self.value)
def raise_error(self, msg, *args):
if args:
try:
msg = msg % args
except:
msg = " ".join([msg] + list(map(str, args)))
raise SyntaxError(msg, (self.filename, self.end[0],
self.end[1], self.line))
class NFAState(object):
def __init__(self):
self.arcs = [] # list of (label, NFAState) pairs
def addarc(self, next, label=None):
assert label is None or isinstance(label, str)
assert isinstance(next, NFAState)
self.arcs.append((label, next))
class DFAState(object):
def __init__(self, nfaset, final):
assert isinstance(nfaset, dict)
assert isinstance(next(iter(nfaset)), NFAState)
assert isinstance(final, NFAState)
self.nfaset = nfaset
self.isfinal = final in nfaset
self.arcs = {} # map from label to DFAState
def addarc(self, next, label):
assert isinstance(label, str)
assert label not in self.arcs
assert isinstance(next, DFAState)
self.arcs[label] = next
def unifystate(self, old, new):
for label, next in self.arcs.items():
if next is old:
self.arcs[label] = new
def __eq__(self, other):
# Equality test -- ignore the nfaset instance variable
assert isinstance(other, DFAState)
if self.isfinal != other.isfinal:
return False
# Can't just return self.arcs == other.arcs, because that
# would invoke this method recursively, with cycles...
if len(self.arcs) != len(other.arcs):
return False
for label, next in self.arcs.items():
if next is not other.arcs.get(label):
return False
return True
__hash__ = None # For Py3 compatibility.
def generate_grammar(filename, stream=None):
p = ParserGenerator(filename, stream)
return p.make_grammar()
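if __name__ == "__main__":
    # A minimal sketch (added for illustration): build parse tables from a
    # grammar file and pickle them, pairing generate_grammar() with
    # Grammar.dump(). Run as a module (python -m ...) so the relative
    # imports above resolve; the default file name is a placeholder.
    import sys
    source = sys.argv[1] if len(sys.argv) > 1 else "Grammar.txt"
    g = generate_grammar(source)
    g.dump(source + ".pickle")
    print("wrote", source + ".pickle")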

View File

@@ -0,0 +1,91 @@
"""Token constants (from "token.h")."""
# Taken from Python (r53757) and modified to include some tokens
# originally monkeypatched in by pgen2.tokenize
#--start constants--
ENDMARKER = 0
NAME = 1
NUMBER = 2
STRING = 3
NEWLINE = 4
INDENT = 5
DEDENT = 6
LPAR = 7
RPAR = 8
LSQB = 9
RSQB = 10
COLON = 11
COMMA = 12
SEMI = 13
PLUS = 14
MINUS = 15
STAR = 16
SLASH = 17
VBAR = 18
AMPER = 19
LESS = 20
GREATER = 21
EQUAL = 22
DOT = 23
PERCENT = 24
BACKQUOTE = 25
LBRACE = 26
RBRACE = 27
EQEQUAL = 28
NOTEQUAL = 29
LESSEQUAL = 30
GREATEREQUAL = 31
TILDE = 32
CIRCUMFLEX = 33
LEFTSHIFT = 34
RIGHTSHIFT = 35
DOUBLESTAR = 36
PLUSEQUAL = 37
MINEQUAL = 38
STAREQUAL = 39
SLASHEQUAL = 40
PERCENTEQUAL = 41
AMPEREQUAL = 42
VBAREQUAL = 43
CIRCUMFLEXEQUAL = 44
LEFTSHIFTEQUAL = 45
RIGHTSHIFTEQUAL = 46
DOUBLESTAREQUAL = 47
DOUBLESLASH = 48
DOUBLESLASHEQUAL = 49
AT = 50
ATEQUAL = 51
OP = 52
COMMENT = 53
NL = 54
RARROW = 55
AWAIT = 56
ASYNC = 57
DOLLARNAME = 58
FSTRING_START = 59
FSTRING_MID = 60
FSTRING_END = 61
CONVERSION = 62
COLONEQUAL = 63
FSTRING_SPEC = 64
ILLEGALINDENT = 65
ERRORTOKEN = 66
N_TOKENS = 67
NT_OFFSET = 256
#--end constants--
tok_name = {}
for _name, _value in list(globals().items()):
if type(_value) is type(0):
tok_name[_value] = _name
def ISTERMINAL(x):
return x < NT_OFFSET
def ISNONTERMINAL(x):
return x >= NT_OFFSET
def ISEOF(x):
return x == ENDMARKER
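if __name__ == "__main__":
    # Tiny illustration (added; not in the original file) of the helpers
    # above: token numbers below NT_OFFSET are terminals, grammar symbol
    # numbers (>= 256) are not.
    assert ISTERMINAL(NAME) and not ISNONTERMINAL(NAME)
    assert ISNONTERMINAL(NT_OFFSET) and not ISTERMINAL(NT_OFFSET)
    assert ISEOF(ENDMARKER)
    print(tok_name[NAME], tok_name[OP])  # -> NAME OP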

View File

@@ -0,0 +1,509 @@
# Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation.
# All rights reserved.
"""Tokenization help for Python programs.
generate_tokens(readline) is a generator that breaks a stream of
text into Python tokens. It accepts a readline-like method which is called
repeatedly to get the next line of input (or "" for EOF). It generates
5-tuples with these members:
the token type (see token.py)
the token (a string)
the starting (row, column) indices of the token (a 2-tuple of ints)
the ending (row, column) indices of the token (a 2-tuple of ints)
the original line (string)
It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators. (A short usage sketch appears after generate_tokens() below.)
Older entry points
tokenize_loop(readline, tokeneater)
tokenize(readline, tokeneater=printtoken)
are the same, except instead of generating tokens, tokeneater is a callback
function to which the 5 fields described above are passed as 5 arguments,
each time a new token is found."""
__author__ = 'Ka-Ping Yee <ping@lfw.org>'
__credits__ = \
'GvR, ESR, Tim Peters, Thomas Wouters, Fred Drake, Skip Montanaro'
import re
from codecs import BOM_UTF8, lookup
from blib2to3.pgen2.token import *
import sys
from . import token
__all__ = [x for x in dir(token) if x[0] != '_'] + ["tokenize",
"generate_tokens", "untokenize"]
del token
try:
bytes
except NameError:
# Support bytes type in Python <= 2.5, so 2to3 turns itself into
# valid Python 3 code.
bytes = str
def group(*choices): return '(' + '|'.join(choices) + ')'
def any(*choices): return group(*choices) + '*'
def maybe(*choices): return group(*choices) + '?'
def _combinations(*l):
return set(
x + y for x in l for y in l + ("",) if x.lower() != y.lower()
)
Whitespace = r'[ \f\t]*'
Comment = r'#[^\r\n]*'
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
Name = r'\w+' # this is invalid but it's fine because Name comes after Number in all groups
DollarName = r'\$\w+'
Binnumber = r'0[bB]_?[01]+(?:_[01]+)*'
Hexnumber = r'0[xX]_?[\da-fA-F]+(?:_[\da-fA-F]+)*[lL]?'
Octnumber = r'0[oO]?_?[0-7]+(?:_[0-7]+)*[lL]?'
Decnumber = group(r'[1-9]\d*(?:_\d+)*[lL]?', '0[lL]?')
Intnumber = group(Binnumber, Hexnumber, Octnumber, Decnumber)
Exponent = r'[eE][-+]?\d+(?:_\d+)*'
Pointfloat = group(r'\d+(?:_\d+)*\.(?:\d+(?:_\d+)*)?', r'\.\d+(?:_\d+)*') + maybe(Exponent)
Expfloat = r'\d+(?:_\d+)*' + Exponent
Floatnumber = group(Pointfloat, Expfloat)
Imagnumber = group(r'\d+(?:_\d+)*[jJ]', Floatnumber + r'[jJ]')
Number = group(Imagnumber, Floatnumber, Intnumber)
# Tail end of ' string.
Single = r"[^'\\]*(?:\\.[^'\\]*)*'"
# Tail end of " string.
Double = r'[^"\\]*(?:\\.[^"\\]*)*"'
# Tail end of ''' string.
Single3 = r"[^'\\]*(?:(?:\\.|'(?!''))[^'\\]*)*'''"
# Tail end of """ string.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'
_litprefix = r"(?:[uUrRbBfF]|[rR][fFbB]|[fFbBuU][rR])?"
Triple = group(_litprefix + "'''", _litprefix + '"""')
# Single-line ' or " string.
String = group(_litprefix + r"'[^\n'\\]*(?:\\.[^\n'\\]*)*'",
_litprefix + r'"[^\n"\\]*(?:\\.[^\n"\\]*)*"')
# Because of leftmost-then-longest match semantics, be sure to put the
# longest operators first (e.g., if = came before ==, == would get
# recognized as two instances of =).
Operator = group(r"\*\*=?", r">>=?", r"<<=?", r"<>", r"!=",
r"//=?", r"->",
r"[+\-*/%&@|^=<>]=?",
r"~")
Bracket = '[][(){}]'
Special = group(r'\r?\n', r'[:;.,`@]')
Funny = group(Operator, Bracket, Special)
PlainToken = group(Number, Funny, String, Name, DollarName)
Token = Ignore + PlainToken
# First (or only) line of ' or " string.
ContStr = group(_litprefix + r"'[^\n'\\]*(?:\\.[^\n'\\]*)*" +
group("'", r'\\\r?\n'),
_litprefix + r'"[^\n"\\]*(?:\\.[^\n"\\]*)*' +
group('"', r'\\\r?\n'))
PseudoExtras = group(r'\\\r?\n', Comment, Triple)
PseudoToken = Whitespace + group(PseudoExtras, Number, Funny, ContStr, Name, DollarName)
tokenprog = re.compile(Token, re.UNICODE)
pseudoprog = re.compile(PseudoToken, re.UNICODE)
single3prog = re.compile(Single3)
double3prog = re.compile(Double3)
_strprefixes = (
_combinations('r', 'R', 'f', 'F') |
_combinations('r', 'R', 'b', 'B') |
{'u', 'U', 'ur', 'uR', 'Ur', 'UR'}
)
endprogs = {"'": re.compile(Single), '"': re.compile(Double),
"'''": single3prog, '"""': double3prog,
}
endprogs.update({prefix+"'''": single3prog for prefix in _strprefixes})
endprogs.update({prefix+'"""': double3prog for prefix in _strprefixes})
endprogs.update({prefix: None for prefix in _strprefixes})
triple_quoted = (
{"'''", '"""'} |
{prefix+"'''" for prefix in _strprefixes} |
{prefix+'"""' for prefix in _strprefixes}
)
single_quoted = (
{"'", '"'} |
{prefix+"'" for prefix in _strprefixes} |
{prefix+'"' for prefix in _strprefixes}
)
tabsize = 8
class TokenError(Exception): pass
class StopTokenizing(Exception): pass
def printtoken(type, token, xxx_todo_changeme, xxx_todo_changeme1, line): # for testing
(srow, scol) = xxx_todo_changeme
(erow, ecol) = xxx_todo_changeme1
print("%d,%d-%d,%d:\t%s\t%s" % \
(srow, scol, erow, ecol, tok_name[type], repr(token)))
def tokenize(readline, tokeneater=printtoken):
"""
The tokenize() function accepts two parameters: one representing the
input stream, and one providing an output mechanism for tokenize().
The first parameter, readline, must be a callable object which provides
the same interface as the readline() method of built-in file objects.
Each call to the function should return one line of input as a string.
The second parameter, tokeneater, must also be a callable object. It is
called once for each token, with five arguments, corresponding to the
tuples generated by generate_tokens().
"""
try:
tokenize_loop(readline, tokeneater)
except StopTokenizing:
pass
# backwards compatible interface
def tokenize_loop(readline, tokeneater):
for token_info in generate_tokens(readline):
tokeneater(*token_info)
if sys.version_info > (3,):
isidentifier = str.isidentifier
else:
IDENTIFIER_RE = re.compile(r"^[^\d\W]\w*$", re.UNICODE)
def isidentifier(s):
return bool(IDENTIFIER_RE.match(s))
ASCII = re.ASCII if sys.version_info > (3,) else 0
cookie_re = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)', ASCII)
blank_re = re.compile(br'^[ \t\f]*(?:[#\r\n]|$)', ASCII)
def _get_normal_name(orig_enc):
"""Imitates get_normal_name in tokenizer.c."""
# Only care about the first 12 characters.
enc = orig_enc[:12].lower().replace("_", "-")
if enc == "utf-8" or enc.startswith("utf-8-"):
return "utf-8"
if enc in ("latin-1", "iso-8859-1", "iso-latin-1") or \
enc.startswith(("latin-1-", "iso-8859-1-", "iso-latin-1-")):
return "iso-8859-1"
return orig_enc
def detect_encoding(readline):
"""
The detect_encoding() function is used to detect the encoding that should
be used to decode a Python source file. It requires one argument, readline,
in the same way as the tokenize() generator.
It will call readline a maximum of twice, and return the encoding used
(as a string) and a list of any lines (left as bytes) it has read
in.
It detects the encoding from the presence of a utf-8 bom or an encoding
cookie as specified in pep-0263. If both a bom and a cookie are present, but
disagree, a SyntaxError will be raised. If the encoding cookie is an invalid
charset, raise a SyntaxError. Note that if a utf-8 bom is found,
'utf-8-sig' is returned.
If no encoding is specified, then the default of 'utf-8' will be returned.
"""
bom_found = False
encoding = None
default = 'utf-8'
def read_or_stop():
try:
return readline()
except StopIteration:
return bytes()
def find_cookie(line):
try:
line_string = line.decode('ascii')
except UnicodeDecodeError:
return None
match = cookie_re.match(line_string)
if not match:
return None
encoding = _get_normal_name(match.group(1))
try:
codec = lookup(encoding)
except LookupError:
# This behaviour mimics the Python interpreter
raise SyntaxError("unknown encoding: " + encoding)
if bom_found:
if codec.name != 'utf-8':
# This behaviour mimics the Python interpreter
raise SyntaxError('encoding problem: utf-8')
encoding += '-sig'
return encoding
first = read_or_stop()
if first.startswith(BOM_UTF8):
bom_found = True
first = first[3:]
default = 'utf-8-sig'
if not first:
return default, []
encoding = find_cookie(first)
if encoding:
return encoding, [first]
if not blank_re.match(first):
return default, [first]
second = read_or_stop()
if not second:
return default, [first]
encoding = find_cookie(second)
if encoding:
return encoding, [first, second]
return default, [first, second]
def generate_tokens(readline):
"""
The generate_tokens() generator requires one argument, readline, which
must be a callable object which provides the same interface as the
readline() method of built-in file objects. Each call to the function
should return one line of input as a string. Alternately, readline
can be a callable function terminating with StopIteration:
readline = open(myfile).next # Example of alternate readline
The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow, scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow, ecol) of
ints specifying the row and column where the token ends in the source;
and the line on which the token was found. The line passed is the
logical line; continuation lines are included.
"""
lnum = parenlev = continued = 0
numchars = '0123456789'
contstr, needcont = '', 0
contline = None
indents = [0]
# 'stashed' and 'async_*' are used for async/await parsing
stashed = None
async_def = False
async_def_indent = 0
async_def_nl = False
while 1: # loop over lines in stream
try:
line = readline()
except StopIteration:
line = ''
lnum = lnum + 1
pos, max = 0, len(line)
if contstr: # continued string
if not line:
raise TokenError("EOF in multi-line string", strstart)
endmatch = endprog.match(line)
if endmatch:
pos = end = endmatch.end(0)
yield (STRING, contstr + line[:end],
strstart, (lnum, end), contline + line)
contstr, needcont = '', 0
contline = None
elif needcont and line[-2:] != '\\\n' and line[-3:] != '\\\r\n':
yield (ERRORTOKEN, contstr + line,
strstart, (lnum, len(line)), contline)
contstr = ''
contline = None
continue
else:
contstr = contstr + line
contline = contline + line
continue
elif parenlev == 0 and not continued: # new statement
if not line: break
column = 0
while pos < max: # measure leading whitespace
if line[pos] == ' ': column = column + 1
elif line[pos] == '\t': column = (column//tabsize + 1)*tabsize
elif line[pos] == '\f': column = 0
else: break
pos = pos + 1
if pos == max: break
if stashed:
yield stashed
stashed = None
if line[pos] in '\r\n': # skip blank lines
yield (NL, line[pos:], (lnum, pos), (lnum, len(line)), line)
continue
if line[pos] == '#': # skip comments
comment_token = line[pos:].rstrip('\r\n')
nl_pos = pos + len(comment_token)
yield (COMMENT, comment_token,
(lnum, pos), (lnum, pos + len(comment_token)), line)
yield (NL, line[nl_pos:],
(lnum, nl_pos), (lnum, len(line)), line)
continue
if column > indents[-1]: # count indents
indents.append(column)
yield (INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
while column < indents[-1]: # count dedents
if column not in indents:
raise IndentationError(
"unindent does not match any outer indentation level",
("<tokenize>", lnum, pos, line))
indents = indents[:-1]
if async_def and async_def_indent >= indents[-1]:
async_def = False
async_def_nl = False
async_def_indent = 0
yield (DEDENT, '', (lnum, pos), (lnum, pos), line)
if async_def and async_def_nl and async_def_indent >= indents[-1]:
async_def = False
async_def_nl = False
async_def_indent = 0
else: # continued statement
if not line:
raise TokenError("EOF in multi-line statement", (lnum, 0))
continued = 0
while pos < max:
pseudomatch = pseudoprog.match(line, pos)
if pseudomatch: # scan for tokens
start, end = pseudomatch.span(1)
spos, epos, pos = (lnum, start), (lnum, end), end
token, initial = line[start:end], line[start]
if initial in numchars or \
(initial == '.' and token != '.'): # ordinary number
yield (NUMBER, token, spos, epos, line)
elif initial in '\r\n':
newline = NEWLINE
if parenlev > 0:
newline = NL
elif async_def:
async_def_nl = True
if stashed:
yield stashed
stashed = None
yield (newline, token, spos, epos, line)
elif initial == '#':
assert not token.endswith("\n")
if stashed:
yield stashed
stashed = None
yield (COMMENT, token, spos, epos, line)
elif token in triple_quoted:
endprog = endprogs[token]
endmatch = endprog.match(line, pos)
if endmatch: # all on one line
pos = endmatch.end(0)
token = line[start:pos]
if stashed:
yield stashed
stashed = None
yield (STRING, token, spos, (lnum, pos), line)
else:
strstart = (lnum, start) # multiple lines
contstr = line[start:]
contline = line
break
elif initial in single_quoted or \
token[:2] in single_quoted or \
token[:3] in single_quoted:
if token[-1] == '\n': # continued string
strstart = (lnum, start)
endprog = (endprogs[initial] or endprogs[token[1]] or
endprogs[token[2]])
contstr, needcont = line[start:], 1
contline = line
break
else: # ordinary string
if stashed:
yield stashed
stashed = None
yield (STRING, token, spos, epos, line)
elif isidentifier(initial): # ordinary name
if token in ('async', 'await'):
if async_def:
yield (ASYNC if token == 'async' else AWAIT,
token, spos, epos, line)
continue
tok = (NAME, token, spos, epos, line)
if token == 'async' and not stashed:
stashed = tok
continue
if token in ('def', 'for'):
if (stashed
and stashed[0] == NAME
and stashed[1] == 'async'):
if token == 'def':
async_def = True
async_def_indent = indents[-1]
yield (ASYNC, stashed[1],
stashed[2], stashed[3],
stashed[4])
stashed = None
if stashed:
yield stashed
stashed = None
yield tok
elif initial == '\\': # continued stmt
# This yield is new; needed for better idempotency:
if stashed:
yield stashed
stashed = None
yield (NL, token, spos, (lnum, pos), line)
continued = 1
elif initial == '$':
if stashed:
yield stashed
stashed = None
yield (DOLLARNAME, token, spos, epos, line)
else:
if initial in '([{': parenlev = parenlev + 1
elif initial in ')]}': parenlev = parenlev - 1
if stashed:
yield stashed
stashed = None
yield (OP, token, spos, epos, line)
else:
yield (ERRORTOKEN, line[pos],
(lnum, pos), (lnum, pos+1), line)
pos = pos + 1
if stashed:
yield stashed
stashed = None
for indent in indents[1:]: # pop remaining indent levels
yield (DEDENT, '', (lnum, 0), (lnum, 0), '')
yield (ENDMARKER, '', (lnum, 0), (lnum, 0), '')
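def _example_generate_tokens():
    # A minimal sketch (added for illustration) of the interface documented
    # at the top of this file: feed generate_tokens() a readline callable
    # and it yields (type, string, (srow, scol), (erow, ecol), line)
    # 5-tuples.
    from io import StringIO
    source = StringIO("x = 1 + 2\n")
    for tok_type, tok_str, start, end, logical_line in generate_tokens(source.readline):
        print(tok_name[tok_type], repr(tok_str), start, end)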
if __name__ == '__main__': # testing
import sys
if len(sys.argv) > 1: tokenize(open(sys.argv[1]).readline)
else: tokenize(sys.stdin.readline)

View File

@@ -0,0 +1,56 @@
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Export the Python grammar and symbols."""
# Python imports
import os
# Local imports
from .pgen2 import token
from .pgen2 import driver
# The grammar file
_GRAMMAR_FILE = "Grammar.txt"
class Symbols(object):
def __init__(self, grammar):
"""Initializer.
Creates an attribute for each grammar symbol (nonterminal),
whose value is the symbol's type (an int >= 256).
"""
for name, symbol in grammar.symbol2number.items():
setattr(self, name, symbol)
def initialize(cache_dir=None):
global python2_grammar
global python2_grammar_no_print_statement
global python3_grammar
global python3_grammar_no_async
global python_symbols
python_grammar = driver.load_grammar("blib2to3", _GRAMMAR_FILE)
python_symbols = Symbols(python_grammar)
# Python 2
python2_grammar = python_grammar.copy()
del python2_grammar.keywords["async"]
del python2_grammar.keywords["await"]
# Python 2 + from __future__ import print_function
python2_grammar_no_print_statement = python2_grammar.copy()
del python2_grammar_no_print_statement.keywords["print"]
# Python 3
python3_grammar = python_grammar
del python3_grammar.keywords["print"]
del python3_grammar.keywords["exec"]
# Python 3 without async or await
python3_grammar_no_async = python3_grammar.copy()
del python3_grammar_no_async.keywords["async"]
del python3_grammar_no_async.keywords["await"]

View File

@@ -0,0 +1,29 @@
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""
Python parse tree definitions.
This is a very concrete parse tree; we need to keep every token and
even the comments and whitespace between tokens.
There's also a pattern matching implementation here.
"""
__author__ = "Guido van Rossum <guido@python.org>"
import sys
from io import StringIO
HUGE = 0x7FFFFFFF # maximum repeat count, default max
_type_reprs = {}
def type_repr(type_num):
global _type_reprs
if not _type_reprs:
from .pygram import python_symbols
# printing tokens is possible but not as useful
# from .pgen2 import token // token.__dict__.items():
for name, val in python_symbols.__dict__.items():
if type(val) == int: _type_reprs[val] = name
return _type_reprs.setdefault(type_num, type_num)

View File

View File

@@ -0,0 +1,106 @@
#!/usr/bin/python3
import sys
import logging
import os
import os.path
import re
from packaging.specifiers import SpecifierSet
from packaging.version import Version
import buildtools.semmle.requirements as requirements
logging.basicConfig(level=logging.WARNING)
def pip_install(req, venv, dependencies=True, wheel=True):
venv.upgrade_pip()
tmp = requirements.save_to_file([req])
# Install the requirements using the venv python
args = [ "install", "-r", tmp]
if dependencies:
print("Installing %s with dependencies." % req)
elif wheel:
print("Installing %s without dependencies." % req)
args += [ "--no-deps"]
else:
print("Installing %s without dependencies or wheel." % req)
args += [ "--no-deps", "--no-binary", ":all:"]
print("Calling " + " ".join(args))
venv.pip(args)
os.remove(tmp)
def restrict_django(reqs):
for req in reqs:
if sys.version_info[0] < 3 and req.name.lower() == "django":
if Version("2") in req.specifier:
req.specifier = SpecifierSet("<2")
return reqs
ignored_packages = [
"pyobjc-.*",
"pypiwin32",
"frida",
"pyopenssl", # Installed by pip. Don't mess with its version.
"wxpython", # Takes forever to compile all the C code.
"cryptography", #Installed by pyOpenSSL and thus by pip. Don't mess with its version.
"psycopg2", #psycopg2 version 2.6 fails to install.
]
if os.name != "nt":
ignored_packages.append("pywin32") #Only works on Windows
ignored_package_regex = re.compile("|".join(ignored_packages))
def non_ignored(reqs):
filtered_reqs = []
for req in reqs:
if ignored_package_regex.match(req.name.lower()) is not None:
logging.info("Package %s is ignored. Skipping." % req.name)
else:
filtered_reqs += [req]
return filtered_reqs
def try_install_with_deps(req, venv):
try:
pip_install(req, venv, dependencies=True)
except Exception as ex:
logging.warn("Failed to install all dependencies for " + req.name)
logging.info(ex)
try:
pip_install(req, venv, dependencies = False)
except Exception:
pip_install(req, venv, dependencies = False, wheel = False)
def install(reqs, venv):
'''Attempt to install a sufficient and stable set of dependencies from the requirements.txt file.
First of all we 'clean' the requirements, removing contradictory version numbers.
Then we attempt to install the restricted version of each dependency and, should that fail,
we install the unrestricted version. If that also fails, the whole installation fails.
Once the immediate dependencies are installed, we then (attempt to) install their dependencies.
Returns True if installation was successful. False otherwise.
`reqs` should be a string containing all requirements separated by newlines or a list of
strings with each string being a requirement.
'''
if isinstance(reqs, str):
reqs = reqs.split("\n")
reqs = requirements.parse(reqs)
reqs = restrict_django(reqs)
reqs = non_ignored(reqs)
cleaned = requirements.clean(reqs)
restricted = requirements.restrict(reqs)
for i, req in enumerate(restricted):
try:
try_install_with_deps(req, venv)
except Exception as ex1:
try:
try_install_with_deps(cleaned[i], venv)
except Exception as ex2:
logging.error("Failed to install " + req.name)
logging.warning(ex2)
return False
logging.info("Failed to install restricted form of " + req.name)
logging.info(ex1)
return True
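if __name__ == "__main__":
    # Hypothetical driver (added for illustration): install the requirements
    # from a requirements.txt in the working directory into a fresh venv.
    # Venv and venv_path come from buildtools.install in this tree, and
    # venv_path() assumes the LGTM_WORKSPACE environment variable is set.
    from buildtools.install import Venv, venv_path
    with open("requirements.txt") as f:
        reqs = f.read()
    venv = Venv(venv_path(), 3)
    venv.create()
    sys.exit(0 if install(reqs, venv) else 1)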

View File

@@ -0,0 +1,65 @@
import sys
import os
from buildtools import version
DEFAULT_VERSION = 3
def get_relative_root(root_identifiers):
if any([os.path.exists(identifier) for identifier in root_identifiers]):
print("Source root appears to be the real root.")
return "."
found = set()
for directory in next(os.walk("."))[1]:
if any([os.path.exists(os.path.join(directory, identifier)) for identifier in root_identifiers]):
found.add(directory)
if not found:
print("No directories containing root identifiers were found. Returning working directory as root.")
return "."
if len(found) > 1:
print("Multiple possible root directories found. Returning working directory as root.")
return "."
root = found.pop()
print("'%s' appears to be the root." % root)
return root
def get_root(*root_identifiers):
return os.path.abspath(get_relative_root(root_identifiers))
REQUIREMENTS_TAG = "LGTM_PYTHON_SETUP_REQUIREMENTS_FILES"
def find_requirements(dir):
if REQUIREMENTS_TAG in os.environ:
val = os.environ[REQUIREMENTS_TAG]
if val == "false":
return []
paths = [ os.path.join(dir, line.strip()) for line in val.splitlines() ]
for p in paths:
if not os.path.exists(p):
raise IOError(p + " not found")
return paths
candidates = ["requirements.txt", "test-requirements.txt"]
return [ path if os.path.exists(path) else "" for path in [ os.path.join(dir, file) for file in candidates] ]
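# Illustration (added): with LGTM_PYTHON_SETUP_REQUIREMENTS_FILES set to
# "reqs/base.txt\nreqs/dev.txt", find_requirements() returns those two
# paths joined onto `dir` (raising IOError if either is missing); setting
# it to "false" yields []. Without the variable it falls back to
# requirements.txt and test-requirements.txt in `dir`, with "" standing in
# for any candidate that does not exist.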
def discover(default_version=DEFAULT_VERSION):
"""Discover things about the Python checkout and return a version, root, requirement-files triple."""
root = get_root("requirements.txt", "setup.py")
v = version.best_version(root, default_version)
# Unify the requirements or just get path to requirements...
requirement_files = find_requirements(root)
return v, root, requirement_files
def get_version(default_version=DEFAULT_VERSION):
root = get_root("requirements.txt", "setup.py")
return version.best_version(root, default_version)
def main():
if len(sys.argv) > 1:
print(discover(int(sys.argv[1])))
else:
print(discover())
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,16 @@
import os
import traceback
import re
SCRIPTDIR = os.path.split(os.path.dirname(__file__))[1]
def print_exception_indented(opt=None):
exc_text = traceback.format_exc()
for line in exc_text.splitlines():
# remove path information that might be sensitive
# for example, in the .pyc files for Python 2, a traceback would contain
# /home/rasmus/code/target/thirdparty/python/build/extractor-python/buildtools/install.py
line = re.sub(r'File \".*' + SCRIPTDIR + r'(.*)\",', r'File <'+ SCRIPTDIR + r'\1>', line)
print(' ' + line)
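if __name__ == "__main__":
    # Tiny illustration (added): print a caught traceback indented, with
    # extractor paths redacted by the regex above.
    try:
        raise ValueError("boom")
    except ValueError:
        print_exception_indented()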

View File

@@ -0,0 +1,429 @@
import sys
import os
import subprocess
import csv
if sys.version_info < (3,):
from urlparse import urlparse
from urllib import url2pathname
else:
from urllib.parse import urlparse
from urllib.request import url2pathname
from buildtools import discover
from buildtools import install
from buildtools.version import executable, extractor_executable
INCLUDE_TAG = "LGTM_INDEX_INCLUDE"
EXCLUDE_TAG = "LGTM_INDEX_EXCLUDE"
FILTER_TAG = "LGTM_INDEX_FILTERS"
PATH_TAG = "LGTM_INDEX_IMPORT_PATH"
REPO_FOLDERS_TAG = "LGTM_REPOSITORY_FOLDERS_CSV"
REPO_EXCLUDE_KINDS = "metadata", "external"
# These are the levels that the CodeQL CLI supports, in order of increasing verbosity.
CLI_LOGGING_LEVELS = ['off', 'errors', 'warnings', 'progress', 'progress+', 'progress++', 'progress+++']
# These are the verbosity levels used internally in the extractor. The indices of these levels
# should match up with the corresponding constants in the semmle.logging module.
EXTRACTOR_LOGGING_LEVELS = ['off', 'errors', 'warnings', 'info', 'debug', 'trace']
def trap_cache():
return os.path.join(os.environ["LGTM_WORKSPACE"], "trap_cache")
def split_into_options(lines, opt):
opts = []
for line in lines.split("\n"):
line = line.strip()
if line:
opts.append(opt)
opts.append(line)
return opts
def get_include_options():
if INCLUDE_TAG in os.environ:
return split_into_options(os.environ[INCLUDE_TAG], "-R")
else:
src = os.environ["LGTM_SRC"]
return [ "-R", src]
def get_exclude_options():
options = []
if EXCLUDE_TAG in os.environ:
options.extend(split_into_options(os.environ[EXCLUDE_TAG], "-Y"))
if REPO_FOLDERS_TAG not in os.environ:
return options
with open(os.environ[REPO_FOLDERS_TAG]) as csv_file:
csv_reader = csv.reader(csv_file)
next(csv_reader) # discard header
for kind, url in csv_reader:
if kind not in REPO_EXCLUDE_KINDS:
continue
try:
path = url2pathname(urlparse(url).path)
except:
print("Unable to parse '" + url + "' as file url.")
else:
options.append("-Y")
options.append(path)
return options
def get_filter_options():
if FILTER_TAG in os.environ:
return split_into_options(os.environ[FILTER_TAG], "--filter")
else:
return []
def get_path_options(version):
# We want to stop extracting libraries, and only extract the code that is in the
# repo. While in the transition period for stopping to install dependencies in the
# codeql-action, we will need to be able to support both old and new behavior.
#
# Like PYTHONUNBUFFERED for Python, we treat any non-empty string as meaning the
# flag is enabled.
# https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUNBUFFERED
if os.environ.get("CODEQL_EXTRACTOR_PYTHON_DISABLE_LIBRARY_EXTRACTION"):
return []
# Not extracting dependencies will be default in CodeQL CLI release 2.16.0. Until
# 2.17.0, we provide an escape hatch to get the old behavior.
force_enable_envvar_name = "CODEQL_EXTRACTOR_PYTHON_FORCE_ENABLE_LIBRARY_EXTRACTION_UNTIL_2_17_0"
if os.environ.get(force_enable_envvar_name):
print("WARNING: We plan to remove the availability of the {} option in CodeQL CLI release 2.17.0 and beyond. Please let us know by submitting an issue to https://github.com/github/codeql why you needed to re-enable dependency extraction.".format(force_enable_envvar_name))
path_option = [ "-p", install.get_library(version)]
if PATH_TAG in os.environ:
path_option = split_into_options(os.environ[PATH_TAG], "-p") + path_option
return path_option
else:
print("INFO: The Python extractor has recently (from 2.16.0 CodeQL CLI release) stopped extracting dependencies by default, and therefore stopped analyzing the source code of dependencies by default. We plan to remove this entirely in CodeQL CLI release 2.17.0. If you encounter problems, please let us know by submitting an issue to https://github.com/github/codeql, so we can consider adjusting our plans. It is possible to re-enable dependency extraction by exporting '{}=1'.".format(force_enable_envvar_name))
return []
def get_stdlib():
return os.path.dirname(os.__file__)
def exclude_pip_21_3_build_dir_options():
"""
Handle build/ dir from `pip install .` (new in pip 21.3)
Starting with pip 21.3, in-tree builds are now the default (see
https://pip.pypa.io/en/stable/news/#v21-3). This means that pip commands that build
the package (like `pip install .` or `pip wheel .`), will leave a copy of all the
package source code in `build/lib/<package-name>/`.
If that is done before invoking the extractor, we will end up extracting that copy
as well, which is very bad (especially for points-to performance). So with this
function we try to find such folders, so they can be excluded from extraction.
The only reliable sign is that inside the `build` folder, there must be a `lib`
subfolder, and there must not be any ordinary files.
When the `wheel` package is installed there will also be a `bdist.linux-x86_64`
subfolder. Although most people have the `wheel` package installed, it's not
required, so we don't use that in the logic.
"""
# As a failsafe, we include logic to disable this functionality based on an
# environment variable.
#
# Like PYTHONUNBUFFERED for Python, we treat any non-empty string as meaning the
# flag is enabled.
# https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUNBUFFERED
if os.environ.get("CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_PIP_BUILD_DIR_EXCLUDE"):
return []
include_dirs = set(get_include_options()[1::2])
# For the purpose of exclusion, we normalize paths to their absolute path, just like
# we do in the actual traverser.
exclude_dirs = set(os.path.abspath(path) for path in get_exclude_options()[1::2])
to_exclude = list()
def walk_dir(dirpath):
if os.path.abspath(dirpath) in exclude_dirs:
return
contents = os.listdir(dirpath)
paths = [os.path.join(dirpath, c) for c in contents]
dirs = [path for path in paths if os.path.isdir(path)]
dirnames = [os.path.basename(path) for path in dirs]
# Allow Python package such as `mypkg.build.lib`, so if we see an `__init__.py`
# file in the current dir don't walk the tree further.
if "__init__.py" in contents:
return
# note that we don't require that there be a `setup.py` present beside the
# `build/` dir, since that is not required to build a package -- see
# https://pgjones.dev/blog/packaging-without-setup-py-2020
#
# Although I didn't observe `pip install .` with a package that uses `poetry` as
# the build-system leave behind a `build/` directory, that doesn't mean it
# couldn't happen.
if os.path.basename(dirpath) == "build" and "lib" in dirnames and dirs == paths:
to_exclude.append(dirpath)
return # no need to walk the sub directories
for dir in dirs:
# We ignore symlinks, as these can present infinite loops, and any folders
# they can point to will be handled on their own anyway.
if not os.path.islink(dir):
walk_dir(dir)
for top in include_dirs:
walk_dir(top)
options = []
if to_exclude:
print(
"Excluding the following directories from extraction, since they look like "
"in-tree build directories generated by pip: {}".format(to_exclude)
)
print(
"You can disable this behavior by setting the environment variable "
"CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_PIP_BUILD_DIR_EXCLUDE=1"
)
for dirpath in to_exclude:
options.append("-Y") # `-Y` is the same as `--exclude-file`
options.append(dirpath)
return options
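# Illustration (added): a layout the walk above flags. `build/` contains
# only directories, one of which is `lib/`, so "-Y <path>/build" is emitted:
#
#   myproject/
#       build/
#           lib/mypkg/...
#           bdist.linux-x86_64/     (present when the `wheel` package is installed)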
def exclude_venvs_options():
"""
If there are virtual environments (venv) present within the directory that is being
extracted, we don't want to recurse into all of these and extract all the Python
source code.
This function tries to find such venvs, and produce the right options to ignore
them.
"""
# As a failsafe, we include logic to disable this functionality based on an
# environment variable.
#
# Like PYTHONUNBUFFERED for Python, we treat any non-empty string as meaning the
# flag is enabled.
# https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUNBUFFERED
if os.environ.get("CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_VENV_EXCLUDE"):
return []
include_dirs = set(get_include_options()[1::2])
# For the purpose of exclusion, we normalize paths to their absolute path, just like
# we do in the actual traverser.
exclude_dirs = set(os.path.abspath(path) for path in get_exclude_options()[1::2])
to_exclude = []
def walk_dir(dirpath):
if os.path.abspath(dirpath) in exclude_dirs:
return
paths = [os.path.join(dirpath, c) for c in os.listdir(dirpath)]
dirs = [path for path in paths if os.path.isdir(path)]
dirnames = [os.path.basename(path) for path in dirs]
# we look for `<venv>/Lib/site-packages` (Windows) or
# `<venv>/lib/python*/site-packages` (unix) without requiring any other files to
# be present.
#
# Initially we had implemented some more advanced logic to only ignore venvs
# that had a `pyvenv.cfg` or a suitable activate script. But reality turned out
# to be less reliable, so now we just ignore any venv that has a proper
# `site-packages` as a subfolder.
#
# This logic for detecting a virtual environment was based on the CPython implementation, see:
# - https://github.com/python/cpython/blob/4575c01b750cd26377e803247c38d65dad15e26a/Lib/venv/__init__.py#L122-L131
# - https://github.com/python/cpython/blob/4575c01b750cd26377e803247c38d65dad15e26a/Lib/venv/__init__.py#L170
#
# Some interesting examples:
# - windows without `activate`: https://github.com/NTUST/106-team4/tree/7f902fec29f68ca44d4f4385f2d7714c2078c937/finalPage/finalVENV/Scripts
# - windows with `activate`: https://github.com/Lynchie/KCM/tree/ea9eeed07e0c9eec41f9fc7480ce90390ee09876/VENV/Scripts
# - without `pyvenv.cfg`: https://github.com/FiacreT/M-moire/tree/4089755191ffc848614247e98bbb641c1933450d/osintplatform/testNeo/venv
# - without `pyvenv.cfg`: https://github.com/Lynchie/KCM/tree/ea9eeed07e0c9eec41f9fc7480ce90390ee09876/VENV
# - without `pyvenv.cfg`: https://github.com/mignonjia/NetworkingProject/tree/a89fe12ffbf384095766aadfe6454a4c0062d1e7/crud/venv
#
# I'm quite sure I saw some project on LGTM that had neither `pyvenv.cfg` nor an activate script, but I could not find the reference again.
if "Lib" in dirnames:
has_site_packages_folder = os.path.exists(os.path.join(dirpath, "Lib", "site-packages"))
elif "lib" in dirnames:
lib_path = os.path.join(dirpath, "lib")
python_folders = [dirname for dirname in os.listdir(lib_path) if dirname.startswith("python")]
has_site_packages_folder = bool(python_folders) and any(
os.path.exists(os.path.join(dirpath, "lib", python_folder, "site-packages")) for python_folder in python_folders
)
else:
has_site_packages_folder = False
if has_site_packages_folder:
to_exclude.append(dirpath)
return # no need to walk the sub directories
for dir in dirs:
# We ignore symlinks, as these can present infinite loops, and any folders
# they can point to will be handled on their own anyway.
if not os.path.islink(dir):
walk_dir(dir)
for top in include_dirs:
walk_dir(top)
options = []
if to_exclude:
print(
"Excluding the following directories from extraction, since they look like "
"virtual environments: {}".format(to_exclude)
)
print(
"You can disable this behavior by setting the environment variable "
"CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_VENV_EXCLUDE=1"
)
for dirpath in to_exclude:
options.append("-Y") # `-Y` is the same as `--exclude-file`
options.append(dirpath)
return options
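# Illustration (added): directory shapes recognised as virtual environments
# by the walk above and excluded via "-Y":
#
#   venv/Lib/site-packages/...               (Windows)
#   venv/lib/python3.11/site-packages/...    (unix; any "python*" folder)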
def get_extractor_logging_level(s: str):
"""Returns a integer value corresponding to the logging level specified by the string s, or `None` if s is invalid."""
try:
return EXTRACTOR_LOGGING_LEVELS.index(s)
except ValueError:
return None
def get_cli_logging_level(s: str):
"""Returns a integer value corresponding to the logging level specified by the string s, or `None` if s is invalid."""
try:
return CLI_LOGGING_LEVELS.index(s)
except ValueError:
return None
def get_logging_options():
# First look for the extractor-specific option
verbosity_level = os.environ.get("CODEQL_EXTRACTOR_PYTHON_OPTION_LOGGING_VERBOSITY", None)
if verbosity_level is not None:
level = get_extractor_logging_level(verbosity_level)
if level is None:
level = get_cli_logging_level(verbosity_level)
if level is None:
# This is unlikely to be reached in practice, as the level should be validated by the CLI.
raise ValueError(
"Invalid verbosity level: {}. Valid values are: {}".format(
verbosity_level, ", ".join(set(EXTRACTOR_LOGGING_LEVELS + CLI_LOGGING_LEVELS))
)
)
return ["--verbosity", str(level)]
# Then look for the CLI-wide option
cli_verbosity_level = os.environ.get("CODEQL_VERBOSITY", None)
if cli_verbosity_level is not None:
level = get_cli_logging_level(cli_verbosity_level)
if level is None:
# This is unlikely to be reached in practice, as the level should be validated by the CLI.
raise ValueError(
"Invalid verbosity level: {}. Valid values are: {}".format(
cli_verbosity_level, ", ".join(CLI_LOGGING_LEVELS)
)
)
return ["--verbosity", str(level)]
# Default behaviour: turn on verbose mode:
return ["-v"]
def extractor_options(version):
options = []
options += get_logging_options()
# use maximum number of processes
options += ["-z", "all"]
# cache trap files
options += ["-c", trap_cache()]
options += get_path_options(version)
options += get_include_options()
options += get_exclude_options()
options += get_filter_options()
options += exclude_pip_21_3_build_dir_options()
options += exclude_venvs_options()
return options
def site_flag(version):
#
# Disabling site with -S (which we do by default) has been observed to cause
# problems at some customers. We're not entirely sure enabling this by default is
# going to be 100% ok, so for now we just want to disable this flag if running with
# it turns out to be a problem (which we check for).
#
# see https://docs.python.org/3/library/site.html
#
# I don't see any reason for running with -S when invoking the tracer in this
# scenario. If we were using the executable from a virtual environment after
# installing PyPI packages, running without -S would allow one of those packages to
# influence the behavior of the extractor, as was the problem for CVE-2020-5252
# (described in https://github.com/akoumjian/python-safety-vuln). But since this is
# not the case, I don't think there is any advantage to running with -S.
# Although we have an automatic check that should detect when we should not be
# running with -S, we're not 100% certain that it is impossible to create _other_
# strange Python installations where `gzip` is available while the rest of the
# standard library is not. Therefore we keep this environment variable, just to
# make sure there is an easy fall-back in those cases.
#
# Like PYTHONUNBUFFERED for Python, we treat any non-empty string as meaning the
# flag is enabled.
# https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUNBUFFERED
if os.environ.get("CODEQL_EXTRACTOR_PYTHON_ENABLE_SITE"):
return []
try:
# In the cases where customers had problems, `gzip` was the first module
# encountered that could not be loaded, so that's the one we check for. Note
# that this has nothing to do with it being problematic to add GZIP support to
# Python :)
args = executable(version) + ["-S", "-c", "import gzip"]
subprocess.check_call(args)
return ["-S"]
except Exception:  # includes subprocess.CalledProcessError
print("Running without -S")
return []
def get_analysis_version(major_version):
"""Gets the version of Python that we _analyze_ the code as being written for.
The return value is a string, e.g. "3.11" or "2.7.18". Populating the `major_version`,
`minor_version` and `micro_version` predicates is done inside the CodeQL libraries.
"""
# If the version is already specified, simply reuse it.
if "CODEQL_EXTRACTOR_PYTHON_ANALYSIS_VERSION" in os.environ:
return os.environ["CODEQL_EXTRACTOR_PYTHON_ANALYSIS_VERSION"]
elif major_version == 2:
return "2.7.18" # Last officially supported version
else:
return "3.12" # This should always be the latest supported version
def main():
version = discover.get_version()
tracer = os.path.join(os.environ["SEMMLE_DIST"], "tools", "python_tracer.py")
args = extractor_executable() + site_flag(3) + [tracer] + extractor_options(version)
print("Calling " + " ".join(args))
sys.stdout.flush()
sys.stderr.flush()
env = os.environ.copy()
env["CODEQL_EXTRACTOR_PYTHON_ANALYSIS_VERSION"] = get_analysis_version(version)
subprocess.check_call(args, env=env)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,123 @@
import sys
import os
import subprocess
import re
import ast
import tempfile
from buildtools import unify_requirements
from buildtools.version import executable
from buildtools.version import WIN
from buildtools.helper import print_exception_indented
def call(args, cwd=None):
print("Calling " + " ".join(args))
sys.stdout.flush()
sys.stderr.flush()
subprocess.check_call(args, cwd=cwd)
class Venv(object):
def __init__(self, path, version):
self.environ = {}
self.path = path
exe_ext = [ "Scripts", "python.exe" ] if WIN else [ "bin", "python" ]
self.venv_executable = os.path.join(self.path, *exe_ext)
self._lib = None
self.pip_upgraded = False
self.empty_folder = tempfile.mkdtemp(prefix="empty", dir=os.environ["LGTM_WORKSPACE"])
self.version = version
def create(self):
if self.version < 3:
venv = ["-m", "virtualenv", "--never-download"]
else:
venv = ["-m", "venv"]
call(executable(self.version) + venv + [self.path], cwd=self.empty_folder)
def upgrade_pip(self):
'Make sure that pip has been upgraded to the latest version'
if self.pip_upgraded:
return
self.pip([ "install", "--upgrade", "pip"])
self.pip_upgraded = True
def pip(self, args):
call([self.venv_executable, "-m", "pip"] + args, cwd=self.empty_folder)
@property
def lib(self):
if self._lib is None:
try:
tools = os.path.join(os.environ['SEMMLE_DIST'], "tools")
get_venv_lib = os.path.join(tools, "get_venv_lib.py")
if os.path.exists(self.venv_executable):
python_executable = [self.venv_executable]
else:
python_executable = executable(self.version)
args = python_executable + [get_venv_lib]
print("Calling " + " ".join(args))
sys.stdout.flush()
sys.stderr.flush()
self._lib = subprocess.check_output(args)
if sys.version_info >= (3,):
self._lib = str(self._lib, sys.getfilesystemencoding())
self._lib = self._lib.rstrip("\r\n")
except Exception:
lib_ext = ["Lib"] if WIN else [ "lib" ]
self._lib = os.path.join(self.path, *lib_ext)
print('Error trying to run get_venv_lib (this is Python {})'.format(sys.version[:5]))
print_exception_indented()
return self._lib
def venv_path():
return os.path.join(os.environ["LGTM_WORKSPACE"], "venv")
def system_packages(version):
output = subprocess.check_output(executable(version) + [ "-c", "import sys; print(sys.path)"])
if sys.version_info >= (3,):
output = str(output, sys.getfilesystemencoding())
paths = ast.literal_eval(output.strip())
return [ path for path in paths if ("dist-packages" in path or "site-packages" in path) ]
REQUIREMENTS_TAG = "LGTM_PYTHON_SETUP_REQUIREMENTS"
EXCLUDE_REQUIREMENTS_TAG = "LGTM_PYTHON_SETUP_EXCLUDE_REQUIREMENTS"
def main(version, root, requirement_files):
# We import `auto_install` here, as it has a dependency on the `packaging`
# module. For the CodeQL CLI (where we do not install any packages) we never
# run the `main` function, and so there is no need to always import this
# dependency.
from buildtools import auto_install
print("version, root, requirement_files", version, root, requirement_files)
venv = Venv(venv_path(), version)
venv.create()
if REQUIREMENTS_TAG in os.environ:
if not auto_install.install(os.environ[REQUIREMENTS_TAG], venv):
sys.exit(1)
requirements_from_setup = os.path.join(os.environ["LGTM_WORKSPACE"], "setup_requirements.txt")
args = [ venv.venv_executable, os.path.join(os.environ["SEMMLE_DIST"], "tools", "convert_setup.py"), root, requirements_from_setup] + system_packages(version)
print("Calling " + " ".join(args))
sys.stdout.flush()
sys.stderr.flush()
#We don't care if this fails, we only care if `requirements_from_setup` was created.
subprocess.call(args)
if os.path.exists(requirements_from_setup):
requirement_files = [ requirements_from_setup ] + requirement_files[1:]
print("Requirement files: " + str(requirement_files))
requirements = unify_requirements.gather(requirement_files)
if EXCLUDE_REQUIREMENTS_TAG in os.environ:
excludes = os.environ[EXCLUDE_REQUIREMENTS_TAG].splitlines()
print("Excluding ", excludes)
regex = re.compile("|".join(exclude + r'\b' for exclude in excludes))
requirements = [ req for req in requirements if not regex.match(req) ]
err = 0 if auto_install.install(requirements, venv) else 1
sys.exit(err)
def get_library(version):
return Venv(venv_path(), version).lib
if __name__ == "__main__":
version, root, requirement_files = sys.argv[1], sys.argv[2], sys.argv[3:]
version = int(version)
main(version, root, requirement_files)

View File
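As a side note on the exclusion handling in `main` above, here is a small, self-contained sketch (with made-up requirement names) of how the `LGTM_PYTHON_SETUP_EXCLUDE_REQUIREMENTS` regex behaves; the `\b` boundary means an exclude also matches version-pinned forms:

```python
# Sketch with assumed inputs: each exclude is matched as a prefix up to a word
# boundary, so "flask" drops "flask" and "flask==2.0" but not "flasktools".
import re

excludes = ["flask", "six"]
regex = re.compile("|".join(exclude + r"\b" for exclude in excludes))

requirements = ["flask==2.0", "flasktools", "six", "requests"]
print([req for req in requirements if not regex.match(req)])
# ['flasktools', 'requests']
```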

@@ -0,0 +1,136 @@
import copy
import tempfile
import re
from packaging.requirements import Requirement
from packaging.version import Version
from packaging.specifiers import SpecifierSet
IGNORED_REQUIREMENTS = re.compile("^(-e\\s+)?(git|svn|hg)(?:\\+.*)?://.*$")
def parse(lines):
'Parse a list of requirement strings into a list of `Requirement`s'
res = []
#Strip comments, skip empty lines and VCS URLs, then parse each requirement
for line in lines:
if '#' in line:
line, _ = line.split('#', 1)
if not line:
continue
if IGNORED_REQUIREMENTS.match(line):
continue
try:
req = Requirement(line)
except Exception:
print("Cannot parse requirements line '%s'" % line)
else:
res.append(req)
return res
def parse_file(filename):
with open(filename, 'r') as fd:
return parse(fd.read().splitlines())
def save_to_file(reqs):
'Takes a list of requirements, saves them to a temporary file and returns the filename'
with tempfile.NamedTemporaryFile(prefix="semmle-requirements", suffix=".txt", mode="w", delete=False) as fd:
for req in reqs:
if req.url is None:
fd.write(str(req))
else:
fd.write(req.url)
fd.write("\n")
return fd.name
def clean(reqs):
'Look for self-contradictory specifier groups and remove the necessary specifier parts to make them consistent'
result = []
for req in reqs:
specs = req.specifier
cleaned_specs = _clean_specs(specs)
req.specifier = cleaned_specs
result.append(Requirement(str(req)))
req.specifier = specs
return result
def _clean_specs(specs):
ok = SpecifierSet()
#Choose a deterministic order such that >= comes before <=.
for spec in sorted(iter(specs), key=str, reverse=True):
for ok_spec in ok:
if not _compatible_specifier(ok_spec, spec):
break
else:
ok &= SpecifierSet(str(spec))
return ok
def restrict(reqs):
'''Restrict versions to "compatible" versions.
For example restrict >=1.2 to all versions >= 1.2 that have 1 as the major version number.
>=N... becomes >=N...,==N.* and >N... requirements becomes >N..,==N.*
'''
#First of all clean the requirements
reqs = clean(reqs)
result = []
for req in reqs:
specs = req.specifier
req.specifier = _restrict_specs(specs)
result.append(Requirement(str(req)))
req.specifier = specs
return result
def _restrict_specs(specs):
restricted = copy.deepcopy(specs)
#Iteration order doesn't really matter here so we choose the
#same as for clean, just to be consistent
for spec in sorted(iter(specs), key=str, reverse=True):
if spec.operator in ('>', '>='):
base_version = spec.version.split(".", 1)[0]
restricted &= SpecifierSet('==' + base_version + '.*')
return restricted
def _compatible_specifier(s1, s2):
overlaps = 0
overlaps += _min_version(s1) in s2
overlaps += _max_version(s1) in s2
overlaps += _min_version(s2) in s1
overlaps += _max_version(s2) in s1
if overlaps > 1:
return True
if overlaps == 1:
#One overlap -- Generally compatible, but not for <x, >=x
return not _is_strict(s1) and not _is_strict(s2)
#overlaps == 0:
return False
MIN_VERSION = Version('0.0a0')
MAX_VERSION = Version('1000000')
def _min_version(s):
if s.operator in ('>', '>='):
return s.version
elif s.operator in ('<', '<=', '!='):
return MIN_VERSION
elif s.operator == '==':
v = s.version
if v[-1] == '*':
return v[:-1] + '0'
else:
return s.version
else:
# '~='
return s.version
def _max_version(s):
if s.operator in ('<', '<='):
return s.version
elif s.operator in ('>', '>=', '!='):
return MAX_VERSION
elif s.operator in ('~=', '=='):
v = s.version
if v[-1] == '*' or s.operator == '~=':
return v[:-1] + '1000000'
else:
return s.version
def _is_strict(s):
return s.operator in ('>', '<')

View File
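The effect of `restrict` above can be checked directly with `packaging`; a small worked example (with assumed version numbers):

```python
# ">=1.2" is narrowed to ">=1.2,==1.*": later 1.x releases stay in range, but a
# major-version bump falls out, as described in restrict()'s docstring.
from packaging.specifiers import SpecifierSet

restricted = SpecifierSet(">=1.2") & SpecifierSet("==1.*")
print("1.5" in restricted)  # True  -- a compatible 1.x release
print("2.0" in restricted)  # False -- excluded by ==1.*
print("1.1" in restricted)  # False -- below the lower bound
```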

@@ -0,0 +1,14 @@
# this is a setup file for `tox`, which allows us to run tests locally against multiple
# Python versions. Simply run `tox` in the directory of this file!
#
# install tox with `pipx install tox` or whatever your preferred way is :)
[tox]
envlist = py27,py3
skipsdist=True
[testenv]
# install <deps> in the virtualenv where commands will be executed
deps = pytest
commands =
pytest

View File

@@ -0,0 +1,36 @@
#!/usr/bin/env python
import os
import re
def get_requirements(file_path):
if not file_path:
return []
with open(file_path, "r") as requirements_file:
lines = requirements_file.read().splitlines()
for line_no, line in enumerate(lines):
match = re.search("^\\s*-r\\s+([^#]+)", line)
if match:
include_file_path = os.path.join(os.path.dirname(file_path), match.group(1).strip())
include_requirements = get_requirements(include_file_path)
lines[line_no:line_no+1] = include_requirements
return lines
def deduplicate(requirements):
result = []
seen = set()
for req in requirements:
if req in seen:
continue
result.append(req)
seen.add(req)
return result
def gather(requirement_files):
requirements = []
for file in requirement_files:
requirements += get_requirements(file)
requirements = deduplicate(requirements)
print("Requirements:")
for r in requirements:
print(" {}".format(r))
return requirements

View File
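To illustrate the `-r` include handling in `get_requirements` above, here is an in-memory sketch (with made-up requirements) of the two steps performed by `gather`: splice included lines in place of the `-r` line, then drop duplicates while keeping order:

```python
# In-memory sketch: the same list-slice splice used by get_requirements(),
# followed by deduplicate()'s first-occurrence-wins filtering.
lines = ["-r base.txt", "flask", "requests"]
included = ["requests"]  # pretend this is the content of base.txt

lines[0:1] = included    # splice the include in place of the "-r" line
print(lines)             # ['requests', 'flask', 'requests']

seen, result = set(), []
for req in lines:        # deduplicate(): keep only the first of each
    if req not in seen:
        result.append(req)
        seen.add(req)
print(result)            # ['requests', 'flask']
```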

@@ -0,0 +1,223 @@
import sys
import os
import subprocess
import tokenize
import re
from buildtools.helper import print_exception_indented
TROVE = re.compile(r"Programming Language\s+::\s+Python\s+::\s+(\d)")
if sys.version_info > (3,):
import collections.abc as collections
file_open = tokenize.open
else:
import collections
file_open = open
WIN = sys.platform == "win32"
if WIN:
# installing `py` launcher is optional when installing Python on windows, so it's
# possible that the user did not install it, see
# https://github.com/github/codeql-cli-binaries/issues/125#issuecomment-1157429430
# so we check whether it has been installed. Newer versions have a `--list` option,
# but that has only been mentioned in the docs since 3.9, so to not risk it not
# working on potential older versions, we'll just use `py --version` which forwards
# the `--version` argument to the default python executable.
try:
subprocess.check_call(["py", "--version"])
except Exception:  # also covers subprocess.CalledProcessError
sys.stderr.write("The `py` launcher is required for CodeQL to work on Windows.\n")
sys.stderr.write("Please include it when installing Python for Windows.\n")
sys.stderr.write("See https://docs.python.org/3/using/windows.html#python-launcher-for-windows\n")
sys.stderr.flush()
sys.exit(4) # 4 was a unique exit code at the time of writing
AVAILABLE_VERSIONS = []
def set_available_versions():
"""Sets the global `AVAILABLE_VERSIONS` to a list of available (major) Python versions."""
global AVAILABLE_VERSIONS
if AVAILABLE_VERSIONS:
return # already set
for version in [3, 2]:
try:
subprocess.check_call(" ".join(executable_name(version) + ["-c", "pass"]), shell=True)
AVAILABLE_VERSIONS.append(version)
except Exception:
pass # If not available, we simply don't add it to the list
if not AVAILABLE_VERSIONS:
# If neither 'python3' nor 'python2' is available, we'll just try 'python' and hope for the best
AVAILABLE_VERSIONS = ['']
def executable(version):
"""Returns the executable to use for the given Python version."""
global AVAILABLE_VERSIONS
set_available_versions()
if version not in AVAILABLE_VERSIONS:
available_version = AVAILABLE_VERSIONS[0]
print("Wanted to run Python %s, but it is not available. Using Python %s instead" % (version, available_version))
version = available_version
return executable_name(version)
def executable_name(version):
if WIN:
return ["py", "-%s" % version]
else:
return ["python%s" % version]
PREFERRED_PYTHON_VERSION = None
def extractor_executable():
'''
Returns the executable to use for the extractor.
If a Python executable name is specified using the extractor option, returns that name.
In the absence of a user-specified executable name, returns the executable name for
Python 3 if it is available, and Python 2 if not.
'''
executable_name = os.environ.get("CODEQL_EXTRACTOR_PYTHON_OPTION_PYTHON_EXECUTABLE_NAME", None)
if executable_name is not None:
print("Using Python executable name provided via the python_executable_name extractor option: {}"
.format(executable_name)
)
return [executable_name]
# Call machine_version() to ensure we've set PREFERRED_PYTHON_VERSION
if PREFERRED_PYTHON_VERSION is None:
machine_version()
return executable(PREFERRED_PYTHON_VERSION)
def machine_version():
"""If only Python 2 or Python 3 is installed, will return that version"""
global PREFERRED_PYTHON_VERSION
print("Trying to guess Python version based on installed versions")
if sys.version_info > (3,):
this, other = 3, 2
else:
this, other = 2, 3
try:
exe = executable(other)
# We need `shell=True` here in order for the test framework to function correctly. For
# whatever reason, the `PATH` variable is ignored if `shell=False`.
# Also, this in turn forces us to give the whole command as a string, rather than a list.
# Otherwise, the effect is that the Python interpreter is invoked _as a REPL_, rather than
# with the given piece of code.
subprocess.check_call(" ".join(exe + [ "-c", "pass" ]), shell=True)
print("This script is running Python {}, but Python {} is also available (as '{}')"
.format(this, other, ' '.join(exe))
)
# If both versions are available, our preferred version is Python 3
PREFERRED_PYTHON_VERSION = 3
return None
except Exception:
print("Only Python {} installed -- will use that version".format(this))
PREFERRED_PYTHON_VERSION = this
return this
def trove_version(root):
print("Trying to guess Python version based on Trove classifiers in setup.py")
try:
full_path = os.path.join(root, "setup.py")
if not os.path.exists(full_path):
print("Did not find setup.py (expected it to be at {})".format(full_path))
return None
versions = set()
with file_open(full_path) as fd:
contents = fd.read()
for match in TROVE.finditer(contents):
versions.add(int(match.group(1)))
if 2 in versions and 3 in versions:
print("Found Trove classifiers for both Python 2 and Python 3 in setup.py -- will use Python 3")
return 3
elif len(versions) == 1:
result = versions.pop()
print("Found Trove classifier for Python {} in setup.py -- will use that version".format(result))
return result
else:
print("Found no Trove classifiers for Python in setup.py")
except Exception:
print("Skipping due to exception:")
print_exception_indented()
return None
def wrap_with_list(x):
if isinstance(x, collections.Iterable) and not isinstance(x, str):
return x
else:
return [x]
def travis_version(root):
print("Trying to guess Python version based on travis file")
try:
full_paths = [os.path.join(root, filename) for filename in [".travis.yml", "travis.yml"]]
travis_file_paths = [path for path in full_paths if os.path.exists(path)]
if not travis_file_paths:
print("Did not find any travis files (expected them at either {})".format(full_paths))
return None
try:
import yaml
except ImportError:
print("Found a travis file, but yaml library not available")
return None
with open(travis_file_paths[0]) as travis_file:
travis_yaml = yaml.safe_load(travis_file)
if "python" in travis_yaml:
versions = wrap_with_list(travis_yaml["python"])
else:
versions = []
# 'matrix' is an alias for 'jobs' now (https://github.com/travis-ci/docs-travis-ci-com/issues/1500)
# If both are defined, only the last defined will be used.
if "matrix" in travis_yaml and "jobs" in travis_yaml:
print("Ignoring 'matrix' and 'jobs' in Travis file, since they are both defined (only one of them should be).")
else:
matrix = travis_yaml.get("matrix") or travis_yaml.get("jobs") or dict()
includes = matrix.get("include") or []
for include in includes:
if "python" in include:
versions.extend(wrap_with_list(include["python"]))
found = set()
for version in versions:
# Yaml may convert version strings to numbers, convert them back.
version = str(version)
if version.startswith("2"):
found.add(2)
if version.startswith("3"):
found.add(3)
if len(found) == 1:
result = found.pop()
print("Only found Python {} in travis file -- will use that version".format(result))
return result
elif len(found) == 2:
print("Found both Python 2 and Python 3 being used in travis file -- ignoring")
else:
print("Found no Python being used in travis file")
except Exception:
print("Skipping due to exception:")
print_exception_indented()
return None
VERSION_TAG = "LGTM_PYTHON_SETUP_VERSION"
def best_version(root, default):
if VERSION_TAG in os.environ:
try:
return int(os.environ[VERSION_TAG])
except ValueError:
raise SyntaxError("Illegal value for " + VERSION_TAG)
print("Will try to guess Python version, as it was not specified in `lgtm.yml`")
version = trove_version(root) or travis_version(root) or machine_version()
if version is None:
version = default
print("Could not guess Python version, will use default: Python {}".format(version))
return version

View File
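A quick way to see what `trove_version` above extracts: run the `TROVE` regex against a snippet of classifier text (the `setup.py` contents here are made up):

```python
# Only the major version digit is captured, so "3.8" contributes 3.
import re

TROVE = re.compile(r"Programming Language\s+::\s+Python\s+::\s+(\d)")
contents = '''
classifiers=[
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 3.8",
],
'''
print({int(m.group(1)) for m in TROVE.finditer(contents)})  # {2, 3}
```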

@@ -0,0 +1,5 @@
*/db/
*/dbs/
*/venv/
**/*.egg-info/
*/.cache

View File

@@ -0,0 +1,21 @@
# Extractor Python CodeQL CLI integration tests
These tests ensure that the Python extractor and the CodeQL CLI work together as intended, and provide an easy way to set up realistic test cases.
### Adding a new test case
Add a new folder and place a file called `test.sh` in it that starts with the code below. The script should exit with a non-zero exit code to fail the test.
```bash
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
```

View File

@@ -0,0 +1 @@
select 1

View File

@@ -0,0 +1 @@
print(42)

View File

@@ -0,0 +1,15 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
rm -rf db
$CODEQL database create db --language python --source-root repo_dir/
$CODEQL query run --database db query.ql

View File

@@ -0,0 +1,3 @@
import pip
print(42)

View File

@@ -0,0 +1,44 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
# start on clean slate
rm -rf dbs
mkdir dbs
cd "$SCRIPTDIR"
# In 2.16.0 we will not extract libraries by default, so there is no difference in what
# is extracted by setting this environment variable. We should remove this test when
# 2.17.0 is released.
export CODEQL_EXTRACTOR_PYTHON_DISABLE_LIBRARY_EXTRACTION=
$CODEQL database create dbs/normal --language python --source-root repo_dir/
export CODEQL_EXTRACTOR_PYTHON_DISABLE_LIBRARY_EXTRACTION=1
$CODEQL database create dbs/no-lib-extraction --language python --source-root repo_dir/
# ---
set +x
EXTRACTED_NORMAL=$(unzip -l dbs/normal/src.zip | wc -l)
EXTRACTED_NO_LIB_EXTRACTION=$(unzip -l dbs/no-lib-extraction/src.zip | wc -l)
exitcode=0
echo "EXTRACTED_NORMAL=$EXTRACTED_NORMAL"
echo "EXTRACTED_NO_LIB_EXTRACTION=$EXTRACTED_NO_LIB_EXTRACTION"
if [[ $EXTRACTED_NO_LIB_EXTRACTION -lt $EXTRACTED_NORMAL ]]; then
echo "ERROR: EXTRACTED_NO_LIB_EXTRACTION smaller than EXTRACTED_NORMAL"
exitcode=1
fi
exit $exitcode

View File

@@ -0,0 +1,18 @@
import python
import semmle.python.types.Builtins
predicate named_entity(string name, string kind) {
exists(Builtin::special(name)) and kind = "special"
or
exists(Builtin::builtin(name)) and kind = "builtin"
or
exists(Module m | m.getName() = name) and kind = "module"
or
exists(File f | f.getShortName() = name + ".py") and kind = "file"
}
from string name, string kind
where
name in ["foo", "baz", "main", "os", "sys", "re"] and
named_entity(name, kind)
select name, kind order by name, kind

View File

@@ -0,0 +1,12 @@
| name | kind |
+------+---------+
| baz | file |
| baz | module |
| foo | file |
| foo | module |
| main | file |
| os | file |
| os | module |
| re | file |
| re | module |
| sys | special |

View File

@@ -0,0 +1,8 @@
| name | kind |
+------+---------+
| baz | file |
| baz | module |
| foo | file |
| foo | module |
| main | file |
| sys | special |

View File

@@ -0,0 +1 @@
quux = 4

View File

@@ -0,0 +1,4 @@
import baz
import re
bar = 5 + baz.quux
re.compile("hello")

View File

@@ -0,0 +1,6 @@
import sys
import os
print(os.path)
print(sys.path)
import foo
print(foo.bar)

View File

@@ -0,0 +1,22 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
rm -rf dbs
mkdir dbs
CODEQL_EXTRACTOR_PYTHON_DONT_EXTRACT_STDLIB=True $CODEQL database create dbs/without-stdlib --language python --source-root repo_dir/
$CODEQL query run --database dbs/without-stdlib query.ql > query.without-stdlib.actual
diff query.without-stdlib.expected query.without-stdlib.actual
LGTM_INDEX_EXCLUDE="/usr/lib/**" $CODEQL database create dbs/with-stdlib --language python --source-root repo_dir/
$CODEQL query run --database dbs/with-stdlib query.ql > query.with-stdlib.actual
diff query.with-stdlib.expected query.with-stdlib.actual

View File

@@ -0,0 +1,3 @@
import pip
print(42)

View File

@@ -0,0 +1,41 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
# start on clean slate
rm -rf dbs
mkdir dbs
cd "$SCRIPTDIR"
export CODEQL_EXTRACTOR_PYTHON_FORCE_ENABLE_LIBRARY_EXTRACTION_UNTIL_2_17_0=
$CODEQL database create dbs/normal --language python --source-root repo_dir/
export CODEQL_EXTRACTOR_PYTHON_FORCE_ENABLE_LIBRARY_EXTRACTION_UNTIL_2_17_0=1
$CODEQL database create dbs/with-lib-extraction --language python --source-root repo_dir/
# ---
set +x
EXTRACTED_NORMAL=$(unzip -l dbs/normal/src.zip | wc -l)
EXTRACTED_WITH_LIB_EXTRACTION=$(unzip -l dbs/with-lib-extraction/src.zip | wc -l)
exitcode=0
echo "EXTRACTED_NORMAL=$EXTRACTED_NORMAL"
echo "EXTRACTED_WITH_LIB_EXTRACTION=$EXTRACTED_WITH_LIB_EXTRACTION"
if [[ ! $EXTRACTED_WITH_LIB_EXTRACTION -gt $EXTRACTED_NORMAL ]]; then
echo "ERROR: EXTRACTED_WITH_LIB_EXTRACTION not greater than EXTRACTED_NORMAL"
exitcode=1
fi
exit $exitcode

View File

@@ -0,0 +1,2 @@
venv/
venv2/

View File

@@ -0,0 +1,3 @@
import flask
print(42)

View File

@@ -0,0 +1,79 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
# start on clean slate
rm -rf dbs repo_dir/venv*
mkdir dbs
# set up venvs
cd repo_dir
python3 -m venv venv
venv/bin/pip install flask
python3 -m venv venv2
cd "$SCRIPTDIR"
# In 2.16.0 we stop extracting libraries by default, so to test this functionality we
# need to force enable it. Once we release 2.17.0 and turn off library extraction for
# good, we can remove the part of this test ensuring that dependencies in an active
# venv are still extracted (since that will no longer be the case).
export CODEQL_EXTRACTOR_PYTHON_FORCE_ENABLE_LIBRARY_EXTRACTION_UNTIL_2_17_0=1
# Create DBs with venv2 active (that does not have flask installed)
source repo_dir/venv2/bin/activate
export CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_VENV_EXCLUDE=
$CODEQL database create dbs/normal --language python --source-root repo_dir/
export CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_VENV_EXCLUDE=1
$CODEQL database create dbs/no-venv-ignore --language python --source-root repo_dir/
# Create DB with venv active that has flask installed. We want to ensure that we're
# still able to resolve imports to flask, but don't want to extract EVERYTHING from
# within the venv. An important note is that the test file in repo_dir actually
# imports flask :D
source repo_dir/venv/bin/activate
export CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_VENV_EXCLUDE=
$CODEQL database create dbs/normal-with-flask-venv --language python --source-root repo_dir/
# ---
set +x
EXTRACTED_NORMAL=$(unzip -l dbs/normal/src.zip | wc -l)
EXTRACTED_NO_VENV_IGNORE=$(unzip -l dbs/no-venv-ignore/src.zip | wc -l)
EXTRACTED_ACTIVE_FLASK=$(unzip -l dbs/normal-with-flask-venv/src.zip | wc -l)
exitcode=0
echo "EXTRACTED_NORMAL=$EXTRACTED_NORMAL"
echo "EXTRACTED_NO_VENV_IGNORE=$EXTRACTED_NO_VENV_IGNORE"
echo "EXTRACTED_ACTIVE_FLASK=$EXTRACTED_ACTIVE_FLASK"
if [[ ! $EXTRACTED_NORMAL -lt $EXTRACTED_NO_VENV_IGNORE ]]; then
echo "ERROR: EXTRACTED_NORMAL not smaller EXTRACTED_NO_VENV_IGNORE"
exitcode=1
fi
if [[ ! $EXTRACTED_NORMAL -lt $EXTRACTED_ACTIVE_FLASK ]]; then
echo "ERROR: EXTRACTED_NORMAL not smaller EXTRACTED_ACTIVE_FLASK"
exitcode=1
fi
if [[ ! $EXTRACTED_ACTIVE_FLASK -lt $EXTRACTED_NO_VENV_IGNORE ]]; then
echo "ERROR: EXTRACTED_ACTIVE_FLASK not smaller EXTRACTED_NO_VENV_IGNORE"
exitcode=1
fi
exit $exitcode

View File

@@ -0,0 +1,2 @@
repo_dir/build/
dbs/

View File

@@ -0,0 +1,12 @@
from setuptools import find_packages, setup
# using src/ folder as recommended in: https://blog.ionelmc.ro/2014/05/25/python-packaging/
setup(
name="example_pkg",
version="0.0.1",
description="example",
packages=find_packages("src"),
package_dir={"": "src"},
install_requires=[],
)

View File

@@ -0,0 +1,45 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
NUM_PYTHON_FILES_IN_REPO=$(find repo_dir/src/ -name '*.py' | wc -l)
rm -rf venv dbs
mkdir dbs
python3 -m venv venv
source venv/bin/activate
pip install --upgrade 'pip>=21.3'
cd repo_dir
pip install .
cd "$SCRIPTDIR"
export CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_PIP_BUILD_DIR_EXCLUDE=
$CODEQL database create dbs/normal --language python --source-root repo_dir/
export CODEQL_EXTRACTOR_PYTHON_DISABLE_AUTOMATIC_PIP_BUILD_DIR_EXCLUDE=1
$CODEQL database create dbs/with-build-dir --language python --source-root repo_dir/
EXTRACTED_NORMAL=$(unzip -l dbs/normal/src.zip | wc -l)
EXTRACTED_WITH_BUILD=$(unzip -l dbs/with-build-dir/src.zip | wc -l)
if [[ $((EXTRACTED_NORMAL + NUM_PYTHON_FILES_IN_REPO)) == $EXTRACTED_WITH_BUILD ]]; then
echo "Numbers add up"
else
echo "Numbers did not add up"
echo "NUM_PYTHON_FILES_IN_REPO=$NUM_PYTHON_FILES_IN_REPO"
echo "EXTRACTED_NORMAL=$EXTRACTED_NORMAL"
echo "EXTRACTED_WITH_BUILD=$EXTRACTED_WITH_BUILD"
exit 1
fi

View File

@@ -0,0 +1,5 @@
| name |
+----------+
| dircache |
| stat |
| test |

View File

@@ -0,0 +1,5 @@
| name |
+----------+
| dircache |
| stat |
| test |

View File

@@ -0,0 +1,18 @@
import python
import semmle.python.types.Builtins
predicate named_entity(string name, string kind) {
exists(Builtin::special(name)) and kind = "special"
or
exists(Builtin::builtin(name)) and kind = "builtin"
or
exists(Module m | m.getName() = name) and kind = "module"
or
exists(File f | f.getShortName() = name + ".py") and kind = "file"
}
from string name
where
name in ["dircache", "test", "stat"] and
named_entity(name, "file")
select name order by name

View File

@@ -0,0 +1,4 @@
| name |
+------+
| stat |
| test |

View File

@@ -0,0 +1 @@
"Programming Language :: Python :: 2"

View File

@@ -0,0 +1,5 @@
# `dircache` was removed in Python 3, and so is a good test of which standard library we're
# extracting.
import dircache
# A module that's present in both Python 2 and 3
import stat

View File

@@ -0,0 +1,35 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
rm -rf dbs
rm -f *.actual
mkdir dbs
# NB: on our Linux CI infrastructure, `python` is aliased to `python3`.
WITHOUT_PYTHON2=$(pwd)/without-python2
WITHOUT_PYTHON3=$(pwd)/without-python3
echo "Test 1: Only Python 2 is available. Should fail."
# Note the negation at the start of the command.
! PATH="$WITHOUT_PYTHON3:$PATH" $CODEQL database create dbs/only-python2-no-flag --language python --source-root repo_dir/
echo "Test 2: Only Python 3 is available. Should extract using Python 3 and use the Python 3 standard library."
PATH="$WITHOUT_PYTHON2:$PATH" $CODEQL database create dbs/without-python2 --language python --source-root repo_dir/
$CODEQL query run --database dbs/without-python2 query.ql > query.without-python2.actual
diff query.without-python2.expected query.without-python2.actual
echo "Test 3: Python 2 and 3 are both available. Should extract using Python 3, but use the Python 2 standard library."
$CODEQL database create dbs/python2-using-python3 --language python --source-root repo_dir/
$CODEQL query run --database dbs/python2-using-python3 query.ql > query.python2-using-python3.actual
diff query.python2-using-python3.expected query.python2-using-python3.actual
rm -f *.actual

View File

@@ -0,0 +1,4 @@
echo "Attempted to run:"
echo " python2 $@"
echo "Failing instead."
exit 127

View File

@@ -0,0 +1,6 @@
#!/bin/bash -p
case $1 in
python2) exit 1;;
*) command /usr/bin/which -- "$1";;
esac

View File

@@ -0,0 +1,4 @@
echo "Attempted to run:"
echo " python $@"
echo "Failing instead."
exit 127

View File

@@ -0,0 +1,4 @@
echo "Attempted to run:"
echo " python3 $@"
echo "Failing instead."
exit 127

View File

@@ -0,0 +1,9 @@
#!/bin/bash -p
echo "Fake which called with arguments: $@"
case $1 in
python) exit 1;;
python3) exit 1;;
*) command /usr/bin/which -- "$1";;
esac

View File

@@ -0,0 +1,28 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
failures=()
for f in */test.sh; do
echo "Running $f:"
if ! bash "$f"; then
echo "ERROR: $f failed"
failures+=("$f")
fi
echo "---"
done
if [ -z "${failures[*]}" ]; then
echo "All integration tests passed!"
exit 0
else
echo "ERROR: Some integration test failed! Failures:"
for failure in "${failures[@]}"
do
echo "- ${failure}"
done
exit 1
fi

View File

@@ -0,0 +1,18 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
rm -rf db
# even with a default encoding that doesn't support utf-8 (as on windows) we want to
# ensure that we can properly log that we've extracted files whose filenames contain
# utf-8 chars
export PYTHONIOENCODING="ascii"
$CODEQL database create db --language python --source-root repo_dir/

View File

@@ -0,0 +1,2 @@
repo_dir/subdir
repo_dir/symlink_to_top

View File

@@ -0,0 +1 @@
select 1

View File

@@ -0,0 +1 @@
print(42)

View File

@@ -0,0 +1,27 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
rm -rf db
# create two symlink loops, so that
# - repo_dir/subdir/symlink_to_top -> repo_dir
# - repo_dir/symlink_to_top -> repo_dir
# such a setup was seen in https://github.com/PowerDNS/weakforced
rm -rf repo_dir/subdir
mkdir repo_dir/subdir
ln -s .. repo_dir/subdir/symlink_to_top
rm -f repo_dir/symlink_to_top
ln -s . repo_dir/symlink_to_top
timeout --verbose 15s $CODEQL database create db --language python --source-root repo_dir/
$CODEQL query run --database db query.ql

View File

@@ -0,0 +1,163 @@
{
"attributes": {
"args": [
"Syntax Error"
],
"traceback": [
"\"semmle/python/modules.py\", line 108, in py_ast",
"\"semmle/python/modules.py\", line 102, in old_py_ast",
"\"semmle/python/parser/__init__.py\", line 100, in parse",
"\"semmleFile \"<string>\", line 1",
"\"semmle/python/extractor.py\", line 84, in process_source_module",
"\"semmle/python/modules.py\", line 92, in ast",
"\"semmle/python/modules.py\", line 120, in py_ast",
"\"semmle/python/modules.py\", line 117, in py_ast",
"\"semmle/python/parser/tsg_parser.py\", line 221, in parse",
"\"semmleFile \"<string>\", line 1"
]
},
"location": {
"file": "<test-root-directory>/repo_dir/syntaxerror3.py",
"startColumn": 0,
"endColumn": 0,
"startLine": 1,
"endLine": 1
},
"markdownMessage": "A parse error occurred while processing `<test-root-directory>/repo_dir/syntaxerror3.py`, and as a result this file could not be analyzed. Check the syntax of the file using the `python -m py_compile` command and correct any invalid syntax.",
"severity": "warning",
"source": {
"extractorName": "python",
"id": "py/diagnostics/syntax-error",
"name": "Could not process some files due to syntax errors"
},
"timestamp": "2023-03-13T15:03:48.177832",
"visibility": {
"cliSummaryTable": true,
"statusPage": true,
"telemetry": true
}
}
{
"attributes": {
"args": [
"Syntax Error"
],
"traceback": [
"\"semmle/python/modules.py\", line 108, in py_ast",
"\"semmle/python/modules.py\", line 102, in old_py_ast",
"\"semmle/python/parser/__init__.py\", line 100, in parse",
"\"semmleFile \"<string>\", line 3",
"\"semmle/python/extractor.py\", line 84, in process_source_module",
"\"semmle/python/modules.py\", line 92, in ast",
"\"semmle/python/modules.py\", line 120, in py_ast",
"\"semmle/python/modules.py\", line 117, in py_ast",
"\"semmle/python/parser/tsg_parser.py\", line 221, in parse",
"\"semmleFile \"<string>\", line 3"
]
},
"location": {
"file": "<test-root-directory>/repo_dir/syntaxerror1.py",
"startColumn": 0,
"endColumn": 0,
"startLine": 3,
"endLine": 3
},
"markdownMessage": "A parse error occurred while processing `<test-root-directory>/repo_dir/syntaxerror1.py`, and as a result this file could not be analyzed. Check the syntax of the file using the `python -m py_compile` command and correct any invalid syntax.",
"severity": "warning",
"source": {
"extractorName": "python",
"id": "py/diagnostics/syntax-error",
"name": "Could not process some files due to syntax errors"
},
"timestamp": "2023-03-13T15:03:48.181384",
"visibility": {
"cliSummaryTable": true,
"statusPage": true,
"telemetry": true
}
}
{
"attributes": {
"args": [
"Syntax Error"
],
"traceback": [
"\"semmle/python/modules.py\", line 108, in py_ast",
"\"semmle/python/modules.py\", line 102, in old_py_ast",
"\"semmle/python/parser/__init__.py\", line 100, in parse",
"\"semmleFile \"<string>\", line 6",
"\"semmle/python/extractor.py\", line 84, in process_source_module",
"\"semmle/python/modules.py\", line 92, in ast",
"\"semmle/python/modules.py\", line 120, in py_ast",
"\"semmle/python/modules.py\", line 117, in py_ast",
"\"semmle/python/parser/tsg_parser.py\", line 221, in parse",
"\"semmleFile \"<string>\", line 5"
]
},
"location": {
"file": "<test-root-directory>/repo_dir/syntaxerror2.py",
"startColumn": 0,
"endColumn": 0,
"startLine": 5,
"endLine": 5
},
"markdownMessage": "A parse error occurred while processing `<test-root-directory>/repo_dir/syntaxerror2.py`, and as a result this file could not be analyzed. Check the syntax of the file using the `python -m py_compile` command and correct any invalid syntax.",
"severity": "warning",
"source": {
"extractorName": "python",
"id": "py/diagnostics/syntax-error",
"name": "Could not process some files due to syntax errors"
},
"timestamp": "2023-03-13T15:03:48.164991",
"visibility": {
"cliSummaryTable": true,
"statusPage": true,
"telemetry": true
}
}
{
"attributes": {
"args": [
"maximum recursion depth exceeded while calling a Python object"
],
"traceback": [
"\"semmle/worker.py\", line 235, in _extract_loop",
"\"semmle/extractors/super_extractor.py\", line 37, in process",
"\"semmle/extractors/py_extractor.py\", line 43, in process",
"\"semmle/python/extractor.py\", line 227, in process_source_module",
"\"semmle/python/extractor.py\", line 84, in process_source_module",
"\"semmle/python/modules.py\", line 96, in ast",
"\"semmle/python/passes/labeller.py\", line 85, in apply",
"\"semmle/python/passes/labeller.py\", line 44, in __init__",
"\"semmle/python/passes/labeller.py\", line 14, in __init__",
"\"semmle/python/passes/ast_pass.py\", line 208, in visit",
"\"semmle/python/passes/ast_pass.py\", line 216, in generic_visit",
"\"semmle/python/passes/ast_pass.py\", line 213, in generic_visit",
"\"semmle/python/passes/ast_pass.py\", line 208, in visit",
"\"semmle/python/passes/ast_pass.py\", line 213, in generic_visit",
"\"semmle/python/passes/ast_pass.py\", line 208, in visit",
"... 3930 lines skipped",
"\"semmle/python/passes/ast_pass.py\", line 213, in generic_visit",
"\"semmle/python/passes/ast_pass.py\", line 208, in visit",
"\"semmle/python/passes/ast_pass.py\", line 213, in generic_visit",
"\"semmle/python/passes/ast_pass.py\", line 208, in visit",
"\"semmle/python/passes/ast_pass.py\", line 205, in _get_visit_method"
]
},
"location": {
"file": "<test-root-directory>/repo_dir/recursion_error.py"
},
"plaintextMessage": "maximum recursion depth exceeded while calling a Python object",
"severity": "error",
"source": {
"extractorName": "python",
"id": "py/diagnostics/recursion-error",
"name": "Recursion error in Python extractor"
},
"timestamp": "2023-03-13T15:03:47.468924",
"visibility": {
"cliSummaryTable": false,
"statusPage": false,
"telemetry": true
}
}

View File

@@ -0,0 +1,4 @@
# Creates a test file that will cause a RecursionError when run with the Python extractor.
with open('repo_dir/recursion_error.py', 'w') as f:
f.write("print({})\n".format("+".join(["1"] * 1000)))

View File

@@ -0,0 +1,6 @@
| filename |
+-----------------+
| safe.py |
| syntaxerror1.py |
| syntaxerror2.py |
| syntaxerror3.py |

View File

@@ -0,0 +1,3 @@
import python
select any(File f).getShortName() as filename order by filename

View File

@@ -0,0 +1 @@
print("No deeply nested structures here!")

View File

@@ -0,0 +1,3 @@
# This file contains a deliberate syntax error
2 +

View File

@@ -0,0 +1,25 @@
#!/bin/bash
set -Eeuo pipefail # see https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -x
CODEQL=${CODEQL:-codeql}
SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$SCRIPTDIR"
rm -rf db
rm -f *.actual
python3 make_test.py
echo "Testing database with various errors during extraction"
$CODEQL database create db --language python --source-root repo_dir/
$CODEQL query run --database db query.ql > query.actual
diff query.expected query.actual
python3 test_diagnostics_output.py
rm -f *.actual
rm -f repo_dir/recursion_error.py
rm -rf db

View File

@@ -0,0 +1,7 @@
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), "..", "..", "..", "integration-tests"))
import diagnostics_test_utils
test_db = "db"
diagnostics_test_utils.check_diagnostics(".", test_db, skip_attributes=True)

View File

@@ -0,0 +1,126 @@
#!/usr/bin/env python
import os.path
import imp
import sys
import traceback
import re
SETUP_TAG = "LGTM_PYTHON_SETUP_SETUP_PY"
setup_file_path = "<default value>"
requirements_file_path = "<default value>"
if sys.version_info >= (3,):
basestring = str
def setup_interceptor(**args):
requirements = make_requirements(**args)
write_requirements_file(requirements)
def make_requirements(requires=(), install_requires=(), extras_require={}, dependency_links=[], **other_args):
# Install main requirements.
requirements = list(requires) + list(install_requires)
# Install requirements for all features.
for feature, feature_requirements in extras_require.items():
if isinstance(feature_requirements, basestring):
requirements += [feature_requirements]
else:
requirements += list(feature_requirements)
# Attempt to use dependency_links to find requirements first.
for link in dependency_links:
split_link = link.rsplit("#egg=", 1)
if len(split_link) != 2:
print("Invalid dependency link \"%s\" was ignored." % link)
continue
if not link.startswith("http"):
print("Dependency link \"%s\" is not an HTTP link so is being ignored." % link)
continue
package_name = split_link[1].rsplit("-", 1)[0]
for index, requirement in enumerate(requirements):
if requirement_name(requirement) == package_name:
print("Using %s to install %s." % (link, requirement))
requirements[index] = package_name + " @ " + link
print("Creating %s file from %s." % (requirements_file_path, setup_file_path))
requirements = [requirement.encode("ascii", "ignore").strip().decode("ascii") for requirement in requirements]
print("Requirements extracted from setup.py: %s" % requirements)
return requirements
REQUIREMENT = re.compile(r"^([\w-]+)")
def requirement_name(req_string):
req_string = req_string.strip()
if req_string[0] == '#':
return None
match = REQUIREMENT.match(req_string)
if match:
return match.group(1)
return None
def write_requirements_file(requirements):
if os.path.exists(requirements_file_path):
# Only overwrite the existing requirements if the new requirements are not empty.
if requirements:
print("%s already exists. It will be overwritten." % requirements_file_path)
else:
print("%s already exists and it will not be overwritten because the new requirements list is empty." % requirements_file_path)
return
elif not requirements:
print("%s will not be written because the new requirements list is empty." % requirements_file_path)
return
with open(requirements_file_path, "w") as requirements_file:
for requirement in requirements:
requirements_file.write(requirement + "\n")
print("Requirements have been written to " + requirements_file_path)
def convert_setup_to_requirements(root):
global setup_file_path
if SETUP_TAG in os.environ:
setup_file_path = os.environ[SETUP_TAG]
if setup_file_path == "false":
print("setup.py explicitly ignored")
return 0
else:
setup_file_path = os.path.join(root, "setup.py")
if not os.path.exists(setup_file_path):
print("%s does not exist. Not generating requirements.txt." % setup_file_path)
return 0
# Override the setuptools and distutils.core implementation of setup with our own.
import setuptools
setattr(setuptools, "setup", setup_interceptor)
import distutils.core
setattr(distutils.core, "setup", setup_interceptor)
# TODO: WHY are we inserting at index 1?
# >>> l = [1,2,3]; l.insert(1, 'x'); print(l)
# [1, 'x', 2, 3]
# Ensure the current directory is on path since setup.py might try and include some files in it.
sys.path.insert(1, root)
# Modify the arguments since the setup file sometimes checks them.
sys.argv = [setup_file_path, "build"]
# Run the setup.py file.
try:
imp.load_source("__main__", setup_file_path)
except BaseException as ex:
# We don't really care about errors so long as a requirements.txt exists in the next build step.
print("Running %s failed." % setup_file_path)
traceback.print_exc(file=sys.stdout)
if not os.path.exists(requirements_file_path):
print("%s failed, and a %s file does not exist. Exiting with error." % (setup_file_path, requirements_file_path))
return 1
return 0
def main():
global requirements_file_path
requirements_file_path = sys.argv[2]
sys.path.extend(sys.argv[3:])
sys.exit(convert_setup_to_requirements(sys.argv[1]))
if __name__ == "__main__":
main()

View File
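The interception trick used by `convert_setup.py` above boils down to replacing `setuptools.setup` before the project's `setup.py` runs; a minimal standalone sketch (with a made-up package) of that idea:

```python
# Sketch: capture the requirement lists instead of performing a real build.
import setuptools

captured = {}

def fake_setup(**kwargs):
    captured["install_requires"] = list(kwargs.get("install_requires", ()))

setuptools.setup = fake_setup

# Executing a project's setup.py now records its dependencies instead of
# building anything; calling setup() directly here stands in for that.
setuptools.setup(name="example", install_requires=["requests>=2.0"])
print(captured["install_requires"])  # ['requests>=2.0']
```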

@@ -0,0 +1,3 @@
This folder contains stubs for commonly used Python libraries, which have
the same interface as the original libraries, but are more amenable to
static analysis. The original licenses are noted in each subdirectory.

View File

@@ -0,0 +1,18 @@
Copyright (c) 2010-2019 Benjamin Peterson
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

View File

@@ -0,0 +1,240 @@
# Stub file for six.
#This should have the same interface as the six module,
#but be much more tractable for static analysis.
"""Utilities for writing code that runs on Python 2 and 3"""
# Copyright (c) 2015 Semmle Limited
# All rights reserved
# Note that the original six module is copyright Benjamin Peterson
#
import operator
import sys
import types
__author__ = "Benjamin Peterson <benjamin@python.org>"
__version__ = "1.14.0"
# Useful for very coarse version differentiation.
PY2 = sys.version_info < (3,)
PY3 = sys.version_info >= (3,)
if PY3:
string_types = str,
integer_types = int,
class_types = type,
text_type = str
binary_type = bytes
MAXSIZE = sys.maxsize
else:
string_types = basestring,
integer_types = (int, long)
class_types = (type, types.ClassType)
text_type = unicode
binary_type = str
#We can't compute MAXSIZE, but it doesn't really matter
MAXSIZE = int((1 << 63) - 1)
def _add_doc(func, doc):
"""Add documentation to a function."""
func.__doc__ = doc
def _import_module(name):
"""Import module, returning the module after the last dot."""
__import__(name)
return sys.modules[name]
import six.moves as moves
def add_move(move):
"""Add an item to six.moves."""
setattr(_MovedItems, move.name, move)
def remove_move(name):
"""Remove item from six.moves."""
try:
delattr(_MovedItems, name)
except AttributeError:
try:
del moves.__dict__[name]
except KeyError:
raise AttributeError("no such move, %r" % (name,))
if PY3:
_meth_func = "__func__"
_meth_self = "__self__"
_func_closure = "__closure__"
_func_code = "__code__"
_func_defaults = "__defaults__"
_func_globals = "__globals__"
_iterkeys = "keys"
_itervalues = "values"
_iteritems = "items"
_iterlists = "lists"
else:
_meth_func = "im_func"
_meth_self = "im_self"
_func_closure = "func_closure"
_func_code = "func_code"
_func_defaults = "func_defaults"
_func_globals = "func_globals"
_iterkeys = "iterkeys"
_itervalues = "itervalues"
_iteritems = "iteritems"
_iterlists = "iterlists"
try:
advance_iterator = next
except NameError:
def advance_iterator(it):
return it.next()
next = advance_iterator
try:
callable = callable
except NameError:
def callable(obj):
return any("__call__" in klass.__dict__ for klass in type(obj).__mro__)
if PY3:
def get_unbound_function(unbound):
return unbound
create_bound_method = types.MethodType
Iterator = object
else:
def get_unbound_function(unbound):
return unbound.im_func
def create_bound_method(func, obj):
return types.MethodType(func, obj, obj.__class__)
class Iterator(object):
def next(self):
return type(self).__next__(self)
callable = callable
_add_doc(get_unbound_function,
"""Get the function out of a possibly unbound function""")
get_method_function = operator.attrgetter(_meth_func)
get_method_self = operator.attrgetter(_meth_self)
get_function_closure = operator.attrgetter(_func_closure)
get_function_code = operator.attrgetter(_func_code)
get_function_defaults = operator.attrgetter(_func_defaults)
get_function_globals = operator.attrgetter(_func_globals)
def iterkeys(d, **kw):
"""Return an iterator over the keys of a dictionary."""
return iter(getattr(d, _iterkeys)(**kw))
def itervalues(d, **kw):
"""Return an iterator over the values of a dictionary."""
return iter(getattr(d, _itervalues)(**kw))
def iteritems(d, **kw):
"""Return an iterator over the (key, value) pairs of a dictionary."""
return iter(getattr(d, _iteritems)(**kw))
def iterlists(d, **kw):
"""Return an iterator over the (key, [values]) pairs of a dictionary."""
return iter(getattr(d, _iterlists)(**kw))
def byte2int(ch): #type bytes -> int
return int(unknown())
def b(s): #type str -> bytes
"""Byte literal"""
return bytes(unknown())
def u(s): #type str -> unicode
"""Text literal"""
if PY3:
unicode = str
return unicode(unknown())
if PY3:
unichr = chr
def int2byte(i): #type int -> bytes
return bytes(unknown())
indexbytes = operator.getitem
iterbytes = iter
import io
StringIO = io.StringIO
BytesIO = io.BytesIO
else:
unichr = unichr
int2byte = chr
def indexbytes(buf, i):
return int(unknown())
def iterbytes(buf):
return (int(unknown()) for byte in buf)
import StringIO
StringIO = BytesIO = StringIO.StringIO
if PY3:
exec_ = getattr(moves.builtins, "exec")
def reraise(tp, value, tb=None):
"""Reraise an exception."""
if value.__traceback__ is not tb:
raise value.with_traceback(tb)
raise value
else:
def exec_(_code_, _globs_=None, _locs_=None):
pass
def reraise(tp, value, tb=None):
"""Reraise an exception."""
exc = tp(value)
exc.__traceback__ = tb
raise exc
print_ = getattr(moves.builtins, "print", None)
if print_ is None:
def print_(*args, **kwargs):
"""The new-style print function for Python 2.4 and 2.5."""
pass
def with_metaclass(meta, *bases):
"""Create a base class with a metaclass."""
return meta("NewBase", bases, {})
def add_metaclass(metaclass):
"""Class decorator for creating a class with a metaclass."""
def wrapper(cls):
orig_vars = cls.__dict__.copy()
orig_vars.pop('__dict__', None)
orig_vars.pop('__weakref__', None)
slots = orig_vars.get('__slots__')
if slots is not None:
if isinstance(slots, str):
slots = [slots]
for slots_var in slots:
orig_vars.pop(slots_var)
return metaclass(cls.__name__, cls.__bases__, orig_vars)
return wrapper

View File

@@ -0,0 +1,239 @@
# six.moves
import sys
PY2 = sys.version_info < (3,)
PY3 = sys.version_info >= (3,)
# Generated (six_gen.py) from six version 1.14.0 with Python 2.7.17 (default, Nov 18 2019, 13:12:39)
if PY2:
import cStringIO as _1
cStringIO = _1.StringIO
import itertools as _2
filter = _2.filter
filterfalse = _2.filterfalse
import __builtin__ as _3
input = _3.raw_input
intern = _3.intern
map = _2.map
import os as _4
getcwd = _4.getcwdu
getcwdb = _4.getcwd
import commands as _5
getoutput = _5.getoutput
range = _3.xrange
reload_module = _3.reload
reduce = _3.reduce
import pipes as _6
shlex_quote = _6.quote
import StringIO as _7
StringIO = _7.StringIO
import UserDict as _8
UserDict = _8.UserDict
import UserList as _9
UserList = _9.UserList
import UserString as _10
UserString = _10.UserString
xrange = _3.xrange
zip = zip
zip_longest = _2.zip_longest
import __builtin__ as builtins
import ConfigParser as configparser
import collections as collections_abc
import copy_reg as copyreg
import gdbm as dbm_gnu
import dbm as dbm_ndbm
import dummy_thread as _dummy_thread
import cookielib as http_cookiejar
import Cookie as http_cookies
import htmlentitydefs as html_entities
import HTMLParser as html_parser
import httplib as http_client
import email.MIMEBase as email_mime_base
import email.MIMEImage as email_mime_image
import email.MIMEMultipart as email_mime_multipart
import email.MIMENonMultipart as email_mime_nonmultipart
import email.MIMEText as email_mime_text
import BaseHTTPServer as BaseHTTPServer
import CGIHTTPServer as CGIHTTPServer
import SimpleHTTPServer as SimpleHTTPServer
import cPickle as cPickle
import Queue as queue
import repr as reprlib
import SocketServer as socketserver
import thread as _thread
import Tkinter as tkinter
import Dialog as tkinter_dialog
import FileDialog as tkinter_filedialog
import ScrolledText as tkinter_scrolledtext
import SimpleDialog as tkinter_simpledialog
import Tix as tkinter_tix
import ttk as tkinter_ttk
import Tkconstants as tkinter_constants
import Tkdnd as tkinter_dnd
import tkColorChooser as tkinter_colorchooser
import tkCommonDialog as tkinter_commondialog
import tkFileDialog as tkinter_tkfiledialog
import tkFont as tkinter_font
import tkMessageBox as tkinter_messagebox
import tkSimpleDialog as tkinter_tksimpledialog
import xmlrpclib as xmlrpc_client
import SimpleXMLRPCServer as xmlrpc_server
del _1
del _5
del _7
del _8
del _6
del _3
del _9
del _2
del _10
del _4
# Generated (six_gen.py) from six version 1.14.0 with Python 3.8.0 (default, Nov 18 2019, 13:17:17)
if PY3:
import io as _1
cStringIO = _1.StringIO
import builtins as _2
filter = _2.filter
import itertools as _3
filterfalse = _3.filterfalse
input = _2.input
import sys as _4
intern = _4.intern
map = _2.map
import os as _5
getcwd = _5.getcwd
getcwdb = _5.getcwdb
import subprocess as _6
getoutput = _6.getoutput
range = _2.range
import importlib as _7
reload_module = _7.reload
import functools as _8
reduce = _8.reduce
import shlex as _9
shlex_quote = _9.quote
StringIO = _1.StringIO
import collections as _10
UserDict = _10.UserDict
UserList = _10.UserList
UserString = _10.UserString
xrange = _2.range
zip = _2.zip
zip_longest = _3.zip_longest
import builtins as builtins
import configparser as configparser
import collections.abc as collections_abc
import copyreg as copyreg
import dbm.gnu as dbm_gnu
import dbm.ndbm as dbm_ndbm
import _dummy_thread as _dummy_thread
import http.cookiejar as http_cookiejar
import http.cookies as http_cookies
import html.entities as html_entities
import html.parser as html_parser
import http.client as http_client
import email.mime.base as email_mime_base
import email.mime.image as email_mime_image
import email.mime.multipart as email_mime_multipart
import email.mime.nonmultipart as email_mime_nonmultipart
import email.mime.text as email_mime_text
import http.server as BaseHTTPServer
import http.server as CGIHTTPServer
import http.server as SimpleHTTPServer
import pickle as cPickle
import queue as queue
import reprlib as reprlib
import socketserver as socketserver
import _thread as _thread
import tkinter as tkinter
import tkinter.dialog as tkinter_dialog
import tkinter.filedialog as tkinter_filedialog
import tkinter.scrolledtext as tkinter_scrolledtext
import tkinter.simpledialog as tkinter_simpledialog
import tkinter.tix as tkinter_tix
import tkinter.ttk as tkinter_ttk
import tkinter.constants as tkinter_constants
import tkinter.dnd as tkinter_dnd
import tkinter.colorchooser as tkinter_colorchooser
import tkinter.commondialog as tkinter_commondialog
import tkinter.filedialog as tkinter_tkfiledialog
import tkinter.font as tkinter_font
import tkinter.messagebox as tkinter_messagebox
import tkinter.simpledialog as tkinter_tksimpledialog
import xmlrpc.client as xmlrpc_client
import xmlrpc.server as xmlrpc_server
del _1
del _2
del _3
del _4
del _5
del _6
del _7
del _8
del _9
del _10
# Not generated:
import six.moves.urllib as urllib
import six.moves.urllib_parse as urllib_parse
import six.moves.urllib_response as urllib_response
import six.moves.urllib_request as urllib_request
import six.moves.urllib_error as urllib_error
import six.moves.urllib_robotparser as urllib_robotparser
sys.modules['six.moves.builtins'] = builtins
sys.modules['six.moves.configparser'] = configparser
sys.modules['six.moves.collections_abc'] = collections_abc
sys.modules['six.moves.copyreg'] = copyreg
sys.modules['six.moves.dbm_gnu'] = dbm_gnu
sys.modules['six.moves.dbm_ndbm'] = dbm_ndbm
sys.modules['six.moves._dummy_thread'] = _dummy_thread
sys.modules['six.moves.http_cookiejar'] = http_cookiejar
sys.modules['six.moves.http_cookies'] = http_cookies
sys.modules['six.moves.html_entities'] = html_entities
sys.modules['six.moves.html_parser'] = html_parser
sys.modules['six.moves.http_client'] = http_client
sys.modules['six.moves.email_mime_base'] = email_mime_base
sys.modules['six.moves.email_mime_image'] = email_mime_image
sys.modules['six.moves.email_mime_multipart'] = email_mime_multipart
sys.modules['six.moves.email_mime_nonmultipart'] = email_mime_nonmultipart
sys.modules['six.moves.email_mime_text'] = email_mime_text
sys.modules['six.moves.BaseHTTPServer'] = BaseHTTPServer
sys.modules['six.moves.CGIHTTPServer'] = CGIHTTPServer
sys.modules['six.moves.SimpleHTTPServer'] = SimpleHTTPServer
sys.modules['six.moves.cPickle'] = cPickle
sys.modules['six.moves.queue'] = queue
sys.modules['six.moves.reprlib'] = reprlib
sys.modules['six.moves.socketserver'] = socketserver
sys.modules['six.moves._thread'] = _thread
sys.modules['six.moves.tkinter'] = tkinter
sys.modules['six.moves.tkinter_dialog'] = tkinter_dialog
sys.modules['six.moves.tkinter_filedialog'] = tkinter_filedialog
sys.modules['six.moves.tkinter_scrolledtext'] = tkinter_scrolledtext
sys.modules['six.moves.tkinter_simpledialog'] = tkinter_simpledialog
sys.modules['six.moves.tkinter_tix'] = tkinter_tix
sys.modules['six.moves.tkinter_ttk'] = tkinter_ttk
sys.modules['six.moves.tkinter_constants'] = tkinter_constants
sys.modules['six.moves.tkinter_dnd'] = tkinter_dnd
sys.modules['six.moves.tkinter_colorchooser'] = tkinter_colorchooser
sys.modules['six.moves.tkinter_commondialog'] = tkinter_commondialog
sys.modules['six.moves.tkinter_tkfiledialog'] = tkinter_tkfiledialog
sys.modules['six.moves.tkinter_font'] = tkinter_font
sys.modules['six.moves.tkinter_messagebox'] = tkinter_messagebox
sys.modules['six.moves.tkinter_tksimpledialog'] = tkinter_tksimpledialog
sys.modules['six.moves.xmlrpc_client'] = xmlrpc_client
sys.modules['six.moves.xmlrpc_server'] = xmlrpc_server
# Windows special
if PY2:
import _winreg as winreg
if PY3:
import winreg as winreg
sys.modules['six.moves.winreg'] = winreg
del sys

View File

@@ -0,0 +1,15 @@
import sys
import six.moves.urllib_error as error
import six.moves.urllib_parse as parse
import six.moves.urllib_request as request
import six.moves.urllib_response as response
import six.moves.urllib_robotparser as robotparser
sys.modules['six.moves.urllib.error'] = error
sys.modules['six.moves.urllib.parse'] = parse
sys.modules['six.moves.urllib.request'] = request
sys.modules['six.moves.urllib.response'] = response
sys.modules['six.moves.urllib.robotparser'] = robotparser
del sys


@@ -0,0 +1,21 @@
# six.moves.urllib_error
from six import PY2, PY3
# Generated (six_gen.py) from six version 1.14.0 with Python 2.7.17 (default, Nov 18 2019, 13:12:39)
if PY2:
import urllib2 as _1
URLError = _1.URLError
HTTPError = _1.HTTPError
import urllib as _2
ContentTooShortError = _2.ContentTooShortError
del _1
del _2
# Generated (six_gen.py) from six version 1.14.0 with Python 3.8.0 (default, Nov 18 2019, 13:17:17)
if PY3:
import urllib.error as _1
URLError = _1.URLError
HTTPError = _1.HTTPError
ContentTooShortError = _1.ContentTooShortError
del _1


@@ -0,0 +1,65 @@
# six.moves.urllib_parse
from six import PY2, PY3
# Generated (six_gen.py) from six version 1.14.0 with Python 2.7.17 (default, Nov 18 2019, 13:12:39)
if PY2:
import urlparse as _1
ParseResult = _1.ParseResult
SplitResult = _1.SplitResult
parse_qs = _1.parse_qs
parse_qsl = _1.parse_qsl
urldefrag = _1.urldefrag
urljoin = _1.urljoin
urlparse = _1.urlparse
urlsplit = _1.urlsplit
urlunparse = _1.urlunparse
urlunsplit = _1.urlunsplit
import urllib as _2
quote = _2.quote
quote_plus = _2.quote_plus
unquote = _2.unquote
unquote_plus = _2.unquote_plus
unquote_to_bytes = _2.unquote
urlencode = _2.urlencode
splitquery = _2.splitquery
splittag = _2.splittag
splituser = _2.splituser
splitvalue = _2.splitvalue
uses_fragment = _1.uses_fragment
uses_netloc = _1.uses_netloc
uses_params = _1.uses_params
uses_query = _1.uses_query
uses_relative = _1.uses_relative
del _1
del _2
# Generated (six_gen.py) from six version 1.14.0 with Python 3.8.0 (default, Nov 18 2019, 13:17:17)
if PY3:
import urllib.parse as _1
ParseResult = _1.ParseResult
SplitResult = _1.SplitResult
parse_qs = _1.parse_qs
parse_qsl = _1.parse_qsl
urldefrag = _1.urldefrag
urljoin = _1.urljoin
urlparse = _1.urlparse
urlsplit = _1.urlsplit
urlunparse = _1.urlunparse
urlunsplit = _1.urlunsplit
quote = _1.quote
quote_plus = _1.quote_plus
unquote = _1.unquote
unquote_plus = _1.unquote_plus
unquote_to_bytes = _1.unquote_to_bytes
urlencode = _1.urlencode
splitquery = _1.splitquery
splittag = _1.splittag
splituser = _1.splituser
splitvalue = _1.splitvalue
uses_fragment = _1.uses_fragment
uses_netloc = _1.uses_netloc
uses_params = _1.uses_params
uses_query = _1.uses_query
uses_relative = _1.uses_relative
del _1


@@ -0,0 +1,85 @@
# six.moves.urllib_request
from six import PY2, PY3
# Generated (six_gen.py) from six version 1.14.0 with Python 2.7.17 (default, Nov 18 2019, 13:12:39)
if PY2:
import urllib2 as _1
urlopen = _1.urlopen
install_opener = _1.install_opener
build_opener = _1.build_opener
import urllib as _2
pathname2url = _2.pathname2url
url2pathname = _2.url2pathname
getproxies = _2.getproxies
Request = _1.Request
OpenerDirector = _1.OpenerDirector
HTTPDefaultErrorHandler = _1.HTTPDefaultErrorHandler
HTTPRedirectHandler = _1.HTTPRedirectHandler
HTTPCookieProcessor = _1.HTTPCookieProcessor
ProxyHandler = _1.ProxyHandler
BaseHandler = _1.BaseHandler
HTTPPasswordMgr = _1.HTTPPasswordMgr
HTTPPasswordMgrWithDefaultRealm = _1.HTTPPasswordMgrWithDefaultRealm
AbstractBasicAuthHandler = _1.AbstractBasicAuthHandler
HTTPBasicAuthHandler = _1.HTTPBasicAuthHandler
ProxyBasicAuthHandler = _1.ProxyBasicAuthHandler
AbstractDigestAuthHandler = _1.AbstractDigestAuthHandler
HTTPDigestAuthHandler = _1.HTTPDigestAuthHandler
ProxyDigestAuthHandler = _1.ProxyDigestAuthHandler
HTTPHandler = _1.HTTPHandler
HTTPSHandler = _1.HTTPSHandler
FileHandler = _1.FileHandler
FTPHandler = _1.FTPHandler
CacheFTPHandler = _1.CacheFTPHandler
UnknownHandler = _1.UnknownHandler
HTTPErrorProcessor = _1.HTTPErrorProcessor
urlretrieve = _2.urlretrieve
urlcleanup = _2.urlcleanup
URLopener = _2.URLopener
FancyURLopener = _2.FancyURLopener
proxy_bypass = _2.proxy_bypass
parse_http_list = _1.parse_http_list
parse_keqv_list = _1.parse_keqv_list
del _1
del _2
# Generated (six_gen.py) from six version 1.14.0 with Python 3.8.0 (default, Nov 18 2019, 13:17:17)
if PY3:
import urllib.request as _1
urlopen = _1.urlopen
install_opener = _1.install_opener
build_opener = _1.build_opener
pathname2url = _1.pathname2url
url2pathname = _1.url2pathname
getproxies = _1.getproxies
Request = _1.Request
OpenerDirector = _1.OpenerDirector
HTTPDefaultErrorHandler = _1.HTTPDefaultErrorHandler
HTTPRedirectHandler = _1.HTTPRedirectHandler
HTTPCookieProcessor = _1.HTTPCookieProcessor
ProxyHandler = _1.ProxyHandler
BaseHandler = _1.BaseHandler
HTTPPasswordMgr = _1.HTTPPasswordMgr
HTTPPasswordMgrWithDefaultRealm = _1.HTTPPasswordMgrWithDefaultRealm
AbstractBasicAuthHandler = _1.AbstractBasicAuthHandler
HTTPBasicAuthHandler = _1.HTTPBasicAuthHandler
ProxyBasicAuthHandler = _1.ProxyBasicAuthHandler
AbstractDigestAuthHandler = _1.AbstractDigestAuthHandler
HTTPDigestAuthHandler = _1.HTTPDigestAuthHandler
ProxyDigestAuthHandler = _1.ProxyDigestAuthHandler
HTTPHandler = _1.HTTPHandler
HTTPSHandler = _1.HTTPSHandler
FileHandler = _1.FileHandler
FTPHandler = _1.FTPHandler
CacheFTPHandler = _1.CacheFTPHandler
UnknownHandler = _1.UnknownHandler
HTTPErrorProcessor = _1.HTTPErrorProcessor
urlretrieve = _1.urlretrieve
urlcleanup = _1.urlcleanup
URLopener = _1.URLopener
FancyURLopener = _1.FancyURLopener
proxy_bypass = _1.proxy_bypass
parse_http_list = _1.parse_http_list
parse_keqv_list = _1.parse_keqv_list
del _1


@@ -0,0 +1,21 @@
# six.moves.urllib_response
from six import PY2, PY3
# Generated (six_gen.py) from six version 1.14.0 with Python 2.7.17 (default, Nov 18 2019, 13:12:39)
if PY2:
import urllib as _1
addbase = _1.addbase
addclosehook = _1.addclosehook
addinfo = _1.addinfo
addinfourl = _1.addinfourl
del _1
# Generated (six_gen.py) from six version 1.14.0 with Python 3.8.0 (default, Nov 18 2019, 13:17:17)
if PY3:
import urllib.response as _1
addbase = _1.addbase
addclosehook = _1.addclosehook
addinfo = _1.addinfo
addinfourl = _1.addinfourl
del _1


@@ -0,0 +1,15 @@
# six.moves.urllib_robotparser
from six import PY2, PY3
# Generated (six_gen.py) from six version 1.14.0 with Python 2.7.17 (default, Nov 18 2019, 13:12:39)
if PY2:
import robotparser as _1
RobotFileParser = _1.RobotFileParser
del _1
# Generated (six_gen.py) from six version 1.14.0 with Python 3.8.0 (default, Nov 18 2019, 13:17:17)
if PY3:
import urllib.robotparser as _1
RobotFileParser = _1.RobotFileParser
del _1

(Two file diffs suppressed because one or more lines are too long; two binary image files added, 27 KiB and 31 KiB.)


@@ -0,0 +1,35 @@
import os
import sys
def pip_installed_folder():
try:
import pip
except ImportError:
print("ERROR: 'pip' not installed.")
sys.exit(2)
dirname, filename = os.path.split(pip.__file__)
if filename.startswith("__init__."):
dirname = os.path.dirname(dirname)
return dirname
def first_site_packages():
dist_packages = None
for path in sys.path:
if "site-packages" in path:
return path
if "dist-packages" in path and not dist_packages:
dist_packages = path
if dist_packages:
return dist_packages
    # Neither site-packages nor dist-packages found anywhere on sys.path.
    raise Exception("no site-packages or dist-packages found on sys.path")
def get_venv_lib():
try:
return pip_installed_folder()
    except Exception:
return first_site_packages()
if __name__=='__main__':
print(get_venv_lib())
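Because the script prints its result when executed directly, one plausible way to consume it (a sketch only; not necessarily how the extractor invokes it) is to run it under the interpreter of the environment being analysed and capture stdout. The interpreter path below is a hypothetical placeholder:

import subprocess

# Hypothetical: the interpreter of the virtualenv being analysed.
venv_python = "/path/to/venv/bin/python"

result = subprocess.run(
    [venv_python, "get_venv_lib.py"],  # assumes the script is in the CWD
    capture_output=True, text=True, check=True,
)
print("library folder:", result.stdout.strip())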

python/extractor/imp.py (new file, 344 lines)

@@ -0,0 +1,344 @@
"""This module provides the components needed to build your own __import__
function. Undocumented functions are obsolete.
In most cases you should prefer the importlib module's functionality over
this module.
This file was copied from `Lib/imp.py`, copyright PSF, with minor modifications made afterward.
"""
# (Probably) need to stay in _imp
from _imp import (lock_held, acquire_lock, release_lock,
get_frozen_object, is_frozen_package,
init_frozen, is_builtin, is_frozen,
_fix_co_filename)
try:
from _imp import create_dynamic
except ImportError:
# Platform doesn't support dynamic loading.
create_dynamic = None
from importlib._bootstrap import _ERR_MSG, _exec, _load, _builtin_from_name
from importlib._bootstrap_external import SourcelessFileLoader
from importlib import machinery
from importlib import util
import importlib
import os
import sys
import tokenize
import types
import warnings
# DEPRECATED
SEARCH_ERROR = 0
PY_SOURCE = 1
PY_COMPILED = 2
C_EXTENSION = 3
PY_RESOURCE = 4
PKG_DIRECTORY = 5
C_BUILTIN = 6
PY_FROZEN = 7
PY_CODERESOURCE = 8
IMP_HOOK = 9
def new_module(name):
"""**DEPRECATED**
Create a new module.
The module is not entered into sys.modules.
"""
return types.ModuleType(name)
def get_magic():
"""**DEPRECATED**
Return the magic number for .pyc files.
"""
return util.MAGIC_NUMBER
def get_tag():
"""Return the magic tag for .pyc files."""
return sys.implementation.cache_tag
def cache_from_source(path, debug_override=None):
"""**DEPRECATED**
Given the path to a .py file, return the path to its .pyc file.
The .py file does not need to exist; this simply returns the path to the
.pyc file calculated as if the .py file were imported.
If debug_override is not None, then it must be a boolean and is used in
place of sys.flags.optimize.
If sys.implementation.cache_tag is None then NotImplementedError is raised.
"""
with warnings.catch_warnings():
warnings.simplefilter('ignore')
return util.cache_from_source(path, debug_override)
def source_from_cache(path):
"""**DEPRECATED**
    Given the path to a .pyc file, return the path to its .py file.
The .pyc file does not need to exist; this simply returns the path to
the .py file calculated to correspond to the .pyc file. If path does
not conform to PEP 3147 format, ValueError will be raised. If
sys.implementation.cache_tag is None then NotImplementedError is raised.
"""
return util.source_from_cache(path)
def get_suffixes():
"""**DEPRECATED**"""
extensions = [(s, 'rb', C_EXTENSION) for s in machinery.EXTENSION_SUFFIXES]
source = [(s, 'r', PY_SOURCE) for s in machinery.SOURCE_SUFFIXES]
bytecode = [(s, 'rb', PY_COMPILED) for s in machinery.BYTECODE_SUFFIXES]
return extensions + source + bytecode
class NullImporter:
"""**DEPRECATED**
Null import object.
"""
def __init__(self, path):
if path == '':
raise ImportError('empty pathname', path='')
elif os.path.isdir(path):
raise ImportError('existing directory', path=path)
def find_module(self, fullname):
"""Always returns None."""
return None
class _HackedGetData:
"""Compatibility support for 'file' arguments of various load_*()
functions."""
def __init__(self, fullname, path, file=None):
super().__init__(fullname, path)
self.file = file
def get_data(self, path):
"""Gross hack to contort loader to deal w/ load_*()'s bad API."""
if self.file and path == self.path:
# The contract of get_data() requires us to return bytes. Reopen the
# file in binary mode if needed.
if not self.file.closed:
file = self.file
if 'b' not in file.mode:
file.close()
if self.file.closed:
self.file = file = open(self.path, 'rb')
with file:
return file.read()
else:
return super().get_data(path)
class _LoadSourceCompatibility(_HackedGetData, machinery.SourceFileLoader):
"""Compatibility support for implementing load_source()."""
def load_source(name, pathname, file=None):
loader = _LoadSourceCompatibility(name, pathname, file)
spec = util.spec_from_file_location(name, pathname, loader=loader)
if name in sys.modules:
module = _exec(spec, sys.modules[name])
else:
module = _load(spec)
# To allow reloading to potentially work, use a non-hacked loader which
# won't rely on a now-closed file object.
module.__loader__ = machinery.SourceFileLoader(name, pathname)
module.__spec__.loader = module.__loader__
return module
class _LoadCompiledCompatibility(_HackedGetData, SourcelessFileLoader):
"""Compatibility support for implementing load_compiled()."""
def load_compiled(name, pathname, file=None):
"""**DEPRECATED**"""
loader = _LoadCompiledCompatibility(name, pathname, file)
spec = util.spec_from_file_location(name, pathname, loader=loader)
if name in sys.modules:
module = _exec(spec, sys.modules[name])
else:
module = _load(spec)
# To allow reloading to potentially work, use a non-hacked loader which
# won't rely on a now-closed file object.
module.__loader__ = SourcelessFileLoader(name, pathname)
module.__spec__.loader = module.__loader__
return module
def load_package(name, path):
"""**DEPRECATED**"""
if os.path.isdir(path):
extensions = (machinery.SOURCE_SUFFIXES[:] +
machinery.BYTECODE_SUFFIXES[:])
for extension in extensions:
init_path = os.path.join(path, '__init__' + extension)
if os.path.exists(init_path):
path = init_path
break
else:
raise ValueError('{!r} is not a package'.format(path))
spec = util.spec_from_file_location(name, path,
submodule_search_locations=[])
if name in sys.modules:
return _exec(spec, sys.modules[name])
else:
return _load(spec)
def load_module(name, file, filename, details):
"""**DEPRECATED**
Load a module, given information returned by find_module().
The module name must include the full package name, if any.
"""
suffix, mode, type_ = details
if mode and (not mode.startswith(('r', 'U')) or '+' in mode):
raise ValueError('invalid file open mode {!r}'.format(mode))
elif file is None and type_ in {PY_SOURCE, PY_COMPILED}:
msg = 'file object required for import (type code {})'.format(type_)
raise ValueError(msg)
elif type_ == PY_SOURCE:
return load_source(name, filename, file)
elif type_ == PY_COMPILED:
return load_compiled(name, filename, file)
elif type_ == C_EXTENSION and load_dynamic is not None:
if file is None:
with open(filename, 'rb') as opened_file:
return load_dynamic(name, filename, opened_file)
else:
return load_dynamic(name, filename, file)
elif type_ == PKG_DIRECTORY:
return load_package(name, filename)
elif type_ == C_BUILTIN:
return init_builtin(name)
elif type_ == PY_FROZEN:
return init_frozen(name)
else:
msg = "Don't know how to import {} (type code {})".format(name, type_)
raise ImportError(msg, name=name)
def find_module(name, path=None):
"""**DEPRECATED**
Search for a module.
If path is omitted or None, search for a built-in, frozen or special
module and continue search in sys.path. The module name cannot
contain '.'; to search for a submodule of a package, pass the
submodule name and the package's __path__.
"""
if not isinstance(name, str):
raise TypeError("'name' must be a str, not {}".format(type(name)))
elif not isinstance(path, (type(None), list)):
# Backwards-compatibility
raise RuntimeError("'path' must be None or a list, "
"not {}".format(type(path)))
if path is None:
if is_builtin(name):
return None, None, ('', '', C_BUILTIN)
elif is_frozen(name):
return None, None, ('', '', PY_FROZEN)
else:
path = sys.path
for entry in path:
package_directory = os.path.join(entry, name)
for suffix in ['.py', machinery.BYTECODE_SUFFIXES[0]]:
package_file_name = '__init__' + suffix
file_path = os.path.join(package_directory, package_file_name)
if os.path.isfile(file_path):
return None, package_directory, ('', '', PKG_DIRECTORY)
for suffix, mode, type_ in get_suffixes():
file_name = name + suffix
file_path = os.path.join(entry, file_name)
if os.path.isfile(file_path):
break
else:
continue
break # Break out of outer loop when breaking out of inner loop.
else:
raise ImportError(_ERR_MSG.format(name), name=name)
encoding = None
if 'b' not in mode:
with open(file_path, 'rb') as file:
encoding = tokenize.detect_encoding(file.readline)[0]
file = open(file_path, mode, encoding=encoding)
return file, file_path, (suffix, mode, type_)
def reload(module):
"""**DEPRECATED**
Reload the module and return it.
The module must have been successfully imported before.
"""
return importlib.reload(module)
def init_builtin(name):
"""**DEPRECATED**
    Load and return a built-in module by name, or None if such a module doesn't
    exist.
"""
try:
return _builtin_from_name(name)
except ImportError:
return None
if create_dynamic:
def load_dynamic(name, path, file=None):
"""**DEPRECATED**
Load an extension module.
"""
import importlib.machinery
loader = importlib.machinery.ExtensionFileLoader(name, path)
# Issue #24748: Skip the sys.modules check in _load_module_shim;
# always load new extension
spec = importlib.machinery.ModuleSpec(
name=name, loader=loader, origin=path)
return _load(spec)
else:
load_dynamic = None
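For illustration, a minimal usage sketch of the deprecated find_module()/load_module() pairing that this shim preserves, run against the stdlib json package (this example is not part of the extractor):

import imp  # the shim above; the stdlib imp module was removed in Python 3.12

# json is a package, so find_module returns (None, <dir>, ('', '', PKG_DIRECTORY)).
file, pathname, description = imp.find_module("json")
try:
    json_mod = imp.load_module("json", file, pathname, description)
finally:
    if file is not None:  # file is None for packages and built-ins
        file.close()
print(json_mod.dumps({"ok": True}))  # -> {"ok": true}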

python/extractor/index.py (new file, 28 lines)

@@ -0,0 +1,28 @@
#!/usr/bin/env python
# This file needs to be able to handle all versions of Python we are likely to encounter,
# which is probably 3.6 and upwards. Python 3.6 itself is handled by exiting with an
# error, though: we require at least 3.7 to proceed.
'''Run index.py in buildtools'''
import os
import sys
if sys.version_info < (3, 7):
sys.exit("ERROR: Python 3.7 or later is required (currently running {}.{})".format(sys.version_info[0], sys.version_info[1]))
from python_tracer import getzipfilename
if 'SEMMLE_DIST' in os.environ:
if 'CODEQL_EXTRACTOR_PYTHON_ROOT' not in os.environ:
os.environ['CODEQL_EXTRACTOR_PYTHON_ROOT'] = os.environ['SEMMLE_DIST']
else:
os.environ["SEMMLE_DIST"] = os.environ["CODEQL_EXTRACTOR_PYTHON_ROOT"]
tools = os.path.join(os.environ['SEMMLE_DIST'], "tools")
zippath = os.path.join(tools, getzipfilename())
sys.path = [ zippath ] + sys.path
import buildtools.index
buildtools.index.main()
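index.py works by prepending a zip archive to sys.path and importing buildtools from inside it; Python's zipimport machinery treats any .zip on sys.path as an import location. A self-contained sketch of that mechanism, with hypothetical archive contents standing in for the real tools zip:

import os
import sys
import tempfile
import zipfile

# Build a tiny stand-in for the tools zip.
zippath = os.path.join(tempfile.mkdtemp(), "tools.zip")
with zipfile.ZipFile(zippath, "w") as zf:
    zf.writestr("buildtools/__init__.py", "")
    zf.writestr("buildtools/index.py", "def main():\n    print('indexing')\n")

sys.path = [zippath] + sys.path  # the same trick index.py uses
import buildtools.index
buildtools.index.main()  # -> indexing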


@@ -0,0 +1,19 @@
Copyright © 2017 Erez Shinan
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Some files were not shown because too many files have changed in this diff.