Commit Graph

123 Commits

Author SHA1 Message Date
Taus
f55ff96674 Python: Bump extractor version and add change note 2025-11-27 13:52:37 +00:00
Alexander Köplinger
458f8570e8 Fix KeyError: 'name' in python/extractor/imp.py on Python 3.14
Follow-up to https://github.com/github/codeql/pull/20630

The fix didn't fully work since when we raise the ImportError in `find_module` we don't pass a named argument into the format string which causes a `KeyError`.

We need to use a format string without named arguments, like Python 3.13 and earlier did.
2025-11-25 12:38:55 +01:00
Nora Dimitrijević
e120e5c3ba Merge pull request #20337 from d10c/d10c/python-overlay-compilation-plus-extractor
Python: enable overlay compilation + extractor overlay support
2025-10-16 14:49:01 +02:00
Taus
c4b27d5f28 Python: Fix ImportError in imp.py under Python 3.14
It seems `_ERR_MSG` was silently removed in Python 3.14, leading to an
`ImportError` when running the extractor.

To fix this, we explicitly set `_ERR_MSG` when the existing import fails
(using `_ERR_MSG_PREFIX` which is available in Python 3.14+, along with
the bits that make up the difference between this and `_ERR_MSG`).
2025-10-13 13:50:43 +00:00
Nora Dimitrijević
c749607db8 Bump python extractor version to 7.1.5 2025-10-07 11:22:16 +02:00
Nora Dimitrijević
1a9683f986 Add @top database type 2025-10-06 11:47:14 +02:00
Nora Dimitrijević
6f208e9dec Write overlay metadata at end of extraction. 2025-10-06 11:47:12 +02:00
Nora Dimitrijević
49b18db044 Python extractor: in overlay mode, traverse only changed files
- fall back to full extraction on overlay changes json read error
- we filter both root modules and (transitive) imports against the overlay-changes json.
2025-10-06 11:47:09 +02:00
Nora Dimitrijević
e0cf719cb9 Path transformer: handle Windows-style paths
And don't add slash to start of path patterns on Windows.
2025-10-06 11:37:04 +02:00
Nora Dimitrijević
29b1a7403b Support CODEQL_PATH_TRANSFORMER env var in python path renamer
The new name is required by overlay support.
2025-10-06 11:37:02 +02:00
Nora Dimitrijević
a88d3397cd Add overlay builtins to python dbscheme 2025-10-06 11:36:56 +02:00
Taus
f5a06bef4a Merge pull request #19929 from github/tausbn/python-update-tree-sitter-dependency
Python: Update `tree-sitter` dependency
2025-09-17 13:40:13 +02:00
Arthur Baars
5d3ec35e29 Remove non-breaking spaces from code 2025-09-05 09:41:15 +02:00
Taus
f6732a927b Python: Bump extractor version 2025-09-03 11:56:54 +00:00
Taus
13a93c7e32 Python: Add suggestions from Copilot 2025-09-03 11:55:49 +00:00
Taus
9802ad77dc Python: Update types_new.py and test output 2025-09-02 12:41:57 +00:00
Taus
235822d782 Python: Improve handling of syntax errors
Rather than relying on matching arbitrary nodes inside tree-sitter-graph
and then checking whether they are of type ERROR or MISSING (which seems
to have stopped working in later versions of tree-sitter), we now
explicitly go through the tree-sitter tree, locating all of the error
and missing nodes along the way. We then add these on to the graph
output in the same format as was previously produced by
tree-sitter-graph.

Note that it's very likely that some of the syntax errors will move
around a bit as a consequence of this change. In general, we don't
expect syntax errors to have stable locations, as small changes in the
grammar can cause an error to appear in a different position, even if
the underlying (erroneous) code has not changed.
2025-09-02 12:41:57 +00:00
Taus
b108d47b26 Python: Update parser test output
It seems that with a newer version of tree-sitter, we no longer parse
the (not actually valid!) syntax `Spam[**P2]` as if the `**` is an
exponentiation operation (with a missing left operand).
2025-09-02 12:41:55 +00:00
Taus
76f15a890c Python: Update tree-sitter dependency
Updates the Python extractor to depend on version 0.24.7 of tree-sitter
(and 0.12.0 of tree-sitter-graph).

A few changes were needed in order to make the code build and run after
updating the dependencies:

- In `main.rs`, the `Language` parameter is now passed as a reference.
- In `python.tsg`, many queries had captures that were not actually used
in the body of the stanza. This is no longer allowed (unless the
captures start with an underscore), as it may indicate an error. To fix
this, I added underscores in the appropriate places (and verified that
none of these unused captures were in fact bugs).
2025-09-02 12:40:20 +00:00
Taus
ad53518644 Python: Regenerate parser files 2025-06-26 15:34:44 +00:00
Taus
e04821e9e3 Python: Allow use of match as an identifier
This previously only worked in certain circumstances. In particular,
assignments such as `match[1] = ...` or even just `match[1]` would fail
to parse correctly.

Fixing this turned out to be less trivial than anticipated. Consider the
fact that
```
match [1]: case (...)
```
can either look the start of a `match` statement, or it could be a type
ascription, ascribing the value of `case(...)` (a call) to the item at
index 1 of `match`.

To fix this, then, we give `match` the identifier and `match` the
statement the same precendence in the grammar, and additionally also
mark a conflict between `match_statement` and `primary_expression`. This
causes the conflict to be resolved dynamically, and seems to do the
right thing in all cases.
2025-06-26 15:33:00 +00:00
Taus
2158eaa34c Python: Fix a bug in glob regex creation
The previous version was tested on a version of the code where we had
temporarily removed the `glob.strip("/")` bit, and so the bug didn't
trigger then.

We now correctly remember if the glob ends in `/`, and add an extra part
in that case. This way, if the path ends with multiple slashes, they
effectively get consolidated into a single one, which results in the
correct semantics.
2025-05-15 15:34:11 +00:00
Taus
c8cca126a1 Python: Bump extractor version 2025-05-15 14:59:33 +00:00
Taus
96558b53b8 Python: Update test
The second test case now sets the `paths-ignore` setting in the config
file in order to skip files in hidden directories.
2025-05-15 14:53:15 +00:00
Taus
98388be25c Python: Remove special casing of hidden files
If it is necessary to exclude hidden files, then adding
```
paths-ignore: ['**/.*/**']
```
to the relevant config file is recommended instead.
2025-05-15 14:49:17 +00:00
Taus
61719cf448 Python: Fix a bug in glob conversion
If you have a filter like `**/foo/**` set in the `paths-ignore` bit of
your config file, then currently the following happens:

- First, the CodeQL CLI observes that this string ends in `/**` and
  strips off the `**` leaving `**/foo/`
- Then the Python extractor strips off leading and trailing `/`
  characters and proceeds to convert `**/foo` into a regex that is
  matched against files to (potentially) extract.

The trouble with this is that it leaves us unable to distinguish
between, say, a file `foo.py` and a file `foo/bar.py`. In other words,
we have lost the ability to exclude only the _folder_ `foo` and not any
files that happen to start with `foo`.

To fix this, we instead make a note of whether the glob ends in a
forward slash or not, and adjust the regex correspondingly.
2025-05-15 14:48:06 +00:00
Taus
605f2bff9c Python: Add integration test 2025-05-02 14:27:46 +00:00
Taus
0c1b379ac1 Python: Extract files in hidden dirs by default
Changes the default behaviour of the Python extractor so files inside
hidden directories are extracted by default.

Also adds an extractor option, `skip_hidden_directories`, which can be
set to `true` in order to revert to the old behaviour.

Finally, I made the logic surrounding what is logged in various cases a
bit more obvious.

Technically this changes the behaviour of the extractor (in that hidden
excluded files will now be logged as `(excluded)`, but I think this
makes more sense anyway.
2025-05-02 12:44:05 +00:00
Taus
6546bb1b1d Merge branch 'main' into tausbn/python-fix-match-pruning-logic 2025-03-06 14:37:58 +01:00
Paolo Tranquilli
1bcc6ddb32 Rust/Ruby/Python: apply clippy lints 2025-02-25 13:21:28 +01:00
Paolo Tranquilli
6089a75262 Rust/Ruby/Python: format code 2025-02-25 13:19:03 +01:00
Paolo Tranquilli
e8799e346d Rust/Python: fix edition-related errors 2025-02-25 13:16:58 +01:00
Paolo Tranquilli
eff87d24fa Rust/Ruby/Python: update rustc and edition 2025-02-25 13:15:19 +01:00
Paolo Tranquilli
38efd4a8a2 Python: downgrade tree-sitter back to 0.20.4 2025-02-18 10:03:18 +01:00
Paolo Tranquilli
342bff6125 Python: undo tree-sitter update 2025-02-17 15:52:45 +01:00
Paolo Tranquilli
91b3d108bb Python: upgrade cargo dependencies
This required some code changes because of some breaking changes in
`clap` and `tree-sitter`.

Also needed to assign a new bazel repo name to the `crates_vendor` to
avoid name conflicts in `MODULE.bazel`.
2025-02-17 10:56:36 +01:00
Taus
918c05c538 Python: Don't prune any MatchLiteralPatterns
Extends the mechanism introduced in
https://github.com/github/codeql/pull/18030
to behave the same for _all_ `MatchLiteralPattern`s, not just the ones
that happen to be the constant `True` or `False`.

Co-authored-by: yoff <yoff@github.com>
2025-02-11 12:58:52 +00:00
Paolo Tranquilli
cc939e64fd Python: fix bazel rule 2025-02-07 14:42:26 +01:00
yoff
37ddaa36ad Merge pull request #18702 from github/tausbn/python-allow-comments-in-subscripts
Python: Allow comments in subscripts
2025-02-06 23:31:29 +01:00
Taus
131ec8d22f Python: Handle loop constructs outside of loops
Observed on some test files in Nuitka/Nuitka, having `break` and
`continue` outside of loops in Python is (to Python) a syntax error, but
our parser happily accepted this broken syntax.

This then caused issues further downstream in the control-flow
construction, as it broke some invariants.

To fix this we now skip the code that would previously fail when the
invariants are broken.

Co-authored-by: yoff <yoff@github.com>
2025-02-06 14:30:16 +00:00
Taus
7124e80f28 Python: Regenerate parser files 2025-02-06 14:05:40 +00:00
Taus
c5be2a3e2d Python: Allow comments in subscripts
Once again, the interaction between anchors and extras (specifically
comments) was causing trouble.

The root of the problem was the fact that in `a[b]`, we put `b` in the
`index` field of the subscript node, whereas in `a[b,c]`, we
additionally synthesize a `Tuple` node for `b,c` (which matches the
Python AST).

To fix this, we refactored the grammar slightly so as to make that tuple
explicit, such that a subscript node either contains a single expression
or the newly added tuple node. This greatly simplifies the logic.
2025-02-06 14:04:57 +00:00
Taus
60d97e0e16 Python: Print file path when logging context errors
This makes it _much_ easier to find the offending bit of syntax.
2025-02-05 13:13:39 +00:00
Cornelius Riemenschneider
53ca5083a9 Upgrade bazel to 8.0.0.
Previously, we were using 8.0.0rc1.
In particular, this upgrade means we need to explicitly
import more rules, as they've been moved out of the core bazel repo.
2024-12-10 12:05:37 +01:00
Taus
a9817a0281 Python: Add guide describing how to extend the parser 2024-11-28 12:32:00 +00:00
Taus
d779ae5c3e Python: Add change note for CFG pruning fix
... And also bump the extractor version.
2024-11-26 15:39:15 +00:00
Taus
a4ccda5fe3 Python: Fix pruning of literals in match pattern
Co-authored-by: yoff <lerchedahl@gmail.com>
2024-11-19 13:48:13 +00:00
Cornelius Riemenschneider
a66f8209f9 Rust: Vendor 3rdparty dependencies.
We've been observing some performance issues using crate_universe on CI.
Therefore, we're moving to vendor the auto-generated BUILD files
in our repository. This should provide a nice speed boost, while
getting rid of the complexity of the "rust cache" job we've been using
when we had a lot of git dependencies.

This PR includes a vendor script, and I'll put up a CI job internally
that runs that vendor script on Cargo.toml and Cargo.lock changes, to check
that the vendored files are in sync.
2024-11-13 13:22:14 +01:00
Taus
0bb5b4b9dc Merge pull request #17875 from github/tausbn/python-improve-parser-logging-and-timing
Python: Improve parser logging/timing/customisability
2024-11-01 12:47:46 +01:00
Taus
2892f0ff48 Merge pull request #17873 from github/tausbn/python-fix-generator-expression-locations
Python: Even more parser fixes
2024-11-01 12:47:19 +01:00