Commit Graph

18 Commits

Author SHA1 Message Date
Paolo Tranquilli
342bff6125 Python: undo tree-sitter update 2025-02-17 15:52:45 +01:00
Paolo Tranquilli
91b3d108bb Python: upgrade cargo dependencies
This required some code changes because of some breaking changes in
`clap` and `tree-sitter`.

Also needed to assign a new bazel repo name to the `crates_vendor` to
avoid name conflicts in `MODULE.bazel`.
2025-02-17 10:56:36 +01:00
Taus
7124e80f28 Python: Regenerate parser files 2025-02-06 14:05:40 +00:00
Taus
c5be2a3e2d Python: Allow comments in subscripts
Once again, the interaction between anchors and extras (specifically
comments) was causing trouble.

The root of the problem was the fact that in `a[b]`, we put `b` in the
`index` field of the subscript node, whereas in `a[b,c]`, we
additionally synthesize a `Tuple` node for `b,c` (which matches the
Python AST).

To fix this, we refactored the grammar slightly so as to make that tuple
explicit, such that a subscript node either contains a single expression
or the newly added tuple node. This greatly simplifies the logic.
2025-02-06 14:04:57 +00:00
Cornelius Riemenschneider
a66f8209f9 Rust: Vendor 3rdparty dependencies.
We've been observing some performance issues using crate_universe on CI.
Therefore, we're moving to vendor the auto-generated BUILD files
in our repository. This should provide a nice speed boost, while
getting rid of the complexity of the "rust cache" job we've been using
when we had a lot of git dependencies.

This PR includes a vendor script, and I'll put up a CI job internally
that runs that vendor script on Cargo.toml and Cargo.lock changes, to check
that the vendored files are in sync.
2024-11-13 13:22:14 +01:00
Taus
e710c0a6bf Python: Regenerate parser files 2024-10-28 14:44:01 +00:00
Taus
ac87868097 Python: Fix parsing of await inside expressions
Found when parsing `Lib/test/test_coroutines.py` using the new parser.

For whatever reason, having `await` be an `expression` (with an argument
of the same kind) resulted in a bad parse. Consulting the official
grammar, we see that `await` should actually be a `primary_expression`
instead. This is also more in line with the other unary operators, whose
precedence is shared by the `await` syntax.
2024-10-28 14:44:01 +00:00
Taus
1e51703ce9 Python: Allow escaped quotes/backslashes in raw strings
Quoting the Python documentation (last paragraph of
https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences):

"Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, r"\"" is a valid string
literal consisting of two characters: a backslash and a double quote;
r"\" is not a valid string literal (even a raw string cannot end in an
odd number of backslashes)."

We did not handle this correctly in the scanner, as we only consumed the
backslash but not the following single or double quote, resulting in
that character getting interpreted as the end of the string.

To fix this, we do a second lookahead after consuming the backslash, and
if the next character is the end character for the string, we advance
the lexer across it as well.

Similarly, backslashes in raw strings can escape other backslashes.
Thus, for a string like '\\' we must consume the second backslash,
otherwise we'll interpret it as escaping the end quote.
2024-10-28 14:40:24 +00:00
Taus
89ea4b8200 Python: Regenerate parser files 2024-10-22 15:39:41 +00:00
Taus
9c913902c5 Python: Allow except* to be written as except *
Turns out, `except*` is actually not a token on its own according to the
Python grammar. This means it's legal to write `except *foo: ...`, which
we previously would consider a syntax error.

To fix it, we simply break up the `except*` into two separate tokens.
2024-10-22 15:39:29 +00:00
Taus
7ceefb509b Python: Regenerate parser files 2024-10-22 15:17:34 +00:00
Taus
8053e0ed44 Python: Allow list_splats as type annotations
That is, the `*T` in `def foo(*args : *T): ...`.

This is apparently a piece of syntax we did not support correctly until
now.

In terms of the grammar, we simply add `list_splat` as a possible
alternative for `type` (which could previously only be an `expression`).
We also update `python.tsg` to not specify `expression` those places (as
the relevant stanzas will then not work for `list_splat`s).

This syntax is not supported by the old parser, hence we only add a new
parser test for it.
2024-10-22 15:17:12 +00:00
Taus
6545bfffa7 Python: Regenerate parser files
Two new files -- alloc.h and array.h -- suddenly appeared. Presumably
they are used by the somewhat newer version of tree-sitter. To be safe,
I included them in this commit.
2024-10-15 11:22:31 +00:00
Taus
882249ef82 Python: Add grammar support for type defaults
Also fixes an oversight in the grammar: starred expressions should be
allowed inside the subscript of an `Index` expression.
2024-10-15 11:22:30 +00:00
Paolo Tranquilli
9f5782b67b Bazel: introduce buildifier formatting
This introduces tooling and enforcement for formatting bazel files.

The tooling is provided as a bazel run target from
[keith/buildifier-prebuilt](https://github.com/keith/buildifier-prebuilt).

This is used in a [`pre-commit`](https://pre-commit.com/) hook for those
having that installed. In turn this is used in a CI check. Relying on a
`pre-commit` action gives us easy checking that buildifying did not
change anything in the files and printing the diff, without having to
hand-roll the check ourselves.

This enforcement will make usage of gazelle easier, as gazelle itself
might reformat files, even outside of `go`. Having them properly
formatted will allow gazelle to leave them unchanged, without needing
to configure awkward exclude directives.
2024-04-24 15:49:48 +02:00
Taus
7bec41096c Python: Rename tsg-build target to tsp-build
The latter makes more sense, as it's actually building
`tree-sitter-python`.
2024-04-05 12:30:40 +02:00
Taus
d12ac1e7ce Python: Use tsp instead of tree-sitter-python 2024-03-19 17:11:40 +00:00
Taus
38169a981d Python: Shorten tree-sitter-python directory name
The current name results in a path that is more than 260 characters long,
and this causes issues for the build on Windows.
2024-03-19 17:11:40 +00:00