112 Commits

Author SHA1 Message Date
Taus
89ea4b8200 Python: Regenerate parser files 2024-10-22 15:39:41 +00:00
Taus
9c913902c5 Python: Allow except* to be written as except *
Turns out, `except*` is actually not a token on its own according to the
Python grammar. This means it's legal to write `except *foo: ...`, which
we previously would consider a syntax error.

To fix it, we simply break up the `except*` into two separate tokens.
2024-10-22 15:39:29 +00:00
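The point made in 9c913902c5, that `except*` is really the keyword `except` followed by a separate `*`, can be checked against Python's own lexer. The sketch below uses the standard-library `tokenize` module rather than the tree-sitter grammar the commit changes, so it only illustrates the language rule, not the extractor's implementation. The helper name `significant_tokens` and the `TEMPLATE` string are made up for this example.

```python
import io
import tokenize

def significant_tokens(source):
    """Return the token strings of `source`, skipping layout-only tokens."""
    layout = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
              tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in layout]

# The same handler written with and without a space before the star.
TEMPLATE = "try:\n    pass\nexcept{gap}*ValueError:\n    pass\n"

joined = significant_tokens(TEMPLATE.format(gap=""))
spaced = significant_tokens(TEMPLATE.format(gap=" "))

# Both spellings lex to the same tokens, with `except` and `*` kept apart.
print(joined)
print(joined == spaced)
```

Because only the lexer is involved, this runs on Python versions that predate `except*` as well.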
Taus
7ceefb509b Python: Regenerate parser files 2024-10-22 15:17:34 +00:00
Taus
8053e0ed44 Python: Allow list_splats as type annotations
That is, the `*T` in `def foo(*args : *T): ...`.

This is apparently a piece of syntax we did not support correctly until
now.

In terms of the grammar, we simply add `list_splat` as a possible
alternative for `type` (which could previously only be an `expression`).
We also update `python.tsg` to not specify `expression` in those places (as
the relevant stanzas will then not work for `list_splat`s).

This syntax is not supported by the old parser, hence we only add a new
parser test for it.
2024-10-22 15:17:12 +00:00
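To see the syntax that 8053e0ed44 adds support for, the following sketch asks CPython's own parser whether it accepts a starred annotation on `*args`. It assumes, per PEP 646, that this form is only valid from Python 3.11 onwards, so the check is guarded by the running version. The helper `parses` is a name invented for this example, and nothing here touches the tree-sitter grammar itself.

```python
import ast
import sys

def parses(source):
    """Return True if CPython's parser accepts `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

PLAIN = "def foo(*args: int): ...\n"
STARRED = "def foo(*args: *T): ...\n"

plain_ok = parses(PLAIN)
starred_ok = parses(STARRED)
print("plain annotation accepted:", plain_ok)
print("starred annotation accepted:", starred_ok)

if starred_ok:
    # On interpreters that accept it, the annotation is a starred node
    # rather than an ordinary expression node.
    annotation = ast.parse(STARRED).body[0].args.vararg.annotation
    print(type(annotation).__name__)
```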
Taus
fcec8e0256 Python: Fail tests when errors/warnings are logged
This is primarily useful for ensuring that errors where a node does not
have an appropriate context set in `python.tsg` actually have an effect
on the pass/fail status of the parser tests. Previously, these would
just be logged to stdout, but tests could still succeed when there were
errors present.

Also fixes one of the logging lines in `tsg_parser.py` to be more
consistent with the others.
2024-10-22 15:11:51 +00:00
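The idea behind fcec8e0256, turning logged errors and warnings into test failures, can be sketched with the standard `logging` module. This is not the code from the commit: `ProblemCollector`, `run_strict`, and the logger name are hypothetical, and the real parser tests may collect their log output differently.

```python
import logging

class ProblemCollector(logging.Handler):
    """Remember every record logged at WARNING level or above."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)

def run_strict(step, logger):
    """Run `step` and fail if it logged any warning or error on `logger`."""
    collector = ProblemCollector()
    logger.addHandler(collector)
    try:
        step()
    finally:
        logger.removeHandler(collector)
    if collector.records:
        messages = "; ".join(r.getMessage() for r in collector.records)
        raise AssertionError("problems were logged: " + messages)

# Demo logger (hypothetical name); propagation is off to keep output quiet.
demo_logger = logging.getLogger("parser_test_demo")
demo_logger.setLevel(logging.WARNING)
demo_logger.propagate = False

def noisy_step():
    demo_logger.error("node has no context set")

try:
    run_strict(noisy_step, demo_logger)
    outcome = "passed"
except AssertionError:
    outcome = "failed"
print(outcome)
```

A step that logs nothing passes through `run_strict` unchanged, so existing clean tests are unaffected by this kind of wrapper.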
Taus
9803bbdc4b Python: Update class parser test 2024-10-21 15:35:48 +00:00
Taus
1cd04c96c7 Python: Fix bug in handling of **kwargs in class bases
This caused a dataset check error on the `python/cpython` database, as
we had a `DictUnpacking` node whose parent was not a `dict_item_list`,
but rather an `expr_list`.

Investigating a bit further revealed that this was because in a
construction like

```python
class C[T](base, foo=bar, **kwargs): ...
```
we were mistakenly adding `**kwargs` to the same list as `base` (which
is just a list of expressions), rather than the same list as `foo=bar`
(which is a list of dictionary items).

The ultimate cause of this was the use of `! name` in `python.tsg` to
distinguish between bases and keyword arguments (only the latter of
which have the `name` field). Because `dictionary_splat` doesn't have a
`name` field either, these were mistakenly put in the wrong list,
leading to the error.

Also, because our previous test of `class` statements did not include a
`**kwargs` construction, we were not checking that the new parser
behaved correctly in this case. For the most part this was not a
problem, but on files that use syntax not supported by the old parser
(like type parameters on classes), this became an issue. This is also
why we did not see this error previously.

To fix this, we added `! value` (which is a field present on
`dictionary_splat` nodes) as a secondary filter, and added a third
stanza to handle `dictionary_splat` nodes.
2024-10-21 15:35:47 +00:00
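The filtering mistake described in 1cd04c96c7 can be modelled in a few lines of Python. The dictionaries below are simplified stand-ins for the tree-sitter nodes, with a field layout assumed from the commit message (`keyword_argument` has `name` and `value`, `dictionary_splat` has only `value`); the real rules live in `python.tsg` and are not reproduced here. CPython's `ast` module is shown at the end as a reference for the intended grouping.

```python
import ast

# Simplified stand-ins for the argument nodes of
# `class C(base, foo=bar, **kwargs): ...` (hypothetical field layout).
ARGUMENTS = [
    {"type": "identifier", "text": "base"},
    {"type": "keyword_argument", "name": "foo", "value": "bar"},
    {"type": "dictionary_splat", "value": "kwargs"},
]

def bases_by_missing_name(arguments):
    """Old filter: anything without a `name` field counts as a base."""
    return [arg["type"] for arg in arguments if "name" not in arg]

def bases_by_missing_name_and_value(arguments):
    """Fixed filter: a base has neither a `name` nor a `value` field."""
    return [arg["type"] for arg in arguments
            if "name" not in arg and "value" not in arg]

old_bases = bases_by_missing_name(ARGUMENTS)
new_bases = bases_by_missing_name_and_value(ARGUMENTS)
print(old_bases)  # the splat is wrongly grouped with the bases
print(new_bases)

# For comparison, CPython keeps `**kwargs` with the keywords (arg is None).
class_def = ast.parse("class C(base, foo=bar, **kwargs): ...\n").body[0]
cpython_bases = [node.id for node in class_def.bases]
cpython_keywords = [(kw.arg, kw.value.id) for kw in class_def.keywords]
print(cpython_bases, cpython_keywords)
```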
Taus
ae4a4bb881 Python: Flip test expectation
This test should now validate that we no longer have dataset check
errors even when there are unencodable characters.
2024-10-21 15:32:23 +00:00
Taus
cc39ae57dc Python: Fix dataset check error for string encoding
Here's an example of one of these errors:
```
INVALID_KEY predicate py_cobjectnames(@py_cobject obj, string name)

The key set {obj} does not functionally determine all fields. Here is a
pair of tuples that agree on the key set but differ at index 1: Tuple 1
in row 63874: (72088,"u'<X>'") Tuple 2 in row 63875: (72088,"u'<?>'")
```
(Here, the substring `X` should really be the Unicode character U+FFFD,
but for some reason I'm not allowed to put that in this commit message.)

Inside the extractor, we assign IDs based on the string type (bytestring
or Unicode) and a hash of the UTF-8 encoded content of the string. In
this case, however, certain _different_ strings were receiving the same
hash, due to replacement characters in the encoding process.

In particular, we were converting unencodable characters to question
marks in one place, and to U+FFFD in another place. This caused a
discrepancy that led to the dataset check error.

To fix this, we put in a custom error handler that always puts the
U+FFFD character in place of unencodable characters. With this, the
strings now agree, and hence there is no clash.
2024-10-21 15:31:16 +00:00
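The kind of fix described in cc39ae57dc can be sketched with `codecs.register_error`. This is an illustration under assumptions, not the extractor's actual handler: the handler name `fffd-replace` and the function name are invented here. It relies on two documented standard-library behaviours: the built-in `replace` handler substitutes `?` when encoding but U+FFFD when decoding, and an encoding error handler may return bytes that are copied straight into the output.

```python
import codecs

# UTF-8 encoding of U+FFFD, the Unicode replacement character.
REPLACEMENT = "\ufffd".encode("utf-8")

def replace_with_fffd(error):
    """Encoding error handler that always emits U+FFFD, never `?`."""
    if isinstance(error, UnicodeEncodeError):
        count = error.end - error.start
        # Returning bytes makes the encoder copy them into the output as-is.
        return (REPLACEMENT * count, error.end)
    raise error

# "fffd-replace" is a hypothetical handler name chosen for this sketch.
codecs.register_error("fffd-replace", replace_with_fffd)

text = "a\udc80b"  # contains a lone surrogate, which UTF-8 cannot encode

default_bytes = text.encode("utf-8", "replace")
custom_bytes = text.encode("utf-8", "fffd-replace")
decoded = b"a\x80b".decode("utf-8", "replace")

print(default_bytes)  # b'a?b'
print(custom_bytes)   # b'a\xef\xbf\xbdb'
print(custom_bytes == decoded.encode("utf-8"))
```

With the custom handler, the encode path and the decode path agree on the replacement character, so content hashes computed from either path match.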
Taus
d01593e571 Python: Add test for string encoding dataset check
Note that this test checks that the current setup creates dataset check
violations. A later commit will fix this (and flip the negation in the
test).
2024-10-21 12:08:46 +00:00
Taus
417e60a466 Python: Update extractor version 2024-10-15 11:22:54 +00:00
Taus
819b3d77ab Python: Update test expectations
Note that this still includes the somewhat puzzling parsing of
`Spam[**P2]` as an exponentiation with an empty left hand side. When we
fix that bug, we should also update this test to contain actually valid
syntax.
2024-10-15 11:22:33 +00:00
Taus
36d89745f9 Python: Fix dbscheme/AST autogeneration
There was an errant `ql` in the relevant paths, a leftover from the move
from the internal repo. Also, we can no longer rely on an in-tree version
of the CodeQL CLI, so from now on we'll just assume it's present in the
path. (On Codespaces, `gh codeql` is a decent replacement, especially if
using the `install-stub` functionality.)
2024-10-15 11:22:32 +00:00
Taus
2af0d78435 Python: Add default field to the relevant AST nodes 2024-10-15 11:22:32 +00:00
Taus
55ee3eb36b Python: Add TSG support for type defaults 2024-10-15 11:22:31 +00:00
Taus
6545bfffa7 Python: Regenerate parser files
Two new files -- alloc.h and array.h -- suddenly appeared. Presumably
they are used by the somewhat newer version of tree-sitter. To be safe,
I included them in this commit.
2024-10-15 11:22:31 +00:00
Taus
882249ef82 Python: Add grammar support for type defaults
Also fixes an oversight in the grammar: starred expressions should be
allowed inside the subscript of an `Index` expression.
2024-10-15 11:22:30 +00:00
Taus
1ced5b44d7 Python: Add test for type parameter defaults 2024-10-15 11:22:30 +00:00
yoff
da5e9ac18c python: more adjustments... 2024-10-14 14:54:33 +00:00
yoff
9d8d7ab237 python: update extractor expectations 2024-10-14 14:14:40 +00:00
Rasmus Lerchedahl Petersen
3402a729d0 Python: adjust test expectations for extractor test 2024-10-14 12:36:56 +02:00
Rasmus Lerchedahl Petersen
e2eb08b543 Python: improve messaging 2024-10-11 15:36:44 +02:00
Rasmus Lerchedahl Petersen
22588c9f85 Python: update extractor version 2024-10-11 15:36:44 +02:00
Rasmus Lerchedahl Petersen
4a291147e0 Python: only look for the py2 stdlib if we extract std lib 2024-10-11 15:36:44 +02:00
Rasmus Lerchedahl Petersen
e91efaa92e python: do not extract stdlib by default 2024-10-11 15:36:44 +02:00
yoff
0b0e8a4bf5 Update python/extractor/tests/parser/.gitignore
As suggested by @tausbn
2024-10-09 12:22:17 +02:00
Rasmus Lerchedahl Petersen
ad630bc6ff Python: ignore some extractor test output
If you test the extractor locally, you want to ignore these files.
2024-10-09 11:34:58 +02:00
Rasmus Wriedt Larsen
d50898e114 Python: Downgrade packaging for Python 3.7 support 2024-08-06 11:15:48 +02:00
Rasmus Wriedt Larsen
4eb6afa880 Python: Update poetry.lock 2024-08-05 14:14:41 +02:00
Rasmus Wriedt Larsen
354394d4c2 Python: Don't use fake locations in diagnostics
Some of the internal tooling would not be too happy about this :D
2024-07-12 13:36:41 +02:00
Rasmus Wriedt Larsen
60d1dc8af8 Python: Bump extractor version 2024-07-09 14:15:52 +02:00
Rasmus Wriedt Larsen
6b3625e24e Python: Handle diagnostics writing for BuiltinModuleExtractable 2024-07-09 14:15:52 +02:00
Rasmus Wriedt Larsen
c1da2c1d2f Python: Gracefully handle exceptions in diagnostics writing 2024-07-09 14:15:51 +02:00
Rasmus Wriedt Larsen
a8b976b389 Python: Always log errors before writing diagnostics
So we have the info in the logs if the diagnostics processing fails
2024-07-09 13:47:53 +02:00
Cornelius Riemenschneider
092bc6445d Rust/bazel: Port to bzlmod.
This gets rid of our last workspace dependency.
In particular, this change also gets rid of the checked-in extra
lock files that took forever to generate.
2024-06-10 17:03:58 +02:00
Cornelius Riemenschneider
c30cc0f665 Fix formatting. 2024-06-03 16:10:41 +02:00
Cornelius Riemenschneider
1158e92f12 Python: Move to the new packaging rules. 2024-05-30 14:25:18 +02:00
Tom Hvitved
386bc1eb03 Bazel: repin 2024-05-24 13:53:55 +02:00
Tom Hvitved
7490472772 Update Python to use Rust 1.74 2024-05-24 13:05:39 +02:00
Tom Hvitved
158dafa7d0 Python: Dummy change to trigger CI 2024-05-21 11:25:21 +02:00
Paolo Tranquilli
9f5782b67b Bazel: introduce buildifier formatting
This introduces tooling and enforcement for formatting bazel files.

The tooling is provided as a bazel run target from
[keith/buildifier-prebuilt](https://github.com/keith/buildifier-prebuilt).

This is used in a [`pre-commit`](https://pre-commit.com/) hook for those
having that installed. In turn this is used in a CI check. Relying on a
`pre-commit` action gives us easy checking that buildifying did not
change anything in the files and printing the diff, without having to
hand-roll the check ourselves.

This enforcement will make usage of gazelle easier, as gazelle itself
might reformat files, even outside of `go`. Having them properly
formatted will allow gazelle to leave them unchanged, without needing
to configure awkward exclude directives.
2024-04-24 15:49:48 +02:00
Sid Shankar
e33c5706f8 Modifies check for py launcher
This commit modifies the check for the "py" launcher on Windows. We now look for the launcher only if the python_executable_name extractor option is not specified.
2024-04-11 12:59:41 -04:00
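The ordering described in e33c5706f8 (an explicit `python_executable_name` wins, and the `py` launcher is only probed otherwise) might look roughly like the sketch below. The function `pick_interpreter`, its parameters, the `-3` launcher argument, and the `python3` fallback are all assumptions made for illustration; the extractor's real lookup logic is not shown in the commit message.

```python
import shutil
import sys

def pick_interpreter(python_executable_name=None,
                     platform=sys.platform,
                     which=shutil.which):
    """Choose a command line for launching Python (hypothetical helper)."""
    if python_executable_name:
        # An explicit option wins; the launcher is never looked up.
        return [python_executable_name]
    if platform == "win32" and which("py") is not None:
        # Assumed behaviour: ask the Windows launcher for a Python 3.
        return ["py", "-3"]
    # Assumed fallback for every other case.
    return ["python3"]

def launcher_found(name):
    """Stub lookup that pretends the `py` launcher is installed."""
    return "C:\\Windows\\py.exe"

explicit = pick_interpreter("python3.11", platform="win32", which=launcher_found)
implicit = pick_interpreter(None, platform="win32", which=launcher_found)
print(explicit, implicit)
```

Passing `platform` and `which` as parameters keeps the sketch testable on any operating system.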
Rasmus Wriedt Larsen
6f1a9d4574 Merge pull request #16159 from RasmusWL/fix-integration-tests
Python: Fixup integration tests after no dep inst
2024-04-09 15:08:20 +02:00
Rasmus Wriedt Larsen
6ce38be3cc Merge pull request #16112 from github/tausbn/python-various-extractor-fixups
Python: Various extractor fixups
2024-04-09 14:46:23 +02:00
Rasmus Wriedt Larsen
e9e7ccddce Python: delete force-enable-library-extraction integration test 2024-04-09 14:02:34 +02:00
Rasmus Wriedt Larsen
a0d6324f68 Python: Fix ignore-venv integration test
Now that we no longer support the fallback option
(https://github.com/github/codeql/pull/16127)
2024-04-09 14:01:10 +02:00
Rasmus Wriedt Larsen
bb4952f557 Revert "Python: Disable failing integration tests"
This reverts commit 8c2455fc11.
2024-04-09 14:00:25 +02:00
Taus
8c2455fc11 Python: Disable failing integration tests
These failures were likely caused by
https://github.com/github/codeql/pull/16127

My guess is that they can probably be deleted altogether, but as the
failures are blocking other development, I have opted to simply disable
them for the time being.
2024-04-09 10:49:30 +00:00
Taus
ef9f99b3be Python: Remove unparse.py 2024-04-05 12:30:40 +02:00
Taus
599f573a4a Python: Preserve comments and docstrings in extractor 2024-04-05 12:30:40 +02:00