codeql

mirror of https://github.com/github/codeql.git synced 2025-12-16 16:53:25 +01:00

Author	SHA1	Message	Date
Taus	ad53518644	Python: Regenerate parser files	2025-06-26 15:34:44 +00:00
Taus	e04821e9e3	Python: Allow use of `match` as an identifier This previously only worked in certain circumstances. In particular, assignments such as `match[1] = ...` or even just `match[1]` would fail to parse correctly. Fixing this turned out to be less trivial than anticipated. Consider the fact that ``` match [1]: case (...) ``` can either look like the start of a `match` statement, or it could be a type ascription, ascribing the value of `case(...)` (a call) to the item at index 1 of `match`. To fix this, then, we give `match` the identifier and `match` the statement the same precedence in the grammar, and additionally mark a conflict between `match_statement` and `primary_expression`. This causes the conflict to be resolved dynamically, and seems to do the right thing in all cases.	2025-06-26 15:33:00 +00:00
Taus	2158eaa34c	Python: Fix a bug in glob regex creation The previous version was tested on a version of the code where we had temporarily removed the `glob.strip("/")` bit, and so the bug didn't trigger then. We now correctly remember if the glob ends in `/`, and add an extra part in that case. This way, if the path ends with multiple slashes, they effectively get consolidated into a single one, which results in the correct semantics.	2025-05-15 15:34:11 +00:00
Taus	c8cca126a1	Python: Bump extractor version	2025-05-15 14:59:33 +00:00
Taus	96558b53b8	Python: Update test The second test case now sets the `paths-ignore` setting in the config file in order to skip files in hidden directories.	2025-05-15 14:53:15 +00:00
Taus	98388be25c	Python: Remove special casing of hidden files If it is necessary to exclude hidden files, then adding ``` paths-ignore: ['**/.*/**'] ``` to the relevant config file is recommended instead.	2025-05-15 14:49:17 +00:00
Taus	61719cf448	Python: Fix a bug in glob conversion If you have a filter like `**/foo/**` set in the `paths-ignore` bit of your config file, then currently the following happens: - First, the CodeQL CLI observes that this string ends in `/**` and strips off the `**` leaving `**/foo/` - Then the Python extractor strips off leading and trailing `/` characters and proceeds to convert `**/foo` into a regex that is matched against files to (potentially) extract. The trouble with this is that it leaves us unable to distinguish between, say, a file `foo.py` and a file `foo/bar.py`. In other words, we have lost the ability to exclude only the _folder_ `foo` and not any files that happen to start with `foo`. To fix this, we instead make a note of whether the glob ends in a forward slash or not, and adjust the regex correspondingly.	2025-05-15 14:48:06 +00:00
Taus	605f2bff9c	Python: Add integration test	2025-05-02 14:27:46 +00:00
Taus	0c1b379ac1	Python: Extract files in hidden dirs by default Changes the default behaviour of the Python extractor so files inside hidden directories are extracted by default. Also adds an extractor option, `skip_hidden_directories`, which can be set to `true` in order to revert to the old behaviour. Finally, I made the logic surrounding what is logged in various cases a bit more obvious. Technically this changes the behaviour of the extractor (in that hidden excluded files will now be logged as `(excluded)`), but I think this makes more sense anyway.	2025-05-02 12:44:05 +00:00
Taus	6546bb1b1d	Merge branch 'main' into tausbn/python-fix-match-pruning-logic	2025-03-06 14:37:58 +01:00
Paolo Tranquilli	1bcc6ddb32	Rust/Ruby/Python: apply clippy lints	2025-02-25 13:21:28 +01:00
Paolo Tranquilli	6089a75262	Rust/Ruby/Python: format code	2025-02-25 13:19:03 +01:00
Paolo Tranquilli	e8799e346d	Rust/Python: fix edition-related errors	2025-02-25 13:16:58 +01:00
Paolo Tranquilli	eff87d24fa	Rust/Ruby/Python: update rustc and edition	2025-02-25 13:15:19 +01:00
Paolo Tranquilli	38efd4a8a2	Python: downgrade `tree-sitter` back to `0.20.4`	2025-02-18 10:03:18 +01:00
Paolo Tranquilli	342bff6125	Python: undo tree-sitter update	2025-02-17 15:52:45 +01:00
Paolo Tranquilli	91b3d108bb	Python: upgrade cargo dependencies This required some code changes because of some breaking changes in `clap` and `tree-sitter`. Also needed to assign a new bazel repo name to the `crates_vendor` to avoid name conflicts in `MODULE.bazel`.	2025-02-17 10:56:36 +01:00
Taus	918c05c538	Python: Don't prune any `MatchLiteralPattern`s Extends the mechanism introduced in https://github.com/github/codeql/pull/18030 to behave the same for _all_ `MatchLiteralPattern`s, not just the ones that happen to be the constant `True` or `False`. Co-authored-by: yoff <yoff@github.com>	2025-02-11 12:58:52 +00:00
Paolo Tranquilli	cc939e64fd	Python: fix bazel rule	2025-02-07 14:42:26 +01:00
yoff	37ddaa36ad	Merge pull request #18702 from github/tausbn/python-allow-comments-in-subscripts Python: Allow comments in subscripts	2025-02-06 23:31:29 +01:00
Taus	131ec8d22f	Python: Handle loop constructs outside of loops Observed on some test files in Nuitka/Nuitka, having `break` and `continue` outside of loops in Python is (to Python) a syntax error, but our parser happily accepted this broken syntax. This then caused issues further downstream in the control-flow construction, as it broke some invariants. To fix this we now skip the code that would previously fail when the invariants are broken. Co-authored-by: yoff <yoff@github.com>	2025-02-06 14:30:16 +00:00
Taus	7124e80f28	Python: Regenerate parser files	2025-02-06 14:05:40 +00:00
Taus	c5be2a3e2d	Python: Allow comments in subscripts Once again, the interaction between anchors and extras (specifically comments) was causing trouble. The root of the problem was the fact that in `a[b]`, we put `b` in the `index` field of the subscript node, whereas in `a[b,c]`, we additionally synthesize a `Tuple` node for `b,c` (which matches the Python AST). To fix this, we refactored the grammar slightly so as to make that tuple explicit, such that a subscript node either contains a single expression or the newly added tuple node. This greatly simplifies the logic.	2025-02-06 14:04:57 +00:00
Taus	60d97e0e16	Python: Print file path when logging context errors This makes it _much_ easier to find the offending bit of syntax.	2025-02-05 13:13:39 +00:00
Cornelius Riemenschneider	53ca5083a9	Upgrade bazel to 8.0.0. Previously, we were using 8.0.0rc1. In particular, this upgrade means we need to explicitly import more rules, as they've been moved out of the core bazel repo.	2024-12-10 12:05:37 +01:00
Taus	a9817a0281	Python: Add guide describing how to extend the parser	2024-11-28 12:32:00 +00:00
Taus	d779ae5c3e	Python: Add change note for CFG pruning fix ... And also bump the extractor version.	2024-11-26 15:39:15 +00:00
Taus	a4ccda5fe3	Python: Fix pruning of literals in `match` pattern Co-authored-by: yoff <lerchedahl@gmail.com>	2024-11-19 13:48:13 +00:00
Cornelius Riemenschneider	a66f8209f9	Rust: Vendor 3rdparty dependencies. We've been observing some performance issues using crate_universe on CI. Therefore, we're moving to vendor the auto-generated BUILD files in our repository. This should provide a nice speed boost, while getting rid of the complexity of the "rust cache" job we've been using when we had a lot of git dependencies. This PR includes a vendor script, and I'll put up a CI job internally that runs that vendor script on Cargo.toml and Cargo.lock changes, to check that the vendored files are in sync.	2024-11-13 13:22:14 +01:00
Taus	0bb5b4b9dc	Merge pull request #17875 from github/tausbn/python-improve-parser-logging-and-timing Python: Improve parser logging/timing/customisability	2024-11-01 12:47:46 +01:00
Taus	2892f0ff48	Merge pull request #17873 from github/tausbn/python-fix-generator-expression-locations Python: Even more parser fixes	2024-11-01 12:47:19 +01:00
Taus	2ef3ae9860	Python: Improve parser logging/timing/customisability Does a bunch of things, unfortunately all in the same place, so my apologies in advance for a slightly complicated commit. As for the changes themselves, this commit - Adds timers for the old and new parsers. This means we get the overall time spent on these parts of the extractor if the extractor is run with `DEBUG` output shown. - Adds logging information (at the `DEBUG` level) to show which invocations of the parsers happen when, and whether they succeed or not. - Adds support for using an environment variable named `CODEQL_PYTHON_DISABLE_OLD_PARSER` to disable using the old parser entirely. This makes it easier to test the new parser in isolation. - Fixes a bug where we did not check whether a parse with the new parser had already succeeded, and so would do a superfluous second parse.	2024-10-30 13:58:46 +00:00
Taus	f75615b913	Merge pull request #17822 from github/tausbn/python-more-parser-fixes Python: A few more parser fixes	2024-10-30 13:47:10 +01:00
Taus	5d6600e61f	Python: Fix generator expression locations Our logic for detecting the first and last item in a generator expression was faulty, sometimes matching comments as well. Because attributes (like `_location_start`) can only be written once, this caused `tree-sitter-graph` to get unhappy. To fix this, we now require the first item to be an `expression`, and the last one to be either a `for_in_clause` or an `if_clause`. Crucially, `comment` is neither of these, and this prevents the unfortunate overlap.	2024-10-28 14:53:09 +00:00
Taus	ef60b730ea	Python: Fix parenthesized tuple parser bug We were writing the `parenthesised` attribute twice on tuples, once because of the explicit parenthesisation, and once because all non-empty tuples are parenthesised. This made `tree-sitter-graph` unhappy. To fix this, we now explicitly check whether a tuple is already parenthesised, and do nothing if that is the case.	2024-10-28 14:49:45 +00:00
Taus	b4ecc7937d	Python: Fix some more `async` parsing problems Turns out we were not setting the `is_async` field on anything except `async for` statements. This commit makes it so that we also do this for `async def` and `async with`, and adds a test that this produces the same behaviour as the old parser.	2024-10-28 14:44:02 +00:00
Taus	e710c0a6bf	Python: Regenerate parser files	2024-10-28 14:44:01 +00:00
Taus	ac87868097	Python: Fix parsing of `await` inside expressions Found when parsing `Lib/test/test_coroutines.py` using the new parser. For whatever reason, having `await` be an `expression` (with an argument of the same kind) resulted in a bad parse. Consulting the official grammar, we see that `await` should actually be a `primary_expression` instead. This is also more in line with the other unary operators, whose precedence is shared by the `await` syntax.	2024-10-28 14:44:01 +00:00
Taus	1e51703ce9	Python: Allow escaped quotes/backslashes in raw strings Quoting the Python documentation (last paragraph of https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences): "Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes)." We did not handle this correctly in the scanner, as we only consumed the backslash but not the following single or double quote, resulting in that character getting interpreted as the end of the string. To fix this, we do a second lookahead after consuming the backslash, and if the next character is the end character for the string, we advance the lexer across it as well. Similarly, backslashes in raw strings can escape other backslashes. Thus, for a string like '\\' we must consume the second backslash, otherwise we'll interpret it as escaping the end quote.	2024-10-28 14:40:24 +00:00
Taus	5db601af3c	Python: Allow comments in comprehensions A somewhat complicated solution that necessitated adding a new custom function to `tsg-python`. See the comments in `python.tsg` for why this was necessary.	2024-10-23 14:24:47 +00:00
Taus	24ae54886f	Merge pull request #17809 from github/tausbn/python-fix-kwargs-in-class-bases Python: Fix bug in handling of `**kwargs` in class bases	2024-10-23 15:04:54 +02:00
Taus	4f60494019	Python: Support assignments of the form `[x,y,z] = w` Surprisingly, the new parser did not support these constructs (and the relevant test was missing this case), so on files that required the new parser we were unable to parse this construct. To fix it, we add `list_pattern` (not to be confused with `pattern_list`) as a `tree-sitter-python` node that results in a `List` node in the AST.	2024-10-22 16:06:35 +00:00
Taus	89ea4b8200	Python: Regenerate parser files	2024-10-22 15:39:41 +00:00
Taus	9c913902c5	Python: Allow `except*` to be written as `except *` Turns out, `except*` is actually not a token on its own according to the Python grammar. This means it's legal to write `except *foo: ...`, which we previously would consider a syntax error. To fix it, we simply break up the `except*` into two separate tokens.	2024-10-22 15:39:29 +00:00
Taus	7ceefb509b	Python: Regenerate parser files	2024-10-22 15:17:34 +00:00
Taus	8053e0ed44	Python: Allow `list_splat`s as type annotations That is, the `*T` in `def foo(*args : *T): ...`. This is apparently a piece of syntax we did not support correctly until now. In terms of the grammar, we simply add `list_splat` as a possible alternative for `type` (which could previously only be an `expression`). We also update `python.tsg` to not specify `expression` in those places (as the relevant stanzas will then not work for `list_splat`s). This syntax is not supported by the old parser, hence we only add a new parser test for it.	2024-10-22 15:17:12 +00:00
Taus	fcec8e0256	Python: Fail tests when errors/warnings are logged This is primarily useful for ensuring that errors where a node does not have an appropriate context set in `python.tsg` actually have an effect on the pass/fail status of the parser tests. Previously, these would just be logged to stdout, but tests could still succeed when there were errors present. Also fixes one of the logging lines in `tsg_parser.py` to be more consistent with the others.	2024-10-22 15:11:51 +00:00
Taus	9803bbdc4b	Python: Update class parser test	2024-10-21 15:35:48 +00:00
Taus	1cd04c96c7	Python: Fix bug in handling of `**kwargs` in class bases This caused a dataset check error on the `python/cpython` database, as we had a `DictUnpacking` node whose parent was not a `dict_item_list`, but rather an `expr_list`. Investigating a bit further revealed that this was because in a construction like ```python class C[T](base, foo=bar, **kwargs): ... ``` we were mistakenly adding `**kwargs` to the same list as `base` (which is just a list of expressions), rather than the same list as `foo=bar` (which is a list of dictionary items). The ultimate cause of this was the use of `! name` in `python.tsg` to distinguish between bases and keyword arguments (only the latter of which have the `name` field). Because `dictionary_splat` doesn't have a `name` field either, these were mistakenly put in the wrong list, leading to the error. Also, because our previous test of `class` statements did not include a `**kwargs` construction, we were not checking that the new parser behaved correctly in this case. For the most part this was not a problem, but on files that use syntax not supported by the old parser (like type parameters on classes), this became an issue. This is also why we did not see this error previously. To fix this, we added `! value` (which is a field present on `dictionary_splat` nodes) as a secondary filter, and added a third stanza to handle `dictionary_splat` nodes.	2024-10-21 15:35:47 +00:00
Taus	ae4a4bb881	Python: Flip test expectation This test should now validate that we no longer have dataset check errors even when there are unencodable characters.	2024-10-21 15:32:23 +00:00
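
The glob fixes in 61719cf448 and 2158eaa34c describe remembering whether a glob ends in `/` before stripping slashes, then adjusting the regex. A minimal sketch of that idea follows. It is not the extractor's actual code: the function name `glob_to_regex` is hypothetical, paths are assumed to be relative with no leading slash, and only `*` within a single path segment is handled.

```python
import re


def glob_to_regex(glob):
    """Hypothetical helper illustrating the trailing-slash fix (not extractor code)."""
    # Remember the trailing slash *before* stripping, so the information is not lost.
    ends_in_slash = glob.endswith("/")
    # Drop empty segments so that repeated slashes collapse into one.
    segments = [s for s in glob.strip("/").split("/") if s]
    # Escape each segment, then let `*` match anything except a path separator.
    parts = [re.escape(s).replace(r"\*", "[^/]*") for s in segments]
    pattern = "/".join(parts)
    if ends_in_slash:
        # Require a separator after the last segment: this is a folder, not a prefix.
        pattern += "/"
    return re.compile(pattern)


# `foo/` now matches only paths inside the folder `foo`.
assert glob_to_regex("foo/").match("foo/bar.py")
assert glob_to_regex("foo/").match("foo.py") is None
# Without the trailing slash, the prefix match still covers both cases.
assert glob_to_regex("foo").match("foo.py")
```

In this sketch, several trailing slashes (for example `foo//`) yield the same pattern as a single one, which mirrors the consolidation described in 2158eaa34c.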

104 Commits