Commit Graph

9547 Commits

Author SHA1 Message Date
Taus
ac87868097 Python: Fix parsing of await inside expressions
Found when parsing `Lib/test/test_coroutines.py` using the new parser.

For whatever reason, having `await` be an `expression` (with an argument
of the same kind) resulted in a bad parse. Consulting the official
grammar, we see that `await` should actually be a `primary_expression`
instead. This is also more in line with the other unary operators, whose
precedence is shared by the `await` syntax.
2024-10-28 14:44:01 +00:00
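The precedence point in ac87868097 can be illustrated with CPython's own `ast` module. This is only a sketch against the reference parser, not the tree-sitter grammar the commit changes, and the function name `f` is made up for the example.

```python
import ast

# `await` sits at the "primary" level of the grammar, so it binds more
# tightly than both `**` and unary minus.
SOURCE = """
async def f(x):
    return -await x ** 2
"""

tree = ast.parse(SOURCE)
ret_value = tree.body[0].body[0].value

# Expected shape: UnaryOp(USub, BinOp(Await(x), Pow, 2))
outer = type(ret_value).__name__
inner = type(ret_value.operand).__name__
innermost = type(ret_value.operand.left).__name__
print(outer, inner, innermost)
```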
Taus
1e51703ce9 Python: Allow escaped quotes/backslashes in raw strings
Quoting the Python documentation (last paragraph of
https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences):

"Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, r"\"" is a valid string
literal consisting of two characters: a backslash and a double quote;
r"\" is not a valid string literal (even a raw string cannot end in an
odd number of backslashes)."

We did not handle this correctly in the scanner, as we only consumed the
backslash but not the following single or double quote, resulting in
that character getting interpreted as the end of the string.

To fix this, we do a second lookahead after consuming the backslash, and
if the next character is the end character for the string, we advance
the lexer across it as well.

Similarly, backslashes in raw strings can escape other backslashes.
Thus, for a raw string like r'\\' we must consume the second backslash,
otherwise we'll interpret it as escaping the end quote.
2024-10-28 14:40:24 +00:00
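The lookahead described in 1e51703ce9 can be sketched in a few lines of Python. The real scanner is not written in Python, and the helper name `raw_string_end` and its signature are assumptions made purely for illustration.

```python
def raw_string_end(source: str, start: int, quote: str) -> int:
    """Return the index just past the closing quote of a raw string body.

    `start` is the index of the first character after the opening quote.
    Returns -1 if the string is never terminated.
    """
    i = start
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 1  # consume the backslash itself
            # Second lookahead: an escaped quote or backslash is consumed too,
            # so it cannot be mistaken for the end of the string.
            if i < len(source) and source[i] in (quote, "\\"):
                i += 1
            continue
        if ch == quote:
            return i + 1
        i += 1
    return -1


escaped_quote = 'r"\\""'      # the source text r"\""
double_backslash = "r'\\\\'"  # the source text r'\\'
unterminated = 'r"\\"'        # the source text r"\"

print(raw_string_end(escaped_quote, 2, '"'))
print(raw_string_end(double_backslash, 2, "'"))
print(raw_string_end(unterminated, 2, '"'))
```

The first two literals are scanned to their final character, while the third is reported as unterminated, matching the quoted documentation.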
yoff
7338eafbd4 Merge pull request #16812 from porcupineyhairs/pyloadSsl
Python: Pycurl SSL Disabled
2024-10-25 16:23:25 +02:00
Tom Hvitved
7c4d5981dd Shared: Add missing spaces in inline test expectation output 2024-10-25 13:23:03 +02:00
yoff
c78aeec2ec Update python/ql/lib/semmle/python/frameworks/Pycurl.qll 2024-10-24 11:44:16 +02:00
Taus
5db601af3c Python: Allow comments in comprehensions
A somewhat complicated solution that necessitated adding a new custom
function to `tsg-python`. See the comments in `python.tsg` for why this
was necessary.
2024-10-23 14:24:47 +00:00
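For reference, the kind of input 5db601af3c is about looks roughly like the following. The sketch only checks that CPython's `ast` module accepts comments between the parts of a comprehension; it says nothing about how `tsg-python` handles them.

```python
import ast

# A comprehension with a comment after each of its parts.
SOURCE = """
squares = [
    x * x  # the element expression
    for x in range(10)  # the iteration
    if x % 2 == 0  # the filter
]
"""

comp = ast.parse(SOURCE).body[0].value
kind = type(comp).__name__
filter_count = len(comp.generators[0].ifs)
print(kind, filter_count)
```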
Taus
24ae54886f Merge pull request #17809 from github/tausbn/python-fix-kwargs-in-class-bases
Python: Fix bug in handling of `**kwargs` in class bases
2024-10-23 15:04:54 +02:00
Taus
e1e35689ca Merge pull request #17807 from github/tausbn/python-fix-string-encoding-dataset-check-failure
Python: Fix string encoding dataset check failure
2024-10-23 14:26:45 +02:00
Taus
4f60494019 Python: Support assignments of the form [x,y,z] = w
Surprisingly, the new parser did not support these constructs (and the
relevant test was missing this case), so on files that required the new
parser we were unable to parse this construct.

To fix it, we add `list_pattern` (not to be confused with
`pattern_list`) as a `tree-sitter-python` node that results in a `List`
node in the AST.
2024-10-22 16:06:35 +00:00
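The construct from 4f60494019 can be inspected with CPython's `ast` module, which likewise represents the bracketed target as a `List` node. This is an illustration using the reference parser only, not the extractor's AST.

```python
import ast

tree = ast.parse("[x, y, z] = w\n")
target = tree.body[0].targets[0]

kind = type(target).__name__         # the target is a List node ...
context = type(target.ctx).__name__  # ... in a Store context
names = [elt.id for elt in target.elts]
print(kind, context, names)
```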
Taus
89ea4b8200 Python: Regenerate parser files 2024-10-22 15:39:41 +00:00
Taus
9c913902c5 Python: Allow except* to be written as except *
Turns out, `except*` is actually not a token on its own according to the
Python grammar. This means it's legal to write `except *foo: ...`, which
we previously would consider a syntax error.

To fix it, we simply break up the `except*` into two separate tokens.
2024-10-22 15:39:29 +00:00
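The observation in 9c913902c5 that `except*` is two tokens can be checked with the standard `tokenize` module. The helper name `token_strings` is made up for this sketch, which only tokenizes a single line and does not parse it.

```python
import io
import tokenize


def token_strings(line: str) -> list[str]:
    """Return the non-whitespace token strings the tokenizer yields for one line."""
    readline = io.StringIO(line).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.string.strip()
    ]


joined = token_strings("except* ValueError: pass\n")
spaced = token_strings("except *ValueError: pass\n")
print(joined == spaced)
```

Both spellings yield the same token sequence, with `except` and `*` as separate tokens.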
Taus
7ceefb509b Python: Regenerate parser files 2024-10-22 15:17:34 +00:00
Taus
8053e0ed44 Python: Allow list_splats as type annotations
That is, the `*T` in `def foo(*args : *T): ...`.

This is apparently a piece of syntax we did not support correctly until
now.

In terms of the grammar, we simply add `list_splat` as a possible
alternative for `type` (which could previously only be an `expression`).
We also update `python.tsg` to not specify `expression` in those places (as
the relevant stanzas will then not work for `list_splat`s).

This syntax is not supported by the old parser, hence we only add a new
parser test for it.
2024-10-22 15:17:12 +00:00
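A quick way to see the syntax from 8053e0ed44 in CPython itself is sketched below. It assumes the starred annotation only parses on Python 3.11 or later, so the hypothetical helper `vararg_annotation_kind` returns `None` on older interpreters.

```python
import ast
import sys

SOURCE = "def foo(*args: *T): ...\n"


def vararg_annotation_kind(source: str):
    """Return the AST node type of the `*args` annotation, or None if unsupported."""
    if sys.version_info < (3, 11):
        return None  # assumption: starred annotations need Python 3.11+
    func = ast.parse(source).body[0]
    return type(func.args.vararg.annotation).__name__


print(vararg_annotation_kind(SOURCE))
```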
Taus
fcec8e0256 Python: Fail tests when errors/warnings are logged
This is primarily useful for ensuring that errors where a node does not
have an appropriate context set in `python.tsg` actually have an effect
on the pass/fail status of the parser tests. Previously, these would
just be logged to stdout, but tests could still succeed when there were
errors present.

Also fixes one of the logging lines in `tsg_parser.py` to be more
consistent with the others.
2024-10-22 15:11:51 +00:00
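One way to make logged problems affect a test result, in the spirit of fcec8e0256, is a collecting log handler. The sketch below is not the code from that commit; `ProblemCollector`, `run_and_check`, and the logger name are hypothetical.

```python
import logging


class ProblemCollector(logging.Handler):
    """Remember every record logged at WARNING level or above."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.problems = []

    def emit(self, record: logging.LogRecord) -> None:
        self.problems.append(record)


def run_and_check(action, logger: logging.Logger) -> bool:
    """Run `action` and return True only if no warning or error was logged."""
    collector = ProblemCollector()
    logger.addHandler(collector)
    try:
        action()
    finally:
        logger.removeHandler(collector)
    return not collector.problems


logger = logging.getLogger("tsg_parser_demo")  # hypothetical logger name
clean = run_and_check(lambda: None, logger)
noisy = run_and_check(lambda: logger.warning("node has no context"), logger)
print(clean, noisy)
```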
Taus
9803bbdc4b Python: Update class parser test 2024-10-21 15:35:48 +00:00
Taus
1cd04c96c7 Python: Fix bug in handling of **kwargs in class bases
This caused a dataset check error on the `python/cpython` database, as
we had a `DictUnpacking` node whose parent was not a `dict_item_list`,
but rather an `expr_list`.

Investigating a bit further revealed that this was because in a
construction like

```python
class C[T](base, foo=bar, **kwargs): ...
```
we were mistakenly adding `**kwargs` to the same list as `base` (which
is just a list of expressions), rather than the same list as `foo=bar`
(which is a list of dictionary items).

The ultimate cause of this was the use of `! name` in `python.tsg` to
distinguish between bases and keyword arguments (only the latter of
which have the `name` field). Because `dictionary_splat` doesn't have a
`name` field either, these were mistakenly put in the wrong list,
leading to the error.

Also, because our previous test of `class` statements did not include a
`**kwargs` construction, we were not checking that the new parser
behaved correctly in this case. For the most part this was not a
problem, but on files that use syntax not supported by the old parser
(like type parameters on classes), this became an issue. This is also
why we did not see this error previously.

To fix this, we added `! value` (which is a field present on
`dictionary_splat` nodes) as a secondary filter, and added a third
stanza to handle `dictionary_splat` nodes.
2024-10-21 15:35:47 +00:00
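The split between bases and keyword arguments discussed in 1cd04c96c7 is also visible in CPython's `ast` module, where `**kwargs` lands among the keywords with no name. The sketch omits the `[T]` type parameter so that it also parses on interpreters older than Python 3.12; it illustrates the reference parser only.

```python
import ast

SOURCE = "class C(base, foo=bar, **kwargs): ...\n"
cls = ast.parse(SOURCE).body[0]

# Positional bases are plain expressions.
bases = [ast.unparse(b) for b in cls.bases]
# Keyword arguments carry a name; the `**kwargs` entry has arg=None.
keywords = [(kw.arg, ast.unparse(kw.value)) for kw in cls.keywords]
print(bases, keywords)
```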
Taus
ae4a4bb881 Python: Flip test expectation
This test should now validate that we no longer have dataset check
errors even when there are unencodable characters.
2024-10-21 15:32:23 +00:00
Taus
cc39ae57dc Python: Fix dataset check error for string encoding
Here's an example of one of these errors:
```
INVALID_KEY predicate py_cobjectnames(@py_cobject obj, string name)

The key set {obj} does not functionally determine all fields. Here is a
pair of tuples that agree on the key set but differ at index 1: Tuple 1
in row 63874: (72088,"u'<X>'") Tuple 2 in row 63875: (72088,"u'<?>'")
```
(Here, the substring `X` should really be the Unicode character U+FFFD,
but for some reason I'm not allowed to put that in this commit message.)

Inside the extractor, we assign IDs based on the string type (bytestring
or Unicode) and a hash of the UTF-8 encoded content of the string. In
this case, however, certain _different_ strings were receiving the same
hash, due to replacement characters in the encoding process.

In particular, we were converting unencodable characters to question
marks in one place, and to U+FFFD in another place. This caused a
discrepancy that led to the dataset check error.

To fix this, we put in a custom error handler that always puts the
U+FFFD character in place of unencodable characters. With this, the
strings now agree, and hence there is no clash.
2024-10-21 15:31:16 +00:00
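The discrepancy described in cc39ae57dc, and the general shape of the fix, can be reproduced with the standard `codecs` module. This is a sketch rather than the extractor's code; the handler name `"fffd-replace"` and the function `replace_with_fffd` are made up for illustration.

```python
import codecs

FFFD_BYTES = "\ufffd".encode("utf-8")


def replace_with_fffd(exc):
    """Substitute U+FFFD (as UTF-8 bytes) for every unencodable character."""
    if isinstance(exc, UnicodeEncodeError):
        return (FFFD_BYTES * (exc.end - exc.start), exc.end)
    raise exc


codecs.register_error("fffd-replace", replace_with_fffd)

text = "a\ud800b"  # a lone surrogate cannot be encoded as UTF-8

with_question_mark = text.encode("utf-8", "replace")
with_fffd = text.encode("utf-8", "fffd-replace")
print(with_question_mark, with_fffd)
```

The built-in `"replace"` handler emits a question mark when encoding, whereas the custom handler always emits U+FFFD, so every code path that uses it produces the same bytes.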
Porcupiney Hairs
c7610b3539 Include change-note 2024-10-21 20:14:58 +05:30
Porcupiney Hairs
c93f0ed851 Include change-note 2024-10-21 20:12:46 +05:30
Porcupiney Hairs
c74f6f587f Merge branch 'main' into pyloadSsl 2024-10-21 20:09:05 +05:30
Porcupiney Hairs
f6369a6ed7 Include changes from review 2024-10-21 20:01:44 +05:30
Taus
d01593e571 Python: Add test for string encoding dataset check
Note that this test checks that the current setup creates dataset check
violations. A later commit will fix this (and flip the negation in the
test).
2024-10-21 12:08:46 +00:00
Porcupiney Hairs
7ef2d79b3f Include changes from review 2024-10-21 03:28:19 +05:30
Arthur Baars
08af7d0007 Merge pull request #17810 from github/post-release-prep/codeql-cli-2.19.2
Post-release preparation for codeql-cli-2.19.2
2024-10-18 18:28:07 +02:00
github-actions[bot]
272f6c2541 Post-release preparation for codeql-cli-2.19.2 2024-10-18 15:56:02 +00:00
Arthur Baars
aaf220d41e Fix typos in changelogs 2024-10-18 15:28:05 +00:00
github-actions[bot]
ca0345324e Release preparation for version 2.19.2 2024-10-18 15:16:21 +00:00
Arthur Baars
eb515f884b Revert "Release preparation for version 2.19.2" 2024-10-18 17:06:20 +02:00
Rasmus Lerchedahl Petersen
30e5a12230 Python: update expectations 2024-10-18 15:14:51 +02:00
Rasmus Lerchedahl Petersen
30053da70d Python: extra modelling of stdlib
as a reaction to the latest QA run
2024-10-18 13:49:33 +02:00
yoff
e46722f3be Update python/ql/lib/semmle/python/dataflow/new/internal/TypeTrackingImpl.qll 2024-10-17 17:23:00 +02:00
Anders Schack-Mulligen
4153a83a4f Python: Add workaround. 2024-10-16 16:14:51 +02:00
Anders Schack-Mulligen
5950c336e2 Python: Refactor references to NormalCall. 2024-10-16 16:04:31 +02:00
Rasmus Lerchedahl Petersen
22d621c625 shared: add locations to typetracking nodes 2024-10-16 15:16:18 +02:00
Anders Schack-Mulligen
c20f12fa6c Add qldoc. 2024-10-16 14:35:23 +02:00
Anders Schack-Mulligen
7497d9530d Python: Add tentative support for speculative taint flow. 2024-10-16 14:35:20 +02:00
Anders Schack-Mulligen
c80627a3d3 Dataflow: add plumbing for adding provenance to state-steps. 2024-10-16 14:35:18 +02:00
Taus
65dbc1de91 Python: Add copy.replace test to list of runnable tests 2024-10-15 18:17:00 +02:00
Taus
28f8874243 Merge pull request #17688 from github/tausbn/python-3.13-default-type-parser-support
Python: Add support for type parameter defaults
2024-10-15 18:01:51 +02:00
Taus
d4e0cb2ffa Merge pull request #17767 from github/tausbn/python-3.13-model-flow-in-replace
Python: Model `copy.replace`
2024-10-15 18:01:28 +02:00
yoff
9ed8fe5dd0 Update python/ql/test/library-tests/dataflow/coverage/functional.py
Co-authored-by: Taus <tausbn@github.com>
2024-10-15 17:35:36 +02:00
github-actions[bot]
079ab77a38 Post-release preparation for codeql-cli-2.19.2 2024-10-15 12:16:59 +00:00
Taus
3b60d8302b Python: Add change note 2024-10-15 12:14:20 +00:00
Taus
778b96aa39 Python: Update test expectations 2024-10-15 12:14:19 +00:00
Taus
eaef783f4b Python: Add partial model for copy.replace
Extends our modelling to partially cover the behaviour of
`copy.replace`. In particular, we model this in two ways:

Firstly, we extend the existing Models-as-Data row for `copy` and
`deepcopy` to also cover `replace`. This means that we treat the result
of `replace` as containing all of the fields of the original object.
This is somewhat _more_ than we want, as strictly speaking the fields
that are overwritten should _not_ propagate flow through the `replace`
call, but currently we don't have a good way of modelling this blocking
of flow.

Secondly, we add a flow summary that adds flow from named arguments of
the `replace` call to the corresponding fields on the base object. This
ensures that we at least have the new flow arising from the `replace`
call.

Note that the flow summary adds this flow for _all_ named arguments of
_all_ `replace` calls throughout the codebase. However, since any
particular `replace` call will only populate a subset of these (the
subset consisting of exactly those named arguments that are in that
particular call), this does not cause any unwanted crosstalk between
different `replace` calls.
2024-10-15 12:14:19 +00:00
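The runtime behaviour that eaef783f4b models can be seen in a small example. `copy.replace` is assumed to be available only from Python 3.13, so the sketch falls back to `dataclasses.replace`, which behaves the same way for dataclasses. The `Message` class is hypothetical.

```python
import copy
import dataclasses


@dataclasses.dataclass(frozen=True)
class Message:
    sender: str
    body: str


# Use copy.replace where it exists; otherwise fall back to dataclasses.replace.
replace = getattr(copy, "replace", dataclasses.replace)

original = Message(sender="alice", body="tainted input")
updated = replace(original, sender="bob")

# `body` is carried over from the original object, while `sender` takes the
# value of the named argument.
print(updated)
```

The carried-over `body` field corresponds to the flow covered by the Models-as-Data row, and the overwritten `sender` field corresponds to the flow added by the summary for named arguments.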
Taus
6f2cfa0ba8 Python: Update CopySummary to use getMaDRepresentation
Demonstrates the somewhat more ergonomic way to use
`getMaDRepresentation` when specifying summaries.

Note that this slightly extends the previous definition, in that
`DictionaryContentAny` is now _also_ propagated by a call to the
`.copy()` method, but I think this is correct.
2024-10-15 11:52:37 +00:00
Taus
ce914019c5 Python: Add getMaDRepresentation()
This adds a convenient way of getting the Models-as-Data representation
of a particular type of content. This avoids repeating the same
construction over and over in our various summaries. Currently this is
defined for all types of content except the captured variable content,
which to my knowledge doesn't have any representation in Models-as-Data.
2024-10-15 11:50:38 +00:00
Taus
e16405c675 Python: Add test for copy.replace
This test demonstrates the current state of affairs: that `copy.replace`
essentially blocks all flow of taint through it, because it has not been
modelled yet.
2024-10-15 11:48:43 +00:00
Taus
417e60a466 Python: Update extractor version 2024-10-15 11:22:54 +00:00