Python: Allow escaped quotes/backslashes in raw strings

Quoting the Python documentation (last paragraph of
https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences):

"Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, r"\"" is a valid string
literal consisting of two characters: a backslash and a double quote;
r"\" is not a valid string literal (even a raw string cannot end in an
odd number of backslashes)."
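
The quoted rule can be checked directly against the Python compiler. The
following is a minimal sketch, not part of this commit; the helper name
is_valid_literal is made up for illustration.

```python
# A raw literal keeps the backslash that "escapes" the quote.
escaped_quote = r"\""
assert len(escaped_quote) == 2
assert escaped_quote == "\\" + '"'


def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a Python expression.

    Hypothetical helper, used only to probe the lexer's behaviour.
    """
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# Source text r"\"" is a complete literal.
assert is_valid_literal('r"\\""')
# Source text r"\" is not: the backslash escapes the would-be closing quote.
assert not is_valid_literal('r"\\"')
```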

We did not handle this correctly in the scanner, as we only consumed the
backslash but not the following single or double quote, resulting in
that character getting interpreted as the end of the string.

To fix this, we do a second lookahead after consuming the backslash, and
if the next character is the end character for the string, we advance
the lexer across it as well.

Similarly, backslashes in raw strings can escape other backslashes.
Thus, for a string like r'\\' we must also consume the second backslash;
otherwise we would interpret it as escaping the closing quote.
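
The lookahead logic described above can be sketched in Python as follows.
This is an illustration only, not the scanner's actual code: the function
name find_raw_string_end and its parameters are hypothetical, and the
sketch assumes a single-character delimiter and ignores triple-quoted
strings and newline handling.

```python
def find_raw_string_end(source: str, start: int, end_character: str) -> int:
    """Return the index of the closing quote of a raw string body.

    `start` is the index of the first character after the opening quote.
    Returns -1 if the string is never terminated.
    """
    i = start
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 1  # consume the backslash itself
            # Second lookahead: an escaped end quote or an escaped backslash
            # belongs to the string body, so step over it as well.
            if i < len(source) and source[i] in (end_character, "\\"):
                i += 1
            continue
        if ch == end_character:
            return i
        i += 1
    return -1


# Source text r"\"" : the escaped quote at index 3 is skipped, the string ends at 4.
assert find_raw_string_end('r"\\""', 2, '"') == 4
# Source text r'\\' : the second backslash is consumed, so the quote at 4 closes it.
assert find_raw_string_end("r'\\\\'", 2, "'") == 4
# Source text r"\" : the only remaining quote is escaped, so there is no end.
assert find_raw_string_end('r"\\"', 2, '"') == -1
```

Without the inner check, the first example would end at index 3, which is
the behaviour this commit fixes.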
Taus
2024-10-25 14:57:44 +00:00
parent 5db601af3c
commit 1e51703ce9
2 changed files with 25 additions and 0 deletions


@@ -161,6 +161,22 @@ struct Scanner {
} else if (lexer->lookahead == '\\') {
  if (delimiter.is_raw()) {
    lexer->advance(lexer, false);
    // In raw strings, backslashes _can_ escape the same kind of quotes as the outer
    // string, so we must take care to traverse any such escaped quotes now. If we don't do
    // this, we will mistakenly consider the string to end at that escaped quote.
    // Likewise, this also extends to escaped backslashes.
    if (lexer->lookahead == end_character || lexer->lookahead == '\\') {
      lexer->advance(lexer, false);
    }
    // Newlines after backslashes also cause issues, so we explicitly step over them here.
    if (lexer->lookahead == '\r') {
      lexer->advance(lexer, false);
      if (lexer->lookahead == '\n') {
        lexer->advance(lexer, false);
      }
    } else if (lexer->lookahead == '\n') {
      lexer->advance(lexer, false);
    }
    continue;
  } else if (delimiter.is_bytes()) {
    lexer->mark_end(lexer);