Python: Fix false positive for unmatchable dollar/caret

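Our previous modelling did not account for the fact that a lookahead can
potentially extend all the way to the end of the input (and similarly,
that a lookbehind can extend all the way to the beginning). Because a
lookaround consumes no input, a `$` that ends a lookahead (as in
`^(?=a$)[ab]`) can still match even though more pattern text follows the
lookahead, and likewise for a `^` that starts a lookbehind.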
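A quick way to check this behaviour with Python's own `re` module is shown below. It is only an illustration and is not part of the CodeQL change. The helper name `anchor_is_matchable` and the witness strings are my own choices for the example.

```python
import re

# Each anchor is followed (or preceded) by more pattern text, but it sits
# inside a lookaround, which consumes no input, so the pattern can match.
LOOKAROUND_CASES = {
    r"^(?=a$)[ab]": "a",  # the lookahead reaches the end of the input
    r"(?<=^a)a": "aa",    # the lookbehind reaches the start of the input
}


def anchor_is_matchable(pattern: str, witness: str) -> bool:
    """Hypothetical helper: True if `pattern` matches somewhere in `witness`."""
    return re.search(pattern, witness) is not None


for pattern, witness in LOOKAROUND_CASES.items():
    print(pattern, anchor_is_matchable(pattern, witness))
```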

To fix this, I extended `firstPart` and `lastPart` to handle lookbehinds
and lookaheads correctly, and added some test cases (all of which yield
no new results).

Fixes #20429.
Taus
2025-09-19 15:06:46 +00:00
parent c1c0828082
commit 95a84ad655
4 changed files with 64 additions and 29 deletions

@@ -150,4 +150,12 @@ re.compile(r"[\U00010000-\U0010FFFF]")
re.compile(r"[\u0000-\uFFFF]")
#Allow unicode names
re.compile(r"[\N{degree sign}\N{EM DASH}]")
re.compile(r"[\N{degree sign}\N{EM DASH}]")
#Lookahead assertions. None of these are unmatchable dollars:
re.compile(r"^(?=a$)[ab]")
re.compile(r"^(?!a$)[ab]")
#Lookbehind assertions. None of these are unmatchable carets:
re.compile(r"(?<=^a)a")
re.compile(r"(?<!^a)a")
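The new test patterns can be sanity-checked in plain Python by brute-forcing short input strings and looking for one that each pattern matches. This is a minimal sketch under my own assumptions. The helper `find_witness`, the alphabet `"ab\n"` and the length bound are hypothetical illustration choices, not part of CodeQL. The contrast pattern `a$b` is also my own example of a genuinely unmatchable dollar and is not in the test file. Finding no witness up to a small length is evidence, not a proof, that a pattern can never match.

```python
import itertools
import re

# The lookaround test patterns added to the test file in this commit.
NEW_TEST_PATTERNS = [
    r"^(?=a$)[ab]",
    r"^(?!a$)[ab]",
    r"(?<=^a)a",
    r"(?<!^a)a",
]


def find_witness(pattern, alphabet="ab\n", max_len=3):
    """Hypothetical helper: return the first string over `alphabet`, tried in
    order of increasing length up to `max_len`, that `pattern` matches
    somewhere, or None if no string in that bounded search matches."""
    regex = re.compile(pattern)
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if regex.search(candidate):
                return candidate
    return None


# `a$b` is an assumed contrast case: `$` is followed by a character that
# must be consumed, so no witness is found within the search bound.
for pattern in NEW_TEST_PATTERNS + [r"a$b"]:
    print(f"{pattern!r:>16} -> {find_witness(pattern)!r}")
```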