Python: Port URL sanitisation queries to API graphs

Really, this boils down to "Port `re` library model to use API graphs
instead of points-to", which is what this PR actually does.

Instead of using points-to to track flags, we use a type tracker. To
handle multiple flags at the same time, we add additional flow from

`x` to `x | y` and `y | x`

and, as an added bonus, the above with `+` instead of `|`, neatly
fixing https://github.com/github/codeql/issues/4707
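
As a rough illustration of the extra flow steps (in Python rather than QL, and not the code in this PR), the sketch below first checks that `+` and `|` agree for distinct flags, then propagates flag names through both operators on a parsed expression. The helper `flags_reaching` is hypothetical and only looks at a single expression; the actual type tracker also follows flags through variables and calls.

```python
import ast
import re

# For distinct single-bit flags, addition sets the same bits as bitwise or.
assert (re.VERBOSE + re.DOTALL) == (re.VERBOSE | re.DOTALL)
# This equivalence breaks down when the same flag is added twice.
assert (re.VERBOSE + re.VERBOSE) != re.VERBOSE


def flags_reaching(source: str) -> set:
    """Return the names of `re.<FLAG>` attributes that flow to an expression.

    Hypothetical helper: a flag flows from an operand `x` to `x | y`,
    `y | x`, `x + y` and `y + x`, mirroring the steps described above.
    """

    def visit(node: ast.expr) -> set:
        # Base case: an attribute access on the name `re`, such as `re.VERBOSE`.
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "re"
        ):
            return {node.attr}
        # Flow step: flags on either operand of `|` or `+` reach the result.
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.BitOr, ast.Add)):
            return visit(node.left) | visit(node.right)
        # Any other expression is treated as carrying no flags.
        return set()

    return visit(ast.parse(source, mode="eval").body)


print(flags_reaching("re.VERBOSE + re.DOTALL"))
```

Other operators, such as `&`, deliberately add no flow in this sketch, so `flags_reaching("re.VERBOSE & re.DOTALL")` yields an empty set.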

I had to modify the `Qualified.ql` test slightly, as it now had a
result stemming from the standard library (in `warnings.py`) that
points-to previously ignored.

It might be possible to implement this as a type tracker on
`LocalSourceNode`s, but with the added steps for the above operations,
this was not obvious to me, and so I opted for the simpler
"`smallstep`" variant.
Taus Brock-Nannestad
2021-02-23 22:02:35 +01:00
parent f65843a273
commit e812eb777d
4 changed files with 73 additions and 29 deletions

@@ -7,5 +7,7 @@
| 50 | VERBOSE |
| 51 | UNICODE |
| 52 | UNICODE |
+| 54 | DOTALL |
+| 54 | VERBOSE |
| 56 | VERBOSE |
| 68 | MULTILINE |

@@ -2,5 +2,6 @@ import python
import semmle.python.regex
from Regex r, int start, int end, boolean maybe_empty
-where r.qualifiedItem(start, end, maybe_empty)
+where
+  r.qualifiedItem(start, end, maybe_empty) and r.getLocation().getFile().getShortName() = "test.py"
select r.getText(), start, end, maybe_empty

@@ -51,7 +51,7 @@ re.compile("", flags=re.VERBOSE|re.IGNORECASE)
re.search("", None, re.UNICODE)
x = re.search("", flags=re.UNICODE)
# using addition for flags was reported as FP in https://github.com/github/codeql/issues/4707
-re.compile("", re.VERBOSE+re.DOTALL) # TODO: Currently not recognized with Mode.ql
+re.compile("", re.VERBOSE+re.DOTALL)
# re.X is an alias for re.VERBOSE
re.compile("", re.X)