I've added 2 queries:
- one that detects full SSRF, where an attacker can control the full URL,
which is always bad
- and one for partial SSRF, where an attacker can control parts of an
URL (such as the path, query parameters, or fragment), which is not a
big problem in many cases (but might still be exploitable)
full SSRF should run by default, and partial SSRF should not (but makes
it easy to see the other results).
Some elements of the full SSRF queries needs a bit more polishing, like
being able to detect `"https://" + user_input` is in fact controlling
the full URL.
I think `getUrl` is a bit too misleading, since from the name, I would
only ever expect ONE result for one request being made.
`getAUrlPart` captures that there could be multiple results, and that
they might not constitute a whole URl.
Which is the same naming I used when I tried to model this a long time ago
a80860cdc6/python/ql/lib/semmle/python/web/Http.qll (L102-L111)
Also adjusts test slightly. Writing
`clientRequestDisablesCertValidation=False` to mean that certificate
validation was disabled by the `False` expression is just confusing, as
it easily reads as _certificate validate was NOT disabled_ :|
The new one ties to each request that is being made, which seems like
the right setup.
For the snippet below, our current query is able to show _why_ we
consider `var` to be a falsey value that would disable SSL/TLS
verification. I'm not sure we're going to need the part that Ruby did,
for being able to specify _where_ the verification was removed, but
we'll see.
```
requests.get(url, verify=var)
```
Taken from Ruby, except that `getURL` member predicate was changed to
`getUrl` to keep consistency with the rest of our concepts, and stick
to our naming convention.
The easiest way to implement this was to change the definition of
`module_export` to account for chains of `import *`. We reuse the
machinery from `ImportStar.qll` for this, naturally.
TL;DR: We were missing out on flow in the following situation:
`mod1.py`:
```python
foo = SOURCE
```
`mod2.py`:
```python
from mod1 import *
```
`test.py`:
```python
from mod2 import foo
SINK(foo)
```
This is because there's no node at which a read of `foo` takes place
within `test.py`, and so the added reads make no difference.
Unfortunately, this means the previous test was a bit too simplistic,
since it only looks for module variable reads and writes. Because of
this, we change the test to be a more traditional "all flow" style
(though restricted to `CfgNode`s).
I went through https://docs.python.org/3.10/library/os.html in order,
and added all the functions that works on paths.
`lstat` and `statvfs` were already modeled, but did not have any tests.
I also renamed the arguments to match what the keyword argument is
called. It doesn't matter too much for these specific tests, but for the
tests I'm about to add, it makes things a lot easier to get an overview
of.
Oh, and a test failure :O
Adds result for `ModuleVariableNode::getARead` corresponding to reads
that go through (chains of) `import *`.
This required a bit of a change to _which_ module variables we define.
Previously, we only included variables that were accessed elsewhere in
the same file, but now we must ensure to also include variables that may
be accessed through `import *`.