I've added 2 queries:
- one that detects full SSRF, where an attacker can control the full URL,
which is always bad
- and one for partial SSRF, where an attacker can control parts of an
URL (such as the path, query parameters, or fragment), which is not a
big problem in many cases (but might still be exploitable)
full SSRF should run by default, and partial SSRF should not (but makes
it easy to see the other results).
Some elements of the full SSRF queries needs a bit more polishing, like
being able to detect `"https://" + user_input` is in fact controlling
the full URL.
I think `getUrl` is a bit too misleading, since from the name, I would
only ever expect ONE result for one request being made.
`getAUrlPart` captures that there could be multiple results, and that
they might not constitute a whole URl.
Which is the same naming I used when I tried to model this a long time ago
a80860cdc6/python/ql/lib/semmle/python/web/Http.qll (L102-L111)
For the snippet below, our current query is able to show _why_ we
consider `var` to be a falsey value that would disable SSL/TLS
verification. I'm not sure we're going to need the part that Ruby did,
for being able to specify _where_ the verification was removed, but
we'll see.
```
requests.get(url, verify=var)
```
Taken from Ruby, except that `getURL` member predicate was changed to
`getUrl` to keep consistency with the rest of our concepts, and stick
to our naming convention.
Unrolling the transitive closure had slightly better performance here.
Also, we exclude names of builtins, since those will be handled by a
separate case of `isDefinedLocally`.
A few changes, all bundled together:
- We were getting a lot of magic applied to the predicates in the
`ImportStar` module, and this was causing needless re-evaluation.
To address this, the easiest solution was to simply cache the entire
module.
- In order to separate this from the dataflow analysis and make it
dependent only on control flow, `potentialImportStarBase` was changed
to return a `ControlFlowNode`.
- `isDefinedLocally` was defined on control flow nodes, which meant we
were duplicating a lot of tuples due to control flow splitting, to no
actual benefit.
Finally, there was a really bad join in `isDefinedLocally` that was
fixed by separating out a helper predicate. This is a case where we
could use a three-way join, since the join between the `Scope`, the
`name` string and the `Name` is big no matter what.
If we join `scope_defines_name` with `n.getId()`, we'll get `Name`s
belonging to irrelevant scopes.
If we join `scope_defines_name` with the enclosing scope of the `Name`
`n`, then we'll get this also for `Name`s that don't share their `getId`
with the local variable defined in the scope.
If we join `n.getId()` with `n.getScope()...` then we'll get all
enclosing scopes for each `Name`.
The last of these is what we currently have. It's not terrible, but not
great either. (Though thankfully it's rare to have lots of enclosing
scopes.)