I think `getUrl` is a bit too misleading, since from the name, I would
only ever expect ONE result for one request being made.
`getAUrlPart` captures that there could be multiple results, and that
they might not constitute a whole URl.
Which is the same naming I used when I tried to model this a long time ago
a80860cdc6/python/ql/lib/semmle/python/web/Http.qll (L102-L111)
Also adjusts test slightly. Writing
`clientRequestDisablesCertValidation=False` to mean that certificate
validation was disabled by the `False` expression is just confusing, as
it easily reads as _certificate validate was NOT disabled_ :|
The new one ties to each request that is being made, which seems like
the right setup.
For the snippet below, our current query is able to show _why_ we
consider `var` to be a falsey value that would disable SSL/TLS
verification. I'm not sure we're going to need the part that Ruby did,
for being able to specify _where_ the verification was removed, but
we'll see.
```
requests.get(url, verify=var)
```
Taken from Ruby, except that `getURL` member predicate was changed to
`getUrl` to keep consistency with the rest of our concepts, and stick
to our naming convention.
Before, results from `dca` would look something like
## + py/meta/alerts/remote-flow-sources-reach
- django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:16:38:48
reachable with taint-tracking from RemoteFlowSource
- django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:9:38:12
reachable with taint-tracking from RemoteFlowSource
now it should make it easier to spot _what_ it is that actually changed,
since we pretty-print the node.
along with tests, but no implementations (to ease reviewing).
---
I've put quite some thinking into what to call our concept for this.
[JS has `CookieDefinition`](581f4ed757/javascript/ql/src/semmle/javascript/frameworks/HTTP.qll (L148-L187)), but I couldn't find a matching concept in any other languages.
We used to call this [`CookieSet`](f07a7bf8cf/python/ql/src/semmle/python/web/Http.qll (L76)) (and had a corresponding `CookieGet`).
But for headers, [Go calls this `HeaderWrite`](cd1e14ed09/ql/src/semmle/go/concepts/HTTP.qll (L97-L131)) and [JS calls this `HeaderDefinition`](581f4ed757/javascript/ql/src/semmle/javascript/frameworks/HTTP.qll (L23-L46))
I think it would be really cool if we have a naming scheme that means the name for getting the value of a header on a incoming request is obvious. I think `HeaderWrite`/`HeaderRead` fulfils this best. We could go with `HeaderSet`/`HeaderGet`, but they feel a bit too vague to me. For me, I'm so used to talking about def-use, that I would immediately go for `HeaderDefinition` and `HeaderUse`, which could work, but is kinda strange.
So in the end that means I went with `CookieWrite`, since that allows using a consistent naming scheme for the future :)
I made `FileSystemWriteAccess` be a subclass of `FileSystemAccess` (like in [JS](64001cc02c/javascript/ql/src/semmle/javascript/Concepts.qll (L68-L74))), but then I started wondering about how I could give a good result for `getAPathArgument`, and what would a good result even be? The argument to the `open` call, or the object that the `write` method is called on? I can't see how doing either of these enables us to do anything useful...
So I looked closer at how JS uses `FileSystemWriteAccess`:
1. as sink for zip-slip: 7c51dff0f7/javascript/ql/src/semmle/javascript/security/dataflow/ZipSlipCustomizations.qll (L121)
2. as sink for downloading unsafe files (identified through their extension) through non-secure connections: 89ef6ea4eb/javascript/ql/src/semmle/javascript/security/dataflow/InsecureDownloadCustomizations.qll (L134-L150)
3. as sink for writing untrusted data to a local file 93b1e59d62/javascript/ql/src/semmle/javascript/security/dataflow/HttpToFileAccessCustomizations.qll (L43-L46)
for the 2 first sinks, it's important that `getAPathArgument` has a proper result... so that solves the problem, and highlights that it _can_ be important to give proper results for `getAPathArgument` (if possible).
So I'm trying to do best effort for `f = open(...); f.write(...)`, but with this current code we won't always be able to give a result (as highlighted by the tests). It will also be the case that there are multiple `FileSystemAccess` with the same path-argument, which could be a little strange.
overall, I'm not super confident about the way this new concept and implementation turned out, but it also seems like the best I could come up with right now...
The obvious alternative solution is to NOT make `FileSystemWriteAccess` a subclass of `FileSystemAccess`, but I'm not very tempted to go down this path, given the examples of this being useful above, and just the general notion that we should be able to model writes as being a specialized kind of `FileSystemAccess`.
This required quite some changes in the expected output. I think it's much more
clear what the selected nodes are now 👍 (but it was a bit boring work to fix
this up)
I considered using `getInput` like in JS, but things like signature verification
has multiple inputs (message and signature).
Using getAnInput also aligns better with Decoding/Encoding.
I did spend some time to figure out how to best write `minimumSecureKeySize`
predicate. I wanted to write once and for all the recommended sizes for each
cryptosystem.
I considered making the predicate such as
```codeql
int minimumSecureKeySize() {
this.getName() = "RSA" and result = 2048
or
this.getName() = "DSA" and result = 2048
or
this.getName() = "ECC" and result = 244
}
```
but then it would be impossible to add a new model without also being able to
modify the body of this predicate -- which seems like a bad way to start off a
brand new way of modeling things.
So I considered if we could add it to the non-range class, such as
```codeql
class RSAKeyGeneration extends KeyGeneration {
RSAKeyGeneration() { this.getName() = "RSA" }
override int minimumSecureKeySize() { result = 2048 }
}
```
This has the major problem that when you're writing the models for a new
API (and therefore extending KeyGeneration::Range), there is no way for you to
see that you need to take this extra step :| (also problem about how we should
define `minimumSecureKeySize` on `KeyGeneration` class then, since if we make it
abstract, we effectively disable the ability to refine `KeyGeneration` since any
subclass must provide an implementation.)
So, therefore I ended up with this solution ;)