I think `getUrl` is a bit too misleading, since from the name, I would
only ever expect ONE result for one request being made.
`getAUrlPart` captures that there could be multiple results, and that
they might not constitute a whole URl.
Which is the same naming I used when I tried to model this a long time ago
a80860cdc6/python/ql/lib/semmle/python/web/Http.qll (L102-L111)
I went through https://docs.python.org/3.10/library/os.html in order,
and added all the functions that works on paths.
`lstat` and `statvfs` were already modeled, but did not have any tests.
I also renamed the arguments to match what the keyword argument is
called. It doesn't matter too much for these specific tests, but for the
tests I'm about to add, it makes things a lot easier to get an overview
of.
Oh, and a test failure :O
The part about claiming there is decoding of the input to `shelve.open`
is sort of an odd one, since it's not the filename, but the contents of
the file that is decoded.
However, trying to only handle this problem through path injection is
not enough -- if a user is able to upload and access files through
`shelve.open` in a path injection safe manner, that still leads to code
execution.
So right now the best way we have of modeling this is to treat the
filename argument as being deserialized...
I made `FileSystemWriteAccess` be a subclass of `FileSystemAccess` (like in [JS](64001cc02c/javascript/ql/src/semmle/javascript/Concepts.qll (L68-L74))), but then I started wondering about how I could give a good result for `getAPathArgument`, and what would a good result even be? The argument to the `open` call, or the object that the `write` method is called on? I can't see how doing either of these enables us to do anything useful...
So I looked closer at how JS uses `FileSystemWriteAccess`:
1. as sink for zip-slip: 7c51dff0f7/javascript/ql/src/semmle/javascript/security/dataflow/ZipSlipCustomizations.qll (L121)
2. as sink for downloading unsafe files (identified through their extension) through non-secure connections: 89ef6ea4eb/javascript/ql/src/semmle/javascript/security/dataflow/InsecureDownloadCustomizations.qll (L134-L150)
3. as sink for writing untrusted data to a local file 93b1e59d62/javascript/ql/src/semmle/javascript/security/dataflow/HttpToFileAccessCustomizations.qll (L43-L46)
for the 2 first sinks, it's important that `getAPathArgument` has a proper result... so that solves the problem, and highlights that it _can_ be important to give proper results for `getAPathArgument` (if possible).
So I'm trying to do best effort for `f = open(...); f.write(...)`, but with this current code we won't always be able to give a result (as highlighted by the tests). It will also be the case that there are multiple `FileSystemAccess` with the same path-argument, which could be a little strange.
overall, I'm not super confident about the way this new concept and implementation turned out, but it also seems like the best I could come up with right now...
The obvious alternative solution is to NOT make `FileSystemWriteAccess` a subclass of `FileSystemAccess`, but I'm not very tempted to go down this path, given the examples of this being useful above, and just the general notion that we should be able to model writes as being a specialized kind of `FileSystemAccess`.
This required quite some changes in the expected output. I think it's much more
clear what the selected nodes are now 👍 (but it was a bit boring work to fix
this up)
This PR was rebased on newest main, but was written a long time ago when all the
framework test-files were still in experimental. I have not re-written my local
git-history, since there are MANY updates to those files (and I dare not risk
it).