In these cases the `return` might end up creating a new HTTP response, so they
need to be modeled as such.
Initially I created a very naive solution that didn't handle either
tricky_return1 or tricky_return2.
The interaction in tricky_return2/helper highlighted for me that to handle this
properly, due to the fact that the flow is across functions, we either need to
use a global dataflow/taint-tracking configuration, or some clever use of
type-trackers.
In the end, this extra effort for not modeling all returns in a flask route
handler as a creation of a HTTP response doesn't really seem to be worth it (at
least not right now). Sicne we use it with taint-tracking for the Reflected XSS
query, and use a HTTP response _creation_ as the sink (without propagating taint
to the HTTP response), we won't get into trouble where we report a path to BOTH
`make_response(...)` and the `return`
```
resp = make_response(...)
return resp
```
If we change this setup in the future, we will probably need to do something to
avoid this double-path reporting.
When dealing with
```
resp = make_response(...)
return resp
```
ideally we don't want to mark the return as a creation of a HTTP response. I'll
deal with this in a second commit, to show off how annoying it looks in the
tests right now :D
I'm not even trying to model it properly right now, and don't have a specific
use-case for it RIGHT NOW. I think we could want this in the future, but I think
it's probably better to model it when we know what we want to use it for.
We might need to rework this a bit when we also start to handle redirects. I
could see a world where we simply allow http redirects to be subclasses of http
responses, and need to manually exclude them from queries (or create
HttpContentResponse to model the HttpResponses that will contain a body). Let us
see where the wind will take us.
I looked through JS and Go libraries, but I didn't feel their modeling would map
very well to Python.
Added it to the `testapp` so it's easy to run the server to SEE that it works.
Added it to `routing_test` so it's obvious this is supported by our modeling
when we _know_ it's running Django 2/3.
I'm slightly suspicious of this fix -- it seems to work, but it makes
me wonder if we're potentially missing other kinds of flow, by not
handling other kinds of definitions.
Also, I feel like this should really be attached to an appropriate
post-update node of the given argument. As it is written now, the flow
will go from the argument _before_ the call, which obviously misses a
step if the argument is modified by the call. In practice, I would
expect this to be rather rare.
This is the quick-and-dirty solution, as discussed.
An even quicker-and-dirtier solution would have used
`ModuleValue::attr` and take the `getOrigin` of that as the source of
the jump step. However, this turns out to be a bad choice, since
`attr` might fail to have a value for the given attribute (for a
variety of reasons). Thus, we instead appeal to a helper predicate
that keeps track of which names are defined by which right-hand-sides
in a given module. (Observe that type tracking works correctly for `x`
in `mymodule.py`, even though `x` is never assigned a value in the
eyes of the Value API.)
This means that points-to is only used to actually figure out if the
object we're looking an attribute up on is a module or not. This is
the next thing to replace in order to eliminate the dependence on
points-to, but this will require some care to ensure that all module
lookups are handled correctly.
Only two test files needed to be changed for the tests to pass. The
first was the fixed false negative in the type tracker, and the other
was a bunch of missing flow in the regression test. I have manually
removed the `# Flow not found` annotations to make them consistent
with the output. Pay particular attention to the annotation on line
117 -- I believe it was misplaced and should have been on line 106
instead (where, indeed, we now have flow where none appeared before).
This required some thought for how to model that we're interested in subclasses
of `fabric.group.Group`, and not so much that class itself. Some thoughts:
---
After initially using this in `module Group`
/** A reference to a subclass of `fabric.group.Group` */
abstract class SubclassRef extends DataFlow::Node { }
private class SubclassInstantiation extends SubclassInstanceSource, DataFlow::CfgNode {
override CallNode node;
SubclassInstantiation() { node.getFunction() = any(SubclassRef ref).asCfgNode() }
}
with this in `module SerialGroup` and `module ThreadingGroup`:
class ClassRef extends DataFlow::Node, fabric::group::Group::SubclassRef {
ClassRef() { this = classRef(DataFlow::TypeTracker::end()) }
}
I wasn't too much of fan of that approach. Since we probably need the `SubclassInstanceSource` anyway, and don't really have a specific use for `SubclassRef`, I just went with concrete (QL) subclasses of `SubclassInstanceSource` in each of the modules for the Python subclasses.
I really don't know what the best approach is, so I'm very open to suggestions. I think we'll really have to flesh this out for handling Django responses, since we're interested in the fact that some subclasses provide default values for the content-type, and keeping track of that is important for XSS (since there is no XSS if response is `text/plain`)
For v1 tests, just extended with explicit calls that use keyword arguments.
For v2 tests, rewrote pretty much everything to what it 100% explicit what we support