Two interesting things happened while doing this:
1. I found out that you can't use the same name to define a submodule as any
parent module. So we need give unique names to the top-level module, and the
module for modeling the `flask.Flask` class. I randomly choose a new name for
the top-level module to get things moving (and not be stuck in bikeshedding
forever).
2. With this new setup, I wanted to expose the `route` and `add_url_rule`
methods on instances of `flask.Flask`. It wasn't quite obvious how to do so. I
simply lumped them next to `classRef()` and `instance()`, without too much
care. I did consider putting them inside a `instance` module, which would allow
you to access them by `flask::Flask::instance::route()`, but I wasn't quite
sure, and just did something easy to get moving.
Clause timing report had this suspicious entry
```
CommandInjection.ql-12:DataFlowPublic::Node::getCallableScope#bbf .................. 7.2s
(4 evaluations with max 6.4s in DataFlowPublic::Node::getCallableScope#bbf/3@i3#119d7b)
```
which indeed was a bad join:
```
Tuple counts for DataFlowPublic::Node::getCallableScope#bbf:
293509 ~2% {3} r1 = JOIN DataFlowPublic::Node::getCallableScope#bbf#prev_delta AS L WITH DataFlowPublic::TNode#f AS R ON FIRST 1 OUTPUT L.<1>, L.<0>, L.<2>
22337162 ~0% {3} r2 = JOIN r1 WITH Scope::Scope::getEnclosingScope_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT r1.<1>, r1.<2>, R.<1>
22337162 ~0% {3} r3 = r2 AND NOT DataFlowPublic::Node::getCallableScope#bbf#prev AS R(r2.<0>, r2.<2>, r2.<1>)
22337162 ~0% {3} r4 = SCAN r3 OUTPUT r3.<0>, r3.<2>, r3.<1>
722 ~1% {3} r5 = JOIN r4 WITH m#DataFlowPublic::Node::getCallableScope#bbf AS R ON FIRST 2 OUTPUT r4.<0>, r4.<1>, r4.<2>
722 ~1% {3} r6 = JOIN r5 WITH m#DataFlowPublic::Node::getCallableScope#bbf AS R ON FIRST 2 OUTPUT r5.<0>, r5.<2>, r5.<1>
722 ~1% {3} r7 = r6 AND NOT project#DataFlowPrivate::DataFlowCallable::getScope_dispred#ff AS R(r6.<2>)
722 ~1% {3} r8 = SCAN r7 OUTPUT r7.<0>, r7.<2>, r7.<1>
return r8
```
In this case, the join went away by simply moving the helper predicate
out of the class it was situated in (and since it doesn't mention
`this`, it didn't really belong there in the first place).
Result:
```
DataFlowPublic.qll-8:DataFlowPublic::getCallableScope#ff ........................... 26ms
(4 evaluations with max 15ms in DataFlowPublic::getCallableScope#ff/2@i3#709a9e)
```
With CP:
(0s) Tuple counts for dom#DataFlowPublic::TKwOverflowNode#ff:
1209 ~0% {2} r1 = JOIN project#AstGenerated::Function_::getKwarg_dispred#ff AS L WITH ObjectAPI::CallableValue::getScope_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, L.<0>
4329 ~0% {3} r2 = JOIN r1 WITH DataFlowPrivate::ArgumentPassing::connects#bb_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
7819 ~2% {4} r3 = JOIN r2 WITH Flow::CallNode::getArgByName_dispred#fff AS R ON FIRST 1 OUTPUT r2.<1>, r2.<2>, r2.<0>, R.<1>
7114 ~1% {4} r4 = r3 AND NOT Function::Function::getArgByName_dispred#fff_01#antijoin_rhs AS R(r3.<0>, r3.<3>)
7114 ~76% {2} r5 = SCAN r4 OUTPUT r4.<2>, r4.<1>
1123 ~0% {1} r6 = JOIN project#Exprs::Call::getKwargs_dispred#ff AS L WITH py_flow_bb_node_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>
1123 ~0% {1} r7 = JOIN r6 WITH Flow::CallNode#class#f AS R ON FIRST 1 OUTPUT r6.<0>
1357707 ~0% {2} r8 = JOIN r7 WITH project#AstGenerated::Function_::getKwarg_dispred#ff AS R CARTESIAN PRODUCT OUTPUT R.<0>, r7.<0>
1357707 ~0% {2} r9 = JOIN r8 WITH ObjectAPI::CallableValue::getScope_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT r8.<1>, R.<1>
1364821 ~0% {2} r10 = r5 \/ r9
return r10
Without CP:
(13s) Tuple counts for dom#DataFlowPublic::TKwOverflowNode#ff:
1209 ~0% {2} r1 = JOIN project#AstGenerated::Function_::getKwarg_dispred#ff AS L WITH ObjectAPI::CallableValue::getScope_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, L.<0>
19175 ~4% {3} r2 = JOIN r1 WITH DataFlowPrivate::ArgumentPassing::connects#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
7819 ~2% {4} r3 = JOIN r2 WITH Flow::CallNode::getArgByName_dispred#fff AS R ON FIRST 1 OUTPUT r2.<1>, r2.<2>, r2.<0>, R.<1>
7114 ~1% {4} r4 = r3 AND NOT Function::Function::getArgByName_dispred#fff_01#antijoin_rhs AS R(r3.<0>, r3.<3>)
7114 ~76% {2} r5 = SCAN r4 OUTPUT r4.<2>, r4.<1>
1123 ~0% {1} r6 = JOIN project#Exprs::Call::getKwargs_dispred#ff AS L WITH py_flow_bb_node_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>
574 ~0% {2} r7 = JOIN r6 WITH DataFlowPrivate::ArgumentPassing::connects#ff AS R ON FIRST 1 OUTPUT R.<1>, r6.<0>
524 ~1% {3} r8 = JOIN r7 WITH ObjectAPI::CallableValue::getScope_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>, r7.<0>
291 ~0% {2} r9 = JOIN r8 WITH project#AstGenerated::Function_::getKwarg_dispred#ff AS R ON FIRST 1 OUTPUT r8.<1>, r8.<2>
7405 ~72% {2} r10 = r5 \/ r9
return r10
I'm slightly suspicious of this fix -- it seems to work, but it makes
me wonder if we're potentially missing other kinds of flow, by not
handling other kinds of definitions.
Also, I feel like this should really be attached to an appropriate
post-update node of the given argument. As it is written now, the flow
will go from the argument _before_ the call, which obviously misses a
step if the argument is modified by the call. In practice, I would
expect this to be rather rare.
This is the quick-and-dirty solution, as discussed.
An even quicker-and-dirtier solution would have used
`ModuleValue::attr` and take the `getOrigin` of that as the source of
the jump step. However, this turns out to be a bad choice, since
`attr` might fail to have a value for the given attribute (for a
variety of reasons). Thus, we instead appeal to a helper predicate
that keeps track of which names are defined by which right-hand-sides
in a given module. (Observe that type tracking works correctly for `x`
in `mymodule.py`, even though `x` is never assigned a value in the
eyes of the Value API.)
This means that points-to is only used to actually figure out if the
object we're looking an attribute up on is a module or not. This is
the next thing to replace in order to eliminate the dependence on
points-to, but this will require some care to ensure that all module
lookups are handled correctly.
Only two test files needed to be changed for the tests to pass. The
first was the fixed false negative in the type tracker, and the other
was a bunch of missing flow in the regression test. I have manually
removed the `# Flow not found` annotations to make them consistent
with the output. Pay particular attention to the annotation on line
117 -- I believe it was misplaced and should have been on line 106
instead (where, indeed, we now have flow where none appeared before).
This required some thought for how to model that we're interested in subclasses
of `fabric.group.Group`, and not so much that class itself. Some thoughts:
---
After initially using this in `module Group`
/** A reference to a subclass of `fabric.group.Group` */
abstract class SubclassRef extends DataFlow::Node { }
private class SubclassInstantiation extends SubclassInstanceSource, DataFlow::CfgNode {
override CallNode node;
SubclassInstantiation() { node.getFunction() = any(SubclassRef ref).asCfgNode() }
}
with this in `module SerialGroup` and `module ThreadingGroup`:
class ClassRef extends DataFlow::Node, fabric::group::Group::SubclassRef {
ClassRef() { this = classRef(DataFlow::TypeTracker::end()) }
}
I wasn't too much of fan of that approach. Since we probably need the `SubclassInstanceSource` anyway, and don't really have a specific use for `SubclassRef`, I just went with concrete (QL) subclasses of `SubclassInstanceSource` in each of the modules for the Python subclasses.
I really don't know what the best approach is, so I'm very open to suggestions. I think we'll really have to flesh this out for handling Django responses, since we're interested in the fact that some subclasses provide default values for the content-type, and keeping track of that is important for XSS (since there is no XSS if response is `text/plain`)