This is the quick-and-dirty solution, as discussed.
An even quicker-and-dirtier solution would have used
`ModuleValue::attr` and take the `getOrigin` of that as the source of
the jump step. However, this turns out to be a bad choice, since
`attr` might fail to have a value for the given attribute (for a
variety of reasons). Thus, we instead appeal to a helper predicate
that keeps track of which names are defined by which right-hand-sides
in a given module. (Observe that type tracking works correctly for `x`
in `mymodule.py`, even though `x` is never assigned a value in the
eyes of the Value API.)
This means that points-to is only used to actually figure out if the
object we're looking an attribute up on is a module or not. This is
the next thing to replace in order to eliminate the dependence on
points-to, but this will require some care to ensure that all module
lookups are handled correctly.
Only two test files needed to be changed for the tests to pass. The
first was the fixed false negative in the type tracker, and the other
was a bunch of missing flow in the regression test. I have manually
removed the `# Flow not found` annotations to make them consistent
with the output. Pay particular attention to the annotation on line
117 -- I believe it was misplaced and should have been on line 106
instead (where, indeed, we now have flow where none appeared before).
This required some thought for how to model that we're interested in subclasses
of `fabric.group.Group`, and not so much that class itself. Some thoughts:
---
After initially using this in `module Group`
/** A reference to a subclass of `fabric.group.Group` */
abstract class SubclassRef extends DataFlow::Node { }
private class SubclassInstantiation extends SubclassInstanceSource, DataFlow::CfgNode {
override CallNode node;
SubclassInstantiation() { node.getFunction() = any(SubclassRef ref).asCfgNode() }
}
with this in `module SerialGroup` and `module ThreadingGroup`:
class ClassRef extends DataFlow::Node, fabric::group::Group::SubclassRef {
ClassRef() { this = classRef(DataFlow::TypeTracker::end()) }
}
I wasn't too much of fan of that approach. Since we probably need the `SubclassInstanceSource` anyway, and don't really have a specific use for `SubclassRef`, I just went with concrete (QL) subclasses of `SubclassInstanceSource` in each of the modules for the Python subclasses.
I really don't know what the best approach is, so I'm very open to suggestions. I think we'll really have to flesh this out for handling Django responses, since we're interested in the fact that some subclasses provide default values for the content-type, and keeping track of that is important for XSS (since there is no XSS if response is `text/plain`)
Having multiple copies of the StrConst data-flow tracking code means that if we
need to update this to be more sophisticated, we could easily forget to do it
somewhere :|
Until we have a proper `.getAPossibleStringValue` helper, this refactoring
should be nice :)
A slightly more complicated version of the situation in
https://github.com/github/codeql/pull/2507 could cause the `toString`
calculation to diverge. Although the previous PR took tuples nested
inside tuples into account (and subscripted types cannot be nested
inside each other in our modelling), it did not account for having
this nesting be interleaved, and this is what caused the divergence.
I have not done the usual "test case first to show the problem
exists", since this would also diverge and take forever to fail. The
instance observed in `scipy` was likely caused by something akin to
```python
x = ()
while True:
x = x[(x,)]
```
Finally, to prevent this from happening with other types, I went
through and checked each instance where the string representation of
an `ObjectInternal` might potentially contain a reference to
itself (and thus explode). I encapsulated this in a
`bounded_toString` helper predicate, and used this in all the cases
where I was able to determine that the above _could_ happen.