I had to rewrite the SINK1-SINK7 definitions, since this new requirement
complained that we had to add this `MISSING: flow` annotation :D
Doing this implementation also revealed that there was a bug, since I
did not compare files when checking for these `MISSING:` annotations. So
fixed that up in the implementation for inline taint tests as well.
(extra whitespace in argumentPassing.py to avoid changing line numbers
for other tests)
I went with NormalDataflowTest to signify that if you don't know what
you're looking for, this is probably the one. I did not want to just
call it DataflowTest, since that becomes a big vague when there are also
`FlowTest.qll` and `MaximalFlowTest.qll` -- I'm open to renaming this
though 👍
TL;DR: Something introduced the following bad join order:
```
(227s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i2#8f58670w after 3m46s:
25000 ~0% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
24000 ~1% {2} r2 = JOIN r1 WITH @py_scope#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.0
1076876712 ~6% {3} r3 = JOIN r2 WITH Flow::TupleNode#class#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'context', Lhs.1
870129666 ~0% {3} r4 = JOIN r3 WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.2, Lhs.0 'origin'
870129000 ~0% {3} r5 = r4 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.2 'origin', Lhs.0 'context')
870129000 ~1% {3} r6 = SCAN r5 OUTPUT In.2 'origin', In.1, In.0 'context'
9000 ~0% {2} r7 = JOIN r6 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.0 'origin', Lhs.2 'context'
return r7
```
(...the above being the tuple counts _at the point when I cancelled the
query_!)
Rewriting the code to force a join between `TupleNode#class` and
`getScope` results in the following join orders:
```
(0s) Tuple counts for TObject::scope_loads_tuplenode#ff/2@b3cf0bo5 after 13ms:
37369 ~3% {1} r1 = JOIN Flow::TupleNode#class#f WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.0 'origin'
37369 ~3% {2} r2 = JOIN r1 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 1 OUTPUT Rhs.1 's', Lhs.0 'origin'
return r2
```
and
```
(78s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i53#121c440w after 6ms:
34736 ~3% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
7370 ~5% {2} r2 = JOIN r1 WITH TObject::scope_loads_tuplenode#ff ON FIRST 1 OUTPUT Lhs.1 'context', Rhs.1 'origin'
7370 ~5% {2} r3 = r2 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.1 'origin', Lhs.0 'context')
7370 ~1% {2} r4 = SCAN r3 OUTPUT In.1 'origin', In.0 'context'
return r4
```
the latter being the largest iteration of `dom#TPythonTuple` throughout
the log.
No other major performance issues were observed.
- move from custom concept `LogOutput` to standard concept `Logging`
- remove `Log.qll` from experimental frameworks
- fold models into standard models (naively for now)
- stdlib:
- make Logger module public
- broaden definition of instance
- add `extra` keyword as possible source
- flak: add app.logger as logger instance
- django: `add django.utils.log.request_logger` as logger instance
(should we add the rest?)
- remove LogOutput from experimental concepts
I am slightly concerned that the test now generates many more
intermediate results. I suppose that maes the analysis heavy.
Should the new library get a new name instead, so the old code
does not get evaluated?
I did a test locally, something like
import requests
req = requests.Request(
"POST",
"http://127.0.0.1:8000/app/upload-test/",
data={"name": "foo"},
files={"upload" : ("wat/haha|!#$%^&", open("foo.txt", "rb"))},
)
# print(req.prepare().body.decode('ascii'))
requests.session().send(req.prepare())
and the `wat/` part was stripped from the filename
The original configuration did not match sinks with sanitizers.
Here it is resolved using flow state,
it could also be done by using two configurations.