I had to rewrite the SINK1-SINK7 definitions, since this new requirement
complained that we had to add this `MISSING: flow` annotation :D
Doing this implementation also revealed that there was a bug, since I
did not compare files when checking for these `MISSING:` annotations. So
fixed that up in the implementation for inline taint tests as well.
(extra whitespace in argumentPassing.py to avoid changing line numbers
for other tests)
I went with NormalDataflowTest to signify that if you don't know what
you're looking for, this is probably the one. I did not want to just
call it DataflowTest, since that becomes a big vague when there are also
`FlowTest.qll` and `MaximalFlowTest.qll` -- I'm open to renaming this
though 👍
TL;DR: Something introduced the following bad join order:
```
(227s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i2#8f58670w after 3m46s:
25000 ~0% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
24000 ~1% {2} r2 = JOIN r1 WITH @py_scope#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.0
1076876712 ~6% {3} r3 = JOIN r2 WITH Flow::TupleNode#class#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'context', Lhs.1
870129666 ~0% {3} r4 = JOIN r3 WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.2, Lhs.0 'origin'
870129000 ~0% {3} r5 = r4 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.2 'origin', Lhs.0 'context')
870129000 ~1% {3} r6 = SCAN r5 OUTPUT In.2 'origin', In.1, In.0 'context'
9000 ~0% {2} r7 = JOIN r6 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.0 'origin', Lhs.2 'context'
return r7
```
(...the above being the tuple counts _at the point when I cancelled the
query_!)
Rewriting the code to force a join between `TupleNode#class` and
`getScope` results in the following join orders:
```
(0s) Tuple counts for TObject::scope_loads_tuplenode#ff/2@b3cf0bo5 after 13ms:
37369 ~3% {1} r1 = JOIN Flow::TupleNode#class#f WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.0 'origin'
37369 ~3% {2} r2 = JOIN r1 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 1 OUTPUT Rhs.1 's', Lhs.0 'origin'
return r2
```
and
```
(78s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i53#121c440w after 6ms:
34736 ~3% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
7370 ~5% {2} r2 = JOIN r1 WITH TObject::scope_loads_tuplenode#ff ON FIRST 1 OUTPUT Lhs.1 'context', Rhs.1 'origin'
7370 ~5% {2} r3 = r2 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.1 'origin', Lhs.0 'context')
7370 ~1% {2} r4 = SCAN r3 OUTPUT In.1 'origin', In.0 'context'
return r4
```
the latter being the largest iteration of `dom#TPythonTuple` throughout
the log.
No other major performance issues were observed.
- move from custom concept `LogOutput` to standard concept `Logging`
- remove `Log.qll` from experimental frameworks
- fold models into standard models (naively for now)
- stdlib:
- make Logger module public
- broaden definition of instance
- add `extra` keyword as possible source
- flak: add app.logger as logger instance
- django: `add django.utils.log.request_logger` as logger instance
(should we add the rest?)
- remove LogOutput from experimental concepts
I am slightly concerned that the test now generates many more
intermediate results. I suppose that maes the analysis heavy.
Should the new library get a new name instead, so the old code
does not get evaluated?
I did a test locally, something like
import requests
req = requests.Request(
"POST",
"http://127.0.0.1:8000/app/upload-test/",
data={"name": "foo"},
files={"upload" : ("wat/haha|!#$%^&", open("foo.txt", "rb"))},
)
# print(req.prepare().body.decode('ascii'))
requests.session().send(req.prepare())
and the `wat/` part was stripped from the filename