Since some sanitisers don't handle backslashes correctly, I updated the data-flow configuration to incorporate a flow state tracking whether or not backslashes have been eliminated or converted to forward slashes.
Adds support for extraction filters as defined in
https://peps.python.org/pep-0706/
and implemented in Python 3.12.
By my reading, setting the filter to `'data'` or `'tar'` is probably
safe, whereas `'fully_trusted'` or the default (which is the same as
`None`) is not.
For now, I have just added this modelling to the tarslip query. We could
also share it with the modelling of `shutil.unpack_archive` (which has also
gained a `filter` argument), but it was unclear to me where we should put
this modelling in that case. Perhaps the best solution would be to merge
the experimental `py/tarslip-extended` query into the existing query (in
which case the current location is perhaps not too bad).
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as
```python
x = expr
y = x + 2
```
we used to have flow from `expr` to an SSA variable representing x and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` at line 1 and from there to the control flow node for `x` at line 2.
Specific changes:
- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` taking into account that the module name may be ambigous.
- Some tests have been rewritten to accomodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected` only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
I reckon this is due to the Python 3 version used by the Python 2 tests
is different from 3.12, so even with --lang=3 the tests are still using
an incompatible version :(
You might wonder why the number of lines changed, but it's due to `tty`
module receiving its' first update since 2001, so the actual number of
lines DID change :phew:
https://github.com/python/cpython/commits/3.12/Lib/tty.py
Since there is now a difference between Python 2 and Python 3, we need to restrict the lines of code test to only run as Python 3.
turn "the long jump" that would end up
straight at the argument into a short jump
that ends up at the dictionary being written to.
Dataflow takes care of the rest of the path.
Query operators that interpret JavaScript
are no longer considered sinks.
Instead they are considered decodings
and the output is the tainted dictionary.
The state changes to `DictInput` to reflect
that the user now controls a dangerous dictionary.
This fixes the spurious result and moves the error reporting
to a more logical place.
This allows us to make more precise modelling
The query tests now pass.
I do wonder, if there is a cleaner approach, similar to
`TaintedObject` in JavaScript. I want the option to
get this query in the hands of the custumors before
such an investigation, though.