This commit renames the files relating to the diagnostic query that produces information on the number of files extracted. The files have been renamed from "SuccessfullExtractedFiles.*" to "ExtractedFiles.*". All related tests and test files have been renamed too.
The `@tags` and `@id` attributes of the queries have been left untouched, consistent with the `@tags` and `@id` for similar queries in other languages.
Or, more generally, any copy step, as these presumably do not preserve
object identity.
(Arguably, `copy` could still be susceptible to interior mutability, but
I think that's outside the scope of this query anyway.)
There are two issues with `deepcopy` here. Firstly, the `deepcopy` function itself
has a mutable default value in its parameter `_nil` (set to the empty list by default).
Now, this value is never actually returned from `deepcopy`, as it is only used as a
sentinel, but our analysis is not clever enough to see this. Thus, it thinks that this
mutable default is returned, and hence the result of any call to `deepcopy` is a
potential source.
To remedy this, I opted to simply exclude all sources that originate from within the
standard library. It is very unlikely for any of the sources in the standard library
to be legit.
Secondly, `deepcopy` -- by virtue of being a function that we model as preserving
values -- admits data-flow through its calls, but this is not correct for the mutable
default query, as it is here the _identity_ of the default value in question that is
important. Thus, we get spurious flow through `deepcopy` for this specific query.
Moves the existing tests into the `ModificationOfParameterWithDefault` subdirectory
which already contained a bunch more tests. In the process, I also removed some
duplicated test cases.
Since some sanitisers don't handle backslashes correctly, I updated the data-flow configuration to incorporate a flow state tracking whether or not backslashes have been eliminated or converted to forward slashes.
Adds support for extraction filters as defined in
https://peps.python.org/pep-0706/
and implemented in Python 3.12.
By my reading, setting the filter to `'data'` or `'tar'` is probably
safe, whereas `'fully_trusted'` or the default (which is the same as
`None`) is not.
For now, I have just added this modelling to the tarslip query. We could
also share it with the modelling of `shutil.unpack_archive` (which has also
gained a `filter` argument), but it was unclear to me where we should put
this modelling in that case. Perhaps the best solution would be to merge
the experimental `py/tarslip-extended` query into the existing query (in
which case the current location is perhaps not too bad).
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as
```python
x = expr
y = x + 2
```
we used to have flow from `expr` to an SSA variable representing x and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` at line 1 and from there to the control flow node for `x` at line 2.
Specific changes:
- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` taking into account that the module name may be ambigous.
- Some tests have been rewritten to accomodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected` only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
I reckon this is due to the Python 3 version used by the Python 2 tests
is different from 3.12, so even with --lang=3 the tests are still using
an incompatible version :(
You might wonder why the number of lines changed, but it's due to `tty`
module receiving its' first update since 2001, so the actual number of
lines DID change :phew:
https://github.com/python/cpython/commits/3.12/Lib/tty.py
Since there is now a difference between Python 2 and Python 3, we need to restrict the lines of code test to only run as Python 3.
turn "the long jump" that would end up
straight at the argument into a short jump
that ends up at the dictionary being written to.
Dataflow takes care of the rest of the path.
Query operators that interpret JavaScript
are no longer considered sinks.
Instead they are considered decodings
and the output is the tainted dictionary.
The state changes to `DictInput` to reflect
that the user now controls a dangerous dictionary.
This fixes the spurious result and moves the error reporting
to a more logical place.