This one is a bit awkward, since the previous version was supposed to
improve indexing. Unfortunately this is vastly outweighed by the slow
convergence of the TC. Right now we pay the cost of inverting the
`hasFlowSource` relation, but this is still cheaper.
A more principled approach is possible here, but in the short term
this will prevent an explosion.
For reference, openstack/cinder has roughly 19000 `ForTarget`s and
tuples of size up to 5300, and we were calculating the cartesian
product of these.
Using `simpleLocalFlowStep+` with the first argument specialised to
`CfgNode` was causing the compiler to turn this into a very slowly
converging manual TC computation.
Instead, we use `simpleLocalFlowStep*` (which is fast) and then join
that with a single step from any `CfgNode`. This should amount to the
same thing.
I also noticed that the charpred for `LocalSourceNode` was getting
recomputed a lot, so this is now cached. (The recomputation was
especially bad since it relied on `simpleLocalFlowStep+`, but anyway
it's a good idea not to recompute this.)
One of those cases where I _wish_ `pragma[inline]` also meant "don't
join on the stuff inside this predicate -- it's inlined for a reason".
Unsurprisingly, joining on the scope first works poorly.
Many instances of `lookup` are restricted by the presence of
`attributeRequired`, but this does not work well if we join on
`name`. A few instances of `only_bind_into` prevents this.
This has no effect on the current compilation (indeed,
`ssa_filter_definition_bool` is not currently inlined), but will
prevent this from ever occurring, should the heuristics for inlining
ever change...
Fixes the import logic to account for absolute imports.
We do this by classifying which files and folders may serve as the
entry point for execution, based on a few simple heuristics. If the
file `module.py` is in the same folder as a file `main.py` that may be
executed directly, then we allow `module` to be a valid name for
`module.py` so that `import module` will work as expected.
This one will require some explanation...
First, the file structure. This commit adds a test consisting
representing a few different kinds of imports.
- Absolute imports, from `module.py` to `main.py` when the latter is
executed directly.
- A package (contained in the `package` folder)
- A namespace package (contained in the `namespace_package` folder)
All of these are inside a folder called `code` for reasons I will
detail later.
The file `main.py` is identified as a script, by the presence of the
`!#` comment in its first line.
The files themselves are executable, and `python3 main.py` will print
out all modules in the order they are imported.
The test itself is very simple. It simply lists all modules and their
corresponding names. As is plainly visible, without modification we
only pick up `package` and its component modules as having names. This
is the bit that needs to be fixed.
Convincing the test runner to extract this test in a way that mimics
reality is, unfortunately, a bit complicated. By default, the test
runner itself includes any Python files in the test directory as
modules in the invocation of the extractor, and so we must hide
everything in the `code` subdirectory.
Secondly, a `--path` argument (set to the test directory) is
automatically added, and this would also interfere with extraction,
and hence we must prevent this. Luckily, if we supply our own `--path`
argument -- even if it doesn't make any sense -- then the other
argument is left out.
Finally, we must actually tell the extractor to extract the files (or
it would just happily pass the test with zero files extracted), so the
`-R .` argument ensures that we recurse over the files in the test
directory after all.
Since it's exposed nicely in the version that doesn't have a
`DataFlow::TypeTracker` parameter, these should be private.
Also found one instance where I had accidentially used DataFlow::Node instead of
LocalSourceNode
I'm not feeling 100% confident about `SelfRefMixin`, but since I needed it for
both DjangoViewClass and DjangoFormClass, I wanted to avoid copy-pasting this
code around. However, I'm not so opitimistic about it that I want to add it to a
sharable utility qll file :D
With the non-private imports, auto-completing on `API::` gave ALL results
available from `import python`, as well as the ones specified in the `API`
module.
The non-private import in Attributes.qll did the same for `DataFlow::`.
This should eliminate the need for explicit casting to
`CallCfgNode` (which does not appear in our code as far as I can see,
but was observed in an external contribution).