Commit Graph

3067 Commits

Author SHA1 Message Date
Calum Grant
c26d05b1d5 Merge pull request #5532 from RasmusWL/python-cleanup
Python: Delete filter queries, code duplication library, and precision tag from metric queries
2021-03-29 17:16:43 +01:00
Rasmus Wriedt Larsen
96a66fa4ee Python: Apply suggestions from code review 2021-03-29 17:02:56 +02:00
CodeQL CI
3613ceb07f Merge pull request #5535 from tausbn/python-prevent-bad-TCs
Approved by yoff
2021-03-29 12:03:08 +01:00
Taus Brock-Nannestad
f17bbd9982 Python: Fix another bad TC.
This one is a bit awkward, since the previous version was supposed to
improve indexing. Unfortunately this is vastly outweighed by the slow
convergence of the TC. Right now we pay the cost of inverting the
`hasFlowSource` relation, but this is still cheaper.
2021-03-26 16:38:13 +01:00
yoff
208d5157fa Merge pull request #5500 from RasmusWL/django-forms
Python: Model RemoteFlowSources on Django forms/fields
2021-03-25 20:43:19 +01:00
Taus Brock-Nannestad
c2f112cb92 Python: Filter _before_ the cartesian product
It's always a sad thing to see a good plan go wrong:

86860032 ~0%      {4} r26 = JOIN r19 WITH DataFlowPublic::TupleElementContent#class#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'nodeFrom', Lhs.1 'nodeTo', Rhs.0, Rhs.1
129256   ~3%      {4} r27 = SELECT r26 ON In.3 <= 7
129256   ~0%      {3} r28 = SCAN r27 OUTPUT In.0 'nodeFrom', In.2 'c', In.1 'nodeTo'

Happily, now it looks like this:

129256  ~0%      {3} r20 = JOIN r19 WITH DataFlowPrivate::small_tuple#f CARTESIAN PRODUCT OUTPUT Lhs.0 'nodeFrom', Rhs.0, Lhs.1 'nodeTo'
2021-03-25 19:06:05 +01:00
Taus Brock-Nannestad
8734df334b Python: Slight cleanup 2021-03-25 18:35:16 +01:00
Taus Brock-Nannestad
229250dc54 Python: Limit size of TupleElementContent
A more principled approach is possible here, but in the short term
this will prevent an explosion.

For reference, openstack/cinder has roughly 19000 `ForTarget`s and
tuples of size up to 5300, and we were calculating the cartesian
product of these.
2021-03-25 18:28:49 +01:00
yoff
716e0f1404 Merge pull request #5517 from tausbn/python-prevent-potentially-bad-join-order
Python: Prevent potentially bad join order
2021-03-25 18:14:47 +01:00
Taus Brock-Nannestad
dbef36cbbb Python: Prevent bad TC and add a bit of caching
Using `simpleLocalFlowStep+` with the first argument specialised to
`CfgNode` was causing the compiler to turn this into a very slowly
converging manual TC computation.

Instead, we use `simpleLocalFlowStep*` (which is fast) and then join
that with a single step from any `CfgNode`. This should amount to the
same thing.

I also noticed that the charpred for `LocalSourceNode` was getting
recomputed a lot, so this is now cached. (The recomputation was
especially bad since it relied on `simpleLocalFlowStep+`, but anyway
it's a good idea not to recompute this.)
2021-03-25 17:28:37 +01:00
Rasmus Wriedt Larsen
9abe02f419 Python: Fix query metadata for old queries that have been ported
I'm not sure even I want to keep these around much longer. They seem to be
causing more problem than they are doing good.
2021-03-25 16:01:56 +01:00
Rasmus Wriedt Larsen
203b0e3d88 Python: Add change note 2021-03-25 15:34:09 +01:00
Rasmus Wriedt Larsen
bd4934380a Python: Remove code duplication library 2021-03-25 15:27:55 +01:00
Rasmus Wriedt Larsen
09fbf480db Python: Remove precision tag from metric queries 2021-03-25 15:06:47 +01:00
Rasmus Wriedt Larsen
e3b2e0a1de Python: Delete filter queries 2021-03-25 15:06:46 +01:00
Taus Brock-Nannestad
0ae8b69102 Python: Prevent joining on scope in PointsToContext::appliesTo
One of those cases where I _wish_ `pragma[inline]` also meant "don't
join on the stuff inside this predicate -- it's inlined for a reason".

Unsurprisingly, joining on the scope first works poorly.
2021-03-24 23:12:48 +01:00
Taus Brock-Nannestad
28d6cad3d0 Python: Prevent joining on name as the first thing
Many instances of `lookup` are restricted by the presence of
`attributeRequired`, but this does not work well if we join on
`name`. A few instances of `only_bind_into` prevents this.
2021-03-24 23:11:09 +01:00
Taus Brock-Nannestad
ed8ffab356 Python: Prevent potentially bad join order
This has no effect on the current compilation (indeed,
`ssa_filter_definition_bool` is not currently inlined), but will
prevent this from ever occurring, should the heuristics for inlining
ever change...
2021-03-24 19:20:19 +01:00
yoff
8d15680af4 Merge pull request #5506 from tausbn/python-allow-absolute-imports-from-source-directory
Python: Allow absolute imports in directories with scripts
2021-03-24 14:42:14 +01:00
yoff
b023d73016 Merge pull request #5504 from RasmusWL/type-tracking-first-predicate-private
Python: Ensure first type-tracking predicate is private
2021-03-24 14:23:27 +01:00
Rasmus Wriedt Larsen
1473778bb8 Merge pull request #5493 from yoff/python-add-experimental-structure
Python: Add stub structure to `experimental` for external contributions
2021-03-24 14:11:13 +01:00
Rasmus Wriedt Larsen
70974ea197 Python: Fix grammar in QLDoc
Co-authored-by: yoff <lerchedahl@gmail.com>
2021-03-24 14:06:06 +01:00
Taus Brock-Nannestad
47686a6e4c Python: Disregard all files matching .py% 2021-03-24 14:03:00 +01:00
Taus Brock-Nannestad
8d30ee5c3c Python: Include unmarked Python file in snapshot
Sadly, it seems we're not interpreting this as Python code, even if we
explicitly ask to have it included.
2021-03-24 14:01:13 +01:00
Rasmus Wriedt Larsen
59200386a7 Python: Fix mistake in refactor 2021-03-24 13:51:29 +01:00
Taus Brock-Nannestad
6d86239929 Python: Test all cases
Note that the test in `no_py_extension` isn't complete, since we're
not extracting the `main` file there.
2021-03-24 13:15:59 +01:00
yoff
61cff8faed Update python/ql/src/experimental/semmle/python/Concepts.qll
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-03-24 01:06:03 +01:00
Taus Brock-Nannestad
17d1768259 Python: Allow absolute imports in directories with scripts
Fixes the import logic to account for absolute imports.

We do this by classifying which files and folders may serve as the
entry point for execution, based on a few simple heuristics. If the
file `module.py` is in the same folder as a file `main.py` that may be
executed directly, then we allow `module` to be a valid name for
`module.py` so that `import module` will work as expected.
2021-03-23 18:32:17 +01:00
Taus Brock-Nannestad
4289e358bf Python: Add module import test case
This one will require some explanation...

First, the file structure. This commit adds a test consisting
representing a few different kinds of imports.

- Absolute imports, from `module.py` to `main.py` when the latter is
  executed directly.
- A package (contained in the `package` folder)
- A namespace package (contained in the `namespace_package` folder)

All of these are inside a folder called `code` for reasons I will
detail later.

The file `main.py` is identified as a script, by the presence of the
`!#` comment in its first line.

The files themselves are executable, and `python3 main.py` will print
out all modules in the order they are imported.

The test itself is very simple. It simply lists all modules and their
corresponding names. As is plainly visible, without modification we
only pick up `package` and its component modules as having names. This
is the bit that needs to be fixed.

Convincing the test runner to extract this test in a way that mimics
reality is, unfortunately, a bit complicated. By default, the test
runner itself includes any Python files in the test directory as
modules in the invocation of the extractor, and so we must hide
everything in the `code` subdirectory.

Secondly, a `--path` argument (set to the test directory) is
automatically added, and this would also interfere with extraction,
and hence we must prevent this. Luckily, if we supply our own `--path`
argument -- even if it doesn't make any sense -- then the other
argument is left out.

Finally, we must actually tell the extractor to extract the files (or
it would just happily pass the test with zero files extracted), so the
`-R .` argument ensures that we recurse over the files in the test
directory after all.
2021-03-23 18:21:58 +01:00
Rasmus Wriedt Larsen
deefbefffc Python: Minor refactor to use CallCfgNode 2021-03-23 16:42:41 +01:00
Rasmus Wriedt Larsen
1f5e52e822 Python: Cleanup "first" type-tracking predicate to be private
Since it's exposed nicely in the version that doesn't have a
`DataFlow::TypeTracker` parameter, these should be private.

Also found one instance where I had accidentially used DataFlow::Node instead of
LocalSourceNode
2021-03-23 16:40:56 +01:00
Rasmus Wriedt Larsen
f2bc413318 Python: remove single commented out line of code 2021-03-23 14:00:38 +01:00
Rasmus Wriedt Larsen
a4924856a2 Python: Model known form/field subclasses in Django
I used some ad-hoc QL queries to help me find all these extra instances, but not
quite ready to share that code yet :P
2021-03-23 13:57:39 +01:00
Rasmus Wriedt Larsen
8d0f6086af Python: Model django forms/fields
I'm not feeling 100% confident about `SelfRefMixin`, but since I needed it for
both DjangoViewClass and DjangoFormClass, I wanted to avoid copy-pasting this
code around. However, I'm not so opitimistic about it that I want to add it to a
sharable utility qll file :D
2021-03-23 13:57:38 +01:00
Taus
b46a3616d8 Merge pull request #5490 from RasmusWL/private-imports
Python: Make import private for better auto-complete
2021-03-23 12:00:35 +01:00
Rasmus Lerchedahl Petersen
198a4ca79b Python: Add files to experimental 2021-03-22 21:42:06 +01:00
Rasmus Wriedt Larsen
1890e63d4c Python: Make import private for better auto-complete
With the non-private imports, auto-completing on `API::` gave ALL results
available from `import python`, as well as the ones specified in the `API`
module.

The non-private import in Attributes.qll did the same for `DataFlow::`.
2021-03-22 16:45:44 +01:00
Taus Brock-Nannestad
4a6589d0ae Python: Make API::Node::getACall return a CallCfgNode
This should eliminate the need for explicit casting to
`CallCfgNode` (which does not appear in our code as far as I can see,
but was observed in an external contribution).
2021-03-22 16:37:24 +01:00
Rasmus Wriedt Larsen
3a83ecf067 Python: Add test for taint in django forms/fields 2021-03-22 10:03:32 +01:00
Rasmus Wriedt Larsen
d9079e34e3 Python: Move framework tests out of experimental
Since they are not experimental anymore 😄
2021-03-19 15:51:54 +01:00
Tom Hvitved
09a49e4580 Merge pull request #5311 from hvitved/dataflow/lambda
Data flow: Move C# lambda flow logic into shared library
2021-03-19 11:44:15 +01:00
yoff
37036b5e76 Merge pull request #5437 from RasmusWL/small-pyyaml-improvements
Python: Small PyYAML improvements
2021-03-19 11:15:49 +01:00
Rasmus Wriedt Larsen
7543f10593 Python: Reorganize PyYAML tests a bit 2021-03-19 09:53:25 +01:00
yoff
746e9948b0 Merge pull request #5075 from RasmusWL/crypto
Python: Port py/weak-crypto-key to use type-tracking
2021-03-18 20:53:28 +01:00
Rasmus Wriedt Larsen
42b2c3ed52 Python: Model C-based loaders for PyYAML
Not really that important. But easy to do while I was working on this library.
2021-03-18 11:55:01 +01:00
Rasmus Wriedt Larsen
54e6f51512 Python: Add example of C-based PyYAML loaders
```
In [6]: yaml.load("!!python/object/new:os.system [echo EXPLOIT!]", yaml.CLoader)
EXPLOIT!
Out[6]: 0
```
2021-03-18 11:50:59 +01:00
Rasmus Wriedt Larsen
25b15d7470 Python: Move PyYAML modeling classes within module
For now, this is how we're trying to structure things -- all in all it doesn't
matter too much, since everything is still marked as private.
2021-03-18 11:48:30 +01:00
Rasmus Wriedt Larsen
5ec8511d50 Python: Port PyYAML model to API graphs 2021-03-18 11:47:46 +01:00
Rasmus Wriedt Larsen
14e9bda5de Python: Refactor PyYAML tests a bit 2021-03-18 11:39:47 +01:00
Rasmus Wriedt Larsen
45a1fc6a96 Python: Add link to better PyYAML docs
I found this randomly
2021-03-18 11:20:22 +01:00