Commit Graph

29 Commits

Author SHA1 Message Date
yoff
8988a02806 Merge pull request #9733 from tausbn/python-fix-bad-mro-flatten-list-join
Python: Fix bad join in MRO `flatten_list`
2022-06-29 13:29:48 +02:00
Taus
b98c482c47 Python: Fix bad join in MRO flatten_list
This bad join was identified by the join-order-badness report, which
showed that:

py/use-of-input:MRO::flatten_list#f4eaf05f#fff#9c5fe54whnlqffdgu65vhb8uhpg# (order_500000)

calculated a whopping 212,820,108 tuples in order to produce an output of
size 55516, roughly 3833 times more effort than needed.

Here's a snippet of the slowest iteration of that predicate:
```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i1839#0265eb3w after 14ms:
0     ~0%     {3} r1 = JOIN MRO::need_flattening#f4eaf05f#f#prev_delta WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'list', Rhs.2
0     ~0%     {3} r2 = JOIN r1 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0     ~0%     {3} r3 = JOIN r2 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'

0     ~0%     {3} r4 = SCAN MRO::ConsList#f4eaf05f#fff#prev_delta OUTPUT In.2 'list', In.0, In.1
0     ~0%     {3} r5 = JOIN r4 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0     ~0%     {3} r6 = JOIN r5 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0     ~0%     {3} r7 = JOIN r6 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'

0     ~0%     {3} r8 = r3 UNION r7

26355 ~2%     {3} r9 = SCAN MRO::ConsList#f4eaf05f#fff#prev OUTPUT In.2 'list', In.0, In.1

0     ~0%     {3} r10 = JOIN r9 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0     ~0%     {3} r11 = JOIN r10 WITH MRO::ClassList::length#f0820431#ff#prev_delta ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0     ~0%     {3} r12 = JOIN r11 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and a bunch more lines. The same construction appears several times,
but the join order is the same each time.)

Clearly it would be better to start with whatever is in `need_flattening`,
and then do the other joins. This is what the present fix does (by
unbinding `list` in all but the `needs_flattening` call).

After the fix, the slowest iteration is as follows:

```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i2617#8155ab3w after 9ms:
0 ~0%     {2} r1 = SCAN MRO::need_flattening#f4eaf05f#f#prev_delta OUTPUT In.0 'list', In.0 'list'

0 ~0%     {3} r2 = JOIN r1 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0%     {3} r3 = JOIN r2 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0%     {3} r4 = JOIN r3 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'

1 ~0%     {2} r5 = SCAN MRO::need_flattening#f4eaf05f#f#prev OUTPUT In.0 'list', In.0 'list'

0 ~0%     {3} r6 = JOIN r5 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev_delta ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0 ~0%     {3} r7 = JOIN r6 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0 ~0%     {3} r8 = JOIN r7 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```
(... and so on. The remainder is 0 tuples all the way.)

In total, we went from
```
40.6s |  7614 |  15ms @ 1839 | MRO::flatten_list#f4eaf05f#fff@0265eb3w
```
to
```
7.8s |  7614 |  11ms @ 2617 | MRO::flatten_list#f4eaf05f#fff@8155ab3w
```
2022-06-28 14:17:47 +00:00
Taus
dc0f50d49a Python: Clean up variable names
Makes it more consistent with the names used in
`legalMergeCandidateNonEmpty`.
2022-06-27 19:54:09 +00:00
Taus
8fc9ce9699 Python: Fix bad join in MRO
Fixes a bad join in `list_of_linearization_of_bases_plus_bases`.

Previvously, we joined together `ConsList` and `getBase` before filtering
these out using the recursive call. Now we do the recursion first.

Co-authored-by: yoff <yoff@github.com>
2022-06-27 19:54:09 +00:00
Taus
80ef09f034 Python: Fix bad join in declaredAttributeVar
Before:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@99d5aenq after 1.1s:
451054   ~7%     {2} r1 = SCAN variable OUTPUT In.0, In.2 'name'
1296149  ~0%     {2} r2 = JOIN r1 WITH Essa::EssaVariable::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'name'
12179900 ~4%     {3} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.0 'var'
8028     ~2%     {3} r4 = JOIN r3 WITH Scope::Scope::getANormalExit_dispred#bf_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.2 'var'
8028     ~2%     {3} r5 = JOIN r4 WITH Classes::PythonClassObjectInternal::getScope_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'var'
                 return r5
```

After:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@cccf36hb after 4ms:
1450 ~0%     {2} r1 = SCAN Classes::PythonClassObjectInternal::getScope_dispred#ff OUTPUT In.1, In.0 'cls'
1450 ~7%     {2} r2 = JOIN r1 WITH Scope::Scope::getANormalExit_dispred#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls'
8028 ~0%     {2} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'cls'
8028 ~0%     {3} r4 = JOIN r3 WITH Essa::EssaVariable::getSourceVariable_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls', Lhs.0 'var'
8028 ~2%     {3} r5 = JOIN r4 WITH variable ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.2 'name', Lhs.2 'var'
return r5
```
2022-04-28 11:11:37 +00:00
Erik Krogh Kristensen
b959705531 revert changes in MRO.qll 2022-03-30 22:54:01 +02:00
Erik Krogh Kristensen
79713e0ef8 a bit more caching 2022-03-30 22:54:00 +02:00
Erik Krogh Kristensen
88e896992e cache the remainder of the pointsto layer 2022-03-30 22:54:00 +02:00
Erik Krogh Kristensen
79da0970cc various join order fixes 2022-03-30 22:54:00 +02:00
Erik Krogh Kristensen
71eacea90b add the cached stages pattern to Python 2022-03-30 22:53:59 +02:00
Erik Krogh Kristensen
3bf5e06d53 delete all dead code 2022-03-14 13:03:31 +01:00
Erik Krogh Kristensen
ddf93b555e PY: fix some ql/non-doc-block warnings 2022-03-11 11:02:58 +01:00
Erik Krogh Kristensen
a86f0afb3c delete all deprecations that are over 14 months old 2022-03-09 18:28:07 +01:00
Taus
d2603884ca Python: Fix a bunch of class QLDoc 2022-03-07 18:59:49 +00:00
Taus
af7f532212 Python: Fix up a bunch of function QLDoc 2022-03-07 18:59:49 +00:00
Taus
095f27f294 Python: Remove deprecated annotations 2022-03-04 12:30:26 +00:00
Taus
74f0bdfc79 Python: Fix "unused disjunct" warnings
For the most part, these boil down to "some global property holds, and
so this relation contains all instances of class `X`". The fix is to
explicitly build the cartesian product (which we were already building
implicitly anyway) by adding `and exists(var)` to the disjunct that did
not mention `var`.

Note that these cartesian products are always with singletons on one
side, and so should be unproblematic.
2022-03-04 12:14:57 +00:00
Tom Hvitved
92fa0071bd Update python/ql/lib/semmle/python/pointsto/MRO.qll
Co-authored-by: Taus <tausbn@github.com>
2022-03-01 14:16:49 +01:00
Tom Hvitved
58d90c7f8d Python: More points-to performance improvements 2022-02-10 10:29:30 +01:00
Tom Hvitved
7fd8d6dd30 Address review comments 2022-02-10 10:29:30 +01:00
Tom Hvitved
2de892bfd8 Python: Points-to performance improvements 2022-02-10 10:29:30 +01:00
Erik Krogh Kristensen
4e8e3a7420 simplify expressions that could be type-casts 2022-01-20 10:41:35 +01:00
Rasmus Wriedt Larsen
bd9b96e154 Merge pull request #7331 from tausbn/python-fix-bad-callsite-points-to-join
Python: Fix bad `callsite_points_to` join
2021-12-10 15:39:49 +01:00
Taus
e7c298d903 Python: Fix bad scope_entry_points_to join
From `pritomrajkhowa/LoopBound`:

```
Definitions.ql-7:PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2 ........... 55.1s
```

specifically

```
(443s) Tuple counts for PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2/3@74a7cart after 55.1s:
184070    ~0%        {3} r1 = JOIN PointsTo::PointsToInternal::scope_entry_points_to#ffff#shared#1 WITH Variables::GlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg2', Lhs.1 'arg0', Lhs.2 'arg1'
184070    ~0%        {3} r2 = STREAM DEDUP r1
919966523 ~2%        {4} r3 = JOIN r2 WITH Essa::EssaDefinition::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.0 'arg2'
4281779   ~2293%     {3} r4 = JOIN r3 WITH Essa::EssaVariable::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
                    return r4
```

First, this is an `antijoin`, so there's likely some negation involved.
Also, there's mention of `GlobalVariable`, `getScope`, and
`getSourceVariable`, none of which appear in `scope_entry_points_to`, so
it's likely that something got inlined.

Taking a closer look at the predicates mentioned in the body, we spot
`undefined_variable` as a likely culprit.

Evaluating this predicate in isolation reveals that it's not terribly
big, so we could try just marking it with `pragma[noinline]` (I opted
for the slightly more solid `nomagic`) and see how that fares. I also
checked that `builtin_not_in_outer_scope` was similarly small, and
made that one un-inlineable as well.

The result? Well, I can't even show you. Both `scope_entry_points_to`
and `undefined_variable` are so fast that they don't appear in the
clause timing report (so they can at most take 3.5s each to evaluate, as
that is the smallest timing in the list).
2021-12-07 18:51:44 +00:00
Taus
b502ca1ea7 Python: Fix bad callsite_points_to join
From `pritomrajkhowa/LoopBound`:

```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3 ........... 5m53s
```

specifically

```
(767s) Tuple counts for PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3/3@f8f86764 after 5m53s:
832806293 ~0%     {4} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1 WITH PointsTo::InterProceduralPointsTo::var_at_exit#fff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'arg1', Rhs.1 'arg2', Rhs.2 'arg0'
832806293 ~0%     {3} r2 = JOIN r1 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.3 'arg0', Lhs.1 'arg1', Lhs.2 'arg2'
                return r2
```

This one is a bit tricky to unpack. Where is this `shared#1` defined?

```
EVALUATE NONRECURSIVE RELATION:
SYNTHETIC PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1(int arg0, numbered_tuple arg1) :-
    SENTINEL PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared
    SENTINEL Definitions::EscapingAssignmentGlobalVariable#class#f
    SENTINEL Essa::TEssaNodeRefinement#ffff_03#join_rhs
    {2} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared WITH Definitions::EscapingAssignmentGlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
    {2} r2 = STREAM DEDUP r1
    {2} r3 = JOIN r2 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
    {2} r4 = STREAM DEDUP r3
    return r4
```

Looking at `callsite_points_to`, we see a likely candidate in `srcvar`.
It is guarded with an `instanceof` check for
`EscapingAssignmentGlobalVariable` (which lines up nicely with the
sentinel on its charpred) and `getSourceVariable` is just a projection
of `TEssaNodeRefinement`.

So let's try unbinding `srcvar` to prevent an early join.

The timing is now:

```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff ...................... 31.3s (2554 evaluations with max 101ms in PointsTo::InterProceduralPointsTo::callsite_points_to#ffff/4@i516#581fap5w)
```
(Showing the tuple counts doesn't make sense here, since all of the
`shared` and `join_rhs` predicates have been smooshed around.)
2021-12-07 18:25:53 +00:00
Erik Krogh Kristensen
6ff8d4de5c add all remaining explicit this 2021-11-26 13:50:10 +01:00
Taus
33135e909a Python: Add magic to named_argument_transfer
This predicate was materialised as a _big_, _cached_ relation:

```
(169s) Tuple counts for PointsTo::InterProceduralPointsTo::named_argument_transfer#ffff#join_rhs/4@38ce07 after 53.4s:
25212     ~4%     {3} r1 = SCAN Function::Function::getArgByName_dispred#fff OUTPUT In.1, In.0 'arg1', In.2 'arg2'
159751200 ~0%     {4} r2 = JOIN r1 WITH Flow::CallNode::getArgByName_dispred#fff_102#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Lhs.2 'arg2', Rhs.2 'arg3'
                  return r2
```

... However it's only used in a single place (where it is immediately
joined with the points-to relation to relate the caller and argument),
none of these joins were ever larger than 2000 tuples. This made it
pretty clear that we could gain something by pushing in that points-to
join as a bit of manual magic.

However, doing so didn't actually fix anything, since the join-orderer
then decided to join `func.getArgByName(name)` with
`call.getArgByName(name)` on `name` as the first thing (which caused a
join of the same size as above).

Unbinding didn't work, since `name` would then be an unbound `string`,
so instead I factored out relating the function, parameter, and name
thereof into its own predicate. (I could also have done this with the
call, but I would expect there to be more calls than function
definitions in general.)

Overall, this resulted in going from

```
(709s)
Definitions.ql-7:PointsTo::InterProceduralPointsTo::named_argument_transfer#ffff#join_rhs ......... 53.5s
Definitions.ql-7:Instances::InstanceObject::initializer_dispred#fbf ............................... 35.3s (456 evaluations with max 136ms in Instances::InstanceObject::initializer_dispred#fbf/3@i110#0508e8)
Definitions.ql-10:DefinitionTracking::jump_to_defn_attribute#fbf .................................. 27s (100 evaluations with max 12.8s in DefinitionTracking::jump_to_defn_attribute#fbf/3@i1#fc1f7x)
Definitions.ql-7:PointsTo::PointsToInternal::pointsTo#ffff ........................................ 16.1s (681 evaluations with max 2.5s in PointsTo::PointsToInternal::pointsTo#ffff/4@i4#0508eg)
Definitions.ql-7:Constants::ConstantObjectInternal::attribute#ffff ................................ 13.4s (505 evaluations with max 50ms in Constants::ConstantObjectInternal::attribute#ffff/4@i153#0508e5)
Definitions.ql-10:DefinitionTracking::assignment_jump_to_defn_attribute#fbf ....................... 12.4s (99 evaluations with max 11.8s in DefinitionTracking::assignment_jump_to_defn_attribute#fbf/3@i2#fc1f
7z)
...
```

to

```
(668s)
Definitions.ql-7:Instances::InstanceObject::initializer_dispred#fbf ................... 35.4s (456 evaluations with max 140ms in Instances::InstanceObject::initializer_dispred#fbf/3@i110#bf4328)
Definitions.ql-10:DefinitionTracking::jump_to_defn_attribute#fbf ...................... 27.4s (100 evaluations with max 13.3s in DefinitionTracking::jump_to_defn_attribute#fbf/3@i1#679d7x)
Definitions.ql-7:PointsTo::PointsToInternal::pointsTo#ffff ............................ 16.1s (681 evaluations with max 2.5s in PointsTo::PointsToInternal::pointsTo#ffff/4@i4#bf432g)
Definitions.ql-7:Constants::ConstantObjectInternal::attribute#ffff .................... 14.4s (505 evaluations with max 51ms in Constants::ConstantObjectInternal::attribute#ffff/4@i140#bf4325)
Definitions.ql-10:DefinitionTracking::assignment_jump_to_defn_attribute#fbf ........... 12.3s (99 evaluations with max 11.7s in DefinitionTracking::assignment_jump_to_defn_attribute#fbf/3@i2#679d
7z)
...
```
2021-11-09 21:39:32 +00:00
Anders Schack-Mulligen
310eec07c1 Java/Python: Fix some potential performance problems due to transitive deltas. 2021-10-14 16:10:00 +02:00
Andrew Eisenberg
3660c64328 Packaging: Rafactor Python core libraries
Extract the external facing `qll` files into the codeql/python-all
query pack.
2021-08-24 13:23:45 -07:00