The "most expensive predicates" report had the following line on a
certain database:
```
1m15s | 11 | 37s @ 4 | ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0@12bb4xdo
```
Investigating further revealed the following bad joins
```
(388s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i4#b2325xoe after 36.5s:
0 ~0% {2} r1 = JOIN `ImportResolution::ImportResolution::sys_modules_module_with_name/1#134529bf#prev_delta` WITH `ImportResolution::ImportResolution::getReferenceToModuleName/1#bc5da225` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'm'
74884348 ~0% {3} r2 = JOIN `ImportResolution::ImportResolution::getModuleReference/1#28368ea4#prev_delta` WITH `ImportResolution::ImportResolution::potential_module_export/2#19340171` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0
5221604 ~0% {3} | JOIN WITH `Attributes::AttrRef.accesses/2#dispred#31929f12_120#join_rhs` ON FIRST 2 OUTPUT Rhs.2 'result', Lhs.2, Lhs.1
5219926 ~2% {3} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'result'
5300880 ~1% {2} | JOIN WITH `ImportResolution::ImportResolution::module_export/3#f2fc6a2a` ON FIRST 2 OUTPUT Rhs.2, Lhs.2 'result'
42211 ~5% {2} | JOIN WITH `ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0#prev` ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'm'
957042 ~4% {3} r3 = JOIN `ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0#prev_delta` WITH `ImportResolution::ImportResolution::module_export/3#f2fc6a2a_201#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Lhs.1 'm'
957035 ~0% {3} | JOIN WITH `ImportResolution::ImportResolution::potential_module_export/2#19340171` ON FIRST 2 OUTPUT Lhs.1, Lhs.2 'm', Lhs.0
236753257 ~1% {4} | JOIN WITH `Attributes::AttrRef.accesses/2#dispred#31929f12_201#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'm', Lhs.2, Rhs.2
199557145 ~2% {4} | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.1 'm', Lhs.0 'result'
1 ~0% {2} | JOIN WITH `ImportResolution::ImportResolution::getModuleReference/1#28368ea4#prev` ON FIRST 2 OUTPUT Lhs.3 'result', Lhs.2 'm'
15199013 ~1951% {2} r4 = JOIN `ImportResolution::ImportResolution::getModuleReference/1#28368ea4#prev_delta` WITH `Module::Module.getPackageName/0#dispred#bb0c3872` ON FIRST 1 OUTPUT Lhs.1, Rhs.1
14707604 ~2136% {3} | JOIN WITH `Attributes::AttrRef.accesses/2#dispred#31929f12_102#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1, Rhs.2
14623588 ~2190% {4} r5 = JOIN r4 WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT _, Lhs.0 'result', Lhs.1, Lhs.2
14623588 ~2058% {2} | REWRITE WITH Tmp.0 := ".", Out.0 := (In.2 ++ Tmp.0 ++ In.3) KEEPING 2
14623588 ~2139% {5} r6 = JOIN r4 WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT _, Lhs.0 'result', Lhs.1, Lhs.2, _
14623588 ~2092% {2} | REWRITE WITH Tmp.0 := ".", Tmp.0 := (In.2 ++ Tmp.0 ++ In.3), Tmp.4 := ".__init__", Out.0 := (Tmp.0 ++ Tmp.4) KEEPING 2
29247176 ~2099% {2} r7 = r5 UNION r6
199786001 ~6922% {2} | JOIN WITH `Module::isPreferredModuleForName/2#5fb427f9_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'result'
199756923 ~7024% {2} | JOIN WITH `Module::Module.getFile/0#dispred#53eb9b1b_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'm'
199799135 ~6954% {2} r8 = r1 UNION r2 UNION r3 UNION r7
199793992 ~6954% {2} | AND NOT `ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0#prev`(FIRST 2)
return r8
```
Clearly, waiting to joining with `getModuleReference` last is not
healthy. To fix this, I opted to simply create a helper predicate for
the `accesses` construct.
After this change, here are the iteration timings
```
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i1#74f41yqa after 1.2s:
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i1#8a053ys7 after 1.3s:
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i2#74f41yqa after 20ms:
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i2#8a053ys7 after 20ms:
(337s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i4#74f41yqa after 8.5s:
(341s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i4#8a053ys7 after 3.2s:
(346s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i5#74f41yqa after 7.2s:
(349s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i6#74f41yqa after 3ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i5#8a053ys7 after 10s:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i8#74f41yqa after 37ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i9#74f41yqa after 0ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i10#74f41yqa after 0ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i11#74f41yqa after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i12#74f41yqa after 1ms:
(353s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i6#8a053ys7 after 1ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i8#8a053ys7 after 7ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i9#8a053ys7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i10#8a053ys7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i11#8a053ys7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i12#8a053ys7 after 0ms:
```
And the helper predicate itself is also quick to evaluate:
```
(327s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i1#74f41xqa after 0ms:
(327s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i1#8a053xs7 after 0ms:
(329s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i3#74f41xqa after 99ms:
(337s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i3#8a053xs7 after 98ms:
(338s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i4#74f41xqa after 679ms:
(341s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i4#8a053xs7 after 400ms:
(346s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i5#74f41xqa after 1ms:
(349s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i6#74f41xqa after 22ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i5#8a053xs7 after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i7#74f41xqa after 1.4s:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i8#74f41xqa after 8ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i9#74f41xqa after 0ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i10#74f41xqa after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i11#74f41xqa after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i12#74f41xqa after 1ms:
(353s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i13#74f41xqa after 806ms:
(353s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i6#8a053xs7 after 7ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i7#8a053xs7 after 870ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i8#8a053xs7 after 2ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i9#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i10#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i11#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i12#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i13#8a053xs7 after 276ms:
```
(I note that we appear to be evaluating this code twice, which is a bit
worrying. I'll leave that investigaton for later.)
Two of the generated summaries have been excluded:
- ["re", "Member[split]", "Argument[0,pattern:]", "ReturnValue", "taint"]
From the documentation, it is not clear why pattern should figure in the return value, as that is the part denoting split point and thus all those instances are filtered out.
From the implementation
Spit function: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L199
_compile function being called by split: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280
We see that in case the pattern is already a compiled `Pattern`, it is returned directly from _compile and could thus be part of the return value from split. This is probably not possible to arrange for an attacker, and so an FP in practice.
- ["urllib2", "Member[unquote]", "Argument[0,string:]", "ReturnValue", "taint"]
urllib2 seems to be only in Python2 (e.g. https://docs.python.org/2.7/library/urllib2.html) and I cannot locate the function unquote.
There is already a model there so we add to that one.
We did observe that this existing model was blocked by the external MaD model.
This is concerning and needs to be cleared up.
- `quote` together with `re.compile` recover regex injection alerts on haiwen/seahub
- `quote_plus` recovers the URL redirection alert on DemocracyClub/EveryElection
- `unquote` recovers path injection alerts on `cloudera/hue`
- it was tedious finding justifications for the rest..