Commit Graph

820 Commits

Author SHA1 Message Date
Anders Schack-Mulligen
8470e91c16 Legacy Dataflow: Sync. 2024-08-20 10:07:57 +02:00
Joe Farebrother
82da8b95a7 Fix typo 2024-07-29 23:29:19 +01:00
Joe Farebrother
ef3bbeacd6 Add check for kwargs in cookie attribute predicates 2024-07-29 11:17:42 +01:00
Anders Schack-Mulligen
7a48fe1102 Dataflow: Replace ppReprType with DataFlowType.toString. 2024-07-25 13:08:47 +02:00
Anders Schack-Mulligen
da5abc8321 Dataflow: Replace MakeSets with QlBuiltins::InternSets. 2024-07-15 13:35:57 +02:00
Tom Hvitved
da0909c080 Merge pull request #16896 from hvitved/ssa/dataflow-integration-prep
SSA: Add `BasicBlock.{getNode/1,length/0}` to the input signature
2024-07-03 19:56:35 +02:00
Tom Hvitved
4ae8720930 SSA: Add BasicBlock.{getNode/1,length/0} to the input signature 2024-07-03 11:32:35 +02:00
Taus
446dbf67cc Python: Fix bad join in getImmediateModuleReference
The "most expensive predicates" report had the following line on a
certain database:

```
1m15s |    11 |   37s @ 4    | ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0@12bb4xdo
```

Investigating further revealed the following bad joins

```
(388s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i4#b2325xoe after 36.5s:
0         ~0%        {2} r1 = JOIN `ImportResolution::ImportResolution::sys_modules_module_with_name/1#134529bf#prev_delta` WITH `ImportResolution::ImportResolution::getReferenceToModuleName/1#bc5da225` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'm'

74884348  ~0%        {3} r2 = JOIN `ImportResolution::ImportResolution::getModuleReference/1#28368ea4#prev_delta` WITH `ImportResolution::ImportResolution::potential_module_export/2#19340171` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0
5221604   ~0%        {3}    | JOIN WITH `Attributes::AttrRef.accesses/2#dispred#31929f12_120#join_rhs` ON FIRST 2 OUTPUT Rhs.2 'result', Lhs.2, Lhs.1
5219926   ~2%        {3}    | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'result'
5300880   ~1%        {2}    | JOIN WITH `ImportResolution::ImportResolution::module_export/3#f2fc6a2a` ON FIRST 2 OUTPUT Rhs.2, Lhs.2 'result'
42211     ~5%        {2}    | JOIN WITH `ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0#prev` ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'm'

957042    ~4%        {3} r3 = JOIN `ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0#prev_delta` WITH `ImportResolution::ImportResolution::module_export/3#f2fc6a2a_201#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Lhs.1 'm'
957035    ~0%        {3}    | JOIN WITH `ImportResolution::ImportResolution::potential_module_export/2#19340171` ON FIRST 2 OUTPUT Lhs.1, Lhs.2 'm', Lhs.0
236753257 ~1%        {4}    | JOIN WITH `Attributes::AttrRef.accesses/2#dispred#31929f12_201#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'm', Lhs.2, Rhs.2
199557145 ~2%        {4}    | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.1 'm', Lhs.0 'result'
1         ~0%        {2}    | JOIN WITH `ImportResolution::ImportResolution::getModuleReference/1#28368ea4#prev` ON FIRST 2 OUTPUT Lhs.3 'result', Lhs.2 'm'

15199013  ~1951%     {2} r4 = JOIN `ImportResolution::ImportResolution::getModuleReference/1#28368ea4#prev_delta` WITH `Module::Module.getPackageName/0#dispred#bb0c3872` ON FIRST 1 OUTPUT Lhs.1, Rhs.1
14707604  ~2136%     {3}    | JOIN WITH `Attributes::AttrRef.accesses/2#dispred#31929f12_102#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1, Rhs.2

14623588  ~2190%     {4} r5 = JOIN r4 WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT _, Lhs.0 'result', Lhs.1, Lhs.2
14623588  ~2058%     {2}    | REWRITE WITH Tmp.0 := ".", Out.0 := (In.2 ++ Tmp.0 ++ In.3) KEEPING 2

14623588  ~2139%     {5} r6 = JOIN r4 WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT _, Lhs.0 'result', Lhs.1, Lhs.2, _
14623588  ~2092%     {2}    | REWRITE WITH Tmp.0 := ".", Tmp.0 := (In.2 ++ Tmp.0 ++ In.3), Tmp.4 := ".__init__", Out.0 := (Tmp.0 ++ Tmp.4) KEEPING 2

29247176  ~2099%     {2} r7 = r5 UNION r6
199786001 ~6922%     {2}    | JOIN WITH `Module::isPreferredModuleForName/2#5fb427f9_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'result'
199756923 ~7024%     {2}    | JOIN WITH `Module::Module.getFile/0#dispred#53eb9b1b_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'm'

199799135 ~6954%     {2} r8 = r1 UNION r2 UNION r3 UNION r7
199793992 ~6954%     {2}    | AND NOT `ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0#prev`(FIRST 2)
                     return r8
```

Clearly, waiting to joining with `getModuleReference` last is not
healthy. To fix this, I opted to simply create a helper predicate for
the `accesses` construct.

After this change, here are the iteration timings

```
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i1#74f41yqa after 1.2s:
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i1#8a053ys7 after 1.3s:
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i2#74f41yqa after 20ms:
(327s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i2#8a053ys7 after 20ms:
(337s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i4#74f41yqa after 8.5s:
(341s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i4#8a053ys7 after 3.2s:
(346s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i5#74f41yqa after 7.2s:
(349s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i6#74f41yqa after 3ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i5#8a053ys7 after 10s:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i8#74f41yqa after 37ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i9#74f41yqa after 0ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i10#74f41yqa after 0ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i11#74f41yqa after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i12#74f41yqa after 1ms:
(353s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i6#8a053ys7 after 1ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i8#8a053ys7 after 7ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i9#8a053ys7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i10#8a053ys7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i11#8a053ys7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::getImmediateModuleReference/1#3553e6c0#reorder_1_0/2@i12#8a053ys7 after 0ms:
```

And the helper predicate itself is also quick to evaluate:

```
(327s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i1#74f41xqa after 0ms:
(327s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i1#8a053xs7 after 0ms:
(329s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i3#74f41xqa after 99ms:
(337s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i3#8a053xs7 after 98ms:
(338s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i4#74f41xqa after 679ms:
(341s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i4#8a053xs7 after 400ms:
(346s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i5#74f41xqa after 1ms:
(349s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i6#74f41xqa after 22ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i5#8a053xs7 after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i7#74f41xqa after 1.4s:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i8#74f41xqa after 8ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i9#74f41xqa after 0ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i10#74f41xqa after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i11#74f41xqa after 1ms:
(352s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i12#74f41xqa after 1ms:
(353s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i13#74f41xqa after 806ms:
(353s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i6#8a053xs7 after 7ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i7#8a053xs7 after 870ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i8#8a053xs7 after 2ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i9#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i10#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i11#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i12#8a053xs7 after 0ms:
(354s) Tuple counts for ImportResolution::ImportResolution::module_reference_accesses/3#8f45b418#reorder_1_2_0/3@i13#8a053xs7 after 276ms:
```

(I note that we appear to be evaluating this code twice, which is a bit
worrying. I'll leave that investigaton for later.)
2024-07-01 12:53:04 +00:00
Taus
d9b337cb2c Merge pull request #16804 from github/tausbn/python-fix-bad-join-in-dataflow-dispatch
Python: Fix bad join in `DataFlowDispatch`
2024-07-01 13:14:28 +02:00
Anders Schack-Mulligen
8c23e21073 Dataflow: Cache compatibleTypes. 2024-06-24 13:35:48 +02:00
Taus
6db7e72fb8 Python: Fix bad join in DataFlowDispatch
A case of bad magic. Rather than evaluating separately whether a class
has a method of some name, the compiler opted to magick in the fact
that this was done as part of the `findFunctionAccordingToMro`
predicate. Hilarity ensued.

However, _we_ know that magic really isn't needed in this case (the
number of results is bounded by `Class.getAMethod` since methods have
only a single name), so by factoring it out into a helper predicate, we
can help the join-orderer along.

Before
```
(377s) Starting to evaluate predicate _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h (iteration 6)
(377s) Tuple counts for _DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared/3@i6#L3#f893bw2h after 16ms:
33363  ~0%     {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'arg1'
159696 ~4%     {3}    | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Rhs.2 'arg2'
               return r1
(377s) Starting to evaluate predicate _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h (iteration 6)
(382s) Tuple counts for _Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs/3@i6#L4#f893bw2h after 4.4s:
1770825904 ~4%     {4} r1 = JOIN `_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_DataFlowDispatch::getNextClassInMro/1#__#shared` WITH `Function::Function.getName/0#dispred#033700ef_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 'arg0', Rhs.1, Lhs.0 'arg1', Lhs.2 'arg2'
34558      ~3%     {3}    | JOIN WITH `Class::Class.getAMethod/0#dispred#66416e47` ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
                   return r1
...
(382s) Starting to evaluate predicate DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh (iteration 6)
(382s)                     - DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3_delta has 125138 rows (order for disjuncts: delta=<standard>).
(382s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#f893b1xh after 12ms:
33363  ~0%     {2} r1 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0%     {3}    | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1%     {3}    | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_Function::Function.getName/0#dispred#033700ef_10#join_rh__#antijoin_rhs`(FIRST 3)

0      ~0%     {3} r2 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Lhs.1 'name', Lhs.2 'result', Rhs.1 'cls'
               {3}    | AND NOT `_Class::Class.getAMethod/0#dispred#66416e47_DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#__#antijoin_rhs`(FIRST 3)
0      ~0%     {3}    | SCAN OUTPUT In.2 'cls', In.0 'name', In.1 'result'

125138 ~1%     {3} r3 = r1 UNION r2
125138 ~1%     {3}    | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
               return r3
```

And now
```
(18s) Tuple counts for DataFlowDispatch::class_has_method/2#0d2ae9c0/2@ff66c1lr after 18ms:
202279 ~1%     {2} r1 = JOIN `Class::Class.getAMethod/0#dispred#66416e47_10#join_rhs` WITH `Function::Function.getName/0#dispred#033700ef` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name'
               return r1
...
(490s) Tuple counts for DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3/3@i6#48b6c1xi after 54ms:
0      ~0%     {3} r1 = JOIN `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev_delta` WITH `DataFlowDispatch::getNextClassInMro/1#e1ee596a#reorder_1_0#prev` ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'result'
0      ~0%     {3}    | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)

33363  ~0%     {2} r2 = SCAN `DataFlowDispatch::getNextClassInMro/1#e1ee596a#prev_delta` OUTPUT In.1, In.0 'cls'
159696 ~0%     {3}    | JOIN WITH `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev` ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.1 'name', Rhs.2 'result'
125138 ~1%     {3}    | AND NOT `DataFlowDispatch::class_has_method/2#0d2ae9c0`(FIRST 2)

125138 ~1%     {3} r3 = r1 UNION r2
125138 ~1%     {3}    | AND NOT `DataFlowDispatch::findFunctionAccordingToMro/2#a610c0a3#prev`(FIRST 3)
               return r3
```
2024-06-21 12:16:27 +00:00
Asger F
6e0f3df573 Merge pull request #14120 from asgerf/dynamic/typemodel-istypeused
Dynamic: add TypeModel.isTypeUsed
2024-06-06 15:31:16 +02:00
Anders Schack-Mulligen
1432519cc2 Dataflow: Add totalorder predicates to all languages. 2024-05-27 11:01:52 +02:00
Anders Schack-Mulligen
bc8ca1af86 Dataflow: Introduce NodeRegions for use in isUnreachableInCall. 2024-05-27 11:01:51 +02:00
Asger F
0b78d1d953 Python: add qldoc 2024-05-21 14:40:35 +02:00
Asger F
13d01f1ec4 Ruby/Python: add recursion guard 2024-05-21 14:40:15 +02:00
Rasmus Lerchedahl Petersen
e66cce7fe1 python: add qldoc and refactor
The logic of which steps an `AdditionalTaintStep` has defined
is now pushed into the defitnion of `AdditionalTaintStep`.
2024-05-17 09:49:31 +02:00
Rasmus Lerchedahl Petersen
20ea9255a1 Python: Allow provenance in additional taint steps 2024-05-16 14:04:10 +02:00
erik-krogh
baa31e1469 delete outdated deprecations 2024-04-25 22:19:28 +02:00
Rasmus Wriedt Larsen
13ff9412a4 Merge pull request #16252 from RasmusWL/move-dataflow-tests
Python: Move dataflow tests out of experimental
2024-04-25 10:05:06 +02:00
Anders Schack-Mulligen
b2f09949df Merge pull request #15599 from aschackmull/dataflow/fieldflowbranchlimit-v2
Dataflow: update fieldFlowBranchLimit semantics
2024-04-23 10:08:05 +02:00
Rasmus Wriedt Larsen
e0e405bb31 Python: replace dataflow-test location in files 2024-04-23 09:40:59 +02:00
Taus
b484aee39e Python: Autoformat everything
Of course, `StringLiteral` being much longer than `StrConst` meant a
bunch of files changed formatting.
2024-04-22 12:00:09 +00:00
Taus
1c68c987b0 Python: Change all remaining occurrences of StrConst
Done using
```
git grep StrConst | xargs sed -i 's/StrConst/StringLiteral/g'
```
2024-04-22 12:00:09 +00:00
Anders Schack-Mulligen
2f0987e980 Dataflow: Add dummy DataFlowSecondLevelScope implementations.
These could be an empty type, but Unit was available and it probably
doesn't matter.
2024-04-15 15:16:30 +02:00
Anders Schack-Mulligen
a8fc100108 Python: Add alert provenance plumbing. 2024-04-12 09:20:08 +02:00
Anders Schack-Mulligen
eafc0075fd Legacy dataflow: Sync. 2024-04-12 09:19:54 +02:00
yoff
1048cf7c5e Merge pull request #15711 from RasmusWL/tt-content
Python: Add type tracking for content
2024-04-09 10:37:43 +02:00
Rasmus Wriedt Larsen
a22b9947c0 Python: Revert IterableSequenceNode as LocalSourceNode
When looking things over a bit more, we could actually exclude the steps
that would never be used instead. A much more involved solution, but
more performance oriented and clear in terms of what is supported (at
least until we start supporting type-tracking with more than depth 1
access-path, if that ever happens)
2024-04-02 16:51:00 +02:00
Rasmus Wriedt Larsen
8707a63edb Python: Add comments around storeStepCommon 2024-04-02 13:26:26 +02:00
Rasmus Wriedt Larsen
20202aba90 Python: Deprecate AttributeName 2024-04-02 13:21:46 +02:00
Rasmus Wriedt Larsen
93f940aa9c Python: Join-order improvement for DataFlowDispatch::TrackAttrReadInput
I was surprised to see that this predicate actually gets evaluated 3 times

- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
- Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@5ac22jwq was evaluated in 30 iterations totaling 108ms (delta sizes total: 32555).

It does however fit with it being used in exactly 3 places: https://github.com/search?q=repo%3Agithub%2Fcodeql+%2FattrReadTracker%5C%28%2F&type=code -- so I assume it's because each use forces a new evaluation. Although that's something we could look into solving, for now I'm just trying to fix the join-order.

Initial

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@3459ejws was evaluated in 30 iterations totaling 76ms (delta sizes total: 32555).
        7068090   ~0%    {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
                         {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
        3901178   ~5%    {2}    | SCAN OUTPUT In.1, In.1
        3901178   ~0%    {3}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1

          13615   ~1%    {2} r2 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

             94   ~2%    {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

          18846   ~1%    {2} r4 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta_1#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

          32555   ~1%    {2} r5 = r2 UNION r3 UNION r4
                         return r5
```

==>

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@f2517jwq was evaluated in 30 iterations totaling 12ms (delta sizes total: 32704).
        186719  ~121%    {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#prev_delta` OUTPUT In.1

        164342  ~158%    {1} r2 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0

            96    ~0%    {1} r3 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0

        351157   ~80%    {1} r4 = r1 UNION r2 UNION r3
         88074   ~14%    {1}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
         41789   ~18%    {2}    | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
                         {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
         32883    ~2%    {2}    | SCAN OUTPUT In.1, In.1
                         return r4
```

AND

initial

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@c15596yu was evaluated in 74 iterations totaling 165ms (delta sizes total: 113119).
        17434622   ~0%    {2} r1 = SCAN Attributes::AttrRead#class#f6c3f431 OUTPUT In.0, In.0
                          {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
         9483976   ~4%    {2}    | SCAN OUTPUT In.1, In.1
         9483976   ~0%    {3}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1

           19258   ~1%    {2} r2 = JOIN r1 WITH `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

            1654   ~1%    {2} r3 = JOIN r1 WITH `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

            1314   ~4%    {2} r4 = JOIN r1 WITH `DataFlowDispatch::clsArgumentTracker/1#47339327#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

              94   ~2%    {2} r5 = JOIN r1 WITH `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

           77217   ~0%    {2} r6 = JOIN r1 WITH `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

           13632   ~1%    {2} r7 = JOIN r1 WITH `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` ON FIRST 1 OUTPUT Lhs.1, Lhs.2

          113169   ~0%    {2} r8 = r2 UNION r3 UNION r4 UNION r5 UNION r6 UNION r7
                          return r8
```
==>

```
Pipeline standard for DataFlowDispatch::TrackAttrReadInput::start/2#67f26627@d732e6yt was evaluated in 74 iterations totaling 31ms (delta sizes total: 113129).
        186719  ~150%    {1} r1 = SCAN `DataFlowDispatch::classInstanceTracker/1#d73ecef4#reorder_1_0#prev_delta` OUTPUT In.0

          1669    ~0%    {1} r2 = SCAN `DataFlowDispatch::superCallNoArgumentTracker/1#0a2e8a06#reorder_1_0#prev_delta` OUTPUT In.0

          3425   ~15%    {1} r3 = SCAN `DataFlowDispatch::clsArgumentTracker/1#47339327#prev_delta` OUTPUT In.1

            96    ~0%    {1} r4 = SCAN `DataFlowDispatch::superCallTwoArgumentTracker/2#d18be99f#reorder_2_0_1#prev_delta` OUTPUT In.0

        123310    ~0%    {1} r5 = SCAN `DataFlowDispatch::selfTracker/1#f157aa27#reorder_1_0#prev_delta` OUTPUT In.0

        164342  ~581%    {1} r6 = SCAN `DataFlowDispatch::classTracker/1#d11f2237#reorder_1_0#prev_delta` OUTPUT In.0

        479561   ~94%    {1} r7 = r1 UNION r2 UNION r3 UNION r4 UNION r5 UNION r6
        169424    ~2%    {1}    | JOIN WITH `Attributes::AttrRef.getObject/0#dispred#d7cd0a97_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
        116290    ~0%    {2}    | JOIN WITH Attributes::AttrRead#class#f6c3f431 ON FIRST 1 OUTPUT Lhs.0, Lhs.0
                         {2}    | AND NOT `DataFlowDispatch::TrackAttrReadInput::start/2#67f26627#prev`(FIRST 2)
        113160    ~0%    {2}    | SCAN OUTPUT In.1, In.1
                         return r7
```
2024-03-21 15:55:58 +01:00
Rasmus Wriedt Larsen
cff63ad5d5 Python: Fix small join-order problem for call-graph
problem is:
```
           14294  ~33%    {1} r23 = r21 UNION r22
           13626   ~0%    {2}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Rhs.1, Lhs.0
        11871493   ~2%    {2}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
         6810938   ~3%    {2}    | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
               0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_0_3_1_2#prev` ON FIRST 2 OUTPUT Rhs.3, Lhs.1, Lhs.0, Rhs.2
               0   ~0%    {4}    | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
               0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.3, Lhs.1, Lhs.0, Lhs.2
               0   ~0%    {5}    | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```
that is, it does cartesian product of DataFlowPublic::Node.getEnclosingCallable

After fix

```
        14294  ~33%    {1} r23 = r21 UNION r22
            0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveMethodCall/4#3067f1f1#reorder_3_0_1_2#prev` ON FIRST 1 OUTPUT Rhs.3, Lhs.0, Rhs.1, Rhs.2
            0   ~0%    {4}    | JOIN WITH num#DataFlowDispatch::CallTypeClassMethod#3508c3e5 ON FIRST 1 OUTPUT Lhs.3, Lhs.2, Lhs.0, Lhs.1
            0   ~0%    {4}    | JOIN WITH `DataFlowDispatch::resolveCall/3#454c02d8#reorder_1_0_2#prev` ON FIRST 3 OUTPUT Lhs.1, Lhs.3, Lhs.0, Lhs.2
            0   ~0%    {5}    | JOIN WITH num#DataFlowPublic::TCfgNode#2cd2fb22 ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3
            0   ~0%    {5}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2, Lhs.3, Lhs.4
            0   ~0%    {4}    | JOIN WITH `DataFlowPublic::Node.getEnclosingCallable/0#dispred#be95825a` ON FIRST 2 OUTPUT Lhs.0, Lhs.2, Lhs.3, Lhs.4
            0   ~0%    {5}    | JOIN WITH num#DataFlowDispatch::TSelfArgumentPosition#de6d64b8 CARTESIAN PRODUCT OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0, Rhs.0
```

Overall stats

(old)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@b30c7vxg was evaluated in 51 iterations totaling 54ms (delta sizes total: 38247).

==>

(new)
Pipeline standard for DataFlowDispatch::getCallArg/5#21589076@c1559vxu was evaluated in 51 iterations totaling 28ms (delta sizes total: 38247).
2024-03-21 12:31:58 +01:00
yoff
ee411cc53a Merge pull request #15936 from yoff/python/test-conflicting-summaries
Python: No `fieldFlowBranchLimit` for `SummarizedCallable`s
2024-03-19 16:56:56 +01:00
Tom Hvitved
fc55567d90 Merge pull request #15853 from hvitved/dataflow/get-location
Data flow: Replace `hasLocationInfo` with `getLocation`
2024-03-18 20:21:46 +01:00
Rasmus Lerchedahl Petersen
2a0c451d2d python: No fieldFlowBranchLimit for SummarizedCallables
Like https://github.com/github/codeql/pull/15689 for Ruby.
2024-03-18 10:29:36 +01:00
Rasmus Wriedt Larsen
7eb4419342 Python: Restrict type-tracking content to only be precise
At least for now :)
2024-03-15 10:24:57 +01:00
Rasmus Wriedt Larsen
7a3ee0f5f8 Python: Make IterableSequenceNode LocalSourceNode
We do this to remove the inconsistencies, and to be ready for a future
where type-tracking support content tracker of depth > 1.

It works because targets of loadSteps needs to be LocalSourceNodes

predicate loadStep(Node nodeFrom, LocalSourceNode nodeTo, Content content) {
2024-03-14 10:46:29 +01:00
Rasmus Wriedt Larsen
af8cef5b53 Python: Fixup deprecated type-tracker API 2024-03-14 10:43:28 +01:00
Rasmus Wriedt Larsen
92729dbbd6 Python: Support iterable unpacking in type-tracking 2024-03-14 10:42:38 +01:00
Rasmus Wriedt Larsen
dac2b57bb0 Python: type-track through dict-updates 2024-03-14 10:42:38 +01:00
Rasmus Wriedt Larsen
73fe596753 Python: type-tracking through dictionary construction 2024-03-14 10:42:38 +01:00
Rasmus Wriedt Larsen
ece8245a4b Python: type-track through tuple content 2024-03-14 10:42:38 +01:00
Rasmus Wriedt Larsen
7721fb3331 Python: Setup shared read/store steps 2024-03-14 10:42:37 +01:00
Rasmus Wriedt Larsen
636cf611ae Python: Allow general content in type-tracker
This should not result in many changes, since store/load steps are still
only implemented for attributes.
2024-03-14 10:42:37 +01:00
Rasmus Wriedt Larsen
fc8caa66c8 Python: Prepare for general content in type-tracker
Due to the char-pred of Content, this change should keep exactly the
same behavior as before.
2024-03-14 10:42:37 +01:00
Tom Hvitved
6c0ed28e6b Python: Implement new data flow interface 2024-03-13 14:41:57 +01:00
Tom Hvitved
dddba3228b Merge pull request #15867 from hvitved/dataflow/ap-limit
Data flow: Add `ConfigSig::accessPathLimit`
2024-03-12 14:57:51 +01:00
yoff
adbcbefaa9 Merge pull request #15551 from yoff/python/avoid-duplicate-model-inclusions
python: Remove `TaintStepFromSummary`
2024-03-11 13:52:20 +01:00
Tom Hvitved
da66281fef Sync files 2024-03-11 13:02:04 +01:00