codeql

mirror of https://github.com/github/codeql.git synced 2026-02-11 12:41:06 +01:00

Author	SHA1	Message	Date
yoff	75ac24a847	Merge branch 'main' into python-dataflow/flow-summaries-from-scratch	2022-08-10 10:57:59 +02:00
Rasmus Wriedt Larsen	b541103b7f	Merge pull request #9846 from tausbn/python-fix-bad-syntactic_call_count-join Python: Fix bad join in `syntactic_call_count`	2022-08-10 10:09:51 +02:00
Erik Krogh Kristensen	559ec7ba56	Merge branch 'main' into repeatedWord	2022-08-09 21:22:47 +02:00
Rasmus Wriedt Larsen	f89b32183f	Merge branch 'main' into typetracker-decorators	2022-08-08 11:52:09 +02:00
Anders Schack-Mulligen	3d47875b60	Dataflow: Generate shorter RA/DIL names.	2022-08-05 11:00:56 +02:00
Anders Schack-Mulligen	d3dcc3ce3a	Dataflow: Sync.	2022-08-05 11:00:56 +02:00
Rasmus Wriedt Larsen	1737d08145	Merge pull request #9579 from yoff/python/more-logic-tests Python: Improve `BarrierGuard`	2022-08-01 11:36:11 +02:00
alexet	f9b6ca76e5	Python: Fix binding incorrect predicate.	2022-07-18 16:28:19 +01:00
Taus	bdd771989f	Python: Fix bad join in `syntactic_call_count`

On certain databases, the evaluation of this predicate was running out of memory due to the way the `count` aggregate was being used. Here's an example of the tuple counts involved:

```
Tuple counts for PointsToContext::syntactic_call_count#cf3039a0#ff#antijoin_rhs/1@d2199bb8 after 1m27s:
595518502  ~521250%  {1} r1 = JOIN PointsToContext::syntactic_call_count#cf3039a0#ff#shared#3 WITH Flow::CallNode::getFunction#dispred#f0820431#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0'
26518709   ~111513%  {1} r2 = JOIN PointsToContext::syntactic_call_count#cf3039a0#ff#shared#2 WITH Flow::CallNode::getFunction#dispred#f0820431#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0'
622037211  ~498045%  {1} r3 = r1 UNION r2
                     return r3
```

and a timing report that looked like this:

```
time  | evals | max @ iter   | predicate
------|-------|--------------|----------
5m8s  |       |              | PointsToContext::syntactic_call_count#cf3039a0#ff#shared#2@6d98d1nd
4m38s |       |              | PointsToContext::syntactic_call_count#cf3039a0#ff#count_range@f5df1do4
3m51s |       |              | PointsToContext::syntactic_call_count#cf3039a0#ff#shared#3@da3b4abf
1m58s | 7613  | 37ms @ 4609  | MRO::ClassListList::removedClassParts#f0820431#fffff#reorder_2_3_4_0_1@8155axyi
1m37s | 7613  | 33ms @ 3904  | MRO::ClassListList::bestMergeCandidate#f0820431#2#fff@8155a83w
1m27s |       |              | PointsToContext::syntactic_call_count#cf3039a0#ff#antijoin_rhs@d2199bb8
1m8s  | 1825  | 63ms @ 404   | PointsTo::Expressions::equalityEvaluatesTo#741b54e2#fffff@8155aw7w
37.6s |       |              | PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs@e348fc1p
...
```

To make optimising this easier for the compiler, I moved the bodies of the `count` aggregate into their own helper predicates (with size linear in the number of `CallNode`s), and also factored out the many calls to `f.getName()`.

The astute reader will notice that in writing this as a sum of `count`s rather than a count of a disjunction, the intersection (if it exists) will be counted twice, and so the semantics may be different. However, since `method_call` and `function_call` require `AttrNode` and `NameNode` functions respectively, and as these two types are disjoint, there is no intersection, and so the semantics should be preserved.

After the change, the evaluation of `syntactic_call_count` now looks as follows:

```
Tuple counts for PointsToContext::syntactic_call_count#cf3039a0#ff/2@662dd8s0 after 216ms:
23960   ~0%  {1} r1 = @py_scope#f AND NOT py_Functions_0#antijoin_rhs(Lhs.0 's')
23960   ~0%  {2} r2 = SCAN r1 OUTPUT In.0 's', 0
276309  ~7%  {2} r3 = SCAN @py_scope#f OUTPUT In.0 's', "__init__"
11763   ~0%  {2} r4 = JOIN r3 WITH Scope::Scope::getName#dispred#f0820431#fb ON FIRST 2 OUTPUT Lhs.0 's', 1
35723   ~0%  {2} r5 = r2 UNION r4
252349  ~0%  {2} r6 = JOIN @py_scope#f WITH Function::Function::getName#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.0 's', Rhs.1
240586  ~0%  {2} r7 = SELECT r6 ON In.1 != "__init__"
131727  ~4%  {2} r8 = r7 AND NOT project#PointsToContext::method_call#cf3039a0#ff(Lhs.1)
131727  ~0%  {3} r9 = SCAN r8 OUTPUT In.1, In.0 's', 0
240586  ~0%  {2} r10 = SCAN r7 OUTPUT In.1, In.0 's'
108859  ~0%  {3} r11 = JOIN r10 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1 's', Rhs.1
240586  ~0%  {3} r12 = r9 UNION r11
24100   ~0%  {2} r13 = JOIN r12 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs#1 ON FIRST 1 OUTPUT Lhs.1 's', (Rhs.1 + Lhs.2)
240586  ~0%  {2} r14 = SELECT r6 ON In.1 != "__init__"
131727  ~4%  {2} r15 = r14 AND NOT project#PointsToContext::method_call#cf3039a0#ff(Lhs.1)
131727  ~0%  {3} r16 = SCAN r15 OUTPUT In.0 's', In.1, 0
108859  ~4%  {3} r17 = JOIN r10 WITH PointsToContext::syntactic_call_count#cf3039a0#ff#join_rhs ON FIRST 1 OUTPUT Lhs.1 's', Lhs.0, Rhs.1
240586  ~4%  {3} r18 = r16 UNION r17
216486  ~2%  {3} r19 = r18 AND NOT project#PointsToContext::function_call#cf3039a0#ff(Lhs.1)
216486  ~0%  {2} r20 = SCAN r19 OUTPUT In.0 's', (0 + In.2)
240586  ~0%  {2} r21 = r13 UNION r20
276309  ~0%  {2} r22 = r5 UNION r21
                 return r22
```	2022-07-18 13:58:00 +00:00
Erik Krogh Kristensen	85a652f3d1	remove a bunch of repeated words	2022-07-14 12:42:48 +02:00
yoff	f52d792b36	Merge branch 'main' of https://github.com/github/codeql into python-dataflow/flow-summaries-from-scratch	2022-07-01 12:01:07 +00:00
CodeQL CI	5b5a52fa25	Merge pull request #9551 from yoff/python/port-tarslip Approved by RasmusWL	2022-07-01 12:58:25 +01:00
yoff	61523bd330	python: better names - "Normal" instead of "NonSpecial" - "NonLibrary" instead of "2" I could not find a good replacement for "NonLibrary", nor for "Source", but I added QLDocs in a few places to help the reading.	2022-07-01 11:55:20 +00:00
yoff	a0db438799	python: rename `getACall2` -> `getANonLibraryCall`	2022-07-01 10:29:03 +00:00
yoff	f6af24894d	python: recover `isPackageUsed` - add `unknownAttribute` to pre-compute negation - add `Node`-less formulation of "is imported"	2022-07-01 09:39:07 +00:00
yoff	3a80baf39c	python: concession to get the code to compile `isPackageUsed` now does no filtering	2022-07-01 07:06:09 +00:00
yoff	e54ada175d	python: rewrite `not` away A `LocalSourceNode` is either a `ModuleVariableNode` or an `ExprNode`.	2022-07-01 07:03:14 +00:00
yoff	cf9b69b5f2	python: More helpful comment	2022-06-30 13:07:13 +00:00
yoff	b0a29b146a	Update python/ql/lib/semmle/python/security/dataflow/TarSlipQuery.qll Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>	2022-06-30 14:54:01 +02:00
yoff	df7ffb2880	Update python/ql/lib/semmle/python/security/dataflow/TarSlipCustomizations.qll Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>	2022-06-30 14:53:49 +02:00
yoff	8988a02806	Merge pull request #9733 from tausbn/python-fix-bad-mro-flatten-list-join Python: Fix bad join in MRO `flatten_list`	2022-06-29 13:29:48 +02:00
yoff	f122af81ea	Merge pull request #9741 from tausbn/python-fix-bad-join-in-regexpbackref-getgroup Python: Fix bad join in `RegExpBackRef::getGroup`	2022-06-29 13:23:07 +02:00
yoff	731f866242	Merge pull request #9717 from tausbn/python-fix-bad-mro-linearization-of-bases-join Python: Fix bad join in MRO	2022-06-29 13:08:18 +02:00
Jeroen Ketema	55e052af26	Merge pull request #9686 from aschackmull/dataflow/no-node-scan Dataflow performance: Avoid node scans	2022-06-29 10:38:56 +02:00
yoff	1105cd569b	Merge branch 'main' into python/port-tarslip	2022-06-28 22:17:28 +02:00
yoff	6087bc6888	Merge branch 'main' into python/more-logic-tests	2022-06-28 22:16:38 +02:00
yoff	ac0c8d238f	python: only clear taint on false-edge	2022-06-28 20:14:52 +00:00
Taus	38b8640582	Python: Fix bad join in `RegExpBackRef::getGroup`

Although this wasn't (as far as I know) causing any performance issues, it was making the join-order badness report quite noisy, and so I figured it was worth fixing.

Before:

```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@d3441d0b after 84ms:
1501195  ~3%  {2} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 'result'
149      ~0%  {5} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Rhs.3, Lhs.1 'result', Lhs.0 'this'
149      ~1%  {3} r3 = JOIN r2 WITH regex::RegexString::numbered_backreference#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.3 'result', Rhs.3, Lhs.4 'this'
4        ~0%  {2} r4 = JOIN r3 WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.0 'result'
1501195  ~3%  {2} r5 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'result', Rhs.1 'result'
42526    ~0%  {5} r6 = JOIN r5 WITH RegexTreeView::RegExpGroup#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.1 'this', Lhs.0 'result', Rhs.1, Rhs.2, Rhs.3
22       ~0%  {8} r7 = JOIN r6 WITH RegexTreeView::RegExpBackRef#class#31aac2a7#ffff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.4, Lhs.1 'result', Lhs.0 'this', Rhs.1, Rhs.2, Rhs.3
0        ~0%  {6} r8 = JOIN r7 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.5, Lhs.6, Lhs.7, Rhs.3, Lhs.3 'result', Lhs.4 'this'
0        ~0%  {2} r9 = JOIN r8 WITH regex::RegexString::named_backreference#dispred#f0820431#ffff ON FIRST 4 OUTPUT Lhs.5 'this', Lhs.4 'result'
4        ~0%  {2} r10 = r4 UNION r9
              return r10
```

In this case I opted for a classical solution: tying together the literal and number (or name) part of the backreference in order to encourage a two-column join.

After:

```
Tuple counts for RegexTreeView::RegExpBackRef::getGroup#dispred#f0820431#ff/2@b0cc4d5n after 0ms:
898   ~1%  {3} r1 = JOIN RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff WITH RegexTreeView::RegExpGroup::getNumber#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0 'result'
4     ~0%  {2} r2 = JOIN r1 WITH RegexTreeView::RegExpBackRef::hasLiteralAndNumber#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
1110  ~0%  {5} r3 = JOIN RegexTreeView::RegExpGroup#31aac2a7#ffff WITH RegexTreeView::RegExpTerm::getLiteral#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.0 'result', Rhs.1
146   ~0%  {3} r4 = JOIN r3 WITH regex::RegexString::getGroupName#dispred#f0820431#ffff ON FIRST 3 OUTPUT Lhs.4, Rhs.3, Lhs.3 'result'
0     ~0%  {2} r5 = JOIN r4 WITH RegexTreeView::RegExpBackRef::hasLiteralAndName#f0820431#fff_120#join_rhs ON FIRST 2 OUTPUT Rhs.2 'this', Lhs.2 'result'
4     ~0%  {2} r6 = r2 UNION r5
           return r6
```	2022-06-28 16:51:09 +00:00
Taus	b98c482c47	Python: Fix bad join in MRO `flatten_list`

This bad join was identified by the join-order-badness report, which showed that `py/use-of-input:MRO::flatten_list#f4eaf05f#fff#9c5fe54whnlqffdgu65vhb8uhpg# (order_500000)` calculated a whopping 212,820,108 tuples in order to produce an output of size 55516, roughly 3833 times more effort than needed.

Here's a snippet of the slowest iteration of that predicate:

```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i1839#0265eb3w after 14ms:
0      ~0%  {3} r1 = JOIN MRO::need_flattening#f4eaf05f#f#prev_delta WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'list', Rhs.2
0      ~0%  {3} r2 = JOIN r1 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0      ~0%  {3} r3 = JOIN r2 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0      ~0%  {3} r4 = SCAN MRO::ConsList#f4eaf05f#fff#prev_delta OUTPUT In.2 'list', In.0, In.1
0      ~0%  {3} r5 = JOIN r4 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0      ~0%  {3} r6 = JOIN r5 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0      ~0%  {3} r7 = JOIN r6 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
0      ~0%  {3} r8 = r3 UNION r7
26355  ~2%  {3} r9 = SCAN MRO::ConsList#f4eaf05f#fff#prev OUTPUT In.2 'list', In.0, In.1
0      ~0%  {3} r10 = JOIN r9 WITH MRO::need_flattening#f4eaf05f#f#prev ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.0 'list'
0      ~0%  {3} r11 = JOIN r10 WITH MRO::ClassList::length#f0820431#ff#prev_delta ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'list', Rhs.1 'n'
0      ~0%  {3} r12 = JOIN r11 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```

(... and a bunch more lines. The same construction appears several times, but the join order is the same each time.)

Clearly it would be better to start with whatever is in `need_flattening`, and then do the other joins. This is what the present fix does (by unbinding `list` in all but the `need_flattening` call). After the fix, the slowest iteration is as follows:

```
Tuple counts for MRO::flatten_list#f4eaf05f#fff/3@i2617#8155ab3w after 9ms:
0  ~0%  {2} r1 = SCAN MRO::need_flattening#f4eaf05f#f#prev_delta OUTPUT In.0 'list', In.0 'list'
0  ~0%  {3} r2 = JOIN r1 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0  ~0%  {3} r3 = JOIN r2 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0  ~0%  {3} r4 = JOIN r3 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
1  ~0%  {2} r5 = SCAN MRO::need_flattening#f4eaf05f#f#prev OUTPUT In.0 'list', In.0 'list'
0  ~0%  {3} r6 = JOIN r5 WITH MRO::ConsList#f4eaf05f#fff#reorder_2_0_1#prev_delta ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'list', Rhs.2
0  ~0%  {3} r7 = JOIN r6 WITH MRO::ClassList::length#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'list', Rhs.1 'n'
0  ~0%  {3} r8 = JOIN r7 WITH MRO::ClassListList::flatten#dispred#f0820431#ff#prev ON FIRST 1 OUTPUT Lhs.1 'list', Lhs.2 'n', Rhs.1 'result'
...
```

(... and so on. The remainder is 0 tuples all the way.)

In total, we went from

```
40.6s | 7614 | 15ms @ 1839 | MRO::flatten_list#f4eaf05f#fff@0265eb3w
```

to

```
7.8s | 7614 | 11ms @ 2617 | MRO::flatten_list#f4eaf05f#fff@8155ab3w
```	2022-06-28 14:17:47 +00:00
Asger F	a522562f93	Merge pull request #9369 from asgerf/python/api-graph-api Python: API graph renaming and documentation	2022-06-28 14:48:12 +02:00
yoff	834d2603a2	python: update use of barrier guard	2022-06-28 11:15:37 +00:00
Asger F	4c73ab2679	Apply suggestions from code review Co-authored-by: Taus <tausbn@github.com>	2022-06-28 09:48:53 +02:00
Asger F	a033338d20	Python: Explicitly mention lack of transitive flow in asSource/asSink	2022-06-28 09:46:26 +02:00
Asger F	9b27a7cbcd	Python: Don't claim that external libraries are excluded from the database	2022-06-28 09:28:26 +02:00
yoff	67b6f215dc	Apply suggestions from code review Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>	2022-06-28 08:05:53 +02:00
yoff	1788507571	python: add qldoc	2022-06-27 21:00:12 +00:00
Rasmus Lerchedahl Petersen	a1fe8a5b2b	python: handle `not` in BarrierGuard

In the program

```python
if not is_safe(path):
    return
```

the last node in the `ConditionBlock` is `not is_safe(path)`, so it would never match "a call to is_safe". Thus, guards inside `not` would not be part of `GuardNode` (nor `BarrierGuard`). Now they can.	2022-06-27 20:10:47 +00:00
Taus	dc0f50d49a	Python: Clean up variable names Makes it more consistent with the names used in `legalMergeCandidateNonEmpty`.	2022-06-27 19:54:09 +00:00
Taus	8fc9ce9699	Python: Fix bad join in MRO. Fixes a bad join in `list_of_linearization_of_bases_plus_bases`. Previously, we joined together `ConsList` and `getBase` before filtering these out using the recursive call. Now we do the recursion first. Co-authored-by: yoff <yoff@github.com>	2022-06-27 19:54:09 +00:00
Rasmus Wriedt Larsen	9e154ff4bd	Merge branch 'main' into python/port-tarslip	2022-06-27 14:36:15 +02:00
yoff	5042c804dd	python: sync files and fix many small things - but now we have non-monotonic recursion again...	2022-06-23 14:57:06 +00:00
Anders Schack-Mulligen	dc517a758e	Autoformat	2022-06-23 14:44:40 +02:00
Anders Schack-Mulligen	4a317a25d3	Dataflow: Sync.	2022-06-23 14:34:52 +02:00
yoff	a2851baa9f	python: fix import of "merge moved" file	2022-06-23 12:05:55 +00:00
Rasmus Wriedt Larsen	3248f7b423	Merge pull request #9649 from RasmusWL/certificate-modeling Python/JS/Ruby: Ignore common words (like certain) as sensitive data source	2022-06-23 12:04:58 +02:00
yoff	140dc1a61e	merge in main	2022-06-23 09:05:32 +00:00
yoff	fe0c5d8ee5	python: make `ArgumentNode` publicly usable - add `getCall`	2022-06-23 08:48:55 +00:00
yoff	b22de69ab2	python: update qldoc now predicates may be empty	2022-06-23 08:41:28 +00:00
yoff	cedf9ef538	python: make `DataFlowCall` "publicly usable" - add `getCallable`, `getArg` and `getNode` - these are `none` for summary calls - revert "external" uses (they had been changed to `DataFlowSourceCall`)	2022-06-23 08:32:23 +00:00
Rasmus Wriedt Larsen	4be375521f	Python: Handle `_` in sensitive-data-sources	2022-06-22 11:05:14 +02:00


880 Commits