codeql

mirror of https://github.com/github/codeql.git synced 2025-12-17 17:23:36 +01:00

Author	SHA1	Message	Date
Erik Krogh Kristensen	16774ba285	add support for named parameters in API graphs	2022-02-03 23:10:38 +01:00
Erik Krogh Kristensen	66fd43fc3b	add def edge for function returns	2022-02-03 23:10:38 +01:00
Erik Krogh Kristensen	d8eea7ba4c	property writes are def nodes	2022-02-03 23:10:38 +01:00
Erik Krogh Kristensen	a908b219e9	more backtracking of def nodes, and lots of tests	2022-02-03 23:10:38 +01:00
Erik Krogh Kristensen	038b032a43	get basic module exports to work in API-graphs	2022-02-03 23:10:38 +01:00
Erik Krogh Kristensen	df9efbe778	get minimal def nodes to work in python	2022-02-03 23:10:38 +01:00
Erik Krogh Kristensen	89786d9ce2	rename `pr` to `ref` in `memberFromRef`	2022-02-03 23:10:37 +01:00
Taus	22aa4c9379	Python: Fix performance issue in `charSet` Observed on `mozilla/bugbug` on the 2.8.0 CLI branch, we had the following line in the timing report: ``` FullServerSideRequestForgery.ql-17:regex::RegexString::charSet_dispred#fff#antijoin_rhs ............... 1m13s ``` Inspecting the logs, we see the following join: ``` (644s) Tuple counts for regex::RegexString::charSet_dispred#fff#antijoin_rhs/5@f295d1bk after 1m13s: 1 ~0% {1} r1 = CONSTANT(unique string)["]"] 2389 ~4% {3} r2 = JOIN r1 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1', (Rhs.2 'arg1' + 1) 668873 ~0% {6} r3 = JOIN r2 WITH regex::RegexString::char_set_start_dispred#fff ON FIRST 1 OUTPUT Lhs.0 'arg0', "]", Lhs.1 'arg1', Lhs.2 'arg2', Rhs.1 'arg3', Rhs.2 'arg4' 537501371 ~4% {7} r4 = JOIN r3 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_021#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3', Lhs.5 'arg4', "]", Rhs.2 269085087 ~0% {7} r5 = SELECT r4 ON In.6 > In.4 'arg4' 89583155 ~3% {7} r6 = SELECT r5 ON In.6 < In.1 'arg1' 89583155 ~26634% {5} r7 = SCAN r6 OUTPUT In.0 'arg0', In.1 'arg1', In.2 'arg2', In.3 'arg3', In.4 'arg4' return r7 ``` Now, this is problematic not just because of the large intermediary join but also because of the large number of tuples being materialised at the end. The culprit in this case turns out to be this bit of `charSet`: ``` not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end) ``` Rewriting this to instead look for the minimum index at which a `]` appears resulted in a much nicer join. I also fixed up a similar issue surrounding the `\N` unicode escape. Not that I think this will necessarily be relevant, but the `min`-based solution is more robust either way.	2022-02-03 20:42:04 +00:00
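The rewrite described in 22aa4c9379 can be illustrated outside QL. The Python sketch below is only an analogy under stated assumptions, not the actual change to `charSet`: the function names and the plain list of `]` positions are hypothetical. It shows that "no `]` lies strictly between `inner_start` and `inner_end`" picks out the same end position as "the minimum index of a `]` after `inner_start`", while the second form never enumerates pairs of candidate positions.

```python
def charset_end_naive(close_positions, inner_start):
    """Ends found by the 'no "]" in between' formulation (hypothetical helper).

    For every candidate end, every other "]" position is checked, which
    mirrors the large pairwise join seen in the tuple counts.
    """
    return [
        end
        for end in close_positions
        if end > inner_start
        and not any(inner_start < mid < end for mid in close_positions)
    ]


def charset_end_min(close_positions, inner_start):
    """Ends found by the min-based formulation (hypothetical helper).

    Only the smallest "]" index after inner_start can qualify, so a single
    aggregate replaces the pairwise check.
    """
    later = [pos for pos in close_positions if pos > inner_start]
    return [min(later)] if later else []


if __name__ == "__main__":
    positions = [5, 9, 14]  # assumed indices of non-escaped "]" characters
    print(charset_end_naive(positions, 2), charset_end_min(positions, 2))
```

Both helpers return the same single end position for a given start; the difference is only in how much intermediate work is needed to find it.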
Chuan-kai Lin	c8bc5cfa75	Merge pull request #7825 from github/cklin/python-downgrade-scripts Python: adjust downgrade script location and format	2022-02-03 11:40:07 -08:00
Rasmus Wriedt Larsen	5cd08b8e8c	Python: Ignore `.isAbsent()` from ClassCall This means that DataFlowCall is only for resolvable calls, which might not seem like a big thing in itself, but enables the next commit to actually work :P	2022-02-03 14:58:30 +01:00
Rasmus Wriedt Larsen	48aa07d67a	Python: Handle SyntheticPreUpdateNode in PrintNode	2022-02-03 14:58:30 +01:00
Rasmus Wriedt Larsen	49b5d60229	Python: Use AttrRead/AttrWrite for attr read/store steps Note that this doesn't actually add the desired flow from setattr, due to a missing post-update node. This will be fixed in a later commit.	2022-02-03 14:58:30 +01:00
Rasmus Wriedt Larsen	5774459dfb	Python: restrict `AttrRead` with `AttrNode.isLoad()`	2022-02-03 14:58:23 +01:00
Rasmus Wriedt Larsen	e2de0e61ca	Python: Remove `RegExpTerm` from PrintAST Since this caused bad performance (as we had to evaluate points-to). Fixes https://github.com/github/codeql/issues/6964 This approach was motivated by the comment on the issue from @tausbn: > We discussed this internally in the CodeQL Python team, and have > agreed that the best approach for now is to disable the printing of > regex ASTs. I tried to keep our RegExpTerm logic, but doing the fix below did not work, and still evaluated RegExpTerm :| I guess we will just have to revert this PR if we want it back ```diff TRegExpTermNode(RegExpTerm term) { + none() and exists(StrConst str | term.getRootTerm() = getParsedRegExp(str) and shouldPrint(str, _)) } ```	2022-02-03 14:22:14 +01:00
Erik Krogh Kristensen	e93c46ad31	Merge pull request #7811 from erik-krogh/pyApiIpa Python: refactor API-graph labels to an IPA type	2022-02-03 12:31:39 +01:00
Tom Hvitved	6bb71f051b	Merge pull request #7791 from hvitved/dataflow/inline-local-flow-star Data flow: Inline `local(Expr|Instruction)?(Flow|Taint)`	2022-02-03 09:02:43 +01:00
Chuan-kai Lin	df91ee6616	Python: adjust downgrade script location and format	2022-02-02 14:23:21 -08:00
Arthur Baars	33b97f3e0c	Update synchronized files	2022-02-02 13:30:45 +01:00
CodeQL CI	7bb11b837c	Merge pull request #7788 from yoff/python/remove-library-annotation Approved by tausbn	2022-02-02 03:51:00 -08:00
Rasmus Wriedt Larsen	51bc6dcf7e	Python: Add `attributeClearStep`	2022-02-02 11:19:35 +01:00
Rasmus Lerchedahl Petersen	4ad99d9299	python: add missing QlDoc	2022-02-02 09:14:21 +01:00
Rasmus Lerchedahl Petersen	448e0785c2	python: `logging.root` is not a call	2022-02-02 09:04:16 +01:00
Erik Krogh Kristensen	e06f6529f1	refactor API-graph labels to an IPA type	2022-02-01 17:32:08 +01:00
Rasmus Lerchedahl Petersen	1e2428cb6b	python: create LDAP module in `Concepts`	2022-02-01 14:39:58 +01:00
Rasmus Lerchedahl Petersen	c2cd58edc4	python: rewrite to separate configurations source nodes get duplicated, so perhaps flow states are actually better for performance?	2022-02-01 14:36:11 +01:00
Rasmus Lerchedahl Petersen	c587084758	python: use standard `InstanceSource` construction	2022-02-01 13:31:16 +01:00
Rasmus Wriedt Larsen	f7a0b17ed6	Merge pull request #7687 from yoff/python/PathInjection-FlowState python: Rewrite path injection query to use flow state	2022-02-01 11:33:37 +01:00
Rasmus Lerchedahl Petersen	119a7e4f34	python: provide links for Flask	2022-02-01 10:55:45 +01:00
Rasmus Lerchedahl Petersen	7511b33512	python: "command" -> "log"	2022-02-01 10:23:16 +01:00
yoff	45f0bfd8f0	Apply suggestions from code review Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>	2022-02-01 10:06:37 +01:00
yoff	c03f89d712	Apply suggestions from code review Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>	2022-02-01 10:04:26 +01:00
Taus	4a29095e3b	Python: Fix bad join order in `TPythonTuple` TL;DR: Something introduced the following bad join order: ``` (227s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i2#8f58670w after 3m46s: 25000 ~0% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context' 24000 ~1% {2} r2 = JOIN r1 WITH @py_scope#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.0 1076876712 ~6% {3} r3 = JOIN r2 WITH Flow::TupleNode#class#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'context', Lhs.1 870129666 ~0% {3} r4 = JOIN r3 WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.2, Lhs.0 'origin' 870129000 ~0% {3} r5 = r4 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.2 'origin', Lhs.0 'context') 870129000 ~1% {3} r6 = SCAN r5 OUTPUT In.2 'origin', In.1, In.0 'context' 9000 ~0% {2} r7 = JOIN r6 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.0 'origin', Lhs.2 'context' return r7 ``` (...the above being the tuple counts _at the point when I cancelled the query_!) 
Rewriting the code to force a join between `TupleNode#class` and `getScope` results in the following join orders: ``` (0s) Tuple counts for TObject::scope_loads_tuplenode#ff/2@b3cf0bo5 after 13ms: 37369 ~3% {1} r1 = JOIN Flow::TupleNode#class#f WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.0 'origin' 37369 ~3% {2} r2 = JOIN r1 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 1 OUTPUT Rhs.1 's', Lhs.0 'origin' return r2 ``` and ``` (78s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i53#121c440w after 6ms: 34736 ~3% {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context' 7370 ~5% {2} r2 = JOIN r1 WITH TObject::scope_loads_tuplenode#ff ON FIRST 1 OUTPUT Lhs.1 'context', Rhs.1 'origin' 7370 ~5% {2} r3 = r2 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.1 'origin', Lhs.0 'context') 7370 ~1% {2} r4 = SCAN r3 OUTPUT In.1 'origin', In.0 'context' return r4 ``` the latter being the largest iteration of `dom#TPythonTuple` throughout the log. No other major performance issues were observed.	2022-01-31 16:59:50 +00:00
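The effect of the join order fix in 4a29095e3b can be sketched in Python. This is a rough analogy only, not how the QL evaluator works: the function names, the toy relations, and the intermediate-size counter are all hypothetical. The first function mimics the bad plan (Cartesian product with every tuple node, scope filter applied last), the second mimics the fix (tuple nodes are joined with their scope up front, in the spirit of the `scope_loads_tuplenode` helper, and only then joined with the contexts).

```python
from itertools import product


def bad_order(context_scopes, tuple_nodes, scope_of):
    """Cartesian product first, scope filter last (hypothetical model)."""
    intermediate = list(product(context_scopes, tuple_nodes))
    result = {
        (node, ctx)
        for (ctx, scope), node in intermediate
        if scope_of[node] == scope
    }
    return result, len(intermediate)


def good_order(context_scopes, tuple_nodes, scope_of):
    """Pre-join nodes with their scope, then join on scope (hypothetical model)."""
    nodes_by_scope = {}
    for node in tuple_nodes:
        nodes_by_scope.setdefault(scope_of[node], []).append(node)
    intermediate = [
        (node, ctx)
        for ctx, scope in context_scopes
        for node in nodes_by_scope.get(scope, [])
    ]
    return set(intermediate), len(intermediate)


if __name__ == "__main__":
    contexts = [("c1", "s1"), ("c2", "s2")]  # (context, scope it applies to)
    nodes = ["t1", "t2", "t3"]
    scopes = {"t1": "s1", "t2": "s1", "t3": "s3"}
    print(bad_order(contexts, nodes, scopes))
    print(good_order(contexts, nodes, scopes))
```

Both orders produce the same result set; only the size of the intermediate relation differs, which is the quantity the tuple counts in the commit message report.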
Tom Hvitved	f2352d8272	Data flow: Inline `local(Expr|Instruction)?(Flow|Taint)` Computing a full transitive closure is often bad; by inlining all calls we are providing more context to the QL optimizer.	2022-01-31 14:33:41 +01:00
Rasmus Lerchedahl Petersen	8b5114d10e	python: Add standard customization setup - modernize the sanitizer, but do not make it less specific	2022-01-31 11:27:55 +01:00
Rasmus Lerchedahl Petersen	20d54543fd	python: move log injection out of experimental - move from custom concept `LogOutput` to standard concept `Logging` - remove `Log.qll` from experimental frameworks - fold models into standard models (naively for now) - stdlib: - make Logger module public - broaden definition of instance - add `extra` keyword as possible source - flask: add app.logger as logger instance - django: add `django.utils.log.request_logger` as logger instance (should we add the rest?) - remove LogOutput from experimental concepts	2022-01-31 11:27:55 +01:00
Rasmus Lerchedahl Petersen	211345c010	python: remove more annotations	2022-01-31 11:20:59 +01:00
Rasmus Lerchedahl Petersen	cac3862659	python: remove library annotation to clean up QL warnings. Should put these in a private module instead?	2022-01-31 08:50:37 +01:00
Rasmus Lerchedahl Petersen	0c3bce1415	python: deprecation I am slightly concerned that the test now generates many more intermediate results. I suppose that makes the analysis heavy. Should the new library get a new name instead, so the old code does not get evaluated?	2022-01-31 08:32:24 +01:00
Rasmus Wriedt Larsen	3e71d7f9bb	Python: Add note about `/` for Django upload_to I did a test locally, something like import requests req = requests.Request( "POST", "http://127.0.0.1:8000/app/upload-test/", data={"name": "foo"}, files={"upload" : ("wat/haha|!#$%^&", open("foo.txt", "rb"))}, ) # print(req.prepare().body.decode('ascii')) requests.session().send(req.prepare()) and the `wat/` part was stripped from the filename	2022-01-28 12:17:46 +01:00
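The observation in 3e71d7f9bb (the `wat/` prefix disappearing from the uploaded filename) matches what happens when only the final path component of a client-supplied name is kept. The sketch below reproduces that effect with the standard library alone. It is an assumption about the behaviour, not a statement about how Django implements it, and `sanitized_upload_name` is a hypothetical helper.

```python
import posixpath


def sanitized_upload_name(client_filename):
    """Keep only the last path component of a client-supplied filename.

    Hypothetical helper: models the stripping observed in the local test,
    under the assumption that everything up to the final "/" is dropped.
    """
    return posixpath.basename(client_filename)


if __name__ == "__main__":
    # Same filename as in the local test from the commit message.
    print(sanitized_upload_name("wat/haha|!#$%^&"))
```

Under that assumption, a `/` in the client-supplied name cannot steer the upload into a different directory, which is the point the commit's note records.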
yoff	74d57bbb1a	Update python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll Co-authored-by: Taus <tausbn@github.com>	2022-01-28 11:38:29 +01:00
Rasmus Lerchedahl Petersen	a026120c52	Python: Move configuration over and refine it The original configuration did not match sinks with sanitizers. Here it is resolved using flow state, it could also be done by using two configurations.	2022-01-28 09:00:40 +01:00
Rasmus Lerchedahl Petersen	d539920661	Python: Update list of frameworks	2022-01-28 08:58:30 +01:00
Rasmus Wriedt Larsen	4338c06b0d	Python: Support Django FileField.upload_to	2022-01-27 17:20:16 +01:00
Rasmus Lerchedahl Petersen	b93c04bb79	python: Add reverse flow in some patterns Particularly in value and literal patterns. This is getting a little bit into the guards aspect of matching. We could similarly add reverse flow in terms of sub-patterns storing to a sequence pattern, a flow step from alternatives to an or-pattern, etc. It does not seem too likely that sources are embedded in patterns to begin with, but for secrets perhaps? It is illustrated by the literal test. The value test still fails. I believe we miss flow in general from the static attribute.	2022-01-27 15:20:23 +01:00
github-actions[bot]	634134f283	Release preparation for version 2.8.0	2022-01-27 10:40:20 +00:00
Rasmus Lerchedahl Petersen	cb52ab669e	python: address review comments The comment about `py_scopes` was simply removed	2022-01-27 11:17:00 +01:00
yoff	e28669e487	Apply suggestions from code review Co-authored-by: Taus <tausbn@github.com>	2022-01-27 10:31:43 +01:00
Rasmus Lerchedahl Petersen	163c888781	python: port concepts and implementations	2022-01-26 19:05:37 +01:00
Rasmus Lerchedahl Petersen	47af3a69a5	Merge branch 'main' of github.com:github/codeql into python/support-match	2022-01-26 11:39:46 +01:00
Edoardo Pirovano	1b539eb4dc	Merge branch `rc/3.4` into `main`	2022-01-25 16:22:01 +00:00


750 Commits