So that it sits next to the other class-related predicates (such
as `instance()`), the class is called `AdditionalTaintStep`, and it
is marked private.
I also moved the modeling of attributes while I was at it.
I know that the TODO about not having the tools to handle
`meth = obj.meth; meth()` is outdated now that we have `DataFlow::MethodCallNode`,
but I'm planning to deal with that later on ;)
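For context, the pattern in question is a bound method being stored in a
variable before it is called; a minimal illustration (class and method names
made up):
```
class Connection:
    def close(self):
        print("closing")

conn = Connection()

# Direct method call -- the easy case to model.
conn.close()

# Bound method stored in a variable and called later -- the case the
# TODO refers to, which `DataFlow::MethodCallNode` should help with.
close = conn.close
close()
```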
`InstanceSourceApiNode` is a really good idea, but it just happened too
soon: I can't do what I need if I have to supply an API node. So, to
avoid confusion around deprecating to/from `InstanceSource` in those
classes, I opted to do some major reorganizing as well 👍
Due to aliasing restrictions, I had to use a little trick with the
`WerkzeugOld` module.
Since `UploadedFile` is the abstract base class, all real usage would be
of one of its subclasses, so I'm removing this to avoid giving the false
hope that it actually works.
I don't think investing the time to make this work would add any value,
which is why I didn't do it ;)
Before, results from `dca` would look something like this:
```
## + py/meta/alerts/remote-flow-sources-reach
- django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:16:38:48 reachable with taint-tracking from RemoteFlowSource
- django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:9:38:12 reachable with taint-tracking from RemoteFlowSource
```
Now that we pretty-print the node, it should be easier to spot _what_
actually changed.
Not the prettiest of solutions, but it does the job. Basically, we were
calculating (and re-calculating) the same big relation between strings
and regexes, and then checking whether the regexes matched the strings.
This resulted in tuple counts like the following:
```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0% {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654 ~0% {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
return r2
```
(The above being just the bit that handles `DefinitionNode` in
`SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)
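To illustrate the shape of the problem (this is just a Python analogy with
made-up names, not the actual QL), the old evaluation essentially paired every
node name with every sensitive-name regex and ran a match on each pair:
```
import re

# Hypothetical stand-ins for the real predicates.
sensitive_regexps = ["(?i).*password.*", "(?i).*secret.*", "(?i).*token.*"]
node_names = ["request", "password", "user_secret", "index", "counter"]

# Cartesian product of (name, regexp) pairs, each checked with a regexp
# match -- analogous to the CARTESIAN PRODUCT followed by regexpMatch above.
matches = [
    (name, regexp)
    for name in node_names
    for regexp in sensitive_regexps
    if re.match(regexp, name)
]
```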
By applying a bit of manual inlining and magic, this becomes somewhat
more manageable:
```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671 ~2% {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
334012 ~2% {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
361683 ~11% {3} r3 = r1 UNION r2
154644 ~0% {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
149198 ~1% {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
124257 ~5% {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
273455 ~21% {3} r7 = r5 UNION r6
428099 ~30% {3} r8 = r4 UNION r7
789782 ~78% {3} r9 = r3 UNION r8
1121 ~77% {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121 ~70% {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
return r11
```
(The above being the total for all the sensitive names we care about,
taking only 1.2 seconds to evaluate.)
Incidentally, you may wonder why this has _fewer_ results than before.
The answer is control-flow splitting -- previously, every sensitively-named
`DefinitionNode` would have been matched in isolation, once per split. By
pre-matching on just the names, we can subsequently join against the
names that are known to be sensitive, which is a much faster operation.
(We also get the benefit of deduplicating the strings that are matched,
before actually performing the match, so if, say, an attribute name and
a variable name are identical, then we'll only match them once.)
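Continuing the Python analogy (again, made-up names rather than the real QL),
the rewritten version first matches the distinct names once, and only then
joins the sensitive names back against the nodes:
```
import re

sensitive_regexps = ["(?i).*password.*", "(?i).*secret.*", "(?i).*token.*"]
node_names = ["request", "password", "user_secret", "index", "password"]

# Step 1: deduplicate the candidate strings and match each distinct name only
# once, no matter how many variables/attributes/parameters share that name.
sensitive_names = {
    name
    for name in set(node_names)
    if any(re.match(regexp, name) for regexp in sensitive_regexps)
}

# Step 2: per node, a cheap membership test (a join) instead of a regexp match.
sensitive_nodes = [name for name in node_names if name in sensitive_names]
```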
We also exclude docstrings from the relevant string constants, since these
presumably don't actually flow anywhere.