codeql

mirror of https://github.com/github/codeql.git synced 2026-03-28 02:08:17 +01:00

Author	SHA1	Message	Date
Taus	2bb44d49d9	Python: Perform more deduplication. This cut the evaluation time on `django` down from 1.2 seconds to ~0.8 seconds (but the impact will likely be greater on bigger projects).	2021-07-14 13:38:05 +00:00
Taus	09993406f1	Python: Add explanatory QLDoc comment	2021-07-14 10:42:07 +00:00
Taus	1decf23785	Python: Fix bad join order for sensitive data
Not the prettiest of solutions, but it does the job. Basically, we were calculating (and re-calculating) the same big relation between strings and regexes and then checking whether the latter matched the former. This resulted in tuple counts like the following:
```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0% {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654 ~0% {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
return r2
```
(The above being just the bit that handles `DefinitionNode` in `SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)
By applying a bit of manual inlining and magic, this becomes somewhat more manageable:
```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671 ~2% {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
334012 ~2% {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
361683 ~11% {3} r3 = r1 UNION r2
154644 ~0% {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
149198 ~1% {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
124257 ~5% {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0
273455 ~21% {3} r7 = r5 UNION r6
428099 ~30% {3} r8 = r4 UNION r7
789782 ~78% {3} r9 = r3 UNION r8
1121 ~77% {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121 ~70% {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
return r11
```
(The above being the total for all the sensitive names we care about, taking only 1.2 seconds to evaluate.)
Incidentally, you may wonder why this has _fewer_ results than before. The answer is control flow splitting -- every sensitively-named `DefinitionNode` would have been matched in isolation previously. By pre-matching on just the names of these, we can subsequently join against those names that are known to be sensitive, which is a much faster operation. (We also get the benefit of deduplicating the strings that are matched, before actually performing the match, so if, say, an attribute name and a variable name are identical, then we'll only match them once.)
We also exclude all docstrings as relevant string constants, as these presumably don't actually flow anywhere.	2021-07-12 16:10:49 +00:00
Taus	a73e382dfe	Python: Prevent bad join in hashlib model
I'm not entirely sure what triggered this bad join order, but some combination of the use of abstract classes and the exclusion of `new` caused this to go really wrong:
```
WeakSensitiveDataHashing.ql-15:Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff ......... 15.5s
```
with the following tuple counts:
```
[2021-07-12 13:20:15] (16s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@217901:
148810 ~3% {3} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
148810 ~4% {3} r2 = JOIN r1 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
7589310 ~486% {4} r3 = JOIN r2 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
6994070 ~490% {4} r4 = SELECT r3 ON In.3 != "new"
6994070 ~4503% {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'node'
22 ~4% {3} r6 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
22 ~0% {3} r7 = JOIN r6 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
1122 ~437% {4} r8 = JOIN r7 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
1034 ~460% {4} r9 = SELECT r8 ON In.3 != "new"
1034 ~4549% {2} r10 = SCAN r9 OUTPUT In.1 'this', In.0 'node'
6995104 ~4503% {2} r11 = r5 UNION r10
5213851 ~4683% {3} r12 = JOIN r11 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
6478480 ~4646% {6} r13 = JOIN r12 WITH ApiGraphs::API::Impl::edge#2#fff_201#join_rhs ON FIRST 1 OUTPUT "hashlib", Rhs.1, Lhs.1 'node', Lhs.2 'this', Lhs.0 'hashClass', Rhs.2
1410 ~4693% {5} r14 = JOIN r13 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 2 OUTPUT Lhs.2 'node', Lhs.3 'this', Lhs.4 'hashClass', Lhs.5, InverseAppend("getMember(\"","\")",Lhs.5)
1222 ~4540% {5} r15 = SELECT r14 ON In.4 'hashName' != "new"
1222 ~4540% {4} r16 = SCAN r15 OUTPUT In.1 'this', In.4 'hashName', In.2 'hashClass', In.0 'node'
```
By factoring out the insides, the biggest iteration now looks like this:
```
[2021-07-12 14:17:36] (0s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@85bb21:
148810 ~0% {2} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
148810 ~0% {2} r2 = JOIN r1 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'
22 ~0% {2} r3 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
22 ~0% {2} r4 = JOIN r3 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'
148832 ~0% {2} r5 = r2 UNION r4
110933 ~2% {3} r6 = JOIN r5 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
26 ~0% {4} r7 = JOIN r6 WITH Stdlib::Stdlib::hashlibMember#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.2 'this', Rhs.1 'hashName', Lhs.0 'hashClass', Lhs.1 'node'
return r7
```
(The tuple counts themselves are not directly comparable.)	2021-07-12 14:22:21 +00:00
CodeQL CI	1d56748eed	Merge pull request #6200 from yoff/pythonJS-make-expbtlib-private. Approved by RasmusWL, esbena	2021-07-02 09:09:18 -07:00
CodeQL CI	a25933aa56	Merge pull request #5926 from RasmusWL/small-cleanups. Approved by tausbn	2021-07-02 04:59:54 -07:00
Rasmus Wriedt Larsen	81fab487a4	Python: Apply suggestions from code review. Co-authored-by: Taus <tausbn@github.com>	2021-07-02 13:27:41 +02:00
Rasmus Wriedt Larsen	22c155687e	Python: Fix code after removing `getPostUpdateNode`	2021-07-02 13:25:25 +02:00
Rasmus Wriedt Larsen	7a6eee50ff	Revert "Python: Add getPostUpdateNode to DataFlow::Node". This reverts commit `9137f04bd3`.	2021-07-02 13:23:02 +02:00
Rasmus Wriedt Larsen	e56dfe75bd	Python: AttrRef getObject/1 -> accesses/2. See this thread for discussion: https://github.com/github/codeql/pull/5926#discussion_r635384981	2021-07-02 13:21:12 +02:00
Rasmus Lerchedahl Petersen	6f2642607e	Python: make the import of `RedosUtil` public. This mirrors `SuperlinearBacktracking.qll`. An alternative is to keep it private and import it again in the query files.	2021-07-02 12:32:04 +02:00
Rasmus Lerchedahl Petersen	77c329fb0f	Python/JS: Make much more private	2021-07-02 12:13:52 +02:00
Rasmus Lerchedahl Petersen	1fc9638486	Python: port redos .qhelp from js	2021-07-02 11:36:46 +02:00
Taus	f151338def	Merge pull request #6198 from RasmusWL/fix-cleartext-logging. Python: Some minor fixes to `py/clear-text-logging-sensitive-data`	2021-07-01 18:28:25 +02:00
Rasmus Lerchedahl Petersen	eee56e0156	Python/JS: Make most of the new library private	2021-07-01 15:34:06 +02:00
Anders Schack-Mulligen	37f8794d01	Merge pull request #6165 from edoardopirovano/fix-regression. Performance: Improve join order in data flow library	2021-07-01 14:13:18 +02:00
Rasmus Wriedt Larsen	b0309dd321	Python: Limit SensitiveDataSources to prevent _some_ cross-talk	2021-07-01 12:08:12 +02:00
Rasmus Wriedt Larsen	f64e58a21c	Python: Fix a QLDoc for SensitiveDataSources	2021-07-01 12:05:59 +02:00
Rasmus Wriedt Larsen	d9e2f504f8	Python: Fix clear text logging sink. No need to restrict it to arguments that are calls.	2021-06-30 20:31:17 +02:00
Taus	e4af14638b	Merge pull request #6175 from yoff/python-port-ReDoS. Python: port ReDoS queries from JavaScript	2021-06-30 16:26:07 +02:00
yoff	6a77b890af	Merge pull request #6155 from RasmusWL/port-cleartext-queries. Python: Port cleartext queries	2021-06-30 15:52:34 +02:00
Rasmus Lerchedahl Petersen	a176e6ac30	Python: comment out temporarily unused predicate	2021-06-30 15:28:31 +02:00
Rasmus Lerchedahl Petersen	45e30b0c06	Python: comment out temporarily unused predicate	2021-06-30 15:04:37 +02:00
Rasmus Lerchedahl Petersen	c306cee04e	Python: mimic JS file hierarchy	2021-06-30 15:03:22 +02:00
Rasmus Lerchedahl Petersen	651f8abba0	Python: Avoid multiple results for `toString`	2021-06-30 14:39:49 +02:00
Rasmus Wriedt Larsen	c2708176b1	Python: Support %-style formatting for MarkupSafe	2021-06-30 14:15:41 +02:00
Rasmus Wriedt Larsen	c84658dff1	Python: Use `MethodCallNode` for `MarkupSafe` string-format	2021-06-30 13:58:09 +02:00
Rasmus Wriedt Larsen	d6e8fafdbd	Python: Proper sorting in `Frameworks.qll`	2021-06-30 13:55:26 +02:00
Rasmus Wriedt Larsen	075953860b	Merge branch 'main' into markupsafe-modeling	2021-06-30 13:55:08 +02:00
Rasmus Lerchedahl Petersen	72986e1e28	Python: Add some comments on the boolean sweep pattern	2021-06-30 12:50:36 +02:00
Rasmus Lerchedahl Petersen	52d91917aa	Merge branch 'python-port-ReDoS' of github.com:yoff/codeql into python-port-ReDoS	2021-06-30 12:25:59 +02:00
Rasmus Lerchedahl Petersen	6dfbf80494	Python: Disable use of `toUnicode` until supporting CLI is released	2021-06-30 12:21:52 +02:00
Rasmus Wriedt Larsen	e5d65992b4	Python: Use `DefinitionNode` instead of `Assign`. Based on https://github.com/github/codeql/pull/6155#discussion_r660964666: "Hmm... Would it be better to do this using DefinitionNode instead of Assign? The latter is fairly limited in what it can represent, and also raises questions of whether this definition is sound with regard to control-flow splitting."	2021-06-30 12:08:32 +02:00
yoff	c19522e921	Apply suggestions from code review. Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>	2021-06-30 11:49:45 +02:00
Edoardo Pirovano	8354f66c29	Performance: Improve join order in data flow library	2021-06-29 18:23:22 +01:00
Rasmus Wriedt Larsen	94bcda3bae	Python: Highlight problem picking `DataFlow::Node` for `Assign`	2021-06-29 15:32:16 +02:00
Rasmus Lerchedahl Petersen	6f2cdbf59e	Python: Give up on providing values for form feeds	2021-06-29 11:14:27 +02:00
Rasmus Lerchedahl Petersen	ffb8938e52	Python: undo autoformat character mangling	2021-06-29 11:06:17 +02:00
Rasmus Lerchedahl Petersen	135b71b649	Python: Apply performance fix by @hvitved	2021-06-29 11:01:33 +02:00
Rasmus Lerchedahl Petersen	591b6ef69c	Python: Add ReDoS as identical files from JS. The library-specific file is `RegExpTreeView`. The files are recorded as identical via the mapping in `identical-files.json`.	2021-06-28 17:04:48 +02:00
Rasmus Lerchedahl Petersen	2c27ce7aa5	Python: Make AST viewer see regexes. This work is due to @erik-krogh, who also made corresponding fixes to `RegexTreeView.qll` and implemented `toUnicode` so it is available on `String`s.	2021-06-28 17:04:48 +02:00
Rasmus Lerchedahl Petersen	d953ba8dd4	Python: A parse-tree view of regular expressions. This contains several contributions from @erik-krogh and also some fixes from @nickrolfe.	2021-06-28 17:04:48 +02:00
Rasmus Lerchedahl Petersen	21007d21f4	Python: track if quantifiers allow unbounded repeats. This is in preparation for ReDoS.	2021-06-28 17:04:48 +02:00
Rasmus Lerchedahl Petersen	74ca1d00b9	Python: More precise regex parsing	2021-06-28 17:04:48 +02:00
Rasmus Lerchedahl Petersen	e5f07cc4d3	Python: inline test of regex components. Added a naive implementation of `charRange` so the test can run, and made predicates public as needed.	2021-06-28 17:04:48 +02:00
Rasmus Wriedt Larsen	9573048ee8	Python: Port `py/clear-text-logging-sensitive-data`	2021-06-25 14:35:31 +02:00
Rasmus Wriedt Larsen	68cfeb0b5c	Python: Model logging from the `logging` module	2021-06-25 14:26:35 +02:00
Rasmus Wriedt Larsen	c05e375401	Python: Fix indentation of `hashlib` modeling	2021-06-25 14:26:35 +02:00
Rasmus Wriedt Larsen	36c9ceb13b	Python: Add `Logging` concept	2021-06-25 14:26:35 +02:00
Rasmus Wriedt Larsen	a7eb1b3a12	Python: Minor QLDoc fixup	2021-06-25 14:26:35 +02:00
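
Commit 1decf23785 above speeds up sensitive-data detection by matching each distinct candidate name against the sensitive-name regexes once, then joining definition nodes against the much smaller set of names already known to be sensitive. The Python sketch below illustrates that dedup-then-join idea outside CodeQL. It is not the QL implementation: `SENSITIVE_REGEXPS`, `naive_match`, `sensitive_strings`, and `deduplicated_match` are hypothetical names, the two patterns are placeholders rather than the real `maybeSensitiveRegexp` heuristics, and `re.search` is used for brevity instead of mirroring `regexpMatch` exactly.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical classification -> pattern table; a stand-in for
# SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp.
SENSITIVE_REGEXPS = {
    "password": re.compile(r"(?i)pass(wd|word)"),
    "secret": re.compile(r"(?i)secret|api_?key"),
}


def naive_match(nodes: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Match every (node, name) pair against every regex, so a name that
    occurs on many nodes is matched once per node."""
    results = []
    for node_id, name in nodes:
        for classification, regexp in SENSITIVE_REGEXPS.items():
            if regexp.search(name):
                results.append((node_id, classification))
    return results


def sensitive_strings(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Match each *distinct* name once; return name -> classifications."""
    table: Dict[str, Set[str]] = {}
    for name in set(names):  # deduplicate before the expensive regex step
        for classification, regexp in SENSITIVE_REGEXPS.items():
            if regexp.search(name):
                table.setdefault(name, set()).add(classification)
    return table


def deduplicated_match(nodes: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Pre-match the distinct names, then join nodes against that table
    with a cheap dictionary lookup instead of a regex match per node."""
    table = sensitive_strings(name for _, name in nodes)
    return [
        (node_id, classification)
        for node_id, name in nodes
        for classification in sorted(table.get(name, ()))
    ]
```

For example, with `nodes = [(1, "password"), (2, "password"), (3, "user_name"), (4, "API_KEY")]`, both functions yield `(1, "password")`, `(2, "password")`, and `(4, "secret")`, but the deduplicated version runs the regexes against three distinct names instead of four nodes. The saving grows with the number of nodes that share a name, which is the situation control-flow splitting produces.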

2702 Commits