Commit Graph

3955 Commits

Author SHA1 Message Date
Taus
8b3fa789da Python: Add AnnAssign DefinitionNode
This was a source of false positives for the
`py/uninitialized-local-variable` query, as exemplified by the test
case.
2021-07-20 11:57:26 +00:00
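For readers unfamiliar with annotated assignments, the following is a
minimal Python sketch using the standard `ast` module. It is not the
CodeQL implementation; it only assumes that the false positives came from
statements such as `x: int = 5` not being treated as definitions. The
helper name `annotated_definitions` is hypothetical.

```python
import ast

SOURCE = """
def f():
    x: int = 5      # annotated assignment with a value: binds x
    y: str          # bare annotation: declares y but does not bind it
    return x
"""


def annotated_definitions(source):
    """Return the names bound by annotated assignments that carry a value."""
    names = []
    for node in ast.walk(ast.parse(source)):
        # ast.AnnAssign covers both `x: int = 5` and `y: str`;
        # only the former has a non-None `value`.
        if isinstance(node, ast.AnnAssign) and node.value is not None:
            if isinstance(node.target, ast.Name):
                names.append(node.target.id)
    return names


print(annotated_definitions(SOURCE))
```

A query that ignores `AnnAssign` nodes would see `return x` with no
preceding definition of `x`, which is the kind of situation that can
produce an uninitialized-variable alert.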
Taus
f91e826781 Python: Add test case 2021-07-20 11:57:12 +00:00
Porcuiney Hairs
c6c925d67a Python: Improve XPath Injection Query 2021-07-20 03:31:30 +05:30
Sam Havron
733e5b45bf Fix qhelp typo in RequestWithoutValidation 2021-07-19 16:01:06 -04:00
Rasmus Wriedt Larsen
5249591747 Python: Fix test folder for InsecureProtocol 2021-07-19 16:57:00 +02:00
Rasmus Wriedt Larsen
5939128a76 Python: Fix test folder for InsecureDefaultProtocol
It was named incorrectly before.
2021-07-19 16:56:07 +02:00
Rasmus Wriedt Larsen
77021ae119 Python: Restructure security tests to contain query name
We were mixing naming conventions, so this is just to keep things
consistent. Even though it's not strictly needed for all queries,
it does look nicer, I think.
2021-07-19 16:54:34 +02:00
Rasmus Wriedt Larsen
da021feb8b Python: Move py/incomplete-hostname-regexp tests to own folder 2021-07-19 16:48:21 +02:00
Rasmus Wriedt Larsen
7939a1372e Python: Move Jinja2WithoutEscaping tests to own folder 2021-07-19 16:44:41 +02:00
thank_you
9e01338500 Query only vulnerable methods 2021-07-18 17:13:10 -04:00
Taus
4f3f93f267 Python: Autoformat 2021-07-16 12:22:24 +00:00
Taus
3fd0ec74f0 Python: Deprecate importNode
Unsurprisingly, the only thing affected by this was the `import-helper`
tests. These have lost all of the results relating to `ImportMember`s,
but apart from that the underlying behaviour should be the same.

I also limited the test to only `CfgNode`s, as a bunch of `EssaNode`s
suddenly appeared when I switched to API graphs.

Finally, I used `API::moduleImport` with a dotted name in the type
tracking tests. This goes against the API graphs interface, but I think
it's more correct for this use case, as these type trackers are doing
the "module attribute lookup" bit manually.
2021-07-16 11:38:30 +00:00
Rasmus Wriedt Larsen
a07de3faae Merge branch 'main' into emptyRedos 2021-07-15 18:21:29 +02:00
jorgectf
6f09b95019 Update .expected 2021-07-15 17:16:29 +02:00
CodeQL CI
d282f6a356 Merge pull request #6218 from tausbn/python-add-typetrackingnode
Approved by RasmusWL
2021-07-15 07:04:50 -07:00
Taus
dd03d8102b Merge pull request #6300 from RasmusWL/redos-tests
Python: Fix `py/polynomial-redos`
2021-07-15 15:59:01 +02:00
Rasmus Wriedt Larsen
900cbc9a2f Merge pull request #6265 from tausbn/python-performance-fixes
Python: Fix a few performance issues.
2021-07-15 14:19:37 +02:00
Rasmus Wriedt Larsen
a5834c4d78 Python: Fix py/polynomial-redos 2021-07-15 14:16:19 +02:00
Rasmus Wriedt Larsen
76caf43b54 Python: Add tests for py/polynomial-redos 2021-07-15 14:15:44 +02:00
Rasmus Wriedt Larsen
1be0dc0876 Python: Move test for ReDoS 2021-07-15 14:15:24 +02:00
Anders Schack-Mulligen
8ccdd4fb9f Merge pull request #6211 from aschackmull/dataflow/refactor-call-context-check
Dataflow: Refactor call context check
2021-07-15 12:27:23 +02:00
Erik Krogh Kristensen
383b5f2ff2 implement RegExpSubPattern.getOperand in the Python regexp implementation 2021-07-15 09:41:53 +02:00
Erik Krogh Kristensen
de8f64c5be sync with python 2021-07-14 23:40:06 +02:00
Taus
fb57c5f6f0 Merge pull request #6143 from RasmusWL/concepts-private-import-python
Python: Make `import python` private in Concepts.qll
2021-07-14 17:49:06 +02:00
Taus
5c5ee85332 Merge pull request #6122 from RasmusWL/mention-mysqlclient
Python: Mention modeling of `mysqlclient` PyPI package
2021-07-14 17:48:40 +02:00
Taus
30d61045d2 Python: Mention nameIndicatesSensitiveData
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-14 17:33:39 +02:00
Taus
5a9fca48e8 Python: Fix ExceptStmt::getType
We were not supporting `except` statements handling multiple exception
types (specified as a tuple) correctly, instead just returning the
tuple itself as the "type" (which makes little sense).

To fix this, we explicitly extract the elements of this node, in the
case where it _is_ a tuple.

This is a change that can potentially affect many queries (as `getType`
is used in quite a few places), so some care should be taken to
ensure that this does not adversely affect performance.
2021-07-14 14:03:49 +00:00
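The tuple handling described in the `ExceptStmt::getType` commit can be
illustrated in plain Python with the standard `ast` module. This is only
a sketch of the idea (return each element of the tuple rather than the
tuple itself), not the QL code from the commit; the helper names
`handled_types` and `handled_type_names` are hypothetical.

```python
import ast

SOURCE = """
try:
    risky()
except (KeyError, IndexError):
    pass
except ValueError:
    pass
"""


def handled_types(handler):
    """Return the type expressions handled by a single `except` clause."""
    if handler.type is None:
        # Bare `except:` names no type at all.
        return []
    if isinstance(handler.type, ast.Tuple):
        # `except (A, B):` handles A and B, not "the tuple (A, B)".
        return list(handler.type.elts)
    return [handler.type]


def handled_type_names(source):
    """Return one list of type names per `except` clause, in source order."""
    result = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            result.append(
                [t.id if isinstance(t, ast.Name) else ast.dump(t)
                 for t in handled_types(node)]
            )
    return result


print(handled_type_names(SOURCE))
```

Without the `ast.Tuple` branch, the first handler would report a single
"type" that is the whole tuple expression, which is the behaviour the
commit message describes as making little sense.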
Taus
ec9063b4a5 Python: Add test case for github/codeql#6227 2021-07-14 13:52:32 +00:00
Taus
2bb44d49d9 Python: Perform more deduplication
This cut the evaluation time on `django` down from 1.2 seconds to ~0.8
seconds (but the impact will likely be greater on bigger projects).
2021-07-14 13:38:05 +00:00
Taus
09993406f1 Python: Add explanatory QLDoc comment 2021-07-14 10:42:07 +00:00
Anders Schack-Mulligen
0ccb213ec5 Dataflow: Sync. 2021-07-14 10:36:09 +02:00
CodeQL CI
f6f7020388 Merge pull request #6250 from erik-krogh/python-redos-unicode
Approved by RasmusWL
2021-07-14 01:09:26 -07:00
Taus
c3789811c8 Python: Support import * in API graphs 2021-07-13 18:22:51 +00:00
Taus
8b6b4dde69 Python: Refactor built-ins logic
This will make it possible to reuse for names defined in `import *`.
2021-07-13 18:20:25 +00:00
${sleep,5}
51a6140258 Change variable name to correct sanitized input variable
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-13 14:04:06 -04:00
Taus
df8a6b984a Python: Add import * tests
Moves the current test out of `test.py`, as otherwise any unknown global
(like, say, `sink`) would _also_ be considered to be something
potentially defined in `unknown`.
2021-07-13 17:46:59 +00:00
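To see why the `import *` test needed its own file, consider the
following Python sketch. It is an approximation built on the standard
`ast` module, not the API graphs logic: it treats every name that is
used but neither defined in the module nor a built-in as something that
may come from the star-imported module. The function name
`possibly_star_imported` is hypothetical.

```python
import ast
import builtins

SOURCE = """
from unknown import *

def local_helper():
    pass

local_helper()
print(foo)
sink(bar)
"""


def possibly_star_imported(source):
    """Map each star-imported module to the names it might define."""
    tree = ast.parse(source)
    star_modules = [
        node.module
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        and any(alias.name == "*" for alias in node.names)
    ]
    if not star_modules:
        return {}

    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)

    # Anything used but not otherwise defined could come from `import *`.
    unresolved = sorted(used - defined)
    return {module: unresolved for module in star_modules}


print(possibly_star_imported(SOURCE))
```

Here `sink` ends up attributed to `unknown` alongside `foo` and `bar`,
which mirrors the problem the commit message describes for helper names
in a shared `test.py`.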
Taus
6aec7f2c49 Merge pull request #6264 from RasmusWL/customization-files-for-path-problems
Python: Provide proper source/sink customization for most path queries
2021-07-13 15:09:33 +02:00
Rasmus Wriedt Larsen
9ed61e7663 Python: Port py/polynomial-redos to use proper source/sink customization
I noticed the configuration/customization files are in the `performance`
folder in JS, but I just kept them in place, since that seems correct to
me.
2021-07-13 14:39:44 +02:00
Rasmus Wriedt Larsen
cea2f82be9 Python: Port py/path-injection to use proper source/sink customization 2021-07-13 14:09:02 +02:00
Rasmus Wriedt Larsen
bf214ac3bb Python: Apply suggestions from code review
Co-authored-by: Taus <tausbn@github.com>
2021-07-13 13:41:26 +02:00
Rasmus Wriedt Larsen
1a59c9b64a Merge pull request #6204 from tausbn/python-ensmallen-localsourcenode
Python: Clean up `LocalSourceNode` charpred
2021-07-13 13:27:38 +02:00
Taus
1decf23785 Python: Fix bad join order for sensitive data
Not the prettiest of solutions, but it does the job. Basically, we were
calculating (and re-calculating) the same big relation between strings
and regexes and then checking whether the latter matched the former.

This resulted in tuple counts like the following:

```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0%     {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654    ~0%     {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
                return r2
```
(The above being just the bit that handles `DefinitionNode` in
`SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)

By applying a bit of manual inlining and magic, this becomes somewhat
more manageable:

```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671  ~2%      {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

334012 ~2%      {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

361683 ~11%     {3} r3 = r1 UNION r2

154644 ~0%      {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

149198 ~1%      {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

124257 ~5%      {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

273455 ~21%     {3} r7 = r5 UNION r6
428099 ~30%     {3} r8 = r4 UNION r7
789782 ~78%     {3} r9 = r3 UNION r8
1121   ~77%     {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121   ~70%     {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
                return r11
```
(The above being the total for all the sensitive names we care about,
taking only 1.2 seconds to evaluate.)

Incidentally, you may wonder why this has _fewer_ results than before.
The answer is control flow splitting -- every sensitively-named
`DefinitionNode` would have been matched in isolation previously. By
pre-matching on just the names of these, we can subsequently join
against those names that are known to be sensitive, which is a much
faster operation.

(We also get the benefit of deduplicating the strings that are matched,
before actually performing the match, so if, say, an attribute name and
a variable name are identical, then we'll only match them once.)

We also exclude all docstrings as relevant string constants, as these
presumably don't actually flow anywhere.
2021-07-12 16:10:49 +00:00
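The pre-matching idea from the sensitive data commit can be sketched in
Python. This is an illustration only: the patterns, node IDs, and
function names below are made up for the example and are not the
heuristics used by the library. The point is that matching once per
distinct name, then joining back to the nodes, gives the same result
with fewer regex evaluations.

```python
import re

# Hypothetical classification patterns, for illustration only.
SENSITIVE_PATTERNS = {
    "password": re.compile(r"(?i).*(passw|pwd).*"),
    "secret": re.compile(r"(?i).*(secret|token).*"),
}

# (node id, name) pairs; the same name shows up on several nodes.
NODES = [
    (1, "password"), (2, "password"), (3, "user_name"),
    (4, "api_token"), (5, "password"), (6, "api_token"),
]


def classify_naive(nodes):
    """Match every (node, pattern) pair; return results and match count."""
    matches = 0
    result = set()
    for node_id, name in nodes:
        for classification, pattern in SENSITIVE_PATTERNS.items():
            matches += 1
            if pattern.fullmatch(name):
                result.add((node_id, classification))
    return result, matches


def classify_deduplicated(nodes):
    """Match each distinct name once, then join the nodes against it."""
    matches = 0
    sensitive = {}  # name -> set of classifications
    for name in {name for _, name in nodes}:
        for classification, pattern in SENSITIVE_PATTERNS.items():
            matches += 1
            if pattern.fullmatch(name):
                sensitive.setdefault(name, set()).add(classification)
    # The join is a plain lookup, which is much cheaper than a regex match.
    result = {
        (node_id, classification)
        for node_id, name in nodes
        for classification in sensitive.get(name, ())
    }
    return result, matches


naive_result, naive_matches = classify_naive(NODES)
dedup_result, dedup_matches = classify_deduplicated(NODES)
print(naive_matches, dedup_matches, naive_result == dedup_result)
```

With six nodes but only three distinct names, the naive version runs
twelve regex matches and the deduplicated version runs six.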
Taus
a73e382dfe Python: Prevent bad join in hashlib model
I'm not entirely sure what triggered this bad join order, but some
combination of the use of abstract classes and the exclusion of `new`
caused this to go really wrong:

```
WeakSensitiveDataHashing.ql-15:Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff ......... 15.5s
```

with the following tuple counts:
```
[2021-07-12 13:20:15] (16s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@217901:
148810  ~3%        {3} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
148810  ~4%        {3} r2 = JOIN r1 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
7589310 ~486%      {4} r3 = JOIN r2 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
6994070 ~490%      {4} r4 = SELECT r3 ON In.3 != "new"
6994070 ~4503%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'node'

22      ~4%        {3} r6 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
22      ~0%        {3} r7 = JOIN r6 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
1122    ~437%      {4} r8 = JOIN r7 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
1034    ~460%      {4} r9 = SELECT r8 ON In.3 != "new"
1034    ~4549%     {2} r10 = SCAN r9 OUTPUT In.1 'this', In.0 'node'

6995104 ~4503%     {2} r11 = r5 UNION r10
5213851 ~4683%     {3} r12 = JOIN r11 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
6478480 ~4646%     {6} r13 = JOIN r12 WITH ApiGraphs::API::Impl::edge#2#fff_201#join_rhs ON FIRST 1 OUTPUT "hashlib", Rhs.1, Lhs.1 'node', Lhs.2 'this', Lhs.0 'hashClass', Rhs.2
1410    ~4693%     {5} r14 = JOIN r13 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 2 OUTPUT Lhs.2 'node', Lhs.3 'this', Lhs.4 'hashClass', Lhs.5, InverseAppend("getMember(\"","\")",Lhs.5)
1222    ~4540%     {5} r15 = SELECT r14 ON In.4 'hashName' != "new"
1222    ~4540%     {4} r16 = SCAN r15 OUTPUT In.1 'this', In.4 'hashName', In.2 'hashClass', In.0 'node'
```

By factoring out the insides, the biggest iteration now looks like

```
[2021-07-12 14:17:36] (0s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@85bb21:
148810 ~0%     {2} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
148810 ~0%     {2} r2 = JOIN r1 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

22     ~0%     {2} r3 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
22     ~0%     {2} r4 = JOIN r3 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

148832 ~0%     {2} r5 = r2 UNION r4
110933 ~2%     {3} r6 = JOIN r5 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
26     ~0%     {4} r7 = JOIN r6 WITH Stdlib::Stdlib::hashlibMember#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.2 'this', Rhs.1 'hashName', Lhs.0 'hashClass', Lhs.1 'node'
               return r7
```

(The tuple counts themselves are not directly comparable.)
2021-07-12 14:22:21 +00:00
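As a rough Python analogy for "factoring out the insides" in the hashlib
commit: the sketch below compares pairing every call with every module
member before filtering against computing the small set of relevant
members once and looking calls up in it. All data and names are
hypothetical, and the row counts are only meant to show the shape of the
problem, not to reproduce the tuple counts above.

```python
from itertools import product

# Hypothetical (member name, target) edges out of the `hashlib` module.
HASHLIB_EDGES = [
    ("md5", "cls_md5"), ("sha1", "cls_sha1"),
    ("sha256", "cls_sha256"), ("new", "fn_new"),
]

# Hypothetical (call id, callee, argument) triples.
CALLS = [
    (1, "cls_md5", "data_a"), (2, "cls_sha256", "data_b"),
    (3, "other", "data_c"), (4, "fn_new", "data_d"),
]


def naive(calls, edges):
    """Pair every call with every edge, then filter."""
    rows = 0
    result = []
    for (call, callee, arg), (name, target) in product(calls, edges):
        rows += 1  # intermediate row produced before any filtering
        if name != "new" and callee == target:
            result.append((call, name, arg))
    return result, rows


def factored(calls, edges):
    """Compute the relevant members once, then look each call up."""
    members = {target: name for name, target in edges if name != "new"}
    rows = 0
    result = []
    for call, callee, arg in calls:
        rows += 1
        if callee in members:
            result.append((call, members[callee], arg))
    return result, rows


naive_result, naive_rows = naive(CALLS, HASHLIB_EDGES)
factored_result, factored_rows = factored(CALLS, HASHLIB_EDGES)
print(naive_rows, factored_rows, naive_result == factored_result)
```

Both versions return the same calls; the factored one simply never
materialises the large intermediate product.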
Rasmus Wriedt Larsen
47f5c977cf Python: Port py/stack-trace-exposure to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
934007c811 Python: Port py/unsafe-deserialization to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
7c71223f7f Python: Port py/url-redirection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
b4c0b1b525 Python: Port py/reflective-xss to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
62e4445f45 Python: Port py/command-line-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
7f53781ba7 Python: Port py/code-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
0be280c608 Python: Port py/sql-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00