Commit Graph

4561 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
133632119d Python: Model werkzeug Headers
Also removed a misleading comment link to method on wrong class :D
2021-07-22 10:43:18 +02:00
Rasmus Wriedt Larsen
4d9c86a252 Python: Model Werkzeug FileStorage.save as FileSystemAccess 2021-07-22 10:43:18 +02:00
Rasmus Wriedt Larsen
9cb4899c5c Python: Add FileStorage modeling in Flask 2021-07-22 10:43:18 +02:00
Rasmus Wriedt Larsen
09b0c300d9 Python: Rewrite werkzeug to avoid InstanceSourceApiNode
InstanceSourceApiNode is a really good idea, but it just happened too
soon. I can't do what I need if I have to supply an API-node. So to
avoid confusion between deprecating to/from InstanceSource in those
classes, I opted to do some major reorganizing as well 👍

Due to aliasing restrictions, I had to use a little trick with the
`WerkzeugOld` module.
2021-07-22 10:43:18 +02:00
Rasmus Wriedt Larsen
04190ea308 Python: Add file-like modeling to werkzeug FileStorage 2021-07-22 10:43:18 +02:00
Rasmus Wriedt Larsen
5f5c0b11c7 Python: Refactor Werkzeug modeling
Move the additional taint step next to the other definitions, so
everything is together.
2021-07-22 10:43:18 +02:00
Rasmus Wriedt Larsen
4f4dec50f2 Python: Model ResolverMatch in Django
Like before, omitted ClassInstantiation
2021-07-22 10:43:13 +02:00
jorgectf
edb273ace5 Merge remote-tracking branch 'origin/jorgectf/python/ldapimproperauth' into jorgectf/python/ldapinsecureauth 2021-07-22 02:51:19 +02:00
jorgectf
8d84d63b94 Add Python-Jose modeling and tests 2021-07-21 21:31:53 +02:00
jorgectf
ce507beed4 Add Authlib modeling and tests 2021-07-21 21:31:35 +02:00
jorgectf
f1b3c70909 Divide JWT libraries 2021-07-21 21:29:23 +02:00
Rasmus Wriedt Larsen
6f0a622252 Python: Remove ClassInstantiation from Django UploadedFile
Since UploadedFile is the abstract base class, all real usage would be
of one of the subclasses, so this is removed to avoid giving false hope
that it actually works.

I don't think investing the time into making this work would give any
value, so that's why I didn't do it ;)
2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
7dc6518350 Python: Add FileLikeObject modeling
Such that the result of `request.FILES["key"].file.read()` is tainted
2021-07-21 16:35:09 +02:00
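A minimal sketch of the flow that 7dc6518350 refers to. The stand-in request object is an assumption made so the snippet runs without Django; `read_upload` and `fake_request` are hypothetical names, not part of the commit.

```python
import io
from types import SimpleNamespace


def read_upload(request):
    # `request.FILES["key"]` is the uploaded file object; its `.file`
    # attribute is file-like, and the bytes returned by `.read()` are the
    # user-controlled data that the commit message says should be tainted.
    uploaded = request.FILES["key"]
    return uploaded.file.read()


# Stand-in for a Django request (assumption, for illustration only).
fake_request = SimpleNamespace(
    FILES={"key": SimpleNamespace(file=io.BytesIO(b"attacker-controlled"))}
)
```

In a real view, the value returned by `read_upload` is what a taint-tracking query would need to treat as remote user input.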
Rasmus Wriedt Larsen
18c0d13efd Python: Model most of UploadedFile in Django 2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
5ec5557203 Python: Model MultiValueDict in Django 2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
95e88c18b9 Python: Minor cleanup 2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
51b543c67c Python: Model taint for django request methods 2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
bced467a88 Python: Refactor django additional step handling
So it matches the new style we're using in aiohttp/twisted/...
2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
ce4b192caa Python: Improve usefulness of RemoteFlowSourcesReach meta query
Before, results from `dca` would look something like

    ## + py/meta/alerts/remote-flow-sources-reach

    - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:16:38:48
        reachable with taint-tracking from RemoteFlowSource
    - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:9:38:12
        reachable with taint-tracking from RemoteFlowSource

Now it should be easier to spot _what_ actually changed,
since we pretty-print the node.
2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
6aabbf0b9a Python: Add some alert meta queries
Intended for use with dca
2021-07-21 14:53:01 +02:00
Taus
233ae5a54b Python: Fix FP in py/unused-local-variable
This is only a temporary fix, as indicated by the TODO comment.

The real underlying issue is the fact that `isUnused` is defined in
terms of the underlying SSA variables (as these are only created
for variables that are actually used), and the fact that annotated
assignments are always considered to redefine their targets, which may
not actually be the case.

Thus, the correct fix would be to change the extractor to _disregard_
mere type annotations for the purposes of figuring out whether an
SSA variable should be created or not.

However, in the short term the present fix is likely sufficient.
2021-07-20 12:13:44 +00:00
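The point in 233ae5a54b that a mere type annotation does not necessarily redefine its target can be seen in plain Python. This is only an illustrative sketch, not the test case from the commit; all function names are hypothetical.

```python
import ast


def annotation_only():
    x: int  # bare annotation: makes `x` local but binds no value
    try:
        return x
    except UnboundLocalError:
        return "unbound"


def assign_then_annotate():
    value = 1
    value: int  # annotation without a value: `value` is still 1
    return value


def ann_assign_has_value(source):
    """For each annotated assignment in `source`, report whether it has a value."""
    return [
        node.value is not None
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.AnnAssign)
    ]
```

Both `x: int` and `y: int = 1` parse to `ast.AnnAssign`, but only the second one carries a value, which is why treating every annotated assignment as a redefinition can mislead an analysis.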
Taus
8b3fa789da Python: Add AnnAssign DefinitionNode
This was a source of false positives for the
`py/uninitialized-local-variable` query, as exemplified by the test
case.
2021-07-20 11:57:26 +00:00
Porcuiney Hairs
c6c925d67a Python: Improve XPath Injection Query 2021-07-20 03:31:30 +05:30
Sam Havron
733e5b45bf Fix qhelp typo in RequestWithoutValidation 2021-07-19 16:01:06 -04:00
thank_you
9e01338500 Query only vulnerable methods 2021-07-18 17:13:10 -04:00
Rasmus Wriedt Larsen
a07de3faae Merge branch 'main' into emptyRedos 2021-07-15 18:21:29 +02:00
CodeQL CI
d282f6a356 Merge pull request #6218 from tausbn/python-add-typetrackingnode
Approved by RasmusWL
2021-07-15 07:04:50 -07:00
Taus
dd03d8102b Merge pull request #6300 from RasmusWL/redos-tests
Python: Fix `py/polynomial-redos`
2021-07-15 15:59:01 +02:00
Rasmus Wriedt Larsen
900cbc9a2f Merge pull request #6265 from tausbn/python-performance-fixes
Python: Fix a few performance issues.
2021-07-15 14:19:37 +02:00
Rasmus Wriedt Larsen
a5834c4d78 Python: Fix py/polynomial-redos 2021-07-15 14:16:19 +02:00
Anders Schack-Mulligen
8ccdd4fb9f Merge pull request #6211 from aschackmull/dataflow/refactor-call-context-check
Dataflow: Refactor call context check
2021-07-15 12:27:23 +02:00
Erik Krogh Kristensen
383b5f2ff2 implement RegExpSubPattern.getOperand in the Python regexp implementation 2021-07-15 09:41:53 +02:00
Erik Krogh Kristensen
de8f64c5be sync with python 2021-07-14 23:40:06 +02:00
Taus
fb57c5f6f0 Merge pull request #6143 from RasmusWL/concepts-private-import-python
Python: Make `import python` private in Concepts.qll
2021-07-14 17:49:06 +02:00
Taus
5c5ee85332 Merge pull request #6122 from RasmusWL/mention-mysqlclient
Python: Mention modeling of `mysqlclient` PyPI package
2021-07-14 17:48:40 +02:00
Taus
30d61045d2 Python: Mention nameIndicatesSensitiveData
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-14 17:33:39 +02:00
Taus
2bb44d49d9 Python: Perform more deduplication
This cut the evaluation time on `django` down from 1.2 seconds to ~0.8
seconds (but the impact will likely be greater on bigger projects).
2021-07-14 13:38:05 +00:00
Taus
09993406f1 Python: Add explanatory QLDoc comment 2021-07-14 10:42:07 +00:00
Anders Schack-Mulligen
0ccb213ec5 Dataflow: Sync. 2021-07-14 10:36:09 +02:00
CodeQL CI
f6f7020388 Merge pull request #6250 from erik-krogh/python-redos-unicode
Approved by RasmusWL
2021-07-14 01:09:26 -07:00
Taus
6aec7f2c49 Merge pull request #6264 from RasmusWL/customization-files-for-path-problems
Python: Provide proper source/sink customization for most path queries
2021-07-13 15:09:33 +02:00
Rasmus Wriedt Larsen
9ed61e7663 Python: Port py/polynomial-redos to use proper source/sink customization
I noticed the configuration/customization files are in the `performance`
folder in JS, but I just kept them in place, since that seems correct to
me.
2021-07-13 14:39:44 +02:00
Rasmus Wriedt Larsen
cea2f82be9 Python: Port py/path-injection to use proper source/sink customization 2021-07-13 14:09:02 +02:00
Rasmus Wriedt Larsen
bf214ac3bb Python: Apply suggestions from code review
Co-authored-by: Taus <tausbn@github.com>
2021-07-13 13:41:26 +02:00
Rasmus Wriedt Larsen
1a59c9b64a Merge pull request #6204 from tausbn/python-ensmallen-localsourcenode
Python: Clean up `LocalSourceNode` charpred
2021-07-13 13:27:38 +02:00
Taus
1decf23785 Python: Fix bad join order for sensitive data
Not the prettiest of solutions, but it does the job. Basically, we were
calculating (and re-calculating) the same big relation between strings
and regexes and then checking whether the latter matched the former.

This resulted in tuple counts like the following:

```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0%     {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654    ~0%     {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
                return r2
```
(The above being just the bit that handles `DefinitionNode` in
`SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)

By applying a bit of manual inlining and magic, this becomes somewhat
more manageable:

```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671  ~2%      {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

334012 ~2%      {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

361683 ~11%     {3} r3 = r1 UNION r2

154644 ~0%      {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

149198 ~1%      {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

124257 ~5%      {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

273455 ~21%     {3} r7 = r5 UNION r6
428099 ~30%     {3} r8 = r4 UNION r7
789782 ~78%     {3} r9 = r3 UNION r8
1121   ~77%     {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121   ~70%     {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
                return r11
```
(The above being the total for all the sensitive names we care about,
taking only 1.2 seconds to evaluate.)

Incidentally, you may wonder why this has _fewer_ results than before.
The answer is control flow splitting -- every sensitively-named
`DefinitionNode` would have been matched in isolation previously. By
pre-matching on just the names of these, we can subsequently join
against those names that are known to be sensitive, which is a much
faster operation.

(We also get the benefit of deduplicating the strings that are matched,
before actually performing the match, so if, say, an attribute name and
a variable name are identical, then we'll only match them once.)

We also exclude all docstrings as relevant string constants, as these
presumably don't actually flow anywhere.
2021-07-12 16:10:49 +00:00
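The idea behind 1decf23785 (match the distinct names against the regexes once, then join nodes against the names known to be sensitive) can be sketched in Python. This is an analogy rather than the QL code from the commit; the function names, the sample data and the use of `re.search` are assumptions.

```python
import re


def match_per_node(nodes, patterns):
    """Naive approach: run every regex against every (node, name) pair."""
    calls, hits = 0, []
    for node_id, name in nodes:
        for classification, pattern in patterns:
            calls += 1
            if re.search(pattern, name):
                hits.append((node_id, classification))
    return hits, calls


def match_per_name(nodes, patterns):
    """Pre-match on distinct names, then join nodes against sensitive names."""
    calls, sensitive = 0, {}
    for name in {name for _, name in nodes}:  # deduplicate before matching
        for classification, pattern in patterns:
            calls += 1
            if re.search(pattern, name):
                sensitive.setdefault(name, []).append(classification)
    hits = [
        (node_id, classification)
        for node_id, name in nodes
        for classification in sensitive.get(name, [])
    ]
    return hits, calls


# Hypothetical data: several nodes share the same name.
nodes = [(1, "password"), (2, "password"), (3, "count"), (4, "password")]
patterns = [("password", r"pass(wd|word)"), ("id", r"user(name|id)")]
```

Both functions produce the same hits, but the second performs regex matches only once per distinct name, so the saving grows with the number of nodes that share a name.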
Taus
a73e382dfe Python: Prevent bad join in hashlib model
I'm not entirely sure what triggered this bad join order, but some
combination of the use of abstract classes and the exclusion of `new`
caused this to go really wrong:

```
WeakSensitiveDataHashing.ql-15:Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff ......... 15.5s
```

with the following tuple counts:
```
[2021-07-12 13:20:15] (16s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@217901:
148810  ~3%        {3} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
148810  ~4%        {3} r2 = JOIN r1 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
7589310 ~486%      {4} r3 = JOIN r2 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
6994070 ~490%      {4} r4 = SELECT r3 ON In.3 != "new"
6994070 ~4503%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'node'

22      ~4%        {3} r6 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
22      ~0%        {3} r7 = JOIN r6 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
1122    ~437%      {4} r8 = JOIN r7 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
1034    ~460%      {4} r9 = SELECT r8 ON In.3 != "new"
1034    ~4549%     {2} r10 = SCAN r9 OUTPUT In.1 'this', In.0 'node'

6995104 ~4503%     {2} r11 = r5 UNION r10
5213851 ~4683%     {3} r12 = JOIN r11 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
6478480 ~4646%     {6} r13 = JOIN r12 WITH ApiGraphs::API::Impl::edge#2#fff_201#join_rhs ON FIRST 1 OUTPUT "hashlib", Rhs.1, Lhs.1 'node', Lhs.2 'this', Lhs.0 'hashClass', Rhs.2
1410    ~4693%     {5} r14 = JOIN r13 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 2 OUTPUT Lhs.2 'node', Lhs.3 'this', Lhs.4 'hashClass', Lhs.5, InverseAppend("getMember(\"","\")",Lhs.5)
1222    ~4540%     {5} r15 = SELECT r14 ON In.4 'hashName' != "new"
1222    ~4540%     {4} r16 = SCAN r15 OUTPUT In.1 'this', In.4 'hashName', In.2 'hashClass', In.0 'node'
```

By factoring out the insides, the biggest iteration now looks like

```
[2021-07-12 14:17:36] (0s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@85bb21:
148810 ~0%     {2} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
148810 ~0%     {2} r2 = JOIN r1 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

22     ~0%     {2} r3 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
22     ~0%     {2} r4 = JOIN r3 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

148832 ~0%     {2} r5 = r2 UNION r4
110933 ~2%     {3} r6 = JOIN r5 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
26     ~0%     {4} r7 = JOIN r6 WITH Stdlib::Stdlib::hashlibMember#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.2 'this', Rhs.1 'hashName', Lhs.0 'hashClass', Lhs.1 'node'
               return r7
```

(The tuple counts themselves are not directly comparable.)
2021-07-12 14:22:21 +00:00
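For context on a73e382dfe: judging only by the predicate name and the `!= "new"` filter in the tuple counts, the modeled case appears to be data passed directly to a hash class such as `hashlib.sha256`, as opposed to the generic `hashlib.new` constructor. That reading is an inference, not something the commit states. The two call shapes look like this in Python:

```python
import hashlib

data = b"secret"

# Data passed directly to a named hash class.
direct = hashlib.sha256(data).hexdigest()

# The same digest through the generic constructor, where the algorithm
# name is a string argument rather than a member of the module.
via_new = hashlib.new("sha256", data).hexdigest()
```

Both calls compute the same digest; they differ only in how the algorithm is selected, which is why a model may need to treat them separately.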
Rasmus Wriedt Larsen
47f5c977cf Python: Port py/stack-trace-exposure to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
934007c811 Python: Port py/unsafe-deserialization to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
7c71223f7f Python: Port py/url-redirection to use proper source/sink customization 2021-07-12 16:22:10 +02:00