Commit Graph

3072 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
51b543c67c Python: Model taint for django request methods 2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
bced467a88 Python: Refactor django additional step handling
So it matches the new style we're using in aiohttp/twisted/...
2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
ce4b192caa Python: Improve usefulness of RemoteFlowSourcesReach meta query
Before, results from `dca` would look something like

    ## + py/meta/alerts/remote-flow-sources-reach

    - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:16:38:48
        reachable with taint-tracking from RemoteFlowSource
    - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:9:38:12
        reachable with taint-tracking from RemoteFlowSource

now it should make it easier to spot _what_ it is that actually changed,
since we pretty-print the node.
2021-07-21 16:35:09 +02:00
Rasmus Wriedt Larsen
6aabbf0b9a Python: Add some alert meta queries
Intended for use with dca
2021-07-21 14:53:01 +02:00
Taus
233ae5a54b Python: Fix FP in py/unused-local-variable
This is only a temporary fix, as indicated by the TODO comment.

The real underlying issue is the fact that `isUnused` is defined in
terms of the underlying SSA variables (as these are only created
for variables that are actually used), and the fact that annotated
assignments are always considered to redefine their targets, which may
not actually be the case.

Thus, the correct fix would be to change the extractor to _disregard_
mere type annotations for the purposes of figuring out whether an
SSA variable should be created or not.

However, in the short term the present fix is likely sufficient.
2021-07-20 12:13:44 +00:00
Taus
8b3fa789da Python: Add AnnAssign DefinitionNode
This was a source of false positives for the
`py/uninitialized-local-variable` query, as exemplified by the test
case.
2021-07-20 11:57:26 +00:00
Porcuiney Hairs
c6c925d67a Python : Improve Xpath Injection Query 2021-07-20 03:31:30 +05:30
Sam Havron
733e5b45bf Fix qhelp typo in RequestWithoutValidation 2021-07-19 16:01:06 -04:00
thank_you
9e01338500 Query only vulnerable methods 2021-07-18 17:13:10 -04:00
Rasmus Wriedt Larsen
a07de3faae Merge branch 'main' into emptyRedos 2021-07-15 18:21:29 +02:00
CodeQL CI
d282f6a356 Merge pull request #6218 from tausbn/python-add-typetrackingnode
Approved by RasmusWL
2021-07-15 07:04:50 -07:00
Taus
dd03d8102b Merge pull request #6300 from RasmusWL/redos-tests
Python: Fix `py/polynomial-redos`
2021-07-15 15:59:01 +02:00
Rasmus Wriedt Larsen
900cbc9a2f Merge pull request #6265 from tausbn/python-performance-fixes
Python: Fix a few performance issues.
2021-07-15 14:19:37 +02:00
Rasmus Wriedt Larsen
a5834c4d78 Python: Fix py/polynomial-redos 2021-07-15 14:16:19 +02:00
Anders Schack-Mulligen
8ccdd4fb9f Merge pull request #6211 from aschackmull/dataflow/refactor-call-context-check
Dataflow: Refactor call context check
2021-07-15 12:27:23 +02:00
Erik Krogh Kristensen
383b5f2ff2 implement RegExpSubPattern.getOperand in the Python regexp implementation 2021-07-15 09:41:53 +02:00
Erik Krogh Kristensen
de8f64c5be sync with python 2021-07-14 23:40:06 +02:00
Taus
fb57c5f6f0 Merge pull request #6143 from RasmusWL/concepts-private-import-python
Python: Make `import python` private in Concepts.qll
2021-07-14 17:49:06 +02:00
Taus
5c5ee85332 Merge pull request #6122 from RasmusWL/mention-mysqlclient
Python: Mention modeling of `mysqlclient` PyPI package
2021-07-14 17:48:40 +02:00
Taus
30d61045d2 Python: Mention nameIndicatesSensitiveData
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-14 17:33:39 +02:00
Taus
2bb44d49d9 Python: Perform more deduplication
This cut the evaluation time on `django` down from 1.2 seconds to ~0.8
seconds (but the impact will likely be greater on bigger projects).
2021-07-14 13:38:05 +00:00
Taus
09993406f1 Python: Add explanatory QLDoc comment 2021-07-14 10:42:07 +00:00
Anders Schack-Mulligen
0ccb213ec5 Dataflow: Sync. 2021-07-14 10:36:09 +02:00
CodeQL CI
f6f7020388 Merge pull request #6250 from erik-krogh/python-redos-unicode
Approved by RasmusWL
2021-07-14 01:09:26 -07:00
Taus
6aec7f2c49 Merge pull request #6264 from RasmusWL/customization-files-for-path-problems
Python: Provide proper source/sink customization for most path queries
2021-07-13 15:09:33 +02:00
Rasmus Wriedt Larsen
9ed61e7663 Python: Port py/polynomial-redos to use proper source/sink customization
I noticed the configuration/customization files are in the `performance`
folder in JS, but I just kept them in place, since that seems correct to
me.
2021-07-13 14:39:44 +02:00
Rasmus Wriedt Larsen
cea2f82be9 Python: Port py/path-injection to use proper source/sink customization 2021-07-13 14:09:02 +02:00
Rasmus Wriedt Larsen
bf214ac3bb Python: Apply suggestions from code review
Co-authored-by: Taus <tausbn@github.com>
2021-07-13 13:41:26 +02:00
Rasmus Wriedt Larsen
1a59c9b64a Merge pull request #6204 from tausbn/python-ensmallen-localsourcenode
Python: Clean up `LocalSourceNode` charpred
2021-07-13 13:27:38 +02:00
Taus
1decf23785 Python: Fix bad join order for sensitive data
Not the prettiest of solutions, but it does the job. Basically, we were
calculating (and re-calculating) the same big relation between strings
and regexes and then checking whether the latter matched the former.

This resulted in tuple counts like the following:

```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0%     {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654    ~0%     {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
                return r2
```
(The above being just the bit that handles `DefinitionNode` in
`SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)

By applying a bit of manual inlining and magic, this becomes somewhat
more manageable:

```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671  ~2%      {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

334012 ~2%      {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

361683 ~11%     {3} r3 = r1 UNION r2

154644 ~0%      {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

149198 ~1%      {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

124257 ~5%      {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

273455 ~21%     {3} r7 = r5 UNION r6
428099 ~30%     {3} r8 = r4 UNION r7
789782 ~78%     {3} r9 = r3 UNION r8
1121   ~77%     {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121   ~70%     {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
                return r11
```
(The above being the total for all the sensitive names we care about,
taking only 1.2 seconds to evaluate.)

Incidentally, you may wonder why this has _fewer_ results than before.
The answer is control flow splitting -- every sensitively-named
`DefinitionNode` would have been matched in isolation previously. By
pre-matching on just the names of these, we can subsequently join
against those names that are known to be sensitive, which is a much
faster operation.

(We also get the benefit of deduplicating the strings that are matched,
before actually performing the match, so if, say, an attribute name and
a variable name are identical, then we'll only match them once.)

We also exclude all docstrings as relevant string constants, as these
presumably don't actually flow anywhere.
2021-07-12 16:10:49 +00:00
Taus
a73e382dfe Python: Prevent bad join in hashlib model
I'm not entirely sure what triggered this bad join order, but some
combination of the use of abstract classes and the exclusion of `new`
caused this to go really wrong:

```
WeakSensitiveDataHashing.ql-15:Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff ......... 15.5s
```

with the following tuple counts:
```
[2021-07-12 13:20:15] (16s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@217901:
148810  ~3%        {3} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
148810  ~4%        {3} r2 = JOIN r1 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
7589310 ~486%      {4} r3 = JOIN r2 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
6994070 ~490%      {4} r4 = SELECT r3 ON In.3 != "new"
6994070 ~4503%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'node'

22      ~4%        {3} r6 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
22      ~0%        {3} r7 = JOIN r6 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
1122    ~437%      {4} r8 = JOIN r7 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
1034    ~460%      {4} r9 = SELECT r8 ON In.3 != "new"
1034    ~4549%     {2} r10 = SCAN r9 OUTPUT In.1 'this', In.0 'node'

6995104 ~4503%     {2} r11 = r5 UNION r10
5213851 ~4683%     {3} r12 = JOIN r11 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
6478480 ~4646%     {6} r13 = JOIN r12 WITH ApiGraphs::API::Impl::edge#2#fff_201#join_rhs ON FIRST 1 OUTPUT "hashlib", Rhs.1, Lhs.1 'node', Lhs.2 'this', Lhs.0 'hashClass', Rhs.2
1410    ~4693%     {5} r14 = JOIN r13 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 2 OUTPUT Lhs.2 'node', Lhs.3 'this', Lhs.4 'hashClass', Lhs.5, InverseAppend("getMember(\"","\")",Lhs.5)
1222    ~4540%     {5} r15 = SELECT r14 ON In.4 'hashName' != "new"
1222    ~4540%     {4} r16 = SCAN r15 OUTPUT In.1 'this', In.4 'hashName', In.2 'hashClass', In.0 'node'
```

By factoring out the insides, the biggest iteration now looks like

```
[2021-07-12 14:17:36] (0s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@85bb21:
148810 ~0%     {2} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
148810 ~0%     {2} r2 = JOIN r1 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

22     ~0%     {2} r3 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
22     ~0%     {2} r4 = JOIN r3 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

148832 ~0%     {2} r5 = r2 UNION r4
110933 ~2%     {3} r6 = JOIN r5 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
26     ~0%     {4} r7 = JOIN r6 WITH Stdlib::Stdlib::hashlibMember#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.2 'this', Rhs.1 'hashName', Lhs.0 'hashClass', Lhs.1 'node'
               return r7
```

(The tuple counts themselves are not directly comparable.)
2021-07-12 14:22:21 +00:00
Rasmus Wriedt Larsen
47f5c977cf Python: Port py/stack-trace-exposure to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
934007c811 Python: Port py/unsafe-deserialization to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
7c71223f7f Python: Port py/url-redirection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
b4c0b1b525 Python: Port py/reflective-xss to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
62e4445f45 Python: Port py/command-line-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
7f53781ba7 Python: Port py/code-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
0be280c608 Python: Port py/sql-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Erik Krogh Kristensen
c4f5009917 make explicit calls to member predicates
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-12 14:22:08 +02:00
Taus
1e79091120 Python: Fix typo 2021-07-12 11:33:52 +00:00
Taus
32062d83ad Python: Make deprecation warning more prominent 2021-07-12 10:00:21 +00:00
Erik Krogh Kristensen
440e4b9a92 enable unicode support in the Python ReDoS query 2021-07-11 21:28:40 +02:00
haby0
e8d0827916 Add tornado source 2021-07-05 10:42:15 +08:00
Taus
a65d40e36f Merge branch 'main' into python-add-typetrackingnode 2021-07-02 20:55:37 +02:00
Taus
55d822cc56 Python: Add TypeTrackingNode
Splits `ModuleVariableNode` away from `LocalSourceNode`, instead
creating a class `TypeTrackingNode` that encapsulates both of these.

This means we no longer have module variable nodes as part of
`LocalSourceNode` (which is good, since they have no "local" aspect to
them), and hence we can have `LocalSourceNode` inherit directly from
`ExprNode` (which makes the API a bit nicer).

Unfortunately these are breaking changes, so we can't actually fulfil
the above two desiderata until the `track` and `backtrack` methods on
`LocalSourceNode` have been fully deprecated. For this reason, we
preserve the present implementation of `LocalSourceNode`, and instead
lay the foundation for switching over in the future, by deprecating
`track` and `backtrack` on `LocalSourceNode`.
2021-07-02 18:00:33 +00:00
CodeQL CI
1d56748eed Merge pull request #6200 from yoff/pythonJS-make-expbtlib-private
Approved by RasmusWL, esbena
2021-07-02 09:09:18 -07:00
CodeQL CI
a25933aa56 Merge pull request #5926 from RasmusWL/small-cleanups
Approved by tausbn
2021-07-02 04:59:54 -07:00
haby0
b866f1b21e Add CWE-348 ClientSuppliedIpUsedInSecurityCheck 2021-07-02 19:30:33 +08:00
Rasmus Wriedt Larsen
81fab487a4 Python: Apply suggestions from code review
Co-authored-by: Taus <tausbn@github.com>
2021-07-02 13:27:41 +02:00
Rasmus Wriedt Larsen
22c155687e Python: Fix code after removing getPostUpdateNode 2021-07-02 13:25:25 +02:00