Commit Graph

4097 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
6aabbf0b9a Python: Add some alert meta queries
Intended for use with dca
2021-07-21 14:53:01 +02:00
Taus
6591a86aad Python: Add test cases
I debated whether to add a
`MISSING: use=moduleImport("builtins").getMember("print").getReturn()`
annotation to the last line.

Ultimately, I decided to add it, as we likely _do_ want this information
to propagate into inner functions (even if the value of `var2` may
change before `func4` is called).
2021-07-20 13:26:35 +00:00
Taus
e53b86fbbc Python: Apply suggestions from code review
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-20 15:19:45 +02:00
Taus
bbcbcefedc Python: Add false negative test case. 2021-07-20 12:54:06 +00:00
Taus
233ae5a54b Python: Fix FP in py/unused-local-variable
This is only a temporary fix, as indicated by the TODO comment.

The real underlying issue is the fact that `isUnused` is defined in
terms of the underlying SSA variables (as these are only created
for variables that are actually used), and the fact that annotated
assignments are always considered to redefine their targets, which may
not actually be the case.

Thus, the correct fix would be to change the extractor to _disregard_
mere type annotations for the purposes of figuring out whether an
SSA variable should be created or not.

However, in the short term the present fix is likely sufficient.
2021-07-20 12:13:44 +00:00
Taus
8b3fa789da Python: Add AnnAssign DefinitionNode
This was a source of false positives for the
`py/uninitialized-local-variable` query, as exemplified by the test
case.
2021-07-20 11:57:26 +00:00
Taus
f91e826781 Python: Add test case 2021-07-20 11:57:12 +00:00
Porcuiney Hairs
c6c925d67a Python : Improve Xpath Injection Query 2021-07-20 03:31:30 +05:30
Sam Havron
733e5b45bf Fix qhelp typo in RequestWithoutValidation 2021-07-19 16:01:06 -04:00
Rasmus Wriedt Larsen
5249591747 Python: Fix test folder for InsecureProtocol 2021-07-19 16:57:00 +02:00
Rasmus Wriedt Larsen
5939128a76 Python: Fix test folder for InsecureDefaultProtocol
it was named wrong before. whoops.
2021-07-19 16:56:07 +02:00
Rasmus Wriedt Larsen
77021ae119 Python: Restructure security tests to contain query name
We were mixing between things, so this is just to keep things
consistent. Even though it's not strictly needed for all queries,
it does look nice I think
2021-07-19 16:54:34 +02:00
Rasmus Wriedt Larsen
da021feb8b Python: Move py/incomplete-hostname-regexp tests to own folder 2021-07-19 16:48:21 +02:00
Rasmus Wriedt Larsen
7939a1372e Python: Move Jinja2WithoutEscaping tests to own folder 2021-07-19 16:44:41 +02:00
Rasmus Wriedt Larsen
c9087b2e1b Python: Minor fixup to snippet
Spotted by @tausbn 🎉
2021-07-19 10:19:23 +02:00
thank_you
9e01338500 Query only vulnerable methods 2021-07-18 17:13:10 -04:00
Taus
4f3f93f267 Python: Autoformat 2021-07-16 12:22:24 +00:00
Taus
3fd0ec74f0 Python: Deprecate importNode
Unsurprisingly, the only thing affected by this was the `import-helper`
tests. These have lost all of the results relating to `ImportMember`s,
but apart from that the underlying behaviour should be the same.

I also limited the test to only `CfgNode`s, as a bunch of `EssaNode`s
suddenly appeared when I switched to API graphs.

Finally, I used `API::moduleImport` with a dotted name in the type
tracking tests. This goes against the API graphs interface, but I think
it's more correct for this use case, as these type trackers are doing
the "module attribute lookup" bit manually.
2021-07-16 11:38:30 +00:00
Rasmus Wriedt Larsen
5e193ee8da Python: Add more snippets 2021-07-15 18:56:49 +02:00
Rasmus Wriedt Larsen
a07de3faae Merge branch 'main' into emptyRedos 2021-07-15 18:21:29 +02:00
jorgectf
6f09b95019 Update .expected 2021-07-15 17:16:29 +02:00
CodeQL CI
d282f6a356 Merge pull request #6218 from tausbn/python-add-typetrackingnode
Approved by RasmusWL
2021-07-15 07:04:50 -07:00
Taus
dd03d8102b Merge pull request #6300 from RasmusWL/redos-tests
Python: Fix `py/polynomial-redos`
2021-07-15 15:59:01 +02:00
Rasmus Wriedt Larsen
900cbc9a2f Merge pull request #6265 from tausbn/python-performance-fixes
Python: Fix a few performance issues.
2021-07-15 14:19:37 +02:00
Rasmus Wriedt Larsen
a5834c4d78 Python: Fix py/polynomial-redos 2021-07-15 14:16:19 +02:00
Rasmus Wriedt Larsen
76caf43b54 Python: Add tests for py/polynomial-redos 2021-07-15 14:15:44 +02:00
Rasmus Wriedt Larsen
1be0dc0876 Python: Move test for ReDoS 2021-07-15 14:15:24 +02:00
Anders Schack-Mulligen
8ccdd4fb9f Merge pull request #6211 from aschackmull/dataflow/refactor-call-context-check
Dataflow: Refactor call context check
2021-07-15 12:27:23 +02:00
Erik Krogh Kristensen
383b5f2ff2 implement RegExpSubPattern.getOperand in the Python regexp implementation 2021-07-15 09:41:53 +02:00
Erik Krogh Kristensen
de8f64c5be sync with python 2021-07-14 23:40:06 +02:00
Taus
fb57c5f6f0 Merge pull request #6143 from RasmusWL/concepts-private-import-python
Python: Make `import python` private in Concepts.qll
2021-07-14 17:49:06 +02:00
Taus
5c5ee85332 Merge pull request #6122 from RasmusWL/mention-mysqlclient
Python: Mention modeling of `mysqlclient` PyPI package
2021-07-14 17:48:40 +02:00
Taus
30d61045d2 Python: Mention nameIndicatesSensitiveData
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-14 17:33:39 +02:00
Taus
5a9fca48e8 Python: Fix ExceptStmt::getType
We were not supporting `except` statements handling multiple exception
types (specified as a tuple) correctly, instead just returning the
tuple itself as the "type" (which makes little sense).

To fix this, we explicitly extract the elements of this node, in the
case where it _is_ a tuple.

This is a change that can potentially affect many queries (as `getType`
is used in quite a few places), so some care should be taken to
ensure that this does not adversely affect performance.
2021-07-14 14:03:49 +00:00
Taus
ec9063b4a5 Python: Add test case for github/codeql#6227 2021-07-14 13:52:32 +00:00
Taus
2bb44d49d9 Python: Perform more deduplication
This cut the evaluation time on `django` down from 1.2 seconds to ~0.8
seconds (but the impact will likely be greater on bigger projects).
2021-07-14 13:38:05 +00:00
Taus
09993406f1 Python: Add explanatory QLDoc comment 2021-07-14 10:42:07 +00:00
Anders Schack-Mulligen
0ccb213ec5 Dataflow: Sync. 2021-07-14 10:36:09 +02:00
CodeQL CI
f6f7020388 Merge pull request #6250 from erik-krogh/python-redos-unicode
Approved by RasmusWL
2021-07-14 01:09:26 -07:00
Taus
c3789811c8 Python: Support import * in API graphs 2021-07-13 18:22:51 +00:00
Taus
8b6b4dde69 Python: Refactor built-ins logic
This will make it possible to reuse for names defined in `import *`.
2021-07-13 18:20:25 +00:00
${sleep,5}
51a6140258 Change variable name to correct sanitized input variable
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-13 14:04:06 -04:00
Taus
df8a6b984a Python: Add import * tests
Moves the current test out of `test.py`, as otherwise any unknown global
(like, say, `sink`) would _also_ be considered to be something
potentially defined in `unknown`.
2021-07-13 17:46:59 +00:00
Taus
6aec7f2c49 Merge pull request #6264 from RasmusWL/customization-files-for-path-problems
Python: Provide proper source/sink customization for most path queries
2021-07-13 15:09:33 +02:00
Rasmus Wriedt Larsen
6f8969a55e Python: Add change-note 2021-07-13 14:39:44 +02:00
Rasmus Wriedt Larsen
9ed61e7663 Python: Port py/polynomial-redos to use proper source/sink customization
I noticed the configuration/customization files are in the `performance`
folder in JS, but I just kept them in place, since that seems correct to
me.
2021-07-13 14:39:44 +02:00
Rasmus Wriedt Larsen
cea2f82be9 Python: Port py/path-injection to use proper source/sink customization 2021-07-13 14:09:02 +02:00
Rasmus Wriedt Larsen
bf214ac3bb Python: Apply suggestions from code review
Co-authored-by: Taus <tausbn@github.com>
2021-07-13 13:41:26 +02:00
Rasmus Wriedt Larsen
1a59c9b64a Merge pull request #6204 from tausbn/python-ensmallen-localsourcenode
Python: Clean up `LocalSourceNode` charpred
2021-07-13 13:27:38 +02:00
Taus
1decf23785 Python: Fix bad join order for sensitive data
Not the prettiest of solutions, but it does the job. Basically, we were
calculating (and re-calculating) the same big relation between strings
and regexes and then checking whether the latter matched the former.

This resulted in tuple counts like the following:

```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0%     {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654    ~0%     {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
                return r2
```
(The above being just the bit that handles `DefinitionNode` in
`SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)

By applying a bit of manual inlining and magic, this becomes somewhat
more manageable:

```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671  ~2%      {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

334012 ~2%      {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

361683 ~11%     {3} r3 = r1 UNION r2

154644 ~0%      {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

149198 ~1%      {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

124257 ~5%      {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

273455 ~21%     {3} r7 = r5 UNION r6
428099 ~30%     {3} r8 = r4 UNION r7
789782 ~78%     {3} r9 = r3 UNION r8
1121   ~77%     {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121   ~70%     {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
                return r11
```
(The above being the total for all the sensitive names we care about,
taking only 1.2 seconds to evaluate.)

Incidentally, you may wonder why this has _fewer_ results than before.
The answer is control flow splitting -- every sensitively-named
`DefinitionNode` would have been matched in isolation previously. By
pre-matching on just the names of these, we can subsequently join
against those names that are known to be sensitive, which is a much
faster operation.

(We also get the benefit of deduplicating the strings that are matched,
before actually performing the match, so if, say, an attribute name and
a variable name are identical, then we'll only match them once.)

We also exclude all docstrings as relevant string constants, as these
presumably don't actually flow anywhere.
2021-07-12 16:10:49 +00:00