Commit Graph

432 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
f8a51bb994 Python: ORM: Add data-flow steps for Django ORM
Added dummy-whitespace to `orm_security_tests.py` so it would be
possible to see what the reflected XSS results are in the diff
2022-02-28 16:38:40 +01:00
Rasmus Wriedt Larsen
ef39968a56 Python: ORM: Add data-flow plumbing for ORM modeling
The idea is that we will do `save ==> synthetic`
and `synthetic ==> load`, so we don't need to do CP between save/load.

This setup with synthetic node in the middle, also allows for a limited
amount of the field-flow we can do with real flow-summary support.
2022-02-28 16:38:40 +01:00
yoff
d953382df9 Merge pull request #7807 from RasmusWL/dataflow-improvements
Python: Dataflow improvements
2022-02-28 16:24:00 +01:00
yoff
8b926f6859 Merge pull request #7873 from RasmusWL/fix-attribute-taint
Python: Fix attribute taint
2022-02-25 15:02:24 +01:00
Rasmus Wriedt Larsen
d2cd77aefb Merge branch 'main' into dataflow-improvements 2022-02-21 14:49:40 +01:00
Rasmus Wriedt Larsen
b59ab7f5f3 Merge branch 'main' into python/promote-log-injection 2022-02-21 09:59:31 +01:00
Rasmus Wriedt Larsen
67ca14876a Python: Apply suggestions from code review
Co-authored-by: yoff <lerchedahl@gmail.com>
2022-02-18 13:47:07 +01:00
Rasmus Wriedt Larsen
62d4bb50a5 Python: Autoformat
Trailing whitespace is a bit too easy with the ```suggestions through
the UI :|
2022-02-15 10:38:52 +01:00
Rasmus Wriedt Larsen
5a90214ece Merge pull request #7783 from yoff/python/promote-ldap-injection
Python: promote LDAP injection query
2022-02-15 10:24:18 +01:00
yoff
de5b3a272d Merge pull request #7660 from RasmusWL/deprecate-old-modeling
Python: Deprecate old points-to based modeling
2022-02-14 19:48:03 +01:00
yoff
3a995ec1b1 Update python/ql/lib/semmle/python/security/dataflow/LogInjectionCustomizations.qll
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2022-02-14 16:08:44 +01:00
yoff
62598c0fd1 Update python/ql/lib/semmle/python/security/dataflow/LogInjectionCustomizations.qll
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2022-02-14 16:07:40 +01:00
Rasmus Lerchedahl Petersen
84447e4710 python: more detailed alert message 2022-02-14 11:55:07 +01:00
Rasmus Lerchedahl Petersen
bd14adefa0 python: add apologetic comment 2022-02-14 11:37:46 +01:00
Taus
d7f30de5b0 Merge pull request #7874 from RasmusWL/set-store-step
Python: Fix setStoreStep to use `SetElementContent`
2022-02-11 12:50:02 +01:00
yoff
f21ac04285 Update python/ql/lib/semmle/python/frameworks/Stdlib.qll
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2022-02-09 09:22:31 +01:00
Rasmus Wriedt Larsen
62702d0ca9 Python: Fix setStoreStep to use SetElementContent 2022-02-07 13:18:36 +01:00
Rasmus Wriedt Larsen
b276b2d48c Python: Clean up taint steps for attributes 2022-02-07 13:12:31 +01:00
yoff
182c62f5c3 Merge pull request #7838 from tausbn/python-fix-charset-performance-problem
Python: Fix performance issue in `charSet`
2022-02-04 14:18:13 +01:00
Taus
67be20f368 Python: Remove implied inequalities
Also gets rid of `inner_end`, since we're already doing `end - 1 = ...`
in the other fix (and so this is more consistent).
2022-02-04 12:46:06 +00:00
Rasmus Wriedt Larsen
438a01e911 Python: Deprecate old bottle points-to extension 2022-02-04 12:02:09 +01:00
Rasmus Wriedt Larsen
c9e36aaf72 Python: Fix deprecated deprecated 2022-02-04 12:02:09 +01:00
Rasmus Wriedt Larsen
84fdd8a739 Python: Add non-deprecated httpVerb to Concepts 2022-02-04 12:02:09 +01:00
Rasmus Wriedt Larsen
5a032d6f84 Python: deprecate old taint-tracking related predicates 2022-02-04 12:02:08 +01:00
Rasmus Wriedt Larsen
dba6b60c80 Python: Deprecate old library modeling 2022-02-04 12:02:08 +01:00
Rasmus Wriedt Larsen
a40fdf7a7c Python: Deprecate old web modeling 2022-02-04 12:02:08 +01:00
Rasmus Wriedt Larsen
b2ce0fcb72 Python: Add post-update nodes to args of unresolved calls
Besides solving the problem with `setattr`, it also solved some old
problems with json library modeling (yay).
2022-02-04 11:51:53 +01:00
Taus
22aa4c9379 Python: Fix performance issue in charSet
Observed on `mozilla/bugbug` on the 2.8.0 CLI branch, we had the
following line in the timing report:
```
FullServerSideRequestForgery.ql-17:regex::RegexString::charSet_dispred#fff#antijoin_rhs ............... 1m13s
```

Inspecting the logs, we see the following join:

```
(644s) Tuple counts for regex::RegexString::charSet_dispred#fff#antijoin_rhs/5@f295d1bk after 1m13s:
1         ~0%         {1} r1 = CONSTANT(unique string)["]"]
2389      ~4%         {3} r2 = JOIN r1 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1', (Rhs.2 'arg1' + 1)
668873    ~0%         {6} r3 = JOIN r2 WITH regex::RegexString::char_set_start_dispred#fff ON FIRST 1 OUTPUT Lhs.0 'arg0', "]", Lhs.1 'arg1', Lhs.2 'arg2', Rhs.1 'arg3', Rhs.2 'arg4'
537501371 ~4%         {7} r4 = JOIN r3 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_021#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3', Lhs.5 'arg4', "]", Rhs.2
269085087 ~0%         {7} r5 = SELECT r4 ON In.6 > In.4 'arg4'
89583155  ~3%         {7} r6 = SELECT r5 ON In.6 < In.1 'arg1'
89583155  ~26634%     {5} r7 = SCAN r6 OUTPUT In.0 'arg0', In.1 'arg1', In.2 'arg2', In.3 'arg3', In.4 'arg4'
                    return r7
```
Now, this is problematic not just because of the large intermediary join
but also because of the large number of tuples being materialised at the
end. The culprit in this case turns out to be this bit of `charSet`:
```
not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end)
```

Rewriting this to instead look for the minimum index at which a `]`
appears resulted in a much nicer join.

I also fixed up a similar issue surrounding the `\N` unicode escape.
Not that I think this will necessarily be relevant, but the `min`-based
solution is more robust either way.
2022-02-03 20:42:04 +00:00
Rasmus Wriedt Larsen
5cd08b8e8c Python: Ignore .isAbsent() from ClassCall
This means that DataFlowCall is only for resolvable calls, which might not seem
like a big thing in itself, but enables the next commit to actually work :P
2022-02-03 14:58:30 +01:00
Rasmus Wriedt Larsen
48aa07d67a Python: Handle SyntheticPreUpdateNode in PrintNode 2022-02-03 14:58:30 +01:00
Rasmus Wriedt Larsen
49b5d60229 Python: Use AttrRead/AttrWrite for attr read/store steps
Note that this doesn't actually add the desired flow from setattr, due
to missing post-update note. This will be fixed in later commit.
2022-02-03 14:58:30 +01:00
Rasmus Wriedt Larsen
5774459dfb Python: restrict AttrRead with AttrNode.isLoad() 2022-02-03 14:58:23 +01:00
Rasmus Wriedt Larsen
e2de0e61ca Python: Remove RegExpTerm from PrintAST
Since this caused bad performance (as we had to evaluate points-to).

Fixes https://github.com/github/codeql/issues/6964

This approach was motivated by the comment on the issue from @tausbn:

> We discussed this internally in the CodeQL Python team, and have
> agreed that the best approach for now is to disable the printing of
> regex ASTs.

I tried to keep our RegExpTerm logic, but doing the fix below did not
work, and still evaluated RegExpTerm :| I guess we will just have to
revert this PR if we want it back

```diff
   TRegExpTermNode(RegExpTerm term) {
+    none() and
     exists(StrConst str | term.getRootTerm() = getParsedRegExp(str) and shouldPrint(str, _))
   }
```
2022-02-03 14:22:14 +01:00
Erik Krogh Kristensen
e93c46ad31 Merge pull request #7811 from erik-krogh/pyApiIpa
Python: refactor API-graph labels to an IPA type
2022-02-03 12:31:39 +01:00
Tom Hvitved
6bb71f051b Merge pull request #7791 from hvitved/dataflow/inline-local-flow-star
Data flow: Inline `local(Expr|Instruction)?(Flow|Taint)`
2022-02-03 09:02:43 +01:00
Arthur Baars
33b97f3e0c Update synchronized files 2022-02-02 13:30:45 +01:00
CodeQL CI
7bb11b837c Merge pull request #7788 from yoff/python/remove-library-annotation
Approved by tausbn
2022-02-02 03:51:00 -08:00
Rasmus Wriedt Larsen
51bc6dcf7e Python: Add attributeClearStep 2022-02-02 11:19:35 +01:00
Rasmus Lerchedahl Petersen
4ad99d9299 python: add missing QlDoc 2022-02-02 09:14:21 +01:00
Rasmus Lerchedahl Petersen
448e0785c2 python: logging.root is not a call 2022-02-02 09:04:16 +01:00
Erik Krogh Kristensen
e06f6529f1 refactor API-graph labels to an IPA type 2022-02-01 17:32:08 +01:00
Rasmus Lerchedahl Petersen
1e2428cb6b python: create LDAP module in Concepts 2022-02-01 14:39:58 +01:00
Rasmus Lerchedahl Petersen
c2cd58edc4 python: rewrite to separate configurations
source nodes get duplicated, so perhaps flow states
are actually better for performance?
2022-02-01 14:36:11 +01:00
Rasmus Lerchedahl Petersen
c587084758 python: use standard InstanceSource construction 2022-02-01 13:31:16 +01:00
Rasmus Wriedt Larsen
f7a0b17ed6 Merge pull request #7687 from yoff/python/PathInjection-FlowState
python: Rewrite path injection query to use flow state
2022-02-01 11:33:37 +01:00
Rasmus Lerchedahl Petersen
119a7e4f34 python: provide links for Flask 2022-02-01 10:55:45 +01:00
Rasmus Lerchedahl Petersen
7511b33512 python: "command" -> "log" 2022-02-01 10:23:16 +01:00
yoff
45f0bfd8f0 Apply suggestions from code review
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2022-02-01 10:06:37 +01:00
yoff
c03f89d712 Apply suggestions from code review
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2022-02-01 10:04:26 +01:00
Taus
4a29095e3b Python: Fix bad join order in TPythonTuple
TL;DR: Something introduced the following bad join order:
```
(227s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i2#8f58670w after 3m46s:
25000      ~0%     {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
24000      ~1%     {2} r2 = JOIN r1 WITH @py_scope#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.0
1076876712 ~6%     {3} r3 = JOIN r2 WITH Flow::TupleNode#class#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'context', Lhs.1
870129666  ~0%     {3} r4 = JOIN r3 WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.1 'context', Lhs.2, Lhs.0 'origin'
870129000  ~0%     {3} r5 = r4 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.2 'origin', Lhs.0 'context')
870129000  ~1%     {3} r6 = SCAN r5 OUTPUT In.2 'origin', In.1, In.0 'context'
9000       ~0%     {2} r7 = JOIN r6 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.0 'origin', Lhs.2 'context'
                    return r7
```
(...the above being the tuple counts _at the point when I cancelled the
query_!)

Rewriting the code to force a join between `TupleNode#class` and
`getScope` results in the following join orders:

```
(0s) Tuple counts for TObject::scope_loads_tuplenode#ff/2@b3cf0bo5 after 13ms:
37369 ~3%     {1} r1 = JOIN Flow::TupleNode#class#f WITH Flow::ControlFlowNode::isLoad_dispred#f ON FIRST 1 OUTPUT Lhs.0 'origin'
37369 ~3%     {2} r2 = JOIN r1 WITH Flow::ControlFlowNode::getScope_dispred#ff ON FIRST 1 OUTPUT Rhs.1 's', Lhs.0 'origin'
            return r2
```
and
```
(78s) Tuple counts for dom#TObject::TPythonTuple#ff/2@i53#121c440w after 6ms:
34736 ~3%     {2} r1 = SCAN PointsToContext::PointsToContext::appliesToScope_dispred#ff#prev_delta OUTPUT In.1, In.0 'context'
7370  ~5%     {2} r2 = JOIN r1 WITH TObject::scope_loads_tuplenode#ff ON FIRST 1 OUTPUT Lhs.1 'context', Rhs.1 'origin'
7370  ~5%     {2} r3 = r2 AND NOT dom#TObject::TPythonTuple#ff#prev(Lhs.1 'origin', Lhs.0 'context')
7370  ~1%     {2} r4 = SCAN r3 OUTPUT In.1 'origin', In.0 'context'
            return r4
```
the latter being the largest iteration of `dom#TPythonTuple` throughout
the log.

No other major performance issues were observed.
2022-01-31 16:59:50 +00:00