Commit Graph

4539 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
bd9b96e154 Merge pull request #7331 from tausbn/python-fix-bad-callsite-points-to-join
Python: Fix bad `callsite_points_to` join
2021-12-10 15:39:49 +01:00
Rasmus Wriedt Larsen
8ee020f79c Merge pull request #7332 from tausbn/python-fix-bad-scope-entry-points-to-join
Python: Fix bad `scope_entry_points_to` join
2021-12-10 15:33:13 +01:00
Taus
6d247bfdf9 Merge pull request #7330 from tausbn/python-fix-bad-adjacentuseuse-join
Python: Fix bad join in SSA
2021-12-09 20:55:45 +01:00
yoff
8e11c2c476 Merge pull request #7259 from RasmusWL/even-more-path-injection-sinks
Python: Add more path-injection sinks from `os` and `tempfile` modules
2021-12-09 14:46:41 +01:00
Anders Schack-Mulligen
38d0bb4a60 Merge pull request #7260 from hvitved/dataflow/argument-parameter-matching
Data flow: Introduce `ParameterPosition` and `ArgumentPosition`
2021-12-08 12:49:08 +01:00
Tom Hvitved
283173ad02 Address review comments 2021-12-08 11:26:44 +01:00
Tom Hvitved
490872173a Data flow: Sync files 2021-12-07 20:29:18 +01:00
Taus
e7c298d903 Python: Fix bad scope_entry_points_to join
From `pritomrajkhowa/LoopBound`:

```
Definitions.ql-7:PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2 ........... 55.1s
```

specifically

```
(443s) Tuple counts for PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2/3@74a7cart after 55.1s:
184070    ~0%        {3} r1 = JOIN PointsTo::PointsToInternal::scope_entry_points_to#ffff#shared#1 WITH Variables::GlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg2', Lhs.1 'arg0', Lhs.2 'arg1'
184070    ~0%        {3} r2 = STREAM DEDUP r1
919966523 ~2%        {4} r3 = JOIN r2 WITH Essa::EssaDefinition::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.0 'arg2'
4281779   ~2293%     {3} r4 = JOIN r3 WITH Essa::EssaVariable::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
                    return r4
```

First, this is an `antijoin`, so there's likely some negation involved.
Also, there's mention of `GlobalVariable`, `getScope`, and
`getSourceVariable`, none of which appear in `scope_entry_points_to`, so
it's likely that something got inlined.

Taking a closer look at the predicates mentioned in the body, we spot
`undefined_variable` as a likely culprit.

Evaluating this predicate in isolation reveals that it's not terribly
big, so we could try just marking it with `pragma[noinline]` (I opted
for the slightly more solid `nomagic`) and see how that fares. I also
checked that `builtin_not_in_outer_scope` was similarly small, and
made that one un-inlineable as well.

The result? Well, I can't even show you. Both `scope_entry_points_to`
and `undefined_variable` are so fast that they don't appear in the
clause timing report (so they can at most take 3.5s each to evaluate, as
that is the smallest timing in the list).
2021-12-07 18:51:44 +00:00
Taus
b502ca1ea7 Python: Fix bad callsite_points_to join
From `pritomrajkhowa/LoopBound`:

```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3 ........... 5m53s
```

specifically

```
(767s) Tuple counts for PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3/3@f8f86764 after 5m53s:
832806293 ~0%     {4} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1 WITH PointsTo::InterProceduralPointsTo::var_at_exit#fff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'arg1', Rhs.1 'arg2', Rhs.2 'arg0'
832806293 ~0%     {3} r2 = JOIN r1 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.3 'arg0', Lhs.1 'arg1', Lhs.2 'arg2'
                return r2
```

This one is a bit tricky to unpack. Where is this `shared#1` defined?

```
EVALUATE NONRECURSIVE RELATION:
SYNTHETIC PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1(int arg0, numbered_tuple arg1) :-
    SENTINEL PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared
    SENTINEL Definitions::EscapingAssignmentGlobalVariable#class#f
    SENTINEL Essa::TEssaNodeRefinement#ffff_03#join_rhs
    {2} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared WITH Definitions::EscapingAssignmentGlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
    {2} r2 = STREAM DEDUP r1
    {2} r3 = JOIN r2 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
    {2} r4 = STREAM DEDUP r3
    return r4
```

Looking at `callsite_points_to`, we see a likely candidate in `srcvar`.
It is guarded with an `instanceof` check for
`EscapingAssignmentGlobalVariable` (which lines up nicely with the
sentinel on its charpred) and `getSourceVariable` is just a projection
of `TEssaNodeRefinement`.

So let's try unbinding `srcvar` to prevent an early join.

The timing is now:

```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff ...................... 31.3s (2554 evaluations with max 101ms in PointsTo::InterProceduralPointsTo::callsite_points_to#ffff/4@i516#581fap5w)
```
(Showing the tuple counts doesn't make sense here, since all of the
`shared` and `join_rhs` predicates have been smooshed around.)
2021-12-07 18:25:53 +00:00
Taus
a716482c1f Python: Fix bad join in SSA
On `pritomrajkhowa/LoopBound`:

```
Definitions.ql-3:SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff ................. 4m35s
```

specifically

```
(376s) Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff/2@be04e9kp after 4m58s:
388843     ~0%     {4} r1 = JOIN Essa::TPhiFunction#fff_2#join_rhs WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#ffff ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Rhs.2, Rhs.3
3629812090 ~1%     {7} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::variableUse#ffff ON FIRST 1 OUTPUT Lhs.0, Rhs.2, Rhs.3, Lhs.2, Lhs.3, Lhs.1, Rhs.1 'use1'
0          ~0%     {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#fffff ON FIRST 5 OUTPUT Lhs.5, Lhs.6 'use1'
0          ~0%     {2} r4 = JOIN r3 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#ff ON FIRST 1 OUTPUT Lhs.1 'use1', Rhs.1 'use2'

897141     ~0%     {2} r5 = SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUseSameVar#ff UNION r4
                    return r5
```

Clearly we do not want to join on the variable so soon. So we unbind it
and get

```
(78s) Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff/2@40e0e6uv after 434ms:
3377959 ~2%     {4} r1 = SCAN SsaCompute::SsaComputeImpl::variableUse#ffff OUTPUT In.0, In.2, In.3, In.1 'use1'
1026855 ~2%     {4} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#fffff ON FIRST 3 OUTPUT Lhs.0, Rhs.3, Rhs.4, Lhs.3 'use1'
129484  ~0%     {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#ffff_1230#join_rhs ON FIRST 3 OUTPUT Rhs.3, Lhs.3 'use1'
0       ~0%     {2} r4 = JOIN r3 WITH Essa::TPhiFunction#fff_2#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'use1'
0       ~0%     {2} r5 = JOIN r4 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#ff ON FIRST 1 OUTPUT Lhs.1 'use1', Rhs.1 'use2'

897141  ~0%     {2} r6 = SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUseSameVar#ff UNION r5
                return r6
```
2021-12-07 18:19:47 +00:00
Taus
59bac04d8f Python: Fix Python 2 failures 2021-12-07 18:00:46 +00:00
Taus
ffc858e34d Python: Add missing file 2021-12-07 17:29:35 +00:00
Taus
7437cd4d85 Python: Fix syntax error locations 2021-12-07 16:51:33 +00:00
Erik Krogh Kristensen
3c59aa319e Merge pull request #7245 from erik-krogh/explicit-this-all-the-places
All langs: apply the explicit-this patch to all remaining code
2021-12-07 10:40:26 +01:00
Nick Rolfe
05415768c9 Merge remote-tracking branch 'origin/main' into nickrolfe/regexp_g_anchor 2021-12-02 12:07:13 +00:00
yoff
f10f053c36 Merge pull request #7228 from RasmusWL/fastapi-improvements
Python: FastAPI improvements
2021-12-02 12:58:53 +01:00
yoff
4609b2060a Merge pull request #7217 from RasmusWL/more-path-injection-fps
Python: Add `x in <var>` test for StringConstCompare
2021-12-02 12:35:33 +01:00
Anders Schack-Mulligen
cde853c095 Merge pull request #7270 from aschackmull/dataflow/stage2-refactor
Dataflow: Stage 2 refactor
2021-12-01 11:09:08 +01:00
Tom Hvitved
e410244fe0 Python: Implement ParameterPosition et al 2021-12-01 08:51:22 +01:00
Tom Hvitved
540ecf3c21 Data flow: Sync files 2021-11-30 15:20:20 +01:00
Anders Schack-Mulligen
3e914ef2ff Dataflow: Sync. 2021-11-30 13:52:52 +01:00
Rasmus Wriedt Larsen
d557f6fd2e Merge pull request #7101 from RasmusWL/python-ids
Python: Fix some query-ids
2021-11-29 16:12:57 +01:00
yoff
41b7922c7d Merge pull request #7089 from RasmusWL/redos-cwe-1333
Python/C#: Add CWE-1333 to redos queries
2021-11-29 16:09:39 +01:00
yoff
19802ccb73 Merge pull request #7046 from RasmusWL/django-own-json-response
Python: Add test with custom django json response (FP)
2021-11-29 16:05:20 +01:00
yoff
e63f9141e5 Merge pull request #7233 from RasmusWL/fix-cleartext-logging-cwes
JS/Py: Fix cleartext logging CWEs
2021-11-29 15:58:10 +01:00
Rasmus Wriedt Larsen
cbd7434a7e Python: Add modeling of tempfile module 2021-11-29 15:08:36 +01:00
Rasmus Wriedt Larsen
b68538376c Python: Add tests of tempfile module 2021-11-29 15:08:36 +01:00
Rasmus Wriedt Larsen
3bcf6d68ce Python: Refactor os FileSystemAccess change-note
I think it's more readable to have only one to cover all of these
changes, even though they came in through different PRs.
2021-11-29 15:08:18 +01:00
Rasmus Wriedt Larsen
58f92764f7 Python: Model more file access from os module 2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
fd23fa94a5 Python: Remove dubious fstat* modeling
These operate on file descriptors, and not on paths. file descriptors
doesn't fit into the rest of our modeling, so I would rather remove them
than to make it look like it's properly handled.

I also did not include any of the functions that work on file
descriptors when looking through all of `os`. So this keeps everything
consistent at least ;)
2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
e79b8f3e23 Python: Treat os.exec*, os.spawn*, and os.posix_spawn* as FileSystemAccess 2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
d2d5cce787 Python: Recognize keyword arguments for os.*spawn* calls 2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
14590436f9 Python: Expand tests for os.exec*, os.spawn*, and os.posix_spawn* 2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
50d3592ad3 Python: Add more complete tests of os module
I went through https://docs.python.org/3.10/library/os.html in order,
and added all the functions that works on paths.

`lstat` and `statvfs` were already modeled, but did not have any tests.
2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
a91208fd2c Python: Fix kwarg modeling for os.path.isdir 2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
36f14b31bc Python: Add explicit tests for kwargs
I also renamed the arguments to match what the keyword argument is
called. It doesn't matter too much for these specific tests, but for the
tests I'm about to add, it makes things a lot easier to get an overview
of.

Oh, and a test failure :O
2021-11-29 14:54:02 +01:00
Rasmus Wriedt Larsen
82602014ad Python: Minor refactor to use os.path.<func>
Since that's the idiomatic way to use this module
2021-11-29 14:54:02 +01:00
Erik Krogh Kristensen
6ff8d4de5c add all remaining explicit this 2021-11-26 13:50:10 +01:00
Anders Schack-Mulligen
00ee34c0a0 Merge pull request #7237 from hvitved/dataflow/consistency-config
Data flow: Introduce `ConsistencyConfiguration` class
2021-11-26 12:49:25 +01:00
Anders Schack-Mulligen
a06642944f Merge pull request #7232 from aschackmull/dataflow/perf
Data flow: Performance tuning
2021-11-25 15:01:01 +01:00
Tom Hvitved
6cb00992e8 Data flow: Introduce ConsistencyConfiguration class 2021-11-25 10:01:47 +01:00
Erik Krogh Kristensen
3bab8c6d1d Merge pull request #7173 from erik-krogh/getRubyInSync
JS/PY/RB: get ReDoSUtil in sync for ruby
2021-11-24 15:20:23 +01:00
Rasmus Wriedt Larsen
651a76c9ce Python: Add CWE-532 to CleartextLogging
Relevant for this query:

CWE-532: Insertion of Sensitive Information into Log File

> While logging all information may be helpful during development
> stages, it is important that logging levels be set appropriately
> before a product ships so that sensitive user data and system
> information are not accidentally exposed to potential attackers.

See https://cwe.mitre.org/data/definitions/532.html

JS also did this recently: https://github.com/github/codeql/pull/7103
2021-11-24 14:59:52 +01:00
Rasmus Wriedt Larsen
c05ffd4d00 JS/PY: Remove CWE-315 form CleartextLogging
Since it is not relevant for this query:

CWE-315: Cleartext Storage of Sensitive Information in a Cookie

See https://cwe.mitre.org/data/definitions/315.html
2021-11-24 14:59:18 +01:00
Anders Schack-Mulligen
7ca3407c86 Dataflow: Sync. 2021-11-24 14:43:00 +01:00
Rasmus Wriedt Larsen
7dde52ced2 Merge pull request #7131 from RasmusWL/wsgiref.simple_server
Python: Model `wsgiref.simple_server` applications
2021-11-24 14:22:23 +01:00
Rasmus Wriedt Larsen
2a5e0a3b77 Merge pull request #7145 from RasmusWL/remove-owasp-tags
Python/Ruby: Remove owasp tags
2021-11-24 13:56:48 +01:00
Rasmus Wriedt Larsen
e2652591a5 Python: Change perf fix PoorMansFunctionResolution
Thanks @yoff, this leaves us with the following evaluation, which looks
very close to the one in the other fix (but with cleaner implementation)
-- both at 688k max tuples (although numbers are not exactly the same).

```
[2021-11-24 13:48:40] (14s) Tuple counts for PoorMansFunctionResolution::getSimpleMethodReferenceWithinClass#ff/2@e5f05asv after 74ms:
                      47493  ~3%     {3} r1 = JOIN Class::Class::getAMethod_dispred#ff WITH py_Classes ON FIRST 1 OUTPUT Lhs.1, 0, Lhs.0
                      47335  ~0%     {2} r2 = JOIN r1 WITH AstGenerated::Function_::getArg_dispred#fff ON FIRST 2 OUTPUT Rhs.2, Lhs.2
                      46683  ~0%     {2} r3 = JOIN r2 WITH DataFlowPublic::ParameterNode::getParameter_dispred#fb_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
                      259968 ~4%     {2} r4 = JOIN r3 WITH LocalSources::Cached::hasLocalSource#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
                      161985 ~0%     {3} r5 = JOIN r4 WITH Attributes::AttrRef::accesses_dispred#bff_102#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1, Rhs.2
                      161985 ~2%     {3} r6 = JOIN r5 WITH Attributes::AttrRead#class#f ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.0 'result'
                      688766 ~0%     {3} r7 = JOIN r6 WITH Function::Function::getName_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1 'func', Lhs.2 'result'
                      20928  ~0%     {2} r8 = JOIN r7 WITH Class::Class::getAMethod_dispred#ff ON FIRST 2 OUTPUT Lhs.1 'func', Lhs.2 'result'
                                     return r8
```
2021-11-24 13:52:05 +01:00
Rasmus Wriedt Larsen
1411804e58 Python: Allow custom fastapi.APIRouter subclasses 2021-11-24 13:46:38 +01:00
Rasmus Wriedt Larsen
47448d9efc Python: Apply suggestions from code review
Co-authored-by: yoff <lerchedahl@gmail.com>
2021-11-24 12:02:12 +01:00