Commit Graph

5487 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
579de0c3f0 Python: Remove getResponse and do manual taint steps 2021-12-15 21:55:04 +01:00
Rasmus Wriedt Larsen
f8fc583af3 Python: client request: getUrl => getAUrlPart
I think `getUrl` is a bit too misleading, since from the name, I would
only ever expect ONE result for one request being made.

`getAUrlPart` captures that there could be multiple results, and that
they might not constitute a whole URl.

Which is the same naming I used when I tried to model this a long time ago
a80860cdc6/python/ql/lib/semmle/python/web/Http.qll (L102-L111)
2021-12-15 21:55:04 +01:00
Rasmus Wriedt Larsen
6f81685f48 Python: Add modeling of http.client.HTTPResponse 2021-12-15 21:55:04 +01:00
Rasmus Wriedt Larsen
a5bae30d81 Python: Add tests of http.client.HTTPResponse 2021-12-15 20:39:46 +01:00
Sam Partington
db7b3bc136 Remove experimental tag from non-ATM queries 2021-12-15 16:17:14 +00:00
github-actions[bot]
59da2cdf69 Release preparation for version 2.7.4 2021-12-14 21:35:09 +00:00
Dave Bartolomeo
fa40d59332 Move older change notes to old-change-notes
Now that change notes are per-package, new change notes should be created in the `change-notes` folder under the affected pack (e.g., `cpp/ql/src/change-notes` for C++ query change notes. I've moved all of the change note files that were added before we started publishing them in packs to an `old-change-notes` directory under each language, to reduce the temptation to add new change notes there.

I'm working on a document to describe how and when to create change notes for packs separately.
2021-12-14 12:35:04 -05:00
Dave Bartolomeo
a62f181d42 Move new change notes to appropriate packs 2021-12-14 12:05:15 -05:00
Andrew Eisenberg
0669ef505e Fix semver for upgrades references
Ensure the version range is flexible enough to handle
future version changes.
2021-12-13 09:03:33 -08:00
Rasmus Wriedt Larsen
cf2ee0672f Python: Model requests Responses 2021-12-13 15:09:46 +01:00
Rasmus Wriedt Larsen
35cba17642 Python: Consider taint of client http requests 2021-12-13 14:56:16 +01:00
Rasmus Wriedt Larsen
b68d280129 Python: Add modeling of requests 2021-12-13 14:56:16 +01:00
Rasmus Wriedt Larsen
1ff56d5143 Python: Add tests of requests
Also adjusts test slightly. Writing
`clientRequestDisablesCertValidation=False` to mean that certificate
validation was disabled by the `False` expression is just confusing, as
it easily reads as _certificate validate was NOT disabled_ :|

The new one ties to each request that is being made, which seems like
the right setup.
2021-12-13 14:07:32 +01:00
Rasmus Wriedt Larsen
7bf285a52e Python: Alter disablesCertificateValidation to fit our needs
For the snippet below, our current query is able to show _why_ we
consider `var` to be a falsey value that would disable SSL/TLS
verification. I'm not sure we're going to need the part that Ruby did,
for being able to specify _where_ the verification was removed, but
we'll see.

```
requests.get(url, verify=var)
```
2021-12-13 11:37:12 +01:00
Rasmus Wriedt Larsen
08f6d1ab80 Python: Clearer sourceType for client response body 2021-12-13 11:24:38 +01:00
Rasmus Wriedt Larsen
5de79b4ffe Python: Add HTTP::Client::Request concept
Taken from Ruby, except that `getURL` member predicate was changed to
`getUrl` to keep consistency with the rest of our concepts, and stick
to our naming convention.
2021-12-13 11:09:09 +01:00
Rasmus Wriedt Larsen
1e45fa9ed4 JS/Py/Ruby: Add more CWEs to bad-tag-filter queries
CWE-185: Incorrect Regular Expression

The software specifies a regular expression in a way that causes data to
be improperly matched or compared.

https://cwe.mitre.org/data/definitions/185.html

CWE-186: Overly Restrictive Regular Expression

> A regular expression is overly restrictive, which prevents dangerous values from being detected.
>
> (...) [this CWE] is about a regular expression that does not match all
> values that are intended. (...)

https://cwe.mitre.org/data/definitions/186.html

From my understanding,
CWE-625: Permissive Regular Expression, is not applicable. (since this
is about accepting a regex match where there should not be a match).
2021-12-13 10:23:24 +01:00
liangjinhuang
77b5f422ba change PasswordFnSink to RandomFnSink 2021-12-11 12:31:20 +08:00
Andrew Eisenberg
66c1629974 Merge pull request #7285 from github/post-release-prep-2.7.3-ddd4ccbb
Post-release preparation 2.7.3
2021-12-10 09:59:45 -08:00
yoff
d8857c7ce8 Merge pull request #7246 from tausbn/python/import-star-flow
Python: Support flow through `import *`
2021-12-10 16:34:32 +01:00
Rasmus Wriedt Larsen
bd9b96e154 Merge pull request #7331 from tausbn/python-fix-bad-callsite-points-to-join
Python: Fix bad `callsite_points_to` join
2021-12-10 15:39:49 +01:00
Rasmus Wriedt Larsen
8ee020f79c Merge pull request #7332 from tausbn/python-fix-bad-scope-entry-points-to-join
Python: Fix bad `scope_entry_points_to` join
2021-12-10 15:33:13 +01:00
Anders Schack-Mulligen
464b9c3991 Dataflow: Sync. 2021-12-10 11:20:01 +01:00
Taus
6d247bfdf9 Merge pull request #7330 from tausbn/python-fix-bad-adjacentuseuse-join
Python: Fix bad join in SSA
2021-12-09 20:55:45 +01:00
yoff
8e11c2c476 Merge pull request #7259 from RasmusWL/even-more-path-injection-sinks
Python: Add more path-injection sinks from `os` and `tempfile` modules
2021-12-09 14:46:41 +01:00
Taus
b871342e83 Python: A small further performance improvement
Unrolling the transitive closure had slightly better performance here.

Also, we exclude names of builtins, since those will be handled by a
separate case of `isDefinedLocally`.
2021-12-09 10:29:55 +00:00
Taus
8517eff0f7 Python: Fix bad performance
A few changes, all bundled together:

- We were getting a lot of magic applied to the predicates in the
  `ImportStar` module, and this was causing needless re-evaluation.
  To address this, the easiest solution was to simply cache the entire
  module.
- In order to separate this from the dataflow analysis and make it
  dependent only on control flow, `potentialImportStarBase` was changed
  to return a `ControlFlowNode`.
- `isDefinedLocally` was defined on control flow nodes, which meant we
  were duplicating a lot of tuples due to control flow splitting, to no
  actual benefit.

Finally, there was a really bad join in `isDefinedLocally` that was
fixed by separating out a helper predicate. This is a case where we
could use a three-way join, since the join between the `Scope`, the
`name` string and the `Name` is big no matter what.

If we join `scope_defines_name` with `n.getId()`, we'll get `Name`s
belonging to irrelevant scopes.

If we join `scope_defines_name` with the enclosing scope of the `Name`
`n`, then we'll get this also for `Name`s that don't share their `getId`
with the local variable defined in the scope.

If we join `n.getId()` with `n.getScope()...` then we'll get all
enclosing scopes for each `Name`.

The last of these is what we currently have. It's not terrible, but not
great either. (Though thankfully it's rare to have lots of enclosing
scopes.)
2021-12-08 22:53:45 +00:00
Anders Schack-Mulligen
38d0bb4a60 Merge pull request #7260 from hvitved/dataflow/argument-parameter-matching
Data flow: Introduce `ParameterPosition` and `ArgumentPosition`
2021-12-08 12:49:08 +01:00
Tom Hvitved
283173ad02 Address review comments 2021-12-08 11:26:44 +01:00
Tom Hvitved
490872173a Data flow: Sync files 2021-12-07 20:29:18 +01:00
Taus
e7c298d903 Python: Fix bad scope_entry_points_to join
From `pritomrajkhowa/LoopBound`:

```
Definitions.ql-7:PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2 ........... 55.1s
```

specifically

```
(443s) Tuple counts for PointsTo::PointsToInternal::scope_entry_points_to#ffff#antijoin_rhs#2/3@74a7cart after 55.1s:
184070    ~0%        {3} r1 = JOIN PointsTo::PointsToInternal::scope_entry_points_to#ffff#shared#1 WITH Variables::GlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg2', Lhs.1 'arg0', Lhs.2 'arg1'
184070    ~0%        {3} r2 = STREAM DEDUP r1
919966523 ~2%        {4} r3 = JOIN r2 WITH Essa::EssaDefinition::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.2 'arg1', Lhs.0 'arg2'
4281779   ~2293%     {3} r4 = JOIN r3 WITH Essa::EssaVariable::getScope_dispred#ff ON FIRST 2 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1', Lhs.3 'arg2'
                    return r4
```

First, this is an `antijoin`, so there's likely some negation involved.
Also, there's mention of `GlobalVariable`, `getScope`, and
`getSourceVariable`, none of which appear in `scope_entry_points_to`, so
it's likely that something got inlined.

Taking a closer look at the predicates mentioned in the body, we spot
`undefined_variable` as a likely culprit.

Evaluating this predicate in isolation reveals that it's not terribly
big, so we could try just marking it with `pragma[noinline]` (I opted
for the slightly more solid `nomagic`) and see how that fares. I also
checked that `builtin_not_in_outer_scope` was similarly small, and
made that one un-inlineable as well.

The result? Well, I can't even show you. Both `scope_entry_points_to`
and `undefined_variable` are so fast that they don't appear in the
clause timing report (so they can at most take 3.5s each to evaluate, as
that is the smallest timing in the list).
2021-12-07 18:51:44 +00:00
Taus
b502ca1ea7 Python: Fix bad callsite_points_to join
From `pritomrajkhowa/LoopBound`:

```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3 ........... 5m53s
```

specifically

```
(767s) Tuple counts for PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#join_rhs#3/3@f8f86764 after 5m53s:
832806293 ~0%     {4} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1 WITH PointsTo::InterProceduralPointsTo::var_at_exit#fff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'arg1', Rhs.1 'arg2', Rhs.2 'arg0'
832806293 ~0%     {3} r2 = JOIN r1 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.3 'arg0', Lhs.1 'arg1', Lhs.2 'arg2'
                return r2
```

This one is a bit tricky to unpack. Where is this `shared#1` defined?

```
EVALUATE NONRECURSIVE RELATION:
SYNTHETIC PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared#1(int arg0, numbered_tuple arg1) :-
    SENTINEL PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared
    SENTINEL Definitions::EscapingAssignmentGlobalVariable#class#f
    SENTINEL Essa::TEssaNodeRefinement#ffff_03#join_rhs
    {2} r1 = JOIN PointsTo::InterProceduralPointsTo::callsite_points_to#ffff#shared WITH Definitions::EscapingAssignmentGlobalVariable#class#f ON FIRST 1 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
    {2} r2 = STREAM DEDUP r1
    {2} r3 = JOIN r2 WITH Essa::TEssaNodeRefinement#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.1 'arg1'
    {2} r4 = STREAM DEDUP r3
    return r4
```

Looking at `callsite_points_to`, we see a likely candidate in `srcvar`.
It is guarded with an `instanceof` check for
`EscapingAssignmentGlobalVariable` (which lines up nicely with the
sentinel on its charpred) and `getSourceVariable` is just a projection
of `TEssaNodeRefinement`.

So let's try unbinding `srcvar` to prevent an early join.

The timing is now:

```
Definitions.ql-7:PointsTo::InterProceduralPointsTo::callsite_points_to#ffff ...................... 31.3s (2554 evaluations with max 101ms in PointsTo::InterProceduralPointsTo::callsite_points_to#ffff/4@i516#581fap5w)
```
(Showing the tuple counts doesn't make sense here, since all of the
`shared` and `join_rhs` predicates have been smooshed around.)
2021-12-07 18:25:53 +00:00
Taus
a716482c1f Python: Fix bad join in SSA
On `pritomrajkhowa/LoopBound`:

```
Definitions.ql-3:SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff ................. 4m35s
```

specifically

```
(376s) Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff/2@be04e9kp after 4m58s:
388843     ~0%     {4} r1 = JOIN Essa::TPhiFunction#fff_2#join_rhs WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#ffff ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Rhs.2, Rhs.3
3629812090 ~1%     {7} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::variableUse#ffff ON FIRST 1 OUTPUT Lhs.0, Rhs.2, Rhs.3, Lhs.2, Lhs.3, Lhs.1, Rhs.1 'use1'
0          ~0%     {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#fffff ON FIRST 5 OUTPUT Lhs.5, Lhs.6 'use1'
0          ~0%     {2} r4 = JOIN r3 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#ff ON FIRST 1 OUTPUT Lhs.1 'use1', Rhs.1 'use2'

897141     ~0%     {2} r5 = SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUseSameVar#ff UNION r4
                    return r5
```

Clearly we do not want to join on the variable so soon. So we unbind it
and get

```
(78s) Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUse#ff/2@40e0e6uv after 434ms:
3377959 ~2%     {4} r1 = SCAN SsaCompute::SsaComputeImpl::variableUse#ffff OUTPUT In.0, In.2, In.3, In.1 'use1'
1026855 ~2%     {4} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#fffff ON FIRST 3 OUTPUT Lhs.0, Rhs.3, Rhs.4, Lhs.3 'use1'
129484  ~0%     {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#ffff_1230#join_rhs ON FIRST 3 OUTPUT Rhs.3, Lhs.3 'use1'
0       ~0%     {2} r4 = JOIN r3 WITH Essa::TPhiFunction#fff_2#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'use1'
0       ~0%     {2} r5 = JOIN r4 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#ff ON FIRST 1 OUTPUT Lhs.1 'use1', Rhs.1 'use2'

897141  ~0%     {2} r6 = SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentUseUseSameVar#ff UNION r5
                return r6
```
2021-12-07 18:19:47 +00:00
Taus
59bac04d8f Python: Fix Python 2 failures 2021-12-07 18:00:46 +00:00
Taus
ffc858e34d Python: Add missing file 2021-12-07 17:29:35 +00:00
Taus
7437cd4d85 Python: Fix syntax error locations 2021-12-07 16:51:33 +00:00
Erik Krogh Kristensen
3c59aa319e Merge pull request #7245 from erik-krogh/explicit-this-all-the-places
All langs: apply the explicit-this patch to all remaining code
2021-12-07 10:40:26 +01:00
Taus
7cd9369d91 Python: Autoformat 2021-12-07 09:29:24 +00:00
Taus
33a9f86f54 Python: Change integer in trois.py 2021-12-07 08:54:07 +00:00
Taus
dd33f4f4d2 Python: Apply suggestions from code review
Co-authored-by: yoff <lerchedahl@gmail.com>
2021-12-07 09:48:53 +01:00
liangjinhuang
1102f60f3e add tests 2021-12-04 00:52:15 +08:00
Taus
7f44cebed7 Python: Add missing hidden flow
The easiest way to implement this was to change the definition of
`module_export` to account for chains of `import *`. We reuse the
machinery from `ImportStar.qll` for this, naturally.
2021-12-02 17:11:56 +00:00
Taus
4138296ec6 Python: Add test for "hidden" import * flow
TL;DR: We were missing out on flow in the following situation:

`mod1.py`:
```python
foo = SOURCE
```

`mod2.py`:
```python
from mod1 import *
```

`test.py`:
```python
from mod2 import foo
SINK(foo)
```

This is because there's no node at which a read of `foo` takes place
within `test.py`, and so the added reads make no difference.

Unfortunately, this means the previous test was a bit too simplistic,
since it only looks for module variable reads and writes. Because of
this, we change the test to be a more traditional "all flow" style
(though restricted to `CfgNode`s).
2021-12-02 17:05:54 +00:00
Nick Rolfe
05415768c9 Merge remote-tracking branch 'origin/main' into nickrolfe/regexp_g_anchor 2021-12-02 12:07:13 +00:00
yoff
f10f053c36 Merge pull request #7228 from RasmusWL/fastapi-improvements
Python: FastAPI improvements
2021-12-02 12:58:53 +01:00
yoff
4609b2060a Merge pull request #7217 from RasmusWL/more-path-injection-fps
Python: Add `x in <var>` test for StringConstCompare
2021-12-02 12:35:33 +01:00
github-actions[bot]
87b968f337 Post-release preparation 2.7.3 2021-12-02 00:46:55 +00:00
Anders Schack-Mulligen
cde853c095 Merge pull request #7270 from aschackmull/dataflow/stage2-refactor
Dataflow: Stage 2 refactor
2021-12-01 11:09:08 +01:00
Tom Hvitved
e410244fe0 Python: Implement ParameterPosition et al 2021-12-01 08:51:22 +01:00
github-actions[bot]
337ce65fe5 Release preparation for version 2.7.3 2021-11-30 20:39:35 +00:00