Commit Graph

4539 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
deb3db3f95 Python: Add non-alert data for extractor diagnostics
This is basically just a port of the C++/JS queries added in:

- https://github.com/github/codeql/pull/5414 (C++)
- https://github.com/github/codeql/pull/5656 (JS)

SyntaxError should capture all errors we have information about. At least in
`python/ql/src/semmlecode.python.dbscheme` the only match for `error` is
`py_syntax_error_versioned` (which `SyntaxError` is based on).
2021-04-23 13:29:44 +02:00
Rasmus Wriedt Larsen
354dee1b09 Python: Add non-alert data for lines of code
`py/summary/lines-of-code` is just a port of the C++/JS queries added in:

- https://github.com/github/codeql/pull/5271 (C++)
- https://github.com/github/codeql/pull/5304 (JS)

We are the first to implement the `lines-of-user-code` query, so nothing to
compare with in other languages -- but it makes a lot of sense to do for Python 👍
2021-04-23 13:22:18 +02:00
yoff
1954c0ba84 Apply suggestions from code review
Co-authored-by: Taus <tausbn@github.com>
2021-04-23 10:20:18 +02:00
Rasmus Wriedt Larsen
f9383a31bf Python: Fix BrokenCryptoAlgorithm.qhelp 2021-04-22 15:58:28 +02:00
Rasmus Wriedt Larsen
222c087e8c Python: Remove type-tracking performance workaround
Since we shouldn't need it anymore (yay)
2021-04-22 15:31:49 +02:00
Rasmus Wriedt Larsen
b82209964a Python: Add change-note for new weak crypto queries 2021-04-22 15:23:42 +02:00
Rasmus Wriedt Larsen
fc1a6d0e32 Python: Say salting is not part of py/weak-sensitive-data-hashing 2021-04-22 15:23:41 +02:00
Rasmus Wriedt Larsen
ac83c695ad Python: Add py/weak-sensitive-data-hashing query 2021-04-22 15:23:41 +02:00
Rasmus Wriedt Larsen
499adc26a3 Python: Extend SensitiveDataSource tests
Now it contains all the sort of things we actually support 👍
2021-04-22 15:23:40 +02:00
Rasmus Wriedt Larsen
794a86a6b0 Python: Add SensitiveDataSource 2021-04-22 15:23:39 +02:00
Rasmus Wriedt Larsen
56c409737d Python: Port py/weak-cryptographic-algorithm
The other query (py/weak-sensitive-data-hashing) is added in future commit
2021-04-22 15:23:38 +02:00
Rasmus Wriedt Larsen
59edd18c34 Python: Move framework test-files out of experimental
This PR was rebased on newest main, but was written a long time ago when all the
framework test-files were still in experimental. I have not re-written my local
git-history, since there are MANY updates to those files (and I dare not risk
it).
2021-04-22 15:23:37 +02:00
Rasmus Wriedt Larsen
1616975e06 Python: Model hashlib from standard library 2021-04-22 15:23:37 +02:00
Rasmus Lerchedahl Petersen
5a4e661e60 Merge branch 'main' of github.com:github/codeql into python-support-pathlib 2021-04-22 15:04:21 +02:00
Rasmus Wriedt Larsen
7ffbfa8043 Python: Expand stdlib md5 tests with keyword-arguments 2021-04-22 14:51:20 +02:00
Rasmus Wriedt Larsen
fa88f22453 Python: Model hashing operations in cryptography package 2021-04-22 14:51:20 +02:00
Rasmus Wriedt Larsen
c5f826580b Python: Model encrypt/decrypt in cryptography package
I introduced a InternalTypeTracking module, since the type-tracking code got so
verbose, that it was impossible to get an overview of the relevant predicates.
(this means the "first" type-tracking predicate that is usually private, cannot
be marked private anymore, since it needs to be exposed in the private module.
2021-04-22 14:51:19 +02:00
Rasmus Wriedt Larsen
bf6f5074c2 Python: Port cryptodome tests to crypto
I don't know if this is really a smart test-setup... I feel a bit stupid when
doing this xD
2021-04-22 14:51:19 +02:00
Rasmus Wriedt Larsen
f8254381f3 Python: Add MISSING: CryptographicOperationAlgorithm annotations
For RSA it's unclear what the algorithm name should even be. Signatures based on
RSA private keys with PSS scheme is ok, but with pkcs#1 v1.5 they are
weak/vulnerable. So clearly just putting RSA as the algorithm name is not enough
information...

and that problem is also why I wanted to do this commit separetely (to call
extra atten to this).
2021-04-22 14:51:18 +02:00
Rasmus Wriedt Larsen
23140dfb76 Python: Add CryptographicOperation modeling for Cryptodome 2021-04-22 14:51:17 +02:00
Rasmus Wriedt Larsen
1b2ed9d99a Python: Align cryptodome tests 2021-04-22 14:51:16 +02:00
Rasmus Wriedt Larsen
2c0df8e656 Python: Add MD5 tests 2021-04-22 14:51:16 +02:00
Rasmus Wriedt Larsen
a8de2aba3b Python: Move CryptoAlgorithms implementation 2021-04-22 14:51:15 +02:00
Rasmus Wriedt Larsen
65c8d9605e Python: Add CryptographicOperation Concept
I considered using `getInput` like in JS, but things like signature verification
has multiple inputs (message and signature).

Using getAnInput also aligns better with Decoding/Encoding.
2021-04-22 14:51:14 +02:00
Rasmus Wriedt Larsen
d18fbb7f07 Python: Add working tests of AES and RC4 2021-04-22 14:51:14 +02:00
Rasmus Wriedt Larsen
cf64701bcb Python: Move weak-crypto-algorithm tests to own folder 2021-04-22 14:51:13 +02:00
Rasmus Lerchedahl Petersen
b724e51cab Python: Improvements from review suggestions 2021-04-22 10:40:42 +02:00
Rasmus Wriedt Larsen
5a9e27c6fc Merge branch 'main' into django-3.2 2021-04-21 17:15:47 +02:00
CodeQL CI
30d7f0dc98 Merge pull request #5687 from RasmusWL/inline-taint-tests
Approved by yoff
2021-04-21 06:24:12 -07:00
Taus
71780228ae Python: Rename TypeTrackerPrivate.qll 2021-04-21 13:08:26 +00:00
Rasmus Wriedt Larsen
be9cbd79d6 Python: Add change-note for Django 3.2 support 2021-04-21 13:58:34 +02:00
Rasmus Wriedt Larsen
59c6f76457 Python: Add test for new response.headers in Django
See https://docs.djangoproject.com/en/3.2/ref/request-response/#setting-header-fields
2021-04-21 13:55:22 +02:00
Rasmus Wriedt Larsen
2302c8d5fa Python: Model new alias method on django QuerySets 2021-04-21 13:52:38 +02:00
yoff
a19373ab54 Merge pull request #5727 from tausbn/python-use-localsource-in-stepsummary
Python: Use `LocalSourceNode` in `StepSummary::step`
2021-04-21 13:50:31 +02:00
Taus
489e1e94e4 Python: Prevent bad joins
Adds a few unbinds to prevent bad joins from occurring.

Firstly, we never want to join `StepSummary::step` with
`TypeTracker::append` on `summary` as the first join, as the resulting
relation is absolutely massive. So we decouple the two occurrences of
`summary` by unbinding each of them.

Secondly, in some cases the node we're stepping to (`nodeTo` for type
trackers, `nodeFrom` for type backtrackers) will get joined eagerly
with the typetracker one is defining, and again this produces an
uncomfortably large intermediate join. A bit of unbinding prevents this
as well.
2021-04-21 11:44:34 +00:00
Taus
9e95f6e7c1 Python: Remove typePreservingStep
This requires a bit of explanation, so strap in.

Firstly, because we use `LocalSourceNode`s as the start and end points
of our `StepSummary::step` relation, there's no need to include
`simpleLocalFlowStep` (via `typePreservingStep`) in `smallstep`. Indeed,
since the successor node for a `step` is a `LocalSourceNode`, and local
sources never have incoming flow, this is entirely futile -- we can find
values for `mid` and `nodeTo` that satisfy the body of `step`, but
`nodeTo` will never be a `LocalSourceNode`.

With this in mind, we can simplify `smallstep` to only refer to
`jumpStep`.

This then brings the other uses of `typePreservingStep` into question.
The only other place we use this predicate is in the `TypeTracker` and
`TypeBackTracker` `smallstep` predicates. Note, however, that here we
no longer need `jumpStep` to be part of `typeTrackingStep` (as it is
already accounted for in `StepSummary::smallstep`) so we can simplify
to `simpleLocalFlowStep`. At this point, `typePreservingStep` is unused.

Finally, because of the way `smallstep` is used in `step` (inside
`StepSummary`), `nodeTo` must always be a `LocalSourceNode`, so I have
propagated this restriction to `smallstep` as well. We can always lift
this restriction later, but for now it seems like it's likely to cause
fewer surprises to have made this explicit.
2021-04-21 11:12:06 +00:00
Rasmus Wriedt Larsen
775ed41592 Python: Update SensitiveDataHeuristics with newer JS version
which also prompted me to rewrite the QLDoc for `nameIndicatesSensitiveData`
2021-04-21 11:34:01 +02:00
Rasmus Wriedt Larsen
16b62486e9 Python: Extract SensitiveDataHeuristics to be shared with JS
Initially I had called `nameIndicatesSensitiveData` for `maybeSensitiveName`,
which made the relationship with `maybeSensitive` and `notSensitive` quite
strange -- and therefore I added the more informative `maybeSensitiveRegexp` and
`notSensitiveRegexp`.

Although I'm no longer using `maybeSensitiveName`, and I no longer have a strong
argument for making this name change, I still like it. If someone thinks this is
a terrible idea, I'm happy to change it though 👍
2021-04-21 11:31:28 +02:00
Rasmus Wriedt Larsen
63a2657aef Merge branch 'main' into inline-taint-tests 2021-04-21 10:02:55 +02:00
yoff
0c4181178d Update python/ql/src/semmle/python/frameworks/Stdlib.qll
Co-authored-by: Taus <tausbn@github.com>
2021-04-20 22:15:09 +02:00
yoff
ef0ea247c4 Merge pull request #5679 from tausbn/python-fix-bad-points-to-joins
Python: Fix bad points-to joins
2021-04-20 21:19:32 +02:00
Rasmus Lerchedahl Petersen
6408ee2eaf Python: Fix bad join 2021-04-20 20:03:06 +02:00
Rasmus Lerchedahl Petersen
fc2c62350e Python: Fix bad join
Also fixed up the QLDoc
2021-04-20 18:54:03 +02:00
Taus
890f96d9b5 Python: Prevent bad joins in TypeBackTracker
Perhaps unsurprisingly, the join orderer was eager and willing to find
the wrong join order in this predicate as well. Applying a similar
fix to the one used in `TypeTracker::step` fixes the problem.
2021-04-20 15:01:04 +00:00
Taus
c0569da65c Python: Move track/backtrack to LocalSourceNode
This is merely making explicit what was implicitly enforced. The move
to change the return type of `step` already meant that `this` and
`result` had to be `LocalSourceNode`. By moving these methods to their
rightful place, we should hopefully avoid a bit of suprising behaviour.
2021-04-20 14:39:56 +00:00
Taus
2a07441c19 Python: ModuleVariableNodes are not API uses
This caused some suprising test changes, where suddenly we had flow from
a `ModuleVariableNode` (as a `RemoteFlowSource`) to a sink. This of
course makes little sense, so instead we simply exclude these nodes as
uses in the first place.
2021-04-20 14:33:42 +00:00
Rasmus Lerchedahl Petersen
9c893cb0f4 Merge branch 'main' of github.com:github/codeql into python-port-insecure-protocol 2021-04-20 16:33:03 +02:00
Taus
7581cbade6 Python: Fix forgotten type tracker
This was the last remaining type tracker that did not use
`LocalSourceNode`.
2021-04-20 14:32:56 +00:00
Taus
38548c9acd Python: Simplify charpred for LocalSourceNode
The somewhat convoluted `comes_from_cfgnode` was originally introduced
in order to have local sources for instances of global variables. This
was needed because global variables have an implicit "scope entry" SSA
definition that flows to the first actual use of the variable (and so
would not fit the strict "has no incoming flow" definition of a local
source node).

However, a subsequent change means that we include all global variable
reads anyway, and so the old definition is no longer needed.

(See commit 3fafb47b16 for further
context.)
2021-04-20 13:19:36 +00:00
Taus
038bf612be Python: Add change note 2021-04-20 13:06:30 +00:00