Commit Graph

2489 Commits

Author SHA1 Message Date
Rasmus Wriedt Larsen
34c5da563e Python: Move files in experiemntal dirs to be consistent
Except for dataflow (where we have a lot of changes, and I don't want to
introduce lots of merge conflicts right now).
2020-09-02 13:39:01 +02:00
Rasmus Wriedt Larsen
9c8b829d65 Python: Fix formatting 2020-09-02 13:27:35 +02:00
Calum Grant
29b3759655 Merge pull request #3961 from tausbn/python-add-typetracker
Python: Add type tracker and step summary implementation.
2020-09-02 09:42:14 +01:00
Rasmus Wriedt Larsen
ab06c459f4 Python: Make validTest error on empty output again
I accidentially disabled that when introducing the ability to handle more than
one OK.
2020-09-01 14:42:11 +02:00
Rasmus Wriedt Larsen
0cc018fec0 Python: Taint tracking setup alá Go
\## TaintFlow sources

The class `RemoteFlowSource` is very similarly defined as the other languages [C++](ac22e7950c/cpp/ql/src/semmle/code/cpp/security/FlowSources.qll), [Java](6de612a566/java/ql/src/semmle/code/java/dataflow/FlowSources.qll), [C#](fddbce0b7b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll), [JS](78334af354/javascript/ql/src/semmle/javascript/security/dataflow/RemoteFlowSources.qll), and [Go](24b3133e0c/ql/src/semmle/go/security/FlowSources.qll). There are some minor differences:

- Java/C++ defines the class in `FlowSources.qll`
- C# uses `csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll`, and provide `StoredFlowSource` and `LocalFlowSource` in separate classes.
- JS uses `RemoteFlowSources.qll`.
- JS defines additional predicate `RemoteFlowSource.isUserControlledObject`
- Go uses the class name `UntrustedFlowSource`, but still defined in `ql/src/semmle/go/security/FlowSources.qll`
- Go uses the `::Range` pattern to allow both extensibility and refinement

The big difference is how a RemoteFlowSource is specified:

- Java and C# have all subclasses of `RemoteFlowSource` defined in the same file
- Go and JS defines subclasses for frameworks in the actual framework `.qll` file, and all frameworks are transitively imported by `import go` or `import javascript` (so subclasses are always in scope).
- C++ uses class `RemoteFlowFunction` to do all the heavy lifting (and its subclasses are transitively imported).

\### What we will do

Use file `RemoteFlowSource.qll`, define subclasses in framework library classes.

_Why? Personally I really like it, Go/JS is already doing it, and Tom expressed a preference for doing the same for C# (although that is not what they are doing today)._

Jonas gave this advice:
> Whether you split the definitions between multiple files or keep them all in one file, the property you want is that all definitions are included when the abstract class is included. Otherwise you can get unexpected results via transitive includes.

We will make imports of all frameworks in the same file that defines `RemoteFlowSource`, as it seems to be the least intrusive change. If that turns out to be a problem, we can also move them to `python.qll` (the other way is not so easy).

\## TaintFlow sinks

[JS](473787a426/javascript/ql/src/semmle/javascript/Concepts.qll) and [Go](ecff1e6a16/ql/src/semmle/go/Concepts.qll) defines abstract base classes for interesting sinks in `Concepts.qll` (and all uses the `::Range` pattern in Go).

I really like this idea, since it allows multiple queries to reuse the same sink definitions, and it makes it _easy_ to discover what default sinks are available.

Personally I'm not 100% on board with the naming, but I don't have any good reason to change the naming convention.

\## Framework modeling

Following the model from Go ([example](https://github.com/github/codeql-go/blob/main/ql/src/semmle/go/frameworks/Gin.qll)), I propose that we make every definition in a framework modeling `private`. This allows some greater flexibility in changing our modeling, since we don't need to think about keeping deprecated versions around for a whole year.

It _does_ have the downside that someone writing a query can't reuse the classes/predicates for a framework, but it didn't seem to be too big of a concern. If we need to provide access, we can always make the definitions non-private (the other way is not so easy).

\## Customizations

Also introduced `Customizations.qll` like in JS/Java/Go (to replace `site.qll`)
2020-09-01 14:37:11 +02:00
Taus Brock-Nannestad
6a96c53d15 Python: Add missing getNode invocation 2020-09-01 14:04:31 +02:00
Rasmus Lerchedahl Petersen
8b13a429b7 Python: Address review comments 2020-09-01 14:00:41 +02:00
Taus Brock-Nannestad
26d14aba98 Python: Use nodeFrom/nodeTo instead of pred/succ 2020-09-01 14:00:30 +02:00
Arthur Baars
aedfa47cb4 Add missing QHelp files 2020-09-01 12:46:57 +02:00
Rasmus Wriedt Larsen
c5e3333d10 Python: Update expected tests after last commit
I'm pushing too fast it seems
2020-09-01 12:01:34 +02:00
Rasmus Wriedt Larsen
e0cfe8123e Python: Update comments for new taint tests
I see I didn't keep them up to date as I implemented things
2020-09-01 11:58:26 +02:00
Rasmus Lerchedahl Petersen
6d23d7fa0e Python: Test that pointsTo implies data flow
Running the test on a larger database gives some interesting results.
2020-09-01 11:56:22 +02:00
Rasmus Wriedt Larsen
cda88a5e64 Python: Refactor: use DataFlow::Node.asExpr() 2020-09-01 11:53:06 +02:00
Rasmus Wriedt Larsen
ddc55a18cf Python: Fix taint handling of copy.deepcopy
(test results didn't change)

Thanks @yoff 👍
2020-09-01 11:50:46 +02:00
Rasmus Wriedt Larsen
e5a361c230 Python: Better taint tests for copy.deepcopy 2020-09-01 11:50:33 +02:00
Taus Brock-Nannestad
ec64606d5a Python: Remove CopyStep branch type 2020-08-31 17:23:02 +02:00
Taus Brock-Nannestad
eb6443df21 Merge branch 'python-add-typetracker' of github.com:tausbn/ql into python-add-typetracker 2020-08-31 17:22:13 +02:00
Taus
8e1f99af99 Python: Apply suggestions from code review
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2020-08-31 17:20:12 +02:00
Taus Brock-Nannestad
3547c70d35 Python: Add tests with redefinition of fields/variables 2020-08-31 17:17:37 +02:00
Taus Brock-Nannestad
06103f4ff2 Python: Consistently use attribute/attr 2020-08-31 17:16:31 +02:00
Rasmus Wriedt Larsen
cf2eacd7a6 Python: Adjust additional taint after PostUpdateNode addition
Still no results though :(
2020-08-31 14:59:29 +02:00
Rasmus Wriedt Larsen
4e73abc254 Merge branch 'main' into python-more-additional-taint-steps 2020-08-31 14:34:42 +02:00
Rasmus Lerchedahl Petersen
5f3eda0a22 Python: Annotate test file
Also add test of custom flow
2020-08-31 09:06:13 +02:00
Taus Brock-Nannestad
7108d28395 Python: Remove failing non-inline test
It is subsumed by `tracked.ql` anyway.
2020-08-28 21:21:29 +02:00
Taus Brock-Nannestad
5d853e840a Merge branch 'main' into python-add-typetracker 2020-08-28 19:59:58 +02:00
Taus Brock-Nannestad
8b78b6b1dc Python: Add inline tests
Nodes to which we track type tracking flow from the source (any
identifier named `tracked`) are indicated with a `$tracked` tag, and
`$tracked=attr_name` if the attribute is for the specified attribute
of the given node.

For nodes that do have flow from `tracked`, I indicate this in one of
two ways:

- If it's expected due to the design of type tracking, I omit the
  `$tracked tag.
- If it's flow that _ought_ to be there, I indicate it as a false
  negative: `$f-:tracked`

Currently, only an instance of global flow is in the latter category.
2020-08-28 19:55:52 +02:00
Taus Brock-Nannestad
fbe8b64dd4 Python: Add support for attribute reads and writes 2020-08-28 19:55:14 +02:00
Rasmus Lerchedahl Petersen
6b8d9f2a77 Merge branch 'main' of github.com:github/codeql into SharedDataflow_PostUpdateNodes 2020-08-28 13:01:14 +02:00
Rasmus Lerchedahl Petersen
9503c5d8bb Python: Add post-update nodes 2020-08-28 12:59:11 +02:00
Rasmus Wriedt Larsen
2d2b036b8c Python: Fix expected output for moved taint tests 2020-08-28 11:25:46 +02:00
Rasmus Wriedt Larsen
7213da195c Python: Use standard naming scheme for taint flow tests
We got into problems since using `string.py` would shadow the string module from
the standard library. By some reason I adopted a pattern of `_` as suffix, but
let us just use the standard pattern of `test_` prefix like a normal testing
framework like pytest does.
2020-08-28 11:22:42 +02:00
Rasmus Wriedt Larsen
621e3f6c3c Python: Add dataflow test of deep call graph 2020-08-28 11:17:23 +02:00
Rasmus Wriedt Larsen
45ab723423 Python: Add dataflow test for a,b = b,a
Also enables a single test to output more than one OK
2020-08-28 11:12:25 +02:00
Rasmus Wriedt Larsen
496d856c48 Python: Reformualte explanation of experience from JS 2020-08-28 10:49:33 +02:00
Taus
1206ff5889 Merge pull request #4150 from RasmusWL/python-dataflow-private-import
Python: Make import of python private in shared dataflow
2020-08-27 18:05:55 +02:00
Rasmus Wriedt Larsen
f12d29de07 Python: Add taint test of more colleciton methods 2020-08-27 17:36:10 +02:00
Taus Brock-Nannestad
7112aa2e9a Merge branch 'main' into python-add-typetracker 2020-08-27 17:05:26 +02:00
Rasmus Wriedt Larsen
654c4f39ac Python: Add missing module.py to consistency/regression tests 2020-08-27 16:32:26 +02:00
Rasmus Wriedt Larsen
f1e11f1efd Python: updated expected output from new shared dataflow tests
I did not verify whether these changes are OK or not, simply ran and accepted
the tests.
2020-08-27 16:17:12 +02:00
Rasmus Wriedt Larsen
b11b5784b2 Python: Adtop more complete tests from old dataflow impl
The ones in test/experimental/dataflow/[consistency,regression]/test.py was a
copy from test/library-tests/taint/dataflow/test.py.

However, test/library-tests/taint/dataflow/test.py only contains a subset of
test/library-tests/taint/config/test.py, that only contains a subset of
test/library-tests/taint/general/test.py

This commit updates the experimental dataflow tests to be a copy of the
test/library-tests/taint/general/test.py file.

There seems to have been a few changes to the file after it being copied, in
`test_truth` and `test_early_exit`. I have no reproduced those changes.
2020-08-27 16:08:51 +02:00
Taus Brock-Nannestad
797e290a67 Python+CPP: Change values to value 2020-08-27 14:12:40 +02:00
Taus Brock-Nannestad
dccbcc15b3 Python: Sync InlineExpectationsTest.qll between Python and C++
Also changes `valuesasas` to `values` in the test example.
2020-08-27 13:37:26 +02:00
Rasmus Wriedt Larsen
9da6da6106 Python: Fix imports in shraed dataflow tests 2020-08-27 13:29:41 +02:00
Taus
e7322d114f Merge pull request #4077 from yoff/MagicMethods
Python: Add support for magic methods
2020-08-27 13:20:56 +02:00
Taus
d3175a7899 Merge pull request #4110 from yoff/SharedDataflow_ParsimoniousFlowNodes
Python: Shared dataflow, parsimonious flow nodes
2020-08-27 13:19:23 +02:00
CodeQL CI
30ac2f9c84 Merge pull request #4143 from tausbn/python-add-inline-test-expectations-library
Approved by RasmusWL
2020-08-27 12:18:41 +01:00
Rasmus Wriedt Larsen
909bff2313 Python: Make import of python private in shared dataflow 2020-08-27 11:48:56 +02:00
Rasmus Wriedt Larsen
627363d6ea Python: Test taint step for string augmented assignment
Apprently it just works 😕 :magic:
2020-08-27 11:37:56 +02:00
Rasmus Wriedt Larsen
569e54e7bb Python: Remove symlink from experimental test 2020-08-27 11:19:55 +02:00
Rasmus Wriedt Larsen
d0081dfbfa Python: Attempt at taint step for list.append/set.add 2020-08-27 10:57:07 +02:00