Commit Graph

726 Commits

Author SHA1 Message Date
yoff
4445cf152a python: various fixes
- compilation
- alerts
- some review comments
2022-05-11 12:28:58 +00:00
yoff
f67be52b99 python: fix compilation
by making client code use the "new" class.
Really, this part of the split class should have the old name,
to minimise disruptions to clients.
Same goes for the other split classes.
2022-05-10 12:53:13 +00:00
yoff
db008f1939 python: summaries may allowParameterReturnInSelf 2022-05-10 12:48:42 +00:00
yoff
238c578f5a python: Add LocalSourceParameterNode
This can be used when one wants to consider a
(source) parameter node as a local source.
2022-05-10 12:48:42 +00:00
yoff
28b239a9a4 python: add qldoc 2022-05-10 12:48:42 +00:00
yoff
da3634188d python: variaous fixes
- sync summary files
- format files
- fix compilation
2022-05-10 12:48:42 +00:00
yoff
f14ee0e794 python: Flow summaries based on type tracking
Two classes have been inserted into the hierarchies:

- `NonLibraryDataFlowCallable` with a method `getACall2`.
This method implements "get a call, not considering flow summaries".
For `NonLibraryDataFlowCallable`s, `getACall` will defer to `getACall2`.
While you could have a synthesised call to such a callable,
it would not correspond to a `CallNode`.

- `NonLibraryDataFlowSourceCall` with methods
`getArg2` and `getCallable2`. These also refer to a call graph that
does not consider flow summaries.

`getArg2` is used to synthesise pre-update nodes for arguments.

`getCallable2` is used in `connects` to compute argument passing.
This is used to define data flow nodes for overflow arguments.

`getACall2` ensures that `LibraryCallableValue::getACall` is not called
when the charpred of `FunctionCall` is evaluated.
2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
506efcf051 python: refactor TDataFlowCall
- Branch predicates are made simple. In particular, they do not try to detect library calls.
- All branches based on `CallNode`s are gathered into one.
- That branch has been given a class `NonSpecialCall`, which is the new parent of call classes based on `CallNode`s. (Those classes now have more involved charpreds.)
- A new such class, 'LambdaCall` has been split out from `FunctionCall` to allow the latter to replace its
  general `CallNode` field with a specific `FunctionValue` one.
- `NonSpecialCall` is not an abstract class, but it has some abstract overrides. Therefor, it is not
  considered a resolved call in the test `UnresolvedCalls.qll`.
2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
d85844bb89 python: type tracking uses source nodes 2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
81ca479ca9 Python: local flow for type tracking
summary flow is excluded from the local flow relation used for
typetracking, but included in the one used for global data flow.
2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
177dea5307 python: use new syntax for flow summaries
also convert to inline tests
2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
4024ce4777 python: some summary flows 2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
8c263b349f python: add summary flow steps 2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
828db3a392 python: Add summary nodes
allowing more `OutNode`s (not restricting to `CallNode`s),
gives more flow in the `classesCallGraph` test
2022-05-10 12:48:42 +00:00
Rasmus Lerchedahl Petersen
80175a9af5 Python: Compiles and mostly pass tests
- add flowsummaries shared files
- register in indentical files
- fix initial non-monotonic recursions
  - add DataFlowSourceCall
  - add resolvedCall
  - add SourceParameterNode

failing tests:
- 3/library-tests/with/test.ql
2022-05-10 12:48:42 +00:00
yoff
6c3e2db7fd Merge branch 'main' into python/simple-csrf 2022-05-10 10:55:28 +02:00
yoff
b6605bc330 Merge pull request #8634 from RasmusWL/promote-xxe
Python: Promote XXE and XML-bomb queries
2022-05-09 21:54:55 +02:00
Rasmus Wriedt Larsen
4a6789182d Python: Apply suggestions from code review
Co-authored-by: yoff <lerchedahl@gmail.com>
2022-05-09 16:37:12 +02:00
Anders Schack-Mulligen
f24364d951 Merge pull request #9045 from hvitved/dataflow/subpaths-perf-take2
Data flow: Speedup `subpaths` predicate (take 2)
2022-05-09 15:39:11 +02:00
Rasmus Wriedt Larsen
36349222a9 Python: Fix casing of XMLDomParsing 2022-05-09 11:00:25 +02:00
Rasmus Wriedt Larsen
f22bd039f3 Python: Slight refactor of LxmlParsing 2022-05-09 10:56:39 +02:00
yoff
6169ac6122 Merge pull request #7776 from RasmusWL/django-filefield-uploadto
Python: Support Django FileField.upload_to
2022-05-05 14:25:08 +02:00
Tom Hvitved
d9d5372f28 Data flow: Sync files 2022-05-05 13:36:26 +02:00
yoff
0c7184952b Merge pull request #9023 from RasmusWL/positional-docs
Python: Clarify `getArg` is about positional arguments
2022-05-05 11:28:17 +02:00
Tom Hvitved
66a9759329 Merge pull request #8870 from hvitved/dataflow/expect-content
Data flow: Introduce `expectsContent`
2022-05-05 09:01:40 +02:00
Tom Hvitved
8e33653d25 Merge pull request #9017 from hvitved/dataflow/subpaths-perf
Data flow: Speedup `subpaths` predicate
2022-05-04 16:37:52 +02:00
Tom Hvitved
9cb63c0a5e Data flow: Sync files 2022-05-04 14:49:26 +02:00
Tom Hvitved
74e99302d6 Address review comments 2022-05-04 09:57:59 +02:00
Tom Hvitved
da72ba46d4 Data flow: Add stub expectsContent for all languages 2022-05-04 09:57:59 +02:00
Tom Hvitved
6e2e8440eb Data flow: Sync files 2022-05-04 09:57:59 +02:00
Rasmus Wriedt Larsen
d012eaa892 Python: Clarify getArg is about positional arguments 2022-05-03 14:26:23 +02:00
yoff
56ed68b3eb Merge pull request #9001 from RasmusWL/files-refactoring
Python: Flask: Improve `request.files` modeing
2022-05-03 12:19:55 +02:00
Tom Hvitved
e9c8f979f9 Data flow: Sync files 2022-05-03 11:46:51 +02:00
Rasmus Wriedt Larsen
de4390cdf6 Python: Improve Flask request.files handling even more 2022-05-02 14:19:45 +02:00
Rasmus Wriedt Larsen
fb0133d276 Python: Fix Flask request.files modeling 2022-05-02 14:14:58 +02:00
yoff
1d44694280 Merge pull request #8732 from RasmusWL/dataflow-imports
Python: Don't re-export `python` under `DataFlow::`
2022-05-02 12:08:28 +02:00
Taus
231def026f Merge pull request #8890 from tausbn/python-add-global-attribute-writes
Python: Add support for global attribute writes
2022-05-02 12:03:41 +02:00
Rasmus Wriedt Larsen
714465bf39 Python: Refactor SaxParserSetFeatureCall
Originally made by @erik-krogh in
https://github.com/github/codeql/pull/8693/files#diff-9627c1fb9a1cc77fb93e6b7e31af1a4fa908f2a60362cfb34377d24debb97398

Could not be applied directly to this PR, since this PR deletes the file.
2022-05-02 11:29:54 +02:00
Rasmus Wriedt Larsen
5f01fc24e4 Merge branch 'main' into promote-xxe 2022-05-02 11:25:55 +02:00
Taus
95d235416c Python: Fix bad antijoin in getAKeyword
Before:

```
Tuple counts for Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs/3@7bc202ij after 9s:
1        ~0%     {1} r1 = CONSTANT(unique int)[2]
4244385  ~2%     {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0'
4244352  ~3%     {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg1', Lhs.0 'arg0', Rhs.2 'arg2'
66618690 ~3%     {5} r4 = JOIN r3 WITH AstGenerated::Call_::getNamedArg_dispred#ffb ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Lhs.2 'arg2', Rhs.1, Rhs.2
31187133 ~0%     {5} r5 = SELECT r4 ON In.3 < In.2 'arg2'
31187133 ~1%     {5} r6 = SCAN r5 OUTPUT In.4, 0, In.0 'arg0', In.1 'arg1', In.2 'arg2'
0        ~0%     {3} r7 = JOIN r6 WITH py_dict_items ON FIRST 2 OUTPUT Lhs.2 'arg0', Lhs.3 'arg1', Lhs.4 'arg2'
                 return r7

Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@1dc9468b after 421ms:
1       ~0%     {1} r1 = CONSTANT(unique int)[2]
4244385 ~2%     {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'
4244352 ~0%     {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
4244352 ~0%     {3} r4 = r3 AND NOT Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs(Lhs.0 'result', Lhs.1 'this', Lhs.2)
4244352 ~6%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'
                return r5
```

Oof. All that work to produce zero tuples. Luckily we can improve
matters somewhat.

Basically, there's no reason to test _all_ dictionary unpackings, since
we're only interested in a lower bound. Thus, we can use `min` instead
which is much more efficient. For convenience I factored this into its
own (private) helper predicate.

Now the tuple counts look as follows:

```
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm after 1ms:
246 ~0%     {2} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1'
            return r1
Registering Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm +  with content 9ea2f123k8necpu015v6tpsc2t1
 >>> Created relation Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm with 246 rows.
Starting to evaluate predicate Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8 after 0ms:
246 ~2%     {3} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg2', Rhs.2 'arg2'
            return r1

Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@000a0alb after 906ms:
1       ~0%     {1} r1 = CONSTANT(unique int)[2]
4244385 ~2%     {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'

4244352 ~0%     {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
4244280 ~0%     {3} r4 = r3 AND NOT Exprs::Call::getMinimumUnpackingIndex_dispred#ff_0#antijoin_rhs(Lhs.1 'this')
4244280 ~6%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'

4244352 ~3%     {3} r6 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'this', Lhs.0 'result', Rhs.2
72      ~4%     {4} r7 = JOIN r6 WITH Exprs::Call::getMinimumUnpackingIndex_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'result', Lhs.0 'this', Lhs.2, Rhs.1
72      ~4%     {4} r8 = SELECT r7 ON In.2 <= In.3
72      ~0%     {2} r9 = SCAN r8 OUTPUT In.1 'this', In.0 'result'

4244352 ~6%     {2} r10 = r5 UNION r9
                return r10
```

This is not the perfect join order (note the similarity between `r3`
and `r6`) but overall it's a win.
2022-04-28 11:11:37 +00:00
Taus
80ef09f034 Python: Fix bad join in declaredAttributeVar
Before:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@99d5aenq after 1.1s:
451054   ~7%     {2} r1 = SCAN variable OUTPUT In.0, In.2 'name'
1296149  ~0%     {2} r2 = JOIN r1 WITH Essa::EssaVariable::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'name'
12179900 ~4%     {3} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.0 'var'
8028     ~2%     {3} r4 = JOIN r3 WITH Scope::Scope::getANormalExit_dispred#bf_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.2 'var'
8028     ~2%     {3} r5 = JOIN r4 WITH Classes::PythonClassObjectInternal::getScope_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'var'
                 return r5
```

After:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@cccf36hb after 4ms:
1450 ~0%     {2} r1 = SCAN Classes::PythonClassObjectInternal::getScope_dispred#ff OUTPUT In.1, In.0 'cls'
1450 ~7%     {2} r2 = JOIN r1 WITH Scope::Scope::getANormalExit_dispred#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls'
8028 ~0%     {2} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'cls'
8028 ~0%     {3} r4 = JOIN r3 WITH Essa::EssaVariable::getSourceVariable_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls', Lhs.0 'var'
8028 ~2%     {3} r5 = JOIN r4 WITH variable ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.2 'name', Lhs.2 'var'
return r5
```
2022-04-28 11:11:37 +00:00
Taus
d28f9f41e8 Python: Fix bad join in import_star_read
Makes this

```
(21s) Tuple counts for DataFlowPublic::import_star_read#ff/2@fcd5e6nr after 8.5s:
9743      ~6%     {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
9743      ~1%     {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
390808917 ~3%     {3} r3 = JOIN r2 WITH Flow::NameNode::getId_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
307       ~0%     {2} r4 = JOIN r3 WITH ImportStar::ImportStar::importStarResolvesTo#ff ON FIRST 2 OUTPUT Lhs.0, Lhs.2 'result'
307       ~0%     {2} r5 = JOIN r4 WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Rhs.1 'n', Lhs.1 'result'
                  return r5
```

become this

```
(17s) Tuple counts for DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic after 0ms:
307 ~0%     {3} r1 = JOIN ImportStar::ImportStar::importStarResolvesTo#ff WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'm', Rhs.1 'n'
307 ~0%     {3} r2 = JOIN r1 WITH Flow::NameNode::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'm', Rhs.1 'name', Lhs.2 'n'
            return r2
(17s) Registering DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic +  with content f29281ig38r98icro4ege09mrva
(17s)  >>> Created relation DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic with 307 rows.
(17s) Starting to evaluate predicate DataFlowPublic::import_star_read#ff/2@57b0c06e
(17s) Tuple counts for DataFlowPublic::import_star_read#ff/2@57b0c06e after 2ms:
9743 ~0%     {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
9743 ~0%     {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2 'result'
307  ~0%     {2} r3 = JOIN r2 WITH DataFlowPublic::resolved_import_star_module#fff ON FIRST 2 OUTPUT Rhs.2 'n', Lhs.2 'result'
             return r3
```
2022-04-28 11:11:37 +00:00
yoff
4553a0913f Merge pull request #8897 from tausbn/python-fix-bad-methodcallsite-join
Python: Fix bad join in `MethodCallsiteRefinement`
2022-04-28 12:17:33 +02:00
Taus
b4a31e572f Python: Add global attribute writes 2022-04-27 16:45:00 +00:00
yoff
39753d5a0b Merge pull request #8693 from erik-krogh/pyApi
PY: more API-graphs refactorings
2022-04-27 13:19:50 +02:00
Taus
d3a05b8b7e Python: Fix bad join in MethodCallsiteRefinement
Observed on `FreeCAD/FreeCAD`:

```
Tuple counts for Essa::MethodCallsiteRefinement#24e22a14#f/1@274967ic after 34.5s:
638284     ~0%     {2} r1 = SCAN Essa::TEssaNodeRefinement#24e22a14#ffff OUTPUT In.0, In.3 'this'
636521     ~0%     {2} r2 = r1 AND NOT Essa::SingleSuccessorGuard#class#24e22a14#f(Lhs.1 'this')
1579493668 ~0%     {2} r3 = JOIN r2 WITH SsaDefinitions::SsaSource::method_call_refinement#9197156e#fff ON FIRST 1 OUTPUT Lhs.1 'this', Rhs.2
266673     ~3%     {1} r4 = JOIN r3 WITH Essa::EssaNodeRefinement::getDefiningNode#dispred#f0820431#ff ON FIRST 2 OUTPUT Lhs.0 'this'
                    return r4
```

After a bit of unbinding, we have:

```
Tuple counts for Essa::MethodCallsiteRefinement#24e22a14#f/1@d73d8e27 after 66ms:
215168 ~1%     {2} r1 = SCAN Definitions::SsaSourceVariable#class#486534ab#f OUTPUT In.0, In.0
283965 ~2%     {2} r2 = JOIN r1 WITH SsaDefinitions::SsaSource::method_call_refinement#9197156e#fff ON FIRST 1 OUTPUT Rhs.2, Lhs.1
401274 ~0%     {2} r3 = JOIN r2 WITH Essa::EssaNodeRefinement::getDefiningNode#dispred#f0820431#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1 'this'
266671 ~2%     {1} r4 = JOIN r3 WITH Essa::TEssaNodeRefinement#24e22a14#ffff_03#join_rhs ON FIRST 2 OUTPUT Lhs.1 'this'
266671 ~2%     {1} r5 = r4 AND NOT Essa::SingleSuccessorGuard#class#24e22a14#f(Lhs.0 'this')
                return r5
```
(I'm somewhat confused about the slight difference in tuples, but it's
probably just because the compiler moved some stuff around.)
2022-04-27 11:13:37 +00:00
Erik Krogh Kristensen
e1c7d369be Merge pull request #8796 from erik-krogh/redundantImport
Remove redundant imports
2022-04-27 12:39:51 +02:00
yoff
9d774463f5 Merge pull request #8859 from tausbn/python-fix-bad-essa-joins
Python: Fix a bunch of bad joins
2022-04-27 12:27:50 +02:00
Erik Krogh Kristensen
d389012b75 Merge branch 'main' into redundantImport 2022-04-26 14:24:51 +02:00
Taus
b2cc91369a Python: Fix bad join in firstUse
This was what it looked like (at the point when I killed the evaluation):

```
Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#c5fa2be7#ff/2@i1#be98bwif after 1m50s:
274000     ~7%     {4} r1 = SCAN SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#c5fa2be7#ffff OUTPUT In.1, In.0 'def', In.2, In.3
2731768000 ~1%     {7} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::variableSourceUse#c5fa2be7#ffff ON FIRST 1 OUTPUT Rhs.0, Lhs.2, Lhs.3, Rhs.2, Rhs.3, Rhs.1 'use', Lhs.1 'def'
178000     ~4%     {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#c5fa2be7#fffff ON FIRST 5 OUTPUT Lhs.6 'def', Lhs.5 'use'
                    return r3
```

And this is what it looks like now:

```
Tuple counts for SsaCompute::SsaComputeImpl::AdjacentUsesImpl::firstUse#c5fa2be7#ff/2@i1#f9d6ewsi after 207ms:
931353  ~2%     {4} r1 = SCAN SsaCompute::SsaComputeImpl::AdjacentUsesImpl::variableSourceUse#c5fa2be7#ffff OUTPUT In.0, In.2, In.3, In.1 'use'
1050477 ~0%     {4} r2 = JOIN r1 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::adjacentVarRefs#c5fa2be7#fffff_03412#join_rhs ON FIRST 3 OUTPUT Lhs.0, Rhs.3, Rhs.4, Lhs.3 'use'
506626  ~0%     {2} r3 = JOIN r2 WITH SsaCompute::SsaComputeImpl::AdjacentUsesImpl::definesAt#c5fa2be7#ffff_1230#join_rhs ON FIRST 3 OUTPUT Rhs.3 'def', Lhs.3 'use'
                return r3
```
2022-04-25 14:33:31 +00:00