Commit Graph

33872 Commits

Author SHA1 Message Date
Taus
1decf23785 Python: Fix bad join order for sensitive data
Not the prettiest of solutions, but it does the job. Basically, we were
calculating (and re-calculating) the same big relation between strings
and regexes and then checking whether the latter matched the former.

This resulted in tuple counts like the following:

```
[2021-07-12 16:09:24] (12s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::SensitiveVariableAssignment#class#ff#shared/4@7489c6:
4918074 ~0%     {4} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH Flow::NameNode::getId_dispred#ff CARTESIAN PRODUCT OUTPUT Lhs.0 'arg0', Lhs.1 'arg1', Rhs.0, Rhs.1 'arg3'
2654    ~0%     {4} r2 = JOIN r1 WITH PRIMITIVE regexpMatch#bb ON Lhs.3 'arg3',Lhs.1 'arg1'
                return r2
```
(The above being just the bit that handles `DefinitionNode` in
`SensitiveVariableAssignment`, and taking 12 seconds to evaluate.)

By applying a bit of manual inlining and magic, this becomes somewhat
more manageable:

```
[2021-07-12 15:59:44] (1s) Tuple counts for SensitiveDataSources::SensitiveDataModeling::sensitiveString#ff/2@8830e2:
27671  ~2%      {3} r1 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveParameterName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

334012 ~2%      {3} r2 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

361683 ~11%     {3} r3 = r1 UNION r2

154644 ~0%      {3} r4 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveFunctionName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

149198 ~1%      {3} r5 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveStrConst#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

124257 ~5%      {3} r6 = JOIN SensitiveDataHeuristics::HeuristicNames::maybeSensitiveRegexp#ff WITH SensitiveDataSources::SensitiveDataModeling::sensitiveAttributeName#f CARTESIAN PRODUCT OUTPUT Lhs.0 'classification', Lhs.1, Rhs.0

273455 ~21%     {3} r7 = r5 UNION r6
428099 ~30%     {3} r8 = r4 UNION r7
789782 ~78%     {3} r9 = r3 UNION r8
1121   ~77%     {3} r10 = JOIN r9 WITH PRIMITIVE regexpMatch#bb ON Lhs.2 'result',Lhs.1
1121   ~70%     {2} r11 = SCAN r10 OUTPUT In.0 'classification', In.2 'result'
                return r11
```
(The above being the total for all the sensitive names we care about,
taking only 1.2 seconds to evaluate.)

Incidentally, you may wonder why this has _fewer_ results than before.
The answer is control flow splitting -- every sensitively-named
`DefinitionNode` would have been matched in isolation previously. By
pre-matching on just the names of these, we can subsequently join
against those names that are known to be sensitive, which is a much
faster operation.

(We also get the benefit of deduplicating the strings that are matched,
before actually performing the match, so if, say, an attribute name and
a variable name are identical, then we'll only match them once.)

We also exclude all docstrings as relevant string constants, as these
presumably don't actually flow anywhere.
2021-07-12 16:10:49 +00:00
Mathias Vorreiter Pedersen
4fc60aedc6 C++: Relax the restrictions on when '%' is a barrier and accept test changes. 2021-07-12 17:39:12 +02:00
Mathias Vorreiter Pedersen
a6f1f8d3b6 C++: Add testcases demonstrating FPs from real code. 2021-07-12 17:39:12 +02:00
Mathias Vorreiter Pedersen
6a11aa7f2a Merge pull request #6154 from MathiasVP/more-random-sources-in-uncontrolled-arithmetic
C++: Add more random sources in `cpp/uncontrolled-arithmetic`
2021-07-12 17:37:44 +02:00
Robin Neatherway
5d849a9f9d Add docs for summary type queries 2021-07-12 16:26:21 +01:00
Mathias Vorreiter Pedersen
768b3c84c9 C++: Fix a bug that slipped into fd477383b0. 2021-07-12 17:13:21 +02:00
Erik Krogh Kristensen
899e54fbc9 add support for the slash library 2021-07-12 16:36:54 +02:00
Max Schaefer
ce24215dd5 JavaScript: Improve modelling of Module.prototype._compile sink. 2021-07-12 15:32:21 +01:00
Max Schaefer
70c82c83ac JavaScript: Make ModuleVarNode and ExportsVarNode more easily accessible. 2021-07-12 15:31:40 +01:00
Taus
a73e382dfe Python: Prevent bad join in hashlib model
I'm not entirely sure what triggered this bad join order, but some
combination of the use of abstract classes and the exclusion of `new`
caused this to go really wrong:

```
WeakSensitiveDataHashing.ql-15:Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff ......... 15.5s
```

with the following tuple counts:
```
[2021-07-12 13:20:15] (16s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@217901:
148810  ~3%        {3} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
148810  ~4%        {3} r2 = JOIN r1 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
7589310 ~486%      {4} r3 = JOIN r2 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
6994070 ~490%      {4} r4 = SELECT r3 ON In.3 != "new"
6994070 ~4503%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'node'

22      ~4%        {3} r6 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT "hashlib", Lhs.1 'node', Lhs.0 'this'
22      ~0%        {3} r7 = JOIN r6 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'node', Lhs.2 'this'
1122    ~437%      {4} r8 = JOIN r7 WITH ApiGraphs::API::Impl::edge#2#fff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.2 'this', Rhs.1, InverseAppend("getMember(\"","\")",Rhs.1)
1034    ~460%      {4} r9 = SELECT r8 ON In.3 != "new"
1034    ~4549%     {2} r10 = SCAN r9 OUTPUT In.1 'this', In.0 'node'

6995104 ~4503%     {2} r11 = r5 UNION r10
5213851 ~4683%     {3} r12 = JOIN r11 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
6478480 ~4646%     {6} r13 = JOIN r12 WITH ApiGraphs::API::Impl::edge#2#fff_201#join_rhs ON FIRST 1 OUTPUT "hashlib", Rhs.1, Lhs.1 'node', Lhs.2 'this', Lhs.0 'hashClass', Rhs.2
1410    ~4693%     {5} r14 = JOIN r13 WITH ApiGraphs::API::Impl::MkModuleImport#ff@staged_ext ON FIRST 2 OUTPUT Lhs.2 'node', Lhs.3 'this', Lhs.4 'hashClass', Lhs.5, InverseAppend("getMember(\"","\")",Lhs.5)
1222    ~4540%     {5} r15 = SELECT r14 ON In.4 'hashName' != "new"
1222    ~4540%     {4} r16 = SCAN r15 OUTPUT In.1 'this', In.4 'hashName', In.2 'hashClass', In.0 'node'
```

By factoring out the insides, the biggest iteration now looks like

```
[2021-07-12 14:17:36] (0s) Tuple counts for Stdlib::Stdlib::HashlibDataPassedToHashClass#class#ffff/4@85bb21:
148810 ~0%     {2} r1 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArg_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
148810 ~0%     {2} r2 = JOIN r1 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

22     ~0%     {2} r3 = JOIN DataFlowPublic::CallCfgNode#class#ff#shared WITH project#DataFlowPublic::CallCfgNode::getArgByName_dispred#fff ON FIRST 1 OUTPUT Lhs.1 'node', Lhs.0 'this'
22     ~0%     {2} r4 = JOIN r3 WITH Stdlib::Stdlib::hashlibMember#ff#nonempty CARTESIAN PRODUCT OUTPUT Lhs.1 'this', Lhs.0 'node'

148832 ~0%     {2} r5 = r2 UNION r4
110933 ~2%     {3} r6 = JOIN r5 WITH ApiGraphs::API::Node::getACall_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'hashClass', Lhs.1 'node', Lhs.0 'this'
26     ~0%     {4} r7 = JOIN r6 WITH Stdlib::Stdlib::hashlibMember#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.2 'this', Rhs.1 'hashName', Lhs.0 'hashClass', Lhs.1 'node'
               return r7
```

(The tuple counts themselves are not directly comparable.)
2021-07-12 14:22:21 +00:00
Rasmus Wriedt Larsen
47f5c977cf Python: Port py/stack-trace-exposure to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
934007c811 Python: Port py/unsafe-deserialization to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
7c71223f7f Python: Port py/url-redirection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
b4c0b1b525 Python: Port py/reflective-xss to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
62e4445f45 Python: Port py/command-line-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
7f53781ba7 Python: Port py/code-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Rasmus Wriedt Larsen
0be280c608 Python: Port py/sql-injection to use proper source/sink customization 2021-07-12 16:22:10 +02:00
Tom Hvitved
6ba6d9931c C#: Skip dotnet restore in standalone extraction when nuget_restore: false is set 2021-07-12 15:16:16 +02:00
Mathias Vorreiter Pedersen
be06230b43 Merge branch 'main' into path-sensitive-stack-variable-reachability-analysis 2021-07-12 14:46:44 +02:00
Asger F
d8927e5612 Apply suggestions from code review
Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>
2021-07-12 14:23:58 +02:00
edvraa
a0942e0360 JsonConvert 2021-07-12 15:23:04 +03:00
Erik Krogh Kristensen
c4f5009917 make explicit calls to member predicates
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
2021-07-12 14:22:08 +02:00
Erik Krogh Kristensen
d22ebadcf2 add support for many more case changing libraries 2021-07-12 14:09:34 +02:00
Mathias Vorreiter Pedersen
dec747f6f0 Merge branch 'main' into more-random-sources-in-uncontrolled-arithmetic 2021-07-12 13:48:48 +02:00
Erik Krogh Kristensen
a5d1325d3f add support for the change-case library 2021-07-12 13:37:06 +02:00
Taus
1e79091120 Python: Fix typo 2021-07-12 11:33:52 +00:00
Mathias Vorreiter Pedersen
c47d680d65 Merge pull request #6168 from criemen/fix-warning
C++: Fix warning from compile-query.
2021-07-12 12:41:29 +02:00
edvraa
f4cb6c50c0 YamlDotNet 2021-07-12 13:25:50 +03:00
edvraa
1e4409f9ed SharpSerializer 2021-07-12 13:22:20 +03:00
edvraa
c3ac3ca41c FsPickler 2021-07-12 13:20:57 +03:00
Tom Hvitved
47d126e681 Data flow: Sync 2021-07-12 12:09:51 +02:00
Tom Hvitved
09daf86e33 Data flow: Fix bad join-orders in summaryNodeType 2021-07-12 12:09:06 +02:00
Taus
32062d83ad Python: Make deprecation warning more prominent 2021-07-12 10:00:21 +00:00
Taus
200da983d9 Python: Add change note 2021-07-12 09:59:17 +00:00
Mathias Vorreiter Pedersen
04dcef5ec4 C++: Include ComplementExpr as a sanitizer. 2021-07-12 11:53:47 +02:00
Cornelius Riemenschneider
d34f7b941a C++: Address code review. 2021-07-12 11:43:43 +02:00
Cornelius Riemenschneider
e821b8be99 C++: Fix warning from compile-query. 2021-07-12 11:43:43 +02:00
Mathias Vorreiter Pedersen
d2cc0d3925 C++: Fix annotations. 2021-07-12 11:30:43 +02:00
Erik Krogh Kristensen
bef7e61e76 add support for the fast-json-stringify library 2021-07-12 11:13:01 +02:00
Erik Krogh Kristensen
40aa970db3 add support for the strip-json-comments library 2021-07-12 11:08:50 +02:00
Erik Krogh Kristensen
23c3be6860 add support for the json-cycle library 2021-07-12 11:03:39 +02:00
Asger Feldthaus
5df961c4ed JS: Add change note 2021-07-12 10:53:41 +02:00
Erik Krogh Kristensen
94cbc4b2c0 add step through the fclone library 2021-07-12 10:51:43 +02:00
Erik Krogh Kristensen
f99a33598f add support for the safe-stable-stringify library 2021-07-12 10:51:43 +02:00
Erik Krogh Kristensen
d6300bced3 add support for the replicator library 2021-07-12 10:51:43 +02:00
Erik Krogh Kristensen
babf657d9d add support for the teleport-javascript library 2021-07-12 10:51:43 +02:00
Erik Krogh Kristensen
9261b7f859 add support for the flatted library 2021-07-12 10:51:43 +02:00
Erik Krogh Kristensen
1792c9a611 add taint step through the prettyjson library 2021-07-12 10:51:43 +02:00
Erik Krogh Kristensen
0bfff1eb7e add support for the json5 library 2021-07-12 10:51:42 +02:00
Erik Krogh Kristensen
cb3bd4901b add taint step through the json2csv library 2021-07-12 10:51:42 +02:00