Commit Graph

4510 Commits

Author SHA1 Message Date
Geoffrey White
ff39f714e8 C++: Autoformat. 2020-04-07 17:07:31 +01:00
Robert Marsh
0ccf39777c Merge pull request #3189 from jbj/DefaultTaintTracking-Configuration
C++: Path explanations in DefaultTaintTracking
2020-04-07 08:38:10 -07:00
Jonas Jensen
39911af56b C++: Avoid partial chi flow to struct/class
Flow through partial chi-instruction operands was introduced to make
definition-by-reference work, but its implementation also allowed all
other partial writes to propagate. In particular, tainting a field would
taint the whole struct, which in turn led to taint propagating across
unrelated fields of a struct.

The security test `CWE-134/semmle/argv/argvLocal.c` shows that we also
want to propagate taint from an array element to the whole array, and it
also seems right to propagate taint from a union member to the whole
union.
2020-04-07 16:24:24 +02:00
Mathias Vorreiter Pedersen
d56284fe8f C++: Move added flow from simpleLocalFlowStep to simpleInstructionLocalFlowStep and remove flow that could cause field conflation 2020-04-07 16:00:40 +02:00
Mathias Vorreiter Pedersen
8928091dfb Merge pull request #3181 from jbj/DefaultTaintTracking-qldoc
C++: QLDoc in DefaultTaintTracking
2020-04-07 14:58:21 +02:00
Jonas Jensen
057155f28f Merge remote-tracking branch 'upstream/master' into DefaultTaintTracking-Configuration 2020-04-07 14:39:30 +02:00
Mathias Vorreiter Pedersen
5719967a8e C++: Remove single-field case from PostUpdateNode and accept tests 2020-04-07 12:03:28 +02:00
Jonas Jensen
0743c42807 Merge remote-tracking branch 'upstream/master' into dataflow-indirect-args
Accepted test results that were in semantic merge conflict between
these branches. The changed results are due to a bug that that's part of
https://github.com/github/codeql-c-analysis-team/issues/35.
2020-04-06 19:26:08 +02:00
Henning Makholm
bf579dedd4 Add extractor field in base language QL packs 2020-04-06 18:48:01 +02:00
Jonas Jensen
e37aab5002 C++: Suppress FieldAddressInstruction taint
See code comment. This fixes false positives on openjdk/jdk.
2020-04-06 16:14:26 +02:00
Geoffrey White
a71ae2b468 C++: Consistent treatment of placement new. 2020-04-06 14:54:15 +01:00
Geoffrey White
492c5f367f C++: Simplify NewDelete.qll. 2020-04-06 14:54:15 +01:00
Mathias Vorreiter Pedersen
c577541850 C++: Fix reverse read dataflow consistency failure and accept tests 2020-04-06 15:50:08 +02:00
Geoffrey White
cbe133d0e6 C++: Deprecate freeCall in the legacy wrapper Alloc.qll. 2020-04-06 14:32:49 +01:00
Geoffrey White
e223557201 C++: Wean NewDelete.qll off the legacy wrapper Alloc.qll. 2020-04-06 14:32:15 +01:00
Geoffrey White
8059d69bbd C++: Model calls to operator new / delete for NewFreeMismatch.ql. 2020-04-06 14:27:05 +01:00
Geoffrey White
3e9f9645ae C++: Exclude calls to operator new / delete from NewFreeMismatch.ql. 2020-04-06 14:08:00 +01:00
Jonas Jensen
bf7614a4c9 C++: Move Expr location workaround to Expr.qll
This workaround from `DataFlowUtil.qll` should be useful for any query
that selects an `Expr`. In particular, it's useful for IR data flow.

This commit does not include test changes.
2020-04-06 14:13:22 +02:00
Jonas Jensen
d4338473b0 C++: Enforce unique enclosing callable
Every data-flow node should have a unique enclosing function (_callable_
in the terminology of the data-flow library), but this was not evident
for the optimizer, and it led to a bad join order in `pathStep`. This
commit fixes the join order for C++ AST data flow. All other copies of
data flow seem to be fine.

These are the tuple counts for OpenJDK before this commit:

    (231s) Tuple counts for DataFlowImplLocal::pathStep#fffff#cur_delta:
    5882       ~0%       {6} r1 = SCAN DataFlowImplLocal::PathNodeMid#class#ffffff#prev_delta AS I OUTPUT I.<2>, I.<0>, I.<1>, I.<3>, I.<4>, I.<5>
    1063406780 ~0%       {7} r2 = JOIN r1 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 1 OUTPUT r1.<2>, R.<1>, r1.<1>, r1.<0>, r1.<3>, r1.<4>, r1.<5>
    5882       ~1%       {6} r3 = JOIN r2 WITH DataFlowUtil::Node::getFunction_dispred#ff AS R ON FIRST 2 OUTPUT r2.<0>, r2.<6>, r2.<2>, r2.<3>, r2.<4>, r2.<5>
    105        ~0%       {5} r4 = JOIN r3 WITH project#DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_021#join_rhs AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>, r3.<4>, r3.<5>, R.<2>
    5882       ~1%       {6} r5 = JOIN r2 WITH DataFlowUtil::Node::getFunction_dispred#ff AS R ON FIRST 2 OUTPUT r2.<5>, r2.<2>, r2.<0>, r2.<3>, r2.<4>, r2.<6>
    5882       ~0%       {6} r6 = JOIN r5 WITH DataFlowImplLocal::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r5.<2>, false, r5.<5>, r5.<1>, r5.<3>, r5.<4>
    0          ~0%       {5} r7 = JOIN r6 WITH DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_02413#join_rhs AS R ON FIRST 3 OUTPUT R.<4>, r6.<3>, r6.<4>, r6.<5>, R.<3>
    0          ~0%       {5} r8 = JOIN r7 WITH DataFlowImplLocal::TNil#ff AS R ON FIRST 1 OUTPUT r7.<1>, r7.<2>, r7.<3>, R.<1>, r7.<4>
    105        ~0%       {5} r9 = r4 \/ r8

The problem is that `DataFlowUtil::Node::getFunction_dispred#ff`
(`getEnclosingCallable`) is joined too late.

After this commit, the tuple counts look like this:

    (13s) Tuple counts for DataFlowImplLocal::pathStep#fffff#cur_delta:
    5882    ~1%       {6} r1 = SCAN DataFlowImplLocal::PathNodeMid#class#ffffff#prev_delta AS I OUTPUT I.<1>, I.<0>, I.<2>, I.<3>, I.<4>, I.<5>
    5882    ~3%       {7} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff AS R ON FIRST 1 OUTPUT r1.<2>, R.<1>, r1.<1>, r1.<0>, r1.<3>, r1.<4>, r1.<5>
    5882    ~1%       {6} r3 = JOIN r2 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 2 OUTPUT r2.<3>, r2.<6>, r2.<2>, r2.<0>, r2.<4>, r2.<5>
    105     ~0%       {5} r4 = JOIN r3 WITH project#DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_021#join_rhs AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>, r3.<4>, r3.<5>, R.<2>
    5882    ~1%       {6} r5 = JOIN r2 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 2 OUTPUT r2.<5>, r2.<2>, r2.<3>, r2.<0>, r2.<4>, r2.<6>
    5882    ~0%       {6} r6 = JOIN r5 WITH DataFlowImplLocal::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r5.<2>, false, r5.<5>, r5.<1>, r5.<3>, r5.<4>
    0       ~0%       {5} r7 = JOIN r6 WITH DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_02413#join_rhs AS R ON FIRST 3 OUTPUT R.<4>, r6.<3>, r6.<4>, r6.<5>, R.<3>
    0       ~0%       {5} r8 = JOIN r7 WITH DataFlowImplLocal::TNil#ff AS R ON FIRST 1 OUTPUT r7.<1>, r7.<2>, r7.<3>, R.<1>, r7.<4>
    105     ~0%       {5} r9 = r4 \/ r8

There is a slight slowdown coming from the introduction of a new
predicate `DataFlowImplLocal::pathStep#fffff#join_rhs`, which is used
only in the standard order:

    (12s) Tuple counts for DataFlowImplLocal::pathStep#fffff#join_rhs:
    282057  ~0%     {2} r1 = SCAN DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS I OUTPUT I.<1>, I.<0>
    9159890 ~1%     {2} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>
                    return r2

The evaluation of `unique` is cheap but not free:

    DataFlowUtil::Node::getEnclosingCallable_dispred#ff .............. 3.9s
    DataFlowUtil::Node::getEnclosingCallable_dispred#ff_10#join_rhs .. 3.5s

The first of these two predicates evaluates `unique`, and the second
simply reorders columns. They take about the same time, which suggests
that `unique` is about as fast as it can be, given the number of tuples
it needs to push around. Note that the column reordering predicate is
only needed because of the standard order.
2020-04-06 12:04:39 +02:00
Mathias Vorreiter Pedersen
3aa293210d C++: Ensure that only non-conflated chi instructions are used everywhere 2020-04-06 12:02:56 +02:00
Jonas Jensen
46fc91315b Java/C++/C#: Revert the join order fix from #2872
This revert brings back the performance problems in
`DataFlowImplLocal.qll` so they can be fixed in a different way. The fix
in #2872 was asymptotically good but had undesired overhead because it
introduced another predicate in the SCC that existed purely for join
ordering.

I did the revert by inlining the helper predicate, eliminating the
`enclosing` variable, and re-ordering the resulting lines to what they
were before #2872.
2020-04-06 10:04:50 +02:00
Robert
1096e5d947 Merge pull request #3163 from robertbrignull/code_scanning_suites
Add code-scanning suites
2020-04-06 08:45:40 +01:00
Mathias Vorreiter Pedersen
317734f41e C++: Attach PostUpdateNodes to Chi nodes following aschackmull's suggestion 2020-04-05 22:35:26 +02:00
Jonas Jensen
530d4294b0 Merge remote-tracking branch 'upstream/master' into DefaultTaintTracking-Configuration 2020-04-05 07:27:07 +02:00
Jonas Jensen
58366b19e9 C++: Path explanations in the last two queries
For some reason I thought that these two queries were special because
they manipulate `SecurityOptions` to change the taint-tracking sources.
It turns out it was just the opposite: the queries used to be special
because they invalidated the cache for the `tainted` predicate, but that
predicate is no longer used, so these queries are no longer special.
2020-04-04 16:47:06 +02:00
Jonas Jensen
d7332644f0 C++: Fix DefinitionByReferenceNode.toString
This predicate now has a value also for calls to function pointers.
2020-04-04 15:31:01 +02:00
Jonas Jensen
108d5177b8 C++: Fix two bugs found by @rdmarsh2
Co-Authored-By: Robert Marsh <rdmarsh2@gmail.com>
2020-04-04 15:24:44 +02:00
Robert Marsh
316d932829 Merge pull request #3198 from MathiasVP/valuenumbering-provider-new-file
C++/C#: Prevent accidental import of ValueNumberPropertyProvider
2020-04-03 13:31:11 -07:00
Jonas Jensen
bb3616e4c4 C++: Add example for globalVarFromId 2020-04-03 17:51:35 +02:00
Jonas Jensen
5822cd7b84 C++: Put paths in the remaining LGTM-suite queries 2020-04-03 17:10:47 +02:00
Jonas Jensen
3ec1f691c2 C++: First query with flow-paths through globals 2020-04-03 16:45:00 +02:00
Jonas Jensen
aaebe3687e C++: Fix copy-paste error in convertedExprNode 2020-04-03 16:37:23 +02:00
Jonas Jensen
469bdae9b2 C++: More helpful toString for def. by ref. node 2020-04-03 16:37:23 +02:00
Jonas Jensen
36da2d1dae C++: Manipulate the source end of paths too
Without this, we get duplicate alerts in some cases and
unnatural-looking source nodes in other cases. The source nodes were
often `Conversion`s.
2020-04-03 16:37:23 +02:00
Jonas Jensen
e916f07a8e C++: Formatting fixups 2020-04-03 15:52:13 +02:00
Jonas Jensen
427815d3d1 C++: taintedWithPath QLDoc + simplification 2020-04-03 15:52:13 +02:00
Jonas Jensen
3653627650 C++: Let configuration class extend singleton 2020-04-03 15:52:13 +02:00
Jonas Jensen
16c7a35b1c Merge pull request #3195 from geoffw0/taintstring
C++: Model taint flow through std::string constructor and c_str()
2020-04-03 12:05:07 +02:00
Geoffrey White
73bfd819d9 C++: Rename classes. 2020-04-03 09:23:31 +01:00
Geoffrey White
1bcf187c3e C++: Rename Strings.qll -> StdString.qll. 2020-04-03 09:17:33 +01:00
Mathias Vorreiter Pedersen
0b12c1519b C++/C#: Sync identical files 2020-04-03 10:06:37 +02:00
Mathias Vorreiter Pedersen
0f70944a5b C++: Move ValueNumberPropertyProvider into its own file to prevent accidental imports 2020-04-03 09:55:41 +02:00
Robert Marsh
a8e191248e Merge branch 'master' into rdmarsh/cpp/ir-flow-through-outparams
Merge IR SSA test additions
2020-04-02 15:30:20 -07:00
Robert Marsh
fd915bb5b1 C++: fix join order in IR virtual dispatch 2020-04-02 14:56:11 -07:00
Geoffrey White
c9ec30fa2a C++: Update use of deprecated methods. 2020-04-02 19:49:42 +01:00
Geoffrey White
e9132d833c C++: Autoformat. 2020-04-02 19:49:42 +01:00
Geoffrey White
73171682b7 C++: Switch to taint flow as suggested in the old PR. 2020-04-02 19:49:41 +01:00
Geoffrey White
b14b52d0ac C++: Add models for std::string (as in old PR). 2020-04-02 19:49:41 +01:00
Mathias Vorreiter Pedersen
ce5d8d516f Merge branch 'master' into ir-flow-fields 2020-04-02 15:23:00 +02:00
Mathias Vorreiter Pedersen
e2908eaf63 C++: Add comment explaining why we can split call and allocation side effects 2020-04-02 15:11:13 +02:00