mirror of https://github.com/github/codeql.git synced 2026-04-20 14:34:04 +02:00

Go to file

Jonas Jensen d4338473b0 C++: Enforce unique enclosing callable

Every data-flow node should have a unique enclosing function (_callable_
in the terminology of the data-flow library), but this was not evident
for the optimizer, and it led to a bad join order in `pathStep`. This
commit fixes the join order for C++ AST data flow. All other copies of
data flow seem to be fine.

These are the tuple counts for OpenJDK before this commit:

    (231s) Tuple counts for DataFlowImplLocal::pathStep#fffff#cur_delta:
    5882       ~0%       {6} r1 = SCAN DataFlowImplLocal::PathNodeMid#class#ffffff#prev_delta AS I OUTPUT I.<2>, I.<0>, I.<1>, I.<3>, I.<4>, I.<5>
    1063406780 ~0%       {7} r2 = JOIN r1 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 1 OUTPUT r1.<2>, R.<1>, r1.<1>, r1.<0>, r1.<3>, r1.<4>, r1.<5>
    5882       ~1%       {6} r3 = JOIN r2 WITH DataFlowUtil::Node::getFunction_dispred#ff AS R ON FIRST 2 OUTPUT r2.<0>, r2.<6>, r2.<2>, r2.<3>, r2.<4>, r2.<5>
    105        ~0%       {5} r4 = JOIN r3 WITH project#DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_021#join_rhs AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>, r3.<4>, r3.<5>, R.<2>
    5882       ~1%       {6} r5 = JOIN r2 WITH DataFlowUtil::Node::getFunction_dispred#ff AS R ON FIRST 2 OUTPUT r2.<5>, r2.<2>, r2.<0>, r2.<3>, r2.<4>, r2.<6>
    5882       ~0%       {6} r6 = JOIN r5 WITH DataFlowImplLocal::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r5.<2>, false, r5.<5>, r5.<1>, r5.<3>, r5.<4>
    0          ~0%       {5} r7 = JOIN r6 WITH DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_02413#join_rhs AS R ON FIRST 3 OUTPUT R.<4>, r6.<3>, r6.<4>, r6.<5>, R.<3>
    0          ~0%       {5} r8 = JOIN r7 WITH DataFlowImplLocal::TNil#ff AS R ON FIRST 1 OUTPUT r7.<1>, r7.<2>, r7.<3>, R.<1>, r7.<4>
    105        ~0%       {5} r9 = r4 \/ r8

The problem is that `DataFlowUtil::Node::getFunction_dispred#ff`
(`getEnclosingCallable`) is joined too late.

After this commit, the tuple counts look like this:

    (13s) Tuple counts for DataFlowImplLocal::pathStep#fffff#cur_delta:
    5882    ~1%       {6} r1 = SCAN DataFlowImplLocal::PathNodeMid#class#ffffff#prev_delta AS I OUTPUT I.<1>, I.<0>, I.<2>, I.<3>, I.<4>, I.<5>
    5882    ~3%       {7} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff AS R ON FIRST 1 OUTPUT r1.<2>, R.<1>, r1.<1>, r1.<0>, r1.<3>, r1.<4>, r1.<5>
    5882    ~1%       {6} r3 = JOIN r2 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 2 OUTPUT r2.<3>, r2.<6>, r2.<2>, r2.<0>, r2.<4>, r2.<5>
    105     ~0%       {5} r4 = JOIN r3 WITH project#DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_021#join_rhs AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>, r3.<4>, r3.<5>, R.<2>
    5882    ~1%       {6} r5 = JOIN r2 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 2 OUTPUT r2.<5>, r2.<2>, r2.<3>, r2.<0>, r2.<4>, r2.<6>
    5882    ~0%       {6} r6 = JOIN r5 WITH DataFlowImplLocal::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r5.<2>, false, r5.<5>, r5.<1>, r5.<3>, r5.<4>
    0       ~0%       {5} r7 = JOIN r6 WITH DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_02413#join_rhs AS R ON FIRST 3 OUTPUT R.<4>, r6.<3>, r6.<4>, r6.<5>, R.<3>
    0       ~0%       {5} r8 = JOIN r7 WITH DataFlowImplLocal::TNil#ff AS R ON FIRST 1 OUTPUT r7.<1>, r7.<2>, r7.<3>, R.<1>, r7.<4>
    105     ~0%       {5} r9 = r4 \/ r8

There is a slight slowdown coming from the introduction of a new
predicate `DataFlowImplLocal::pathStep#fffff#join_rhs`, which is used
only in the standard order:

    (12s) Tuple counts for DataFlowImplLocal::pathStep#fffff#join_rhs:
    282057  ~0%     {2} r1 = SCAN DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS I OUTPUT I.<1>, I.<0>
    9159890 ~1%     {2} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>
                    return r2

The evaluation of `unique` is cheap but not free:

    DataFlowUtil::Node::getEnclosingCallable_dispred#ff .............. 3.9s
    DataFlowUtil::Node::getEnclosingCallable_dispred#ff_10#join_rhs .. 3.5s

The first of these two predicates evaluates `unique`, and the second
simply reorders columns. They take about the same time, which suggests
that `unique` is about as fast as it can be, given the number of tuples
it needs to push around. Note that the column reordering predicate is
only needed because of the standard order.

2020-04-06 12:04:39 +02:00

.github

Actions: Disable labeler action.

2019-12-16 13:53:00 +01:00

change-notes

Merge pull request #3195 from geoffw0/taintstring

2020-04-03 12:05:07 +02:00

config

C++/C#: Add synchronization

2020-04-03 10:08:00 +02:00

cpp

C++: Enforce unique enclosing callable

2020-04-06 12:04:39 +02:00

csharp

Java/C++/C#: Revert the join order fix from #2872

2020-04-06 10:04:50 +02:00

docs

Merge pull request #3128 from jbj/library-overview-assignment

2020-03-31 18:02:11 +01:00

java

Java/C++/C#: Revert the join order fix from #2872

2020-04-06 10:04:50 +02:00

javascript

Merge pull request #3163 from robertbrignull/code_scanning_suites

2020-04-06 08:45:40 +01:00

misc

add code scanning suites

2020-03-27 17:03:23 +00:00

python

Merge pull request #3163 from robertbrignull/code_scanning_suites

2020-04-06 08:45:40 +01:00

.codeqlmanifest.json

Don't chain to ./codeql in .codeqlmanifest.json

2020-02-13 15:30:15 +01:00

.editorconfig

Normalize all text files to LF

2018-09-23 16:24:31 -07:00

.gitattributes

.gitattributes: PDB files are binary

2019-03-13 10:42:28 +00:00

.gitignore

Move sync-identical-files.py into public repo as sync-files.py

2020-03-29 02:59:14 -04:00

.lgtm.yml

JS: Exclude test cases from extraction

2019-05-07 14:36:35 +01:00

CODE_OF_CONDUCT.md

Fix indentation of list item in code of conduct

2019-07-16 08:49:29 +01:00

CODEOWNERS

CODEOWNERS: Add python team

2019-11-14 16:42:12 +01:00

CONTRIBUTING.md

Docs: refactor guidelines for new queries

2020-03-17 08:24:03 +01:00

QL code and tests for C#/C++/JavaScript.

2018-08-02 17:53:23 +01:00

LICENSE

QL code and tests for C#/C++/JavaScript.

2018-08-02 17:53:23 +01:00

README.md

Docs: Remove some Semmle references

2020-03-11 15:20:15 +01:00

README.md

CodeQL

This open source repository contains the standard CodeQL libraries and queries that power LGTM and the other CodeQL products that GitHub makes available to its customers worldwide.

How do I learn CodeQL and run queries?

There is extensive documentation on getting started with writing CodeQL. You can use the interactive query console on LGTM.com or the CodeQL for Visual Studio Code extension to try out your queries on any open source project that's currently being analyzed.

Contributing

We welcome contributions to our standard library and standard checks. Do you have an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Before you do, though, please take the time to read our contributing guidelines. You can also consult our style guides to learn how to format your code for consistency and clarity, how to write query metadata, and how to write query help documentation for your query.

License

The code in this repository is licensed under Apache License 2.0 by GitHub.

Languages

CodeQL 32.4%

Kotlin 27.4%

C# 17.1%

Java 7.6%

Python 4.6%

Other 10.7%