Commit Graph

3356 Commits

Author SHA1 Message Date
Tom Hvitved
4b3cf72c1c C#: Teach XPath injection query about XPathNavigator 2020-03-19 13:38:16 +01:00
Robert Marsh
59a81d8445 C++: merge from master and accept test changes 2020-03-18 13:47:01 -07:00
Tom Hvitved
937924571c Data flow: Sync files 2020-03-18 18:16:27 +01:00
Tom Hvitved
d0aaaad537 Address review comments 2020-03-18 18:16:11 +01:00
Dave Bartolomeo
26ea93af58 Merge remote-tracking branch 'upstream/master' into dbartol/VarArgIR 2020-03-18 09:52:21 -04:00
Tom Hvitved
3bd6429072 Data flow: Sync files 2020-03-18 13:28:26 +01:00
Tom Hvitved
321b91209f Address review comments 2020-03-18 13:28:16 +01:00
Robert Marsh
3a66b04e7a C#: add debug switch to IRConfiguration 2020-03-17 08:51:00 -07:00
Dave Bartolomeo
9cc3cda58e C++: Model varargs in IR, Part I
This change introduces a new synthesized `IRVariable` in every varargs function. This variable represents the entire set of arguments passed to the ellipsis by the caller. We give it an opaque type big enough hold all of the arguments passed by the largest vararg call in the database. It is treated just like any other parameter. It is initialized the same, it has indirect buffers, etc.

I had to introduce a couple new APIs to `Call` and `Function`. The QLDoc comments should explain these. I added tests for these new APIs as well.

The next step will be to change the IR generation for the `va_*` macros to manipulate the ellipsis parameter.
2020-03-17 11:11:48 -04:00
Tom Hvitved
2e8bd5ccba Data flow: Sync files 2020-03-17 15:16:12 +01:00
Tom Hvitved
0645940a5c Address review comments 2020-03-17 15:16:01 +01:00
Robert Marsh
de2d23b432 C++/C#: autoformat 2020-03-16 17:25:53 -07:00
Tom Hvitved
f935f5eaca Data flow: Sync files 2020-03-13 13:58:05 +01:00
Tom Hvitved
17e904f0f6 Data flow: Refactoring + performance improvements
- Introduce `ReadTaintNode` and `TaintStoreNode` to simplify logic for taint
  getters and taint setters, respectively.
- `nodeCandFwd2`: Restrict `stored` column after a read, based on what it might
  be before a store of the same field.
- `nodeCand2`: Restrict `read` column (renamed from `stored`) after a store, based
  on what it might be after a read of the same field.
- Move big step predicates into a `LocalFlowBigStep` module.
- Define predicates by dispatch in `AccessPath[Front]` class.
- `flowCandFwd0`: Restrict `apf` column after a read, as it should be able to match
  a Boolean `read` column from `nodeCand2`.
- `flowFwd0`: Restrict columns `ap` and `apf` after a read, by introducing a
  `flowConsCandFwd` predicate (similar to what is done in the previous pruning steps).
- `flowFwd0`: Restrict columns `ap` and `apf` after a store, by introducing a
  `flowConsCand` predicate (similar to what is done in the previous pruning steps).
2020-03-13 13:58:05 +01:00
Dave Bartolomeo
1526400a81 C++: Model dynamic initialization of static local variables in IR
Previously, the IR for the initialization of a static local variable ran the initialization unconditionally, every time the declaration was reached during execution. This means that we don't model the possibility that an access to the static variable fetches a value that was set on a previous execution of the function.

I've added some simple modelling of the correct behavior to the IR. For each static local variable that has a dynamic initializer, we synthesize a (static) `bool` variable to hold whether the initializer for the original variable has executed. When executing a declaration, we check the value of the synthesized variable, and skip the initialization code if it is `true`. If it is `false`, we execute the initialization code as before, and then set the flag to `true`. This doesn't capture the thread-safe nature of static initialization, but I think it's more than enough to handle anything we're likely to care about for the foreseeable future.

In `TranslatedDeclarationEntry.qll`, I split the translation of a static local variable declaration into two `TranslatedElement`s: one for the declaration itself, and one for the initialization. The declaration part handles the checking and setting of the flag; the initialization just does the initialization as before.

I've added an IR test case that has static variables with constant, zero, and dynamic initialization. I've also verified the new IR generated for @jbj's previous test cases for constant initialization.

I inverted the sense of the `hasConstantInitialization()` predicate to be `hasDynamicInitialization()`. Mostly this just made more sense to me, but I think it also fixed a potential bug where `hasConstantInitialization()` would not hold for a zero-initialized variable. Technically, constant initialization isn't the same as zero initialization, but I believe that most code really cares about the distinction between dynamic initialization and static initialization, where static initialization includes both constant and zero initialization.

I've fixed up the C# side of IR generation to continue working, but it doesn't use any of the dynamic initialization stuff. In theory, it could use something similar to model the initialization of static fields.
2020-03-12 18:29:16 -04:00
Robert Marsh
cc99ddfd2c C++/C#: resync 2020-03-11 12:41:26 -07:00
Robert Marsh
1878d04852 C++/C#: sync files and update imports 2020-03-11 11:49:11 -07:00
Anders Schack-Mulligen
a9d76cbe64 Dataflow: Add consistency checks for toString and location. 2020-03-11 10:29:48 +01:00
Robert Marsh
bba6b23019 Merge branch 'master' into rdmarsh/cpp/ir-flow-through-outparams 2020-03-10 11:12:19 -07:00
Tom Hvitved
bd6c23d165 Merge pull request #3020 from aschackmull/dataflow/type-pruning-bigstep
Dataflow: Fix bug in type pruning.
2020-03-10 14:21:21 +01:00
Anders Schack-Mulligen
e97c72cd5d Dataflow: Adjust imports. 2020-03-10 11:34:09 +01:00
Anders Schack-Mulligen
fc87f1eb1b C#: Fix tests. 2020-03-10 10:54:48 +01:00
Dave Bartolomeo
9fae2faaeb Merge pull request #2994 from jbj/IRSanity-separate-file
C++: Move InstructionSanity out of Instruction.qll
2020-03-09 16:34:36 -04:00
Tom Hvitved
6a10516c1e Merge pull request #3021 from aschackmull/dataflow/partial-path-perf
Java/C++/C#: Fix performance issue in partial paths exploration.
2020-03-09 15:04:33 +01:00
Anders Schack-Mulligen
a2bbacf58d Java/C++/C#: Fix performance issue in partial paths exploration. 2020-03-09 11:30:59 +01:00
Anders Schack-Mulligen
f491fcd5ae Java/C++/C#: Sync. 2020-03-09 11:05:13 +01:00
Jonas Jensen
a13f355a85 C++: autoformat fixup 2020-03-06 08:29:46 +01:00
Jonas Jensen
e29f517af2 C++: Move InstructionSanity out of Instruction.qll
Having that module in `Instruction.qll` slowed down the parsing of that
file both humans and the compiler.

This commit moves the `InstructionSanity` module to `IRSanity.qll`
without making any changes to its contents apart from adding some
imports.
2020-03-05 12:11:50 +01:00
Jonas Jensen
6b2fd17f03 C++: IR: faster definitionReachesRank
On Wireshark with 6GB RAM, I've observed `definitionReachesRank` to be
the slowest predicate in the IR. It seems that the implementation was
slow because the optimizer failed to eliminate the common
`reachesRank - 1` subexpression. This led to context being pushed into
the `not`, which got implemented as `MATERIALIZE`. That wouldn't
normally be a disaster, but this is one of the largest predicates in the
IR SSA construction, and iteration 2 was very slow.

Before:

    (1505s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[1]@93f592 (iteration 1)
    (1535s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
    130670697 ~0%     {4} r1 = SCAN project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS I OUTPUT I.<0>, I.<1>, I.<2>, (I.<2> + 1)
    130670697 ~6%     {5} r2 = JOIN r1 WITH SSAConstruction::DefUse::exitRank#fff AS R ON FIRST 2 OUTPUT r1.<0>, r1.<1>, r1.<2>, r1.<3>, R.<2>
    130670697 ~6%     {5} r3 = SELECT r2 ON r2.<3> <= r2.<4>
    130670697 ~0%     {4} r4 = SCAN r3 OUTPUT r3.<0>, r3.<1>, r3.<2>, r3.<3>
                      return r4
    (1535s) 			 - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 130670697 rows (order for disjuncts: delta=<standard>).

    (1535s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[2]@866c14 (iteration 2)
    (1626s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
    261341394 ~107%     {4} r1 = JOIN SSAConstruction::DefUse::definitionReachesRank#ffff#prev_delta AS L WITH SSAConstruction::DefUse::definitionReachesRank#ffff#join_rhs AS R ON FIRST 3 OUTPUT R.<0>, R.<1>, R.<2>, (1 + L.<3>)
    261341394 ~107%     {4} r2 = r1 AND NOT SSAConstruction::DefUse::definitionReachesRank#ffff#prev AS R(r1.<0>, r1.<1>, r1.<2>, r1.<3>)
    130670697 ~0%       {5} r3 = SCAN r2 OUTPUT r2.<0>, r2.<1>, (r2.<3> - 1), r2.<2>, r2.<3>
    106034590 ~1%       {4} r4 = JOIN r3 WITH project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS R ON FIRST 3 OUTPUT r3.<0>, r3.<1>, r3.<3>, r3.<4>
    106034590           {4} r5 = MATERIALIZE r4 AS antijoin_rhs
    24636107  ~3%       {4} r6 = r2 AND NOT r5(r2.<0>, r2.<1>, r2.<2>, r2.<3>)
    24636107  ~0%       {5} r7 = JOIN r6 WITH SSAConstruction::DefUse::exitRank#fff AS R ON FIRST 2 OUTPUT r6.<0>, r6.<1>, r6.<2>, r6.<3>, R.<2>
    2749441   ~0%       {5} r8 = SELECT r7 ON r7.<3> <= r7.<4>
    2749441   ~4%       {4} r9 = SCAN r8 OUTPUT r8.<0>, r8.<1>, r8.<2>, r8.<3>
                        return r9
    (1626s) 			 - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 2749441 rows (order for disjuncts: delta=<standard>).

After:

    (12s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
    130670697 ~0%     {4} r1 = SCAN project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS I OUTPUT I.<0>, I.<1>, I.<2>, (I.<2> + 1)
                      return r1
    (12s) 			 - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 130670697 rows (order for disjuncts: delta=<standard>).
    (12s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[2]@fff64c (iteration 2)
    (34s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
    108784031 ~0%     {4} r1 = SSAConstruction::DefUse::definitionReachesRank#ffff#prev_delta AS L AND NOT SSAConstruction::DefUse::exitRank#fff AS R(L.<0>, L.<1>, L.<3>)
    2749441   ~5%     {4} r2 = r1 AND NOT project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS R(r1.<0>, r1.<1>, r1.<3>)
    2749441   ~4%     {4} r3 = SCAN r2 OUTPUT r2.<0>, r2.<1>, r2.<2>, (r2.<3> + 1)
    2749441   ~4%     {4} r4 = r3 AND NOT SSAConstruction::DefUse::definitionReachesRank#ffff#prev AS R(r3.<0>, r3.<1>, r3.<2>, r3.<3>)
                      return r4
    (34s) 			 - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 2749441 rows (order for disjuncts: delta=<standard>).

Note that the row counts are exactly the same before and after.
2020-03-04 15:00:47 +01:00
Robert Marsh
1e3419fd60 C++/C#: generate IR for funcs excluded in PrintIR
Previously, functions excluded from PrintIR would not have IR
generated. This sometimes affected escacpe analysis of functions that
were printed.
2020-03-03 14:34:08 -08:00
Mathias Vorreiter Pedersen
20529b4436 C++/C#: Sync identical files 2020-03-02 12:15:54 +01:00
Jonas Jensen
dab6691eb0 Merge pull request #2900 from dbartol/dbartol/void-buffer
C++: Better fix for `void` type on buffer access
2020-03-02 09:00:15 +01:00
Jonas Jensen
ec85f9f1a1 Merge pull request #2797 from rdmarsh2/rdmarsh/cpp/malloc-alias-locations
C++: Support dynamic memory allocations in IR alias analysis
2020-03-02 08:49:59 +01:00
Robert Marsh
28ee756c6a Merge pull request #2934 from geoffw0/add_tests
C++: Test and typos.
2020-02-28 15:12:32 -08:00
Geoffrey White
c6b0d4bbda C#: Sync identical files. 2020-02-28 17:55:59 +00:00
semmle-qlci
ec90627a64 Merge pull request #2909 from yo-h/experimental
Approved by aschackmull, jbj, max-schaefer, tausbn
2020-02-28 03:15:58 +00:00
Dave Bartolomeo
b0fb16c068 C++/C#: Fix formatting 2020-02-27 13:44:02 -05:00
Anders Schack-Mulligen
67d386b5ba C++/C#: Add synchronization. 2020-02-27 14:10:16 +01:00
Robert Marsh
95a762c987 Merge master for submodule update 2020-02-26 13:44:26 -08:00
Robert Marsh
4333fe7905 Merge branch 'master' into rdmarsh/cpp/ir-flow-through-outparams 2020-02-26 13:15:27 -08:00
yo-h
43bcd5b26c Add guidelines for experimental CodeQL queries and libraries 2020-02-24 15:08:31 -05:00
Jonas Jensen
2d9df70abc Merge pull request #2887 from MathiasVP/fix-ir-gen-switch
C++: Fix IR generation for switch statements
2020-02-24 13:29:27 +01:00
Mathias Vorreiter Pedersen
ed430ce855 C++/C#: Bind parameter in new case. 2020-02-24 09:12:14 +01:00
Mathias Vorreiter Pedersen
af364e66fc C++/C#: Move sanity check inside InstructionSanity module and accept tests 2020-02-23 20:53:49 +01:00
Dave Bartolomeo
170331b105 C++: Better fix for void type on buffer access
Fixes issue https://github.com/github/codeql-c-analysis-team/issues/20

This change undoes the workaround in https://github.com/Semmle/ql/pull/2736, and replaces it with a fix for the underlying cause. The problem was that the IR construction code for side effects incorrectly assumed that `BufferAccessOpcode` included `SizedBufferAccessOpcode`. I think that was actually a perfectly reasonable assumption to make, so I changed the `Opcode` hierarchy to make it true.
2020-02-21 18:46:32 -07:00
Mathias Vorreiter Pedersen
da41cbca06 C#: Add similar fix to translation of switch statements in C# 2020-02-21 13:33:54 +01:00
Anders Schack-Mulligen
771cb754c2 Merge pull request #2822 from hvitved/dataflow/node-cand-simple-call-context
Data flow: Track simple call contexts in `nodeCand[Fwd]1`
2020-02-21 10:02:06 +01:00
Tom Hvitved
0cc3218115 Merge pull request #2872 from aschackmull/dataflow/pathstep-localflow-join
Java/C++/C#: Improve join-order in pathStep predicate
2020-02-21 09:39:17 +01:00
Mathias Vorreiter Pedersen
780010d8f9 C++/C#: Sync identical files 2020-02-20 22:15:06 +01:00
Tom Hvitved
a772b82fea Address review comments 2020-02-20 19:48:49 +01:00