Commit Graph

105 Commits

Author SHA1 Message Date
Dave Bartolomeo
0ae98e78a2 Merge remote-tracking branch 'github/master' into github/codeql-c-analysis-team/69_union 2020-06-08 11:20:14 -04:00
Dave Bartolomeo
0666a2e587 Remove usage of f-string 2020-06-04 08:48:14 -04:00
Dave Bartolomeo
a18eba2c4c Allow missing files in sync-files --latest
When running `sync-files` (or `sync-identical-files`) with the `--latest` switch, if one or more of the files in a group does not exist, the script will crash. This happens all the time when I add a new group, or add a new file path in an existing group. This has bothered me for a long time, so I finally fixed it when I ran into it again today.

I've changed the script as follows:
- If _none_ of the paths in the group exist, print an error message listing the paths in the group. This happens with or without `--latest`.
- If `--latest` is specified, copy the master file to the paths of the missing files.
2020-06-03 14:53:31 -04:00
Dave Bartolomeo
bbadf4b4bb C#: Port TInstruction-sharing support from C++
This updates C#'s IR to share `TInstruction` across stages the same way C++ does. The only interesting part is that, since we have not yet ported full alias analysis to C#, I stubbed out the required parts of the aliased SSA interface in `AliasedSSAStub.qll`.
2020-06-03 13:52:19 -04:00
Dave Bartolomeo
1e863ac40b C++: Share TInstruction across IR stages
Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the `TInstruction` type for the next stage. This causes use to create roughly three times as many `TInstruction` objects as we actually need.

Now that IPA union types are supported in the compiler, we can share a single `TInstruction` IPA type across stages. We create a single `TInstruction` IPA type, with individual branches of this type for instructions created directly from the AST (`TRawInstruction`) and for instructions added by each stage of SSA construction (`T*PhiInstruction`, `T*ChiInstruction`, `T*UnreachedInstruction`). Each stage then defines a `TStageInstruction` type that is a union of all of the branches that can appear in that particular stage. The public `Instruction` class for each phase extends the `TStageInstruction` type for that stage.

The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces:
- The `Raw` module, exposed only by the original IR construction stage. This module identifies which functions have IR, which `TRawInstruction`s exist, and which `IRVariable`s exist.
- The `SSA` module, exposed only by the two SSA construction stages. This identifiers which `Phi`, `Chi`, and `Unreached` instructions exist.
- The global module, exposed by all three stages. This module has all of the predicates whose implementation is different for each stage, like gathering definitions of `MemoryOperand`s.

Similarly, there is now a single `TIRFunction` IPA type that is shared across all three stages. There is a single `IRFunctionBase` class that exposes the stage-indepdendent predicates; the `IRFunction` class for each stage extends `IRFunctionBase`.

Most of the other changes are largely mechanical.
2020-06-01 11:15:29 -04:00
Dave Bartolomeo
09d1da2f7a C++/C#: Rename sanity -> consistency
I did both of these languages together because they share some of the changed code via `identical-files.json`.
2020-05-11 13:29:52 -04:00
Sauyon Lee
972551edd7 sync-files.py: cast line to string before concat 2020-04-23 15:32:28 -07:00
Mathias Vorreiter Pedersen
1e73528102 C++/C#: Add synchronization 2020-04-03 10:08:00 +02:00
Jonas Jensen
93f7c950ea Merge pull request #3152 from dbartol/dbartol/sync-files
Move `sync-identical-files.py` into public repo as `sync-files.py`
2020-03-31 08:31:00 +02:00
Dave Bartolomeo
3eef2747d5 Fix LGTM alerts 2020-03-29 03:12:27 -04:00
Dave Bartolomeo
0952064eb3 Move sync-identical-files.py into public repo as sync-files.py
We currently use a script to keep certain duplicate QL files in sync across the repo. For historical reasons, this script has lived in the private repo alongside the rest of CodeQL, even though it's only used for files in the public `ql` repo. This PR moves the script into the public `ql` repo. It is still invoked by Jenkins scripts that live in the private repo during CI, but it can also be invoked directly without having a checkout of the private repo. This is useful for anyone who is modifying the dataflow or IR libraries with only a QL checkout.
2020-03-29 02:59:14 -04:00
Anders Schack-Mulligen
67d386b5ba C++/C#: Add synchronization. 2020-02-27 14:10:16 +01:00
Mathias Vorreiter Pedersen
d4c6f487bc C++/C#: Fix sync config file for value numbering sharing 2020-02-13 22:32:52 +01:00
Robert Marsh
ffaaed0550 C++: separate IR ValueNumber newtype and interface 2020-02-06 15:35:20 +01:00
Dave Bartolomeo
d12b140921 C++/C#: Update shared file list 2020-01-28 10:55:38 -07:00
Dave Bartolomeo
9d35ff73c4 C++/C#: Make escape analysis unsound by default
When building SSA, we'll be assuming that stack variables do not escape, at least until we improve our alias analysis. I've added a new `IREscapeAnalysisConfiguration` class to allow the query to control this, and a new `UseSoundEscapeAnalysis.qll` module that can be imported to switch to the sound escape analysis. I've cloned the existing IR and SSA tests to have both sound and unsound versions. There were relatively few diffs in the IR dump tests, and the sanity tests still give the same results after one change described below.

Assuming that stack variables do not escape exposed an existing bug where we do not emit an `Uninitialized` instruction for the temporary variables used by `return` statements and `throw` expressions, even if the initializer is a constructor call or array initializer. I've refactored the code for handling elements that initialize a variable to share a common base class. I added a test case for returning an object initialized by constructor call, and ensured that the IR diffs for the existing `throw` test cases are correct.
2020-01-22 00:15:30 -07:00
Tom Hvitved
82c368e13e C#: Sync XML.qll with other languages 2019-12-19 10:26:08 +01:00
Max Schaefer
81f51e4e2b Ensure that XML libraries for C++, Java, JavaScript and Python stay in sync. 2019-12-17 10:15:43 +00:00
Dave Bartolomeo
51ff262cbc C++/C#: Add IR SSA sanity tests 2019-11-22 13:16:05 -07:00
Dave Bartolomeo
bcd987cdf1 Merge from master and share value numbering 2019-09-27 17:40:43 -07:00
Dave Bartolomeo
f76334c24a C++, C#: Share unaliased SSA files between languages
Most of the C# diffs are from bringing those files in sync with the latest C++ files.
2019-09-27 13:46:42 -07:00
Dave Bartolomeo
c389432922 C++, C#: Sync IRType.qll between languages 2019-09-26 22:11:24 -07:00
Jonas Jensen
80a0027808 C++: Shared TaintTrackingImpl for IR TaintTracking 2019-09-10 09:40:27 +02:00
Jonas Jensen
e9a029cba3 C++: Local field flow using global library
This commit removes fields from the responsibilities of `FlowVar.qll`.
The treatment of fields in that file was slow and imprecise.

It then adds another copy of the shared global data flow library, used
only to find local field flow, and it exposes that local field flow
through `localFlow` and `localFlowStep`.

This has a performance cost. It adds two cached stages to any query that
uses `localFlow`: the stage from `DataFlowImplCommon`, which is shared
with all queries that use global data flow, and a new stage just for
`localFlowStep`.
2019-09-02 11:17:27 +02:00
Geoffrey White
1215da2d6c Merge pull request #1827 from jbj/sbb-tidy
C++: Tidy up SubBasicBlocks.qll
2019-08-29 15:42:40 +01:00
AndreiDiaconu1
c74898ec9f Synced files
Synced the files that are needed for this PR
2019-08-28 12:25:14 +01:00
AndreiDiaconu1
de6f547088 Synced more files 2019-08-28 12:25:13 +01:00
Dave Bartolomeo
609ca034c0 C#/C++: Share IR implementation 2019-08-28 12:25:13 +01:00
Jonas Jensen
17ee3f555c C++: Sync the two copies of SubBasicBlocks.qll
These files are now added to `identical-files.json` so they will remain
in sync.
2019-08-26 16:01:36 +02:00
Jonas Jensen
25701f203d C++/C#/Java: Shared TaintTrackingImpl.qll
This file is now identical in all languages. Unifying this file led to
the following changes:
- The documentation spelling fixes and example from the C++ version
  were copied to the other versions and updated.
- The steps through `NonLocalJumpNode` from C# were abstracted into a
  `globalAdditionalTaintStep` predicate that's empty for C++ and Java.
- The `defaultTaintBarrier` predicate from Java is now present but empty
  on C++ and C#.
- The C++ `isAdditionalFlowStep` predicate on
  `TaintTracking::Configuration` no longer includes `localFlowStep`.
  That should avoid some unnecessary tuple copying.
2019-08-21 14:55:54 +02:00
Jonas Jensen
11583b69e0 C#: Use pyrameterized modules for TaintTracking
To keep the code changes minimal, and to keep the implementation similar
to C++ and Java, the `TaintTracking{Public,Private}` files are now
imported together through `TaintTrackingUtil`. This has the side effect
of exposing `localAdditionalTaintStep`. The corresponding predicate for
Java was already exposed.
2019-08-20 13:45:38 +02:00
Jonas Jensen
aeb2323128 Java: Use pyrameterized modules for TaintTracking 2019-08-20 13:45:37 +02:00
Jonas Jensen
d388be7d3b C++: Use pyrameterized modules for TaintTracking 2019-08-20 13:45:37 +02:00
Tom Hvitved
081ee9944d C#: Add more copies of the data flow library 2019-08-07 10:41:39 +02:00
Robert Marsh
6bd22b01b3 Merge pull request #1607 from dave-bartolomeo/dave/CrossLanguageIR
C++: Start preparing IR for supporting multiple languages
2019-07-29 12:34:21 -07:00
Anders Schack-Mulligen
3d340d4fba Java: Delete deprecated dependency DataFlowImplDepr. 2019-07-25 11:18:01 +02:00
Dave Bartolomeo
efa854ea3e C++: Add *Imports.qll files to identical-files.json 2019-07-19 15:38:11 -07:00
Tom Hvitved
a6fa6dfd74 C#: Add shared data flow files 2019-05-06 14:54:11 +02:00
Dave Bartolomeo
b5a3edfdae C++: FunctionIR -> IRFunction 2019-03-12 11:28:22 -07:00
Dave Bartolomeo
b40fd95b8e C++: Better tracking of SSA memory accesses
This change fixes a few key problems with the existing SSA implementations:

For unaliased SSA, we were incorrectly choosing to model a local variable that had accesses that did not cover the entire variable. This has been changed to ensure that all accesses to the variable are at offset zero and have the same type as the variable itself. This was only possible to fix now that every `MemoryOperand` has its own type.

For aliased SSA, we now correctly track the offset and size of each memory access using an interval of bit offsets covered by the access. The offset interval makes the overlap computation more straightforward. Again, this is only possible now that operands have types.
The `getXXXMemoryAccess` predicates are now driven by the `MemoryAccessKind` on the operands and results, instead of by specific opcodes.

This change does fix an existing false negative in the IR dataflow tests.

I added a few simple test cases to the SSA IR tests, covering the various kinds of overlap (MustExcactly, MustTotally, and MayPartially).

I added "PrintSSA.qll", which can dump the SSA memory accesses as part of an IR dump.
2019-02-13 10:44:39 -08:00
Dave Bartolomeo
56bb9dcde0 C++: Remove infeasible edges to reachable blocks
The existing unreachable IR removal code only retargeted an infeasible edge to an `Unreached` instruction if the successor of the edge was an unreachable block. This is too conservative, because it doesn't remove an infeasible edge that targets a block that is still reachable via other paths. The trivial example of this is `do { } while (false);`, where the back edge is infeasible, but the body block is still reachable from the loop entry.

This change retargets all infeasible edges to `Unreached` instructions, regardless of the reachability of the successor block.
2018-12-14 12:13:22 -08:00
Dave Bartolomeo
5ba51e32f0 C++: Remove aliased_ssa instantiation of IR reachability
We never actually consumed this iteration, since SSA construction only depends on the reachability instantiation of the previous IR layer.
2018-12-10 21:22:55 -08:00
Dave Bartolomeo
99d33f9623 C++: Remove unreachable IR
This change removes any IR instructions that can be statically proven unreachable. To detect unreachable IR, we first run a simple constant value analysis on the IR. Then, any `ConditionalBranch` with a constant condition has the appropriate edge marked as "infeasible". We define a class `ReachableBlock` as any `IRBlock` with a path from the entry block of the function. SSA construction has been modified to operate only on `ReachableBlock` and `ReachableInstruction`, which ensures that only reachable IR gets translated into SSA form. For any infeasible edge where its predecessor block is reachable, we replace the original target of the branch with an `Unreached` instruction, which lets us preserve the invariant that all `ConditionalBranch` instructions have both a true and a false edge, and allows guard inference to still work.

The changes to `SSAConstruction.qll` are not as scary as they look. They are almost entirely a mechanical replacement of `OldIR::IRBlock` with `OldBlock`, which is just an alias for `ReachableBlock`.

Note that the `constant_func.ql` test can determine that the two new test functions always return 0.

Removing unreachable code helps get rid of some common FPs in IR-based dataflow analysis, especially for constructs like `while(true)`.
2018-12-10 21:22:55 -08:00
Dave Bartolomeo
59fc77f066 C++: Simple constant analysis
This change moves the simple constant analysis that was used by the const_func test into a pyrameterized module for use on any stage of the IR. This will be used to detect unreachable code.
2018-12-10 21:22:54 -08:00
Dave Bartolomeo
58f7596519 C++: IR-based dataflow 2018-11-30 12:15:11 -08:00
Robert Marsh
a33b59103a C++: insert Chi nodes in the IR successor relation
This commit adds Chi nodes to the successor relation and accounts for
them in the CFG, but does not add them to the SSA data graph. Chi nodes
are inserted for partial writes to any VirtualVariable, regardless of
whether the partial write reaches any uses.
2018-11-26 12:08:18 -08:00
Robert Marsh
f9ed39915f C++: recompute IRBlock membership at each stage
This enables the addition of new instructions in later phases of IR
construction; in particular, aliasing write instructions and inference
instructions.
2018-11-26 12:08:18 -08:00
Dave Bartolomeo
f278f4fa47 C++: Operands as IPA types
@rdmarsh2 has been working on various queries and libraries on top of the IR, and has pointed out that having to always refer to an operand of an instruction by the pair of (instruction, operandTag) makes using the IR a bit clunky. This PR adds a new `Operand` IPA type that represents an operand of an instruction. `OperandTag` still exists, but is now an internal type used only in the IR implementation.
2018-10-23 14:58:44 -07:00
Dave Bartolomeo
5a25602c28 C++: Move GVN out of "internal" directory 2018-09-20 08:21:15 -07:00
Dave Bartolomeo
46b2c19c66 C++: Initial attempt at IR-based value numbering 2018-09-17 17:19:05 -07:00