Commit Graph

7329 Commits

Author SHA1 Message Date
Dave Bartolomeo
c8d154e9cc C#: Fix dump of IR types 2019-09-26 15:54:09 -07:00
Dave Bartolomeo
e30e163081 C#: Implement IRType
This commit implements the language-neutral IR type system for C#. It mostly follows the same pattern as C++, modified to fit the C# type system. All object references, pointers, and lvalues are represented as `IRAddress` types. All structs and generic parameters are implemented as `IRBlobType`. Function addresses get a single `IRFunctionAddressType`.

I had to fix a couple places in the original IR type system where I didn't realize I was still depending on language-specific types. As part of this, `CSharpType` and `CppType` now have a `hasUnspecifiedType()` predicate, which is equivalent to `hasType()`, except that it holds only for the unspecified version of the type. This predicate can go away once we remove the IR's references to the underlying `Type` objects.

All C# IR tests pass without modification, but only because this commit continues to print the name of `IRUnknownType` as `null`, and `IRFunctionAddressType` as `glval<null>`. These will be fixed separately in a subsequent commit in this PR.
2019-09-26 15:47:52 -07:00
Dave Bartolomeo
300e580874 C++: Implement language-neutral IR type system
The C++ IR currently has a very clunky way of specifying the type of an IR entity (`Instruction`, `Operand`, `IRVariable`, etc.). There are three separate predicates: `getType()`, `isGLValue()`, and `getSize()`. All three are necessary, rather than just having a `getType()` predicate, because some IR entities have types that are not represented via an existing `Type` object in the AST. Examples include the type for an lvalue returned from a `VariableAddress` instruction, the type for an array slice being zero-initialized in a variable initializer, and several others. It is very easy for QL code to just check the `getType()` predicate, while forgetting to use `isGLValue()` to determine if that type is the actual type of the entity (the prvalue case) or the type referred to by a glvalue entity. Furthermore, the C++ type system creates potentially many different `Type` objects for the same underlying type (e.g. typedefs, using declarations, `const`/`volatile` qualifiers, etc.), making it more difficult to tell when two entities have semantically equivalent types.

In addition, other languages for which we want to enable the IR have somewhat different type systems. The various language type systems differ in their structure, although they tend to share the basic building blocks necessary for the IR.

To address all of the above problems, I've introduced a new class hierarchy, rooted at the class `IRType`, that represents a bare-bones type system that is independent of source language (at least across C/C++/C#/Java). A type's identity is based on its kind (signed integer, unsigned integer, floating-point, Boolean, blob, etc.), size and in the case of blob types, a "tag" to differentiate between different classes and structs. No distinction is made between, say `signed int` and plain `int`, or between different language integer types that have the same signedness and size (e.g. `unsigned int` vs. `wchar_t` on Linux). `IRType` is intended for use by language-agnostic IR-based analyses, including range analysis, dataflow, SSA construction, and alias analysis. The set of available `IRType`s is determined by predicate provided by the language library implementation (e.g. `hasSignedIntegerType(int byteSize)`.

In addition to `IRType`, each language now defines a type alias named `LanguageType`, representing the type of an IR entity in more language-specific terms. The only predicate requried on `LanguageType` is `getIRType()`, which returns the single `IRType` object for the language-neutral representation of that `LanguageType`. All other predicates on and subclasses of `LanguageType` are language-specific. There may be many instances of `LanguageType` that map to a given `IRType`, to allow for typedefs, etc.

Most of the changes are mechanical changes in the IR construction code, to return the correct type for each IR entity. SSA construction has also been updated to avoid dependencies on language-specific types.

I have not yet removed the original `getType()` predicates that just return `Type`. These can be removed once we move the remaining existing libraries to use `IRType`.

Test results are, by design, pretty much unchanged. Once case changed for inline asm, because the previously IR generation for it played a little fast and loose with the input/output expressions. The test case now includes both input and output variables. The generated IR for `Conditional_LValue` is now more correct, because we now have a way to represent an lvalue of an lvalue. `syntax-zoo` is still a hot mess. Most of the changed outputs are due to wobble from having multiple functions with the same name, but with a slightly different order of evaluation due to the type changes. Others are wobble from already-invalid IR. A couple non-wobbly places have improved slightly, though.

The C# part of this change is waiting for #2005 to be merged, since that has some of the necessary C# implementation.
2019-09-23 16:14:00 -07:00
Calum Grant
b85896299d Merge pull request #2000 from AndreiDiaconu1/ircsharp-fixes
C# IR: Minor fixes and changes
2019-09-23 18:14:50 +01:00
Jonas Jensen
22e57a6559 Merge pull request #1860 from matt-gretton-dann/add-using-aliases
Add support for using aliases
2019-09-23 16:53:51 +02:00
Jonas Jensen
898976121b Merge pull request #1987 from geoffw0/toomanyformat
CPP: WrongNumberOfFormatArguments.ql Fix
2019-09-23 16:05:11 +02:00
AndreiDiaconu1
7f76947af0 Autoformat 2019-09-23 15:03:38 +01:00
AndreiDiaconu1
ae503b2982 Remove incorrect Load
Removed an incorrect `Load` op generated by propery accesses.
2019-09-23 14:43:08 +01:00
AndreiDiaconu1
3c95205f2e Minor fixes for array related translation
More accurate type sizes using language specific predicates from `IRCSharpLanguage.qll`.
Added immediate operands for some `PointerX` (add, sub) instructions.
Some other minor consistency fixes.
2019-09-23 14:37:31 +01:00
Robert Marsh
90c91a78f8 Merge pull request #1976 from pavgust/fix/hashcons-perf
C++: HashCons: Further performance improvements
2019-09-23 06:37:03 -07:00
Rasmus Wriedt Larsen
a0ecbc555d Merge pull request #1998 from taus-semmle/python-support-aiter
Python: Add `__aiter__` as a recognised iterator method.
2019-09-23 15:32:53 +02:00
Matthew Gretton-Dann
4606587fe8 C++: Apply style guide to TypedefType.qll 2019-09-23 13:57:50 +01:00
Matthew Gretton-Dann
af3b0d9e73 C++: Update stats. 2019-09-23 13:57:50 +01:00
Matthew Gretton-Dann
c8dfa46c63 C++: Add upgrade script for using aliases. 2019-09-23 13:57:50 +01:00
Matthew Gretton-Dann
fc75a6af5a C++: Add tests for using aliases 2019-09-23 13:57:50 +01:00
Matthew Gretton-Dann
9ff38ebeee C++: Update tests for new CTypedefType. 2019-09-23 13:57:50 +01:00
Matthew Gretton-Dann
5468b8def7 C++: Add support for C++ using aliases
Previously these were identified as typedefs.
2019-09-23 13:57:50 +01:00
Geoffrey White
b3df289a80 CPP: Fix test. 2019-09-23 13:56:24 +01:00
Geoffrey White
2d8e4b3176 CPP: Additional cases resembling the ticket. 2019-09-23 13:04:14 +01:00
semmle-qlci
825a3d2917 Merge pull request #1954 from asger-semmle/type-tracking-through-captured-vars
Approved by xiemaisi
2019-09-23 12:10:30 +01:00
semmle-qlci
e2c941c577 Merge pull request #1916 from erik-krogh/taintedLength
Approved by asger-semmle, xiemaisi
2019-09-23 11:47:48 +01:00
Taus Brock-Nannestad
e1012d8d5a Python: Add __aiter__ as a recognised iterator method. 2019-09-23 12:26:16 +02:00
Geoffrey White
040bd89163 CPP: Correct expected results. 2019-09-23 11:02:36 +01:00
semmle-qlci
7a57a3c743 Merge pull request #1996 from xiemaisi/js/fix-illegal-invocation-refl
Approved by esben-semmle
2019-09-23 09:16:33 +01:00
Max Schaefer
149ae5d7ab JavaScript: Fix IllegalInvocation.
This fixes false positives that arise when a call such as `f.apply` can either be interpreted as a reflective invocation of `f`, or a normal call to method `apply` of `f`.
2019-09-23 07:44:14 +01:00
Erik Krogh Kristensen
814c5537be update name of loop bound injection in change-notes 2019-09-20 22:56:08 +02:00
Asger F
69a88c4fcd JS: Fix typo and add metadata to DomValueRefs 2019-09-20 15:43:08 +01:00
Asger F
1ce0a48996 JS: Update tests 2019-09-20 15:41:36 +01:00
Geoffrey White
9100ab9360 CPP: Autoformat. 2019-09-20 15:30:59 +01:00
Geoffrey White
accb8246d4 CPP: Change note. 2019-09-20 15:15:35 +01:00
Geoffrey White
f7607313e7 CPP: Fix FPs. 2019-09-20 15:12:55 +01:00
Geoffrey White
9a407eb43c CPP: Test format args with mismatching declarations. 2019-09-20 14:54:44 +01:00
Calum Grant
b31cd8ab32 Merge pull request #1982 from hvitved/csharp/null-maybe-dynamic
C#: Remove false positives from `cs/dereferenced-value-may-be-null`
2019-09-20 14:46:01 +01:00
Calum Grant
478095223e Merge pull request #1983 from hvitved/csharp/unit-test-windows
C#: Fix broken unit test on Windows
2019-09-20 13:52:01 +01:00
Pavel Avgustinov
1c971d3f88 HashCons: Further performance improvements
The key insight here is that `HC_FieldCons` and `HC_Array` are
functionally determined by the things that arise in another
recursive call. Lifting them to their own predicate, therefore,
reduces nonlinearity and constrains the join order in a way that
cannot be asymptotically bad -- and, indeed, makes quite a big
difference in practice.
2019-09-20 12:00:33 +01:00
Tom Hvitved
cb6e1536a3 C#: Fix broken unit test on Windows 2019-09-20 11:40:18 +02:00
semmle-qlci
6d9d859119 Merge pull request #1934 from asger-semmle/node-js-classification
Approved by esben-semmle
2019-09-20 09:50:34 +01:00
Tom Hvitved
fb68d839a9 C#: Add change note 2019-09-20 10:40:20 +02:00
Max Schaefer
4fe74c0b2a Merge pull request #1960 from Semmle/rc/1.22
Merge rc/1.22 into master
2019-09-20 09:08:40 +01:00
Tom Hvitved
aa0c78cd85 C#: Teach guards library about more null guards 2019-09-20 09:58:04 +02:00
Tom Hvitved
40fafc5fda C#: Teach comparison library about dynamic comparison operations 2019-09-20 09:51:35 +02:00
Tom Hvitved
c923cc6378 C#: Add tests for dynamic comparisons 2019-09-20 09:19:03 +02:00
Tom Hvitved
cb7db8f4c0 C#: Add more nullness tests 2019-09-20 09:18:55 +02:00
Robert Marsh
d3f2d8169e Merge pull request #1967 from jbj/tainttracking-ir-2
C++: DefaultTaintTracking flow from a to a[i]
2019-09-19 15:00:29 -07:00
Robert Marsh
9c6a0ffc48 Merge pull request #1979 from nickrolfe/wrong_type_uninstantiated
C++: ignore uninstantiated templates in WrongTypeFormatArguments.ql
2019-09-19 14:51:45 -07:00
Nick Rolfe
56f4f86921 C++: ignore uninstantiated templates in WrongTypeFormatArguments.ql 2019-09-19 21:18:47 +01:00
semmle-qlci
0387177acd Merge pull request #1851 from hvitved/csharp/early-identify-duplicate-extraction
Approved by calumgrant
2019-09-19 19:45:33 +01:00
Robert Marsh
fd88f7a3ce Merge pull request #1884 from jbj/dataflow-addressof
C++: Data flow through address-of operator (&)
2019-09-19 09:15:43 -07:00
Robert Marsh
340c8026de Merge pull request #1965 from jbj/bitfield-template
C++: Ignore templates in AmbiguouslySignedBitField.ql
2019-09-19 07:46:54 -07:00
semmle-qlci
6b783141e9 Merge pull request #1962 from shati-patel/sphinx/collapse
Approved by jf205
2019-09-19 15:33:45 +01:00