Instead of vaguely referring to a code comment in another language's
library, the `controlsBlock` predicate now contains the whole comment
directly.
I've adjusted the wording so it should be reasonably correct for C/C++.
As with the other comments in this file, I don't distinguish between the
condition and its block. I think that makes the explanation clearer
without losing any detail we care about.
To make the code fit the wording of the comment, I changed the
`hasBranchEdge/2` predicate into `getBranchSuccessor/1`.
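For illustration, the new predicate has roughly this shape (a
paraphrase with a simplified body, not the exact code):

    // Paraphrased member predicate of IRGuardCondition. The old
    // hasBranchEdge(succ, testIsTrue) related the same two values; making
    // the successor the result value matches the wording of the comment.
    private IRBlock getBranchSuccessor(boolean testIsTrue) {
      exists(ConditionalBranchInstruction branch | branch.getCondition() = this |
        testIsTrue = true and result = branch.getTrueSuccessor().getBlock()
        or
        testIsTrue = false and result = branch.getFalseSuccessor().getBlock()
      )
    }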
On kamailio/kamailio, the `DataFlowUtil::simpleInstructionLocalFlowStep`
predicate was slow because of its case for single-field structs, where
there was a large tuple-count bulge when joining with
`getFieldSizeOfClass`:
3552902 ~2% {2} r1 = SCAN Instruction::CopyInstruction::getSourceValueOperand_dispred#3#ff AS I OUTPUT I.<1>, I.<0>
2065347 ~2% {2} r35 = JOIN r1 WITH Operand::NonPhiMemoryOperand::getAnyDef_dispred#3#ff AS R ON FIRST 1 OUTPUT r1.<1>, R.<1>
2065827 ~2% {3} r36 = JOIN r35 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>, r35.<0>
2065825 ~3% {3} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<1>, r36.<2>, R.<1>
2068334 ~2% {4} r38 = JOIN r37 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r37.<2>, r37.<0>, r37.<1>
314603817 ~0% {3} r39 = JOIN r38 WITH DataFlowUtil::getFieldSizeOfClass#fff_120#join_rhs AS R ON FIRST 2 OUTPUT r38.<3>, R.<2>, r38.<2>
8 ~0% {2} r40 = JOIN r39 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r39.<2>, r39.<0>
That's 314M tuples.
Strangely, there is no such bulge on better-behaved snapshots like
mysql/mysql-server.
With this commit the explosion is gone:
...
2065825 ~0% {4} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<0>, R.<1>, r36.<1>, r36.<2>
1521 ~1% {3} r38 = JOIN r37 WITH DataFlowUtil::getFieldSizeOfClass#fff_021#join_rhs AS R ON FIRST 2 OUTPUT r37.<2>, R.<2>, r37.<3>
8 ~0% {2} r39 = JOIN r38 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r38.<0>, r38.<2>
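For context, one generic QL pattern for avoiding this shape of bulge
(not necessarily the change made here) is to pre-filter the large
relation in a dedicated helper, assuming a
`getFieldSizeOfClass(Class, Type, int)` signature:

    // Illustrative sketch only. Restricting getFieldSizeOfClass to classes
    // that actually occur as instruction result types materializes a much
    // smaller relation, so later joins cannot fan out over every
    // (field type, field size) match in the program.
    pragma[nomagic]
    private predicate fieldSizeOfResultClass(Class c, Type fieldType, int fieldSize) {
      getFieldSizeOfClass(c, fieldType, fieldSize) and
      c = any(Instruction i).getResultType()
    }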
The `controlsBlock` predicate had some dramatic bulges in its tuple
counts. To make matters worse, those bulges were in materialized
intermediate predicates like `#shared` and `#antijoin_rhs`, not just in
the middle of a pipeline.
The problem was particularly evident on kamailio/kamailio, where
`controlsBlock` was the slowest predicate in the IR libraries:
IRGuards::IRGuardCondition::controlsBlock_dispred#fff#shared#4 ........ 58.8s
IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs .... 33.4s
IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs#1 .. 26.7s
The first of the above relations had 201M rows, and the others
had intermediate bulges of similar size.
The bulges could be observed even on small projects, although they did
not cause measurable performance issues there. The
`controlsBlock_dispred#fff#shared#4` relation had 3M rows on git/git,
which is a lot for a project with only 1.5M IR instructions.
This commit borrows an efficient implementation from Java's
`Guards.qll`, tweaking it slightly to fit into `IRGuards`. Performance
is now much better:
IRGuards::IRGuardCondition::controlsBlock_dispred#fff ................... 6.1s
IRGuards::IRGuardCondition::hasDominatingEdgeTo_dispred#ff .............. 616ms
IRGuards::IRGuardCondition::hasDominatingEdgeTo_dispred#ff#antijoin_rhs . 540ms
After this commit, the biggest bulge in `controlsBlock` is the size of
`IRBlock::dominates`. On kamailio/kamailio this is an intermediate tuple
count of 18M rows in the calculation of `controlsBlock`, which in the
end produces 11M rows.
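The core of the borrowed implementation is the dominating-edge idea. A
hedged sketch of its shape (simplified member predicates of
`IRGuardCondition`; the real code in `IRGuards` differs in the details):

    // Holds if the edge from this condition's block to `succ` dominates
    // `succ`: every other way into `succ` is an edge that `succ` itself
    // dominates (i.e. a back edge).
    private predicate hasDominatingEdgeTo(IRBlock succ) {
      succ = this.getBranchSuccessor(_) and
      forall(IRBlock pred | pred = succ.getAPredecessor() and pred != this.getBlock() |
        succ.dominates(pred)
      )
    }

    // A condition controls `controlled` for value `testIsTrue` if the branch
    // successor for that value is only entered via the dominating edge and
    // that successor dominates `controlled`.
    predicate controlsBlock(IRBlock controlled, boolean testIsTrue) {
      exists(IRBlock succ |
        succ = this.getBranchSuccessor(testIsTrue) and
        this.hasDominatingEdgeTo(succ) and
        succ.dominates(controlled)
      )
    }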
The optimizer picked a terrible join order in `VirtualDispatch::DataSensitiveCall::flowsFrom()`. Telling it that `getAnOutNode()` has a unique result convinces it to join first on the `Callable`, rather than on the `ReturnKind`.
There wasn't a good join order for the "store to global var" case in the
virtual dispatch library. When a global variable had millions of
accesses but few stores to it, the `flowsFrom` predicate would enumerate
all of those millions of accesses before filtering them down to just the
stores.
The solution is to pull out a `storeIntoGlobal` helper predicate that
pre-computes which accesses are stores.
To make the code clearer, I've also pulled out a repeated chunk of code
into a new `addressOfGlobal` helper predicate.
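Roughly, the helpers look like this (paraphrased; the exact signatures
differ):

    // Holds if `vai` computes the address of the global variable `var`.
    private predicate addressOfGlobal(VariableAddressInstruction vai, GlobalVariable var) {
      vai.getASTVariable() = var
    }

    // Holds if `store` writes to the global variable `var`. Computing this
    // relation up front means the recursion in `flowsFrom` joins against the
    // small set of stores instead of against every access of the variable.
    private predicate storeIntoGlobal(StoreInstruction store, GlobalVariable var) {
      exists(VariableAddressInstruction vai |
        addressOfGlobal(vai, var) and
        store.getDestinationAddress() = vai
      )
    }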
For the kamailio/kamailio project, these are the tuple counts before:
Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@21a1df (iteration 3)
Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
...
59002 ~0% {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
58260 ~1% {3} r31 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#fb AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
2536187389 ~6% {3} r32 = JOIN r31 WITH Instruction::VariableInstruction::getASTVariable_dispred#fb_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r31.<2>
2536187389 ~6% {3} r33 = JOIN r32 WITH project#Instruction::VariableAddressInstruction#class#3#ff AS R ON FIRST 1 OUTPUT r32.<0>, true, r32.<2>
58208 ~0% {3} r34 = JOIN r33 WITH Instruction::StoreInstruction::getDestinationAddress_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r33.<2>
Tuple counts after:
Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@6073c5 (iteration 3)
Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
...
59002 ~0% {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
58260 ~1% {3} r23 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
58208 ~0% {3} r24 = JOIN r23 WITH DataFlowDispatch::VirtualDispatch::storeIntoGlobal#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r23.<2>
58208 ~0% {3} r25 = JOIN r24 WITH DataFlowUtil::InstructionNode#ff_10#join_rhs AS R ON FIRST 1 OUTPUT true, r24.<2>, R.<1>
Notice that the final tuple count, 58208, is the same before and after.
The kamailio/kamailio project seems to have been affected by this issue
because it has global variables that control logging policy, and those
variables are loaded in every place where the corresponding logging
macro is used.
The four cached predicates used to access common properties of instructions took a `TStageInstruction` as a parameter. This required the calling code, in `Instruction.qll`, to join the results with `hasInstruction()` to filter out results for `TRawInstruction`s that were discarded as unreachable. By simply switching the parameter types to `Instruction`, we can force that join to happen in the cached predicate itself. This makes the various accessor predicates on `Instruction` trivial, inlinable wrappers around the cached predicate, instead of joins of two huge relations that might have to be recomputed in later stages.
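To illustrate with one accessor (the names and bodies here are placeholders, with `getInstructionOpcode` standing in for the four cached predicates and `rawOpcode` for however the opcode is actually computed):

    // Before: the cached relation ranges over all of TStageInstruction, so
    // callers in Instruction.qll end up joining it with hasInstruction() to
    // drop TRawInstructions that were discarded as unreachable.
    cached
    Opcode getInstructionOpcode(TStageInstruction instr) { result = rawOpcode(instr) }

    // After: narrowing the parameter type to Instruction pushes that filter
    // into the cached relation itself, so the accessor below becomes a
    // trivially inlinable wrapper.
    cached
    Opcode getInstructionOpcode(Instruction instr) { result = rawOpcode(instr) }

    // In Instruction.qll:
    final Opcode getOpcode() { result = Construction::getInstructionOpcode(this) }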
Most of the predicates on `Instruction` are thin wrappers around cached predicates in the `IRConstruction` or `SSAConstruction` modules. However, `getResultIRType()` has to join `Construction::getInstructionResultType()` with `LanguageType::getIRType()`. `getResultIRType()` is called frequently both within the IR code and by IR consumers, and that's a big join to have to repeat in multiple stages.
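Concretely, the cached version amounts to something like this (the predicate name is illustrative):

    // Cache the join once so that later stages reuse the precomputed
    // relation instead of re-joining Construction::getInstructionResultType
    // with LanguageType::getIRType.
    cached
    IRType getInstructionResultIRType(Instruction instr) {
      result = Construction::getInstructionResultType(instr).getIRType()
    }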
I looked at most of the other predicates in `Instruction.qll` and didn't see any others that met all three criteria: large, commonly called, and not already inline.
There were a few places in the IR itself where we used `Instruction.getResultType()`, which returns the C++ `Type` of the result, instead of `Instruction.getResultIRType()`, which returns the language-neutral `IRType` of the result. By removing these uses, the IR itself no longer forces `getResultType()` to be evaluated at all.
There are still other uses of `Instruction.getResultType()` in other libraries. We should switch those as well.
Before this change, evaluation of the IR was spread out across about 5 stages. This resulted in a lot of redundant evaluation, especially tuple numbering of large IPA types like `TInstruction`. This commit makes two small changes that, when combined, ensure that the IR is evaluated entirely in one stage:
First, we mark `TInstruction` as `cached`. This collapses all of the work to create instructions, across all three IR phases, into a single evaluation stage.
Second, we make the `SSA` module in `SSAConstruction.qll` just contain aliases to `cached` predicates defined in the `Cached` module. This ensures that all of the `Operand`-related SSA computation happens in the same stage as all of the `Instruction`-related SSA computation.
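Roughly, the two changes have this shape (the branch names and parameters, key types, and aliased predicate names are all placeholders):

    // (1) Caching the IPA type collapses instruction creation for all three
    // IR phases into the same evaluation stage.
    cached
    newtype TInstruction =
      TRawInstruction(RawInstructionKey key) or
      TUnaliasedSsaInstruction(UnaliasedSsaInstructionKey key) or
      TAliasedSsaInstruction(AliasedSsaInstructionKey key)

    // (2) The public SSA module only re-exports predicates from the Cached
    // module by alias, so the Operand-related SSA computation lands in the
    // same stage as the Instruction-related computation.
    module SSA {
      predicate hasPhiOperandDefinition = Cached::hasPhiOperandDefinition/4;
      predicate hasMemoryOperandDefinition = Cached::hasMemoryOperandDefinition/3;
      // ...and so on for the rest of the module's predicates and classes
    }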