diff --git a/.gitignore b/.gitignore
index d020f307ebc..dc932d1f5c1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -19,5 +19,6 @@

 # It's useful (though not required) to be able to unpack codeql in the ql checkout itself
 /codeql/
-.vscode/settings.json
+
 csharp/extractor/Semmle.Extraction.CSharp.Driver/Properties/launchSettings.json
+.vscode
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index af8575bc316..9853e6b91b8 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,6 +1,6 @@
 # Contributing to CodeQL

-We welcome contributions to our CodeQL libraries and queries. Got an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request!
+We welcome contributions to our CodeQL libraries and queries. Got an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Contributions to this project are [released](https://help.github.com/articles/github-terms-of-service/#6-contributions-under-repository-license) to the public under the [project's open source license](LICENSE).

 There is lots of useful documentation to help you write queries, ranging from information about query file structure to tutorials for specific target languages. For more information on the documentation available, see [Writing CodeQL queries](https://help.semmle.com/QL/learn-ql/writing-queries/writing-queries.html) on [help.semmle.com](https://help.semmle.com).

@@ -47,10 +47,6 @@ If you have an idea for a query that you would like to share with other CodeQL u

    - The query must have at least one true positive result on some revision of a real project.

-6. **Contributor License Agreement**
-
-   - The contributor can satisfy the [CLA](#contributor-license-agreement).
-
 Experimental queries and libraries may not be actively maintained as the [supported](docs/supported-queries.md) libraries evolve. They may also be changed in backwards-incompatible ways or may be removed entirely in the future without deprecation warnings.

 After the experimental query is merged, we welcome pull requests to improve it. Before a query can be moved out of the `experimental` subdirectory, it must satisfy [the requirements for being a supported query](docs/supported-queries.md).
@@ -65,33 +61,6 @@ normal course of software development.

 We also store records of your CLA agreements. Under GDPR legislation, we do this
 on the basis of our legitimate interest in creating the CodeQL product.
-Please do get in touch (privacy@semmle.com) if you have any questions about
+Please do get in touch (privacy@github.com) if you have any questions about
 this or our data protection policies.

-## Contributor License Agreement
-
-This Contributor License Agreement (“Agreement”) is entered into between Semmle Limited (“Semmle,” “we” or “us” etc.), and You (as defined and further identified below).
-
-Accordingly, You hereby agree to the following terms for Your present and future Contributions submitted to Semmle:
-
-1. **Definitions**.
-
-   * "You" (or "Your") shall mean the Contribution copyright owner (whether an individual or organization) or legal entity authorized by the copyright owner that is making this Agreement with Semmle. For legal entities, the entity making a Contribution and all other entities that control, are controlled by, or are under common control with that entity are considered to be a single Contributor. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
-
-   * "Contribution(s)" shall mean the code, documentation or other original works of authorship, including any modifications or additions to an existing work, submitted by You to Semmle for inclusion in, or documentation of, any of the products or projects owned or managed by Semmle (the "Work(s)"). For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to Semmle or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, Semmle for the purpose of discussing and/or improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."
-
-2. **Grant of Copyright License**. You hereby grant to Semmle and to recipients of software distributed by Semmle a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.
-
-3. **Grant of Patent License**. You hereby grant to Semmle and to recipients of software distributed by Semmle a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution(s) alone or by combination of Your Contribution(s) with the Work to which such Contribution(s) was submitted. If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that Your Contribution, or the Work to which You have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Work shall terminate as of the date such litigation is filed.
-
-4. **Ownership**. Except as set out above, You keep all right, title, and interest in Your Contribution. The rights that You grant to us under this Agreement are effective on the date You first submitted a Contribution to us, even if Your submission took place before the date You entered this Agreement.
-
-5. **Representations**. You represent and warrant that: (i) the Contributions are an original work and that You can legally grant the rights set out in this Agreement; (ii) the Contributions and Semmle’s exercise of any license rights granted hereunder, does not and will not, infringe the rights of any third party; (iii) You are not aware of any pending or threatened claims, suits, actions, or charges pertaining to the Contributions, including without limitation any claims or allegations that any or all of the Contributions infringes, violates, or misappropriate the intellectual property rights of any third party (You further agree that You will notify Semmle immediately if You become aware of any such actual or potential claims, suits, actions, allegations or charges).
-
-6. **Employer**. If Your employer(s) has rights to intellectual property that You create that includes Your Contributions, You represent and warrant that Your employer has waived such rights for Your Contributions to Semmle, or that You have received permission to make Contributions on behalf of that employer and that You are authorized to execute this Agreement on behalf of Your employer.
-
-7. **Inclusion of Code**. We determine the code that is in our Works. You understand that the decision to include the Contribution in any project or source repository is entirely that of Semmle, and this agreement does not guarantee that the Contributions will be included in any product.
-
-8. **Disclaimer**. You are not expected to provide support for Your Contributions, except to the extent You desire to provide support. You may provide support for free, for a fee, or not at all. Except as set forth herein, and unless required by applicable law or agreed to in writing, You provide Your Contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND.
-
-9. **General**. The failure of either party to enforce its rights under this Agreement for any period shall not be construed as a waiver of such rights. No changes or modifications or waivers to this Agreement will be effective unless in writing and signed by both parties. In the event that any provision of this Agreement shall be determined to be illegal or unenforceable, that provision will be limited or eliminated to the minimum extent necessary so that this Agreement shall otherwise remain in full force and effect and enforceable. This Agreement shall be governed by and construed in accordance with the laws of the State of California in the United States without regard to the conflicts of laws provisions thereof. In any action or proceeding to enforce rights under this Agreement, the prevailing party will be entitled to recover costs and attorneys’ fees.
diff --git a/COPYRIGHT b/COPYRIGHT
deleted file mode 100644
index 65d947ff274..00000000000
--- a/COPYRIGHT
+++ /dev/null
@@ -1,13 +0,0 @@
-Copyright (c) Semmle Inc and other contributors. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License"); you may not use
-this file except in compliance with the License. You may obtain a copy of the
-License at http://www.apache.org/licenses/LICENSE-2.0
-
-THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
-WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
-MERCHANTABLITY OR NON-INFRINGEMENT.
-
-See the Apache Version 2.0 License for specific language governing permissions
-and limitations under the License.
diff --git a/LICENSE b/LICENSE
index d9a10c0d8e8..e29b05cd648 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,176 +1,21 @@
-                                 Apache License
-                           Version 2.0, January 2004
-                        http://www.apache.org/licenses/
+MIT License

-   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+Copyright (c) 2006-2020 GitHub, Inc.

-   1. Definitions.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:

-      "License" shall mean the terms and conditions for use, reproduction,
-      and distribution as defined by Sections 1 through 9 of this document.
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.

-      "Licensor" shall mean the copyright owner or entity authorized by
-      the copyright owner that is granting the License.
-
-      "Legal Entity" shall mean the union of the acting entity and all
-      other entities that control, are controlled by, or are under common
-      control with that entity. For the purposes of this definition,
-      "control" means (i) the power, direct or indirect, to cause the
-      direction or management of such entity, whether by contract or
-      otherwise, or (ii) ownership of fifty percent (50%) or more of the
-      outstanding shares, or (iii) beneficial ownership of such entity.
-
-      "You" (or "Your") shall mean an individual or Legal Entity
-      exercising permissions granted by this License.
-
-      "Source" form shall mean the preferred form for making modifications,
-      including but not limited to software source code, documentation
-      source, and configuration files.
-
-      "Object" form shall mean any form resulting from mechanical
-      transformation or translation of a Source form, including but
-      not limited to compiled object code, generated documentation,
-      and conversions to other media types.
-
-      "Work" shall mean the work of authorship, whether in Source or
-      Object form, made available under the License, as indicated by a
-      copyright notice that is included in or attached to the work
-      (an example is provided in the Appendix below).
-
-      "Derivative Works" shall mean any work, whether in Source or Object
-      form, that is based on (or derived from) the Work and for which the
-      editorial revisions, annotations, elaborations, or other modifications
-      represent, as a whole, an original work of authorship. For the purposes
-      of this License, Derivative Works shall not include works that remain
-      separable from, or merely link (or bind by name) to the interfaces of,
-      the Work and Derivative Works thereof.
-
-      "Contribution" shall mean any work of authorship, including
-      the original version of the Work and any modifications or additions
-      to that Work or Derivative Works thereof, that is intentionally
-      submitted to Licensor for inclusion in the Work by the copyright owner
-      or by an individual or Legal Entity authorized to submit on behalf of
-      the copyright owner. For the purposes of this definition, "submitted"
-      means any form of electronic, verbal, or written communication sent
-      to the Licensor or its representatives, including but not limited to
-      communication on electronic mailing lists, source code control systems,
-      and issue tracking systems that are managed by, or on behalf of, the
-      Licensor for the purpose of discussing and improving the Work, but
-      excluding communication that is conspicuously marked or otherwise
-      designated in writing by the copyright owner as "Not a Contribution."
-
-      "Contributor" shall mean Licensor and any individual or Legal Entity
-      on behalf of whom a Contribution has been received by Licensor and
-      subsequently incorporated within the Work.
-
-   2. Grant of Copyright License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      copyright license to reproduce, prepare Derivative Works of,
-      publicly display, publicly perform, sublicense, and distribute the
-      Work and such Derivative Works in Source or Object form.
-
-   3. Grant of Patent License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      (except as stated in this section) patent license to make, have made,
-      use, offer to sell, sell, import, and otherwise transfer the Work,
-      where such license applies only to those patent claims licensable
-      by such Contributor that are necessarily infringed by their
-      Contribution(s) alone or by combination of their Contribution(s)
-      with the Work to which such Contribution(s) was submitted. If You
-      institute patent litigation against any entity (including a
-      cross-claim or counterclaim in a lawsuit) alleging that the Work
-      or a Contribution incorporated within the Work constitutes direct
-      or contributory patent infringement, then any patent licenses
-      granted to You under this License for that Work shall terminate
-      as of the date such litigation is filed.
-
-   4. Redistribution. You may reproduce and distribute copies of the
-      Work or Derivative Works thereof in any medium, with or without
-      modifications, and in Source or Object form, provided that You
-      meet the following conditions:
-
-      (a) You must give any other recipients of the Work or
-          Derivative Works a copy of this License; and
-
-      (b) You must cause any modified files to carry prominent notices
-          stating that You changed the files; and
-
-      (c) You must retain, in the Source form of any Derivative Works
-          that You distribute, all copyright, patent, trademark, and
-          attribution notices from the Source form of the Work,
-          excluding those notices that do not pertain to any part of
-          the Derivative Works; and
-
-      (d) If the Work includes a "NOTICE" text file as part of its
-          distribution, then any Derivative Works that You distribute must
-          include a readable copy of the attribution notices contained
-          within such NOTICE file, excluding those notices that do not
-          pertain to any part of the Derivative Works, in at least one
-          of the following places: within a NOTICE text file distributed
-          as part of the Derivative Works; within the Source form or
-          documentation, if provided along with the Derivative Works; or,
-          within a display generated by the Derivative Works, if and
-          wherever such third-party notices normally appear. The contents
-          of the NOTICE file are for informational purposes only and
-          do not modify the License. You may add Your own attribution
-          notices within Derivative Works that You distribute, alongside
-          or as an addendum to the NOTICE text from the Work, provided
-          that such additional attribution notices cannot be construed
-          as modifying the License.
-
-      You may add Your own copyright statement to Your modifications and
-      may provide additional or different license terms and conditions
-      for use, reproduction, or distribution of Your modifications, or
-      for any such Derivative Works as a whole, provided Your use,
-      reproduction, and distribution of the Work otherwise complies with
-      the conditions stated in this License.
-
-   5. Submission of Contributions. Unless You explicitly state otherwise,
-      any Contribution intentionally submitted for inclusion in the Work
-      by You to the Licensor shall be under the terms and conditions of
-      this License, without any additional terms or conditions.
-      Notwithstanding the above, nothing herein shall supersede or modify
-      the terms of any separate license agreement you may have executed
-      with Licensor regarding such Contributions.
-
-   6. Trademarks. This License does not grant permission to use the trade
-      names, trademarks, service marks, or product names of the Licensor,
-      except as required for reasonable and customary use in describing the
-      origin of the Work and reproducing the content of the NOTICE file.
-
-   7. Disclaimer of Warranty. Unless required by applicable law or
-      agreed to in writing, Licensor provides the Work (and each
-      Contributor provides its Contributions) on an "AS IS" BASIS,
-      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-      implied, including, without limitation, any warranties or conditions
-      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-      PARTICULAR PURPOSE. You are solely responsible for determining the
-      appropriateness of using or redistributing the Work and assume any
-      risks associated with Your exercise of permissions under this License.
-
-   8. Limitation of Liability. In no event and under no legal theory,
-      whether in tort (including negligence), contract, or otherwise,
-      unless required by applicable law (such as deliberate and grossly
-      negligent acts) or agreed to in writing, shall any Contributor be
-      liable to You for damages, including any direct, indirect, special,
-      incidental, or consequential damages of any character arising as a
-      result of this License or out of the use or inability to use the
-      Work (including but not limited to damages for loss of goodwill,
-      work stoppage, computer failure or malfunction, or any and all
-      other commercial damages or losses), even if such Contributor
-      has been advised of the possibility of such damages.
-
-   9. Accepting Warranty or Additional Liability. While redistributing
-      the Work or Derivative Works thereof, You may choose to offer,
-      and charge a fee for, acceptance of support, warranty, indemnity,
-      or other liability obligations and/or rights consistent with this
-      License. However, in accepting such obligations, You may act only
-      on Your own behalf and on Your sole responsibility, not on behalf
-      of any other Contributor, and only if You agree to indemnify,
-      defend, and hold each Contributor harmless for any liability
-      incurred by, or claims asserted against, such Contributor by reason
-      of your accepting any such warranty or additional liability.
-
-   END OF TERMS AND CONDITIONS
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/change-notes/1.24/analysis-cpp.md b/change-notes/1.24/analysis-cpp.md
index 205e5e7a2d7..7ab002857cf 100644
--- a/change-notes/1.24/analysis-cpp.md
+++ b/change-notes/1.24/analysis-cpp.md
@@ -24,6 +24,7 @@ The following changes in version 1.24 affect C/C++ analysis in all applications.
 | No space for zero terminator (`cpp/no-space-for-terminator`) | More correct results | String arguments to formatting functions are now (usually) expected to be null terminated strings. |
 | Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) | | This query is no longer run on LGTM. |
 | No space for zero terminator (`cpp/no-space-for-terminator`) | Fewer false positive results | This query has been modified to be more conservative when identifying which pointers point to null-terminated strings. This approach produces fewer, more accurate results. |
+| Overflow in uncontrolled allocation size (`cpp/uncontrolled-allocation-size`) | Fewer false positive results | Cases where the tainted allocation size is range checked are now more reliably excluded. |
 | Overloaded assignment does not return 'this' (`cpp/assignment-does-not-return-this`) | Fewer false positive results | This query no longer reports incorrect results in template classes. |
 | Unsafe array for days of the year (`cpp/leap-year/unsafe-array-for-days-of-the-year`) | | This query is no longer run on LGTM. |
 | Unsigned comparison to zero (`cpp/unsigned-comparison-zero`) | More correct results | This query now also looks for comparisons of the form `0 <= x`. |
@@ -46,9 +47,11 @@ The following changes in version 1.24 affect C/C++ analysis in all applications.
   `StackVariableReachability`. The functionality is the same.
 * The models library models `strlen` in more detail, and includes common variations such as `wcslen`.
 * The models library models `gets` and similar functions.
+* The models library now partially models `std::string`.
 * The taint tracking library (`semmle.code.cpp.dataflow.TaintTracking`) has had the following improvements:
   * The library now models data flow through `strdup` and similar functions.
   * The library now models data flow through formatting functions such as `sprintf`.
 * The security pack taint tracking library (`semmle.code.cpp.security.TaintTracking`) uses a new intermediate representation. This provides a more precise analysis of pointers to stack variables and flow through parameters, improving the results of many security queries.
 * The global value numbering library (`semmle.code.cpp.valuenumbering.GlobalValueNumbering`) uses a new intermediate representation to provide a more precise analysis of heap allocated memory and pointers to stack variables.
+* `freeCall` in `semmle.code.cpp.commons.Alloc` has been deprecated. The `Allocation` and `Deallocation` models in `semmle.code.cpp.models.interfaces` should be used instead.
diff --git a/change-notes/1.24/analysis-csharp.md b/change-notes/1.24/analysis-csharp.md
index 02f31b12533..a745e985eae 100644
--- a/change-notes/1.24/analysis-csharp.md
+++ b/change-notes/1.24/analysis-csharp.md
@@ -22,6 +22,8 @@ The following changes in version 1.24 affect C# analysis in all applications.
 | Dereferenced variable may be null (`cs/dereferenced-value-may-be-null`) | More results | Results are reported from parameters with a default value of `null`. |
 | Useless assignment to local variable (`cs/useless-assignment-to-local`) | Fewer false positive results | Results have been removed when the value assigned is an (implicitly or explicitly) cast default-like value. For example, `var s = (string)null` and `string s = default`. |
 | XPath injection (`cs/xml/xpath-injection`) | More results | The query now recognizes calls to methods on `System.Xml.XPath.XPathNavigator` objects. |
+| Information exposure through transmitted data (`cs/sensitive-data-transmission`) | More results | The query now recognizes writes to cookies and writes to ASP.NET (`Inner`)`Text` properties as additional sinks. |
+| Information exposure through an exception (`cs/information-exposure-through-exception`) | More results | The query now recognizes writes to cookies, writes to ASP.NET (`Inner`)`Text` properties, and email contents as additional sinks. |

 ## Removal of old queries

@@ -42,5 +44,6 @@ The following changes in version 1.24 affect C# analysis in all applications.
 * [Code contracts](https://docs.microsoft.com/en-us/dotnet/framework/debug-trace-profile/code-contracts) are now recognized, and are treated like any other assertion methods.
 * Expression nullability flow state is given by the predicates `Expr.hasNotNullFlowState()` and `Expr.hasMaybeNullFlowState()`.
 * `stackalloc` array creations are now represented by the QL class `Stackalloc`. Previously they were represented by the class `ArrayCreation`.
+* A new class `RemoteFlowSink` has been added to model sinks where data might be exposed to external users. Examples include web page output, e-mails, and cookies.

 ## Changes to autobuilder
diff --git a/change-notes/1.24/analysis-javascript.md b/change-notes/1.24/analysis-javascript.md
index 7c72ecea6b1..0e92d66033d 100644
--- a/change-notes/1.24/analysis-javascript.md
+++ b/change-notes/1.24/analysis-javascript.md
@@ -86,6 +86,7 @@
 | Useless regular-expression character escape (`js/useless-regexp-character-escape`) | Fewer false positive results | This query now distinguishes escapes in strings and regular expression literals. |
 | Identical operands (`js/redundant-operation`) | Fewer results | This query now recognizes cases where the operands change a value using ++/-- expressions. |
 | Superfluous trailing arguments (`js/superfluous-trailing-arguments`) | Fewer results | This query now recognizes cases where a function uses the `Function.arguments` value to process a variable number of parameters. |
+| Incomplete URL scheme check (`js/incomplete-url-scheme-check`) | More results | This query now recognizes more variations of URL scheme checks. |

 ## Changes to libraries

diff --git a/config/identical-files.json b/config/identical-files.json
index b16b61df654..a5af067ab5d 100644
--- a/config/identical-files.json
+++ b/config/identical-files.json
@@ -242,6 +242,13 @@
     "csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/ValueNumbering.qll",
     "csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll"
   ],
+  "C++ IR PrintValueNumbering": [
+    "cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/PrintValueNumbering.qll",
+    "cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll",
+    "cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/PrintValueNumbering.qll",
+    "csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/PrintValueNumbering.qll",
+    "csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll"
+  ],
   "C++ IR ConstantAnalysis": [
     "cpp/ql/src/semmle/code/cpp/ir/implementation/raw/constant/ConstantAnalysis.qll",
     "cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/constant/ConstantAnalysis.qll",
diff --git a/cpp/ql/src/Critical/MemoryFreed.qll b/cpp/ql/src/Critical/MemoryFreed.qll
index d548aca26e7..880199e54c9 100644
--- a/cpp/ql/src/Critical/MemoryFreed.qll
+++ b/cpp/ql/src/Critical/MemoryFreed.qll
@@ -4,7 +4,7 @@ private predicate freed(Expr e) {
   e = any(DeallocationExpr de).getFreedExpr()
   or
   exists(ExprCall c |
-    // cautiously assume that any ExprCall could be a freeCall.
+    // cautiously assume that any `ExprCall` could be a deallocation expression.
     c.getAnArgument() = e
   )
 }
diff --git a/cpp/ql/src/Critical/NewDelete.qll b/cpp/ql/src/Critical/NewDelete.qll
index 9dd55525b59..30b9f9ad94a 100644
--- a/cpp/ql/src/Critical/NewDelete.qll
+++ b/cpp/ql/src/Critical/NewDelete.qll
@@ -5,17 +5,34 @@
 import cpp
 import semmle.code.cpp.controlflow.SSA
 import semmle.code.cpp.dataflow.DataFlow
+import semmle.code.cpp.models.implementations.Allocation
+import semmle.code.cpp.models.implementations.Deallocation

 /**
  * Holds if `alloc` is a use of `malloc` or `new`. `kind` is
  * a string describing the type of the allocation.
  */
 predicate allocExpr(Expr alloc, string kind) {
-  isAllocationExpr(alloc) and
-  not alloc.isFromUninstantiatedTemplate(_) and
   (
-    alloc instanceof FunctionCall and
-    kind = "malloc"
+    exists(Function target |
+      alloc.(AllocationExpr).(FunctionCall).getTarget() = target and
+      (
+        target.getName() = "operator new" and
+        kind = "new" and
+        // exclude placement new and custom overloads as they
+        // may not conform to assumptions
+        not target.getNumberOfParameters() > 1
+        or
+        target.getName() = "operator new[]" and
+        kind = "new[]" and
+        // exclude placement new and custom overloads as they
+        // may not conform to assumptions
+        not target.getNumberOfParameters() > 1
+        or
+        not target instanceof OperatorNewAllocationFunction and
+        kind = "malloc"
+      )
+    )
     or
     alloc instanceof NewExpr and
     kind = "new" and
@@ -28,7 +45,8 @@ predicate allocExpr(Expr alloc, string kind) {
     // exclude placement new and custom overloads as they
     // may not conform to assumptions
     not alloc.(NewArrayExpr).getAllocatorCall().getTarget().getNumberOfParameters() > 1
-  )
+  ) and
+  not alloc.isFromUninstantiatedTemplate(_)
 }

 /**
@@ -110,8 +128,20 @@ predicate allocReaches(Expr e, Expr alloc, string kind) {
  * describing the type of that free or delete.
  */
 predicate freeExpr(Expr free, Expr freed, string kind) {
-  freeCall(free, freed) and
-  kind = "free"
+  exists(Function target |
+    freed = free.(DeallocationExpr).getFreedExpr() and
+    free.(FunctionCall).getTarget() = target and
+    (
+      target.getName() = "operator delete" and
+      kind = "delete"
+      or
+      target.getName() = "operator delete[]" and
+      kind = "delete[]"
+      or
+      not target instanceof OperatorDeleteDeallocationFunction and
+      kind = "free"
+    )
+  )
   or
   free.(DeleteExpr).getExpr() = freed and
   kind = "delete"
diff --git a/cpp/ql/src/Likely Bugs/ReturnConstTypeMember.cpp b/cpp/ql/src/Likely Bugs/ReturnConstTypeMember.cpp
index cc6b6da3abb..ad3d7d3c9e1 100644
--- a/cpp/ql/src/Likely Bugs/ReturnConstTypeMember.cpp
+++ b/cpp/ql/src/Likely Bugs/ReturnConstTypeMember.cpp
@@ -8,6 +8,6 @@ struct S {

   // Whereas here it does make a semantic difference.
   auto getValCorrect() const -> int {
-    return val
+    return val;
   }
 };
diff --git a/cpp/ql/src/Security/CWE/CWE-022/TaintedPath.ql b/cpp/ql/src/Security/CWE/CWE-022/TaintedPath.ql
index 42a29b96268..496b957cca3 100644
--- a/cpp/ql/src/Security/CWE/CWE-022/TaintedPath.ql
+++ b/cpp/ql/src/Security/CWE/CWE-022/TaintedPath.ql
@@ -2,7 +2,7 @@
  * @name Uncontrolled data used in path expression
  * @description Accessing paths influenced by users can allow an
  *              attacker to access unexpected resources.
- * @kind problem
+ * @kind path-problem
  * @problem.severity warning
  * @precision medium
  * @id cpp/path-injection
@@ -17,6 +17,7 @@ import cpp
 import semmle.code.cpp.security.FunctionWithWrappers
 import semmle.code.cpp.security.Security
 import semmle.code.cpp.security.TaintTracking
+import TaintedWithPath

 /**
  * A function for opening a file.
@@ -51,12 +52,19 @@ class FileFunction extends FunctionWithWrappers {
   override predicate interestingArg(int arg) { arg = 0 }
 }

+class TaintedPathConfiguration extends TaintTrackingConfiguration {
+  override predicate isSink(Element tainted) {
+    exists(FileFunction fileFunction | fileFunction.outermostWrapperFunctionCall(tainted, _))
+  }
+}
+
 from
-  FileFunction fileFunction, Expr taintedArg, Expr taintSource, string taintCause, string callChain
+  FileFunction fileFunction, Expr taintedArg, Expr taintSource, PathNode sourceNode,
+  PathNode sinkNode, string taintCause, string callChain
 where
   fileFunction.outermostWrapperFunctionCall(taintedArg, callChain) and
-  tainted(taintSource, taintedArg) and
+  taintedWithPath(taintSource, taintedArg, sourceNode, sinkNode) and
   isUserInput(taintSource, taintCause)
-select taintedArg,
+select taintedArg, sourceNode, sinkNode,
   "This argument to a file access function is derived from $@ and then passed to " + callChain,
   taintSource, "user input (" + taintCause + ")"
diff --git a/cpp/ql/src/Security/CWE/CWE-079/CgiXss.ql b/cpp/ql/src/Security/CWE/CWE-079/CgiXss.ql
index 8b7fb83df81..0e0c4add7f6 100644
--- a/cpp/ql/src/Security/CWE/CWE-079/CgiXss.ql
+++ b/cpp/ql/src/Security/CWE/CWE-079/CgiXss.ql
@@ -2,7 +2,7 @@
  * @name CGI script vulnerable to cross-site scripting
  * @description Writing user input directly to a web page
  *              allows for a cross-site scripting vulnerability.
- * @kind problem
+ * @kind path-problem
  * @problem.severity error
  * @precision high
  * @id cpp/cgi-xss
@@ -13,6 +13,7 @@
 import cpp
 import semmle.code.cpp.commons.Environment
 import semmle.code.cpp.security.TaintTracking
+import TaintedWithPath

 /** A call that prints its arguments to `stdout`. */
 class PrintStdoutCall extends FunctionCall {
@@ -27,8 +28,13 @@ class QueryString extends EnvironmentRead {
   QueryString() { getEnvironmentVariable() = "QUERY_STRING" }
 }

-from QueryString query, PrintStdoutCall call, Element printedArg
-where
-  call.getAnArgument() = printedArg and
-  tainted(query, printedArg)
-select printedArg, "Cross-site scripting vulnerability due to $@.", query, "this query data"
+class Configuration extends TaintTrackingConfiguration {
+  override predicate isSink(Element tainted) {
+    exists(PrintStdoutCall call | call.getAnArgument() = tainted)
+  }
+}
+
+from QueryString query, Element printedArg, PathNode sourceNode, PathNode sinkNode
+where taintedWithPath(query, printedArg, sourceNode, sinkNode)
+select printedArg, sourceNode, sinkNode, "Cross-site scripting vulnerability due to $@.", query,
+  "this query data"
diff --git a/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql b/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql
index 3f234dd0678..de786d22f30 100644
--- a/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql
+++ b/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql
@@ -3,7 +3,7 @@
  * @description Including user-supplied data in a SQL query without
  *              neutralizing special elements can make code vulnerable
  *              to SQL Injection.
- * @kind problem + * @kind path-problem * @problem.severity error * @precision high * @id cpp/sql-injection @@ -15,6 +15,7 @@ import cpp import semmle.code.cpp.security.Security import semmle.code.cpp.security.FunctionWithWrappers import semmle.code.cpp.security.TaintTracking +import TaintedWithPath class SQLLikeFunction extends FunctionWithWrappers { SQLLikeFunction() { sqlArgument(this.getName(), _) } @@ -22,11 +23,19 @@ class SQLLikeFunction extends FunctionWithWrappers { override predicate interestingArg(int arg) { sqlArgument(this.getName(), arg) } } -from SQLLikeFunction runSql, Expr taintedArg, Expr taintSource, string taintCause, string callChain +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element tainted) { + exists(SQLLikeFunction runSql | runSql.outermostWrapperFunctionCall(tainted, _)) + } +} + +from + SQLLikeFunction runSql, Expr taintedArg, Expr taintSource, PathNode sourceNode, PathNode sinkNode, + string taintCause, string callChain where runSql.outermostWrapperFunctionCall(taintedArg, callChain) and - tainted(taintSource, taintedArg) and + taintedWithPath(taintSource, taintedArg, sourceNode, sinkNode) and isUserInput(taintSource, taintCause) -select taintedArg, +select taintedArg, sourceNode, sinkNode, "This argument to a SQL query function is derived from $@ and then passed to " + callChain, taintSource, "user input (" + taintCause + ")" diff --git a/cpp/ql/src/Security/CWE/CWE-114/UncontrolledProcessOperation.ql b/cpp/ql/src/Security/CWE/CWE-114/UncontrolledProcessOperation.ql index 233fe34e6bf..943c13f9c5d 100644 --- a/cpp/ql/src/Security/CWE/CWE-114/UncontrolledProcessOperation.ql +++ b/cpp/ql/src/Security/CWE/CWE-114/UncontrolledProcessOperation.ql @@ -3,7 +3,7 @@ * @description Using externally controlled strings in a process * operation can allow an attacker to execute malicious * commands. 
- * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/uncontrolled-process-operation @@ -14,13 +14,24 @@ import cpp import semmle.code.cpp.security.Security import semmle.code.cpp.security.TaintTracking +import TaintedWithPath -from string processOperation, int processOperationArg, FunctionCall call, Expr arg, Element source +predicate isProcessOperationExplanation(Expr arg, string processOperation) { + exists(int processOperationArg, FunctionCall call | + isProcessOperationArgument(processOperation, processOperationArg) and + call.getTarget().getName() = processOperation and + call.getArgument(processOperationArg) = arg + ) +} + +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element arg) { isProcessOperationExplanation(arg, _) } +} + +from string processOperation, Expr arg, Expr source, PathNode sourceNode, PathNode sinkNode where - isProcessOperationArgument(processOperation, processOperationArg) and - call.getTarget().getName() = processOperation and - call.getArgument(processOperationArg) = arg and - tainted(source, arg) -select arg, + isProcessOperationExplanation(arg, processOperation) and + taintedWithPath(source, arg, sourceNode, sinkNode) +select arg, sourceNode, sinkNode, "The value of this argument may come from $@ and is being passed to " + processOperation, source, source.toString() diff --git a/cpp/ql/src/Security/CWE/CWE-120/UnboundedWrite.ql b/cpp/ql/src/Security/CWE/CWE-120/UnboundedWrite.ql index ff136d2c06b..f1a8b4e8544 100644 --- a/cpp/ql/src/Security/CWE/CWE-120/UnboundedWrite.ql +++ b/cpp/ql/src/Security/CWE/CWE-120/UnboundedWrite.ql @@ -2,7 +2,7 @@ * @name Unbounded write * @description Buffer write operations that do not control the length * of data written may overflow. 
- * @kind problem + * @kind path-problem * @problem.severity error * @precision medium * @id cpp/unbounded-write @@ -16,6 +16,7 @@ import semmle.code.cpp.security.BufferWrite import semmle.code.cpp.security.Security import semmle.code.cpp.security.TaintTracking +import TaintedWithPath /* * --- Summary of CWE-120 alerts --- @@ -54,32 +55,48 @@ predicate isUnboundedWrite(BufferWrite bw) { * } */ +/** + * Holds if `e` is a source buffer going into an unbounded write `bw` or a + * qualifier of (a qualifier of ...) such a source. + */ +predicate unboundedWriteSource(Expr e, BufferWrite bw) { + isUnboundedWrite(bw) and e = bw.getASource() + or + exists(FieldAccess fa | unboundedWriteSource(fa, bw) and e = fa.getQualifier()) +} + /* * --- user input reach --- */ -/** - * Identifies expressions that are potentially tainted with user - * input. Most of the work for this is actually done by the - * TaintTracking library. - */ -predicate tainted2(Expr expr, Expr inputSource, string inputCause) { - taintedIncludingGlobalVars(inputSource, expr, _) and - inputCause = inputSource.toString() - or - exists(Expr e | tainted2(e, inputSource, inputCause) | - // field accesses of a tainted struct are tainted - e = expr.(FieldAccess).getQualifier() - ) +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element tainted) { unboundedWriteSource(tainted, _) } + + override predicate taintThroughGlobals() { any() } } /* * --- put it together --- */ -from BufferWrite bw, Expr inputSource, string inputCause +/* + * An unbounded write is, for example `strcpy(..., tainted)`. We're looking + * for a tainted source buffer of an unbounded write, where this source buffer + * is a sink in the taint-tracking analysis. + * + * In the case of `gets` and `scanf`, where the source buffer is implicit, the + * `BufferWrite` library reports the source buffer to be the same as the + * destination buffer. 
Since those destination-buffer arguments are also + * modeled in the taint-tracking library as being _sources_ of taint, they are + * in practice reported as being tainted because the `security.TaintTracking` + * library does not distinguish between taint going into an argument and out of + * an argument. Thus, we get the desired alerts. + */ + +from BufferWrite bw, Expr inputSource, Expr tainted, PathNode sourceNode, PathNode sinkNode where - isUnboundedWrite(bw) and - tainted2(bw.getASource(), inputSource, inputCause) -select bw, "This '" + bw.getBWDesc() + "' with input from $@ may overflow the destination.", - inputSource, inputCause + taintedWithPath(inputSource, tainted, sourceNode, sinkNode) and + unboundedWriteSource(tainted, bw) +select bw, sourceNode, sinkNode, + "This '" + bw.getBWDesc() + "' with input from $@ may overflow the destination.", inputSource, + inputSource.toString() diff --git a/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatString.ql b/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatString.ql index 02a0f42fa6b..91ccc5c4d40 100644 --- a/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatString.ql +++ b/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatString.ql @@ -3,7 +3,7 @@ * @description Using externally-controlled format strings in * printf-style functions can lead to buffer overflows * or data representation problems. 
- * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/tainted-format-string @@ -16,12 +16,21 @@ import cpp import semmle.code.cpp.security.Security import semmle.code.cpp.security.FunctionWithWrappers import semmle.code.cpp.security.TaintTracking +import TaintedWithPath -from PrintfLikeFunction printf, Expr arg, string printfFunction, Expr userValue, string cause +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element tainted) { + exists(PrintfLikeFunction printf | printf.outermostWrapperFunctionCall(tainted, _)) + } +} + +from + PrintfLikeFunction printf, Expr arg, PathNode sourceNode, PathNode sinkNode, + string printfFunction, Expr userValue, string cause where printf.outermostWrapperFunctionCall(arg, printfFunction) and - tainted(userValue, arg) and + taintedWithPath(userValue, arg, sourceNode, sinkNode) and isUserInput(userValue, cause) -select arg, +select arg, sourceNode, sinkNode, "The value of this argument may come from $@ and is being used as a formatting argument to " + printfFunction, userValue, cause diff --git a/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatStringThroughGlobalVar.ql b/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatStringThroughGlobalVar.ql index f3cb4fcf1bb..96cffdb024b 100644 --- a/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatStringThroughGlobalVar.ql +++ b/cpp/ql/src/Security/CWE/CWE-134/UncontrolledFormatStringThroughGlobalVar.ql @@ -3,7 +3,7 @@ * @description Using externally-controlled format strings in * printf-style functions can lead to buffer overflows * or data representation problems. 
- * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/tainted-format-string-through-global @@ -16,15 +16,24 @@ import cpp import semmle.code.cpp.security.FunctionWithWrappers import semmle.code.cpp.security.Security import semmle.code.cpp.security.TaintTracking +import TaintedWithPath + +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element tainted) { + exists(PrintfLikeFunction printf | printf.outermostWrapperFunctionCall(tainted, _)) + } + + override predicate taintThroughGlobals() { any() } +} from - PrintfLikeFunction printf, Expr arg, string printfFunction, Expr userValue, string cause, - string globalVar + PrintfLikeFunction printf, Expr arg, PathNode sourceNode, PathNode sinkNode, + string printfFunction, Expr userValue, string cause where printf.outermostWrapperFunctionCall(arg, printfFunction) and - not tainted(_, arg) and - taintedIncludingGlobalVars(userValue, arg, globalVar) and + not taintedWithoutGlobals(arg) and + taintedWithPath(userValue, arg, sourceNode, sinkNode) and isUserInput(userValue, cause) -select arg, - "This value may flow through $@, originating from $@, and is a formatting argument to " + - printfFunction + ".", globalVarFromId(globalVar), globalVar, userValue, cause +select arg, sourceNode, sinkNode, + "The value of this argument may come from $@ and is being used as a formatting argument to " + + printfFunction, userValue, cause diff --git a/cpp/ql/src/Security/CWE/CWE-190/ArithmeticUncontrolled.ql b/cpp/ql/src/Security/CWE/CWE-190/ArithmeticUncontrolled.ql index 8013fbcf614..a4b0f131d14 100644 --- a/cpp/ql/src/Security/CWE/CWE-190/ArithmeticUncontrolled.ql +++ b/cpp/ql/src/Security/CWE/CWE-190/ArithmeticUncontrolled.ql @@ -2,7 +2,7 @@ * @name Uncontrolled data in arithmetic expression * @description Arithmetic operations on uncontrolled data that is not * validated can cause overflows. 
- * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/uncontrolled-arithmetic @@ -15,6 +15,7 @@ import cpp import semmle.code.cpp.security.Overflow import semmle.code.cpp.security.Security import semmle.code.cpp.security.TaintTracking +import TaintedWithPath predicate isRandCall(FunctionCall fc) { fc.getTarget().getName() = "rand" } @@ -40,9 +41,22 @@ class SecurityOptionsArith extends SecurityOptions { } } -predicate taintedVarAccess(Expr origin, VariableAccess va) { - isUserInput(origin, _) and - tainted(origin, va) +predicate isDiv(VariableAccess va) { exists(AssignDivExpr div | div.getLValue() = va) } + +predicate missingGuard(VariableAccess va, string effect) { + exists(Operation op | op.getAnOperand() = va | + missingGuardAgainstUnderflow(op, va) and effect = "underflow" + or + missingGuardAgainstOverflow(op, va) and effect = "overflow" + ) +} + +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element e) { + isDiv(e) + or + missingGuard(e, _) + } } /** @@ -50,19 +64,17 @@ predicate taintedVarAccess(Expr origin, VariableAccess va) { * range. 
*/ predicate guardedByAssignDiv(Expr origin) { - isUserInput(origin, _) and - exists(AssignDivExpr div, VariableAccess va | tainted(origin, va) and div.getLValue() = va) + exists(VariableAccess va | + taintedWithPath(origin, va, _, _) and + isDiv(va) + ) } -from Expr origin, Operation op, VariableAccess va, string effect +from Expr origin, VariableAccess va, string effect, PathNode sourceNode, PathNode sinkNode where - taintedVarAccess(origin, va) and - op.getAnOperand() = va and - ( - missingGuardAgainstUnderflow(op, va) and effect = "underflow" - or - missingGuardAgainstOverflow(op, va) and effect = "overflow" - ) and + taintedWithPath(origin, va, sourceNode, sinkNode) and + missingGuard(va, effect) and not guardedByAssignDiv(origin) -select va, "$@ flows to here and is used in arithmetic, potentially causing an " + effect + ".", - origin, "Uncontrolled value" +select va, sourceNode, sinkNode, + "$@ flows to here and is used in arithmetic, potentially causing an " + effect + ".", origin, + "Uncontrolled value" diff --git a/cpp/ql/src/Security/CWE/CWE-190/TaintedAllocationSize.ql b/cpp/ql/src/Security/CWE/CWE-190/TaintedAllocationSize.ql index 85248a26d19..2ed95e6cb55 100644 --- a/cpp/ql/src/Security/CWE/CWE-190/TaintedAllocationSize.ql +++ b/cpp/ql/src/Security/CWE/CWE-190/TaintedAllocationSize.ql @@ -2,7 +2,7 @@ * @name Overflow in uncontrolled allocation size * @description Allocating memory with a size controlled by an external * user can result in integer overflow. 
- * @kind problem + * @kind path-problem * @problem.severity error * @precision high * @id cpp/uncontrolled-allocation-size @@ -13,21 +13,33 @@ import cpp import semmle.code.cpp.security.TaintTracking +import TaintedWithPath -predicate taintedAllocSize(Expr e, Expr source, string taintCause) { +predicate taintedChild(Expr e, Expr tainted) { ( - isAllocationExpr(e) or + isAllocationExpr(e) + or any(MulExpr me | me.getAChild() instanceof SizeofOperator) = e ) and + tainted = e.getAChild() and + tainted.getUnspecifiedType() instanceof IntegralType +} + +class TaintedAllocationSizeConfiguration extends TaintTrackingConfiguration { + override predicate isSink(Element tainted) { taintedChild(_, tainted) } +} + +predicate taintedAllocSize( + Expr e, Expr source, PathNode sourceNode, PathNode sinkNode, string taintCause +) { + isUserInput(source, taintCause) and exists(Expr tainted | - tainted = e.getAChild() and - tainted.getUnspecifiedType() instanceof IntegralType and - isUserInput(source, taintCause) and - tainted(source, tainted) + taintedChild(e, tainted) and + taintedWithPath(source, tainted, sourceNode, sinkNode) ) } -from Expr e, Expr source, string taintCause -where taintedAllocSize(e, source, taintCause) -select e, "This allocation size is derived from $@ and might overflow", source, - "user input (" + taintCause + ")" +from Expr e, Expr source, PathNode sourceNode, PathNode sinkNode, string taintCause +where taintedAllocSize(e, source, sourceNode, sinkNode, taintCause) +select e, sourceNode, sinkNode, "This allocation size is derived from $@ and might overflow", + source, "user input (" + taintCause + ")" diff --git a/cpp/ql/src/Security/CWE/CWE-290/AuthenticationBypass.ql b/cpp/ql/src/Security/CWE/CWE-290/AuthenticationBypass.ql index 19bee24a16d..80b5ee49e97 100644 --- a/cpp/ql/src/Security/CWE/CWE-290/AuthenticationBypass.ql +++ b/cpp/ql/src/Security/CWE/CWE-290/AuthenticationBypass.ql @@ -3,7 +3,7 @@ * @description Authentication by checking that the peer's 
address * matches a known IP or web address is unsafe as it is * vulnerable to spoofing attacks. - * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/user-controlled-bypass @@ -12,6 +12,7 @@ */ import semmle.code.cpp.security.TaintTracking +import TaintedWithPath predicate hardCodedAddressOrIP(StringLiteral txt) { exists(string s | s = txt.getValueText() | @@ -102,16 +103,21 @@ predicate useOfHardCodedAddressOrIP(Expr use) { * untrusted input then it might be vulnerable to a spoofing * attack. */ -predicate hardCodedAddressInCondition(Expr source, Expr condition) { - // One of the sub-expressions of the condition is tainted. - exists(Expr taintedExpr | taintedExpr.getParent+() = condition | tainted(source, taintedExpr)) and +predicate hardCodedAddressInCondition(Expr subexpression, Expr condition) { + subexpression = condition.getAChild+() and // One of the sub-expressions of the condition is a hard-coded // IP or web-address. - exists(Expr use | use.getParent+() = condition | useOfHardCodedAddressOrIP(use)) and + exists(Expr use | use = condition.getAChild+() | useOfHardCodedAddressOrIP(use)) and condition = any(IfStmt ifStmt).getCondition() } -from Expr source, Expr condition -where hardCodedAddressInCondition(source, condition) -select condition, "Untrusted input $@ might be vulnerable to a spoofing attack.", source, - source.toString() +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element sink) { hardCodedAddressInCondition(sink, _) } +} + +from Expr subexpression, Expr source, Expr condition, PathNode sourceNode, PathNode sinkNode +where + hardCodedAddressInCondition(subexpression, condition) and + taintedWithPath(source, subexpression, sourceNode, sinkNode) +select condition, sourceNode, sinkNode, + "Untrusted input $@ might be vulnerable to a spoofing attack.", source, source.toString() diff --git a/cpp/ql/src/Security/CWE/CWE-311/CleartextBufferWrite.ql 
b/cpp/ql/src/Security/CWE/CWE-311/CleartextBufferWrite.ql index aa153779df1..3e84c0a87d9 100644 --- a/cpp/ql/src/Security/CWE/CWE-311/CleartextBufferWrite.ql +++ b/cpp/ql/src/Security/CWE/CWE-311/CleartextBufferWrite.ql @@ -2,7 +2,7 @@ * @name Cleartext storage of sensitive information in buffer * @description Storing sensitive information in cleartext can expose it * to an attacker. - * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/cleartext-storage-buffer @@ -14,12 +14,20 @@ import cpp import semmle.code.cpp.security.BufferWrite import semmle.code.cpp.security.TaintTracking import semmle.code.cpp.security.SensitiveExprs +import TaintedWithPath -from BufferWrite w, Expr taintedArg, Expr taintSource, string taintCause, SensitiveExpr dest +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element tainted) { exists(BufferWrite w | w.getASource() = tainted) } +} + +from + BufferWrite w, Expr taintedArg, Expr taintSource, PathNode sourceNode, PathNode sinkNode, + string taintCause, SensitiveExpr dest where - tainted(taintSource, taintedArg) and + taintedWithPath(taintSource, taintedArg, sourceNode, sinkNode) and isUserInput(taintSource, taintCause) and w.getASource() = taintedArg and dest = w.getDest() -select w, "This write into buffer '" + dest.toString() + "' may contain unencrypted data from $@", +select w, sourceNode, sinkNode, + "This write into buffer '" + dest.toString() + "' may contain unencrypted data from $@", taintSource, "user input (" + taintCause + ")" diff --git a/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql b/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql index e4f1e9c834a..000833cbb58 100644 --- a/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql +++ b/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql @@ -2,7 +2,7 @@ * @name Cleartext storage of sensitive information in an SQLite database * @description Storing sensitive 
information in a non-encrypted * database can expose it to an attacker. - * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/cleartext-storage-database @@ -13,6 +13,7 @@ import cpp import semmle.code.cpp.security.SensitiveExprs import semmle.code.cpp.security.TaintTracking +import TaintedWithPath class UserInputIsSensitiveExpr extends SecurityOptions { override predicate isUserInput(Expr expr, string cause) { @@ -32,10 +33,21 @@ predicate sqlite_encryption_used() { any(FunctionCall fc).getTarget().getName().matches("sqlite%\\_key\\_%") } -from SensitiveExpr taintSource, Expr taintedArg, SqliteFunctionCall sqliteCall +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element taintedArg) { + exists(SqliteFunctionCall sqliteCall | + taintedArg = sqliteCall.getASource() and + not sqlite_encryption_used() + ) + } +} + +from + SensitiveExpr taintSource, Expr taintedArg, SqliteFunctionCall sqliteCall, PathNode sourceNode, + PathNode sinkNode where - tainted(taintSource, taintedArg) and - taintedArg = sqliteCall.getASource() and - not sqlite_encryption_used() -select sqliteCall, "This SQLite call may store $@ in a non-encrypted SQLite database", taintSource, + taintedWithPath(taintSource, taintedArg, sourceNode, sinkNode) and + taintedArg = sqliteCall.getASource() +select sqliteCall, sourceNode, sinkNode, + "This SQLite call may store $@ in a non-encrypted SQLite database", taintSource, "sensitive information" diff --git a/cpp/ql/src/Security/CWE/CWE-807/TaintedCondition.ql b/cpp/ql/src/Security/CWE/CWE-807/TaintedCondition.ql index b940da029ec..e60f592b2af 100644 --- a/cpp/ql/src/Security/CWE/CWE-807/TaintedCondition.ql +++ b/cpp/ql/src/Security/CWE/CWE-807/TaintedCondition.ql @@ -3,7 +3,7 @@ * @description Using untrusted inputs in a statement that makes a * security decision makes code vulnerable to * attack. 
- * @kind problem + * @kind path-problem * @problem.severity warning * @precision medium * @id cpp/tainted-permissions-check @@ -12,14 +12,9 @@ */ import semmle.code.cpp.security.TaintTracking +import TaintedWithPath -/** - * Holds if there is an 'if' statement whose condition `condition` - * is influenced by tainted data `source`, and the body contains - * `raise` which escalates privilege. - */ -predicate cwe807violation(Expr source, Expr condition, Expr raise) { - tainted(source, condition) and +predicate sensitiveCondition(Expr condition, Expr raise) { raisesPrivilege(raise) and exists(IfStmt ifstmt | ifstmt.getCondition() = condition and @@ -27,7 +22,19 @@ predicate cwe807violation(Expr source, Expr condition, Expr raise) { ) } -from Expr source, Expr condition, Expr raise -where cwe807violation(source, condition, raise) -select condition, "Reliance on untrusted input $@ to raise privilege at $@", source, - source.toString(), raise, raise.toString() +class Configuration extends TaintTrackingConfiguration { + override predicate isSink(Element tainted) { sensitiveCondition(tainted, _) } +} + +/* + * Produce an alert if there is an 'if' statement whose condition `condition` + * is influenced by tainted data `source`, and the body contains + * `raise` which escalates privilege. 
+ */ + +from Expr source, Expr condition, Expr raise, PathNode sourceNode, PathNode sinkNode +where + taintedWithPath(source, condition, sourceNode, sinkNode) and + sensitiveCondition(condition, raise) +select condition, sourceNode, sinkNode, "Reliance on untrusted input $@ to raise privilege at $@", + source, source.toString(), raise, raise.toString() diff --git a/cpp/ql/src/codeql-suites/cpp-code-scanning.qls b/cpp/ql/src/codeql-suites/cpp-code-scanning.qls new file mode 100644 index 00000000000..27bff98ea5d --- /dev/null +++ b/cpp/ql/src/codeql-suites/cpp-code-scanning.qls @@ -0,0 +1,4 @@ +- description: Standard Code Scanning queries for C and C++ +- qlpack: codeql-cpp +- apply: code-scanning-selectors.yml + from: codeql-suite-helpers diff --git a/cpp/ql/src/semmle/code/cpp/commons/Alloc.qll b/cpp/ql/src/semmle/code/cpp/commons/Alloc.qll index a508c929d9a..a9597fc72b5 100644 --- a/cpp/ql/src/semmle/code/cpp/commons/Alloc.qll +++ b/cpp/ql/src/semmle/code/cpp/commons/Alloc.qll @@ -23,6 +23,8 @@ predicate freeFunction(Function f, int argNum) { argNum = f.(DeallocationFunctio /** * A call to a library routine that frees memory. + * + * DEPRECATED: Use `DeallocationExpr` instead (this also includes `delete` expressions). 
*/ predicate freeCall(FunctionCall fc, Expr arg) { arg = fc.(DeallocationExpr).getFreedExpr() } diff --git a/cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll b/cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll index bf29d3624cc..12857dd05e9 100644 --- a/cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll +++ b/cpp/ql/src/semmle/code/cpp/ir/dataflow/DefaultTaintTracking.qll @@ -2,6 +2,7 @@ import cpp import semmle.code.cpp.security.Security private import semmle.code.cpp.ir.dataflow.DataFlow private import semmle.code.cpp.ir.dataflow.DataFlow2 +private import semmle.code.cpp.ir.dataflow.DataFlow3 private import semmle.code.cpp.ir.IR private import semmle.code.cpp.ir.dataflow.internal.DataFlowDispatch as Dispatch private import semmle.code.cpp.models.interfaces.Taint @@ -143,7 +144,17 @@ private predicate writesVariable(StoreInstruction store, Variable var) { } /** - * A variable that has any kind of upper-bound check anywhere in the program + * A variable that has any kind of upper-bound check anywhere in the program. This is + * biased towards being inclusive because there are a lot of valid ways of doing an + * upper bounds check if we don't consider where it occurs, for example: + * ``` + * if (x < 10) { sink(x); } + * + * if (10 > y) { sink(y); } + * + * if (z > 10) { z = 10; } + * sink(z); + * ``` */ // TODO: This coarse overapproximation, ported from the old taint tracking // library, could be replaced with an actual semantic check that a particular @@ -152,10 +163,10 @@ private predicate writesVariable(StoreInstruction store, Variable var) { // previously suppressed by this predicate by coincidence.
private predicate hasUpperBoundsCheck(Variable var) { exists(RelationalOperation oper, VariableAccess access | - oper.getLeftOperand() = access and + oper.getAnOperand() = access and access.getTarget() = var and // Comparing to 0 is not an upper bound check - not oper.getRightOperand().getValue() = "0" + not oper.getAnOperand().getValue() = "0" ) } @@ -171,6 +182,7 @@ private predicate nodeIsBarrierIn(DataFlow::Node node) { node = getNodeForSource(any(Expr e)) } +cached private predicate instructionTaintStep(Instruction i1, Instruction i2) { // Expressions computed from tainted data are also tainted exists(CallInstruction call, int argIndex | call = i2 | @@ -191,9 +203,22 @@ private predicate instructionTaintStep(Instruction i1, Instruction i2) { or i2.(UnaryInstruction).getUnary() = i1 or - i2.(ChiInstruction).getPartial() = i1 and + // Flow out of definition-by-reference + i2.(ChiInstruction).getPartial() = i1.(WriteSideEffectInstruction) and not i2.isResultConflated() or + // Flow from an element to an array or union that contains it. + i2.(ChiInstruction).getPartial() = i1 and + not i2.isResultConflated() and + exists(Type t | i2.getResultLanguageType().hasType(t, false) | + t instanceof Union + or + t instanceof ArrayType + or + // Buffers of unknown size + t instanceof UnknownType + ) + or exists(BinaryInstruction bin | bin = i2 and predictableInstruction(i2.getAnOperand().getDef()) and @@ -350,6 +375,16 @@ private Element adjustedSink(DataFlow::Node sink) { result.(AssignOperation).getAnOperand() = sink.asExpr() } +/** + * Holds if `tainted` may contain taint from `source`. + * + * A tainted expression is either directly user input, or is + * computed from user input in a way that users can probably + * control the exact output of the computation. + * + * This doesn't include data flow through global variables. + * If you need that you must call `taintedIncludingGlobalVars`.
+ */ cached predicate tainted(Expr source, Element tainted) { exists(DefaultTaintTrackingCfg cfg, DataFlow::Node sink | @@ -358,16 +393,21 @@ predicate tainted(Expr source, Element tainted) { ) } -predicate tainted_instruction( - Function sourceFunc, Instruction source, Function sinkFunc, Instruction sink -) { - sourceFunc = source.getEnclosingFunction() and - sinkFunc = sink.getEnclosingFunction() and - exists(DefaultTaintTrackingCfg cfg | - cfg.hasFlow(DataFlow::instructionNode(source), DataFlow::instructionNode(sink)) - ) -} - +/** + * Holds if `tainted` may contain taint from `source`, where the taint passed + * through a global variable named `globalVar`. + * + * A tainted expression is either directly user input, or is + * computed from user input in a way that users can probably + * control the exact output of the computation. + * + * This version gives the same results as tainted but also includes + * data flow through global variables. + * + * The parameter `globalVar` is the qualified name of the last global variable + * used to move the value from source to tainted. If the taint did not pass + * through a global variable, then `globalVar = ""`. + */ cached predicate taintedIncludingGlobalVars(Expr source, Element tainted, string globalVar) { tainted(source, tainted) and @@ -385,11 +425,245 @@ predicate taintedIncludingGlobalVars(Expr source, Element tainted, string global ) } +/** + * Gets the global variable whose qualified name is `id`. Use this predicate + * together with `taintedIncludingGlobalVars`. Example: + * + * ``` + * exists(string varName | + * taintedIncludingGlobalVars(source, tainted, varName) and + * var = globalVarFromId(varName) + * ) + * ``` + */ GlobalOrNamespaceVariable globalVarFromId(string id) { id = result.getQualifiedName() } +/** + * Resolve potential target function(s) for `call`. 
+ * + * If `call` is a call through a function pointer (`ExprCall`) or + * targets a virtual method, simple data flow analysis is performed + * in order to identify target(s). + */ Function resolveCall(Call call) { exists(CallInstruction callInstruction | callInstruction.getAST() = call and result = Dispatch::viableCallable(callInstruction) ) } + +/** + * Provides definitions for augmenting source/sink pairs with data-flow paths + * between them. From a `@kind path-problem` query, import this module in the + * global scope, extend `TaintTrackingConfiguration`, and use `taintedWithPath` + * in place of `tainted`. + * + * Importing this module will also import the query predicates that contain the + * taint paths. + */ +module TaintedWithPath { + private newtype TSingleton = MkSingleton() + + /** + * A taint-tracking configuration that matches sources and sinks in the same + * way as the `tainted` predicate. + * + * Override `isSink` and `taintThroughGlobals` as needed, but do not provide + * a characteristic predicate. + */ + class TaintTrackingConfiguration extends TSingleton { + /** Override this to specify which elements are sinks in this configuration. */ + abstract predicate isSink(Element e); + + /** + * Override this predicate to `any()` to allow taint to flow through global + * variables. + */ + predicate taintThroughGlobals() { none() } + + /** Gets a textual representation of this element. 
*/ + string toString() { result = "TaintTrackingConfiguration" } + } + + private class AdjustedConfiguration extends DataFlow3::Configuration { + AdjustedConfiguration() { this = "AdjustedConfiguration" } + + override predicate isSource(DataFlow::Node source) { source = getNodeForSource(_) } + + override predicate isSink(DataFlow::Node sink) { + exists(TaintTrackingConfiguration cfg | cfg.isSink(adjustedSink(sink))) + } + + override predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { + instructionTaintStep(n1.asInstruction(), n2.asInstruction()) + or + exists(TaintTrackingConfiguration cfg | cfg.taintThroughGlobals() | + writesVariable(n1.asInstruction(), n2.asVariable().(GlobalOrNamespaceVariable)) + or + readsVariable(n2.asInstruction(), n1.asVariable().(GlobalOrNamespaceVariable)) + ) + } + + override predicate isBarrier(DataFlow::Node node) { nodeIsBarrier(node) } + + override predicate isBarrierIn(DataFlow::Node node) { nodeIsBarrierIn(node) } + } + + /* + * A sink `Element` may map to multiple `DataFlowX::PathNode`s via (the + * inverse of) `adjustedSink`. For example, an `Expr` maps to all its + * conversions, and a `Variable` maps to all loads and stores from it. Because + * the path node is part of the tuple that constitutes the alert, this leads + * to duplicate alerts. + * + * To avoid showing duplicates, we edit the graph to replace the final node + * coming from the data-flow library with a node that matches exactly the + * `Element` sink that's requested. + * + * The same is done for sources. + */ + + private newtype TPathNode = + TWrapPathNode(DataFlow3::PathNode n) or + // There's a single newtype constructor for both sources and sinks since + // that makes it easiest to deal with the case where source = sink. 
+ TEndpointPathNode(Element e) { + exists(AdjustedConfiguration cfg, DataFlow3::Node sourceNode, DataFlow3::Node sinkNode | + cfg.hasFlow(sourceNode, sinkNode) + | + sourceNode = getNodeForSource(e) + or + e = adjustedSink(sinkNode) and + exists(TaintTrackingConfiguration ttCfg | ttCfg.isSink(e)) + ) + } + + /** An opaque type used for the nodes of a data-flow path. */ + class PathNode extends TPathNode { + /** Gets a textual representation of this element. */ + string toString() { none() } + + /** + * Holds if this element is at the specified location. + * The location spans column `startcolumn` of line `startline` to + * column `endcolumn` of line `endline` in file `filepath`. + * For more information, see + * [Locations](https://help.semmle.com/QL/learn-ql/ql/locations.html). + */ + predicate hasLocationInfo( + string filepath, int startline, int startcolumn, int endline, int endcolumn + ) { + none() + } + } + + private class WrapPathNode extends PathNode, TWrapPathNode { + DataFlow3::PathNode inner() { this = TWrapPathNode(result) } + + override string toString() { result = this.inner().toString() } + + override predicate hasLocationInfo( + string filepath, int startline, int startcolumn, int endline, int endcolumn + ) { + this.inner().hasLocationInfo(filepath, startline, startcolumn, endline, endcolumn) + } + } + + private class EndpointPathNode extends PathNode, TEndpointPathNode { + Expr inner() { this = TEndpointPathNode(result) } + + override string toString() { result = this.inner().toString() } + + override predicate hasLocationInfo( + string filepath, int startline, int startcolumn, int endline, int endcolumn + ) { + this + .inner() + .getLocation() + .hasLocationInfo(filepath, startline, startcolumn, endline, endcolumn) + } + } + + /** A PathNode whose `Element` is a source. It may also be a sink. 
*/ + private class InitialPathNode extends EndpointPathNode { + InitialPathNode() { exists(getNodeForSource(this.inner())) } + } + + /** A PathNode whose `Element` is a sink. It may also be a source. */ + private class FinalPathNode extends EndpointPathNode { + FinalPathNode() { exists(TaintTrackingConfiguration cfg | cfg.isSink(this.inner())) } + } + + /** Holds if `(a,b)` is an edge in the graph of data flow path explanations. */ + query predicate edges(PathNode a, PathNode b) { + DataFlow3::PathGraph::edges(a.(WrapPathNode).inner(), b.(WrapPathNode).inner()) + or + // To avoid showing trivial-looking steps, we _replace_ the last node instead + // of adding an edge out of it. + exists(WrapPathNode sinkNode | + DataFlow3::PathGraph::edges(a.(WrapPathNode).inner(), sinkNode.inner()) and + b.(FinalPathNode).inner() = adjustedSink(sinkNode.inner().getNode()) + ) + or + // Same for the first node + exists(WrapPathNode sourceNode | + DataFlow3::PathGraph::edges(sourceNode.inner(), b.(WrapPathNode).inner()) and + sourceNode.inner().getNode() = getNodeForSource(a.(InitialPathNode).inner()) + ) + or + // Finally, handle the case where the path goes directly from a source to a + // sink, meaning that they both need to be translated. + exists(WrapPathNode sinkNode, WrapPathNode sourceNode | + DataFlow3::PathGraph::edges(sourceNode.inner(), sinkNode.inner()) and + sourceNode.inner().getNode() = getNodeForSource(a.(InitialPathNode).inner()) and + b.(FinalPathNode).inner() = adjustedSink(sinkNode.inner().getNode()) + ) + } + + /** Holds if `n` is a node in the graph of data flow path explanations. */ + query predicate nodes(PathNode n, string key, string val) { + key = "semmle.label" and val = n.toString() + } + + /** + * Holds if `tainted` may contain taint from `source`, where `sourceNode` and + * `sinkNode` are the corresponding `PathNode`s that can be used in a query + * to provide path explanations. Extend `TaintTrackingConfiguration` to use + * this predicate. 
+ * + * A tainted expression is either directly user input, or is computed from + * user input in a way that users can probably control the exact output of + * the computation. + */ + predicate taintedWithPath(Expr source, Element tainted, PathNode sourceNode, PathNode sinkNode) { + exists(AdjustedConfiguration cfg, DataFlow3::Node flowSource, DataFlow3::Node flowSink | + source = sourceNode.(InitialPathNode).inner() and + flowSource = getNodeForSource(source) and + cfg.hasFlow(flowSource, flowSink) and + tainted = adjustedSink(flowSink) and + tainted = sinkNode.(FinalPathNode).inner() + ) + } + + private predicate isGlobalVariablePathNode(WrapPathNode n) { + n.inner().getNode().asVariable() instanceof GlobalOrNamespaceVariable + } + + private predicate edgesWithoutGlobals(PathNode a, PathNode b) { + edges(a, b) and + not isGlobalVariablePathNode(a) and + not isGlobalVariablePathNode(b) + } + + /** + * Holds if `tainted` can be reached from a taint source without passing + * through a global variable. + */ + predicate taintedWithoutGlobals(Element tainted) { + exists(PathNode sourceNode, FinalPathNode sinkNode | + sourceNode.(WrapPathNode).inner().getNode() = getNodeForSource(_) and + edgesWithoutGlobals+(sourceNode, sinkNode) and + tainted = sinkNode.inner() + ) + } +} diff --git a/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll b/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll index 4dd181a9195..c5cf4180765 100644 --- a/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll +++ b/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll @@ -209,7 +209,7 @@ Type getErasedRepr(Type t) { } /** Gets a string representation of a type returned by `getErasedRepr`. 
*/ -string ppReprType(Type t) { result = t.toString() } +string ppReprType(Type t) { none() } // stub implementation /** * Holds if `t1` and `t2` are compatible, that is, whether data can flow from diff --git a/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll b/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll index e19912a63ee..193173da442 100644 --- a/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll +++ b/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll @@ -239,6 +239,17 @@ class DefinitionByReferenceNode extends InstructionNode { Parameter getParameter() { exists(CallInstruction ci | result = ci.getStaticCallTarget().getParameter(instr.getIndex())) } + + override string toString() { + // This string should be unique enough to be helpful but common enough to + // avoid storing too many different strings. + result = + instr.getPrimaryInstruction().(CallInstruction).getStaticCallTarget().getName() + + " output argument" + or + not exists(instr.getPrimaryInstruction().(CallInstruction).getStaticCallTarget()) and + result = "output argument" + } } /** @@ -289,7 +300,7 @@ ExprNode exprNode(Expr e) { result.getExpr() = e } * Gets the `Node` corresponding to `e`, if any. Here, `e` may be a * `Conversion`. */ -ExprNode convertedExprNode(Expr e) { result.getExpr() = e } +ExprNode convertedExprNode(Expr e) { result.getConvertedExpr() = e } /** * Gets the `Node` corresponding to the value of `p` at function entry. 
@@ -323,6 +334,7 @@ predicate simpleLocalFlowStep(Node nodeFrom, Node nodeTo) { simpleInstructionLocalFlowStep(nodeFrom.asInstruction(), nodeTo.asInstruction()) } +cached private predicate simpleInstructionLocalFlowStep(Instruction iFrom, Instruction iTo) { iTo.(CopyInstruction).getSourceValue() = iFrom or diff --git a/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/PrintValueNumbering.qll b/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/PrintValueNumbering.qll new file mode 100644 index 00000000000..a7fb1b3c07e --- /dev/null +++ b/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/PrintValueNumbering.qll @@ -0,0 +1,17 @@ +private import internal.ValueNumberingImports +private import ValueNumbering + +/** + * Provides additional information about value numbering in IR dumps. + */ +class ValueNumberPropertyProvider extends IRPropertyProvider { + override string getInstructionProperty(Instruction instr, string key) { + exists(ValueNumber vn | + vn = valueNumber(instr) and + key = "valnum" and + if strictcount(vn.getAnInstruction()) > 1 + then result = vn.getDebugString() + else result = "unique" + ) + } +} diff --git a/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/ValueNumbering.qll b/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/ValueNumbering.qll index 161f69936e9..13d19587135 100644 --- a/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/ValueNumbering.qll +++ b/cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/gvn/ValueNumbering.qll @@ -1,21 +1,6 @@ private import internal.ValueNumberingInternal private import internal.ValueNumberingImports -/** - * Provides additional information about value numbering in IR dumps. 
- */ -class ValueNumberPropertyProvider extends IRPropertyProvider { - override string getInstructionProperty(Instruction instr, string key) { - exists(ValueNumber vn | - vn = valueNumber(instr) and - key = "valnum" and - if strictcount(vn.getAnInstruction()) > 1 - then result = vn.getDebugString() - else result = "unique" - ) - } -} - /** * The value number assigned to a particular set of instructions that produce equivalent results. */ diff --git a/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/PrintValueNumbering.qll b/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/PrintValueNumbering.qll new file mode 100644 index 00000000000..a7fb1b3c07e --- /dev/null +++ b/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/PrintValueNumbering.qll @@ -0,0 +1,17 @@ +private import internal.ValueNumberingImports +private import ValueNumbering + +/** + * Provides additional information about value numbering in IR dumps. + */ +class ValueNumberPropertyProvider extends IRPropertyProvider { + override string getInstructionProperty(Instruction instr, string key) { + exists(ValueNumber vn | + vn = valueNumber(instr) and + key = "valnum" and + if strictcount(vn.getAnInstruction()) > 1 + then result = vn.getDebugString() + else result = "unique" + ) + } +} diff --git a/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/ValueNumbering.qll b/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/ValueNumbering.qll index 161f69936e9..13d19587135 100644 --- a/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/ValueNumbering.qll +++ b/cpp/ql/src/semmle/code/cpp/ir/implementation/raw/gvn/ValueNumbering.qll @@ -1,21 +1,6 @@ private import internal.ValueNumberingInternal private import internal.ValueNumberingImports -/** - * Provides additional information about value numbering in IR dumps. 
- */ -class ValueNumberPropertyProvider extends IRPropertyProvider { - override string getInstructionProperty(Instruction instr, string key) { - exists(ValueNumber vn | - vn = valueNumber(instr) and - key = "valnum" and - if strictcount(vn.getAnInstruction()) > 1 - then result = vn.getDebugString() - else result = "unique" - ) - } -} - /** * The value number assigned to a particular set of instructions that produce equivalent results. */ diff --git a/cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll b/cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll new file mode 100644 index 00000000000..a7fb1b3c07e --- /dev/null +++ b/cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll @@ -0,0 +1,17 @@ +private import internal.ValueNumberingImports +private import ValueNumbering + +/** + * Provides additional information about value numbering in IR dumps. + */ +class ValueNumberPropertyProvider extends IRPropertyProvider { + override string getInstructionProperty(Instruction instr, string key) { + exists(ValueNumber vn | + vn = valueNumber(instr) and + key = "valnum" and + if strictcount(vn.getAnInstruction()) > 1 + then result = vn.getDebugString() + else result = "unique" + ) + } +} diff --git a/cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll b/cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll index 161f69936e9..13d19587135 100644 --- a/cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll +++ b/cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll @@ -1,21 +1,6 @@ private import internal.ValueNumberingInternal private import internal.ValueNumberingImports -/** - * Provides additional information about value numbering in IR dumps. 
- */ -class ValueNumberPropertyProvider extends IRPropertyProvider { - override string getInstructionProperty(Instruction instr, string key) { - exists(ValueNumber vn | - vn = valueNumber(instr) and - key = "valnum" and - if strictcount(vn.getAnInstruction()) > 1 - then result = vn.getDebugString() - else result = "unique" - ) - } -} - /** * The value number assigned to a particular set of instructions that produce equivalent results. */ diff --git a/cpp/ql/src/semmle/code/cpp/models/Models.qll b/cpp/ql/src/semmle/code/cpp/models/Models.qll index 4689cb1c7c8..f02d05be711 100644 --- a/cpp/ql/src/semmle/code/cpp/models/Models.qll +++ b/cpp/ql/src/semmle/code/cpp/models/Models.qll @@ -12,4 +12,5 @@ private import implementations.Strcat private import implementations.Strcpy private import implementations.Strdup private import implementations.Strftime +private import implementations.StdString private import implementations.Swap diff --git a/cpp/ql/src/semmle/code/cpp/models/implementations/Allocation.qll b/cpp/ql/src/semmle/code/cpp/models/implementations/Allocation.qll index c6766983889..f6f7ab279a6 100644 --- a/cpp/ql/src/semmle/code/cpp/models/implementations/Allocation.qll +++ b/cpp/ql/src/semmle/code/cpp/models/implementations/Allocation.qll @@ -1,3 +1,9 @@ +/** + * Provides implementation classes modelling various methods of allocation + * (`malloc`, `new` etc). See `semmle.code.cpp.models.interfaces.Allocation` + * for usage information. 
+ */ + import semmle.code.cpp.models.interfaces.Allocation /** diff --git a/cpp/ql/src/semmle/code/cpp/models/implementations/Deallocation.qll b/cpp/ql/src/semmle/code/cpp/models/implementations/Deallocation.qll index d2e4951e436..980645df031 100644 --- a/cpp/ql/src/semmle/code/cpp/models/implementations/Deallocation.qll +++ b/cpp/ql/src/semmle/code/cpp/models/implementations/Deallocation.qll @@ -1,4 +1,10 @@ -import semmle.code.cpp.models.interfaces.Allocation +/** + * Provides implementation classes modelling various methods of deallocation + * (`free`, `delete` etc). See `semmle.code.cpp.models.interfaces.Deallocation` + * for usage information. + */ + +import semmle.code.cpp.models.interfaces.Deallocation /** * A deallocation function such as `free`. diff --git a/cpp/ql/src/semmle/code/cpp/models/implementations/StdString.qll b/cpp/ql/src/semmle/code/cpp/models/implementations/StdString.qll new file mode 100644 index 00000000000..80fe85e9f13 --- /dev/null +++ b/cpp/ql/src/semmle/code/cpp/models/implementations/StdString.qll @@ -0,0 +1,27 @@ +import semmle.code.cpp.models.interfaces.Taint + +/** + * The `std::basic_string` constructor(s). + */ +class StdStringConstructor extends TaintFunction { + StdStringConstructor() { this.hasQualifiedName("std", "basic_string", "basic_string") } + + override predicate hasTaintFlow(FunctionInput input, FunctionOutput output) { + // flow from any constructor argument to return value + input.isParameter(_) and + output.isReturnValue() + } +} + +/** + * The standard function `std::string.c_str`. 
+ */ +class StdStringCStr extends TaintFunction { + StdStringCStr() { this.hasQualifiedName("std", "basic_string", "c_str") } + + override predicate hasTaintFlow(FunctionInput input, FunctionOutput output) { + // flow from string itself (qualifier) to return value + input.isQualifierObject() and + output.isReturnValue() + } +} diff --git a/cpp/ql/src/semmle/code/cpp/security/TaintTrackingImpl.qll b/cpp/ql/src/semmle/code/cpp/security/TaintTrackingImpl.qll index 3a37b43b319..a24820b277f 100644 --- a/cpp/ql/src/semmle/code/cpp/security/TaintTrackingImpl.qll +++ b/cpp/ql/src/semmle/code/cpp/security/TaintTrackingImpl.qll @@ -328,14 +328,24 @@ GlobalOrNamespaceVariable globalVarFromId(string id) { } /** - * A variable that has any kind of upper-bound check anywhere in the program + * A variable that has any kind of upper-bound check anywhere in the program. This is + * biased towards being inclusive because there are a lot of valid ways of doing an + * upper bounds check if we don't consider where it occurs, for example: + * ``` + * if (x < 10) { sink(x); } + * + * if (10 > y) { sink(y); } + * + * if (z > 10) { z = 10; } + * sink(z); + * ``` */ private predicate hasUpperBoundsCheck(Variable var) { exists(RelationalOperation oper, VariableAccess access | - oper.getLeftOperand() = access and + oper.getAnOperand() = access and access.getTarget() = var and // Comparing to 0 is not an upper bound check - not oper.getRightOperand().getValue() = "0" + not oper.getAnOperand().getValue() = "0" ) } diff --git a/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected b/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected index a8345ba863c..4f98b1bead0 100644 --- a/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected +++ b/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected @@ -104,6 +104,7 @@ | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:9:11:9:20 | p#0 | | 
defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:91:42:91:44 | arg | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:92:12:92:14 | arg | +| defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:96:11:96:12 | p2 | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:97:27:97:32 | call to getenv | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:98:10:98:11 | (const char *)... | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:98:10:98:11 | p2 | diff --git a/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected b/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected index 2503a5a2ae5..858965a069b 100644 --- a/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected +++ b/cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected @@ -19,6 +19,7 @@ | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:91:31:91:33 | ret | AST only | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:92:5:92:8 | * ... | AST only | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:92:6:92:8 | ret | AST only | +| defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:96:11:96:12 | p2 | IR only | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:98:10:98:11 | (const char *)... 
| IR only | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | defaulttainttracking.cpp:98:10:98:11 | p2 | IR only | | defaulttainttracking.cpp:97:27:97:32 | call to getenv | test_diff.cpp:1:11:1:20 | p#0 | IR only | diff --git a/cpp/ql/test/library-tests/dataflow/taint-tests/localTaint.expected b/cpp/ql/test/library-tests/dataflow/taint-tests/localTaint.expected index 4c0fe0f8161..ec88e5bc628 100644 --- a/cpp/ql/test/library-tests/dataflow/taint-tests/localTaint.expected +++ b/cpp/ql/test/library-tests/dataflow/taint-tests/localTaint.expected @@ -105,6 +105,63 @@ | format.cpp:130:23:130:23 | 0 | format.cpp:130:21:130:24 | {...} | TAINT | | format.cpp:131:39:131:45 | ref arg & ... | format.cpp:132:8:132:13 | buffer | | | format.cpp:131:40:131:45 | buffer | format.cpp:131:39:131:45 | & ... | | +| stl.cpp:67:12:67:17 | call to source | stl.cpp:71:7:71:7 | a | | +| stl.cpp:68:16:68:20 | 123 | stl.cpp:68:16:68:21 | call to basic_string | TAINT | +| stl.cpp:68:16:68:21 | call to basic_string | stl.cpp:72:7:72:7 | b | | +| stl.cpp:68:16:68:21 | call to basic_string | stl.cpp:74:7:74:7 | b | | +| stl.cpp:69:16:69:21 | call to source | stl.cpp:69:16:69:24 | call to basic_string | TAINT | +| stl.cpp:69:16:69:24 | call to basic_string | stl.cpp:73:7:73:7 | c | | +| stl.cpp:69:16:69:24 | call to basic_string | stl.cpp:75:7:75:7 | c | | +| stl.cpp:74:7:74:7 | b | stl.cpp:74:9:74:13 | call to c_str | TAINT | +| stl.cpp:75:7:75:7 | c | stl.cpp:75:9:75:13 | call to c_str | TAINT | +| stl.cpp:80:20:80:22 | call to basic_stringstream | stl.cpp:83:2:83:4 | ss1 | | +| stl.cpp:80:20:80:22 | call to basic_stringstream | stl.cpp:89:7:89:9 | ss1 | | +| stl.cpp:80:20:80:22 | call to basic_stringstream | stl.cpp:94:7:94:9 | ss1 | | +| stl.cpp:80:25:80:27 | call to basic_stringstream | stl.cpp:84:2:84:4 | ss2 | | +| stl.cpp:80:25:80:27 | call to basic_stringstream | stl.cpp:90:7:90:9 | ss2 | | +| stl.cpp:80:25:80:27 | call to basic_stringstream | stl.cpp:95:7:95:9 | ss2 | | +| 
stl.cpp:80:30:80:32 | call to basic_stringstream | stl.cpp:85:2:85:4 | ss3 | | +| stl.cpp:80:30:80:32 | call to basic_stringstream | stl.cpp:91:7:91:9 | ss3 | | +| stl.cpp:80:30:80:32 | call to basic_stringstream | stl.cpp:96:7:96:9 | ss3 | | +| stl.cpp:80:35:80:37 | call to basic_stringstream | stl.cpp:86:2:86:4 | ss4 | | +| stl.cpp:80:35:80:37 | call to basic_stringstream | stl.cpp:92:7:92:9 | ss4 | | +| stl.cpp:80:35:80:37 | call to basic_stringstream | stl.cpp:97:7:97:9 | ss4 | | +| stl.cpp:80:40:80:42 | call to basic_stringstream | stl.cpp:87:2:87:4 | ss5 | | +| stl.cpp:80:40:80:42 | call to basic_stringstream | stl.cpp:93:7:93:9 | ss5 | | +| stl.cpp:80:40:80:42 | call to basic_stringstream | stl.cpp:98:7:98:9 | ss5 | | +| stl.cpp:81:16:81:21 | call to source | stl.cpp:81:16:81:24 | call to basic_string | TAINT | +| stl.cpp:81:16:81:24 | call to basic_string | stl.cpp:87:9:87:9 | t | | +| stl.cpp:83:2:83:4 | ref arg ss1 | stl.cpp:89:7:89:9 | ss1 | | +| stl.cpp:83:2:83:4 | ref arg ss1 | stl.cpp:94:7:94:9 | ss1 | | +| stl.cpp:84:2:84:4 | ref arg ss2 | stl.cpp:90:7:90:9 | ss2 | | +| stl.cpp:84:2:84:4 | ref arg ss2 | stl.cpp:95:7:95:9 | ss2 | | +| stl.cpp:85:2:85:4 | ref arg ss3 | stl.cpp:91:7:91:9 | ss3 | | +| stl.cpp:85:2:85:4 | ref arg ss3 | stl.cpp:96:7:96:9 | ss3 | | +| stl.cpp:86:2:86:4 | ref arg ss4 | stl.cpp:92:7:92:9 | ss4 | | +| stl.cpp:86:2:86:4 | ref arg ss4 | stl.cpp:97:7:97:9 | ss4 | | +| stl.cpp:87:2:87:4 | ref arg ss5 | stl.cpp:93:7:93:9 | ss5 | | +| stl.cpp:87:2:87:4 | ref arg ss5 | stl.cpp:98:7:98:9 | ss5 | | +| stl.cpp:101:32:101:37 | source | stl.cpp:106:9:106:14 | source | | +| stl.cpp:103:20:103:22 | call to basic_stringstream | stl.cpp:105:2:105:4 | ss1 | | +| stl.cpp:103:20:103:22 | call to basic_stringstream | stl.cpp:108:7:108:9 | ss1 | | +| stl.cpp:103:20:103:22 | call to basic_stringstream | stl.cpp:110:7:110:9 | ss1 | | +| stl.cpp:103:25:103:27 | call to basic_stringstream | stl.cpp:106:2:106:4 | ss2 | | +| stl.cpp:103:25:103:27 | call 
to basic_stringstream | stl.cpp:109:7:109:9 | ss2 | | +| stl.cpp:103:25:103:27 | call to basic_stringstream | stl.cpp:111:7:111:9 | ss2 | | +| stl.cpp:105:2:105:4 | ss1 [post update] | stl.cpp:108:7:108:9 | ss1 | | +| stl.cpp:105:2:105:4 | ss1 [post update] | stl.cpp:110:7:110:9 | ss1 | | +| stl.cpp:106:2:106:4 | ss2 [post update] | stl.cpp:109:7:109:9 | ss2 | | +| stl.cpp:106:2:106:4 | ss2 [post update] | stl.cpp:111:7:111:9 | ss2 | | +| stl.cpp:124:16:124:28 | call to basic_string | stl.cpp:125:7:125:11 | path1 | | +| stl.cpp:124:17:124:26 | call to user_input | stl.cpp:124:16:124:28 | call to basic_string | TAINT | +| stl.cpp:125:7:125:11 | path1 | stl.cpp:125:13:125:17 | call to c_str | TAINT | +| stl.cpp:128:10:128:19 | call to user_input | stl.cpp:128:10:128:21 | call to basic_string | TAINT | +| stl.cpp:128:10:128:21 | call to basic_string | stl.cpp:128:2:128:21 | ... = ... | | +| stl.cpp:128:10:128:21 | call to basic_string | stl.cpp:129:7:129:11 | path2 | | +| stl.cpp:129:7:129:11 | path2 | stl.cpp:129:13:129:17 | call to c_str | TAINT | +| stl.cpp:131:15:131:24 | call to user_input | stl.cpp:131:15:131:27 | call to basic_string | TAINT | +| stl.cpp:131:15:131:27 | call to basic_string | stl.cpp:132:7:132:11 | path3 | | +| stl.cpp:132:7:132:11 | path3 | stl.cpp:132:13:132:17 | call to c_str | TAINT | | taint.cpp:4:27:4:33 | source1 | taint.cpp:6:13:6:19 | source1 | | | taint.cpp:4:40:4:45 | clean1 | taint.cpp:5:8:5:13 | clean1 | | | taint.cpp:4:40:4:45 | clean1 | taint.cpp:6:3:6:8 | clean1 | | diff --git a/cpp/ql/test/library-tests/dataflow/taint-tests/stl.cpp b/cpp/ql/test/library-tests/dataflow/taint-tests/stl.cpp new file mode 100644 index 00000000000..d92bb39d158 --- /dev/null +++ b/cpp/ql/test/library-tests/dataflow/taint-tests/stl.cpp @@ -0,0 +1,133 @@ + +typedef unsigned long size_t; + +namespace std +{ + template<class charT> struct char_traits; + + typedef size_t streamsize; + + template <class T> class allocator { + public: + allocator() throw(); + }; + + template<class charT, class traits = char_traits<charT>, 
class Allocator = allocator<charT> > + class basic_string { + public: + explicit basic_string(const Allocator& a = Allocator()); + basic_string(const charT* s, const Allocator& a = Allocator()); + + const charT* c_str() const; + }; + + typedef basic_string<char> string; + + template <class charT, class traits = char_traits<charT> > + class basic_istream /*: virtual public basic_ios<charT, traits> - not needed for this test */ { + public: + basic_istream<charT, traits>& operator>>(int& n); + }; + + template <class charT, class traits = char_traits<charT> > + class basic_ostream /*: virtual public basic_ios<charT, traits> - not needed for this test */ { + public: + typedef charT char_type; + basic_ostream<charT, traits>& write(const char_type* s, streamsize n); + + basic_ostream<charT, traits>& operator<<(int n); + }; + + template<class charT, class traits> basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const charT*); + template<class charT, class traits, class Allocator> basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>& os, const basic_string<charT, traits, Allocator>& str); + + template<class charT, class traits = char_traits<charT>> + class basic_iostream : public basic_istream<charT, traits>, public basic_ostream<charT, traits> { + public: + }; + + template<class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>> + class basic_stringstream : public basic_iostream<charT, traits> { + public: + explicit basic_stringstream(/*ios_base::openmode which = ios_base::out|ios_base::in - not needed for this test*/); + + basic_string<charT, traits, Allocator> str() const; + }; + + using stringstream = basic_stringstream<char>; +} + +char *source(); +void sink(const char *s) {}; +void sink(const std::string &s) {}; +void sink(const std::stringstream &s) {}; + +void test_string() +{ + char *a = source(); + std::string b("123"); + std::string c(source()); + + sink(a); // tainted + sink(b); + sink(c); // tainted + sink(b.c_str()); + sink(c.c_str()); // tainted +} + +void test_stringstream() +{ + std::stringstream ss1, ss2, ss3, ss4, ss5; + std::string t(source()); + + ss1 << "1234"; + ss2 << source(); + ss3 << "123" << source(); + ss4 << source() << "456"; + ss5 << t; + + sink(ss1); + sink(ss2); // tainted [NOT DETECTED] + sink(ss3); // tainted [NOT DETECTED] + sink(ss4); // tainted [NOT DETECTED] + sink(ss5); // tainted [NOT DETECTED] + sink(ss1.str()); + sink(ss2.str()); // tainted [NOT DETECTED] + sink(ss3.str()); 
// tainted [NOT DETECTED] + sink(ss4.str()); // tainted [NOT DETECTED] + sink(ss5.str()); // tainted [NOT DETECTED] +} + +void test_stringstream_int(int source) +{ + std::stringstream ss1, ss2; + + ss1 << 1234; + ss2 << source; + + sink(ss1); + sink(ss2); // tainted [NOT DETECTED] + sink(ss1.str()); + sink(ss2.str()); // tainted [NOT DETECTED] +} + +using namespace std; + +char *user_input() { + return source(); +} + +void sink(const char *filename, const char *mode); + +void test_strings2() +{ + string path1 = user_input(); + sink(path1.c_str(), "r"); // tainted + + string path2; + path2 = user_input(); + sink(path2.c_str(), "r"); // tainted + + string path3(user_input()); + sink(path3.c_str(), "r"); // tainted +} diff --git a/cpp/ql/test/library-tests/dataflow/taint-tests/taint.expected b/cpp/ql/test/library-tests/dataflow/taint-tests/taint.expected index 7061beff0ee..59193d81722 100644 --- a/cpp/ql/test/library-tests/dataflow/taint-tests/taint.expected +++ b/cpp/ql/test/library-tests/dataflow/taint-tests/taint.expected @@ -8,6 +8,12 @@ | format.cpp:96:8:96:13 | buffer | format.cpp:95:30:95:43 | call to source | | format.cpp:101:8:101:13 | buffer | format.cpp:100:31:100:45 | call to source | | format.cpp:106:8:106:14 | wbuffer | format.cpp:105:38:105:52 | call to source | +| stl.cpp:71:7:71:7 | a | stl.cpp:67:12:67:17 | call to source | +| stl.cpp:73:7:73:7 | c | stl.cpp:69:16:69:21 | call to source | +| stl.cpp:75:9:75:13 | call to c_str | stl.cpp:69:16:69:21 | call to source | +| stl.cpp:125:13:125:17 | call to c_str | stl.cpp:117:10:117:15 | call to source | +| stl.cpp:129:13:129:17 | call to c_str | stl.cpp:117:10:117:15 | call to source | +| stl.cpp:132:13:132:17 | call to c_str | stl.cpp:117:10:117:15 | call to source | | taint.cpp:8:8:8:13 | clean1 | taint.cpp:4:27:4:33 | source1 | | taint.cpp:16:8:16:14 | source1 | taint.cpp:12:22:12:27 | call to source | | taint.cpp:17:8:17:16 | ++ ... 
| taint.cpp:12:22:12:27 | call to source | diff --git a/cpp/ql/test/library-tests/dataflow/taint-tests/test_diff.expected b/cpp/ql/test/library-tests/dataflow/taint-tests/test_diff.expected index 329a0bb6ecc..af9d002dcdc 100644 --- a/cpp/ql/test/library-tests/dataflow/taint-tests/test_diff.expected +++ b/cpp/ql/test/library-tests/dataflow/taint-tests/test_diff.expected @@ -8,6 +8,11 @@ | format.cpp:96:8:96:13 | format.cpp:95:30:95:43 | AST only | | format.cpp:101:8:101:13 | format.cpp:100:31:100:45 | AST only | | format.cpp:106:8:106:14 | format.cpp:105:38:105:52 | AST only | +| stl.cpp:73:7:73:7 | stl.cpp:69:16:69:21 | AST only | +| stl.cpp:75:9:75:13 | stl.cpp:69:16:69:21 | AST only | +| stl.cpp:125:13:125:17 | stl.cpp:117:10:117:15 | AST only | +| stl.cpp:129:13:129:17 | stl.cpp:117:10:117:15 | AST only | +| stl.cpp:132:13:132:17 | stl.cpp:117:10:117:15 | AST only | | taint.cpp:41:7:41:13 | taint.cpp:35:12:35:17 | AST only | | taint.cpp:42:7:42:13 | taint.cpp:35:12:35:17 | AST only | | taint.cpp:43:7:43:13 | taint.cpp:37:22:37:27 | AST only | diff --git a/cpp/ql/test/library-tests/dataflow/taint-tests/test_ir.expected b/cpp/ql/test/library-tests/dataflow/taint-tests/test_ir.expected index b0107791e61..9dc1088434c 100644 --- a/cpp/ql/test/library-tests/dataflow/taint-tests/test_ir.expected +++ b/cpp/ql/test/library-tests/dataflow/taint-tests/test_ir.expected @@ -1,3 +1,5 @@ +| stl.cpp:71:7:71:7 | (const char *)... | stl.cpp:67:12:67:17 | call to source | +| stl.cpp:71:7:71:7 | a | stl.cpp:67:12:67:17 | call to source | | taint.cpp:8:8:8:13 | clean1 | taint.cpp:4:27:4:33 | source1 | | taint.cpp:16:8:16:14 | source1 | taint.cpp:12:22:12:27 | call to source | | taint.cpp:17:8:17:16 | ++ ... 
| taint.cpp:12:22:12:27 | call to source | diff --git a/cpp/ql/test/library-tests/ir/ssa/aliased_ssa_sanity_unsound.expected b/cpp/ql/test/library-tests/ir/ssa/aliased_ssa_sanity_unsound.expected index 517ef7099c0..89002a5d3ca 100644 --- a/cpp/ql/test/library-tests/ir/ssa/aliased_ssa_sanity_unsound.expected +++ b/cpp/ql/test/library-tests/ir/ssa/aliased_ssa_sanity_unsound.expected @@ -1,7 +1,7 @@ missingOperand unexpectedOperand duplicateOperand -| ssa.cpp:286:27:286:30 | ReturnIndirection: argv | Instruction has 2 operands with tag 'SideEffect' in function '$@'. | ssa.cpp:286:5:286:8 | IR: main | int main(int, char**) | +| ssa.cpp:301:27:301:30 | ReturnIndirection: argv | Instruction has 2 operands with tag 'SideEffect' in function '$@'. | ssa.cpp:301:5:301:8 | IR: main | int main(int, char**) | missingPhiOperand missingOperandType duplicateChiOperand @@ -20,7 +20,7 @@ switchInstructionWithoutDefaultEdge notMarkedAsConflated wronglyMarkedAsConflated invalidOverlap -| ssa.cpp:286:27:286:30 | SideEffect | MemoryOperand 'SideEffect' has a `getDefinitionOverlap()` of 'MayPartiallyOverlap'. | ssa.cpp:286:5:286:8 | IR: main | int main(int, char**) | +| ssa.cpp:301:27:301:30 | SideEffect | MemoryOperand 'SideEffect' has a `getDefinitionOverlap()` of 'MayPartiallyOverlap'. 
| ssa.cpp:301:5:301:8 | IR: main | int main(int, char**) | missingCanonicalLanguageType multipleCanonicalLanguageTypes missingIRType diff --git a/cpp/ql/test/library-tests/valuenumbering/GlobalValueNumbering/ir_gvn.ql b/cpp/ql/test/library-tests/valuenumbering/GlobalValueNumbering/ir_gvn.ql index 528bb6174a5..3debea869b4 100644 --- a/cpp/ql/test/library-tests/valuenumbering/GlobalValueNumbering/ir_gvn.ql +++ b/cpp/ql/test/library-tests/valuenumbering/GlobalValueNumbering/ir_gvn.ql @@ -5,3 +5,4 @@ import semmle.code.cpp.ir.PrintIR import semmle.code.cpp.ir.IR import semmle.code.cpp.ir.ValueNumbering +import semmle.code.cpp.ir.implementation.aliased_ssa.gvn.PrintValueNumbering diff --git a/cpp/ql/test/query-tests/Critical/NewFree/NewFreeMismatch.expected b/cpp/ql/test/query-tests/Critical/NewFree/NewFreeMismatch.expected index 7350749ce96..45f88d426c9 100644 --- a/cpp/ql/test/query-tests/Critical/NewFree/NewFreeMismatch.expected +++ b/cpp/ql/test/query-tests/Critical/NewFree/NewFreeMismatch.expected @@ -1,5 +1,9 @@ | test2.cpp:19:3:19:6 | call to free | There is a new/free mismatch between this free and the corresponding $@. | test2.cpp:18:12:18:18 | new | new | | test2.cpp:26:3:26:6 | call to free | There is a new/free mismatch between this free and the corresponding $@. | test2.cpp:25:7:25:13 | new | new | +| test2.cpp:51:2:51:5 | call to free | There is a new/free mismatch between this free and the corresponding $@. | test2.cpp:45:18:45:24 | new | new | +| test2.cpp:55:2:55:5 | call to free | There is a new/free mismatch between this free and the corresponding $@. | test2.cpp:46:20:46:33 | call to operator new | new | +| test2.cpp:57:2:57:18 | delete | There is a malloc/delete mismatch between this delete and the corresponding $@. | test2.cpp:47:21:47:26 | call to malloc | malloc | +| test2.cpp:58:2:58:18 | call to operator delete | There is a malloc/delete mismatch between this delete and the corresponding $@. 
| test2.cpp:47:21:47:26 | call to malloc | malloc | | test.cpp:36:2:36:17 | delete | There is a malloc/delete mismatch between this delete and the corresponding $@. | test.cpp:27:18:27:23 | call to malloc | malloc | | test.cpp:41:2:41:5 | call to free | There is a new/free mismatch between this free and the corresponding $@. | test.cpp:26:7:26:17 | new | new | | test.cpp:68:3:68:11 | delete | There is a malloc/delete mismatch between this delete and the corresponding $@. | test.cpp:64:28:64:33 | call to malloc | malloc | diff --git a/cpp/ql/test/query-tests/Critical/NewFree/test2.cpp b/cpp/ql/test/query-tests/Critical/NewFree/test2.cpp index 9101758d85b..43a286f6f97 100644 --- a/cpp/ql/test/query-tests/Critical/NewFree/test2.cpp +++ b/cpp/ql/test/query-tests/Critical/NewFree/test2.cpp @@ -34,3 +34,27 @@ public: }; MyTest2Class mt2c_i; + +// --- + +void* operator new(size_t); +void operator delete(void*); + +void test_operator_new() +{ + void *ptr_new = new int; + void *ptr_opnew = ::operator new(sizeof(int)); + void *ptr_malloc = malloc(sizeof(int)); + + delete ptr_new; // GOOD + ::operator delete(ptr_new); // GOOD + free(ptr_new); // BAD + + delete ptr_opnew; // GOOD + ::operator delete(ptr_opnew); // GOOD + free(ptr_opnew); // BAD + + delete ptr_malloc; // BAD + ::operator delete(ptr_malloc); // BAD + free(ptr_malloc); // GOOD +} diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-022/semmle/tests/TaintedPath.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-022/semmle/tests/TaintedPath.expected index 7a58fb1b608..d7db531fbde 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-022/semmle/tests/TaintedPath.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-022/semmle/tests/TaintedPath.expected @@ -1 +1,13 @@ -| test.c:17:11:17:18 | fileName | This argument to a file access function is derived from $@ and then passed to fopen(filename) | test.c:9:23:9:26 | argv | user input (argv) | +edges +| test.c:9:23:9:26 | argv | test.c:17:11:17:18 | (const char 
*)... | +| test.c:9:23:9:26 | argv | test.c:17:11:17:18 | (const char *)... | +| test.c:9:23:9:26 | argv | test.c:17:11:17:18 | fileName | +| test.c:9:23:9:26 | argv | test.c:17:11:17:18 | fileName | +nodes +| test.c:9:23:9:26 | argv | semmle.label | argv | +| test.c:9:23:9:26 | argv | semmle.label | argv | +| test.c:17:11:17:18 | (const char *)... | semmle.label | (const char *)... | +| test.c:17:11:17:18 | (const char *)... | semmle.label | (const char *)... | +| test.c:17:11:17:18 | fileName | semmle.label | fileName | +#select +| test.c:17:11:17:18 | fileName | test.c:9:23:9:26 | argv | test.c:17:11:17:18 | fileName | This argument to a file access function is derived from $@ and then passed to fopen(filename) | test.c:9:23:9:26 | argv | user input (argv) | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-079/semmle/CgiXss/CgiXss.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-079/semmle/CgiXss/CgiXss.expected index bfdbb13a097..17bb67fdc08 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-079/semmle/CgiXss/CgiXss.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-079/semmle/CgiXss/CgiXss.expected @@ -1,2 +1,30 @@ -| search.c:17:8:17:12 | query | Cross-site scripting vulnerability due to $@. | search.c:41:21:41:26 | call to getenv | this query data | -| search.c:23:39:23:43 | query | Cross-site scripting vulnerability due to $@. | search.c:41:21:41:26 | call to getenv | this query data | +edges +| search.c:14:24:14:28 | query | search.c:17:8:17:12 | (const char *)... 
| +| search.c:14:24:14:28 | query | search.c:17:8:17:12 | query | +| search.c:14:24:14:28 | query | search.c:17:8:17:12 | query | +| search.c:22:24:22:28 | query | search.c:23:39:23:43 | query | +| search.c:22:24:22:28 | query | search.c:23:39:23:43 | query | +| search.c:41:21:41:26 | call to getenv | search.c:45:17:45:25 | raw_query | +| search.c:41:21:41:26 | call to getenv | search.c:45:17:45:25 | raw_query | +| search.c:41:21:41:26 | call to getenv | search.c:47:17:47:25 | raw_query | +| search.c:41:21:41:26 | call to getenv | search.c:47:17:47:25 | raw_query | +| search.c:45:17:45:25 | raw_query | search.c:14:24:14:28 | query | +| search.c:47:17:47:25 | raw_query | search.c:22:24:22:28 | query | +nodes +| search.c:14:24:14:28 | query | semmle.label | query | +| search.c:17:8:17:12 | (const char *)... | semmle.label | (const char *)... | +| search.c:17:8:17:12 | (const char *)... | semmle.label | (const char *)... | +| search.c:17:8:17:12 | query | semmle.label | query | +| search.c:17:8:17:12 | query | semmle.label | query | +| search.c:17:8:17:12 | query | semmle.label | query | +| search.c:22:24:22:28 | query | semmle.label | query | +| search.c:23:39:23:43 | query | semmle.label | query | +| search.c:23:39:23:43 | query | semmle.label | query | +| search.c:23:39:23:43 | query | semmle.label | query | +| search.c:41:21:41:26 | call to getenv | semmle.label | call to getenv | +| search.c:41:21:41:26 | call to getenv | semmle.label | call to getenv | +| search.c:45:17:45:25 | raw_query | semmle.label | raw_query | +| search.c:47:17:47:25 | raw_query | semmle.label | raw_query | +#select +| search.c:17:8:17:12 | query | search.c:41:21:41:26 | call to getenv | search.c:17:8:17:12 | query | Cross-site scripting vulnerability due to $@. | search.c:41:21:41:26 | call to getenv | this query data | +| search.c:23:39:23:43 | query | search.c:41:21:41:26 | call to getenv | search.c:23:39:23:43 | query | Cross-site scripting vulnerability due to $@. 
| search.c:41:21:41:26 | call to getenv | this query data | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-114/semmle/UncontrolledProcessOperation/UncontrolledProcessOperation.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-114/semmle/UncontrolledProcessOperation/UncontrolledProcessOperation.expected index b494196d4aa..f84c6719157 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-114/semmle/UncontrolledProcessOperation/UncontrolledProcessOperation.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-114/semmle/UncontrolledProcessOperation/UncontrolledProcessOperation.expected @@ -1,2 +1,25 @@ -| test.cpp:26:10:26:16 | command | The value of this argument may come from $@ and is being passed to system | test.cpp:42:18:42:23 | call to getenv | call to getenv | -| test.cpp:31:10:31:16 | command | The value of this argument may come from $@ and is being passed to system | test.cpp:43:18:43:23 | call to getenv | call to getenv | +edges +| test.cpp:24:30:24:36 | command | test.cpp:26:10:26:16 | command | +| test.cpp:24:30:24:36 | command | test.cpp:26:10:26:16 | command | +| test.cpp:29:30:29:36 | command | test.cpp:31:10:31:16 | command | +| test.cpp:29:30:29:36 | command | test.cpp:31:10:31:16 | command | +| test.cpp:42:18:42:23 | call to getenv | test.cpp:24:30:24:36 | command | +| test.cpp:42:18:42:34 | (const char *)... | test.cpp:24:30:24:36 | command | +| test.cpp:43:18:43:23 | call to getenv | test.cpp:29:30:29:36 | command | +| test.cpp:43:18:43:34 | (const char *)... 
| test.cpp:29:30:29:36 | command | +nodes +| test.cpp:24:30:24:36 | command | semmle.label | command | +| test.cpp:26:10:26:16 | command | semmle.label | command | +| test.cpp:26:10:26:16 | command | semmle.label | command | +| test.cpp:26:10:26:16 | command | semmle.label | command | +| test.cpp:29:30:29:36 | command | semmle.label | command | +| test.cpp:31:10:31:16 | command | semmle.label | command | +| test.cpp:31:10:31:16 | command | semmle.label | command | +| test.cpp:31:10:31:16 | command | semmle.label | command | +| test.cpp:42:18:42:23 | call to getenv | semmle.label | call to getenv | +| test.cpp:42:18:42:34 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:43:18:43:23 | call to getenv | semmle.label | call to getenv | +| test.cpp:43:18:43:34 | (const char *)... | semmle.label | (const char *)... | +#select +| test.cpp:26:10:26:16 | command | test.cpp:42:18:42:23 | call to getenv | test.cpp:26:10:26:16 | command | The value of this argument may come from $@ and is being passed to system | test.cpp:42:18:42:23 | call to getenv | call to getenv | +| test.cpp:31:10:31:16 | command | test.cpp:43:18:43:23 | call to getenv | test.cpp:31:10:31:16 | command | The value of this argument may come from $@ and is being passed to system | test.cpp:43:18:43:23 | call to getenv | call to getenv | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/UnboundedWrite.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/UnboundedWrite.expected index e69de29bb2d..58e3dda0964 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/UnboundedWrite.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/UnboundedWrite.expected @@ -0,0 +1,3 @@ +edges +nodes +#select diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-120/semmle/tests/UnboundedWrite.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-120/semmle/tests/UnboundedWrite.expected index 5096e75ebc3..291c1cb3a71 100644 --- 
a/cpp/ql/test/query-tests/Security/CWE/CWE-120/semmle/tests/UnboundedWrite.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-120/semmle/tests/UnboundedWrite.expected @@ -1,5 +1,53 @@ -| tests.c:28:3:28:9 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | tests.c:28:22:28:25 | argv | argv | -| tests.c:29:3:29:9 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | tests.c:29:28:29:31 | argv | argv | -| tests.c:31:15:31:23 | buffer100 | This 'scanf string argument' with input from $@ may overflow the destination. | tests.c:31:15:31:23 | buffer100 | buffer100 | -| tests.c:33:21:33:29 | buffer100 | This 'scanf string argument' with input from $@ may overflow the destination. | tests.c:33:21:33:29 | buffer100 | buffer100 | -| tests.c:34:25:34:33 | buffer100 | This 'sscanf string argument' with input from $@ may overflow the destination. | tests.c:34:10:34:13 | argv | argv | +edges +| tests.c:28:22:28:25 | argv | tests.c:28:22:28:28 | (const char *)... | +| tests.c:28:22:28:25 | argv | tests.c:28:22:28:28 | (const char *)... | +| tests.c:28:22:28:25 | argv | tests.c:28:22:28:28 | access to array | +| tests.c:28:22:28:25 | argv | tests.c:28:22:28:28 | access to array | +| tests.c:28:22:28:25 | argv | tests.c:28:22:28:28 | access to array | +| tests.c:28:22:28:25 | argv | tests.c:28:22:28:28 | access to array | +| tests.c:29:28:29:31 | argv | tests.c:29:28:29:34 | access to array | +| tests.c:29:28:29:31 | argv | tests.c:29:28:29:34 | access to array | +| tests.c:29:28:29:31 | argv | tests.c:29:28:29:34 | access to array | +| tests.c:29:28:29:31 | argv | tests.c:29:28:29:34 | access to array | +| tests.c:34:10:34:13 | argv | tests.c:34:10:34:16 | (const char *)... | +| tests.c:34:10:34:13 | argv | tests.c:34:10:34:16 | (const char *)... 
| +| tests.c:34:10:34:13 | argv | tests.c:34:10:34:16 | access to array | +| tests.c:34:10:34:13 | argv | tests.c:34:10:34:16 | access to array | +| tests.c:34:10:34:13 | argv | tests.c:34:10:34:16 | access to array | +| tests.c:34:10:34:13 | argv | tests.c:34:10:34:16 | access to array | +nodes +| tests.c:28:22:28:25 | argv | semmle.label | argv | +| tests.c:28:22:28:25 | argv | semmle.label | argv | +| tests.c:28:22:28:28 | (const char *)... | semmle.label | (const char *)... | +| tests.c:28:22:28:28 | (const char *)... | semmle.label | (const char *)... | +| tests.c:28:22:28:28 | access to array | semmle.label | access to array | +| tests.c:28:22:28:28 | access to array | semmle.label | access to array | +| tests.c:28:22:28:28 | access to array | semmle.label | access to array | +| tests.c:29:28:29:31 | argv | semmle.label | argv | +| tests.c:29:28:29:31 | argv | semmle.label | argv | +| tests.c:29:28:29:34 | access to array | semmle.label | access to array | +| tests.c:29:28:29:34 | access to array | semmle.label | access to array | +| tests.c:29:28:29:34 | access to array | semmle.label | access to array | +| tests.c:31:15:31:23 | array to pointer conversion | semmle.label | array to pointer conversion | +| tests.c:31:15:31:23 | array to pointer conversion | semmle.label | array to pointer conversion | +| tests.c:31:15:31:23 | buffer100 | semmle.label | buffer100 | +| tests.c:31:15:31:23 | buffer100 | semmle.label | buffer100 | +| tests.c:31:15:31:23 | buffer100 | semmle.label | buffer100 | +| tests.c:33:21:33:29 | array to pointer conversion | semmle.label | array to pointer conversion | +| tests.c:33:21:33:29 | array to pointer conversion | semmle.label | array to pointer conversion | +| tests.c:33:21:33:29 | buffer100 | semmle.label | buffer100 | +| tests.c:33:21:33:29 | buffer100 | semmle.label | buffer100 | +| tests.c:33:21:33:29 | buffer100 | semmle.label | buffer100 | +| tests.c:34:10:34:13 | argv | semmle.label | argv | +| tests.c:34:10:34:13 | argv | 
semmle.label | argv | +| tests.c:34:10:34:16 | (const char *)... | semmle.label | (const char *)... | +| tests.c:34:10:34:16 | (const char *)... | semmle.label | (const char *)... | +| tests.c:34:10:34:16 | access to array | semmle.label | access to array | +| tests.c:34:10:34:16 | access to array | semmle.label | access to array | +| tests.c:34:10:34:16 | access to array | semmle.label | access to array | +#select +| tests.c:28:3:28:9 | call to sprintf | tests.c:28:22:28:25 | argv | tests.c:28:22:28:28 | access to array | This 'call to sprintf' with input from $@ may overflow the destination. | tests.c:28:22:28:25 | argv | argv | +| tests.c:29:3:29:9 | call to sprintf | tests.c:29:28:29:31 | argv | tests.c:29:28:29:34 | access to array | This 'call to sprintf' with input from $@ may overflow the destination. | tests.c:29:28:29:31 | argv | argv | +| tests.c:31:15:31:23 | buffer100 | tests.c:31:15:31:23 | buffer100 | tests.c:31:15:31:23 | buffer100 | This 'scanf string argument' with input from $@ may overflow the destination. | tests.c:31:15:31:23 | buffer100 | buffer100 | +| tests.c:33:21:33:29 | buffer100 | tests.c:33:21:33:29 | buffer100 | tests.c:33:21:33:29 | buffer100 | This 'scanf string argument' with input from $@ may overflow the destination. | tests.c:33:21:33:29 | buffer100 | buffer100 | +| tests.c:34:25:34:33 | buffer100 | tests.c:34:10:34:13 | argv | tests.c:34:10:34:16 | access to array | This 'sscanf string argument' with input from $@ may overflow the destination. 
| tests.c:34:10:34:13 | argv | argv | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/argv/argvLocal.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/argv/argvLocal.expected index 7f342b0aabe..0064b6d5715 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/argv/argvLocal.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/argv/argvLocal.expected @@ -1,28 +1,314 @@ -| argvLocal.c:95:9:95:15 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:95:9:95:12 | argv | argv | -| argvLocal.c:96:15:96:21 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:96:15:96:18 | argv | argv | -| argvLocal.c:101:9:101:10 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:100:7:100:10 | argv | argv | -| argvLocal.c:102:15:102:16 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:100:7:100:10 | argv | argv | -| argvLocal.c:106:9:106:13 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:105:14:105:17 | argv | argv | -| argvLocal.c:107:15:107:19 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:105:14:105:17 | argv | argv | -| argvLocal.c:110:9:110:11 | * ... | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:105:14:105:17 | argv | argv | -| argvLocal.c:111:15:111:17 | * ... 
| The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:105:14:105:17 | argv | argv | -| argvLocal.c:116:9:116:10 | i3 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:115:13:115:16 | argv | argv | -| argvLocal.c:117:15:117:16 | i3 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:115:13:115:16 | argv | argv | -| argvLocal.c:121:9:121:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:115:13:115:16 | argv | argv | -| argvLocal.c:122:15:122:16 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:115:13:115:16 | argv | argv | -| argvLocal.c:127:9:127:10 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:126:10:126:13 | argv | argv | -| argvLocal.c:128:15:128:16 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:126:10:126:13 | argv | argv | -| argvLocal.c:131:9:131:14 | ... + ... | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:126:10:126:13 | argv | argv | -| argvLocal.c:132:15:132:20 | ... + ... | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:126:10:126:13 | argv | argv | -| argvLocal.c:135:9:135:12 | ... 
++ | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:115:13:115:16 | argv | argv | -| argvLocal.c:136:15:136:18 | -- ... | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:115:13:115:16 | argv | argv | -| argvLocal.c:144:9:144:10 | i7 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:100:7:100:10 | argv | argv | -| argvLocal.c:145:15:145:16 | i7 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:100:7:100:10 | argv | argv | -| argvLocal.c:150:9:150:10 | i8 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:149:11:149:14 | argv | argv | -| argvLocal.c:151:15:151:16 | i8 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:149:11:149:14 | argv | argv | -| argvLocal.c:157:9:157:10 | i9 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:156:23:156:26 | argv | argv | -| argvLocal.c:158:15:158:16 | i9 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:156:23:156:26 | argv | argv | -| argvLocal.c:164:9:164:11 | i91 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:163:22:163:25 | argv | argv | -| argvLocal.c:165:15:165:17 | i91 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:163:22:163:25 | argv | argv | -| 
argvLocal.c:169:18:169:20 | i10 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:168:18:168:21 | argv | argv | -| argvLocal.c:170:24:170:26 | i10 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:168:18:168:21 | argv | argv | +edges +| argvLocal.c:95:9:95:12 | argv | argvLocal.c:95:9:95:15 | (const char *)... | +| argvLocal.c:95:9:95:12 | argv | argvLocal.c:95:9:95:15 | (const char *)... | +| argvLocal.c:95:9:95:12 | argv | argvLocal.c:95:9:95:15 | access to array | +| argvLocal.c:95:9:95:12 | argv | argvLocal.c:95:9:95:15 | access to array | +| argvLocal.c:95:9:95:12 | argv | argvLocal.c:95:9:95:15 | access to array | +| argvLocal.c:95:9:95:12 | argv | argvLocal.c:95:9:95:15 | access to array | +| argvLocal.c:96:15:96:18 | argv | argvLocal.c:96:15:96:21 | access to array | +| argvLocal.c:96:15:96:18 | argv | argvLocal.c:96:15:96:21 | access to array | +| argvLocal.c:96:15:96:18 | argv | argvLocal.c:96:15:96:21 | access to array | +| argvLocal.c:96:15:96:18 | argv | argvLocal.c:96:15:96:21 | access to array | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:101:9:101:10 | (const char *)... | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:101:9:101:10 | (const char *)... 
| +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:101:9:101:10 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:101:9:101:10 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:101:9:101:10 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:101:9:101:10 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:102:15:102:16 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:102:15:102:16 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:102:15:102:16 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:102:15:102:16 | i1 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:144:9:144:10 | (const char *)... | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:144:9:144:10 | (const char *)... | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:144:9:144:10 | i7 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:144:9:144:10 | i7 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:144:9:144:10 | i7 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:144:9:144:10 | i7 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:145:15:145:16 | i7 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:145:15:145:16 | i7 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:145:15:145:16 | i7 | +| argvLocal.c:100:7:100:10 | argv | argvLocal.c:145:15:145:16 | i7 | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:106:9:106:13 | (const char *)... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:106:9:106:13 | (const char *)... 
| +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:106:9:106:13 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:106:9:106:13 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:106:9:106:13 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:106:9:106:13 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:107:15:107:19 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:107:15:107:19 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:107:15:107:19 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:107:15:107:19 | access to array | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:110:9:110:11 | (const char *)... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:110:9:110:11 | (const char *)... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:110:9:110:11 | * ... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:110:9:110:11 | * ... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:110:9:110:11 | * ... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:110:9:110:11 | * ... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:111:15:111:17 | * ... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:111:15:111:17 | * ... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:111:15:111:17 | * ... | +| argvLocal.c:105:14:105:17 | argv | argvLocal.c:111:15:111:17 | * ... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:116:9:116:10 | (const char *)... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:116:9:116:10 | (const char *)... 
| +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:116:9:116:10 | i3 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:116:9:116:10 | i3 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:117:15:117:16 | array to pointer conversion | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:117:15:117:16 | array to pointer conversion | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:117:15:117:16 | array to pointer conversion | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:117:15:117:16 | array to pointer conversion | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:117:15:117:16 | i3 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:117:15:117:16 | i3 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:121:9:121:10 | (const char *)... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:121:9:121:10 | (const char *)... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:121:9:121:10 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:121:9:121:10 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:135:9:135:12 | (const char *)... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:135:9:135:12 | (const char *)... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:135:9:135:12 | ... ++ | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:135:9:135:12 | ... ++ | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:136:15:136:18 | -- ... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:136:15:136:18 | -- ... | +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:136:15:136:18 | -- ... 
| +| argvLocal.c:115:13:115:16 | argv | argvLocal.c:136:15:136:18 | -- ... | +| argvLocal.c:117:15:117:16 | array to pointer conversion | argvLocal.c:117:15:117:16 | printWrapper output argument | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:121:9:121:10 | (const char *)... | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:121:9:121:10 | i4 | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:122:15:122:16 | i4 | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:135:9:135:12 | (const char *)... | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:135:9:135:12 | ... ++ | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:136:15:136:18 | -- ... | +| argvLocal.c:117:15:117:16 | printWrapper output argument | argvLocal.c:136:15:136:18 | -- ... | +| argvLocal.c:122:15:122:16 | i4 | argvLocal.c:122:15:122:16 | printWrapper output argument | +| argvLocal.c:122:15:122:16 | printWrapper output argument | argvLocal.c:135:9:135:12 | (const char *)... | +| argvLocal.c:122:15:122:16 | printWrapper output argument | argvLocal.c:135:9:135:12 | ... ++ | +| argvLocal.c:122:15:122:16 | printWrapper output argument | argvLocal.c:136:15:136:18 | -- ... | +| argvLocal.c:122:15:122:16 | printWrapper output argument | argvLocal.c:136:15:136:18 | -- ... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:127:9:127:10 | (const char *)... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:127:9:127:10 | (const char *)... 
| +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:127:9:127:10 | i5 | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:127:9:127:10 | i5 | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:128:15:128:16 | array to pointer conversion | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:128:15:128:16 | array to pointer conversion | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:128:15:128:16 | array to pointer conversion | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:128:15:128:16 | array to pointer conversion | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:128:15:128:16 | i5 | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:128:15:128:16 | i5 | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:131:9:131:14 | (const char *)... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:131:9:131:14 | (const char *)... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:131:9:131:14 | ... + ... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:131:9:131:14 | ... + ... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:132:15:132:20 | ... + ... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:132:15:132:20 | ... + ... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:132:15:132:20 | ... + ... | +| argvLocal.c:126:10:126:13 | argv | argvLocal.c:132:15:132:20 | ... + ... | +| argvLocal.c:128:15:128:16 | array to pointer conversion | argvLocal.c:128:15:128:16 | printWrapper output argument | +| argvLocal.c:128:15:128:16 | printWrapper output argument | argvLocal.c:131:9:131:14 | (const char *)... | +| argvLocal.c:128:15:128:16 | printWrapper output argument | argvLocal.c:131:9:131:14 | ... + ... | +| argvLocal.c:128:15:128:16 | printWrapper output argument | argvLocal.c:132:15:132:20 | ... + ... | +| argvLocal.c:128:15:128:16 | printWrapper output argument | argvLocal.c:132:15:132:20 | ... + ... | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:150:9:150:10 | (const char *)... 
| +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:150:9:150:10 | (const char *)... | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:150:9:150:10 | i8 | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:150:9:150:10 | i8 | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:150:9:150:10 | i8 | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:150:9:150:10 | i8 | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:151:15:151:16 | i8 | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:151:15:151:16 | i8 | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:151:15:151:16 | i8 | +| argvLocal.c:149:11:149:14 | argv | argvLocal.c:151:15:151:16 | i8 | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:157:9:157:10 | (const char *)... | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:157:9:157:10 | (const char *)... | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:157:9:157:10 | i9 | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:157:9:157:10 | i9 | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:158:15:158:16 | i9 | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:158:15:158:16 | i9 | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:158:15:158:16 | i9 | +| argvLocal.c:156:23:156:26 | argv | argvLocal.c:158:15:158:16 | i9 | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:164:9:164:11 | (const char *)... | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:164:9:164:11 | (const char *)... | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:164:9:164:11 | i91 | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:164:9:164:11 | i91 | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:165:15:165:17 | i91 | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:165:15:165:17 | i91 | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:165:15:165:17 | i91 | +| argvLocal.c:163:22:163:25 | argv | argvLocal.c:165:15:165:17 | i91 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:9:169:20 | (char *)... 
| +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:9:169:20 | (char *)... | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:9:169:20 | (const char *)... | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:9:169:20 | (const char *)... | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:18:169:20 | i10 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:18:169:20 | i10 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:18:169:20 | i10 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:18:169:20 | i10 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:170:15:170:26 | (char *)... | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:170:15:170:26 | (char *)... | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:170:24:170:26 | i10 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:170:24:170:26 | i10 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:170:24:170:26 | i10 | +| argvLocal.c:168:18:168:21 | argv | argvLocal.c:170:24:170:26 | i10 | +nodes +| argvLocal.c:9:25:9:31 | correct | semmle.label | correct | +| argvLocal.c:10:9:10:15 | Chi | semmle.label | Chi | +| argvLocal.c:95:9:95:12 | argv | semmle.label | argv | +| argvLocal.c:95:9:95:12 | argv | semmle.label | argv | +| argvLocal.c:95:9:95:15 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:95:9:95:15 | (const char *)... | semmle.label | (const char *)... 
| +| argvLocal.c:95:9:95:15 | access to array | semmle.label | access to array | +| argvLocal.c:95:9:95:15 | access to array | semmle.label | access to array | +| argvLocal.c:95:9:95:15 | access to array | semmle.label | access to array | +| argvLocal.c:96:15:96:18 | argv | semmle.label | argv | +| argvLocal.c:96:15:96:18 | argv | semmle.label | argv | +| argvLocal.c:96:15:96:21 | access to array | semmle.label | access to array | +| argvLocal.c:96:15:96:21 | access to array | semmle.label | access to array | +| argvLocal.c:96:15:96:21 | access to array | semmle.label | access to array | +| argvLocal.c:100:7:100:10 | argv | semmle.label | argv | +| argvLocal.c:100:7:100:10 | argv | semmle.label | argv | +| argvLocal.c:101:9:101:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:101:9:101:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:101:9:101:10 | i1 | semmle.label | i1 | +| argvLocal.c:101:9:101:10 | i1 | semmle.label | i1 | +| argvLocal.c:101:9:101:10 | i1 | semmle.label | i1 | +| argvLocal.c:102:15:102:16 | i1 | semmle.label | i1 | +| argvLocal.c:102:15:102:16 | i1 | semmle.label | i1 | +| argvLocal.c:102:15:102:16 | i1 | semmle.label | i1 | +| argvLocal.c:105:14:105:17 | argv | semmle.label | argv | +| argvLocal.c:105:14:105:17 | argv | semmle.label | argv | +| argvLocal.c:106:9:106:13 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:106:9:106:13 | (const char *)... | semmle.label | (const char *)... 
| +| argvLocal.c:106:9:106:13 | access to array | semmle.label | access to array | +| argvLocal.c:106:9:106:13 | access to array | semmle.label | access to array | +| argvLocal.c:106:9:106:13 | access to array | semmle.label | access to array | +| argvLocal.c:107:15:107:19 | access to array | semmle.label | access to array | +| argvLocal.c:107:15:107:19 | access to array | semmle.label | access to array | +| argvLocal.c:107:15:107:19 | access to array | semmle.label | access to array | +| argvLocal.c:110:9:110:11 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:110:9:110:11 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:110:9:110:11 | * ... | semmle.label | * ... | +| argvLocal.c:110:9:110:11 | * ... | semmle.label | * ... | +| argvLocal.c:110:9:110:11 | * ... | semmle.label | * ... | +| argvLocal.c:111:15:111:17 | * ... | semmle.label | * ... | +| argvLocal.c:111:15:111:17 | * ... | semmle.label | * ... | +| argvLocal.c:111:15:111:17 | * ... | semmle.label | * ... | +| argvLocal.c:115:13:115:16 | argv | semmle.label | argv | +| argvLocal.c:115:13:115:16 | argv | semmle.label | argv | +| argvLocal.c:116:9:116:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:116:9:116:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:116:9:116:10 | i3 | semmle.label | i3 | +| argvLocal.c:117:15:117:16 | array to pointer conversion | semmle.label | array to pointer conversion | +| argvLocal.c:117:15:117:16 | array to pointer conversion | semmle.label | array to pointer conversion | +| argvLocal.c:117:15:117:16 | i3 | semmle.label | i3 | +| argvLocal.c:117:15:117:16 | printWrapper output argument | semmle.label | printWrapper output argument | +| argvLocal.c:121:9:121:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:121:9:121:10 | (const char *)... | semmle.label | (const char *)... 
| +| argvLocal.c:121:9:121:10 | i4 | semmle.label | i4 | +| argvLocal.c:122:15:122:16 | i4 | semmle.label | i4 | +| argvLocal.c:122:15:122:16 | i4 | semmle.label | i4 | +| argvLocal.c:122:15:122:16 | i4 | semmle.label | i4 | +| argvLocal.c:122:15:122:16 | printWrapper output argument | semmle.label | printWrapper output argument | +| argvLocal.c:126:10:126:13 | argv | semmle.label | argv | +| argvLocal.c:126:10:126:13 | argv | semmle.label | argv | +| argvLocal.c:127:9:127:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:127:9:127:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:127:9:127:10 | i5 | semmle.label | i5 | +| argvLocal.c:128:15:128:16 | array to pointer conversion | semmle.label | array to pointer conversion | +| argvLocal.c:128:15:128:16 | array to pointer conversion | semmle.label | array to pointer conversion | +| argvLocal.c:128:15:128:16 | i5 | semmle.label | i5 | +| argvLocal.c:128:15:128:16 | printWrapper output argument | semmle.label | printWrapper output argument | +| argvLocal.c:131:9:131:14 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:131:9:131:14 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:131:9:131:14 | ... + ... | semmle.label | ... + ... | +| argvLocal.c:132:15:132:20 | ... + ... | semmle.label | ... + ... | +| argvLocal.c:132:15:132:20 | ... + ... | semmle.label | ... + ... | +| argvLocal.c:132:15:132:20 | ... + ... | semmle.label | ... + ... | +| argvLocal.c:135:9:135:12 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:135:9:135:12 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:135:9:135:12 | ... ++ | semmle.label | ... ++ | +| argvLocal.c:136:15:136:18 | -- ... | semmle.label | -- ... | +| argvLocal.c:136:15:136:18 | -- ... | semmle.label | -- ... | +| argvLocal.c:136:15:136:18 | -- ... | semmle.label | -- ... | +| argvLocal.c:144:9:144:10 | (const char *)... 
| semmle.label | (const char *)... | +| argvLocal.c:144:9:144:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:144:9:144:10 | i7 | semmle.label | i7 | +| argvLocal.c:144:9:144:10 | i7 | semmle.label | i7 | +| argvLocal.c:144:9:144:10 | i7 | semmle.label | i7 | +| argvLocal.c:145:15:145:16 | i7 | semmle.label | i7 | +| argvLocal.c:145:15:145:16 | i7 | semmle.label | i7 | +| argvLocal.c:145:15:145:16 | i7 | semmle.label | i7 | +| argvLocal.c:149:11:149:14 | argv | semmle.label | argv | +| argvLocal.c:149:11:149:14 | argv | semmle.label | argv | +| argvLocal.c:150:9:150:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:150:9:150:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:150:9:150:10 | i8 | semmle.label | i8 | +| argvLocal.c:150:9:150:10 | i8 | semmle.label | i8 | +| argvLocal.c:150:9:150:10 | i8 | semmle.label | i8 | +| argvLocal.c:151:15:151:16 | i8 | semmle.label | i8 | +| argvLocal.c:151:15:151:16 | i8 | semmle.label | i8 | +| argvLocal.c:151:15:151:16 | i8 | semmle.label | i8 | +| argvLocal.c:156:23:156:26 | argv | semmle.label | argv | +| argvLocal.c:156:23:156:26 | argv | semmle.label | argv | +| argvLocal.c:157:9:157:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:157:9:157:10 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:157:9:157:10 | i9 | semmle.label | i9 | +| argvLocal.c:158:15:158:16 | i9 | semmle.label | i9 | +| argvLocal.c:158:15:158:16 | i9 | semmle.label | i9 | +| argvLocal.c:158:15:158:16 | i9 | semmle.label | i9 | +| argvLocal.c:163:22:163:25 | argv | semmle.label | argv | +| argvLocal.c:163:22:163:25 | argv | semmle.label | argv | +| argvLocal.c:164:9:164:11 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:164:9:164:11 | (const char *)... | semmle.label | (const char *)... 
| +| argvLocal.c:164:9:164:11 | i91 | semmle.label | i91 | +| argvLocal.c:165:15:165:17 | i91 | semmle.label | i91 | +| argvLocal.c:165:15:165:17 | i91 | semmle.label | i91 | +| argvLocal.c:165:15:165:17 | i91 | semmle.label | i91 | +| argvLocal.c:168:18:168:21 | argv | semmle.label | argv | +| argvLocal.c:168:18:168:21 | argv | semmle.label | argv | +| argvLocal.c:169:9:169:20 | (char *)... | semmle.label | (char *)... | +| argvLocal.c:169:9:169:20 | (char *)... | semmle.label | (char *)... | +| argvLocal.c:169:9:169:20 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:169:9:169:20 | (const char *)... | semmle.label | (const char *)... | +| argvLocal.c:169:18:169:20 | i10 | semmle.label | i10 | +| argvLocal.c:169:18:169:20 | i10 | semmle.label | i10 | +| argvLocal.c:169:18:169:20 | i10 | semmle.label | i10 | +| argvLocal.c:170:15:170:26 | (char *)... | semmle.label | (char *)... | +| argvLocal.c:170:15:170:26 | (char *)... | semmle.label | (char *)... | +| argvLocal.c:170:24:170:26 | i10 | semmle.label | i10 | +| argvLocal.c:170:24:170:26 | i10 | semmle.label | i10 | +| argvLocal.c:170:24:170:26 | i10 | semmle.label | i10 | +#select +| argvLocal.c:95:9:95:15 | access to array | argvLocal.c:95:9:95:12 | argv | argvLocal.c:95:9:95:15 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:95:9:95:12 | argv | argv | +| argvLocal.c:96:15:96:21 | access to array | argvLocal.c:96:15:96:18 | argv | argvLocal.c:96:15:96:21 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:96:15:96:18 | argv | argv | +| argvLocal.c:101:9:101:10 | i1 | argvLocal.c:100:7:100:10 | argv | argvLocal.c:101:9:101:10 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:100:7:100:10 | argv | argv | +| 
argvLocal.c:102:15:102:16 | i1 | argvLocal.c:100:7:100:10 | argv | argvLocal.c:102:15:102:16 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:100:7:100:10 | argv | argv | +| argvLocal.c:106:9:106:13 | access to array | argvLocal.c:105:14:105:17 | argv | argvLocal.c:106:9:106:13 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:105:14:105:17 | argv | argv | +| argvLocal.c:107:15:107:19 | access to array | argvLocal.c:105:14:105:17 | argv | argvLocal.c:107:15:107:19 | access to array | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:105:14:105:17 | argv | argv | +| argvLocal.c:110:9:110:11 | * ... | argvLocal.c:105:14:105:17 | argv | argvLocal.c:110:9:110:11 | * ... | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:105:14:105:17 | argv | argv | +| argvLocal.c:111:15:111:17 | * ... | argvLocal.c:105:14:105:17 | argv | argvLocal.c:111:15:111:17 | * ... 
| The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:105:14:105:17 | argv | argv | +| argvLocal.c:116:9:116:10 | i3 | argvLocal.c:115:13:115:16 | argv | argvLocal.c:116:9:116:10 | i3 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:115:13:115:16 | argv | argv | +| argvLocal.c:117:15:117:16 | i3 | argvLocal.c:115:13:115:16 | argv | argvLocal.c:117:15:117:16 | i3 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:115:13:115:16 | argv | argv | +| argvLocal.c:121:9:121:10 | i4 | argvLocal.c:115:13:115:16 | argv | argvLocal.c:121:9:121:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:115:13:115:16 | argv | argv | +| argvLocal.c:122:15:122:16 | i4 | argvLocal.c:115:13:115:16 | argv | argvLocal.c:122:15:122:16 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:115:13:115:16 | argv | argv | +| argvLocal.c:127:9:127:10 | i5 | argvLocal.c:126:10:126:13 | argv | argvLocal.c:127:9:127:10 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:126:10:126:13 | argv | argv | +| argvLocal.c:128:15:128:16 | i5 | argvLocal.c:126:10:126:13 | argv | argvLocal.c:128:15:128:16 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:126:10:126:13 | argv | argv | +| argvLocal.c:131:9:131:14 | ... + ... | argvLocal.c:126:10:126:13 | argv | argvLocal.c:131:9:131:14 | ... + ... 
| The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:126:10:126:13 | argv | argv | +| argvLocal.c:132:15:132:20 | ... + ... | argvLocal.c:126:10:126:13 | argv | argvLocal.c:132:15:132:20 | ... + ... | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:126:10:126:13 | argv | argv | +| argvLocal.c:135:9:135:12 | ... ++ | argvLocal.c:115:13:115:16 | argv | argvLocal.c:135:9:135:12 | ... ++ | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:115:13:115:16 | argv | argv | +| argvLocal.c:136:15:136:18 | -- ... | argvLocal.c:115:13:115:16 | argv | argvLocal.c:136:15:136:18 | -- ... | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:115:13:115:16 | argv | argv | +| argvLocal.c:144:9:144:10 | i7 | argvLocal.c:100:7:100:10 | argv | argvLocal.c:144:9:144:10 | i7 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:100:7:100:10 | argv | argv | +| argvLocal.c:145:15:145:16 | i7 | argvLocal.c:100:7:100:10 | argv | argvLocal.c:145:15:145:16 | i7 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:100:7:100:10 | argv | argv | +| argvLocal.c:150:9:150:10 | i8 | argvLocal.c:149:11:149:14 | argv | argvLocal.c:150:9:150:10 | i8 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:149:11:149:14 | argv | argv | +| argvLocal.c:151:15:151:16 | i8 | argvLocal.c:149:11:149:14 | argv | argvLocal.c:151:15:151:16 | i8 | The value of this argument may come from $@ and is being used as a formatting argument to 
printWrapper(correct), which calls printf(format) | argvLocal.c:149:11:149:14 | argv | argv | +| argvLocal.c:157:9:157:10 | i9 | argvLocal.c:156:23:156:26 | argv | argvLocal.c:157:9:157:10 | i9 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:156:23:156:26 | argv | argv | +| argvLocal.c:158:15:158:16 | i9 | argvLocal.c:156:23:156:26 | argv | argvLocal.c:158:15:158:16 | i9 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:156:23:156:26 | argv | argv | +| argvLocal.c:164:9:164:11 | i91 | argvLocal.c:163:22:163:25 | argv | argvLocal.c:164:9:164:11 | i91 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:163:22:163:25 | argv | argv | +| argvLocal.c:165:15:165:17 | i91 | argvLocal.c:163:22:163:25 | argv | argvLocal.c:165:15:165:17 | i91 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:163:22:163:25 | argv | argv | +| argvLocal.c:169:18:169:20 | i10 | argvLocal.c:168:18:168:21 | argv | argvLocal.c:169:18:169:20 | i10 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | argvLocal.c:168:18:168:21 | argv | argv | +| argvLocal.c:170:24:170:26 | i10 | argvLocal.c:168:18:168:21 | argv | argvLocal.c:170:24:170:26 | i10 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(correct), which calls printf(format) | argvLocal.c:168:18:168:21 | argv | argv | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/funcs/funcsLocal.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/funcs/funcsLocal.expected index 5ae6a003bed..a05e392ecf2 100644 --- 
a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/funcs/funcsLocal.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/funcs/funcsLocal.expected @@ -1,8 +1,83 @@ -| funcsLocal.c:17:9:17:10 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:16:8:16:9 | i1 | fread | -| funcsLocal.c:27:9:27:10 | i3 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:26:8:26:9 | i3 | fgets | -| funcsLocal.c:32:9:32:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:31:13:31:17 | call to fgets | fgets | -| funcsLocal.c:32:9:32:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:31:19:31:21 | i41 | fgets | -| funcsLocal.c:37:9:37:10 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:36:7:36:8 | i5 | gets | -| funcsLocal.c:42:9:42:10 | i6 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:41:13:41:16 | call to gets | gets | -| funcsLocal.c:42:9:42:10 | i6 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:41:18:41:20 | i61 | gets | -| funcsLocal.c:58:9:58:10 | e1 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:16:8:16:9 | i1 | fread | +edges +| funcsLocal.c:16:8:16:9 | fread output argument | funcsLocal.c:17:9:17:10 | (const char *)... | +| funcsLocal.c:16:8:16:9 | fread output argument | funcsLocal.c:17:9:17:10 | i1 | +| funcsLocal.c:16:8:16:9 | fread output argument | funcsLocal.c:58:9:58:10 | (const char *)... 
| +| funcsLocal.c:16:8:16:9 | fread output argument | funcsLocal.c:58:9:58:10 | e1 | +| funcsLocal.c:16:8:16:9 | i1 | funcsLocal.c:17:9:17:10 | (const char *)... | +| funcsLocal.c:16:8:16:9 | i1 | funcsLocal.c:17:9:17:10 | i1 | +| funcsLocal.c:16:8:16:9 | i1 | funcsLocal.c:58:9:58:10 | (const char *)... | +| funcsLocal.c:16:8:16:9 | i1 | funcsLocal.c:58:9:58:10 | e1 | +| funcsLocal.c:26:8:26:9 | fgets output argument | funcsLocal.c:27:9:27:10 | (const char *)... | +| funcsLocal.c:26:8:26:9 | fgets output argument | funcsLocal.c:27:9:27:10 | i3 | +| funcsLocal.c:26:8:26:9 | i3 | funcsLocal.c:27:9:27:10 | (const char *)... | +| funcsLocal.c:26:8:26:9 | i3 | funcsLocal.c:27:9:27:10 | i3 | +| funcsLocal.c:31:13:31:17 | call to fgets | funcsLocal.c:32:9:32:10 | (const char *)... | +| funcsLocal.c:31:13:31:17 | call to fgets | funcsLocal.c:32:9:32:10 | (const char *)... | +| funcsLocal.c:31:13:31:17 | call to fgets | funcsLocal.c:32:9:32:10 | i4 | +| funcsLocal.c:31:13:31:17 | call to fgets | funcsLocal.c:32:9:32:10 | i4 | +| funcsLocal.c:31:13:31:17 | call to fgets | funcsLocal.c:32:9:32:10 | i4 | +| funcsLocal.c:31:13:31:17 | call to fgets | funcsLocal.c:32:9:32:10 | i4 | +| funcsLocal.c:31:19:31:21 | fgets output argument | funcsLocal.c:32:9:32:10 | (const char *)... | +| funcsLocal.c:31:19:31:21 | fgets output argument | funcsLocal.c:32:9:32:10 | i4 | +| funcsLocal.c:31:19:31:21 | i41 | funcsLocal.c:32:9:32:10 | (const char *)... | +| funcsLocal.c:31:19:31:21 | i41 | funcsLocal.c:32:9:32:10 | i4 | +| funcsLocal.c:36:7:36:8 | gets output argument | funcsLocal.c:37:9:37:10 | (const char *)... | +| funcsLocal.c:36:7:36:8 | gets output argument | funcsLocal.c:37:9:37:10 | i5 | +| funcsLocal.c:36:7:36:8 | i5 | funcsLocal.c:37:9:37:10 | (const char *)... | +| funcsLocal.c:36:7:36:8 | i5 | funcsLocal.c:37:9:37:10 | i5 | +| funcsLocal.c:41:13:41:16 | call to gets | funcsLocal.c:42:9:42:10 | (const char *)... 
| +| funcsLocal.c:41:13:41:16 | call to gets | funcsLocal.c:42:9:42:10 | (const char *)... | +| funcsLocal.c:41:13:41:16 | call to gets | funcsLocal.c:42:9:42:10 | i6 | +| funcsLocal.c:41:13:41:16 | call to gets | funcsLocal.c:42:9:42:10 | i6 | +| funcsLocal.c:41:13:41:16 | call to gets | funcsLocal.c:42:9:42:10 | i6 | +| funcsLocal.c:41:13:41:16 | call to gets | funcsLocal.c:42:9:42:10 | i6 | +| funcsLocal.c:41:18:41:20 | gets output argument | funcsLocal.c:42:9:42:10 | (const char *)... | +| funcsLocal.c:41:18:41:20 | gets output argument | funcsLocal.c:42:9:42:10 | i6 | +| funcsLocal.c:41:18:41:20 | i61 | funcsLocal.c:42:9:42:10 | (const char *)... | +| funcsLocal.c:41:18:41:20 | i61 | funcsLocal.c:42:9:42:10 | i6 | +nodes +| funcsLocal.c:16:8:16:9 | fread output argument | semmle.label | fread output argument | +| funcsLocal.c:16:8:16:9 | i1 | semmle.label | i1 | +| funcsLocal.c:17:9:17:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:17:9:17:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:17:9:17:10 | i1 | semmle.label | i1 | +| funcsLocal.c:26:8:26:9 | fgets output argument | semmle.label | fgets output argument | +| funcsLocal.c:26:8:26:9 | i3 | semmle.label | i3 | +| funcsLocal.c:27:9:27:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:27:9:27:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:27:9:27:10 | i3 | semmle.label | i3 | +| funcsLocal.c:31:13:31:17 | call to fgets | semmle.label | call to fgets | +| funcsLocal.c:31:13:31:17 | call to fgets | semmle.label | call to fgets | +| funcsLocal.c:31:19:31:21 | fgets output argument | semmle.label | fgets output argument | +| funcsLocal.c:31:19:31:21 | i41 | semmle.label | i41 | +| funcsLocal.c:32:9:32:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:32:9:32:10 | (const char *)... | semmle.label | (const char *)... 
| +| funcsLocal.c:32:9:32:10 | i4 | semmle.label | i4 | +| funcsLocal.c:32:9:32:10 | i4 | semmle.label | i4 | +| funcsLocal.c:32:9:32:10 | i4 | semmle.label | i4 | +| funcsLocal.c:36:7:36:8 | gets output argument | semmle.label | gets output argument | +| funcsLocal.c:36:7:36:8 | i5 | semmle.label | i5 | +| funcsLocal.c:37:9:37:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:37:9:37:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:37:9:37:10 | i5 | semmle.label | i5 | +| funcsLocal.c:41:13:41:16 | call to gets | semmle.label | call to gets | +| funcsLocal.c:41:13:41:16 | call to gets | semmle.label | call to gets | +| funcsLocal.c:41:18:41:20 | gets output argument | semmle.label | gets output argument | +| funcsLocal.c:41:18:41:20 | i61 | semmle.label | i61 | +| funcsLocal.c:42:9:42:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:42:9:42:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:42:9:42:10 | i6 | semmle.label | i6 | +| funcsLocal.c:42:9:42:10 | i6 | semmle.label | i6 | +| funcsLocal.c:42:9:42:10 | i6 | semmle.label | i6 | +| funcsLocal.c:58:9:58:10 | (const char *)... | semmle.label | (const char *)... | +| funcsLocal.c:58:9:58:10 | (const char *)... | semmle.label | (const char *)... 
| +| funcsLocal.c:58:9:58:10 | e1 | semmle.label | e1 | +#select +| funcsLocal.c:17:9:17:10 | i1 | funcsLocal.c:16:8:16:9 | i1 | funcsLocal.c:17:9:17:10 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:16:8:16:9 | i1 | fread | +| funcsLocal.c:27:9:27:10 | i3 | funcsLocal.c:26:8:26:9 | i3 | funcsLocal.c:27:9:27:10 | i3 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:26:8:26:9 | i3 | fgets | +| funcsLocal.c:32:9:32:10 | i4 | funcsLocal.c:31:13:31:17 | call to fgets | funcsLocal.c:32:9:32:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:31:13:31:17 | call to fgets | fgets | +| funcsLocal.c:32:9:32:10 | i4 | funcsLocal.c:31:19:31:21 | i41 | funcsLocal.c:32:9:32:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:31:19:31:21 | i41 | fgets | +| funcsLocal.c:37:9:37:10 | i5 | funcsLocal.c:36:7:36:8 | i5 | funcsLocal.c:37:9:37:10 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:36:7:36:8 | i5 | gets | +| funcsLocal.c:42:9:42:10 | i6 | funcsLocal.c:41:13:41:16 | call to gets | funcsLocal.c:42:9:42:10 | i6 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:41:13:41:16 | call to gets | gets | +| funcsLocal.c:42:9:42:10 | i6 | funcsLocal.c:41:18:41:20 | i61 | funcsLocal.c:42:9:42:10 | i6 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | funcsLocal.c:41:18:41:20 | i61 | gets | +| funcsLocal.c:58:9:58:10 | e1 | funcsLocal.c:16:8:16:9 | i1 | funcsLocal.c:58:9:58:10 | e1 | The value of this argument may come from $@ and is being used as a formatting argument to 
printf(format) | funcsLocal.c:16:8:16:9 | i1 | fread | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatString.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatString.expected index e69de29bb2d..58e3dda0964 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatString.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatString.expected @@ -0,0 +1,3 @@ +edges +nodes +#select diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatStringThroughGlobalVar.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatStringThroughGlobalVar.expected index 0f29313da8e..ba735a3e140 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatStringThroughGlobalVar.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/globalVars/UncontrolledFormatStringThroughGlobalVar.expected @@ -1,5 +1,65 @@ -| globalVars.c:27:9:27:12 | copy | This value may flow through $@, originating from $@, and is a formatting argument to printf(format). | globalVars.c:8:7:8:10 | copy | copy | globalVars.c:24:11:24:14 | argv | argv | -| globalVars.c:30:15:30:18 | copy | This value may flow through $@, originating from $@, and is a formatting argument to printWrapper(str), which calls printf(format). | globalVars.c:8:7:8:10 | copy | copy | globalVars.c:24:11:24:14 | argv | argv | -| globalVars.c:38:9:38:13 | copy2 | This value may flow through $@, originating from $@, and is a formatting argument to printf(format). | globalVars.c:9:7:9:11 | copy2 | copy2 | globalVars.c:24:11:24:14 | argv | argv | -| globalVars.c:41:15:41:19 | copy2 | This value may flow through $@, originating from $@, and is a formatting argument to printWrapper(str), which calls printf(format). 
| globalVars.c:9:7:9:11 | copy2 | copy2 | globalVars.c:24:11:24:14 | argv | argv | -| globalVars.c:50:9:50:13 | copy2 | This value may flow through $@, originating from $@, and is a formatting argument to printf(format). | globalVars.c:9:7:9:11 | copy2 | copy2 | globalVars.c:24:11:24:14 | argv | argv | +edges +| globalVars.c:8:7:8:10 | copy | globalVars.c:27:9:27:12 | copy | +| globalVars.c:8:7:8:10 | copy | globalVars.c:27:9:27:12 | copy | +| globalVars.c:8:7:8:10 | copy | globalVars.c:27:9:27:12 | copy | +| globalVars.c:8:7:8:10 | copy | globalVars.c:30:15:30:18 | copy | +| globalVars.c:8:7:8:10 | copy | globalVars.c:30:15:30:18 | copy | +| globalVars.c:8:7:8:10 | copy | globalVars.c:35:11:35:14 | copy | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:38:9:38:13 | copy2 | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:38:9:38:13 | copy2 | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:38:9:38:13 | copy2 | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:41:15:41:19 | copy2 | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:41:15:41:19 | copy2 | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:50:9:50:13 | copy2 | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:50:9:50:13 | copy2 | +| globalVars.c:9:7:9:11 | copy2 | globalVars.c:50:9:50:13 | copy2 | +| globalVars.c:11:22:11:25 | argv | globalVars.c:12:2:12:15 | Store | +| globalVars.c:12:2:12:15 | Store | globalVars.c:8:7:8:10 | copy | +| globalVars.c:15:21:15:23 | val | globalVars.c:16:2:16:12 | Store | +| globalVars.c:16:2:16:12 | Store | globalVars.c:9:7:9:11 | copy2 | +| globalVars.c:24:11:24:14 | argv | globalVars.c:11:22:11:25 | argv | +| globalVars.c:24:11:24:14 | argv | globalVars.c:11:22:11:25 | argv | +| globalVars.c:27:9:27:12 | copy | globalVars.c:27:9:27:12 | (const char *)... | +| globalVars.c:27:9:27:12 | copy | globalVars.c:27:9:27:12 | copy | +| globalVars.c:35:11:35:14 | copy | globalVars.c:15:21:15:23 | val | +| globalVars.c:38:9:38:13 | copy2 | globalVars.c:38:9:38:13 | (const char *)... 
| +| globalVars.c:38:9:38:13 | copy2 | globalVars.c:38:9:38:13 | copy2 | +| globalVars.c:50:9:50:13 | copy2 | globalVars.c:50:9:50:13 | (const char *)... | +| globalVars.c:50:9:50:13 | copy2 | globalVars.c:50:9:50:13 | copy2 | +nodes +| globalVars.c:8:7:8:10 | copy | semmle.label | copy | +| globalVars.c:9:7:9:11 | copy2 | semmle.label | copy2 | +| globalVars.c:11:22:11:25 | argv | semmle.label | argv | +| globalVars.c:12:2:12:15 | Store | semmle.label | Store | +| globalVars.c:15:21:15:23 | val | semmle.label | val | +| globalVars.c:16:2:16:12 | Store | semmle.label | Store | +| globalVars.c:24:11:24:14 | argv | semmle.label | argv | +| globalVars.c:24:11:24:14 | argv | semmle.label | argv | +| globalVars.c:27:9:27:12 | (const char *)... | semmle.label | (const char *)... | +| globalVars.c:27:9:27:12 | (const char *)... | semmle.label | (const char *)... | +| globalVars.c:27:9:27:12 | copy | semmle.label | copy | +| globalVars.c:27:9:27:12 | copy | semmle.label | copy | +| globalVars.c:27:9:27:12 | copy | semmle.label | copy | +| globalVars.c:30:15:30:18 | copy | semmle.label | copy | +| globalVars.c:30:15:30:18 | copy | semmle.label | copy | +| globalVars.c:30:15:30:18 | copy | semmle.label | copy | +| globalVars.c:35:11:35:14 | copy | semmle.label | copy | +| globalVars.c:38:9:38:13 | (const char *)... | semmle.label | (const char *)... | +| globalVars.c:38:9:38:13 | (const char *)... | semmle.label | (const char *)... | +| globalVars.c:38:9:38:13 | copy2 | semmle.label | copy2 | +| globalVars.c:38:9:38:13 | copy2 | semmle.label | copy2 | +| globalVars.c:38:9:38:13 | copy2 | semmle.label | copy2 | +| globalVars.c:41:15:41:19 | copy2 | semmle.label | copy2 | +| globalVars.c:41:15:41:19 | copy2 | semmle.label | copy2 | +| globalVars.c:41:15:41:19 | copy2 | semmle.label | copy2 | +| globalVars.c:50:9:50:13 | (const char *)... | semmle.label | (const char *)... | +| globalVars.c:50:9:50:13 | (const char *)... | semmle.label | (const char *)... 
| +| globalVars.c:50:9:50:13 | copy2 | semmle.label | copy2 | +| globalVars.c:50:9:50:13 | copy2 | semmle.label | copy2 | +| globalVars.c:50:9:50:13 | copy2 | semmle.label | copy2 | +#select +| globalVars.c:27:9:27:12 | copy | globalVars.c:24:11:24:14 | argv | globalVars.c:27:9:27:12 | copy | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | globalVars.c:24:11:24:14 | argv | argv | +| globalVars.c:30:15:30:18 | copy | globalVars.c:24:11:24:14 | argv | globalVars.c:30:15:30:18 | copy | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(str), which calls printf(format) | globalVars.c:24:11:24:14 | argv | argv | +| globalVars.c:38:9:38:13 | copy2 | globalVars.c:24:11:24:14 | argv | globalVars.c:38:9:38:13 | copy2 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | globalVars.c:24:11:24:14 | argv | argv | +| globalVars.c:41:15:41:19 | copy2 | globalVars.c:24:11:24:14 | argv | globalVars.c:41:15:41:19 | copy2 | The value of this argument may come from $@ and is being used as a formatting argument to printWrapper(str), which calls printf(format) | globalVars.c:24:11:24:14 | argv | argv | +| globalVars.c:50:9:50:13 | copy2 | globalVars.c:24:11:24:14 | argv | globalVars.c:50:9:50:13 | copy2 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | globalVars.c:24:11:24:14 | argv | argv | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/ifs/ifs.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/ifs/ifs.expected index d474364a0cc..62c36d0192d 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/ifs/ifs.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-134/semmle/ifs/ifs.expected @@ -1,11 +1,157 @@ -| ifs.c:62:9:62:10 | c7 | The value of this argument may come from $@ and is being used as a formatting argument to 
printf(format) | ifs.c:61:8:61:11 | argv | argv | -| ifs.c:69:9:69:10 | c8 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:68:8:68:11 | argv | argv | -| ifs.c:75:9:75:10 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:74:8:74:11 | argv | argv | -| ifs.c:81:9:81:10 | i2 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:80:8:80:11 | argv | argv | -| ifs.c:87:9:87:10 | i3 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:86:8:86:11 | argv | argv | -| ifs.c:93:9:93:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:92:8:92:11 | argv | argv | -| ifs.c:99:9:99:10 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:98:8:98:11 | argv | argv | -| ifs.c:106:9:106:10 | i6 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:105:8:105:11 | argv | argv | -| ifs.c:112:9:112:10 | i7 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:111:8:111:11 | argv | argv | -| ifs.c:118:9:118:10 | i8 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:117:8:117:11 | argv | argv | -| ifs.c:124:9:124:10 | i9 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:123:8:123:11 | argv | argv | +edges +| ifs.c:61:8:61:11 | argv | ifs.c:62:9:62:10 | (const char *)... | +| ifs.c:61:8:61:11 | argv | ifs.c:62:9:62:10 | (const char *)... 
| +| ifs.c:61:8:61:11 | argv | ifs.c:62:9:62:10 | c7 | +| ifs.c:61:8:61:11 | argv | ifs.c:62:9:62:10 | c7 | +| ifs.c:61:8:61:11 | argv | ifs.c:62:9:62:10 | c7 | +| ifs.c:61:8:61:11 | argv | ifs.c:62:9:62:10 | c7 | +| ifs.c:68:8:68:11 | argv | ifs.c:69:9:69:10 | (const char *)... | +| ifs.c:68:8:68:11 | argv | ifs.c:69:9:69:10 | (const char *)... | +| ifs.c:68:8:68:11 | argv | ifs.c:69:9:69:10 | c8 | +| ifs.c:68:8:68:11 | argv | ifs.c:69:9:69:10 | c8 | +| ifs.c:68:8:68:11 | argv | ifs.c:69:9:69:10 | c8 | +| ifs.c:68:8:68:11 | argv | ifs.c:69:9:69:10 | c8 | +| ifs.c:74:8:74:11 | argv | ifs.c:75:9:75:10 | (const char *)... | +| ifs.c:74:8:74:11 | argv | ifs.c:75:9:75:10 | (const char *)... | +| ifs.c:74:8:74:11 | argv | ifs.c:75:9:75:10 | i1 | +| ifs.c:74:8:74:11 | argv | ifs.c:75:9:75:10 | i1 | +| ifs.c:74:8:74:11 | argv | ifs.c:75:9:75:10 | i1 | +| ifs.c:74:8:74:11 | argv | ifs.c:75:9:75:10 | i1 | +| ifs.c:80:8:80:11 | argv | ifs.c:81:9:81:10 | (const char *)... | +| ifs.c:80:8:80:11 | argv | ifs.c:81:9:81:10 | (const char *)... | +| ifs.c:80:8:80:11 | argv | ifs.c:81:9:81:10 | i2 | +| ifs.c:80:8:80:11 | argv | ifs.c:81:9:81:10 | i2 | +| ifs.c:80:8:80:11 | argv | ifs.c:81:9:81:10 | i2 | +| ifs.c:80:8:80:11 | argv | ifs.c:81:9:81:10 | i2 | +| ifs.c:86:8:86:11 | argv | ifs.c:87:9:87:10 | (const char *)... | +| ifs.c:86:8:86:11 | argv | ifs.c:87:9:87:10 | (const char *)... | +| ifs.c:86:8:86:11 | argv | ifs.c:87:9:87:10 | i3 | +| ifs.c:86:8:86:11 | argv | ifs.c:87:9:87:10 | i3 | +| ifs.c:86:8:86:11 | argv | ifs.c:87:9:87:10 | i3 | +| ifs.c:86:8:86:11 | argv | ifs.c:87:9:87:10 | i3 | +| ifs.c:92:8:92:11 | argv | ifs.c:93:9:93:10 | (const char *)... | +| ifs.c:92:8:92:11 | argv | ifs.c:93:9:93:10 | (const char *)... 
| +| ifs.c:92:8:92:11 | argv | ifs.c:93:9:93:10 | i4 | +| ifs.c:92:8:92:11 | argv | ifs.c:93:9:93:10 | i4 | +| ifs.c:92:8:92:11 | argv | ifs.c:93:9:93:10 | i4 | +| ifs.c:92:8:92:11 | argv | ifs.c:93:9:93:10 | i4 | +| ifs.c:98:8:98:11 | argv | ifs.c:99:9:99:10 | (const char *)... | +| ifs.c:98:8:98:11 | argv | ifs.c:99:9:99:10 | (const char *)... | +| ifs.c:98:8:98:11 | argv | ifs.c:99:9:99:10 | i5 | +| ifs.c:98:8:98:11 | argv | ifs.c:99:9:99:10 | i5 | +| ifs.c:98:8:98:11 | argv | ifs.c:99:9:99:10 | i5 | +| ifs.c:98:8:98:11 | argv | ifs.c:99:9:99:10 | i5 | +| ifs.c:105:8:105:11 | argv | ifs.c:106:9:106:10 | (const char *)... | +| ifs.c:105:8:105:11 | argv | ifs.c:106:9:106:10 | (const char *)... | +| ifs.c:105:8:105:11 | argv | ifs.c:106:9:106:10 | i6 | +| ifs.c:105:8:105:11 | argv | ifs.c:106:9:106:10 | i6 | +| ifs.c:105:8:105:11 | argv | ifs.c:106:9:106:10 | i6 | +| ifs.c:105:8:105:11 | argv | ifs.c:106:9:106:10 | i6 | +| ifs.c:111:8:111:11 | argv | ifs.c:112:9:112:10 | (const char *)... | +| ifs.c:111:8:111:11 | argv | ifs.c:112:9:112:10 | (const char *)... | +| ifs.c:111:8:111:11 | argv | ifs.c:112:9:112:10 | i7 | +| ifs.c:111:8:111:11 | argv | ifs.c:112:9:112:10 | i7 | +| ifs.c:111:8:111:11 | argv | ifs.c:112:9:112:10 | i7 | +| ifs.c:111:8:111:11 | argv | ifs.c:112:9:112:10 | i7 | +| ifs.c:117:8:117:11 | argv | ifs.c:118:9:118:10 | (const char *)... | +| ifs.c:117:8:117:11 | argv | ifs.c:118:9:118:10 | (const char *)... | +| ifs.c:117:8:117:11 | argv | ifs.c:118:9:118:10 | i8 | +| ifs.c:117:8:117:11 | argv | ifs.c:118:9:118:10 | i8 | +| ifs.c:117:8:117:11 | argv | ifs.c:118:9:118:10 | i8 | +| ifs.c:117:8:117:11 | argv | ifs.c:118:9:118:10 | i8 | +| ifs.c:123:8:123:11 | argv | ifs.c:124:9:124:10 | (const char *)... | +| ifs.c:123:8:123:11 | argv | ifs.c:124:9:124:10 | (const char *)... 
| +| ifs.c:123:8:123:11 | argv | ifs.c:124:9:124:10 | i9 | +| ifs.c:123:8:123:11 | argv | ifs.c:124:9:124:10 | i9 | +| ifs.c:123:8:123:11 | argv | ifs.c:124:9:124:10 | i9 | +| ifs.c:123:8:123:11 | argv | ifs.c:124:9:124:10 | i9 | +nodes +| ifs.c:61:8:61:11 | argv | semmle.label | argv | +| ifs.c:61:8:61:11 | argv | semmle.label | argv | +| ifs.c:62:9:62:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:62:9:62:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:62:9:62:10 | c7 | semmle.label | c7 | +| ifs.c:62:9:62:10 | c7 | semmle.label | c7 | +| ifs.c:62:9:62:10 | c7 | semmle.label | c7 | +| ifs.c:68:8:68:11 | argv | semmle.label | argv | +| ifs.c:68:8:68:11 | argv | semmle.label | argv | +| ifs.c:69:9:69:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:69:9:69:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:69:9:69:10 | c8 | semmle.label | c8 | +| ifs.c:69:9:69:10 | c8 | semmle.label | c8 | +| ifs.c:69:9:69:10 | c8 | semmle.label | c8 | +| ifs.c:74:8:74:11 | argv | semmle.label | argv | +| ifs.c:74:8:74:11 | argv | semmle.label | argv | +| ifs.c:75:9:75:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:75:9:75:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:75:9:75:10 | i1 | semmle.label | i1 | +| ifs.c:75:9:75:10 | i1 | semmle.label | i1 | +| ifs.c:75:9:75:10 | i1 | semmle.label | i1 | +| ifs.c:80:8:80:11 | argv | semmle.label | argv | +| ifs.c:80:8:80:11 | argv | semmle.label | argv | +| ifs.c:81:9:81:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:81:9:81:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:81:9:81:10 | i2 | semmle.label | i2 | +| ifs.c:81:9:81:10 | i2 | semmle.label | i2 | +| ifs.c:81:9:81:10 | i2 | semmle.label | i2 | +| ifs.c:86:8:86:11 | argv | semmle.label | argv | +| ifs.c:86:8:86:11 | argv | semmle.label | argv | +| ifs.c:87:9:87:10 | (const char *)... 
| semmle.label | (const char *)... | +| ifs.c:87:9:87:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:87:9:87:10 | i3 | semmle.label | i3 | +| ifs.c:87:9:87:10 | i3 | semmle.label | i3 | +| ifs.c:87:9:87:10 | i3 | semmle.label | i3 | +| ifs.c:92:8:92:11 | argv | semmle.label | argv | +| ifs.c:92:8:92:11 | argv | semmle.label | argv | +| ifs.c:93:9:93:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:93:9:93:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:93:9:93:10 | i4 | semmle.label | i4 | +| ifs.c:93:9:93:10 | i4 | semmle.label | i4 | +| ifs.c:93:9:93:10 | i4 | semmle.label | i4 | +| ifs.c:98:8:98:11 | argv | semmle.label | argv | +| ifs.c:98:8:98:11 | argv | semmle.label | argv | +| ifs.c:99:9:99:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:99:9:99:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:99:9:99:10 | i5 | semmle.label | i5 | +| ifs.c:99:9:99:10 | i5 | semmle.label | i5 | +| ifs.c:99:9:99:10 | i5 | semmle.label | i5 | +| ifs.c:105:8:105:11 | argv | semmle.label | argv | +| ifs.c:105:8:105:11 | argv | semmle.label | argv | +| ifs.c:106:9:106:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:106:9:106:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:106:9:106:10 | i6 | semmle.label | i6 | +| ifs.c:106:9:106:10 | i6 | semmle.label | i6 | +| ifs.c:106:9:106:10 | i6 | semmle.label | i6 | +| ifs.c:111:8:111:11 | argv | semmle.label | argv | +| ifs.c:111:8:111:11 | argv | semmle.label | argv | +| ifs.c:112:9:112:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:112:9:112:10 | (const char *)... | semmle.label | (const char *)... 
| +| ifs.c:112:9:112:10 | i7 | semmle.label | i7 | +| ifs.c:112:9:112:10 | i7 | semmle.label | i7 | +| ifs.c:112:9:112:10 | i7 | semmle.label | i7 | +| ifs.c:117:8:117:11 | argv | semmle.label | argv | +| ifs.c:117:8:117:11 | argv | semmle.label | argv | +| ifs.c:118:9:118:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:118:9:118:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:118:9:118:10 | i8 | semmle.label | i8 | +| ifs.c:118:9:118:10 | i8 | semmle.label | i8 | +| ifs.c:118:9:118:10 | i8 | semmle.label | i8 | +| ifs.c:123:8:123:11 | argv | semmle.label | argv | +| ifs.c:123:8:123:11 | argv | semmle.label | argv | +| ifs.c:124:9:124:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:124:9:124:10 | (const char *)... | semmle.label | (const char *)... | +| ifs.c:124:9:124:10 | i9 | semmle.label | i9 | +| ifs.c:124:9:124:10 | i9 | semmle.label | i9 | +| ifs.c:124:9:124:10 | i9 | semmle.label | i9 | +#select +| ifs.c:62:9:62:10 | c7 | ifs.c:61:8:61:11 | argv | ifs.c:62:9:62:10 | c7 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:61:8:61:11 | argv | argv | +| ifs.c:69:9:69:10 | c8 | ifs.c:68:8:68:11 | argv | ifs.c:69:9:69:10 | c8 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:68:8:68:11 | argv | argv | +| ifs.c:75:9:75:10 | i1 | ifs.c:74:8:74:11 | argv | ifs.c:75:9:75:10 | i1 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:74:8:74:11 | argv | argv | +| ifs.c:81:9:81:10 | i2 | ifs.c:80:8:80:11 | argv | ifs.c:81:9:81:10 | i2 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:80:8:80:11 | argv | argv | +| ifs.c:87:9:87:10 | i3 | ifs.c:86:8:86:11 | argv | ifs.c:87:9:87:10 | i3 | The value of this argument may come from $@ and is being used as a 
formatting argument to printf(format) | ifs.c:86:8:86:11 | argv | argv | +| ifs.c:93:9:93:10 | i4 | ifs.c:92:8:92:11 | argv | ifs.c:93:9:93:10 | i4 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:92:8:92:11 | argv | argv | +| ifs.c:99:9:99:10 | i5 | ifs.c:98:8:98:11 | argv | ifs.c:99:9:99:10 | i5 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:98:8:98:11 | argv | argv | +| ifs.c:106:9:106:10 | i6 | ifs.c:105:8:105:11 | argv | ifs.c:106:9:106:10 | i6 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:105:8:105:11 | argv | argv | +| ifs.c:112:9:112:10 | i7 | ifs.c:111:8:111:11 | argv | ifs.c:112:9:112:10 | i7 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:111:8:111:11 | argv | argv | +| ifs.c:118:9:118:10 | i8 | ifs.c:117:8:117:11 | argv | ifs.c:118:9:118:10 | i8 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:117:8:117:11 | argv | argv | +| ifs.c:124:9:124:10 | i9 | ifs.c:123:8:123:11 | argv | ifs.c:124:9:124:10 | i9 | The value of this argument may come from $@ and is being used as a formatting argument to printf(format) | ifs.c:123:8:123:11 | argv | argv | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected index f77bf73d52d..3847e91bbc8 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected @@ -1,9 +1,151 @@ -| test.cpp:42:31:42:36 | call to malloc | This allocation size is derived from $@ and 
might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:43:31:43:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:43:38:43:63 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:45:31:45:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:48:25:48:30 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:49:17:49:30 | new[] | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:52:21:52:27 | call to realloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:52:35:52:60 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | -| test.cpp:127:17:127:22 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:123:25:123:30 | call to getenv | user input (getenv) | +edges +| test.cpp:39:21:39:24 | argv | test.cpp:42:38:42:44 | (size_t)... | +| test.cpp:39:21:39:24 | argv | test.cpp:42:38:42:44 | (size_t)... | +| test.cpp:39:21:39:24 | argv | test.cpp:42:38:42:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:42:38:42:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:42:38:42:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:42:38:42:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:44 | (unsigned long)... | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:44 | (unsigned long)... 
| +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:44 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:63 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:63 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:63 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:63 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:45:38:45:63 | ... + ... | +| test.cpp:39:21:39:24 | argv | test.cpp:45:38:45:63 | ... + ... | +| test.cpp:39:21:39:24 | argv | test.cpp:45:38:45:63 | ... + ... | +| test.cpp:39:21:39:24 | argv | test.cpp:45:38:45:63 | ... + ... | +| test.cpp:39:21:39:24 | argv | test.cpp:48:32:48:35 | (size_t)... | +| test.cpp:39:21:39:24 | argv | test.cpp:48:32:48:35 | (size_t)... | +| test.cpp:39:21:39:24 | argv | test.cpp:48:32:48:35 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:48:32:48:35 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:48:32:48:35 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:48:32:48:35 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:49:26:49:29 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:49:26:49:29 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:49:26:49:29 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:49:26:49:29 | size | +| test.cpp:39:21:39:24 | argv | test.cpp:52:35:52:60 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:52:35:52:60 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:52:35:52:60 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:52:35:52:60 | ... * ... | +| test.cpp:39:21:39:24 | argv | test.cpp:52:54:52:60 | (unsigned long)... | +| test.cpp:39:21:39:24 | argv | test.cpp:52:54:52:60 | (unsigned long)... 
| +| test.cpp:39:21:39:24 | argv | test.cpp:52:54:52:60 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:52:54:52:60 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:52:54:52:60 | tainted | +| test.cpp:39:21:39:24 | argv | test.cpp:52:54:52:60 | tainted | +| test.cpp:123:18:123:23 | call to getenv | test.cpp:127:24:127:27 | (unsigned long)... | +| test.cpp:123:18:123:23 | call to getenv | test.cpp:127:24:127:27 | size | +| test.cpp:123:18:123:23 | call to getenv | test.cpp:127:24:127:27 | size | +| test.cpp:123:18:123:23 | call to getenv | test.cpp:127:24:127:41 | ... * ... | +| test.cpp:123:18:123:23 | call to getenv | test.cpp:127:24:127:41 | ... * ... | +| test.cpp:123:18:123:31 | (const char *)... | test.cpp:127:24:127:27 | (unsigned long)... | +| test.cpp:123:18:123:31 | (const char *)... | test.cpp:127:24:127:27 | size | +| test.cpp:123:18:123:31 | (const char *)... | test.cpp:127:24:127:27 | size | +| test.cpp:123:18:123:31 | (const char *)... | test.cpp:127:24:127:41 | ... * ... | +| test.cpp:123:18:123:31 | (const char *)... | test.cpp:127:24:127:41 | ... * ... | +| test.cpp:132:19:132:24 | call to getenv | test.cpp:134:10:134:13 | (unsigned long)... | +| test.cpp:132:19:132:24 | call to getenv | test.cpp:134:10:134:13 | size | +| test.cpp:132:19:132:24 | call to getenv | test.cpp:134:10:134:13 | size | +| test.cpp:132:19:132:24 | call to getenv | test.cpp:134:10:134:27 | ... * ... | +| test.cpp:132:19:132:24 | call to getenv | test.cpp:134:10:134:27 | ... * ... | +| test.cpp:132:19:132:32 | (const char *)... | test.cpp:134:10:134:13 | (unsigned long)... | +| test.cpp:132:19:132:32 | (const char *)... | test.cpp:134:10:134:13 | size | +| test.cpp:132:19:132:32 | (const char *)... | test.cpp:134:10:134:13 | size | +| test.cpp:132:19:132:32 | (const char *)... | test.cpp:134:10:134:27 | ... * ... | +| test.cpp:132:19:132:32 | (const char *)... | test.cpp:134:10:134:27 | ... * ... 
| +| test.cpp:138:19:138:24 | call to getenv | test.cpp:142:11:142:14 | (unsigned long)... | +| test.cpp:138:19:138:24 | call to getenv | test.cpp:142:11:142:14 | size | +| test.cpp:138:19:138:24 | call to getenv | test.cpp:142:11:142:14 | size | +| test.cpp:138:19:138:24 | call to getenv | test.cpp:142:11:142:28 | ... * ... | +| test.cpp:138:19:138:24 | call to getenv | test.cpp:142:11:142:28 | ... * ... | +| test.cpp:138:19:138:32 | (const char *)... | test.cpp:142:11:142:14 | (unsigned long)... | +| test.cpp:138:19:138:32 | (const char *)... | test.cpp:142:11:142:14 | size | +| test.cpp:138:19:138:32 | (const char *)... | test.cpp:142:11:142:14 | size | +| test.cpp:138:19:138:32 | (const char *)... | test.cpp:142:11:142:28 | ... * ... | +| test.cpp:138:19:138:32 | (const char *)... | test.cpp:142:11:142:28 | ... * ... | +nodes +| test.cpp:39:21:39:24 | argv | semmle.label | argv | +| test.cpp:39:21:39:24 | argv | semmle.label | argv | +| test.cpp:42:38:42:44 | (size_t)... | semmle.label | (size_t)... | +| test.cpp:42:38:42:44 | (size_t)... | semmle.label | (size_t)... | +| test.cpp:42:38:42:44 | tainted | semmle.label | tainted | +| test.cpp:42:38:42:44 | tainted | semmle.label | tainted | +| test.cpp:42:38:42:44 | tainted | semmle.label | tainted | +| test.cpp:43:38:43:44 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:43:38:43:44 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:43:38:43:44 | tainted | semmle.label | tainted | +| test.cpp:43:38:43:44 | tainted | semmle.label | tainted | +| test.cpp:43:38:43:44 | tainted | semmle.label | tainted | +| test.cpp:43:38:43:63 | ... * ... | semmle.label | ... * ... | +| test.cpp:43:38:43:63 | ... * ... | semmle.label | ... * ... | +| test.cpp:43:38:43:63 | ... * ... | semmle.label | ... * ... | +| test.cpp:45:38:45:63 | ... + ... | semmle.label | ... + ... | +| test.cpp:45:38:45:63 | ... + ... | semmle.label | ... + ... | +| test.cpp:45:38:45:63 | ... + ... 
| semmle.label | ... + ... | +| test.cpp:48:32:48:35 | (size_t)... | semmle.label | (size_t)... | +| test.cpp:48:32:48:35 | (size_t)... | semmle.label | (size_t)... | +| test.cpp:48:32:48:35 | size | semmle.label | size | +| test.cpp:48:32:48:35 | size | semmle.label | size | +| test.cpp:48:32:48:35 | size | semmle.label | size | +| test.cpp:49:26:49:29 | size | semmle.label | size | +| test.cpp:49:26:49:29 | size | semmle.label | size | +| test.cpp:49:26:49:29 | size | semmle.label | size | +| test.cpp:52:35:52:60 | ... * ... | semmle.label | ... * ... | +| test.cpp:52:35:52:60 | ... * ... | semmle.label | ... * ... | +| test.cpp:52:35:52:60 | ... * ... | semmle.label | ... * ... | +| test.cpp:52:54:52:60 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:52:54:52:60 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:52:54:52:60 | tainted | semmle.label | tainted | +| test.cpp:52:54:52:60 | tainted | semmle.label | tainted | +| test.cpp:52:54:52:60 | tainted | semmle.label | tainted | +| test.cpp:123:18:123:23 | call to getenv | semmle.label | call to getenv | +| test.cpp:123:18:123:31 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:127:24:127:27 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:127:24:127:27 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:127:24:127:27 | size | semmle.label | size | +| test.cpp:127:24:127:27 | size | semmle.label | size | +| test.cpp:127:24:127:27 | size | semmle.label | size | +| test.cpp:127:24:127:41 | ... * ... | semmle.label | ... * ... | +| test.cpp:127:24:127:41 | ... * ... | semmle.label | ... * ... | +| test.cpp:127:24:127:41 | ... * ... | semmle.label | ... * ... | +| test.cpp:132:19:132:24 | call to getenv | semmle.label | call to getenv | +| test.cpp:132:19:132:32 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:134:10:134:13 | (unsigned long)... | semmle.label | (unsigned long)... 
| +| test.cpp:134:10:134:13 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:134:10:134:13 | size | semmle.label | size | +| test.cpp:134:10:134:13 | size | semmle.label | size | +| test.cpp:134:10:134:13 | size | semmle.label | size | +| test.cpp:134:10:134:27 | ... * ... | semmle.label | ... * ... | +| test.cpp:134:10:134:27 | ... * ... | semmle.label | ... * ... | +| test.cpp:134:10:134:27 | ... * ... | semmle.label | ... * ... | +| test.cpp:138:19:138:24 | call to getenv | semmle.label | call to getenv | +| test.cpp:138:19:138:32 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:142:11:142:14 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:142:11:142:14 | (unsigned long)... | semmle.label | (unsigned long)... | +| test.cpp:142:11:142:14 | size | semmle.label | size | +| test.cpp:142:11:142:14 | size | semmle.label | size | +| test.cpp:142:11:142:14 | size | semmle.label | size | +| test.cpp:142:11:142:28 | ... * ... | semmle.label | ... * ... | +| test.cpp:142:11:142:28 | ... * ... | semmle.label | ... * ... | +| test.cpp:142:11:142:28 | ... * ... | semmle.label | ... * ... | +#select +| test.cpp:42:31:42:36 | call to malloc | test.cpp:39:21:39:24 | argv | test.cpp:42:38:42:44 | tainted | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:43:31:43:36 | call to malloc | test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:63 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:43:38:43:63 | ... * ... | test.cpp:39:21:39:24 | argv | test.cpp:43:38:43:44 | tainted | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:45:31:45:36 | call to malloc | test.cpp:39:21:39:24 | argv | test.cpp:45:38:45:63 | ... + ... 
| This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:48:25:48:30 | call to malloc | test.cpp:39:21:39:24 | argv | test.cpp:48:32:48:35 | size | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:49:17:49:30 | new[] | test.cpp:39:21:39:24 | argv | test.cpp:49:26:49:29 | size | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:52:21:52:27 | call to realloc | test.cpp:39:21:39:24 | argv | test.cpp:52:35:52:60 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:52:35:52:60 | ... * ... | test.cpp:39:21:39:24 | argv | test.cpp:52:54:52:60 | tainted | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) | +| test.cpp:127:17:127:22 | call to malloc | test.cpp:123:18:123:23 | call to getenv | test.cpp:127:24:127:41 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:123:18:123:23 | call to getenv | user input (getenv) | +| test.cpp:127:24:127:41 | ... * ... | test.cpp:123:18:123:23 | call to getenv | test.cpp:127:24:127:27 | size | This allocation size is derived from $@ and might overflow | test.cpp:123:18:123:23 | call to getenv | user input (getenv) | +| test.cpp:134:3:134:8 | call to malloc | test.cpp:132:19:132:24 | call to getenv | test.cpp:134:10:134:27 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:132:19:132:24 | call to getenv | user input (getenv) | +| test.cpp:134:10:134:27 | ... * ... 
| test.cpp:132:19:132:24 | call to getenv | test.cpp:134:10:134:13 | size | This allocation size is derived from $@ and might overflow | test.cpp:132:19:132:24 | call to getenv | user input (getenv) |
+| test.cpp:142:4:142:9 | call to malloc | test.cpp:138:19:138:24 | call to getenv | test.cpp:142:11:142:28 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:138:19:138:24 | call to getenv | user input (getenv) |
+| test.cpp:142:11:142:28 | ... * ... | test.cpp:138:19:138:24 | call to getenv | test.cpp:142:11:142:14 | size | This allocation size is derived from $@ and might overflow | test.cpp:138:19:138:24 | call to getenv | user input (getenv) |
diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/test.cpp b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/test.cpp
index 5cd5f0c0246..4a1bbd8a9dc 100644
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/test.cpp
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/test.cpp
@@ -39,10 +39,10 @@ int main(int argc, char **argv) {
     int tainted = atoi(argv[1]);
 
     MyStruct *arr1 = (MyStruct *)malloc(sizeof(MyStruct)); // GOOD
-    MyStruct *arr2 = (MyStruct *)malloc(tainted); // BAD
+    MyStruct *arr2 = (MyStruct *)malloc(tainted); // DUBIOUS (not multiplied by anything)
     MyStruct *arr3 = (MyStruct *)malloc(tainted * sizeof(MyStruct)); // BAD
     MyStruct *arr4 = (MyStruct *)malloc(getTainted() * sizeof(MyStruct)); // BAD [NOT DETECTED]
-    MyStruct *arr5 = (MyStruct *)malloc(sizeof(MyStruct) + tainted); // BAD [NOT DETECTED]
+    MyStruct *arr5 = (MyStruct *)malloc(sizeof(MyStruct) + tainted); // DUBIOUS (not multiplied by anything)
 
     int size = tainted * 8;
     char *chars1 = (char *)malloc(size); // BAD
@@ -52,7 +52,7 @@ int main(int argc, char **argv) {
     arr1 = (MyStruct *)realloc(arr1, sizeof(MyStruct) * tainted); // BAD
 
     size = 8;
-    chars3 = new char[size]; // GOOD [FALSE POSITIVE]
+    chars3 = new char[size]; // GOOD
 
     return 0;
 }
@@ -120,9 +120,73 @@ int bounded(int x, int limit) {
 }
 
 void open_file_bounded () {
-    int size = size = atoi(getenv("USER"));
+    int size = atoi(getenv("USER"));
     int bounded_size = bounded(size, MAX_SIZE);
 
-    int* a = (int*)malloc(bounded_size); // GOOD
-    int* b = (int*)malloc(size); // BAD
-}
\ No newline at end of file
+    int* a = (int*)malloc(bounded_size * sizeof(int)); // GOOD
+    int* b = (int*)malloc(size * sizeof(int)); // BAD
+}
+
+void more_bounded_tests() {
+    {
+        int size = atoi(getenv("USER"));
+
+        malloc(size * sizeof(int)); // BAD
+    }
+
+    {
+        int size = atoi(getenv("USER"));
+
+        if (size > 0)
+        {
+            malloc(size * sizeof(int)); // BAD
+        }
+    }
+
+    {
+        int size = atoi(getenv("USER"));
+
+        if (size < 100)
+        {
+            malloc(size * sizeof(int)); // BAD [NOT DETECTED]
+        }
+    }
+
+    {
+        int size = atoi(getenv("USER"));
+
+        if ((size > 0) && (size < 100))
+        {
+            malloc(size * sizeof(int)); // GOOD
+        }
+    }
+
+    {
+        int size = atoi(getenv("USER"));
+
+        if ((100 > size) && (0 < size))
+        {
+            malloc(size * sizeof(int)); // GOOD
+        }
+    }
+
+    {
+        int size = atoi(getenv("USER"));
+
+        malloc(size * sizeof(int)); // BAD [NOT DETECTED]
+
+        if ((size > 0) && (size < 100))
+        {
+            // ...
+        }
+    }
+
+    {
+        int size = atoi(getenv("USER"));
+
+        if (size > 100)
+        {
+            malloc(size * sizeof(int)); // BAD [NOT DETECTED]
+        }
+    }
+}
diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/uncontrolled/ArithmeticUncontrolled.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/uncontrolled/ArithmeticUncontrolled.expected
index ab635869f20..f68464b6b38 100644
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/uncontrolled/ArithmeticUncontrolled.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/uncontrolled/ArithmeticUncontrolled.expected
@@ -1,11 +1,138 @@
-| test.c:21:17:21:17 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow.
| test.c:18:13:18:16 | call to rand | Uncontrolled value | -| test.c:35:5:35:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:34:13:34:18 | call to rand | Uncontrolled value | -| test.c:40:5:40:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:39:13:39:21 | ... % ... | Uncontrolled value | -| test.c:45:5:45:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:44:13:44:16 | call to rand | Uncontrolled value | -| test.c:56:5:56:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:54:13:54:16 | call to rand | Uncontrolled value | -| test.c:67:5:67:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:66:13:66:16 | call to rand | Uncontrolled value | -| test.c:77:9:77:9 | r | $@ flows to here and is used in arithmetic, potentially causing an underflow. | test.c:75:13:75:19 | ... ^ ... | Uncontrolled value | -| test.c:100:5:100:5 | r | $@ flows to here and is used in arithmetic, potentially causing an underflow. | test.c:99:14:99:19 | call to rand | Uncontrolled value | -| test.cpp:25:7:25:7 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.cpp:8:9:8:12 | call to rand | Uncontrolled value | -| test.cpp:31:7:31:7 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.cpp:13:10:13:13 | call to rand | Uncontrolled value | -| test.cpp:37:7:37:7 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. 
| test.cpp:18:9:18:12 | call to rand | Uncontrolled value | +edges +| test.c:18:13:18:16 | call to rand | test.c:21:17:21:17 | r | +| test.c:18:13:18:16 | call to rand | test.c:21:17:21:17 | r | +| test.c:18:13:18:16 | call to rand | test.c:21:17:21:17 | r | +| test.c:18:13:18:16 | call to rand | test.c:21:17:21:17 | r | +| test.c:34:13:34:18 | call to rand | test.c:35:5:35:5 | r | +| test.c:34:13:34:18 | call to rand | test.c:35:5:35:5 | r | +| test.c:34:13:34:18 | call to rand | test.c:35:5:35:5 | r | +| test.c:34:13:34:18 | call to rand | test.c:35:5:35:5 | r | +| test.c:39:13:39:21 | ... % ... | test.c:40:5:40:5 | r | +| test.c:39:13:39:21 | ... % ... | test.c:40:5:40:5 | r | +| test.c:39:13:39:21 | ... % ... | test.c:40:5:40:5 | r | +| test.c:39:13:39:21 | ... % ... | test.c:40:5:40:5 | r | +| test.c:44:13:44:16 | call to rand | test.c:45:5:45:5 | r | +| test.c:44:13:44:16 | call to rand | test.c:45:5:45:5 | r | +| test.c:44:13:44:16 | call to rand | test.c:45:5:45:5 | r | +| test.c:44:13:44:16 | call to rand | test.c:45:5:45:5 | r | +| test.c:54:13:54:16 | call to rand | test.c:56:5:56:5 | r | +| test.c:54:13:54:16 | call to rand | test.c:56:5:56:5 | r | +| test.c:54:13:54:16 | call to rand | test.c:56:5:56:5 | r | +| test.c:54:13:54:16 | call to rand | test.c:56:5:56:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:61:5:61:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:61:5:61:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:61:5:61:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:61:5:61:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:62:5:62:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:62:5:62:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:62:5:62:5 | r | +| test.c:60:13:60:16 | call to rand | test.c:62:5:62:5 | r | +| test.c:66:13:66:16 | call to rand | test.c:67:5:67:5 | r | +| test.c:66:13:66:16 | call to rand | test.c:67:5:67:5 | r | +| test.c:66:13:66:16 | call to rand | test.c:67:5:67:5 | r | +| 
test.c:66:13:66:16 | call to rand | test.c:67:5:67:5 | r | +| test.c:75:13:75:19 | ... ^ ... | test.c:77:9:77:9 | r | +| test.c:75:13:75:19 | ... ^ ... | test.c:77:9:77:9 | r | +| test.c:75:13:75:19 | ... ^ ... | test.c:77:9:77:9 | r | +| test.c:75:13:75:19 | ... ^ ... | test.c:77:9:77:9 | r | +| test.c:99:14:99:19 | call to rand | test.c:100:5:100:5 | r | +| test.c:99:14:99:19 | call to rand | test.c:100:5:100:5 | r | +| test.c:99:14:99:19 | call to rand | test.c:100:5:100:5 | r | +| test.c:99:14:99:19 | call to rand | test.c:100:5:100:5 | r | +| test.cpp:8:9:8:12 | Store | test.cpp:24:11:24:18 | call to get_rand | +| test.cpp:8:9:8:12 | call to rand | test.cpp:8:9:8:12 | Store | +| test.cpp:8:9:8:12 | call to rand | test.cpp:8:9:8:12 | Store | +| test.cpp:13:2:13:15 | Chi | test.cpp:30:13:30:14 | get_rand2 output argument | +| test.cpp:13:10:13:13 | call to rand | test.cpp:13:2:13:15 | Chi | +| test.cpp:13:10:13:13 | call to rand | test.cpp:13:2:13:15 | Chi | +| test.cpp:18:2:18:14 | Chi | test.cpp:36:13:36:13 | get_rand3 output argument | +| test.cpp:18:9:18:12 | call to rand | test.cpp:18:2:18:14 | Chi | +| test.cpp:18:9:18:12 | call to rand | test.cpp:18:2:18:14 | Chi | +| test.cpp:24:11:24:18 | call to get_rand | test.cpp:25:7:25:7 | r | +| test.cpp:24:11:24:18 | call to get_rand | test.cpp:25:7:25:7 | r | +| test.cpp:30:13:30:14 | get_rand2 output argument | test.cpp:31:7:31:7 | r | +| test.cpp:30:13:30:14 | get_rand2 output argument | test.cpp:31:7:31:7 | r | +| test.cpp:36:13:36:13 | get_rand3 output argument | test.cpp:37:7:37:7 | r | +| test.cpp:36:13:36:13 | get_rand3 output argument | test.cpp:37:7:37:7 | r | +nodes +| test.c:18:13:18:16 | call to rand | semmle.label | call to rand | +| test.c:18:13:18:16 | call to rand | semmle.label | call to rand | +| test.c:21:17:21:17 | r | semmle.label | r | +| test.c:21:17:21:17 | r | semmle.label | r | +| test.c:21:17:21:17 | r | semmle.label | r | +| test.c:34:13:34:18 | call to rand | semmle.label | call to 
rand | +| test.c:34:13:34:18 | call to rand | semmle.label | call to rand | +| test.c:35:5:35:5 | r | semmle.label | r | +| test.c:35:5:35:5 | r | semmle.label | r | +| test.c:35:5:35:5 | r | semmle.label | r | +| test.c:39:13:39:21 | ... % ... | semmle.label | ... % ... | +| test.c:39:13:39:21 | ... % ... | semmle.label | ... % ... | +| test.c:40:5:40:5 | r | semmle.label | r | +| test.c:40:5:40:5 | r | semmle.label | r | +| test.c:40:5:40:5 | r | semmle.label | r | +| test.c:44:13:44:16 | call to rand | semmle.label | call to rand | +| test.c:44:13:44:16 | call to rand | semmle.label | call to rand | +| test.c:45:5:45:5 | r | semmle.label | r | +| test.c:45:5:45:5 | r | semmle.label | r | +| test.c:45:5:45:5 | r | semmle.label | r | +| test.c:54:13:54:16 | call to rand | semmle.label | call to rand | +| test.c:54:13:54:16 | call to rand | semmle.label | call to rand | +| test.c:56:5:56:5 | r | semmle.label | r | +| test.c:56:5:56:5 | r | semmle.label | r | +| test.c:56:5:56:5 | r | semmle.label | r | +| test.c:60:13:60:16 | call to rand | semmle.label | call to rand | +| test.c:60:13:60:16 | call to rand | semmle.label | call to rand | +| test.c:61:5:61:5 | r | semmle.label | r | +| test.c:61:5:61:5 | r | semmle.label | r | +| test.c:61:5:61:5 | r | semmle.label | r | +| test.c:62:5:62:5 | r | semmle.label | r | +| test.c:62:5:62:5 | r | semmle.label | r | +| test.c:62:5:62:5 | r | semmle.label | r | +| test.c:66:13:66:16 | call to rand | semmle.label | call to rand | +| test.c:66:13:66:16 | call to rand | semmle.label | call to rand | +| test.c:67:5:67:5 | r | semmle.label | r | +| test.c:67:5:67:5 | r | semmle.label | r | +| test.c:67:5:67:5 | r | semmle.label | r | +| test.c:75:13:75:19 | ... ^ ... | semmle.label | ... ^ ... | +| test.c:75:13:75:19 | ... ^ ... | semmle.label | ... ^ ... 
| +| test.c:77:9:77:9 | r | semmle.label | r | +| test.c:77:9:77:9 | r | semmle.label | r | +| test.c:77:9:77:9 | r | semmle.label | r | +| test.c:99:14:99:19 | call to rand | semmle.label | call to rand | +| test.c:99:14:99:19 | call to rand | semmle.label | call to rand | +| test.c:100:5:100:5 | r | semmle.label | r | +| test.c:100:5:100:5 | r | semmle.label | r | +| test.c:100:5:100:5 | r | semmle.label | r | +| test.cpp:8:9:8:12 | Store | semmle.label | Store | +| test.cpp:8:9:8:12 | call to rand | semmle.label | call to rand | +| test.cpp:8:9:8:12 | call to rand | semmle.label | call to rand | +| test.cpp:13:2:13:15 | Chi | semmle.label | Chi | +| test.cpp:13:10:13:13 | call to rand | semmle.label | call to rand | +| test.cpp:13:10:13:13 | call to rand | semmle.label | call to rand | +| test.cpp:18:2:18:14 | Chi | semmle.label | Chi | +| test.cpp:18:9:18:12 | call to rand | semmle.label | call to rand | +| test.cpp:18:9:18:12 | call to rand | semmle.label | call to rand | +| test.cpp:24:11:24:18 | call to get_rand | semmle.label | call to get_rand | +| test.cpp:25:7:25:7 | r | semmle.label | r | +| test.cpp:25:7:25:7 | r | semmle.label | r | +| test.cpp:25:7:25:7 | r | semmle.label | r | +| test.cpp:30:13:30:14 | get_rand2 output argument | semmle.label | get_rand2 output argument | +| test.cpp:31:7:31:7 | r | semmle.label | r | +| test.cpp:31:7:31:7 | r | semmle.label | r | +| test.cpp:31:7:31:7 | r | semmle.label | r | +| test.cpp:36:13:36:13 | get_rand3 output argument | semmle.label | get_rand3 output argument | +| test.cpp:37:7:37:7 | r | semmle.label | r | +| test.cpp:37:7:37:7 | r | semmle.label | r | +| test.cpp:37:7:37:7 | r | semmle.label | r | +#select +| test.c:21:17:21:17 | r | test.c:18:13:18:16 | call to rand | test.c:21:17:21:17 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. 
| test.c:18:13:18:16 | call to rand | Uncontrolled value | +| test.c:35:5:35:5 | r | test.c:34:13:34:18 | call to rand | test.c:35:5:35:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:34:13:34:18 | call to rand | Uncontrolled value | +| test.c:40:5:40:5 | r | test.c:39:13:39:21 | ... % ... | test.c:40:5:40:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:39:13:39:21 | ... % ... | Uncontrolled value | +| test.c:45:5:45:5 | r | test.c:44:13:44:16 | call to rand | test.c:45:5:45:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:44:13:44:16 | call to rand | Uncontrolled value | +| test.c:56:5:56:5 | r | test.c:54:13:54:16 | call to rand | test.c:56:5:56:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:54:13:54:16 | call to rand | Uncontrolled value | +| test.c:67:5:67:5 | r | test.c:66:13:66:16 | call to rand | test.c:67:5:67:5 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.c:66:13:66:16 | call to rand | Uncontrolled value | +| test.c:77:9:77:9 | r | test.c:75:13:75:19 | ... ^ ... | test.c:77:9:77:9 | r | $@ flows to here and is used in arithmetic, potentially causing an underflow. | test.c:75:13:75:19 | ... ^ ... | Uncontrolled value | +| test.c:100:5:100:5 | r | test.c:99:14:99:19 | call to rand | test.c:100:5:100:5 | r | $@ flows to here and is used in arithmetic, potentially causing an underflow. | test.c:99:14:99:19 | call to rand | Uncontrolled value | +| test.cpp:25:7:25:7 | r | test.cpp:8:9:8:12 | call to rand | test.cpp:25:7:25:7 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.cpp:8:9:8:12 | call to rand | Uncontrolled value | +| test.cpp:31:7:31:7 | r | test.cpp:13:10:13:13 | call to rand | test.cpp:31:7:31:7 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. 
| test.cpp:13:10:13:13 | call to rand | Uncontrolled value | +| test.cpp:37:7:37:7 | r | test.cpp:18:9:18:12 | call to rand | test.cpp:37:7:37:7 | r | $@ flows to here and is used in arithmetic, potentially causing an overflow. | test.cpp:18:9:18:12 | call to rand | Uncontrolled value | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-290/semmle/AuthenticationBypass/AuthenticationBypass.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-290/semmle/AuthenticationBypass/AuthenticationBypass.expected index ded1524cfca..e2671c07952 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-290/semmle/AuthenticationBypass/AuthenticationBypass.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-290/semmle/AuthenticationBypass/AuthenticationBypass.expected @@ -1,3 +1,33 @@ -| test.cpp:20:7:20:12 | call to strcmp | Untrusted input $@ might be vulnerable to a spoofing attack. | test.cpp:16:25:16:30 | call to getenv | call to getenv | -| test.cpp:31:7:31:12 | call to strcmp | Untrusted input $@ might be vulnerable to a spoofing attack. | test.cpp:27:25:27:30 | call to getenv | call to getenv | -| test.cpp:42:7:42:12 | call to strcmp | Untrusted input $@ might be vulnerable to a spoofing attack. | test.cpp:38:25:38:30 | call to getenv | call to getenv | +edges +| test.cpp:16:25:16:30 | call to getenv | test.cpp:20:14:20:20 | address | +| test.cpp:16:25:16:30 | call to getenv | test.cpp:20:14:20:20 | address | +| test.cpp:16:25:16:42 | (const char *)... | test.cpp:20:14:20:20 | address | +| test.cpp:16:25:16:42 | (const char *)... | test.cpp:20:14:20:20 | address | +| test.cpp:27:25:27:30 | call to getenv | test.cpp:31:14:31:20 | address | +| test.cpp:27:25:27:30 | call to getenv | test.cpp:31:14:31:20 | address | +| test.cpp:27:25:27:42 | (const char *)... | test.cpp:31:14:31:20 | address | +| test.cpp:27:25:27:42 | (const char *)... 
| test.cpp:31:14:31:20 | address | +| test.cpp:38:25:38:30 | call to getenv | test.cpp:42:14:42:20 | address | +| test.cpp:38:25:38:30 | call to getenv | test.cpp:42:14:42:20 | address | +| test.cpp:38:25:38:42 | (const char *)... | test.cpp:42:14:42:20 | address | +| test.cpp:38:25:38:42 | (const char *)... | test.cpp:42:14:42:20 | address | +nodes +| test.cpp:16:25:16:30 | call to getenv | semmle.label | call to getenv | +| test.cpp:16:25:16:42 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:20:14:20:20 | address | semmle.label | address | +| test.cpp:20:14:20:20 | address | semmle.label | address | +| test.cpp:20:14:20:20 | address | semmle.label | address | +| test.cpp:27:25:27:30 | call to getenv | semmle.label | call to getenv | +| test.cpp:27:25:27:42 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:31:14:31:20 | address | semmle.label | address | +| test.cpp:31:14:31:20 | address | semmle.label | address | +| test.cpp:31:14:31:20 | address | semmle.label | address | +| test.cpp:38:25:38:30 | call to getenv | semmle.label | call to getenv | +| test.cpp:38:25:38:42 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:42:14:42:20 | address | semmle.label | address | +| test.cpp:42:14:42:20 | address | semmle.label | address | +| test.cpp:42:14:42:20 | address | semmle.label | address | +#select +| test.cpp:20:7:20:12 | call to strcmp | test.cpp:16:25:16:30 | call to getenv | test.cpp:20:14:20:20 | address | Untrusted input $@ might be vulnerable to a spoofing attack. | test.cpp:16:25:16:30 | call to getenv | call to getenv | +| test.cpp:31:7:31:12 | call to strcmp | test.cpp:27:25:27:30 | call to getenv | test.cpp:31:14:31:20 | address | Untrusted input $@ might be vulnerable to a spoofing attack. 
| test.cpp:27:25:27:30 | call to getenv | call to getenv | +| test.cpp:42:7:42:12 | call to strcmp | test.cpp:38:25:38:30 | call to getenv | test.cpp:42:14:42:20 | address | Untrusted input $@ might be vulnerable to a spoofing attack. | test.cpp:38:25:38:30 | call to getenv | call to getenv | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-311/semmle/tests/CleartextBufferWrite.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-311/semmle/tests/CleartextBufferWrite.expected index 30b971f13e7..f78d75eeed1 100644 --- a/cpp/ql/test/query-tests/Security/CWE/CWE-311/semmle/tests/CleartextBufferWrite.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-311/semmle/tests/CleartextBufferWrite.expected @@ -1 +1,13 @@ -| test.cpp:58:3:58:9 | call to sprintf | This write into buffer 'passwd' may contain unencrypted data from $@ | test.cpp:54:17:54:20 | argv | user input (argv) | +edges +| test.cpp:54:17:54:20 | argv | test.cpp:58:25:58:29 | input | +| test.cpp:54:17:54:20 | argv | test.cpp:58:25:58:29 | input | +| test.cpp:54:17:54:20 | argv | test.cpp:58:25:58:29 | input | +| test.cpp:54:17:54:20 | argv | test.cpp:58:25:58:29 | input | +nodes +| test.cpp:54:17:54:20 | argv | semmle.label | argv | +| test.cpp:54:17:54:20 | argv | semmle.label | argv | +| test.cpp:58:25:58:29 | input | semmle.label | input | +| test.cpp:58:25:58:29 | input | semmle.label | input | +| test.cpp:58:25:58:29 | input | semmle.label | input | +#select +| test.cpp:58:3:58:9 | call to sprintf | test.cpp:54:17:54:20 | argv | test.cpp:58:25:58:29 | input | This write into buffer 'passwd' may contain unencrypted data from $@ | test.cpp:54:17:54:20 | argv | user input (argv) | diff --git a/cpp/ql/test/query-tests/Security/CWE/CWE-807/semmle/TaintedCondition/TaintedCondition.expected b/cpp/ql/test/query-tests/Security/CWE/CWE-807/semmle/TaintedCondition/TaintedCondition.expected index 0ae48831c80..cc5004047cf 100644 --- 
a/cpp/ql/test/query-tests/Security/CWE/CWE-807/semmle/TaintedCondition/TaintedCondition.expected +++ b/cpp/ql/test/query-tests/Security/CWE/CWE-807/semmle/TaintedCondition/TaintedCondition.expected @@ -1,2 +1,37 @@ -| test.cpp:24:10:24:35 | ! ... | Reliance on untrusted input $@ to raise privilege at $@ | test.cpp:20:29:20:34 | call to getenv | call to getenv | test.cpp:25:9:25:27 | ... = ... | ... = ... | -| test.cpp:41:10:41:38 | ! ... | Reliance on untrusted input $@ to raise privilege at $@ | test.cpp:20:29:20:34 | call to getenv | call to getenv | test.cpp:42:8:42:26 | ... = ... | ... = ... | +edges +| test.cpp:20:29:20:34 | call to getenv | test.cpp:24:10:24:35 | ! ... | +| test.cpp:20:29:20:34 | call to getenv | test.cpp:24:11:24:16 | call to strcmp | +| test.cpp:20:29:20:34 | call to getenv | test.cpp:24:11:24:16 | call to strcmp | +| test.cpp:20:29:20:34 | call to getenv | test.cpp:24:11:24:35 | (bool)... | +| test.cpp:20:29:20:34 | call to getenv | test.cpp:41:10:41:38 | ! ... | +| test.cpp:20:29:20:34 | call to getenv | test.cpp:41:11:41:16 | call to strcmp | +| test.cpp:20:29:20:34 | call to getenv | test.cpp:41:11:41:16 | call to strcmp | +| test.cpp:20:29:20:34 | call to getenv | test.cpp:41:11:41:38 | (bool)... | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:24:10:24:35 | ! ... | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:24:11:24:16 | call to strcmp | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:24:11:24:16 | call to strcmp | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:24:11:24:35 | (bool)... | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:41:10:41:38 | ! ... | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:41:11:41:16 | call to strcmp | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:41:11:41:16 | call to strcmp | +| test.cpp:20:29:20:47 | (const char *)... | test.cpp:41:11:41:38 | (bool)... | +| test.cpp:24:11:24:16 | call to strcmp | test.cpp:24:10:24:35 | ! ... 
| +| test.cpp:24:11:24:16 | call to strcmp | test.cpp:24:11:24:35 | (bool)... | +| test.cpp:41:11:41:16 | call to strcmp | test.cpp:41:10:41:38 | ! ... | +| test.cpp:41:11:41:16 | call to strcmp | test.cpp:41:11:41:38 | (bool)... | +nodes +| test.cpp:20:29:20:34 | call to getenv | semmle.label | call to getenv | +| test.cpp:20:29:20:47 | (const char *)... | semmle.label | (const char *)... | +| test.cpp:24:10:24:35 | ! ... | semmle.label | ! ... | +| test.cpp:24:11:24:16 | call to strcmp | semmle.label | call to strcmp | +| test.cpp:24:11:24:16 | call to strcmp | semmle.label | call to strcmp | +| test.cpp:24:11:24:35 | (bool)... | semmle.label | (bool)... | +| test.cpp:24:11:24:35 | (bool)... | semmle.label | (bool)... | +| test.cpp:41:10:41:38 | ! ... | semmle.label | ! ... | +| test.cpp:41:11:41:16 | call to strcmp | semmle.label | call to strcmp | +| test.cpp:41:11:41:16 | call to strcmp | semmle.label | call to strcmp | +| test.cpp:41:11:41:38 | (bool)... | semmle.label | (bool)... | +| test.cpp:41:11:41:38 | (bool)... | semmle.label | (bool)... | +#select +| test.cpp:24:10:24:35 | ! ... | test.cpp:20:29:20:34 | call to getenv | test.cpp:24:10:24:35 | ! ... | Reliance on untrusted input $@ to raise privilege at $@ | test.cpp:20:29:20:34 | call to getenv | call to getenv | test.cpp:25:9:25:27 | ... = ... | ... = ... | +| test.cpp:41:10:41:38 | ! ... | test.cpp:20:29:20:34 | call to getenv | test.cpp:41:10:41:38 | ! ... | Reliance on untrusted input $@ to raise privilege at $@ | test.cpp:20:29:20:34 | call to getenv | call to getenv | test.cpp:42:8:42:26 | ... = ... | ... = ... 
|
diff --git a/csharp/autobuilder/Semmle.Autobuild.Tests/BuildScripts.cs b/csharp/autobuilder/Semmle.Autobuild.Tests/BuildScripts.cs
index a2e59cf7ac3..b66fa4ff399 100644
--- a/csharp/autobuilder/Semmle.Autobuild.Tests/BuildScripts.cs
+++ b/csharp/autobuilder/Semmle.Autobuild.Tests/BuildScripts.cs
@@ -341,6 +341,8 @@ namespace Semmle.Extraction.Tests
             string cwd = @"C:\Project")
         {
             Actions.GetEnvironmentVariable["CODEQL_AUTOBUILDER_CSHARP_NO_INDEXING"] = "false";
+            Actions.GetEnvironmentVariable["CODEQL_EXTRACTOR_CSHARP_TRAP_DIR"] = "";
+            Actions.GetEnvironmentVariable["CODEQL_EXTRACTOR_CSHARP_SOURCE_ARCHIVE_DIR"] = "";
             Actions.GetEnvironmentVariable["CODEQL_EXTRACTOR_CSHARP_ROOT"] = @"C:\codeql\csharp";
             Actions.GetEnvironmentVariable["CODEQL_JAVA_HOME"] = @"C:\codeql\tools\java";
             Actions.GetEnvironmentVariable["SEMMLE_DIST"] = @"C:\odasa";
@@ -364,8 +366,7 @@ namespace Semmle.Extraction.Tests
             Actions.GetCurrentDirectory = cwd;
             Actions.IsWindows = isWindows;
 
-            var options = new AutobuildOptions();
-            options.ReadEnvironment(Actions);
+            var options = new AutobuildOptions(Actions);
             return new Autobuilder(Actions, options);
         }
 
diff --git a/csharp/autobuilder/Semmle.Autobuild/AspBuildRule.cs b/csharp/autobuilder/Semmle.Autobuild/AspBuildRule.cs
index b5b3dfcf61d..f9c690c273b 100644
--- a/csharp/autobuilder/Semmle.Autobuild/AspBuildRule.cs
+++ b/csharp/autobuilder/Semmle.Autobuild/AspBuildRule.cs
@@ -7,10 +7,9 @@
     {
         public BuildScript Analyse(Autobuilder builder, bool auto)
         {
-            (var javaHome, var dist) =
-                builder.CodeQLJavaHome != null ?
-                (builder.CodeQLJavaHome, builder.CodeQLExtractorCSharpRoot) :
-                (builder.SemmleJavaHome, builder.SemmleDist);
+            var javaHome = builder.JavaHome;
+            var dist = builder.Distribution;
+
             var command = new CommandBuilder(builder.Actions).
                 RunCommand(builder.Actions.PathCombine(javaHome, "bin", "java")).
                 Argument("-jar").
diff --git a/csharp/autobuilder/Semmle.Autobuild/AutobuildOptions.cs b/csharp/autobuilder/Semmle.Autobuild/AutobuildOptions.cs
index 740b740f9fb..9786e4dcca6 100644
--- a/csharp/autobuilder/Semmle.Autobuild/AutobuildOptions.cs
+++ b/csharp/autobuilder/Semmle.Autobuild/AutobuildOptions.cs
@@ -10,40 +10,40 @@ namespace Semmle.Autobuild
     public class AutobuildOptions
     {
         public readonly int SearchDepth = 3;
-        public string RootDirectory = null;
-        static readonly string prefix = "LGTM_INDEX_";
+        public readonly string RootDirectory;
+        private const string prefix = "LGTM_INDEX_";
 
-        public string VsToolsVersion;
-        public string MsBuildArguments;
-        public string MsBuildPlatform;
-        public string MsBuildConfiguration;
-        public string MsBuildTarget;
-        public string DotNetArguments;
-        public string DotNetVersion;
-        public string BuildCommand;
-        public string[] Solution;
+        public readonly string? VsToolsVersion;
+        public readonly string? MsBuildArguments;
+        public readonly string? MsBuildPlatform;
+        public readonly string? MsBuildConfiguration;
+        public readonly string? MsBuildTarget;
+        public readonly string? DotNetArguments;
+        public readonly string? DotNetVersion;
+        public readonly string? BuildCommand;
+        public readonly string[] Solution;
 
-        public bool IgnoreErrors;
-        public bool Buildless;
-        public bool AllSolutions;
-        public bool NugetRestore;
+        public readonly bool IgnoreErrors;
+        public readonly bool Buildless;
+        public readonly bool AllSolutions;
+        public readonly bool NugetRestore;
 
-        public Language Language;
-        public bool Indexing;
+        public readonly Language Language;
+        public readonly bool Indexing;
 
         ///
         /// Reads options from environment variables.
         /// Throws ArgumentOutOfRangeException for invalid arguments.
         ///
-        public void ReadEnvironment(IBuildActions actions)
+        public AutobuildOptions(IBuildActions actions)
         {
             RootDirectory = actions.GetCurrentDirectory();
             VsToolsVersion = actions.GetEnvironmentVariable(prefix + "VSTOOLS_VERSION");
-            MsBuildArguments = actions.GetEnvironmentVariable(prefix + "MSBUILD_ARGUMENTS").AsStringWithExpandedEnvVars(actions);
+            MsBuildArguments = actions.GetEnvironmentVariable(prefix + "MSBUILD_ARGUMENTS")?.AsStringWithExpandedEnvVars(actions);
             MsBuildPlatform = actions.GetEnvironmentVariable(prefix + "MSBUILD_PLATFORM");
             MsBuildConfiguration = actions.GetEnvironmentVariable(prefix + "MSBUILD_CONFIGURATION");
             MsBuildTarget = actions.GetEnvironmentVariable(prefix + "MSBUILD_TARGET");
-            DotNetArguments = actions.GetEnvironmentVariable(prefix + "DOTNET_ARGUMENTS").AsStringWithExpandedEnvVars(actions);
+            DotNetArguments = actions.GetEnvironmentVariable(prefix + "DOTNET_ARGUMENTS")?.AsStringWithExpandedEnvVars(actions);
             DotNetVersion = actions.GetEnvironmentVariable(prefix + "DOTNET_VERSION");
             BuildCommand = actions.GetEnvironmentVariable(prefix + "BUILD_COMMAND");
             Solution = actions.GetEnvironmentVariable(prefix + "SOLUTION").AsListWithExpandedEnvVars(actions, new string[0]);
@@ -60,7 +60,7 @@ namespace Semmle.Autobuild
 
     public static class OptionsExtensions
     {
-        public static bool AsBool(this string value, string param, bool defaultValue)
+        public static bool AsBool(this string? value, string param, bool defaultValue)
         {
             if (value == null) return defaultValue;
             switch (value.ToLower())
@@ -80,7 +80,7 @@ namespace Semmle.Autobuild
             }
         }
 
-        public static Language AsLanguage(this string key)
+        public static Language AsLanguage(this string? key)
         {
             switch (key)
             {
@@ -95,7 +95,7 @@ namespace Semmle.Autobuild
             }
         }
 
-        public static string[] AsListWithExpandedEnvVars(this string value, IBuildActions actions, string[] defaultValue)
+        public static string[] AsListWithExpandedEnvVars(this string? value, IBuildActions actions, string[] defaultValue)
         {
             if (value == null) return defaultValue;
 
diff --git a/csharp/autobuilder/Semmle.Autobuild/Autobuilder.cs b/csharp/autobuilder/Semmle.Autobuild/Autobuilder.cs
index 73e6fcd181f..f6dbdae788c 100644
--- a/csharp/autobuilder/Semmle.Autobuild/Autobuilder.cs
+++ b/csharp/autobuilder/Semmle.Autobuild/Autobuilder.cs
@@ -20,6 +20,14 @@ namespace Semmle.Autobuild
         BuildScript Analyse(Autobuilder builder, bool auto);
     }
 
+    ///
+    /// Exception indicating that environment variables are missing or invalid.
+    ///
+    class InvalidEnvironmentException : Exception
+    {
+        public InvalidEnvironmentException(string m) : base(m) { }
+    }
+
     ///
     /// Main application logic, containing all data
     /// gathered from the project and filesystem.
@@ -69,7 +77,7 @@ namespace Semmle.Autobuild
         /// List of project/solution files to build.
         ///
         public IList ProjectsOrSolutionsToBuild => projectsOrSolutionsToBuildLazy.Value;
-        readonly Lazy> projectsOrSolutionsToBuildLazy;
+        private readonly Lazy> projectsOrSolutionsToBuildLazy;
 
         ///
         /// Holds if a given path was found.
@@ -129,7 +137,7 @@ namespace Semmle.Autobuild
 
             projectsOrSolutionsToBuildLazy = new Lazy>(() =>
             {
-                List ret;
+                List? ret;
                 if (options.Solution.Any())
                 {
                     ret = new List();
@@ -143,7 +151,7 @@ namespace Semmle.Autobuild
                     return ret;
                 }
 
-                IEnumerable FindFiles(string extension, Func create)
+                IEnumerable? FindFiles(string extension, Func create)
                 {
                     var matchingFiles = GetExtensions(extension).
                         Select(p => (ProjectOrSolution: create(p.Item1), DistanceFromRoot: p.Item2)).
@@ -177,19 +185,34 @@ namespace Semmle.Autobuild
             });
 
             CodeQLExtractorCSharpRoot = Actions.GetEnvironmentVariable("CODEQL_EXTRACTOR_CSHARP_ROOT");
-
-            CodeQLJavaHome = Actions.GetEnvironmentVariable("CODEQL_JAVA_HOME");
-
             SemmleDist = Actions.GetEnvironmentVariable("SEMMLE_DIST");
-
-            SemmleJavaHome = Actions.GetEnvironmentVariable("SEMMLE_JAVA_HOME");
-
             SemmlePlatformTools = Actions.GetEnvironmentVariable("SEMMLE_PLATFORM_TOOLS");
 
-            if (CodeQLExtractorCSharpRoot == null && SemmleDist == null)
-                Log(Severity.Error, "The environment variables CODEQL_EXTRACTOR_CSHARP_ROOT and SEMMLE_DIST have not been set.");
+            JavaHome =
+                Actions.GetEnvironmentVariable("CODEQL_JAVA_HOME") ??
+                Actions.GetEnvironmentVariable("SEMMLE_JAVA_HOME") ??
+                throw new InvalidEnvironmentException("The environment variable CODEQL_JAVA_HOME or SEMMLE_JAVA_HOME has not been set.");
+
+            Distribution =
+                CodeQLExtractorCSharpRoot ??
+                SemmleDist ??
+                throw new InvalidEnvironmentException("The environment variable CODEQL_EXTRACTOR_CSHARP_ROOT or SEMMLE_DIST has not been set.");
+
+            TrapDir =
+                Actions.GetEnvironmentVariable("CODEQL_EXTRACTOR_CSHARP_TRAP_DIR") ??
+                Actions.GetEnvironmentVariable("TRAP_FOLDER") ??
+                throw new InvalidEnvironmentException("The environment variable CODEQL_EXTRACTOR_CSHARP_TRAP_DIR or TRAP_FOLDER has not been set.");
+
+            SourceArchiveDir =
+                Actions.GetEnvironmentVariable("CODEQL_EXTRACTOR_CSHARP_SOURCE_ARCHIVE_DIR") ??
+                Actions.GetEnvironmentVariable("SOURCE_ARCHIVE") ??
+                throw new InvalidEnvironmentException("The environment variable CODEQL_EXTRACTOR_CSHARP_SOURCE_ARCHIVE_DIR or SOURCE_ARCHIVE has not been set.");
         }
 
+        private string TrapDir { get; }
+
+        private string SourceArchiveDir { get; }
+
         readonly ILogger logger = new ConsoleLogger(Verbosity.Info);
 
         ///
@@ -271,9 +294,9 @@ namespace Semmle.Autobuild
                     break;
                 case CSharpBuildStrategy.Auto:
                     var cleanTrapFolder =
-                        BuildScript.DeleteDirectory(Actions.GetEnvironmentVariable("CODEQL_EXTRACTOR_CSHARP_TRAP_DIR") ?? Actions.GetEnvironmentVariable("TRAP_FOLDER"));
+                        BuildScript.DeleteDirectory(TrapDir);
                     var cleanSourceArchive =
-                        BuildScript.DeleteDirectory(Actions.GetEnvironmentVariable("CODEQL_EXTRACTOR_CSHARP_SOURCE_ARCHIVE_DIR") ?? Actions.GetEnvironmentVariable("SOURCE_ARCHIVE"));
+                        BuildScript.DeleteDirectory(SourceArchiveDir);
                     var tryCleanExtractorArgsLogs =
                         BuildScript.Create(actions =>
                         {
@@ -376,38 +399,33 @@ namespace Semmle.Autobuild
         ///
         /// Value of CODEQL_EXTRACTOR_CSHARP_ROOT environment variable.
         ///
-        public string CodeQLExtractorCSharpRoot { get; private set; }
-
-        ///
-        /// Value of CODEQL_JAVA_HOME environment variable.
-        ///
-        public string CodeQLJavaHome { get; private set; }
+        private string? CodeQLExtractorCSharpRoot { get; }
 
         ///
         /// Value of SEMMLE_DIST environment variable.
         ///
-        public string SemmleDist { get; private set; }
+        private string? SemmleDist { get; }
 
-        ///
-        /// Value of SEMMLE_JAVA_HOME environment variable.
-        ///
-        public string SemmleJavaHome { get; private set; }
+        public string Distribution { get; }
+
+        public string JavaHome { get; }
 
         ///
         /// Value of SEMMLE_PLATFORM_TOOLS environment variable.
         ///
-        public string SemmlePlatformTools { get; private set; }
+        public string? SemmlePlatformTools { get; }
 
         ///
         /// The absolute path of the odasa executable.
+        /// null if we are running in CodeQL.
         ///
-        public string Odasa => SemmleDist == null ? null : Actions.PathCombine(SemmleDist, "tools", "odasa");
+        public string? Odasa => SemmleDist is null ? null : Actions.PathCombine(SemmleDist, "tools", "odasa");
 
         ///
         /// Construct a command that executed the given wrapped in
         /// an odasa --index, unless indexing has been disabled, in which case
         /// is run directly.
         ///
-        internal CommandBuilder MaybeIndex(CommandBuilder builder, string cmd) => Options.Indexing ? builder.IndexCommand(Odasa, cmd) : builder.RunCommand(cmd);
+        internal CommandBuilder MaybeIndex(CommandBuilder builder, string cmd) => Options.Indexing && !(Odasa is null) ? builder.IndexCommand(Odasa, cmd) : builder.RunCommand(cmd);
     }
 }
diff --git a/csharp/autobuilder/Semmle.Autobuild/BuildActions.cs b/csharp/autobuilder/Semmle.Autobuild/BuildActions.cs
index 837f6e3f69e..7bc4b9b7591 100644
--- a/csharp/autobuilder/Semmle.Autobuild/BuildActions.cs
+++ b/csharp/autobuilder/Semmle.Autobuild/BuildActions.cs
@@ -21,7 +21,7 @@ namespace Semmle.Autobuild
         /// Additional environment variables.
         /// The lines of stdout.
         /// The process exit code.
-        int RunProcess(string exe, string args, string workingDirectory, IDictionary env, out IList stdOut);
+        int RunProcess(string exe, string args, string? workingDirectory, IDictionary? env, out IList stdOut);
 
         ///
         /// Runs a process but does not capture its output.
@@ -31,7 +31,7 @@ namespace Semmle.Autobuild
         /// The working directory (null for current directory).
         /// Additional environment variables.
         /// The process exit code.
-        int RunProcess(string exe, string args, string workingDirectory, IDictionary env);
+        int RunProcess(string exe, string args, string? workingDirectory, IDictionary? env);
 
         ///
         /// Tests whether a file exists, File.Exists().
@@ -63,7 +63,7 @@ namespace Semmle.Autobuild
         ///
         /// The name of the variable.
         /// The string value, or null if the variable is not defined.
-        string GetEnvironmentVariable(string name);
+        string? GetEnvironmentVariable(string name);
 
         ///
         /// Gets the current directory, Directory.GetCurrentDirectory().
@@ -130,7 +130,7 @@ namespace Semmle.Autobuild
 
         bool IBuildActions.FileExists(string file) => File.Exists(file);
 
-        ProcessStartInfo GetProcessStartInfo(string exe, string arguments, string workingDirectory, IDictionary environment, bool redirectStandardOutput)
+        ProcessStartInfo GetProcessStartInfo(string exe, string arguments, string? workingDirectory, IDictionary? environment, bool redirectStandardOutput)
         {
             var pi = new ProcessStartInfo(exe, arguments)
             {
@@ -146,7 +146,7 @@ namespace Semmle.Autobuild
             return pi;
         }
 
-        int IBuildActions.RunProcess(string cmd, string args, string workingDirectory, IDictionary environment)
+        int IBuildActions.RunProcess(string cmd, string args, string? workingDirectory, IDictionary? environment)
         {
             var pi = GetProcessStartInfo(cmd, args, workingDirectory, environment, false);
             using (var p = Process.Start(pi))
@@ -156,7 +156,7 @@ namespace Semmle.Autobuild
             }
         }
 
-        int IBuildActions.RunProcess(string cmd, string args, string workingDirectory, IDictionary environment, out IList stdOut)
+        int IBuildActions.RunProcess(string cmd, string args, string? workingDirectory, IDictionary? environment, out IList stdOut)
         {
             var pi = GetProcessStartInfo(cmd, args, workingDirectory, environment, true);
             return pi.ReadOutput(out stdOut);
@@ -166,7 +166,7 @@ namespace Semmle.Autobuild
 
         bool IBuildActions.DirectoryExists(string dir) => Directory.Exists(dir);
 
-        string IBuildActions.GetEnvironmentVariable(string name) => Environment.GetEnvironmentVariable(name);
+        string? IBuildActions.GetEnvironmentVariable(string name) => Environment.GetEnvironmentVariable(name);
 
         string IBuildActions.GetCurrentDirectory() => Directory.GetCurrentDirectory();
 
diff --git a/csharp/autobuilder/Semmle.Autobuild/BuildCommandAutoRule.cs b/csharp/autobuilder/Semmle.Autobuild/BuildCommandAutoRule.cs
index e7e2c9255dd..80a819f403e 100644
--- a/csharp/autobuilder/Semmle.Autobuild/BuildCommandAutoRule.cs
+++ b/csharp/autobuilder/Semmle.Autobuild/BuildCommandAutoRule.cs
@@ -40,7 +40,7 @@ namespace Semmle.Autobuild
                     chmod.RunCommand("/bin/chmod", $"u+x {scriptPath}");
                     var chmodScript = builder.Actions.IsWindows() ? BuildScript.Success : BuildScript.Try(chmod.Script);
 
-                    var dir = Path.GetDirectoryName(scriptPath);
+                    string?
dir = Path.GetDirectoryName(scriptPath); // A specific .NET Core version may be required return chmodScript & DotNetRule.WithDotNet(builder, environment => diff --git a/csharp/autobuilder/Semmle.Autobuild/BuildScript.cs b/csharp/autobuilder/Semmle.Autobuild/BuildScript.cs index 93ea941d58b..5c60f4110ba 100644 --- a/csharp/autobuilder/Semmle.Autobuild/BuildScript.cs +++ b/csharp/autobuilder/Semmle.Autobuild/BuildScript.cs @@ -48,8 +48,9 @@ namespace Semmle.Autobuild class BuildCommand : BuildScript { - readonly string exe, arguments, workingDirectory; - readonly IDictionary environment; + readonly string exe, arguments; + readonly string? workingDirectory; + readonly IDictionary? environment; readonly bool silent; /// @@ -60,7 +61,7 @@ namespace Semmle.Autobuild /// Whether this command should run silently. /// The working directory (null for current directory). /// Additional environment variables. - public BuildCommand(string exe, string argumentsOpt, bool silent, string workingDirectory = null, IDictionary environment = null) + public BuildCommand(string exe, string argumentsOpt, bool silent, string? workingDirectory = null, IDictionary? environment = null) { this.exe = exe; this.arguments = argumentsOpt ?? ""; @@ -131,8 +132,8 @@ namespace Semmle.Autobuild class BindBuildScript : BuildScript { readonly BuildScript s1; - readonly Func, int, BuildScript> s2a; - readonly Func s2b; + readonly Func, int, BuildScript>? s2a; + readonly Func? 
s2b; public BindBuildScript(BuildScript s1, Func, int, BuildScript> s2) { this.s1 = s1; @@ -154,14 +155,19 @@ namespace Semmle.Autobuild return s2a(stdout1, ret1).Run(actions, startCallback, exitCallBack); } - ret1 = s1.Run(actions, startCallback, exitCallBack); - return s2b(ret1).Run(actions, startCallback, exitCallBack); + if (s2b != null) + { + ret1 = s1.Run(actions, startCallback, exitCallBack); + return s2b(ret1).Run(actions, startCallback, exitCallBack); + } + + throw new InvalidOperationException("Unexpected error"); } public override int Run(IBuildActions actions, Action startCallback, Action exitCallBack, out IList stdout) { var ret1 = s1.Run(actions, startCallback, exitCallBack, out var stdout1); - var ret2 = (s2a != null ? s2a(stdout1, ret1) : s2b(ret1)).Run(actions, startCallback, exitCallBack, out var stdout2); + var ret2 = (s2a != null ? s2a(stdout1, ret1) : s2b!(ret1)).Run(actions, startCallback, exitCallBack, out var stdout2); var @out = new List(); @out.AddRange(stdout1); @out.AddRange(stdout2); @@ -177,7 +183,7 @@ namespace Semmle.Autobuild /// Whether the executable should run silently. /// The working directory (null for current directory). /// Additional environment variables. - public static BuildScript Create(string exe, string argumentsOpt, bool silent, string workingDirectory, IDictionary environment) => + public static BuildScript Create(string exe, string argumentsOpt, bool silent, string? workingDirectory, IDictionary? environment) => new BuildCommand(exe, argumentsOpt, silent, workingDirectory, environment); /// diff --git a/csharp/autobuilder/Semmle.Autobuild/CommandBuilder.cs b/csharp/autobuilder/Semmle.Autobuild/CommandBuilder.cs index 7b95e495697..444865400f4 100644 --- a/csharp/autobuilder/Semmle.Autobuild/CommandBuilder.cs +++ b/csharp/autobuilder/Semmle.Autobuild/CommandBuilder.cs @@ -13,10 +13,10 @@ namespace Semmle.Autobuild readonly StringBuilder arguments; bool firstCommand; - string executable; + string? 
executable; readonly EscapeMode escapingMode; - readonly string workingDirectory; - readonly IDictionary environment; + readonly string? workingDirectory; + readonly IDictionary? environment; readonly bool silent; /// @@ -25,7 +25,7 @@ namespace Semmle.Autobuild /// The working directory (null for current directory). /// Additional environment variables. /// Whether this command should be run silently. - public CommandBuilder(IBuildActions actions, string workingDirectory = null, IDictionary environment = null, bool silent = false) + public CommandBuilder(IBuildActions actions, string? workingDirectory = null, IDictionary? environment = null, bool silent = false) { arguments = new StringBuilder(); if (actions.IsWindows()) @@ -50,7 +50,7 @@ namespace Semmle.Autobuild RunCommand(odasa, "index --auto"); } - public CommandBuilder CallBatFile(string batFile, string argumentsOpt = null) + public CommandBuilder CallBatFile(string batFile, string? argumentsOpt = null) { NextCommand(); arguments.Append(" CALL"); @@ -66,7 +66,7 @@ namespace Semmle.Autobuild /// The command to run. /// Additional arguments. /// this for chaining calls. - public CommandBuilder IndexCommand(string odasa, string command, string argumentsOpt = null) + public CommandBuilder IndexCommand(string odasa, string command, string? argumentsOpt = null) { OdasaIndex(odasa); QuoteArgument(command); @@ -151,7 +151,7 @@ namespace Semmle.Autobuild arguments.Append(' '); } - public CommandBuilder Argument(string argumentsOpt) + public CommandBuilder Argument(string? argumentsOpt) { if (argumentsOpt != null) { @@ -169,7 +169,7 @@ namespace Semmle.Autobuild arguments.Append(" &&"); } - public CommandBuilder RunCommand(string exe, string argumentsOpt = null) + public CommandBuilder RunCommand(string exe, string? 
argumentsOpt = null) { var (exe0, arg0) = escapingMode == EscapeMode.Process && exe.EndsWith(".exe", System.StringComparison.Ordinal) @@ -193,6 +193,14 @@ namespace Semmle.Autobuild /// /// Returns a build script that contains just this command. /// - public BuildScript Script => BuildScript.Create(executable, arguments.ToString(), silent, workingDirectory, environment); + public BuildScript Script + { + get + { + if (executable is null) + throw new System.InvalidOperationException("executable is null"); + return BuildScript.Create(executable, arguments.ToString(), silent, workingDirectory, environment); + } + } } } diff --git a/csharp/autobuilder/Semmle.Autobuild/DotNetRule.cs b/csharp/autobuilder/Semmle.Autobuild/DotNetRule.cs index f3b81665fca..21215af8434 100644 --- a/csharp/autobuilder/Semmle.Autobuild/DotNetRule.cs +++ b/csharp/autobuilder/Semmle.Autobuild/DotNetRule.cs @@ -56,13 +56,13 @@ namespace Semmle.Autobuild }); } - static BuildScript WithDotNet(Autobuilder builder, Func, bool, BuildScript> f) + static BuildScript WithDotNet(Autobuilder builder, Func?, bool, BuildScript> f) { - var installDir = builder.Actions.PathCombine(builder.Options.RootDirectory, ".dotnet"); + string? installDir = builder.Actions.PathCombine(builder.Options.RootDirectory, ".dotnet"); var installScript = DownloadDotNet(builder, installDir); return BuildScript.Bind(installScript, installed => { - Dictionary env; + Dictionary? env; if (installed == 0) { // The installation succeeded, so use the newly installed .NET Core @@ -120,7 +120,7 @@ namespace Semmle.Autobuild /// variables needed by the installed .NET Core (null when no variables /// are needed). 
/// - public static BuildScript WithDotNet(Autobuilder builder, Func, BuildScript> f) + public static BuildScript WithDotNet(Autobuilder builder, Func?, BuildScript> f) => WithDotNet(builder, (_1, env, _2) => f(env)); /// @@ -265,10 +265,10 @@ Invoke-Command -ScriptBlock $ScriptBlock"; return listSdks.Script; } - static string DotNetCommand(IBuildActions actions, string dotNetPath) => + static string DotNetCommand(IBuildActions actions, string? dotNetPath) => dotNetPath != null ? actions.PathCombine(dotNetPath, "dotnet") : "dotnet"; - BuildScript GetInfoCommand(IBuildActions actions, string dotNetPath, IDictionary environment) + BuildScript GetInfoCommand(IBuildActions actions, string? dotNetPath, IDictionary? environment) { var info = new CommandBuilder(actions, null, environment). RunCommand(DotNetCommand(actions, dotNetPath)). @@ -276,7 +276,7 @@ Invoke-Command -ScriptBlock $ScriptBlock"; return info.Script; } - CommandBuilder GetCleanCommand(IBuildActions actions, string dotNetPath, IDictionary environment) + CommandBuilder GetCleanCommand(IBuildActions actions, string? dotNetPath, IDictionary? environment) { var clean = new CommandBuilder(actions, null, environment). RunCommand(DotNetCommand(actions, dotNetPath)). @@ -284,7 +284,7 @@ Invoke-Command -ScriptBlock $ScriptBlock"; return clean; } - CommandBuilder GetRestoreCommand(IBuildActions actions, string dotNetPath, IDictionary environment) + CommandBuilder GetRestoreCommand(IBuildActions actions, string? dotNetPath, IDictionary? environment) { var restore = new CommandBuilder(actions, null, environment). RunCommand(DotNetCommand(actions, dotNetPath)). @@ -292,7 +292,7 @@ Invoke-Command -ScriptBlock $ScriptBlock"; return restore; } - static BuildScript GetInstalledRuntimesScript(IBuildActions actions, string dotNetPath, IDictionary environment) + static BuildScript GetInstalledRuntimesScript(IBuildActions actions, string? dotNetPath, IDictionary? 
environment) { var listSdks = new CommandBuilder(actions, environment: environment, silent: true). RunCommand(DotNetCommand(actions, dotNetPath)). @@ -309,7 +309,7 @@ Invoke-Command -ScriptBlock $ScriptBlock"; /// hence the need for CLR tracing), by adding a /// `/p:UseSharedCompilation=false` argument. /// - BuildScript GetBuildScript(Autobuilder builder, string dotNetPath, IDictionary environment, bool compatibleClr, string projOrSln) + BuildScript GetBuildScript(Autobuilder builder, string? dotNetPath, IDictionary? environment, bool compatibleClr, string projOrSln) { var build = new CommandBuilder(builder.Actions, null, environment); var script = builder.MaybeIndex(build, DotNetCommand(builder.Actions, dotNetPath)). diff --git a/csharp/autobuilder/Semmle.Autobuild/MsBuildRule.cs b/csharp/autobuilder/Semmle.Autobuild/MsBuildRule.cs index b0ee01b8d3c..f9ef06feb08 100644 --- a/csharp/autobuilder/Semmle.Autobuild/MsBuildRule.cs +++ b/csharp/autobuilder/Semmle.Autobuild/MsBuildRule.cs @@ -67,10 +67,10 @@ namespace Semmle.Autobuild string target = builder.Options.MsBuildTarget != null ? builder.Options.MsBuildTarget : "rebuild"; - string platform = builder.Options.MsBuildPlatform != null + string? platform = builder.Options.MsBuildPlatform != null ? builder.Options.MsBuildPlatform : projectOrSolution is ISolution s1 ? s1.DefaultPlatformName : null; - string configuration = builder.Options.MsBuildConfiguration != null + string? configuration = builder.Options.MsBuildConfiguration != null ? builder.Options.MsBuildConfiguration : projectOrSolution is ISolution s2 ? s2.DefaultConfigurationName : null; @@ -96,9 +96,9 @@ namespace Semmle.Autobuild /// /// Returns null when no version is specified. /// - public static VcVarsBatFile GetVcVarsBatFile(Autobuilder builder) + public static VcVarsBatFile? GetVcVarsBatFile(Autobuilder builder) { - VcVarsBatFile vsTools = null; + VcVarsBatFile? 
vsTools = null; if (builder.Options.VsToolsVersion != null) { diff --git a/csharp/autobuilder/Semmle.Autobuild/Program.cs b/csharp/autobuilder/Semmle.Autobuild/Program.cs index c4542864a09..e4bccb0e626 100644 --- a/csharp/autobuilder/Semmle.Autobuild/Program.cs +++ b/csharp/autobuilder/Semmle.Autobuild/Program.cs @@ -6,23 +6,27 @@ namespace Semmle.Autobuild { static int Main() { - var options = new AutobuildOptions(); - var actions = SystemBuildActions.Instance; try { - options.ReadEnvironment(actions); + var actions = SystemBuildActions.Instance; + var options = new AutobuildOptions(actions); + try + { + Console.WriteLine($"Semmle autobuilder for {options.Language}"); + var builder = new Autobuilder(actions, options); + return builder.AttemptBuild(); + } + catch(InvalidEnvironmentException ex) + { + Console.WriteLine("The environment is invalid: {0}", ex.Message); + } } catch (ArgumentOutOfRangeException ex) { Console.WriteLine("The value \"{0}\" for parameter \"{1}\" is invalid", ex.ActualValue, ex.ParamName); } - - var builder = new Autobuilder(actions, options); - - Console.WriteLine($"Semmle autobuilder for {options.Language}"); - - return builder.AttemptBuild(); + return 1; } } } diff --git a/csharp/autobuilder/Semmle.Autobuild/Project.cs b/csharp/autobuilder/Semmle.Autobuild/Project.cs index 0080f170c54..415ddcbc0f0 100644 --- a/csharp/autobuilder/Semmle.Autobuild/Project.cs +++ b/csharp/autobuilder/Semmle.Autobuild/Project.cs @@ -82,7 +82,7 @@ namespace Semmle.Autobuild foreach (var include in projectFileIncludes.Concat(projectFilesIncludes)) { var includePath = builder.Actions.PathCombine(include.Value.Split('\\', StringSplitOptions.RemoveEmptyEntries)); - ret.Add(new Project(builder, builder.Actions.PathCombine(Path.GetDirectoryName(this.FullPath), includePath))); + ret.Add(new Project(builder, builder.Actions.PathCombine(DirectoryName, includePath))); } return ret; }); diff --git a/csharp/autobuilder/Semmle.Autobuild/ProjectOrSolution.cs 
b/csharp/autobuilder/Semmle.Autobuild/ProjectOrSolution.cs index 53025345b69..13859a8c0eb 100644 --- a/csharp/autobuilder/Semmle.Autobuild/ProjectOrSolution.cs +++ b/csharp/autobuilder/Semmle.Autobuild/ProjectOrSolution.cs @@ -1,4 +1,5 @@ using System.Collections.Generic; +using System.IO; using System.Linq; namespace Semmle.Autobuild @@ -24,6 +25,8 @@ namespace Semmle.Autobuild { public string FullPath { get; private set; } + public string DirectoryName => Path.GetDirectoryName(FullPath) ?? ""; + protected ProjectOrSolution(Autobuilder builder, string path) { FullPath = builder.Actions.GetFullPath(path); diff --git a/csharp/autobuilder/Semmle.Autobuild/Semmle.Autobuild.csproj b/csharp/autobuilder/Semmle.Autobuild/Semmle.Autobuild.csproj index 5e3f2295ad0..63aab3b29fb 100644 --- a/csharp/autobuilder/Semmle.Autobuild/Semmle.Autobuild.csproj +++ b/csharp/autobuilder/Semmle.Autobuild/Semmle.Autobuild.csproj @@ -9,6 +9,7 @@ false win-x64;linux-x64;osx-x64 + enable diff --git a/csharp/autobuilder/Semmle.Autobuild/Solution.cs b/csharp/autobuilder/Semmle.Autobuild/Solution.cs index 661f46199f8..0429b9f420c 100644 --- a/csharp/autobuilder/Semmle.Autobuild/Solution.cs +++ b/csharp/autobuilder/Semmle.Autobuild/Solution.cs @@ -43,7 +43,7 @@ namespace Semmle.Autobuild /// class Solution : ProjectOrSolution, ISolution { - readonly SolutionFile solution; + readonly SolutionFile? solution; readonly IEnumerable includedProjects; public override IEnumerable IncludedProjects => includedProjects; @@ -81,7 +81,7 @@ namespace Semmle.Autobuild includedProjects = solution.ProjectsInOrder. Where(p => p.ProjectType == SolutionProjectType.KnownToBeMSBuildFormat). - Select(p => builder.Actions.PathCombine(Path.GetDirectoryName(path), builder.Actions.PathCombine(p.RelativePath.Split('\\', StringSplitOptions.RemoveEmptyEntries)))). + Select(p => builder.Actions.PathCombine(DirectoryName, builder.Actions.PathCombine(p.RelativePath.Split('\\', StringSplitOptions.RemoveEmptyEntries)))). 
Select(p => new Project(builder, p)). ToArray(); } diff --git a/csharp/autobuilder/Semmle.Autobuild/StandaloneBuildRule.cs b/csharp/autobuilder/Semmle.Autobuild/StandaloneBuildRule.cs index 366ce1f08fc..26bc84bb601 100644 --- a/csharp/autobuilder/Semmle.Autobuild/StandaloneBuildRule.cs +++ b/csharp/autobuilder/Semmle.Autobuild/StandaloneBuildRule.cs @@ -1,6 +1,4 @@ -using System.IO; - -namespace Semmle.Autobuild +namespace Semmle.Autobuild { /// /// Build using standalone extraction. @@ -9,8 +7,11 @@ namespace Semmle.Autobuild { public BuildScript Analyse(Autobuilder builder, bool auto) { - BuildScript GetCommand(string solution) + BuildScript GetCommand(string? solution) { + if (builder.SemmlePlatformTools is null) + return BuildScript.Failure; + var standalone = builder.Actions.PathCombine(builder.SemmlePlatformTools, "csharp", "Semmle.Extraction.CSharp.Standalone"); var cmd = new CommandBuilder(builder.Actions); cmd.RunCommand(standalone); diff --git a/csharp/autobuilder/Semmle.Autobuild/XmlBuildRule.cs b/csharp/autobuilder/Semmle.Autobuild/XmlBuildRule.cs index 5814c2b63d2..d9b05dbe0a9 100644 --- a/csharp/autobuilder/Semmle.Autobuild/XmlBuildRule.cs +++ b/csharp/autobuilder/Semmle.Autobuild/XmlBuildRule.cs @@ -7,7 +7,7 @@ { public BuildScript Analyse(Autobuilder builder, bool auto) { - if (!builder.Options.Indexing) + if (!builder.Options.Indexing || builder.Odasa is null) return BuildScript.Success; var command = new CommandBuilder(builder.Actions). diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/AssemblyCache.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/AssemblyCache.cs index db2664bf4c9..f93911b8a38 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/AssemblyCache.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/AssemblyCache.cs @@ -163,7 +163,19 @@ namespace Semmle.BuildAnalyser /// /// The filename to query. /// The assembly info. 
- public AssemblyInfo GetAssemblyInfo(string filepath) => assemblyInfo[filepath]; + public AssemblyInfo GetAssemblyInfo(string filepath) + { + if(assemblyInfo.TryGetValue(filepath, out var info)) + { + return info; + } + else + { + info = AssemblyInfo.ReadFromFile(filepath); + assemblyInfo.Add(filepath, info); + return info; + } + } // List of pending DLLs to index. readonly List dlls = new List(); diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/BuildAnalysis.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/BuildAnalysis.cs index b0ef328bcd0..2894222ca89 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/BuildAnalysis.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/BuildAnalysis.cs @@ -1,10 +1,13 @@ -using System; +using Semmle.Util; +using System; using System.Collections.Generic; using System.IO; using System.Linq; -using System.Runtime.InteropServices; -using Semmle.Util; using Semmle.Extraction.CSharp.Standalone; +using System.Threading.Tasks; +using System.Collections.Concurrent; +using System.Text; +using System.Security.Cryptography; namespace Semmle.BuildAnalyser { @@ -43,19 +46,18 @@ namespace Semmle.BuildAnalyser /// /// Main implementation of the build analysis. 
/// - class BuildAnalysis : IBuildAnalysis + class BuildAnalysis : IBuildAnalysis, IDisposable { - readonly AssemblyCache assemblyCache; - readonly NugetPackages nuget; - readonly IProgressMonitor progressMonitor; - HashSet usedReferences = new HashSet(); - readonly HashSet usedSources = new HashSet(); - readonly HashSet missingSources = new HashSet(); - readonly Dictionary unresolvedReferences = new Dictionary(); - readonly DirectoryInfo sourceDir; - int failedProjects, succeededProjects; - readonly string[] allSources; - int conflictedReferences = 0; + private readonly AssemblyCache assemblyCache; + private readonly NugetPackages nuget; + private readonly IProgressMonitor progressMonitor; + private readonly IDictionary usedReferences = new ConcurrentDictionary(); + private readonly IDictionary sources = new ConcurrentDictionary(); + private readonly IDictionary unresolvedReferences = new ConcurrentDictionary(); + private readonly DirectoryInfo sourceDir; + private int failedProjects, succeededProjects; + private readonly string[] allSources; + private int conflictedReferences = 0; /// /// Performs a C# build analysis. @@ -64,6 +66,8 @@ namespace Semmle.BuildAnalyser /// Display of analysis progress. public BuildAnalysis(Options options, IProgressMonitor progress) { + var startTime = DateTime.Now; + progressMonitor = progress; sourceDir = new DirectoryInfo(options.SrcDir); @@ -74,36 +78,43 @@ namespace Semmle.BuildAnalyser Where(d => !options.ExcludesFile(d)). 
ToArray(); - var dllDirNames = options.DllDirs.Select(Path.GetFullPath); + var dllDirNames = options.DllDirs.Select(Path.GetFullPath).ToList(); + PackageDirectory = new TemporaryDirectory(ComputeTempDirectory(sourceDir.FullName)); if (options.UseNuGet) { - nuget = new NugetPackages(sourceDir.FullName); - ReadNugetFiles(); - dllDirNames = dllDirNames.Concat(Enumerators.Singleton(nuget.PackageDirectory)); + try + { + nuget = new NugetPackages(sourceDir.FullName, PackageDirectory); + ReadNugetFiles(); + } + catch(FileNotFoundException) + { + progressMonitor.MissingNuGet(); + } } // Find DLLs in the .Net Framework if (options.ScanNetFrameworkDlls) { - dllDirNames = dllDirNames.Concat(Runtime.Runtimes.Take(1)); + dllDirNames.Add(Runtime.Runtimes.First()); } - assemblyCache = new BuildAnalyser.AssemblyCache(dllDirNames, progress); + // These files can sometimes prevent `dotnet restore` from working correctly. + using (new FileRenamer(sourceDir.GetFiles("global.json", SearchOption.AllDirectories))) + using (new FileRenamer(sourceDir.GetFiles("Directory.Build.props", SearchOption.AllDirectories))) + { + var solutions = options.SolutionFile != null ? + new[] { options.SolutionFile } : + sourceDir.GetFiles("*.sln", SearchOption.AllDirectories).Select(d => d.FullName); - // Analyse all .csproj files in the source tree. 
- if (options.SolutionFile != null) - { - AnalyseSolution(options.SolutionFile); - } - else if (options.AnalyseCsProjFiles) - { - AnalyseProjectFiles(); - } + RestoreSolutions(solutions); + dllDirNames.Add(PackageDirectory.DirInfo.FullName); + assemblyCache = new BuildAnalyser.AssemblyCache(dllDirNames, progress); + AnalyseSolutions(solutions); - if (!options.AnalyseCsProjFiles) - { - usedReferences = new HashSet(assemblyCache.AllAssemblies.Select(a => a.Filename)); + foreach (var filename in assemblyCache.AllAssemblies.Select(a => a.Filename)) + UseReference(filename); } ResolveConflicts(); @@ -114,7 +125,7 @@ namespace Semmle.BuildAnalyser } // Output the findings - foreach (var r in usedReferences) + foreach (var r in usedReferences.Keys) { progressMonitor.ResolvedReference(r); } @@ -132,7 +143,27 @@ namespace Semmle.BuildAnalyser UnresolvedReferences.Count(), conflictedReferences, succeededProjects + failedProjects, - failedProjects); + failedProjects, + DateTime.Now - startTime); + } + + /// + /// Computes a unique temp directory for the packages associated + /// with this source tree. Use a SHA1 of the directory name. + /// + /// + /// The full path of the temp directory. + private static string ComputeTempDirectory(string srcDir) + { + var bytes = Encoding.Unicode.GetBytes(srcDir); + + using var sha1 = new SHA1CryptoServiceProvider(); + var sha = sha1.ComputeHash(bytes); + var sb = new StringBuilder(); + foreach (var b in sha.Take(8)) + sb.AppendFormat("{0:x2}", b); + + return Path.Combine(Path.GetTempPath(), "GitHub", "packages", sb.ToString()); } /// @@ -143,7 +174,7 @@ namespace Semmle.BuildAnalyser void ResolveConflicts() { var sortedReferences = usedReferences. - Select(r => assemblyCache.GetAssemblyInfo(r)). + Select(r => assemblyCache.GetAssemblyInfo(r.Key)). OrderBy(r => r.Version). 
ToArray(); @@ -154,7 +185,9 @@ namespace Semmle.BuildAnalyser finalAssemblyList[r.Name] = r; // Update the used references list - usedReferences = new HashSet(finalAssemblyList.Select(r => r.Value.Filename)); + usedReferences.Clear(); + foreach (var r in finalAssemblyList.Select(r => r.Value.Filename)) + UseReference(r); // Report the results foreach (var r in sortedReferences) @@ -183,7 +216,7 @@ namespace Semmle.BuildAnalyser /// The filename of the reference. void UseReference(string reference) { - usedReferences.Add(reference); + usedReferences[reference] = true; } /// @@ -192,25 +225,18 @@ namespace Semmle.BuildAnalyser /// The source file. void UseSource(FileInfo sourceFile) { - if (sourceFile.Exists) - { - usedSources.Add(sourceFile.FullName); - } - else - { - missingSources.Add(sourceFile.FullName); - } + sources[sourceFile.FullName] = sourceFile.Exists; } /// /// The list of resolved reference files. /// - public IEnumerable ReferenceFiles => this.usedReferences; + public IEnumerable ReferenceFiles => this.usedReferences.Keys; /// /// The list of source files used in projects. /// - public IEnumerable ProjectSourceFiles => usedSources; + public IEnumerable ProjectSourceFiles => sources.Where(s => s.Value).Select(s => s.Key); /// /// All of the source files in the source directory. @@ -226,7 +252,7 @@ namespace Semmle.BuildAnalyser /// List of source files which were mentioned in project files but /// do not exist on the file system. /// - public IEnumerable MissingSourceFiles => missingSources; + public IEnumerable MissingSourceFiles => sources.Where(s => !s.Value).Select(s => s.Key); /// /// Record that a particular reference couldn't be resolved. @@ -239,74 +265,101 @@ namespace Semmle.BuildAnalyser unresolvedReferences[id] = projectFile; } - /// - /// Performs an analysis of all .csproj files. 
- /// - void AnalyseProjectFiles() - { - AnalyseProjectFiles(sourceDir.GetFiles("*.csproj", SearchOption.AllDirectories)); - } + readonly TemporaryDirectory PackageDirectory; /// /// Reads all the source files and references from the given list of projects. /// /// The list of projects to analyse. - void AnalyseProjectFiles(FileInfo[] projectFiles) + void AnalyseProjectFiles(IEnumerable projectFiles) { - progressMonitor.AnalysingProjectFiles(projectFiles.Count()); - foreach (var proj in projectFiles) + AnalyseProject(proj); + } + + void AnalyseProject(FileInfo project) + { + if(!project.Exists) { - try - { - var csProj = new CsProjFile(proj); + progressMonitor.MissingProject(project.FullName); + return; + } - foreach (var @ref in csProj.References) - { - AssemblyInfo resolved = assemblyCache.ResolveReference(@ref); - if (!resolved.Valid) - { - UnresolvedReference(@ref, proj.FullName); - } - else - { - UseReference(resolved.Filename); - } - } + try + { + var csProj = new CsProjFile(project); - foreach (var src in csProj.Sources) - { - // Make a note of which source files the projects use. - // This information doesn't affect the build but is dumped - // as diagnostic output. - UseSource(new FileInfo(src)); - } - ++succeededProjects; - } - catch (Exception ex) // lgtm[cs/catch-of-all-exceptions] + foreach (var @ref in csProj.References) { - ++failedProjects; - progressMonitor.FailedProjectFile(proj.FullName, ex.Message); + AssemblyInfo resolved = assemblyCache.ResolveReference(@ref); + if (!resolved.Valid) + { + UnresolvedReference(@ref, project.FullName); + } + else + { + UseReference(resolved.Filename); + } } + + foreach (var src in csProj.Sources) + { + // Make a note of which source files the projects use. + // This information doesn't affect the build but is dumped + // as diagnostic output. 
+ UseSource(new FileInfo(src)); + } + + ++succeededProjects; + } + catch (Exception ex) // lgtm[cs/catch-of-all-exceptions] + { + ++failedProjects; + progressMonitor.FailedProjectFile(project.FullName, ex.Message); + } + + } + + void Restore(string projectOrSolution) + { + int exit = DotNet.RestoreToDirectory(projectOrSolution, PackageDirectory.DirInfo.FullName); + switch(exit) + { + case 0: + case 1: + // No errors + break; + default: + progressMonitor.CommandFailed("dotnet", $"restore \"{projectOrSolution}\"", exit); + break; } } - /// - /// Delete packages directory. - /// - public void Cleanup() + public void RestoreSolutions(IEnumerable solutions) { - if (nuget != null) nuget.Cleanup(progressMonitor); + Parallel.ForEach(solutions, new ParallelOptions { MaxDegreeOfParallelism = 4 }, Restore); } - /// - /// Analyse all project files in a given solution only. - /// - /// The filename of the solution. - public void AnalyseSolution(string solutionFile) + public void AnalyseSolutions(IEnumerable solutions) { - var sln = new SolutionFile(solutionFile); - AnalyseProjectFiles(sln.Projects.Select(p => new FileInfo(p)).ToArray()); + Parallel.ForEach(solutions, new ParallelOptions { MaxDegreeOfParallelism = 4 } , solutionFile => + { + try + { + var sln = new SolutionFile(solutionFile); + progressMonitor.AnalysingSolution(solutionFile); + AnalyseProjectFiles(sln.Projects.Select(p => new FileInfo(p)).Where(p => p.Exists)); + } + catch (Microsoft.Build.Exceptions.InvalidProjectFileException ex) + { + progressMonitor.FailedProjectFile(solutionFile, ex.BaseMessage); + } + }); + } + + public void Dispose() + { + PackageDirectory?.Dispose(); } } } diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/CsProjFile.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/CsProjFile.cs index 2c9e72c1eaa..1083c9b6257 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/CsProjFile.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/CsProjFile.cs @@ 
-10,12 +10,18 @@ namespace Semmle.BuildAnalyser /// class CsProjFile { + private string Filename { get; } + + private string Directory => Path.GetDirectoryName(Filename); + /// /// Reads the .csproj file. /// /// The .csproj file. public CsProjFile(FileInfo filename) { + Filename = filename.FullName; + try { // This can fail if the .csproj is invalid or has @@ -39,7 +45,7 @@ namespace Semmle.BuildAnalyser /// and there seems to be no way to make it succeed. Fails on Linux. /// /// The file to read. - public void ReadMsBuildProject(FileInfo filename) + private void ReadMsBuildProject(FileInfo filename) { var msbuildProject = new Microsoft.Build.Execution.ProjectInstance(filename.FullName); @@ -62,7 +68,7 @@ namespace Semmle.BuildAnalyser /// fallback if ReadMsBuildProject() fails. /// /// The .csproj file. - public void ReadProjectFileAsXml(FileInfo filename) + private void ReadProjectFileAsXml(FileInfo filename) { var projFile = new XmlDocument(); var mgr = new XmlNamespaceManager(projFile.NameTable); @@ -71,22 +77,48 @@ namespace Semmle.BuildAnalyser var projDir = filename.Directory; var root = projFile.DocumentElement; - references = - root.SelectNodes("/msbuild:Project/msbuild:ItemGroup/msbuild:Reference/@Include", mgr). - NodeList(). - Select(node => node.Value). - ToArray(); + // Figure out if it's dotnet core - var relativeCsIncludes = - root.SelectNodes("/msbuild:Project/msbuild:ItemGroup/msbuild:Compile/@Include", mgr). - NodeList(). - Select(node => node.Value). - ToArray(); + bool netCoreProjectFile = root.GetAttribute("Sdk") == "Microsoft.NET.Sdk"; - csFiles = relativeCsIncludes. - Select(cs => Path.DirectorySeparatorChar == '/' ? cs.Replace("\\", "/") : cs). - Select(f => Path.GetFullPath(Path.Combine(projDir.FullName, f))). - ToArray(); + if (netCoreProjectFile) + { + var relativeCsIncludes = + root.SelectNodes("/Project/ItemGroup/Compile/@Include", mgr). + NodeList(). + Select(node => node.Value). 
+ ToArray(); + + var explicitCsFiles = relativeCsIncludes. + Select(cs => Path.DirectorySeparatorChar == '/' ? cs.Replace("\\", "/") : cs). + Select(f => Path.GetFullPath(Path.Combine(projDir.FullName, f))); + + var additionalCsFiles = System.IO.Directory.GetFiles(Directory, "*.cs", SearchOption.AllDirectories); + + csFiles = explicitCsFiles.Concat(additionalCsFiles).ToArray(); + + references = new string[0]; + } + else + { + + references = + root.SelectNodes("/msbuild:Project/msbuild:ItemGroup/msbuild:Reference/@Include", mgr). + NodeList(). + Select(node => node.Value). + ToArray(); + + var relativeCsIncludes = + root.SelectNodes("/msbuild:Project/msbuild:ItemGroup/msbuild:Compile/@Include", mgr). + NodeList(). + Select(node => node.Value). + ToArray(); + + csFiles = relativeCsIncludes. + Select(cs => Path.DirectorySeparatorChar == '/' ? cs.Replace("\\", "/") : cs). + Select(f => Path.GetFullPath(Path.Combine(projDir.FullName, f))). + ToArray(); + } } string[] references; diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/DotNet.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/DotNet.cs new file mode 100644 index 00000000000..6edd217af8d --- /dev/null +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/DotNet.cs @@ -0,0 +1,21 @@ +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.IO; +using System.Linq; + +namespace Semmle.BuildAnalyser +{ + /// + /// Utilities to run the "dotnet" command. 
+ /// + static class DotNet + { + public static int RestoreToDirectory(string projectOrSolutionFile, string packageDirectory) + { + using var proc = Process.Start("dotnet", $"restore --no-dependencies \"{projectOrSolutionFile}\" --packages \"{packageDirectory}\" /p:DisableImplicitNuGetFallbackFolder=true"); + proc.WaitForExit(); + return proc.ExitCode; + } + } +} diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/NugetPackages.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/NugetPackages.cs index 1f0755f307f..2ea3afb6c69 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/NugetPackages.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/NugetPackages.cs @@ -1,10 +1,9 @@ -using System; +using Semmle.Util; +using System; using System.Collections.Generic; using System.Diagnostics; using System.IO; using System.Linq; -using System.Security.Cryptography; -using System.Text; namespace Semmle.BuildAnalyser { @@ -19,10 +18,10 @@ namespace Semmle.BuildAnalyser /// Create the package manager for a specified source tree. /// /// The source directory. - public NugetPackages(string sourceDir) + public NugetPackages(string sourceDir, TemporaryDirectory packageDirectory) { SourceDirectory = sourceDir; - PackageDirectory = computeTempDirectory(sourceDir); + PackageDirectory = packageDirectory; // Expect nuget.exe to be in a `nuget` directory under the directory containing this exe. var currentAssembly = System.Reflection.Assembly.GetExecutingAssembly().Location; @@ -50,45 +49,12 @@ namespace Semmle.BuildAnalyser /// public IEnumerable PackageFiles => packages; - // Whether to delete the packages directory prior to each run. - // Makes each build more reproducible. 
- const bool cleanupPackages = true; - - public void Cleanup(IProgressMonitor pm) - { - var packagesDirectory = new DirectoryInfo(PackageDirectory); - - if (packagesDirectory.Exists) - { - try - { - packagesDirectory.Delete(true); - } - catch (System.IO.IOException ex) - { - pm.Warning(string.Format("Couldn't delete package directory - it's probably held open by something else: {0}", ex.Message)); - } - } - } - /// /// Download the packages to the temp folder. /// /// The progress monitor used for reporting errors etc. public void InstallPackages(IProgressMonitor pm) { - if (cleanupPackages) - { - Cleanup(pm); - } - - var packagesDirectory = new DirectoryInfo(PackageDirectory); - - if (!Directory.Exists(PackageDirectory)) - { - packagesDirectory.Create(); - } - foreach (var package in packages) { RestoreNugetPackage(package.FullName, pm); @@ -109,31 +75,7 @@ namespace Semmle.BuildAnalyser /// This will be in the Temp location /// so as to not trample the source tree. /// - public string PackageDirectory - { - get; - private set; - } - - readonly SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider(); - - /// - /// Computes a unique temp directory for the packages associated - /// with this source tree. Use a SHA1 of the directory name. - /// - /// - /// The full path of the temp directory. - string computeTempDirectory(string srcDir) - { - var bytes = Encoding.Unicode.GetBytes(srcDir); - - var sha = sha1.ComputeHash(bytes); - var sb = new StringBuilder(); - foreach (var b in sha.Take(8)) - sb.AppendFormat("{0:x2}", b); - - return Path.Combine(Path.GetTempPath(), "Semmle", "packages", sb.ToString()); - } + public TemporaryDirectory PackageDirectory { get; } /// /// Restore all files in a specified package. 
@@ -171,16 +113,15 @@ namespace Semmle.BuildAnalyser try { - using (var p = Process.Start(pi)) - { - string output = p.StandardOutput.ReadToEnd(); - string error = p.StandardError.ReadToEnd(); + using var p = Process.Start(pi); - p.WaitForExit(); - if (p.ExitCode != 0) - { - pm.FailedNugetCommand(pi.FileName, pi.Arguments, output + error); - } + string output = p.StandardOutput.ReadToEnd(); + string error = p.StandardError.ReadToEnd(); + + p.WaitForExit(); + if (p.ExitCode != 0) + { + pm.FailedNugetCommand(pi.FileName, pi.Arguments, output + error); } } catch (Exception ex) diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Program.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Program.cs index e0367fa63c1..106771faef2 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Program.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Program.cs @@ -23,7 +23,7 @@ namespace Semmle.Extraction.CSharp.Standalone /// /// Searches for source/references and creates separate extractions. /// - class Analysis + class Analysis : IDisposable { readonly ILogger logger; @@ -71,12 +71,9 @@ namespace Semmle.Extraction.CSharp.Standalone projectExtraction.Sources.AddRange(options.SolutionFile == null ? buildAnalysis.AllSourceFiles : buildAnalysis.ProjectSourceFiles); } - /// - /// Delete any Nuget assemblies. 
- /// - public void Cleanup() + public void Dispose() { - buildAnalysis.Cleanup(); + buildAnalysis.Dispose(); } }; @@ -85,8 +82,9 @@ namespace Semmle.Extraction.CSharp.Standalone static int Main(string[] args) { var options = Options.Create(args); + // options.CIL = true; // To do: Enable this var output = new ConsoleLogger(options.Verbosity); - var a = new Analysis(output); + using var a = new Analysis(output); if (options.Help) { @@ -97,6 +95,8 @@ namespace Semmle.Extraction.CSharp.Standalone if (options.Errors) return 1; + var start = DateTime.Now; + output.Log(Severity.Info, "Running C# standalone extractor"); a.AnalyseProjects(options); int sourceFiles = a.Extraction.Sources.Count(); @@ -117,10 +117,9 @@ namespace Semmle.Extraction.CSharp.Standalone new ExtractionProgress(output), new FileLogger(options.Verbosity, Extractor.GetCSharpLogPath()), options); - output.Log(Severity.Info, "Extraction complete"); + output.Log(Severity.Info, $"Extraction completed in {DateTime.Now-start}"); } - a.Cleanup(); return 0; } @@ -151,7 +150,7 @@ namespace Semmle.Extraction.CSharp.Standalone public void MissingSummary(int missingTypes, int missingNamespaces) { - logger.Log(Severity.Info, "Failed to resolve {0} types and {1} namespaces", missingTypes, missingNamespaces); + logger.Log(Severity.Info, "Failed to resolve {0} types in {1} namespaces", missingTypes, missingNamespaces); } } } diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/ProgressMonitor.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/ProgressMonitor.cs index f4bde55ec55..5bbe1e3a5b2 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/ProgressMonitor.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/ProgressMonitor.cs @@ -1,4 +1,5 @@ using Semmle.Util.Logging; +using System; namespace Semmle.BuildAnalyser { @@ -9,15 +10,17 @@ namespace Semmle.BuildAnalyser { void FindingFiles(string dir); void UnresolvedReference(string id, string project); - void 
AnalysingProjectFiles(int count); + void AnalysingSolution(string filename); void FailedProjectFile(string filename, string reason); void FailedNugetCommand(string exe, string args, string message); void NugetInstall(string package); void ResolvedReference(string filename); - void Summary(int existingSources, int usedSources, int missingSources, int references, int unresolvedReferences, int resolvedConflicts, int totalProjects, int failedProjects); + void Summary(int existingSources, int usedSources, int missingSources, int references, int unresolvedReferences, int resolvedConflicts, int totalProjects, int failedProjects, TimeSpan analysisTime); void Warning(string message); void ResolvedConflict(string asm1, string asm2); void MissingProject(string projectFile); + void CommandFailed(string exe, string arguments, int exitCode); + void MissingNuGet(); } class ProgressMonitor : IProgressMonitor @@ -46,9 +49,9 @@ namespace Semmle.BuildAnalyser logger.Log(Severity.Debug, "Unresolved {0} referenced by {1}", id, project); } - public void AnalysingProjectFiles(int count) + public void AnalysingSolution(string filename) { - logger.Log(Severity.Info, "Analyzing project files..."); + logger.Log(Severity.Info, $"Analyzing {filename}..."); } public void FailedProjectFile(string filename, string reason) @@ -73,7 +76,9 @@ namespace Semmle.BuildAnalyser } public void Summary(int existingSources, int usedSources, int missingSources, - int references, int unresolvedReferences, int resolvedConflicts, int totalProjects, int failedProjects) + int references, int unresolvedReferences, + int resolvedConflicts, int totalProjects, int failedProjects, + TimeSpan analysisTime) { logger.Log(Severity.Info, ""); logger.Log(Severity.Info, "Build analysis summary:"); @@ -85,6 +90,7 @@ namespace Semmle.BuildAnalyser logger.Log(Severity.Info, "{0, 6} resolved assembly conflicts", resolvedConflicts); logger.Log(Severity.Info, "{0, 6} projects", totalProjects); logger.Log(Severity.Info, "{0, 6} 
missing/failed projects", failedProjects); + logger.Log(Severity.Info, "Build analysis completed in {0}", analysisTime); } public void Warning(string message) @@ -94,12 +100,22 @@ namespace Semmle.BuildAnalyser public void ResolvedConflict(string asm1, string asm2) { - logger.Log(Severity.Info, "Resolved {0} as {1}", asm1, asm2); + logger.Log(Severity.Debug, "Resolved {0} as {1}", asm1, asm2); } public void MissingProject(string projectFile) { logger.Log(Severity.Info, "Solution is missing {0}", projectFile); } + + public void CommandFailed(string exe, string arguments, int exitCode) + { + logger.Log(Severity.Error, $"Command {exe} {arguments} failed with exit code {exitCode}"); + } + + public void MissingNuGet() + { + logger.Log(Severity.Error, "Missing nuget.exe"); + } } } diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Semmle.Extraction.CSharp.Standalone.csproj b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Semmle.Extraction.CSharp.Standalone.csproj index 4cf0274b737..f9efd0d9ebb 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Semmle.Extraction.CSharp.Standalone.csproj +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/Semmle.Extraction.CSharp.Standalone.csproj @@ -1,4 +1,4 @@ - + Exe @@ -14,6 +14,7 @@ + diff --git a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/SolutionFile.cs b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/SolutionFile.cs index b1a3edd4cf6..b4551dd8024 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp.Standalone/SolutionFile.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp.Standalone/SolutionFile.cs @@ -12,6 +12,8 @@ namespace Semmle.BuildAnalyser { readonly Microsoft.Build.Construction.SolutionFile solutionFile; + private string FullPath { get; } + /// /// Read the file. /// @@ -19,8 +21,8 @@ namespace Semmle.BuildAnalyser public SolutionFile(string filename) { // SolutionFile.Parse() expects a rooted path. 
- var fullPath = Path.GetFullPath(filename); - solutionFile = Microsoft.Build.Construction.SolutionFile.Parse(fullPath); + FullPath = Path.GetFullPath(filename); + solutionFile = Microsoft.Build.Construction.SolutionFile.Parse(FullPath); } /// diff --git a/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/Access.cs b/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/Access.cs index 0488fe84ffe..6962e8381d9 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/Access.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/Access.cs @@ -45,7 +45,10 @@ namespace Semmle.Extraction.CSharp.Entities.Expressions Access(ExpressionNodeInfo info, ISymbol symbol, bool implicitThis, IEntity target) : base(info.SetKind(AccessKind(info.Context, symbol))) { - cx.TrapWriter.Writer.expr_access(this, target); + if (!(target is null)) + { + cx.TrapWriter.Writer.expr_access(this, target); + } if (implicitThis && !symbol.IsStatic) { diff --git a/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/MemberAccess.cs b/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/MemberAccess.cs index e41ef0edf23..0bc84ca9c0c 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/MemberAccess.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp/Entities/Expressions/MemberAccess.cs @@ -71,7 +71,9 @@ namespace Semmle.Extraction.CSharp.Entities.Expressions if (symbol == null) { info.Context.ModelError(info.Node, "Failed to determine symbol for member access"); - return new MemberAccess(info.SetKind(ExprKind.UNKNOWN), expression, symbol); + // Default to property access - this can still give useful results but + // the target of the expression should be checked in QL. 
+ return new MemberAccess(info.SetKind(ExprKind.PROPERTY_ACCESS), expression, symbol); } ExprKind kind; diff --git a/csharp/extractor/Semmle.Extraction.CSharp/Entities/NamespaceDeclaration.cs b/csharp/extractor/Semmle.Extraction.CSharp/Entities/NamespaceDeclaration.cs index a0dd41aaafb..7a14bb719fc 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp/Entities/NamespaceDeclaration.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp/Entities/NamespaceDeclaration.cs @@ -23,7 +23,8 @@ namespace Semmle.Extraction.CSharp.Entities protected override void Populate(TextWriter trapFile) { - var ns = Namespace.Create(cx, (INamespaceSymbol)cx.GetModel(Node).GetSymbolInfo(Node.Name).Symbol); + var @namespace = (INamespaceSymbol) cx.GetModel(Node).GetSymbolInfo(Node.Name).Symbol; + var ns = Namespace.Create(cx, @namespace); trapFile.namespace_declarations(this, ns); trapFile.namespace_declaration_location(this, cx.Create(Node.Name.GetLocation())); diff --git a/csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/NamedType.cs b/csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/NamedType.cs index e22d32c0d01..cecec5bc028 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/NamedType.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp/Entities/Types/NamedType.cs @@ -25,7 +25,7 @@ namespace Semmle.Extraction.CSharp.Entities { if (symbol.TypeKind == TypeKind.Error) { - Context.Extractor.MissingType(symbol.ToString()); + Context.Extractor.MissingType(symbol.ToString(), Context.FromSource); return; } diff --git a/csharp/extractor/Semmle.Extraction.CSharp/Entities/UsingDirective.cs b/csharp/extractor/Semmle.Extraction.CSharp/Entities/UsingDirective.cs index 4fdbe7c18ad..02b67efc164 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp/Entities/UsingDirective.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp/Entities/UsingDirective.cs @@ -32,7 +32,7 @@ namespace Semmle.Extraction.CSharp.Entities if (namespaceSymbol == null) { - 
cx.Extractor.MissingNamespace(Node.Name.ToFullString()); + cx.Extractor.MissingNamespace(Node.Name.ToFullString(), cx.FromSource); cx.ModelError(Node, "Namespace not found"); return; } diff --git a/csharp/extractor/Semmle.Extraction.CSharp/SymbolExtensions.cs b/csharp/extractor/Semmle.Extraction.CSharp/SymbolExtensions.cs index c2cb6a367eb..6050ad910a5 100644 --- a/csharp/extractor/Semmle.Extraction.CSharp/SymbolExtensions.cs +++ b/csharp/extractor/Semmle.Extraction.CSharp/SymbolExtensions.cs @@ -214,7 +214,6 @@ namespace Semmle.Extraction.CSharp static void BuildNamedTypeId(this INamedTypeSymbol named, Context cx, TextWriter trapFile, Action subTermAction) { bool prefixAssembly = true; - if (cx.Extractor.Standalone) prefixAssembly = false; if (named.ContainingAssembly is null) prefixAssembly = false; if (named.IsTupleType) diff --git a/csharp/extractor/Semmle.Extraction/Context.cs b/csharp/extractor/Semmle.Extraction/Context.cs index 93f99381858..918642f198d 100644 --- a/csharp/extractor/Semmle.Extraction/Context.cs +++ b/csharp/extractor/Semmle.Extraction/Context.cs @@ -155,7 +155,7 @@ namespace Semmle.Extraction #if DEBUG_LABELS using (var id = new StringWriter()) { - entity.WriteId(id); + entity.WriteQuotedId(id); CheckEntityHasUniqueLabel(id.ToString(), entity); } #endif @@ -270,6 +270,8 @@ namespace Semmle.Extraction TrapWriter = trapWriter; } + public bool FromSource => Scope.FromSource; + public bool IsGlobalContext => Scope.IsGlobalScope; public readonly ICommentGenerator CommentGenerator = new CommentProcessor(); diff --git a/csharp/extractor/Semmle.Extraction/ExtractionScope.cs b/csharp/extractor/Semmle.Extraction/ExtractionScope.cs index 7f4f599fe5c..60daff8d013 100644 --- a/csharp/extractor/Semmle.Extraction/ExtractionScope.cs +++ b/csharp/extractor/Semmle.Extraction/ExtractionScope.cs @@ -25,6 +25,8 @@ namespace Semmle.Extraction bool InFileScope(string path); bool IsGlobalScope { get; } + + bool FromSource { get; } } /// @@ -49,6 +51,8 @@ namespace 
Semmle.Extraction public bool InScope(ISymbol symbol) => SymbolEqualityComparer.Default.Equals(symbol.ContainingAssembly, assembly) || SymbolEqualityComparer.Default.Equals(symbol, assembly); + + public bool FromSource => false; } /// @@ -68,5 +72,7 @@ namespace Semmle.Extraction public bool InFileScope(string path) => path == sourceTree.FilePath; public bool InScope(ISymbol symbol) => symbol.Locations.Any(loc => loc.SourceTree == sourceTree); + + public bool FromSource => true; } } diff --git a/csharp/extractor/Semmle.Extraction/Extractor.cs b/csharp/extractor/Semmle.Extraction/Extractor.cs index e470d3258ec..13750c1aa5c 100644 --- a/csharp/extractor/Semmle.Extraction/Extractor.cs +++ b/csharp/extractor/Semmle.Extraction/Extractor.cs @@ -50,13 +50,15 @@ namespace Semmle.Extraction /// Record a new error type. /// /// The display name of the type, qualified where possible. - void MissingType(string fqn); + /// If the missing type was referenced from a source file. + void MissingType(string fqn, bool fromSource); /// /// Record an unresolved `using namespace` directive. /// /// The full name of the namespace. - void MissingNamespace(string fqn); + /// If the missing namespace was referenced from a source file. + void MissingNamespace(string fqn, bool fromSource); /// /// The list of missing types. 
@@ -167,16 +169,22 @@ namespace Semmle.Extraction readonly ISet missingTypes = new SortedSet(); readonly ISet missingNamespaces = new SortedSet(); - public void MissingType(string fqn) + public void MissingType(string fqn, bool fromSource) { - lock (mutex) - missingTypes.Add(fqn); + if (fromSource) + { + lock (mutex) + missingTypes.Add(fqn); + } } - public void MissingNamespace(string fqdn) + public void MissingNamespace(string fqdn, bool fromSource) { - lock (mutex) - missingNamespaces.Add(fqdn); + if (fromSource) + { + lock (mutex) + missingNamespaces.Add(fqdn); + } } public Context CreateContext(Compilation c, TrapWriter trapWriter, IExtractionScope scope) diff --git a/csharp/extractor/Semmle.Util.Tests/Semmle.Util.Tests.csproj b/csharp/extractor/Semmle.Util.Tests/Semmle.Util.Tests.csproj index d0ef95a4d4c..a82997aea63 100644 --- a/csharp/extractor/Semmle.Util.Tests/Semmle.Util.Tests.csproj +++ b/csharp/extractor/Semmle.Util.Tests/Semmle.Util.Tests.csproj @@ -5,6 +5,7 @@ netcoreapp3.0 false win-x64;linux-x64;osx-x64 + enable diff --git a/csharp/extractor/Semmle.Util/ActionMap.cs b/csharp/extractor/Semmle.Util/ActionMap.cs index 019410be7b2..c9fecbf9da6 100644 --- a/csharp/extractor/Semmle.Util/ActionMap.cs +++ b/csharp/extractor/Semmle.Util/ActionMap.cs @@ -9,20 +9,19 @@ namespace Semmle.Util /// /// /// - public class ActionMap + public class ActionMap where Key : notnull { public void Add(Key key, Value value) { - Action a; - if (actions.TryGetValue(key, out a)) + + if (actions.TryGetValue(key, out var a)) a(value); values[key] = value; } public void OnAdd(Key key, Action action) { - Action a; - if (actions.TryGetValue(key, out a)) + if (actions.TryGetValue(key, out var a)) { actions[key] = a + action; } diff --git a/csharp/extractor/Semmle.Util/CanonicalPathCache.cs b/csharp/extractor/Semmle.Util/CanonicalPathCache.cs index 3610bb0d46e..bbc8ab995b4 100644 --- a/csharp/extractor/Semmle.Util/CanonicalPathCache.cs +++ 
b/csharp/extractor/Semmle.Util/CanonicalPathCache.cs @@ -127,8 +127,8 @@ namespace Semmle.Util if (parent != null) { - string name = Path.GetFileName(path); - string parentPath = cache.GetCanonicalPath(parent.FullName); + var name = Path.GetFileName(path); + var parentPath = cache.GetCanonicalPath(parent.FullName); try { string[] entries = Directory.GetFileSystemEntries(parentPath, name); @@ -313,14 +313,15 @@ namespace Semmle.Util /// The canonical path. public string GetCanonicalPath(string path) { - string canonicalPath; lock (cache) - if (!cache.TryGetValue(path, out canonicalPath)) + { + if (!cache.TryGetValue(path, out var canonicalPath)) { canonicalPath = pathStrategy.GetCanonicalPath(path, this); AddToCache(path, canonicalPath); } - return canonicalPath; + return canonicalPath; + } } } } diff --git a/csharp/extractor/Semmle.Util/CommandLineExtensions.cs b/csharp/extractor/Semmle.Util/CommandLineExtensions.cs index b7c5166f2f0..01a581d612d 100644 --- a/csharp/extractor/Semmle.Util/CommandLineExtensions.cs +++ b/csharp/extractor/Semmle.Util/CommandLineExtensions.cs @@ -18,7 +18,7 @@ namespace Semmle.Util var found = false; foreach (var arg in commandLineArguments.Where(arg => arg.StartsWith('@')).Select(arg => arg.Substring(1))) { - string line; + string? line; using (StreamReader file = new StreamReader(arg)) while ((line = file.ReadLine()) != null) textWriter.WriteLine(line); diff --git a/csharp/extractor/Semmle.Util/DictionaryExtensions.cs b/csharp/extractor/Semmle.Util/DictionaryExtensions.cs index 7a15ce10c92..bb0d732a17f 100644 --- a/csharp/extractor/Semmle.Util/DictionaryExtensions.cs +++ b/csharp/extractor/Semmle.Util/DictionaryExtensions.cs @@ -9,10 +9,9 @@ namespace Semmle.Util /// dictionary. If a list does not already exist, a new list is /// created. 
/// - public static void AddAnother(this Dictionary> dict, T1 key, T2 element) + public static void AddAnother(this Dictionary> dict, T1 key, T2 element) where T1:notnull { - List list; - if (!dict.TryGetValue(key, out list)) + if (!dict.TryGetValue(key, out var list)) { list = new List(); dict[key] = list; diff --git a/csharp/extractor/Semmle.Util/FileRenamer.cs b/csharp/extractor/Semmle.Util/FileRenamer.cs new file mode 100644 index 00000000000..ad5001f7e13 --- /dev/null +++ b/csharp/extractor/Semmle.Util/FileRenamer.cs @@ -0,0 +1,34 @@ +using System; +using System.Collections.Generic; +using System.IO; +using System.Linq; + +namespace Semmle.Util +{ + /// + /// Utility to temporarily rename a set of files. + /// + public sealed class FileRenamer : IDisposable + { + readonly string[] files; + const string suffix = ".codeqlhidden"; + + public FileRenamer(IEnumerable oldFiles) + { + files = oldFiles.Select(f => f.FullName).ToArray(); + + foreach (var file in files) + { + File.Move(file, file + suffix); + } + } + + public void Dispose() + { + foreach (var file in files) + { + File.Move(file + suffix, file); + } + } + } +} diff --git a/csharp/extractor/Semmle.Util/FileUtils.cs b/csharp/extractor/Semmle.Util/FileUtils.cs index e5f10a18521..32e2ed88e60 100644 --- a/csharp/extractor/Semmle.Util/FileUtils.cs +++ b/csharp/extractor/Semmle.Util/FileUtils.cs @@ -62,7 +62,7 @@ namespace Semmle.Util /// /// Returns null of no path can be found. /// - public static string FindProgramOnPath(string prog) + public static string? FindProgramOnPath(string prog) { var paths = Environment.GetEnvironmentVariable("PATH")?.Split(Path.PathSeparator); string[] exes; diff --git a/csharp/extractor/Semmle.Util/FuzzyDictionary.cs b/csharp/extractor/Semmle.Util/FuzzyDictionary.cs index a00d75fbbe5..6f364904b06 100644 --- a/csharp/extractor/Semmle.Util/FuzzyDictionary.cs +++ b/csharp/extractor/Semmle.Util/FuzzyDictionary.cs @@ -37,7 +37,7 @@ namespace Semmle.Util /// /// /// The value type. 
- public class FuzzyDictionary + public class FuzzyDictionary where T:class { // All data items indexed by the "base string" (stripped of numbers) readonly Dictionary>> index = new Dictionary>>(); @@ -61,7 +61,7 @@ namespace Semmle.Util /// Vector 1 /// Vector 2 /// The Hamming Distance. - static int HammingDistance(IEnumerable v1, IEnumerable v2) + static int HammingDistance(IEnumerable v1, IEnumerable v2) where U: notnull { return v1.Zip(v2, (x, y) => x.Equals(y) ? 0 : 1).Sum(); } @@ -72,11 +72,10 @@ namespace Semmle.Util /// The query string. /// The distance between the query string and the stored string. /// The best match, or null (default). - public T FindMatch(string query, out int distance) + public T? FindMatch(string query, out int distance) { string root = StripDigits(query); - List> list; - if (!index.TryGetValue(root, out list)) + if (!index.TryGetValue(root, out var list)) { distance = 0; return default(T); @@ -93,9 +92,9 @@ namespace Semmle.Util /// The distance function. /// The distance between the query and the stored string. /// The stored value. - static T BestMatch(string query, IEnumerable> candidates, Func distance, out int bestDistance) + static T? BestMatch(string query, IEnumerable> candidates, Func distance, out int bestDistance) { - T bestMatch = default(T); + T? bestMatch = default(T); bestDistance = 0; bool first = true; diff --git a/csharp/extractor/Semmle.Util/IEnumerableExtensions.cs b/csharp/extractor/Semmle.Util/IEnumerableExtensions.cs index d53fcf99ff4..7665dedfa70 100644 --- a/csharp/extractor/Semmle.Util/IEnumerableExtensions.cs +++ b/csharp/extractor/Semmle.Util/IEnumerableExtensions.cs @@ -93,7 +93,7 @@ namespace Semmle.Util /// The type of the item. /// The list of items to hash. /// The hash code. 
- public static int SequenceHash(this IEnumerable items) + public static int SequenceHash(this IEnumerable items) where T: notnull { int h = 0; foreach (var i in items) diff --git a/csharp/extractor/Semmle.Util/LineCounter.cs b/csharp/extractor/Semmle.Util/LineCounter.cs index c122d5865be..f4f6758a250 100644 --- a/csharp/extractor/Semmle.Util/LineCounter.cs +++ b/csharp/extractor/Semmle.Util/LineCounter.cs @@ -31,9 +31,9 @@ namespace Semmle.Util //#################### PUBLIC METHODS #################### #region - public override bool Equals(Object other) + public override bool Equals(object? other) { - LineCounts rhs = other as LineCounts; + var rhs = other as LineCounts; return rhs != null && Total == rhs.Total && Code == rhs.Code && Comment == rhs.Comment; } diff --git a/csharp/extractor/Semmle.Util/Logger.cs b/csharp/extractor/Semmle.Util/Logger.cs index d6405724734..cd353bd4f0f 100644 --- a/csharp/extractor/Semmle.Util/Logger.cs +++ b/csharp/extractor/Semmle.Util/Logger.cs @@ -67,8 +67,8 @@ namespace Semmle.Util.Logging try { - var dir = Path.GetDirectoryName(outputFile); - if (dir.Length > 0 && !System.IO.Directory.Exists(dir)) + string? dir = Path.GetDirectoryName(outputFile); + if (!string.IsNullOrEmpty(dir) && !System.IO.Directory.Exists(dir)) Directory.CreateDirectory(dir); writer = new PidStreamWriter(new FileStream(outputFile, FileMode.Append, FileAccess.Write, FileShare.ReadWrite, 8192)); diff --git a/csharp/extractor/Semmle.Util/LoggerUtils.cs b/csharp/extractor/Semmle.Util/LoggerUtils.cs index 39a40c37158..d94ff6f189f 100644 --- a/csharp/extractor/Semmle.Util/LoggerUtils.cs +++ b/csharp/extractor/Semmle.Util/LoggerUtils.cs @@ -21,7 +21,7 @@ namespace Semmle.Util private readonly string prefix = "[" + Process.GetCurrentProcess().Id + "] "; - public override void WriteLine(string value) + public override void WriteLine(string? 
value) { lock (mutex) { @@ -29,9 +29,9 @@ namespace Semmle.Util } } - public override void WriteLine(string value, object[] args) + public override void WriteLine(string? value, object?[] args) { - WriteLine(String.Format(value, args)); + WriteLine(value is null ? value : String.Format(value, args)); } readonly object mutex = new object(); diff --git a/csharp/extractor/Semmle.Util/ProcessStartInfoExtensions.cs b/csharp/extractor/Semmle.Util/ProcessStartInfoExtensions.cs index 0d9d6b04a13..5252372a58b 100644 --- a/csharp/extractor/Semmle.Util/ProcessStartInfoExtensions.cs +++ b/csharp/extractor/Semmle.Util/ProcessStartInfoExtensions.cs @@ -14,7 +14,7 @@ namespace Semmle.Util stdout = new List(); using (var process = Process.Start(pi)) { - string s; + string? s; do { s = process.StandardOutput.ReadLine(); diff --git a/csharp/extractor/Semmle.Util/Semmle.Util.csproj b/csharp/extractor/Semmle.Util/Semmle.Util.csproj index 4ef3eda35d9..39f0c7cdedb 100644 --- a/csharp/extractor/Semmle.Util/Semmle.Util.csproj +++ b/csharp/extractor/Semmle.Util/Semmle.Util.csproj @@ -6,6 +6,7 @@ Semmle.Util false win-x64;linux-x64;osx-x64 + enable diff --git a/csharp/extractor/Semmle.Util/SharedReference.cs b/csharp/extractor/Semmle.Util/SharedReference.cs index 6870c5acb35..ba87caeefaa 100644 --- a/csharp/extractor/Semmle.Util/SharedReference.cs +++ b/csharp/extractor/Semmle.Util/SharedReference.cs @@ -13,6 +13,6 @@ namespace Semmle.Util /// /// The shared object to which different parts of the code want to refer. /// - public T Obj { get; set; } + public T? 
Obj { get; set; } } } diff --git a/csharp/extractor/Semmle.Util/TemporaryDirectory.cs b/csharp/extractor/Semmle.Util/TemporaryDirectory.cs new file mode 100644 index 00000000000..a155372c9d8 --- /dev/null +++ b/csharp/extractor/Semmle.Util/TemporaryDirectory.cs @@ -0,0 +1,30 @@ +using System; +using System.IO; +using System.Linq; +using System.Security.Cryptography; +using System.Text; + +namespace Semmle.Util +{ + /// + /// A temporary directory that is created within the system temp directory. + /// When this object is disposed, the directory is deleted. + /// + public sealed class TemporaryDirectory : IDisposable + { + public DirectoryInfo DirInfo { get; } + + public TemporaryDirectory(string name) + { + DirInfo = new DirectoryInfo(name); + DirInfo.Create(); + } + + public void Dispose() + { + DirInfo.Delete(true); + } + + public override string ToString() => DirInfo.FullName.ToString(); + } +} diff --git a/csharp/ql/src/Security Features/CWE-091/XMLInjection.ql b/csharp/ql/src/Security Features/CWE-091/XMLInjection.ql index 6b9acf69e3c..ac9a3d90dbd 100644 --- a/csharp/ql/src/Security Features/CWE-091/XMLInjection.ql +++ b/csharp/ql/src/Security Features/CWE-091/XMLInjection.ql @@ -11,7 +11,7 @@ */ import csharp -import semmle.code.csharp.dataflow.flowsources.Remote +import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.Xml /** diff --git a/csharp/ql/src/Security Features/CWE-114/AssemblyPathInjection.ql b/csharp/ql/src/Security Features/CWE-114/AssemblyPathInjection.ql index 2b87d61f193..0aab5e3181a 100644 --- a/csharp/ql/src/Security Features/CWE-114/AssemblyPathInjection.ql +++ b/csharp/ql/src/Security Features/CWE-114/AssemblyPathInjection.ql @@ -12,7 +12,7 @@ */ import csharp -import semmle.code.csharp.dataflow.flowsources.Remote +import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.commons.Util /** diff --git a/csharp/ql/src/Security 
Features/CWE-134/UncontrolledFormatString.ql b/csharp/ql/src/Security Features/CWE-134/UncontrolledFormatString.ql index 1b42ea7399d..3b9d6af7d14 100644 --- a/csharp/ql/src/Security Features/CWE-134/UncontrolledFormatString.ql +++ b/csharp/ql/src/Security Features/CWE-134/UncontrolledFormatString.ql @@ -11,8 +11,8 @@ */ import csharp -import semmle.code.csharp.dataflow.flowsources.Remote -import semmle.code.csharp.dataflow.flowsources.Local +import semmle.code.csharp.security.dataflow.flowsources.Remote +import semmle.code.csharp.security.dataflow.flowsources.Local import semmle.code.csharp.dataflow.TaintTracking import semmle.code.csharp.frameworks.Format import DataFlow::PathGraph diff --git a/csharp/ql/src/Security Features/CWE-201/ExposureInTransmittedData.ql b/csharp/ql/src/Security Features/CWE-201/ExposureInTransmittedData.ql index 4419a4f1312..b8b18c6b56d 100644 --- a/csharp/ql/src/Security Features/CWE-201/ExposureInTransmittedData.ql +++ b/csharp/ql/src/Security Features/CWE-201/ExposureInTransmittedData.ql @@ -11,8 +11,7 @@ import csharp import semmle.code.csharp.security.SensitiveActions -import semmle.code.csharp.security.dataflow.XSS -import semmle.code.csharp.security.dataflow.Email +import semmle.code.csharp.security.dataflow.flowsinks.Remote import semmle.code.csharp.frameworks.system.data.Common import semmle.code.csharp.frameworks.System import semmle.code.csharp.dataflow.DataFlow::DataFlow::PathGraph @@ -42,11 +41,7 @@ class TaintTrackingConfiguration extends TaintTracking::Configuration { ) } - override predicate isSink(DataFlow::Node sink) { - sink instanceof XSS::Sink - or - sink instanceof Email::Sink - } + override predicate isSink(DataFlow::Node sink) { sink instanceof RemoteFlowSink } } from TaintTrackingConfiguration configuration, DataFlow::PathNode source, DataFlow::PathNode sink diff --git a/csharp/ql/src/Security Features/CWE-209/ExceptionInformationExposure.ql b/csharp/ql/src/Security Features/CWE-209/ExceptionInformationExposure.ql 
index 8cebb2601c4..d9db652c8d8 100644 --- a/csharp/ql/src/Security Features/CWE-209/ExceptionInformationExposure.ql +++ b/csharp/ql/src/Security Features/CWE-209/ExceptionInformationExposure.ql @@ -14,7 +14,7 @@ import csharp import semmle.code.csharp.frameworks.System -import semmle.code.csharp.security.dataflow.XSS +import semmle.code.csharp.security.dataflow.flowsinks.Remote import semmle.code.csharp.dataflow.DataFlow::DataFlow::PathGraph /** @@ -46,7 +46,7 @@ class TaintTrackingConfiguration extends TaintTracking::Configuration { ) } - override predicate isSink(DataFlow::Node sink) { sink instanceof XSS::Sink } + override predicate isSink(DataFlow::Node sink) { sink instanceof RemoteFlowSink } override predicate isSanitizer(DataFlow::Node sanitizer) { // Do not flow through Message diff --git a/csharp/ql/src/Security Features/CWE-838/InappropriateEncoding.ql b/csharp/ql/src/Security Features/CWE-838/InappropriateEncoding.ql index 88a5c3da4ed..88a0b970a7a 100644 --- a/csharp/ql/src/Security Features/CWE-838/InappropriateEncoding.ql +++ b/csharp/ql/src/Security Features/CWE-838/InappropriateEncoding.ql @@ -16,7 +16,7 @@ import semmle.code.csharp.frameworks.system.Net import semmle.code.csharp.frameworks.system.Web import semmle.code.csharp.frameworks.system.web.UI import semmle.code.csharp.security.dataflow.SqlInjection -import semmle.code.csharp.security.dataflow.XSS +import semmle.code.csharp.security.dataflow.flowsinks.Html import semmle.code.csharp.security.dataflow.UrlRedirect import semmle.code.csharp.security.Sanitizers import semmle.code.csharp.dataflow.DataFlow2::DataFlow2 @@ -114,7 +114,7 @@ module EncodingConfigurations { override string getKind() { result = "HTML expression" } - override predicate requiresEncoding(Node n) { n instanceof XSS::HtmlSink } + override predicate requiresEncoding(Node n) { n instanceof HtmlSink } override predicate isPossibleEncodedValue(Expr e) { e instanceof HtmlSanitizedExpr } } diff --git a/csharp/ql/src/Security 
Features/InsecureRandomness.ql b/csharp/ql/src/Security Features/InsecureRandomness.ql index bd16dda8816..ef1665819f7 100644 --- a/csharp/ql/src/Security Features/InsecureRandomness.ql +++ b/csharp/ql/src/Security Features/InsecureRandomness.ql @@ -16,7 +16,7 @@ import semmle.code.csharp.frameworks.Test import semmle.code.csharp.dataflow.DataFlow::DataFlow::PathGraph module Random { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.security.SensitiveActions /** diff --git a/csharp/ql/src/codeql-suites/csharp-code-scanning.qls b/csharp/ql/src/codeql-suites/csharp-code-scanning.qls new file mode 100644 index 00000000000..3646204da7d --- /dev/null +++ b/csharp/ql/src/codeql-suites/csharp-code-scanning.qls @@ -0,0 +1,4 @@ +- description: Standard Code Scanning queries for C# +- qlpack: codeql-csharp +- apply: code-scanning-selectors.yml + from: codeql-suite-helpers diff --git a/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/PublicCallableParameter.qll b/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/PublicCallableParameter.qll index 07dc38d320d..cf69f54717a 100644 --- a/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/PublicCallableParameter.qll +++ b/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/PublicCallableParameter.qll @@ -1,9 +1,10 @@ /** + * DEPRECATED. + * * Provides classes representing data flow sources for parameters of public callables. 
*/ import csharp -private import semmle.code.csharp.frameworks.WCF /** * A parameter of a public callable, for example `p` in diff --git a/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/Remote.qll b/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/Remote.qll index f8240583108..07a23b36c26 100644 --- a/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/Remote.qll +++ b/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/Remote.qll @@ -1,218 +1,7 @@ /** - * Provides classes representing data flow sources for remote user input. + * DEPRECATED. + * + * Use `semmle.code.csharp.security.dataflow.flowsources.Remote` instead. */ -import csharp -private import semmle.code.csharp.frameworks.system.Net -private import semmle.code.csharp.frameworks.system.Web -private import semmle.code.csharp.frameworks.system.web.Http -private import semmle.code.csharp.frameworks.system.web.Mvc -private import semmle.code.csharp.frameworks.system.web.Services -private import semmle.code.csharp.frameworks.system.web.ui.WebControls -private import semmle.code.csharp.frameworks.WCF -private import semmle.code.csharp.frameworks.microsoft.Owin -private import semmle.code.csharp.frameworks.microsoft.AspNetCore - -/** A data flow source of remote user input. */ -abstract class RemoteFlowSource extends DataFlow::Node { - /** Gets a string that describes the type of this remote flow source. */ - abstract string getSourceType(); -} - -/** A data flow source of remote user input (ASP.NET). */ -abstract class AspNetRemoteFlowSource extends RemoteFlowSource { } - -/** A member containing an ASP.NET query string. 
*/ -class AspNetQueryStringMember extends Member { - AspNetQueryStringMember() { - exists(RefType t | - t instanceof SystemWebHttpRequestClass or - t instanceof SystemNetHttpListenerRequestClass or - t instanceof SystemWebHttpRequestBaseClass - | - this = t.getProperty(getHttpRequestFlowPropertyNames()) or - this.(Field).getType() = t or - this.(Property).getType() = t or - this.(Callable).getReturnType() = t - ) - } -} - -/** - * Gets the names of the properties in `HttpRequest` classes that should propagate taint out of the - * request. - */ -private string getHttpRequestFlowPropertyNames() { - result = "QueryString" or - result = "Headers" or - result = "RawUrl" or - result = "Url" or - result = "Cookies" or - result = "Form" or - result = "Params" or - result = "Path" or - result = "PathInfo" -} - -/** A data flow source of remote user input (ASP.NET query string). */ -class AspNetQueryStringRemoteFlowSource extends AspNetRemoteFlowSource, DataFlow::ExprNode { - AspNetQueryStringRemoteFlowSource() { - exists(RefType t | - t instanceof SystemWebHttpRequestClass or - t instanceof SystemNetHttpListenerRequestClass or - t instanceof SystemWebHttpRequestBaseClass - | - // A request object can be indexed, so taint the object as well - this.getExpr().getType() = t - ) - or - this.getExpr() = any(AspNetQueryStringMember m).getAnAccess() - } - - override string getSourceType() { result = "ASP.NET query string" } -} - -/** A data flow source of remote user input (ASP.NET unvalidated request data). 
*/ -class AspNetUnvalidatedQueryStringRemoteFlowSource extends AspNetRemoteFlowSource, - DataFlow::ExprNode { - AspNetUnvalidatedQueryStringRemoteFlowSource() { - this.getExpr() = any(SystemWebUnvalidatedRequestValues c).getAProperty().getGetter().getACall() or - this.getExpr() = - any(SystemWebUnvalidatedRequestValuesBase c).getAProperty().getGetter().getACall() - } - - override string getSourceType() { result = "ASP.NET unvalidated request data" } -} - -/** A data flow source of remote user input (ASP.NET user input). */ -class AspNetUserInputRemoteFlowSource extends AspNetRemoteFlowSource, DataFlow::ExprNode { - AspNetUserInputRemoteFlowSource() { getType() instanceof SystemWebUIWebControlsTextBoxClass } - - override string getSourceType() { result = "ASP.NET user input" } -} - -/** A data flow source of remote user input (WCF based web service). */ -class WcfRemoteFlowSource extends RemoteFlowSource, DataFlow::ParameterNode { - WcfRemoteFlowSource() { exists(OperationMethod om | om.getAParameter() = this.getParameter()) } - - override string getSourceType() { result = "web service input" } -} - -/** A data flow source of remote user input (ASP.NET web service). */ -class AspNetServiceRemoteFlowSource extends RemoteFlowSource, DataFlow::ParameterNode { - AspNetServiceRemoteFlowSource() { - exists(Method m | - m.getAParameter() = this.getParameter() and - m.getAnAttribute().getType() instanceof SystemWebServicesWebMethodAttributeClass - ) - } - - override string getSourceType() { result = "ASP.NET web service input" } -} - -/** A data flow source of remote user input (ASP.NET request message). 
*/ -class SystemNetHttpRequestMessageRemoteFlowSource extends RemoteFlowSource, DataFlow::ExprNode { - SystemNetHttpRequestMessageRemoteFlowSource() { - getType() instanceof SystemWebHttpRequestMessageClass - } - - override string getSourceType() { result = "ASP.NET request message" } -} - -/** - * A data flow source of remote user input (Microsoft Owin, a query, request, - * or path string). - */ -class MicrosoftOwinStringFlowSource extends RemoteFlowSource, DataFlow::ExprNode { - MicrosoftOwinStringFlowSource() { - this.getExpr() = any(MicrosoftOwinString owinString).getValueProperty().getGetter().getACall() - } - - override string getSourceType() { result = "Microsoft Owin request or query string" } -} - -/** A data flow source of remote user input (`Microsoft Owin IOwinRequest`). */ -class MicrosoftOwinRequestRemoteFlowSource extends RemoteFlowSource, DataFlow::ExprNode { - MicrosoftOwinRequestRemoteFlowSource() { - exists(Property p, MicrosoftOwinIOwinRequestClass owinRequest | - this.getExpr() = p.getGetter().getACall() - | - p = owinRequest.getAcceptProperty() or - p = owinRequest.getBodyProperty() or - p = owinRequest.getCacheControlProperty() or - p = owinRequest.getContentTypeProperty() or - p = owinRequest.getContextProperty() or - p = owinRequest.getCookiesProperty() or - p = owinRequest.getHeadersProperty() or - p = owinRequest.getHostProperty() or - p = owinRequest.getMediaTypeProperty() or - p = owinRequest.getMethodProperty() or - p = owinRequest.getPathProperty() or - p = owinRequest.getPathBaseProperty() or - p = owinRequest.getQueryProperty() or - p = owinRequest.getQueryStringProperty() or - p = owinRequest.getRemoteIpAddressProperty() or - p = owinRequest.getSchemeProperty() or - p = owinRequest.getURIProperty() - ) - } - - override string getSourceType() { result = "Microsoft Owin request" } -} - -/** A parameter to an Mvc controller action method, viewed as a source of remote user input. 
*/ -class ActionMethodParameter extends RemoteFlowSource, DataFlow::ParameterNode { - ActionMethodParameter() { - exists(Parameter p | - p = this.getParameter() and - p.fromSource() - | - p = any(Controller c).getAnActionMethod().getAParameter() or - p = any(ApiController c).getAnActionMethod().getAParameter() - ) - } - - override string getSourceType() { result = "ASP.NET MVC action method parameter" } -} - -/** A data flow source of remote user input (ASP.NET Core). */ -abstract class AspNetCoreRemoteFlowSource extends RemoteFlowSource { } - -/** A data flow source of remote user input (ASP.NET query collection). */ -class AspNetCoreQueryRemoteFlowSource extends AspNetCoreRemoteFlowSource, DataFlow::ExprNode { - AspNetCoreQueryRemoteFlowSource() { - exists(ValueOrRefType t | - t instanceof MicrosoftAspNetCoreHttpHttpRequest or - t instanceof MicrosoftAspNetCoreHttpQueryCollection or - t instanceof MicrosoftAspNetCoreHttpQueryString - | - this.getExpr().(Call).getTarget().getDeclaringType() = t or - this.asExpr().(Access).getTarget().getDeclaringType() = t - ) - or - exists(Call c | - c - .getTarget() - .getDeclaringType() - .hasQualifiedName("Microsoft.AspNetCore.Http", "IQueryCollection") and - c.getTarget().getName() = "TryGetValue" and - this.asExpr() = c.getArgumentForName("value") - ) - } - - override string getSourceType() { result = "ASP.NET Core query string" } -} - -/** A parameter to a `Mvc` controller action method, viewed as a source of remote user input. 
*/ -class AspNetCoreActionMethodParameter extends RemoteFlowSource, DataFlow::ParameterNode { - AspNetCoreActionMethodParameter() { - exists(Parameter p | - p = this.getParameter() and - p.fromSource() - | - p = any(MicrosoftAspNetCoreMvcController c).getAnActionMethod().getAParameter() - ) - } - - override string getSourceType() { result = "ASP.NET Core MVC action method parameter" } -} +import semmle.code.csharp.security.dataflow.flowsources.Remote diff --git a/csharp/ql/src/semmle/code/csharp/exprs/Access.qll b/csharp/ql/src/semmle/code/csharp/exprs/Access.qll index c0d75bfd9fe..d1e1017e0d1 100644 --- a/csharp/ql/src/semmle/code/csharp/exprs/Access.qll +++ b/csharp/ql/src/semmle/code/csharp/exprs/Access.qll @@ -388,7 +388,12 @@ library class PropertyAccessExpr extends Expr, @property_access_expr { /** Gets the target of this property access. */ Property getProperty() { expr_access(this, result) } - override string toString() { result = "access to property " + this.getProperty().getName() } + override string toString() { + result = "access to property " + this.getProperty().getName() + or + not exists(this.getProperty()) and + result = "access to property (unknown)" + } } /** diff --git a/csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/PrintValueNumbering.qll b/csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/PrintValueNumbering.qll new file mode 100644 index 00000000000..a7fb1b3c07e --- /dev/null +++ b/csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/PrintValueNumbering.qll @@ -0,0 +1,17 @@ +private import internal.ValueNumberingImports +private import ValueNumbering + +/** + * Provides additional information about value numbering in IR dumps. 
+ */ +class ValueNumberPropertyProvider extends IRPropertyProvider { + override string getInstructionProperty(Instruction instr, string key) { + exists(ValueNumber vn | + vn = valueNumber(instr) and + key = "valnum" and + if strictcount(vn.getAnInstruction()) > 1 + then result = vn.getDebugString() + else result = "unique" + ) + } +} diff --git a/csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/ValueNumbering.qll b/csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/ValueNumbering.qll index 161f69936e9..13d19587135 100644 --- a/csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/ValueNumbering.qll +++ b/csharp/ql/src/semmle/code/csharp/ir/implementation/raw/gvn/ValueNumbering.qll @@ -1,21 +1,6 @@ private import internal.ValueNumberingInternal private import internal.ValueNumberingImports -/** - * Provides additional information about value numbering in IR dumps. - */ -class ValueNumberPropertyProvider extends IRPropertyProvider { - override string getInstructionProperty(Instruction instr, string key) { - exists(ValueNumber vn | - vn = valueNumber(instr) and - key = "valnum" and - if strictcount(vn.getAnInstruction()) > 1 - then result = vn.getDebugString() - else result = "unique" - ) - } -} - /** * The value number assigned to a particular set of instructions that produce equivalent results. */ diff --git a/csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll b/csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll new file mode 100644 index 00000000000..a7fb1b3c07e --- /dev/null +++ b/csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/PrintValueNumbering.qll @@ -0,0 +1,17 @@ +private import internal.ValueNumberingImports +private import ValueNumbering + +/** + * Provides additional information about value numbering in IR dumps. 
+ */ +class ValueNumberPropertyProvider extends IRPropertyProvider { + override string getInstructionProperty(Instruction instr, string key) { + exists(ValueNumber vn | + vn = valueNumber(instr) and + key = "valnum" and + if strictcount(vn.getAnInstruction()) > 1 + then result = vn.getDebugString() + else result = "unique" + ) + } +} diff --git a/csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll b/csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll index 161f69936e9..13d19587135 100644 --- a/csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll +++ b/csharp/ql/src/semmle/code/csharp/ir/implementation/unaliased_ssa/gvn/ValueNumbering.qll @@ -1,21 +1,6 @@ private import internal.ValueNumberingInternal private import internal.ValueNumberingImports -/** - * Provides additional information about value numbering in IR dumps. - */ -class ValueNumberPropertyProvider extends IRPropertyProvider { - override string getInstructionProperty(Instruction instr, string key) { - exists(ValueNumber vn | - vn = valueNumber(instr) and - key = "valnum" and - if strictcount(vn.getAnInstruction()) > 1 - then result = vn.getDebugString() - else result = "unique" - ) - } -} - /** * The value number assigned to a particular set of instructions that produce equivalent results. 
*/ diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/CleartextStorage.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/CleartextStorage.qll index 8eff8c88e3f..a0ce6d9d1ae 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/CleartextStorage.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/CleartextStorage.qll @@ -5,10 +5,10 @@ import csharp module CleartextStorage { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.Web import semmle.code.csharp.security.SensitiveActions - import semmle.code.csharp.security.sinks.ExternalLocationSink + import semmle.code.csharp.security.dataflow.flowsinks.ExternalLocationSink /** * A data flow source for cleartext storage of sensitive information. diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/CodeInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/CodeInjection.qll index 3c168463d9d..d58d851b7c8 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/CodeInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/CodeInjection.qll @@ -5,8 +5,8 @@ import csharp module CodeInjection { - import semmle.code.csharp.dataflow.flowsources.Remote - import semmle.code.csharp.dataflow.flowsources.Local + import semmle.code.csharp.security.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Local import semmle.code.csharp.frameworks.system.codedom.Compiler import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/CommandInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/CommandInjection.qll index 01d6c01fd2b..7d2a49784e1 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/CommandInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/CommandInjection.qll @@ -5,7 +5,7 @@ import csharp module CommandInjection { 
- import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.Diagnostics import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/ConditionalBypass.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/ConditionalBypass.qll index 18c725d273e..0694b27f985 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/ConditionalBypass.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/ConditionalBypass.qll @@ -8,7 +8,7 @@ import csharp module UserControlledBypassOfSensitiveMethod { import semmle.code.csharp.controlflow.Guards import semmle.code.csharp.controlflow.BasicBlocks - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.System import semmle.code.csharp.frameworks.system.Net import semmle.code.csharp.security.SensitiveActions diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/ExposureOfPrivateInformation.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/ExposureOfPrivateInformation.qll index 45d5794dca6..19866738341 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/ExposureOfPrivateInformation.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/ExposureOfPrivateInformation.qll @@ -5,8 +5,8 @@ import csharp module ExposureOfPrivateInformation { - import semmle.code.csharp.dataflow.flowsources.Remote - import semmle.code.csharp.security.sinks.ExternalLocationSink + import semmle.code.csharp.security.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsinks.ExternalLocationSink import semmle.code.csharp.security.PrivateData /** diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/LDAPInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/LDAPInjection.qll index 329e001236f..50681f9721c 100644 --- 
a/csharp/ql/src/semmle/code/csharp/security/dataflow/LDAPInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/LDAPInjection.qll @@ -6,7 +6,7 @@ import csharp module LDAPInjection { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.DirectoryServices import semmle.code.csharp.frameworks.system.directoryservices.Protocols import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/LogForging.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/LogForging.qll index 3460362cf99..400b1ac8763 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/LogForging.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/LogForging.qll @@ -5,11 +5,11 @@ import csharp module LogForging { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.System import semmle.code.csharp.frameworks.system.text.RegularExpressions import semmle.code.csharp.security.Sanitizers - import semmle.code.csharp.security.sinks.ExternalLocationSink + import semmle.code.csharp.security.dataflow.flowsinks.ExternalLocationSink /** * A data flow source for untrusted user input used in log entries. 
diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/MissingXMLValidation.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/MissingXMLValidation.qll index 28b8350a200..ba5ee7147d3 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/MissingXMLValidation.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/MissingXMLValidation.qll @@ -6,7 +6,7 @@ import csharp module MissingXMLValidation { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.Xml import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/ReDoS.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/ReDoS.qll index bb40b62a0be..28c5c79e0af 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/ReDoS.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/ReDoS.qll @@ -7,7 +7,7 @@ import csharp module ReDoS { private import semmle.code.csharp.dataflow.DataFlow2 - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.text.RegularExpressions import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/RegexInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/RegexInjection.qll index 51f8605c1af..2edfbc4ab7c 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/RegexInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/RegexInjection.qll @@ -6,7 +6,7 @@ import csharp module RegexInjection { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.text.RegularExpressions import semmle.code.csharp.security.Sanitizers diff --git 
a/csharp/ql/src/semmle/code/csharp/security/dataflow/ResourceInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/ResourceInjection.qll index 6ae49c6f7d5..236fe62aa2c 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/ResourceInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/ResourceInjection.qll @@ -5,8 +5,8 @@ import csharp module ResourceInjection { - import semmle.code.csharp.dataflow.flowsources.Remote - import semmle.code.csharp.dataflow.flowsources.Local + import semmle.code.csharp.security.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Local import semmle.code.csharp.frameworks.system.Data import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/SqlInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/SqlInjection.qll index cb393e00179..21c2628aa62 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/SqlInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/SqlInjection.qll @@ -5,8 +5,8 @@ import csharp module SqlInjection { - import semmle.code.csharp.dataflow.flowsources.Remote - import semmle.code.csharp.dataflow.flowsources.Local + import semmle.code.csharp.security.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Local import semmle.code.csharp.frameworks.Sql import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/TaintedPath.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/TaintedPath.qll index 78a0ca2c3bf..df59f0a4793 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/TaintedPath.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/TaintedPath.qll @@ -7,7 +7,7 @@ import csharp module TaintedPath { import semmle.code.csharp.controlflow.Guards - import semmle.code.csharp.dataflow.flowsources.Remote + import 
semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.IO import semmle.code.csharp.frameworks.system.Web import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/UnsafeDeserialization.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/UnsafeDeserialization.qll index 57b3a78485b..0943774d8cd 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/UnsafeDeserialization.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/UnsafeDeserialization.qll @@ -6,7 +6,7 @@ import csharp module UnsafeDeserialization { - private import semmle.code.csharp.dataflow.flowsources.Remote + private import semmle.code.csharp.security.dataflow.flowsources.Remote private import semmle.code.csharp.serialization.Deserializers /** diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/UrlRedirect.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/UrlRedirect.qll index ff501a1c096..2008e62c60d 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/UrlRedirect.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/UrlRedirect.qll @@ -5,7 +5,7 @@ import csharp module UrlRedirect { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.controlflow.Guards import semmle.code.csharp.frameworks.system.Web import semmle.code.csharp.frameworks.system.web.Mvc diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/XMLEntityInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/XMLEntityInjection.qll index 67c5cb552f5..425fd6b4019 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/XMLEntityInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/XMLEntityInjection.qll @@ -5,7 +5,7 @@ import csharp module XMLEntityInjection { - import semmle.code.csharp.dataflow.flowsources.Remote + import 
semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.System import semmle.code.csharp.frameworks.system.text.RegularExpressions import semmle.code.csharp.security.xml.InsecureXML diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/XPathInjection.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/XPathInjection.qll index e2ef4797dfe..7c84561cf44 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/XPathInjection.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/XPathInjection.qll @@ -5,7 +5,7 @@ import csharp module XPathInjection { - import semmle.code.csharp.dataflow.flowsources.Remote + import semmle.code.csharp.security.dataflow.flowsources.Remote import semmle.code.csharp.frameworks.system.xml.XPath import semmle.code.csharp.frameworks.system.Xml import semmle.code.csharp.security.Sanitizers diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/XSS.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/XSS.qll index 513b0b89f00..635c59363f5 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/XSS.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/XSS.qll @@ -6,17 +6,14 @@ import csharp module XSS { - import semmle.code.csharp.dataflow.flowsources.Remote - import semmle.code.csharp.frameworks.microsoft.AspNetCore + import semmle.code.asp.AspNet import semmle.code.csharp.frameworks.system.Net import semmle.code.csharp.frameworks.system.Web - import semmle.code.csharp.frameworks.system.web.Mvc - import semmle.code.csharp.frameworks.system.web.WebPages import semmle.code.csharp.frameworks.system.web.UI - import semmle.code.csharp.frameworks.system.web.ui.WebControls - import semmle.code.csharp.frameworks.system.windows.Forms import semmle.code.csharp.security.Sanitizers - import semmle.code.asp.AspNet + import semmle.code.csharp.security.dataflow.flowsinks.Html + import semmle.code.csharp.security.dataflow.flowsinks.Remote + import 
semmle.code.csharp.security.dataflow.flowsources.Remote /** * Holds if there is tainted flow from `source` to `sink` that may lead to a @@ -112,8 +109,11 @@ module XSS { /** * A data flow sink for cross-site scripting (XSS) vulnerabilities. + * + * Any XSS sink is also a remote flow sink, so this class contributes + * to the abstract class `RemoteFlowSink`. */ - abstract class Sink extends DataFlow::ExprNode { + abstract class Sink extends DataFlow::ExprNode, RemoteFlowSink { string explanation() { none() } } @@ -166,78 +166,21 @@ module XSS { UrlEncodeSanitizer() { this.getExpr() instanceof UrlSanitizedExpr } } - /** A sink where the value of the expression may be rendered as HTML. */ - abstract class HtmlSink extends DataFlow::Node { } + private class HtmlSinkSink extends Sink { + HtmlSinkSink() { this instanceof HtmlSink } - /** - * An expression that is used as an argument to an XSS sink method on - * `HttpResponse`. - */ - private class HttpResponseSink extends Sink, HtmlSink { - HttpResponseSink() { - exists(Method m, SystemWebHttpResponseClass responseClass | - m = responseClass.getAWriteMethod() or - m = responseClass.getAWriteFileMethod() or - m = responseClass.getATransmitFileMethod() or - m = responseClass.getABinaryWriteMethod() - | - // Calls to these methods, or overrides of them - this.getExpr() = m.getAnOverrider*().getParameter(0).getAnAssignedArgument() - ) - } - } - - /** - * An expression that is used as an argument to an XSS sink method on - * `HtmlTextWriter`. 
- */ - private class HtmlTextWriterSink extends Sink, HtmlSink { - HtmlTextWriterSink() { - exists(SystemWebUIHtmlTextWriterClass writeClass, Method m, Call c, int paramPos | - paramPos = 0 and - ( - m = writeClass.getAWriteMethod() or - m = writeClass.getAWriteLineMethod() or - m = writeClass.getAWriteLineNoTabsMethod() or - m = writeClass.getAWriteBeginTagMethod() or - m = writeClass.getAWriteAttributeMethod() - ) - or - // The second parameter to the `WriteAttribute` method is the attribute value, which we - // should only consider as tainted if the call does not ask for the attribute value to be - // encoded using the final parameter. - m = writeClass.getAWriteAttributeMethod() and - paramPos = 1 and - not c.getArgumentForParameter(m.getParameter(2)).(BoolLiteral).getBoolValue() = true - | - c = m.getACall() and - this.getExpr() = c.getArgumentForParameter(m.getParameter(paramPos)) - ) - } - } - - /** - * An expression that is used as an argument to an XSS sink method on - * `AttributeCollection`. - */ - private class AttributeCollectionSink extends Sink, HtmlSink { - AttributeCollectionSink() { - exists(SystemWebUIAttributeCollectionClass ac, Parameter p | - p = ac.getAddMethod().getParameter(1) or - p = ac.getItemProperty().getSetter().getParameter(0) - | - this.getExpr() = p.getAnAssignedArgument() - ) - } - } - - /** - * An expression that is used as the second argument `HtmlElement.SetAttribute`. 
- */ - private class SetAttributeSink extends Sink, HtmlSink { - SetAttributeSink() { - this.getExpr() = - any(SystemWindowsFormsHtmlElement c).getSetAttributeMethod().getACall().getArgument(1) + override string explanation() { + this instanceof WebPageWriteLiteralSink and + result = "System.Web.WebPages.WebPage.WriteLiteral() method" + or + this instanceof WebPageWriteLiteralToSink and + result = "System.Web.WebPages.WebPage.WriteLiteralTo() method" + or + this instanceof MicrosoftAspNetCoreMvcHtmlHelperRawSink and + result = "Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper.Raw() method" + or + this instanceof MicrosoftAspNetRazorPageWriteLiteralSink and + result = "Microsoft.AspNetCore.Mvc.Razor.RazorPageBase.WriteLiteral() method" } } @@ -285,31 +228,6 @@ module XSS { } } - /** - * An expression that is used as an argument to an XSS sink setter, on - * a class within the `System.Web.UI` namespace. - */ - private class SystemWebSetterHtmlSink extends Sink, HtmlSink { - SystemWebSetterHtmlSink() { - exists(Property p, string name, ValueOrRefType declaringType | - declaringType = p.getDeclaringType() and - any(SystemWebUINamespace n).getAChildNamespace*() = declaringType.getNamespace() and - this.getExpr() = p.getSetter().getParameter(0).getAnAssignedArgument() and - p.hasName(name) - | - name = "Caption" and - (declaringType.hasName("Calendar") or declaringType.hasName("Table")) - or - name = "InnerHtml" - ) - or - exists(SystemWebUIWebControlsLabelClass c | - // Unlike `Text` properties of other web controls, `Label.Text` is not automatically HTML encoded - this.getExpr() = c.getTextProperty().getSetter().getParameter(0).getAnAssignedArgument() - ) - } - } - /** * An expression that is used as an argument to an XSS sink setter, on * a class within the `System.Web.UI` namespace. @@ -345,16 +263,6 @@ module XSS { } } - /** - * An expression that is used as an argument to `HtmlHelper.Raw`, typically in - * a `.cshtml` file. 
- */ - private class SystemWebMvcHtmlHelperRawSink extends Sink, HtmlSink { - SystemWebMvcHtmlHelperRawSink() { - this.getExpr() = any(SystemWebMvcHtmlHelperClass h).getRawMethod().getACall().getAnArgument() - } - } - /** * Gets a member which is accessed by the given `AspInlineCode`. * The code body must consist only of an access to the member, possibly with qualified @@ -493,31 +401,6 @@ module XSS { } } - /** An expression that is returned from a `ToHtmlString` method. */ - private class ToHtmlString extends Sink, HtmlSink { - ToHtmlString() { - exists(Method toHtmlString | - toHtmlString = - any(SystemWebIHtmlString i).getToHtmlStringMethod().getAnUltimateImplementor() and - toHtmlString.canReturn(this.getExpr()) - ) - } - } - - /** - * An expression passed to the constructor of an `HtmlString` or a `MvcHtmlString`. - */ - private class HtmlString extends Sink, HtmlSink { - HtmlString() { - exists(Class c | - c = any(SystemWebMvcMvcHtmlString m) or - c = any(SystemWebHtmlString m) - | - this.getExpr() = c.getAConstructor().getACall().getAnArgument() - ) - } - } - /** * An expression passed as the `content` argument to the constructor of `StringContent`. */ @@ -529,75 +412,6 @@ module XSS { ).getArgumentForName("content") } } - - /** - * An expression that is used as an argument to `Page.WriteLiteral`, typically in - * a `.cshtml` file. - */ - class WebPageWriteLiteralSink extends Sink, HtmlSink { - WebPageWriteLiteralSink() { - this.getExpr() = any(WebPageClass h).getWriteLiteralMethod().getACall().getAnArgument() - } - - override string explanation() { result = "System.Web.WebPages.WebPage.WriteLiteral() method" } - } - - /** - * An expression that is used as an argument to `Page.WriteLiteralTo`, typically in - * a `.cshtml` file. 
- */ - class WebPageWriteLiteralToSink extends Sink, HtmlSink { - WebPageWriteLiteralToSink() { - this.getExpr() = any(WebPageClass h).getWriteLiteralToMethod().getACall().getAnArgument() - } - - override string explanation() { result = "System.Web.WebPages.WebPage.WriteLiteralTo() method" } - } - - abstract class AspNetCoreSink extends Sink, HtmlSink { } - - /** - * An expression that is used as an argument to `HtmlHelper.Raw`, typically in - * a `.cshtml` file. - */ - class MicrosoftAspNetCoreMvcHtmlHelperRawSink extends AspNetCoreSink { - MicrosoftAspNetCoreMvcHtmlHelperRawSink() { - this.getExpr() = - any(MicrosoftAspNetCoreMvcHtmlHelperClass h).getRawMethod().getACall().getAnArgument() - } - - override string explanation() { - result = "Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper.Raw() method" - } - } - - /** - * An expression that is used as an argument to `Page.WriteLiteral` in ASP.NET 6.0 razor page, typically in - * a `.cshtml` file. - */ - class MicrosoftAspNetRazorPageWriteLiteralSink extends AspNetCoreSink { - MicrosoftAspNetRazorPageWriteLiteralSink() { - this.getExpr() = - any(MicrosoftAspNetCoreMvcRazorPageBase h) - .getWriteLiteralMethod() - .getACall() - .getAnArgument() - } - - override string explanation() { - result = "Microsoft.AspNetCore.Mvc.Razor.RazorPageBase.WriteLiteral() method" - } - } - - /** `HtmlString` that may be rendered as is need to have sanitized value. 
*/ - class MicrosoftAspNetHtmlStringSink extends AspNetCoreSink { - MicrosoftAspNetHtmlStringSink() { - exists(ObjectCreation c, MicrosoftAspNetCoreHttpHtmlString s | - c.getTarget() = s.getAConstructor() and - this.asExpr() = c.getAnArgument() - ) - } - } } private Type getMemberType(Member m) { diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/Email.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Email.qll similarity index 88% rename from csharp/ql/src/semmle/code/csharp/security/dataflow/Email.qll rename to csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Email.qll index b86fa4822ae..c45bd9e8b39 100644 --- a/csharp/ql/src/semmle/code/csharp/security/dataflow/Email.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Email.qll @@ -1,11 +1,13 @@ /** Provides data flow sinks for sending email. */ import csharp +private import Remote private import semmle.code.csharp.frameworks.system.net.Mail +/** Provides sinks for emails. */ module Email { /** A data flow sink for sending email. */ - abstract class Sink extends DataFlow::ExprNode { } + abstract class Sink extends DataFlow::ExprNode, RemoteFlowSink { } /** A data flow sink for sending email via `System.Net.Mail.MailMessage`. 
*/ class MailMessageSink extends Sink { diff --git a/csharp/ql/src/semmle/code/csharp/security/sinks/ExternalLocationSink.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/ExternalLocationSink.qll similarity index 95% rename from csharp/ql/src/semmle/code/csharp/security/sinks/ExternalLocationSink.qll rename to csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/ExternalLocationSink.qll index 6cd1a5cb269..25a50f3733c 100644 --- a/csharp/ql/src/semmle/code/csharp/security/sinks/ExternalLocationSink.qll +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/ExternalLocationSink.qll @@ -3,6 +3,7 @@ */ import csharp +private import Remote private import semmle.code.csharp.commons.Loggers private import semmle.code.csharp.frameworks.system.Web @@ -45,7 +46,7 @@ class TraceMessageSink extends ExternalLocationSink { /** * An expression set as a value on a cookie instance. */ -class CookieStorageSink extends ExternalLocationSink { +class CookieStorageSink extends ExternalLocationSink, RemoteFlowSink { CookieStorageSink() { exists(Expr e | e = this.getExpr() | e = any(SystemWebHttpCookie cookie).getAConstructor().getACall().getArgumentForName("value") diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Html.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Html.qll new file mode 100644 index 00000000000..2363e24ffeb --- /dev/null +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Html.qll @@ -0,0 +1,208 @@ +/** + * Provides classes representing HTML data flow sinks. 
+ */ + +import csharp +private import Remote +private import semmle.code.csharp.frameworks.microsoft.AspNetCore +private import semmle.code.csharp.frameworks.system.Net +private import semmle.code.csharp.frameworks.system.Web +private import semmle.code.csharp.frameworks.system.web.Mvc +private import semmle.code.csharp.frameworks.system.web.WebPages +private import semmle.code.csharp.frameworks.system.web.UI +private import semmle.code.csharp.frameworks.system.web.ui.WebControls +private import semmle.code.csharp.frameworks.system.windows.Forms +private import semmle.code.csharp.security.dataflow.flowsources.Remote +private import semmle.code.asp.AspNet + +/** + * A sink where the value of the expression may be rendered as HTML, + * without implicit HTML encoding. + */ +abstract class HtmlSink extends DataFlow::ExprNode, RemoteFlowSink { } + +/** + * An expression that is used as an argument to an HTML sink method on + * `HttpResponse`. + */ +class HttpResponseSink extends HtmlSink { + HttpResponseSink() { + exists(Method m, SystemWebHttpResponseClass responseClass | + m = responseClass.getAWriteMethod() or + m = responseClass.getAWriteFileMethod() or + m = responseClass.getATransmitFileMethod() or + m = responseClass.getABinaryWriteMethod() + | + // Calls to these methods, or overrides of them + this.getExpr() = m.getAnOverrider*().getParameter(0).getAnAssignedArgument() + ) + } +} + +/** + * An expression that is used as an argument to an HTML sink method on + * `HtmlTextWriter`. 
+ */ +class HtmlTextWriterSink extends HtmlSink { + HtmlTextWriterSink() { + exists(SystemWebUIHtmlTextWriterClass writeClass, Method m, Call c, int paramPos | + paramPos = 0 and + ( + m = writeClass.getAWriteMethod() or + m = writeClass.getAWriteLineMethod() or + m = writeClass.getAWriteLineNoTabsMethod() or + m = writeClass.getAWriteBeginTagMethod() or + m = writeClass.getAWriteAttributeMethod() + ) + or + // The second parameter to the `WriteAttribute` method is the attribute value, which we + // should only consider as tainted if the call does not ask for the attribute value to be + // encoded using the final parameter. + m = writeClass.getAWriteAttributeMethod() and + paramPos = 1 and + not c.getArgumentForParameter(m.getParameter(2)).(BoolLiteral).getBoolValue() = true + | + c = m.getACall() and + this.getExpr() = c.getArgumentForParameter(m.getParameter(paramPos)) + ) + } +} + +/** + * An expression that is used as an argument to an HTML sink method on + * `AttributeCollection`. + */ +class AttributeCollectionSink extends HtmlSink { + AttributeCollectionSink() { + exists(SystemWebUIAttributeCollectionClass ac, Parameter p | + p = ac.getAddMethod().getParameter(1) or + p = ac.getItemProperty().getSetter().getParameter(0) + | + this.getExpr() = p.getAnAssignedArgument() + ) + } +} + +/** + * An expression that is used as the second argument `HtmlElement.SetAttribute`. + */ +class SetAttributeSink extends HtmlSink { + SetAttributeSink() { + this.getExpr() = + any(SystemWindowsFormsHtmlElement c).getSetAttributeMethod().getACall().getArgument(1) + } +} + +/** + * An expression that is used as an argument to an HTML sink setter, on + * a class within the `System.Web.UI` namespace. 
+ */ +class SystemWebSetterHtmlSink extends HtmlSink { + SystemWebSetterHtmlSink() { + exists(Property p, string name, ValueOrRefType declaringType | + declaringType = p.getDeclaringType() and + any(SystemWebUINamespace n).getAChildNamespace*() = declaringType.getNamespace() and + this.getExpr() = p.getAnAssignedValue() and + p.hasName(name) + | + name = "Caption" and + (declaringType.hasName("Calendar") or declaringType.hasName("Table")) + or + name = "InnerHtml" + ) + or + exists(SystemWebUIWebControlsLabelClass c | + // Unlike `Text` properties of other web controls, `Label.Text` is not automatically HTML encoded + this.getExpr() = c.getTextProperty().getSetter().getParameter(0).getAnAssignedArgument() + ) + } +} + +/** + * An expression that is used as an argument to `HtmlHelper.Raw`, typically in + * a `.cshtml` file. + */ +class SystemWebMvcHtmlHelperRawSink extends HtmlSink { + SystemWebMvcHtmlHelperRawSink() { + this.getExpr() = any(SystemWebMvcHtmlHelperClass h).getRawMethod().getACall().getAnArgument() + } +} + +/** An expression that is returned from a `ToHtmlString` method. */ +class ToHtmlString extends HtmlSink { + ToHtmlString() { + exists(Method toHtmlString | + toHtmlString = any(SystemWebIHtmlString i).getToHtmlStringMethod().getAnUltimateImplementor() and + toHtmlString.canReturn(this.getExpr()) + ) + } +} + +/** + * An expression passed to the constructor of an `HtmlString` or a `MvcHtmlString`. + */ +class HtmlString extends HtmlSink { + HtmlString() { + exists(Class c | + c = any(SystemWebMvcMvcHtmlString m) or + c = any(SystemWebHtmlString m) + | + this.getExpr() = c.getAConstructor().getACall().getAnArgument() + ) + } +} + +/** + * An expression that is used as an argument to `Page.WriteLiteral`, typically in + * a `.cshtml` file. 
+ */ +class WebPageWriteLiteralSink extends HtmlSink { + WebPageWriteLiteralSink() { + this.getExpr() = any(WebPageClass h).getWriteLiteralMethod().getACall().getAnArgument() + } +} + +/** + * An expression that is used as an argument to `Page.WriteLiteralTo`, typically in + * a `.cshtml` file. + */ +class WebPageWriteLiteralToSink extends HtmlSink { + WebPageWriteLiteralToSink() { + this.getExpr() = any(WebPageClass h).getWriteLiteralToMethod().getACall().getAnArgument() + } +} + +/** An ASP.NET Core HTML sink. */ +abstract class AspNetCoreHtmlSink extends HtmlSink { } + +/** + * An expression that is used as an argument to `HtmlHelper.Raw`, typically in + * a `.cshtml` file. + */ +class MicrosoftAspNetCoreMvcHtmlHelperRawSink extends AspNetCoreHtmlSink { + MicrosoftAspNetCoreMvcHtmlHelperRawSink() { + this.getExpr() = + any(MicrosoftAspNetCoreMvcHtmlHelperClass h).getRawMethod().getACall().getAnArgument() + } +} + +/** + * An expression that is used as an argument to `Page.WriteLiteral` in ASP.NET 6.0 razor page, typically in + * a `.cshtml` file. + */ +class MicrosoftAspNetRazorPageWriteLiteralSink extends AspNetCoreHtmlSink { + MicrosoftAspNetRazorPageWriteLiteralSink() { + this.getExpr() = + any(MicrosoftAspNetCoreMvcRazorPageBase h).getWriteLiteralMethod().getACall().getAnArgument() + } +} + +/** `HtmlString` that may be rendered as is need to have sanitized value. 
*/ +class MicrosoftAspNetHtmlStringSink extends AspNetCoreHtmlSink { + MicrosoftAspNetHtmlStringSink() { + exists(ObjectCreation c, MicrosoftAspNetCoreHttpHtmlString s | + c.getTarget() = s.getAConstructor() and + this.asExpr() = c.getAnArgument() + ) + } +} diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Remote.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Remote.qll new file mode 100644 index 00000000000..10885d52a16 --- /dev/null +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsinks/Remote.qll @@ -0,0 +1,31 @@ +/** + * Provides classes representing data flow sinks for remote user output. + */ + +import csharp +private import Email::Email +private import ExternalLocationSink +private import Html +private import semmle.code.csharp.security.dataflow.XSS +private import semmle.code.csharp.frameworks.system.web.UI + +/** A data flow sink of remote user output. */ +abstract class RemoteFlowSink extends DataFlow::Node { } + +/** + * A value written to the `[Inner]Text` property of an object defined in the + * `System.Web.UI` namespace. 
+ */ +class SystemWebUIText extends RemoteFlowSink { + SystemWebUIText() { + exists(Property p, string name | + p.getDeclaringType().getNamespace().getParentNamespace*() instanceof SystemWebUINamespace and + this.asExpr() = p.getAnAssignedValue() and + p.hasName(name) + | + name = "Text" + or + name = "InnerText" + ) + } +} diff --git a/csharp/ql/src/semmle/code/csharp/dataflow/flowsources/Local.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Local.qll similarity index 100% rename from csharp/ql/src/semmle/code/csharp/dataflow/flowsources/Local.qll rename to csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Local.qll diff --git a/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll new file mode 100644 index 00000000000..f8240583108 --- /dev/null +++ b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll @@ -0,0 +1,218 @@ +/** + * Provides classes representing data flow sources for remote user input. + */ + +import csharp +private import semmle.code.csharp.frameworks.system.Net +private import semmle.code.csharp.frameworks.system.Web +private import semmle.code.csharp.frameworks.system.web.Http +private import semmle.code.csharp.frameworks.system.web.Mvc +private import semmle.code.csharp.frameworks.system.web.Services +private import semmle.code.csharp.frameworks.system.web.ui.WebControls +private import semmle.code.csharp.frameworks.WCF +private import semmle.code.csharp.frameworks.microsoft.Owin +private import semmle.code.csharp.frameworks.microsoft.AspNetCore + +/** A data flow source of remote user input. */ +abstract class RemoteFlowSource extends DataFlow::Node { + /** Gets a string that describes the type of this remote flow source. */ + abstract string getSourceType(); +} + +/** A data flow source of remote user input (ASP.NET). 
*/ +abstract class AspNetRemoteFlowSource extends RemoteFlowSource { } + +/** A member containing an ASP.NET query string. */ +class AspNetQueryStringMember extends Member { + AspNetQueryStringMember() { + exists(RefType t | + t instanceof SystemWebHttpRequestClass or + t instanceof SystemNetHttpListenerRequestClass or + t instanceof SystemWebHttpRequestBaseClass + | + this = t.getProperty(getHttpRequestFlowPropertyNames()) or + this.(Field).getType() = t or + this.(Property).getType() = t or + this.(Callable).getReturnType() = t + ) + } +} + +/** + * Gets the names of the properties in `HttpRequest` classes that should propagate taint out of the + * request. + */ +private string getHttpRequestFlowPropertyNames() { + result = "QueryString" or + result = "Headers" or + result = "RawUrl" or + result = "Url" or + result = "Cookies" or + result = "Form" or + result = "Params" or + result = "Path" or + result = "PathInfo" +} + +/** A data flow source of remote user input (ASP.NET query string). */ +class AspNetQueryStringRemoteFlowSource extends AspNetRemoteFlowSource, DataFlow::ExprNode { + AspNetQueryStringRemoteFlowSource() { + exists(RefType t | + t instanceof SystemWebHttpRequestClass or + t instanceof SystemNetHttpListenerRequestClass or + t instanceof SystemWebHttpRequestBaseClass + | + // A request object can be indexed, so taint the object as well + this.getExpr().getType() = t + ) + or + this.getExpr() = any(AspNetQueryStringMember m).getAnAccess() + } + + override string getSourceType() { result = "ASP.NET query string" } +} + +/** A data flow source of remote user input (ASP.NET unvalidated request data). 
*/ +class AspNetUnvalidatedQueryStringRemoteFlowSource extends AspNetRemoteFlowSource, + DataFlow::ExprNode { + AspNetUnvalidatedQueryStringRemoteFlowSource() { + this.getExpr() = any(SystemWebUnvalidatedRequestValues c).getAProperty().getGetter().getACall() or + this.getExpr() = + any(SystemWebUnvalidatedRequestValuesBase c).getAProperty().getGetter().getACall() + } + + override string getSourceType() { result = "ASP.NET unvalidated request data" } +} + +/** A data flow source of remote user input (ASP.NET user input). */ +class AspNetUserInputRemoteFlowSource extends AspNetRemoteFlowSource, DataFlow::ExprNode { + AspNetUserInputRemoteFlowSource() { getType() instanceof SystemWebUIWebControlsTextBoxClass } + + override string getSourceType() { result = "ASP.NET user input" } +} + +/** A data flow source of remote user input (WCF based web service). */ +class WcfRemoteFlowSource extends RemoteFlowSource, DataFlow::ParameterNode { + WcfRemoteFlowSource() { exists(OperationMethod om | om.getAParameter() = this.getParameter()) } + + override string getSourceType() { result = "web service input" } +} + +/** A data flow source of remote user input (ASP.NET web service). */ +class AspNetServiceRemoteFlowSource extends RemoteFlowSource, DataFlow::ParameterNode { + AspNetServiceRemoteFlowSource() { + exists(Method m | + m.getAParameter() = this.getParameter() and + m.getAnAttribute().getType() instanceof SystemWebServicesWebMethodAttributeClass + ) + } + + override string getSourceType() { result = "ASP.NET web service input" } +} + +/** A data flow source of remote user input (ASP.NET request message). 
*/ +class SystemNetHttpRequestMessageRemoteFlowSource extends RemoteFlowSource, DataFlow::ExprNode { + SystemNetHttpRequestMessageRemoteFlowSource() { + getType() instanceof SystemWebHttpRequestMessageClass + } + + override string getSourceType() { result = "ASP.NET request message" } +} + +/** + * A data flow source of remote user input (Microsoft Owin, a query, request, + * or path string). + */ +class MicrosoftOwinStringFlowSource extends RemoteFlowSource, DataFlow::ExprNode { + MicrosoftOwinStringFlowSource() { + this.getExpr() = any(MicrosoftOwinString owinString).getValueProperty().getGetter().getACall() + } + + override string getSourceType() { result = "Microsoft Owin request or query string" } +} + +/** A data flow source of remote user input (`Microsoft Owin IOwinRequest`). */ +class MicrosoftOwinRequestRemoteFlowSource extends RemoteFlowSource, DataFlow::ExprNode { + MicrosoftOwinRequestRemoteFlowSource() { + exists(Property p, MicrosoftOwinIOwinRequestClass owinRequest | + this.getExpr() = p.getGetter().getACall() + | + p = owinRequest.getAcceptProperty() or + p = owinRequest.getBodyProperty() or + p = owinRequest.getCacheControlProperty() or + p = owinRequest.getContentTypeProperty() or + p = owinRequest.getContextProperty() or + p = owinRequest.getCookiesProperty() or + p = owinRequest.getHeadersProperty() or + p = owinRequest.getHostProperty() or + p = owinRequest.getMediaTypeProperty() or + p = owinRequest.getMethodProperty() or + p = owinRequest.getPathProperty() or + p = owinRequest.getPathBaseProperty() or + p = owinRequest.getQueryProperty() or + p = owinRequest.getQueryStringProperty() or + p = owinRequest.getRemoteIpAddressProperty() or + p = owinRequest.getSchemeProperty() or + p = owinRequest.getURIProperty() + ) + } + + override string getSourceType() { result = "Microsoft Owin request" } +} + +/** A parameter to an Mvc controller action method, viewed as a source of remote user input. 
*/ +class ActionMethodParameter extends RemoteFlowSource, DataFlow::ParameterNode { + ActionMethodParameter() { + exists(Parameter p | + p = this.getParameter() and + p.fromSource() + | + p = any(Controller c).getAnActionMethod().getAParameter() or + p = any(ApiController c).getAnActionMethod().getAParameter() + ) + } + + override string getSourceType() { result = "ASP.NET MVC action method parameter" } +} + +/** A data flow source of remote user input (ASP.NET Core). */ +abstract class AspNetCoreRemoteFlowSource extends RemoteFlowSource { } + +/** A data flow source of remote user input (ASP.NET query collection). */ +class AspNetCoreQueryRemoteFlowSource extends AspNetCoreRemoteFlowSource, DataFlow::ExprNode { + AspNetCoreQueryRemoteFlowSource() { + exists(ValueOrRefType t | + t instanceof MicrosoftAspNetCoreHttpHttpRequest or + t instanceof MicrosoftAspNetCoreHttpQueryCollection or + t instanceof MicrosoftAspNetCoreHttpQueryString + | + this.getExpr().(Call).getTarget().getDeclaringType() = t or + this.asExpr().(Access).getTarget().getDeclaringType() = t + ) + or + exists(Call c | + c + .getTarget() + .getDeclaringType() + .hasQualifiedName("Microsoft.AspNetCore.Http", "IQueryCollection") and + c.getTarget().getName() = "TryGetValue" and + this.asExpr() = c.getArgumentForName("value") + ) + } + + override string getSourceType() { result = "ASP.NET Core query string" } +} + +/** A parameter to a `Mvc` controller action method, viewed as a source of remote user input. 
*/ +class AspNetCoreActionMethodParameter extends RemoteFlowSource, DataFlow::ParameterNode { + AspNetCoreActionMethodParameter() { + exists(Parameter p | + p = this.getParameter() and + p.fromSource() + | + p = any(MicrosoftAspNetCoreMvcController c).getAnActionMethod().getAParameter() + ) + } + + override string getSourceType() { result = "ASP.NET Core MVC action method parameter" } +} diff --git a/csharp/ql/test/library-tests/dataflow/flowsources/remote/remoteFlowSource.ql b/csharp/ql/test/library-tests/dataflow/flowsources/remote/remoteFlowSource.ql index 4015799bd1a..29281ffba56 100644 --- a/csharp/ql/test/library-tests/dataflow/flowsources/remote/remoteFlowSource.ql +++ b/csharp/ql/test/library-tests/dataflow/flowsources/remote/remoteFlowSource.ql @@ -1,4 +1,4 @@ -import semmle.code.csharp.dataflow.flowsources.Remote +import semmle.code.csharp.security.dataflow.flowsources.Remote from RemoteFlowSource source select source, source.getSourceType() diff --git a/csharp/ql/test/library-tests/standalone/controlflow/cfg.expected b/csharp/ql/test/library-tests/standalone/controlflow/cfg.expected index 06473196df9..a9eb533c152 100644 --- a/csharp/ql/test/library-tests/standalone/controlflow/cfg.expected +++ b/csharp/ql/test/library-tests/standalone/controlflow/cfg.expected @@ -7,9 +7,14 @@ | ControlFlow.cs:10:9:10:43 | Call (unknown target) | ControlFlow.cs:12:9:12:87 | ...; | | ControlFlow.cs:10:9:10:43 | call to method | ControlFlow.cs:12:9:12:87 | ...; | | ControlFlow.cs:10:9:10:44 | ...; | ControlFlow.cs:10:9:10:13 | Expression | -| ControlFlow.cs:10:22:10:22 | access to local variable v | ControlFlow.cs:10:22:10:24 | Expression | -| ControlFlow.cs:10:22:10:24 | Expression | ControlFlow.cs:10:22:10:26 | Expression | -| ControlFlow.cs:10:22:10:26 | Expression | ControlFlow.cs:10:29:10:42 | "This is true" | +| ControlFlow.cs:10:22:10:22 | access to local variable v | ControlFlow.cs:10:22:10:24 | Call (unknown target) | +| ControlFlow.cs:10:22:10:22 | access to local 
variable v | ControlFlow.cs:10:22:10:24 | access to property (unknown) | +| ControlFlow.cs:10:22:10:24 | Call (unknown target) | ControlFlow.cs:10:22:10:26 | Call (unknown target) | +| ControlFlow.cs:10:22:10:24 | Call (unknown target) | ControlFlow.cs:10:22:10:26 | access to property (unknown) | +| ControlFlow.cs:10:22:10:24 | access to property (unknown) | ControlFlow.cs:10:22:10:26 | Call (unknown target) | +| ControlFlow.cs:10:22:10:24 | access to property (unknown) | ControlFlow.cs:10:22:10:26 | access to property (unknown) | +| ControlFlow.cs:10:22:10:26 | Call (unknown target) | ControlFlow.cs:10:29:10:42 | "This is true" | +| ControlFlow.cs:10:22:10:26 | access to property (unknown) | ControlFlow.cs:10:29:10:42 | "This is true" | | ControlFlow.cs:10:29:10:42 | "This is true" | ControlFlow.cs:10:9:10:43 | Call (unknown target) | | ControlFlow.cs:10:29:10:42 | "This is true" | ControlFlow.cs:10:9:10:43 | call to method | | ControlFlow.cs:12:9:12:86 | Call (unknown target) | ControlFlow.cs:12:37:12:47 | Expression | @@ -20,5 +25,7 @@ | ControlFlow.cs:12:51:12:62 | access to field Empty | ControlFlow.cs:12:37:12:62 | ... = ... | | ControlFlow.cs:12:65:12:75 | Expression | ControlFlow.cs:12:79:12:79 | access to local variable v | | ControlFlow.cs:12:65:12:84 | ... = ... | ControlFlow.cs:12:35:12:86 | { ..., ... } | -| ControlFlow.cs:12:79:12:79 | access to local variable v | ControlFlow.cs:12:79:12:84 | Expression | -| ControlFlow.cs:12:79:12:84 | Expression | ControlFlow.cs:12:65:12:84 | ... = ... | +| ControlFlow.cs:12:79:12:79 | access to local variable v | ControlFlow.cs:12:79:12:84 | Call (unknown target) | +| ControlFlow.cs:12:79:12:79 | access to local variable v | ControlFlow.cs:12:79:12:84 | access to property (unknown) | +| ControlFlow.cs:12:79:12:84 | Call (unknown target) | ControlFlow.cs:12:65:12:84 | ... = ... | +| ControlFlow.cs:12:79:12:84 | access to property (unknown) | ControlFlow.cs:12:65:12:84 | ... = ... 
| diff --git a/csharp/ql/test/library-tests/standalone/errorrecovery/ErrorCalls.expected b/csharp/ql/test/library-tests/standalone/errorrecovery/ErrorCalls.expected index 05ae24318c3..d57c3c86bbd 100644 --- a/csharp/ql/test/library-tests/standalone/errorrecovery/ErrorCalls.expected +++ b/csharp/ql/test/library-tests/standalone/errorrecovery/ErrorCalls.expected @@ -2,5 +2,5 @@ | errors.cs:43:21:43:28 | errors.cs:43:21:43:28 | object creation of type C1 | C1 | | errors.cs:44:13:44:19 | errors.cs:44:13:44:19 | call to method m1 | m1 | | errors.cs:45:13:45:19 | errors.cs:45:13:45:19 | call to method m2 | m2 | -| errors.cs:46:13:46:38 | errors.cs:46:13:46:38 | call to method | none | +| errors.cs:46:13:46:38 | errors.cs:46:13:46:38 | call to method WriteLine | WriteLine | | errors.cs:53:17:53:25 | errors.cs:53:17:53:25 | object creation of type C2 | none | diff --git a/csharp/ql/test/library-tests/standalone/regressions/ConstCase.expected b/csharp/ql/test/library-tests/standalone/regressions/ConstCase.expected index 47045b907f6..2c8616d347a 100644 --- a/csharp/ql/test/library-tests/standalone/regressions/ConstCase.expected +++ b/csharp/ql/test/library-tests/standalone/regressions/ConstCase.expected @@ -1,2 +1,3 @@ -| regressions.cs:16:13:16:37 | case ...: | regressions.cs:16:18:16:36 | Expression | -| regressions.cs:18:13:18:37 | case ...: | regressions.cs:18:18:18:36 | Expression | +| regressions.cs:16:13:16:37 | case ...: | regressions.cs:16:18:16:36 | access to property (unknown) | +| regressions.cs:18:13:18:37 | case ...: | regressions.cs:18:18:18:36 | access to property (unknown) | +| regressions.cs:20:13:20:23 | case ...: | regressions.cs:20:18:20:22 | Int32 x | diff --git a/csharp/ql/test/library-tests/standalone/regressions/ConstCase.ql b/csharp/ql/test/library-tests/standalone/regressions/ConstCase.ql index 7ee7754e91e..6c5d54518fe 100644 --- a/csharp/ql/test/library-tests/standalone/regressions/ConstCase.ql +++ 
b/csharp/ql/test/library-tests/standalone/regressions/ConstCase.ql @@ -1,7 +1,5 @@ import csharp from Case c, Expr e -where - e = c.getPattern().stripCasts() and - (e instanceof @unknown_expr or e instanceof ConstantPatternExpr) +where e = c.getPattern().stripCasts() select c, e diff --git a/csharp/ql/test/query-tests/Nullness/E.cs b/csharp/ql/test/query-tests/Nullness/E.cs index 0c020e9b150..67b5267ffbe 100644 --- a/csharp/ql/test/query-tests/Nullness/E.cs +++ b/csharp/ql/test/query-tests/Nullness/E.cs @@ -376,6 +376,15 @@ public class E } return -1; } + + static bool Ex37(E e1, E e2) + { + if ((e1 == null && e2 != null) || (e1 != null && e2 == null)) + return false; + if (e1 == null && e2 == null) + return true; + return e1.Long == e2.Long; // GOOD (false positive) + } } public static class Extensions diff --git a/csharp/ql/test/query-tests/Nullness/EqualityCheck.expected b/csharp/ql/test/query-tests/Nullness/EqualityCheck.expected index 2aa5b3f4f5e..5f739e93c2b 100644 --- a/csharp/ql/test/query-tests/Nullness/EqualityCheck.expected +++ b/csharp/ql/test/query-tests/Nullness/EqualityCheck.expected @@ -220,6 +220,20 @@ | E.cs:355:13:355:21 | dynamic call to operator != | false | E.cs:355:18:355:21 | null | E.cs:355:13:355:13 | access to local variable x | | E.cs:362:13:362:29 | ... != ... | false | E.cs:362:13:362:13 | access to local variable x | E.cs:362:18:362:29 | (...) ... | | E.cs:362:13:362:29 | ... != ... | false | E.cs:362:18:362:29 | (...) ... | E.cs:362:13:362:13 | access to local variable x | +| E.cs:382:14:382:23 | ... == ... | true | E.cs:382:14:382:15 | access to parameter e1 | E.cs:382:20:382:23 | null | +| E.cs:382:14:382:23 | ... == ... | true | E.cs:382:20:382:23 | null | E.cs:382:14:382:15 | access to parameter e1 | +| E.cs:382:28:382:37 | ... != ... | false | E.cs:382:28:382:29 | access to parameter e2 | E.cs:382:34:382:37 | null | +| E.cs:382:28:382:37 | ... != ... 
| false | E.cs:382:34:382:37 | null | E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:382:44:382:53 | ... != ... | false | E.cs:382:44:382:45 | access to parameter e1 | E.cs:382:50:382:53 | null | +| E.cs:382:44:382:53 | ... != ... | false | E.cs:382:50:382:53 | null | E.cs:382:44:382:45 | access to parameter e1 | +| E.cs:382:58:382:67 | ... == ... | true | E.cs:382:58:382:59 | access to parameter e2 | E.cs:382:64:382:67 | null | +| E.cs:382:58:382:67 | ... == ... | true | E.cs:382:64:382:67 | null | E.cs:382:58:382:59 | access to parameter e2 | +| E.cs:384:13:384:22 | ... == ... | true | E.cs:384:13:384:14 | access to parameter e1 | E.cs:384:19:384:22 | null | +| E.cs:384:13:384:22 | ... == ... | true | E.cs:384:19:384:22 | null | E.cs:384:13:384:14 | access to parameter e1 | +| E.cs:384:27:384:36 | ... == ... | true | E.cs:384:27:384:28 | access to parameter e2 | E.cs:384:33:384:36 | null | +| E.cs:384:27:384:36 | ... == ... | true | E.cs:384:33:384:36 | null | E.cs:384:27:384:28 | access to parameter e2 | +| E.cs:386:16:386:33 | ... == ... | true | E.cs:386:16:386:22 | access to property Long | E.cs:386:27:386:33 | access to property Long | +| E.cs:386:16:386:33 | ... == ... | true | E.cs:386:27:386:33 | access to property Long | E.cs:386:16:386:22 | access to property Long | | Forwarding.cs:59:13:59:21 | ... == ... | true | Forwarding.cs:59:13:59:13 | access to parameter o | Forwarding.cs:59:18:59:21 | null | | Forwarding.cs:59:13:59:21 | ... == ... 
| true | Forwarding.cs:59:18:59:21 | null | Forwarding.cs:59:13:59:13 | access to parameter o | | Forwarding.cs:78:16:78:39 | call to method ReferenceEquals | true | Forwarding.cs:78:32:78:32 | access to parameter o | Forwarding.cs:78:35:78:38 | null | diff --git a/csharp/ql/test/query-tests/Nullness/Implications.expected b/csharp/ql/test/query-tests/Nullness/Implications.expected index cbb2de7cafc..d7a06795fc2 100644 --- a/csharp/ql/test/query-tests/Nullness/Implications.expected +++ b/csharp/ql/test/query-tests/Nullness/Implications.expected @@ -1236,6 +1236,26 @@ | E.cs:375:20:375:20 | access to local variable s | non-empty | E.cs:374:21:374:31 | ... as ... | non-empty | | E.cs:375:20:375:20 | access to local variable s | non-null | E.cs:374:21:374:31 | ... as ... | non-null | | E.cs:375:20:375:20 | access to local variable s | null | E.cs:374:21:374:31 | ... as ... | null | +| E.cs:382:13:382:68 | ... \|\| ... | false | E.cs:382:14:382:37 | ... && ... | false | +| E.cs:382:13:382:68 | ... \|\| ... | false | E.cs:382:44:382:67 | ... && ... | false | +| E.cs:382:14:382:23 | ... == ... | false | E.cs:382:14:382:15 | access to parameter e1 | non-null | +| E.cs:382:14:382:23 | ... == ... | true | E.cs:382:14:382:15 | access to parameter e1 | null | +| E.cs:382:14:382:37 | ... && ... | true | E.cs:382:14:382:23 | ... == ... | true | +| E.cs:382:14:382:37 | ... && ... | true | E.cs:382:28:382:37 | ... != ... | true | +| E.cs:382:28:382:37 | ... != ... | false | E.cs:382:28:382:29 | access to parameter e2 | null | +| E.cs:382:28:382:37 | ... != ... | true | E.cs:382:28:382:29 | access to parameter e2 | non-null | +| E.cs:382:44:382:53 | ... != ... | false | E.cs:382:44:382:45 | access to parameter e1 | null | +| E.cs:382:44:382:53 | ... != ... | true | E.cs:382:44:382:45 | access to parameter e1 | non-null | +| E.cs:382:44:382:67 | ... && ... | true | E.cs:382:44:382:53 | ... != ... | true | +| E.cs:382:44:382:67 | ... && ... | true | E.cs:382:58:382:67 | ... == ... 
| true | +| E.cs:382:58:382:67 | ... == ... | false | E.cs:382:58:382:59 | access to parameter e2 | non-null | +| E.cs:382:58:382:67 | ... == ... | true | E.cs:382:58:382:59 | access to parameter e2 | null | +| E.cs:384:13:384:22 | ... == ... | false | E.cs:384:13:384:14 | access to parameter e1 | non-null | +| E.cs:384:13:384:22 | ... == ... | true | E.cs:384:13:384:14 | access to parameter e1 | null | +| E.cs:384:13:384:36 | ... && ... | true | E.cs:384:13:384:22 | ... == ... | true | +| E.cs:384:13:384:36 | ... && ... | true | E.cs:384:27:384:36 | ... == ... | true | +| E.cs:384:27:384:36 | ... == ... | false | E.cs:384:27:384:28 | access to parameter e2 | non-null | +| E.cs:384:27:384:36 | ... == ... | true | E.cs:384:27:384:28 | access to parameter e2 | null | | Forwarding.cs:9:13:9:30 | !... | false | Forwarding.cs:9:14:9:30 | call to method IsNullOrEmpty | true | | Forwarding.cs:9:13:9:30 | !... | true | Forwarding.cs:9:14:9:30 | call to method IsNullOrEmpty | false | | Forwarding.cs:9:14:9:14 | access to local variable s | empty | Forwarding.cs:7:20:7:23 | null | empty | diff --git a/csharp/ql/test/query-tests/Nullness/NullCheck.expected b/csharp/ql/test/query-tests/Nullness/NullCheck.expected index d9fcd0121f9..1c2b22ca6b5 100644 --- a/csharp/ql/test/query-tests/Nullness/NullCheck.expected +++ b/csharp/ql/test/query-tests/Nullness/NullCheck.expected @@ -264,6 +264,18 @@ | E.cs:362:13:362:29 | ... != ... | E.cs:362:13:362:13 | access to local variable x | false | true | | E.cs:362:13:362:29 | ... != ... | E.cs:362:13:362:13 | access to local variable x | true | false | | E.cs:372:13:372:23 | ... is ... | E.cs:372:13:372:13 | access to parameter o | true | false | +| E.cs:382:14:382:23 | ... == ... | E.cs:382:14:382:15 | access to parameter e1 | false | false | +| E.cs:382:14:382:23 | ... == ... | E.cs:382:14:382:15 | access to parameter e1 | true | true | +| E.cs:382:28:382:37 | ... != ... 
| E.cs:382:28:382:29 | access to parameter e2 | false | true | +| E.cs:382:28:382:37 | ... != ... | E.cs:382:28:382:29 | access to parameter e2 | true | false | +| E.cs:382:44:382:53 | ... != ... | E.cs:382:44:382:45 | access to parameter e1 | false | true | +| E.cs:382:44:382:53 | ... != ... | E.cs:382:44:382:45 | access to parameter e1 | true | false | +| E.cs:382:58:382:67 | ... == ... | E.cs:382:58:382:59 | access to parameter e2 | false | false | +| E.cs:382:58:382:67 | ... == ... | E.cs:382:58:382:59 | access to parameter e2 | true | true | +| E.cs:384:13:384:22 | ... == ... | E.cs:384:13:384:14 | access to parameter e1 | false | false | +| E.cs:384:13:384:22 | ... == ... | E.cs:384:13:384:14 | access to parameter e1 | true | true | +| E.cs:384:27:384:36 | ... == ... | E.cs:384:27:384:28 | access to parameter e2 | false | false | +| E.cs:384:27:384:36 | ... == ... | E.cs:384:27:384:28 | access to parameter e2 | true | true | | Forwarding.cs:9:14:9:30 | call to method IsNullOrEmpty | Forwarding.cs:9:14:9:14 | access to local variable s | false | false | | Forwarding.cs:14:13:14:32 | call to method IsNotNullOrEmpty | Forwarding.cs:14:13:14:13 | access to local variable s | true | false | | Forwarding.cs:19:14:19:23 | call to method IsNull | Forwarding.cs:19:14:19:14 | access to local variable s | false | false | diff --git a/csharp/ql/test/query-tests/Nullness/NullMaybe.expected b/csharp/ql/test/query-tests/Nullness/NullMaybe.expected index acc3a29817c..4d4f9344986 100644 --- a/csharp/ql/test/query-tests/Nullness/NullMaybe.expected +++ b/csharp/ql/test/query-tests/Nullness/NullMaybe.expected @@ -355,6 +355,21 @@ nodes | E.cs:366:41:366:41 | access to parameter s | | E.cs:374:17:374:31 | SSA def(s) | | E.cs:375:20:375:20 | access to local variable s | +| E.cs:380:24:380:25 | SSA param(e1) | +| E.cs:380:24:380:25 | SSA param(e1) | +| E.cs:380:24:380:25 | SSA param(e1) | +| E.cs:380:30:380:31 | SSA param(e2) | +| E.cs:380:30:380:31 | SSA param(e2) | +| 
E.cs:380:30:380:31 | SSA param(e2) | +| E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:382:44:382:67 | ... && ... | +| E.cs:382:44:382:67 | ... && ... | +| E.cs:384:9:385:24 | if (...) ... | +| E.cs:384:9:385:24 | if (...) ... | +| E.cs:384:27:384:28 | access to parameter e2 | +| E.cs:386:16:386:17 | access to parameter e1 | +| E.cs:386:27:386:28 | access to parameter e2 | | Forwarding.cs:7:16:7:23 | SSA def(s) | | Forwarding.cs:14:9:17:9 | if (...) ... | | Forwarding.cs:19:9:22:9 | if (...) ... | @@ -688,6 +703,22 @@ edges | E.cs:348:17:348:36 | SSA def(x) | E.cs:349:9:349:9 | access to local variable x | | E.cs:366:28:366:28 | SSA param(s) | E.cs:366:41:366:41 | access to parameter s | | E.cs:374:17:374:31 | SSA def(s) | E.cs:375:20:375:20 | access to local variable s | +| E.cs:380:24:380:25 | SSA param(e1) | E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:380:24:380:25 | SSA param(e1) | E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:380:24:380:25 | SSA param(e1) | E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:380:30:380:31 | SSA param(e2) | E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:380:30:380:31 | SSA param(e2) | E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:380:30:380:31 | SSA param(e2) | E.cs:382:28:382:29 | access to parameter e2 | +| E.cs:380:30:380:31 | SSA param(e2) | E.cs:382:44:382:67 | ... && ... | +| E.cs:380:30:380:31 | SSA param(e2) | E.cs:382:44:382:67 | ... && ... | +| E.cs:380:30:380:31 | SSA param(e2) | E.cs:382:44:382:67 | ... && ... | +| E.cs:382:28:382:29 | access to parameter e2 | E.cs:382:44:382:67 | ... && ... | +| E.cs:382:28:382:29 | access to parameter e2 | E.cs:382:44:382:67 | ... && ... | +| E.cs:382:44:382:67 | ... && ... | E.cs:384:9:385:24 | if (...) ... | +| E.cs:382:44:382:67 | ... && ... | E.cs:384:9:385:24 | if (...) ... | +| E.cs:384:9:385:24 | if (...) ... 
| E.cs:384:27:384:28 | access to parameter e2 | +| E.cs:384:9:385:24 | if (...) ... | E.cs:386:27:386:28 | access to parameter e2 | +| E.cs:384:27:384:28 | access to parameter e2 | E.cs:386:16:386:17 | access to parameter e1 | | Forwarding.cs:7:16:7:23 | SSA def(s) | Forwarding.cs:14:9:17:9 | if (...) ... | | Forwarding.cs:14:9:17:9 | if (...) ... | Forwarding.cs:19:9:22:9 | if (...) ... | | Forwarding.cs:19:9:22:9 | if (...) ... | Forwarding.cs:24:9:27:9 | if (...) ... | @@ -790,6 +821,12 @@ edges | E.cs:349:9:349:9 | access to local variable x | E.cs:348:17:348:36 | SSA def(x) | E.cs:349:9:349:9 | access to local variable x | Variable $@ may be null here because of $@ assignment. | E.cs:348:17:348:17 | x | x | E.cs:348:17:348:36 | dynamic x = ... | this | | E.cs:366:41:366:41 | access to parameter s | E.cs:366:28:366:28 | SSA param(s) | E.cs:366:41:366:41 | access to parameter s | Variable $@ may be null here because the parameter has a null default value. | E.cs:366:28:366:28 | s | s | E.cs:366:32:366:35 | null | this | | E.cs:375:20:375:20 | access to local variable s | E.cs:374:17:374:31 | SSA def(s) | E.cs:375:20:375:20 | access to local variable s | Variable $@ may be null here because of $@ assignment. | E.cs:374:17:374:17 | s | s | E.cs:374:17:374:31 | String s = ... | this | +| E.cs:386:16:386:17 | access to parameter e1 | E.cs:380:24:380:25 | SSA param(e1) | E.cs:386:16:386:17 | access to parameter e1 | Variable $@ may be null here as suggested by $@ null check. | E.cs:380:24:380:25 | e1 | e1 | E.cs:382:14:382:23 | ... == ... | this | +| E.cs:386:16:386:17 | access to parameter e1 | E.cs:380:24:380:25 | SSA param(e1) | E.cs:386:16:386:17 | access to parameter e1 | Variable $@ may be null here as suggested by $@ null check. | E.cs:380:24:380:25 | e1 | e1 | E.cs:382:44:382:53 | ... != ... 
| this | +| E.cs:386:16:386:17 | access to parameter e1 | E.cs:380:24:380:25 | SSA param(e1) | E.cs:386:16:386:17 | access to parameter e1 | Variable $@ may be null here as suggested by $@ null check. | E.cs:380:24:380:25 | e1 | e1 | E.cs:384:13:384:22 | ... == ... | this | +| E.cs:386:27:386:28 | access to parameter e2 | E.cs:380:30:380:31 | SSA param(e2) | E.cs:386:27:386:28 | access to parameter e2 | Variable $@ may be null here as suggested by $@ null check. | E.cs:380:30:380:31 | e2 | e2 | E.cs:382:28:382:37 | ... != ... | this | +| E.cs:386:27:386:28 | access to parameter e2 | E.cs:380:30:380:31 | SSA param(e2) | E.cs:386:27:386:28 | access to parameter e2 | Variable $@ may be null here as suggested by $@ null check. | E.cs:380:30:380:31 | e2 | e2 | E.cs:382:58:382:67 | ... == ... | this | +| E.cs:386:27:386:28 | access to parameter e2 | E.cs:380:30:380:31 | SSA param(e2) | E.cs:386:27:386:28 | access to parameter e2 | Variable $@ may be null here as suggested by $@ null check. | E.cs:380:30:380:31 | e2 | e2 | E.cs:384:27:384:36 | ... == ... | this | | GuardedString.cs:35:31:35:31 | access to local variable s | GuardedString.cs:7:16:7:32 | SSA def(s) | GuardedString.cs:35:31:35:31 | access to local variable s | Variable $@ may be null here because of $@ assignment. | GuardedString.cs:7:16:7:16 | s | s | GuardedString.cs:7:16:7:32 | String s = ... | this | | NullMaybeBad.cs:7:27:7:27 | access to parameter o | NullMaybeBad.cs:13:17:13:20 | null | NullMaybeBad.cs:7:27:7:27 | access to parameter o | Variable $@ may be null here because of $@ null argument. | NullMaybeBad.cs:5:25:5:25 | o | o | NullMaybeBad.cs:13:17:13:20 | null | this | | StringConcatenation.cs:16:17:16:17 | access to local variable s | StringConcatenation.cs:14:16:14:23 | SSA def(s) | StringConcatenation.cs:16:17:16:17 | access to local variable s | Variable $@ may be null here because of $@ assignment. 
| StringConcatenation.cs:14:16:14:16 | s | s | StringConcatenation.cs:14:16:14:23 | String s = ... | this | diff --git a/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.cs b/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.cs index 2a7cc3ae232..c0d56d1337e 100644 --- a/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.cs +++ b/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.cs @@ -2,10 +2,13 @@ using System; using System.Web; +using System.Web.UI.WebControls; public class StackTraceHandler : IHttpHandler { bool b; + TextBox textBox; + public void ProcessRequest(HttpContext ctx) { try @@ -34,6 +37,11 @@ public class StackTraceHandler : IHttpHandler // GOOD: log the stack trace, and send back a non-revealing response log("Exception occurred", ex); ctx.Response.Write("Exception occurred"); + + textBox.Text = ex.InnerException.StackTrace; // BAD + textBox.Text = ex.StackTrace; // BAD + textBox.Text = ex.ToString(); // BAD + textBox.Text = ex.Message; // GOOD return; } diff --git a/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.expected b/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.expected index 3cdd0f50449..492a61dd038 100644 --- a/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.expected +++ b/csharp/ql/test/query-tests/Security Features/CWE-209/ExceptionInformationExposure.expected @@ -1,14 +1,20 @@ edges -| ExceptionInformationExposure.cs:18:32:18:33 | access to local variable ex : Exception | ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | +| ExceptionInformationExposure.cs:21:32:21:33 | access to local variable ex : Exception | ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | nodes -| ExceptionInformationExposure.cs:18:32:18:33 | access to local variable ex : Exception | 
semmle.label | access to local variable ex : Exception | -| ExceptionInformationExposure.cs:18:32:18:44 | call to method ToString | semmle.label | call to method ToString | -| ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | semmle.label | access to local variable ex | -| ExceptionInformationExposure.cs:22:32:22:44 | access to property StackTrace | semmle.label | access to property StackTrace | -| ExceptionInformationExposure.cs:41:28:41:55 | call to method ToString | semmle.label | call to method ToString | +| ExceptionInformationExposure.cs:21:32:21:33 | access to local variable ex : Exception | semmle.label | access to local variable ex : Exception | +| ExceptionInformationExposure.cs:21:32:21:44 | call to method ToString | semmle.label | call to method ToString | +| ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | semmle.label | access to local variable ex | +| ExceptionInformationExposure.cs:25:32:25:44 | access to property StackTrace | semmle.label | access to property StackTrace | +| ExceptionInformationExposure.cs:41:28:41:55 | access to property StackTrace | semmle.label | access to property StackTrace | +| ExceptionInformationExposure.cs:42:28:42:40 | access to property StackTrace | semmle.label | access to property StackTrace | +| ExceptionInformationExposure.cs:43:28:43:40 | call to method ToString | semmle.label | call to method ToString | +| ExceptionInformationExposure.cs:49:28:49:55 | call to method ToString | semmle.label | call to method ToString | #select -| ExceptionInformationExposure.cs:18:32:18:44 | call to method ToString | ExceptionInformationExposure.cs:18:32:18:44 | call to method ToString | ExceptionInformationExposure.cs:18:32:18:44 | call to method ToString | Exception information from $@ flows to here, and is exposed to the user. 
| ExceptionInformationExposure.cs:18:32:18:44 | call to method ToString | call to method ToString | -| ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | ExceptionInformationExposure.cs:18:32:18:33 | access to local variable ex : Exception | ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:18:32:18:33 | access to local variable ex | access to local variable ex : Exception | -| ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:20:32:20:33 | access to local variable ex | access to local variable ex | -| ExceptionInformationExposure.cs:22:32:22:44 | access to property StackTrace | ExceptionInformationExposure.cs:22:32:22:44 | access to property StackTrace | ExceptionInformationExposure.cs:22:32:22:44 | access to property StackTrace | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:22:32:22:44 | access to property StackTrace | access to property StackTrace | -| ExceptionInformationExposure.cs:41:28:41:55 | call to method ToString | ExceptionInformationExposure.cs:41:28:41:55 | call to method ToString | ExceptionInformationExposure.cs:41:28:41:55 | call to method ToString | Exception information from $@ flows to here, and is exposed to the user. 
| ExceptionInformationExposure.cs:41:28:41:55 | call to method ToString | call to method ToString | +| ExceptionInformationExposure.cs:21:32:21:44 | call to method ToString | ExceptionInformationExposure.cs:21:32:21:44 | call to method ToString | ExceptionInformationExposure.cs:21:32:21:44 | call to method ToString | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:21:32:21:44 | call to method ToString | call to method ToString | +| ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | ExceptionInformationExposure.cs:21:32:21:33 | access to local variable ex : Exception | ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:21:32:21:33 | access to local variable ex | access to local variable ex : Exception | +| ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:23:32:23:33 | access to local variable ex | access to local variable ex | +| ExceptionInformationExposure.cs:25:32:25:44 | access to property StackTrace | ExceptionInformationExposure.cs:25:32:25:44 | access to property StackTrace | ExceptionInformationExposure.cs:25:32:25:44 | access to property StackTrace | Exception information from $@ flows to here, and is exposed to the user. 
| ExceptionInformationExposure.cs:25:32:25:44 | access to property StackTrace | access to property StackTrace | +| ExceptionInformationExposure.cs:41:28:41:55 | access to property StackTrace | ExceptionInformationExposure.cs:41:28:41:55 | access to property StackTrace | ExceptionInformationExposure.cs:41:28:41:55 | access to property StackTrace | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:41:28:41:55 | access to property StackTrace | access to property StackTrace | +| ExceptionInformationExposure.cs:42:28:42:40 | access to property StackTrace | ExceptionInformationExposure.cs:42:28:42:40 | access to property StackTrace | ExceptionInformationExposure.cs:42:28:42:40 | access to property StackTrace | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:42:28:42:40 | access to property StackTrace | access to property StackTrace | +| ExceptionInformationExposure.cs:43:28:43:40 | call to method ToString | ExceptionInformationExposure.cs:43:28:43:40 | call to method ToString | ExceptionInformationExposure.cs:43:28:43:40 | call to method ToString | Exception information from $@ flows to here, and is exposed to the user. | ExceptionInformationExposure.cs:43:28:43:40 | call to method ToString | call to method ToString | +| ExceptionInformationExposure.cs:49:28:49:55 | call to method ToString | ExceptionInformationExposure.cs:49:28:49:55 | call to method ToString | ExceptionInformationExposure.cs:49:28:49:55 | call to method ToString | Exception information from $@ flows to here, and is exposed to the user. 
| ExceptionInformationExposure.cs:49:28:49:55 | call to method ToString | call to method ToString | diff --git a/docs/language/learn-ql/beginner/fire-1.rst b/docs/language/learn-ql/beginner/catch-the-fire-starter.rst similarity index 73% rename from docs/language/learn-ql/beginner/fire-1.rst rename to docs/language/learn-ql/beginner/catch-the-fire-starter.rst index d5427b56d5f..72f3f4f7685 100644 --- a/docs/language/learn-ql/beginner/fire-1.rst +++ b/docs/language/learn-ql/beginner/catch-the-fire-starter.rst @@ -1,5 +1,7 @@ -Catch the fire starter: Classes and predicates -============================================== +Catch the fire starter +====================== + +Learn about QL predicates and classes to solve your second mystery as a QL detective. Just as you've successfully found the thief and returned the golden crown to the castle, another terrible crime is committed. Early in the morning, a few people start a fire in a field in the north of the village and destroy all the crops! @@ -101,6 +103,50 @@ Now try applying ``isAllowedIn(string region)`` to a person ``p``. If ``p`` is n You know that the fire starters live in the south *and* that they must have been able to travel to the north. Write a query to find the possible suspects. You could also extend the ``select`` clause to list the age of the suspects. That way you can clearly see that all the children have been excluded from the list. -➤ `See the answer in the query console `__ +➤ `See the answer in the query console on LGTM.com `__ -Continue to the :doc:`next page ` to gather more clues and find out which of your suspects started the fire... +You can now continue to gather more clues and find out which of your suspects started the fire... + +Identify the bald bandits +------------------------- + +You ask the northerners if they have any more information about the fire starters. Luckily, you have a witness! The farmer living next to the field saw two people run away just after the fire started. 
He only saw the tops of their heads, and noticed that they were both bald. + +This is a very helpful clue. Remember that you wrote a QL query to select all bald people: + +.. code-block:: ql + + from Person p + where not exists (string c | p.getHairColor() = c) + select p + +To avoid having to type ``not exists (string c | p.getHairColor() = c)`` every time you want to select a bald person, you can instead define another new predicate ``isBald``. + +.. code-block:: ql + + predicate isBald(Person p) { + not exists (string c | p.getHairColor() = c) + } + +The property ``isBald(p)`` holds whenever ``p`` is bald, so you can replace the previous query with: + +.. code-block:: ql + + from Person p + where isBald(p) + select p + +The predicate ``isBald`` is defined to take a ``Person``, so it can also take a ``Southerner``, as ``Southerner`` is a subtype of ``Person``. It can't take an ``int`` for example—that would cause an error. + +You can now write a query to select the bald southerners who are allowed into the north. + +➤ `See the answer in the query console on LGTM.com `__ + +You have found the two fire starters! They are arrested and the villagers are once again impressed with your work. + +Further reading +--------------- + +- Find out who will be the new ruler of the village in the :doc:`next tutorial `. +- Learn more about predicates and classes in the `QL language reference `__. +- Explore the libraries that help you get data about code in :doc:`Learning CodeQL <../../index>`. 
diff --git a/docs/language/learn-ql/ql-etudes/river-crossing.rst b/docs/language/learn-ql/beginner/cross-the-river.rst similarity index 95% rename from docs/language/learn-ql/ql-etudes/river-crossing.rst rename to docs/language/learn-ql/beginner/cross-the-river.rst index fef3363a08b..3f5307d98d5 100644 --- a/docs/language/learn-ql/ql-etudes/river-crossing.rst +++ b/docs/language/learn-ql/beginner/cross-the-river.rst @@ -1,7 +1,10 @@ -River crossing puzzle -##################### +Cross the river +=============== -The aim of this tutorial is to write a query that finds a solution to the following classical logic puzzle: +Use common QL features to write a query that finds a solution to the "River crossing" logic puzzle. + +Introduction +------------ .. pull-quote:: @@ -248,15 +251,15 @@ Here are some more example queries that solve the river crossing puzzle: #. This query uses a modified ``path`` variable to describe the resulting path in more detail. - ➤ `See solution in the query console `__ + ➤ `See solution in the query console on LGTM.com `__ #. This query models the man and the cargo items in a different way, using an `abstract `__ class and predicate. It also displays the resulting path in a more visual way. - ➤ `See solution in the query console `__ + ➤ `See solution in the query console on LGTM.com `__ #. This query introduces `algebraic datatypes `__ to model the situation, instead of defining everything as a subclass of ``string``. 
- ➤ `See solution in the query console `__ \ No newline at end of file + ➤ `See solution in the query console on LGTM.com `__ \ No newline at end of file diff --git a/docs/language/learn-ql/beginner/heir.rst b/docs/language/learn-ql/beginner/crown-the-rightful-heir.rst similarity index 93% rename from docs/language/learn-ql/beginner/heir.rst rename to docs/language/learn-ql/beginner/crown-the-rightful-heir.rst index 42589f03125..df3cd61fde9 100644 --- a/docs/language/learn-ql/beginner/heir.rst +++ b/docs/language/learn-ql/beginner/crown-the-rightful-heir.rst @@ -1,6 +1,11 @@ Crown the rightful heir ======================= +This is a QL detective puzzle that shows you how to use recursion in QL to write more complex queries. + +King Basil's heir +----------------- + Phew! No more crimes in the village—you can finally leave the village and go home. But then... During your last night in the village, the old king—the great King Basil—dies in his sleep and there is chaos everywhere! @@ -9,9 +14,6 @@ The king never married and he had no children, so nobody knows who should inheri Eventually you decide to stay in the village to resolve the argument and find the true heir to the throne. -King Basil's heir ------------------ - You want to find out if anyone in the village is actually related to the king. This seems like a difficult task at first, but you start work confidently. You know the villagers quite well by now, and you have a list of all the parents in the village and their children. To find out more about the king and his family, you get access to the castle and find some old family trees. You also include these relations in your database to see if anyone in the king's family is still alive. @@ -125,7 +127,7 @@ Here is one way to define ``relativeOf()``: Don't forget to use the predicate ``isDeceased()`` to find relatives that are still alive. 
-➤ `See the answer in the query console `__ +➤ `See the answer in the query console on LGTM.com `__ Select the true heir -------------------- @@ -136,9 +138,9 @@ To decide who should inherit the king's fortune, the villagers carefully read th *"The heir to the throne is the closest living relative of the king. Any person with a criminal record will not be considered. If there are multiple candidates, the oldest person is the heir."* -As your final challenge, define a predicate ``hasCriminalRecord`` so that ``hasCriminalRecord(p)`` holds if ``p`` is any of the criminals you unmasked earlier (in the :doc:`Find the thief ` and :doc:`Catch the fire starter ` tutorials). +As your final challenge, define a predicate ``hasCriminalRecord`` so that ``hasCriminalRecord(p)`` holds if ``p`` is any of the criminals you unmasked earlier (in the :doc:`Find the thief ` and :doc:`Catch the fire starter ` tutorials). -➤ `See the answer in the query console `__ +➤ `See the answer in the query console on LGTM.com `__ Experimental explorations ------------------------- @@ -156,9 +158,9 @@ You could also try writing more of your own QL queries to find interesting facts - Do all villagers live in the same region of the village as their parents? - Find out whether there are any time travelers in the village! (Hint: Look for "impossible" family relations.) -What next? ----------- +Further reading +--------------- -- Learn more about recursion in the `QL language handbook `__. -- Put your QL skills to the test and solve the :doc:`River crossing puzzle <../ql-etudes/river-crossing>`. +- Learn more about recursion in the `QL language reference `__. +- Put your QL skills to the test and solve the :doc:`River crossing puzzle `. - Start using QL to analyze projects. See :doc:`Learning CodeQL <../../index>` for a summary of the available languages and resources. 
diff --git a/docs/language/learn-ql/beginner/find-the-thief.rst b/docs/language/learn-ql/beginner/find-the-thief.rst new file mode 100644 index 00000000000..02f19cce325 --- /dev/null +++ b/docs/language/learn-ql/beginner/find-the-thief.rst @@ -0,0 +1,297 @@ +Find the thief +============== + +Take on the role of a detective to find the thief in this fictional village. You will learn how to use logical connectives, quantifiers, and aggregates in QL along the way. + +Introduction +------------ + +There is a small village hidden away in the mountains. The village is divided into four parts—north, south, east, and west—and in the center stands a dark and mysterious castle... Inside the castle, locked away in the highest tower, lies the king's valuable golden crown. One night, a terrible crime is committed. A thief breaks into the tower and steals the crown! + +You know that the thief must live in the village, since nobody else knew about the crown. After some expert detective work, you obtain a list of all the people in the village and some of their personal details. + ++------+-----+------------+--------+----------+ +| Name | Age | Hair color | Height | Location | ++======+=====+============+========+==========+ +| ... | ... | ... | ... | ... | ++------+-----+------------+--------+----------+ + +Sadly, you still have no idea who could have stolen the crown so you walk around the village to find clues. The villagers act very suspiciously and you are convinced they have information about the thief. They refuse to share their knowledge with you directly, but they reluctantly agree to answer questions. They are still not very talkative and **only answer questions with 'yes' or 'no'**. 
+ +You start asking some creative questions and making notes of the answers so you can compare them with your information later: + ++------+--------------------------------------------------------------------+--------+ +| | Question | Answer | ++======+====================================================================+========+ +| (1) | Is the thief taller than 150 cm? | yes | ++------+--------------------------------------------------------------------+--------+ +| (2) | Does the thief have blond hair? | no | ++------+--------------------------------------------------------------------+--------+ +| (3) | Is the thief bald? | no | ++------+--------------------------------------------------------------------+--------+ +| (4) | Is the thief younger than 30? | no | ++------+--------------------------------------------------------------------+--------+ +| (5) | Does the thief live east of the castle? | yes | ++------+--------------------------------------------------------------------+--------+ +| (6) | Does the thief have black or brown hair? | yes | ++------+--------------------------------------------------------------------+--------+ +| (7) | Is the thief taller than 180cm and shorter than 190cm? | no | ++------+--------------------------------------------------------------------+--------+ +| (8) | Is the thief the tallest person in the village? | no | ++------+--------------------------------------------------------------------+--------+ +| (9) | Is the thief shorter than the average villager? | yes | ++------+--------------------------------------------------------------------+--------+ +| (10) | Is the thief the oldest person in the eastern part of the village? | yes | ++------+--------------------------------------------------------------------+--------+ + +There is too much information to search through by hand, so you decide to use your newly acquired QL skills to help you with your investigation... + +#. 
Open the `query console on LGTM.com `__ to get started. +#. Select a language and a demo project. For this tutorial, any language and project will do. +#. Delete the default code ``import <language> select "hello world"``. + +QL libraries +------------ + +We've defined a number of QL `predicates `__ to help you extract data from your table. A QL predicate is a mini-query that expresses a relation between various pieces of data and describes some of their properties. In this case, the predicates give you information about a person, for example their height or age. + ++--------------------+----------------------------------------------------------------------------------------+ +| Predicate | Description | ++====================+========================================================================================+ +| ``getAge()`` | returns the age of the person (in years) as an ``int`` | ++--------------------+----------------------------------------------------------------------------------------+ +| ``getHairColor()`` | returns the hair color of the person as a ``string`` | ++--------------------+----------------------------------------------------------------------------------------+ +| ``getHeight()`` | returns the height of the person (in cm) as a ``float`` | ++--------------------+----------------------------------------------------------------------------------------+ +| ``getLocation()`` | returns the location of the person's home (north, south, east or west) as a ``string`` | ++--------------------+----------------------------------------------------------------------------------------+ + +We've stored these predicates in the QL library ``tutorial.qll``. To access this library, type ``import tutorial`` in the query console. + +Libraries are convenient for storing commonly used predicates. This saves you from defining a predicate every time you need it. Instead you can just ``import`` the library and use the predicate directly.
Once you have imported the library, you can apply any of these predicates to an expression by appending it. + +For example, ``t.getHeight()`` applies ``getHeight()`` to ``t`` and returns the height of ``t``. + +Start the search +----------------- + +The villagers answered "yes" to the question "Is the thief taller than 150cm?" To use this information, you can write the following query to list all villagers taller than 150cm. These are all possible suspects. + +.. code-block:: ql + + from Person t + where t.getHeight() > 150 + select t + +The first line, ``from Person t``, declares that ``t`` must be a ``Person``. We say that the `type `__ of ``t`` is ``Person``. + +Before you use the rest of your answers in your QL search, here are some more tools and examples to help you write your own QL queries: + +Logical connectives +------------------- + +Using `logical connectives `__, you can write more complex queries that combine different pieces of information. + +For example, if you know that the thief is older than 30 *and* has brown hair, you can use the following ``where`` clause to link two predicates: + +.. code-block:: ql + + where t.getAge() > 30 and t.getHairColor() = "brown" + +.. pull-quote:: + + Note + + The predicate ``getHairColor()`` returns a ``string``, so we need to include quotation marks around the result ``"brown"``. + +If the thief does *not* live north of the castle, you can use: + +.. code-block:: ql + + where not t.getLocation() = "north" + +If the thief has brown hair *or* black hair, you can use: + +.. code-block:: ql + + where t.getHairColor() = "brown" or t.getHairColor() = "black" + +You can also combine these connectives into longer statements: + +.. code-block:: ql + + where t.getAge() > 30 + and (t.getHairColor() = "brown" or t.getHairColor() = "black") + and not t.getLocation() = "north" + +.. pull-quote:: + + Note + + We've placed parentheses around the ``or`` clause to make sure that the query is evaluated as intended. 
Without parentheses, the connective ``and`` takes precedence over ``or``. + +Predicates don't always return exactly one value. For example, if a person ``p`` has black hair which is turning gray, ``p.getHairColor()`` will return two values: black and gray. + +What if the thief is bald? In that case, the thief has no hair, so the ``getHairColor()`` predicate simply doesn't return any results! + +If you know that the thief definitely isn't bald, then there must be a color that matches the thief's hair color. One way to express this in QL is to introduce a new variable ``c`` of type ``string`` and select those ``t`` where ``t.getHairColor()`` matches a value of ``c``. + +.. code-block:: ql + + from Person t, string c + where t.getHairColor() = c + select t + +Notice that we have only temporarily introduced the variable ``c`` and we didn't need it at all in the ``select`` clause. In this case, it is better to use ``exists``: + +.. code-block:: ql + + from Person t + where exists(string c | t.getHairColor() = c) + select t + +``exists`` introduces a temporary variable ``c`` of type ``string`` and holds only if there is at least one ``string c`` that satisfies ``t.getHairColor() = c``. + +.. pull-quote:: + + Note + + If you are familiar with logic, you may notice that ``exists`` in QL corresponds to the existential `quantifier `__ in logic. QL also has a universal quantifier ``forall(vars | formula 1 | formula 2)`` which is logically equivalent to ``not exists(vars | formula 1 | not formula 2)``. + +The real investigation +---------------------- + +You are now ready to track down the thief! Using the examples above, write a query to find the people who satisfy the answers to the first eight questions: + ++---+--------------------------------------------------------+--------+ +| | Question | Answer | ++===+========================================================+========+ +| 1 | Is the thief taller than 150 cm? 
| yes | ++---+--------------------------------------------------------+--------+ +| 2 | Does the thief have blond hair? | no | ++---+--------------------------------------------------------+--------+ +| 3 | Is the thief bald? | no | ++---+--------------------------------------------------------+--------+ +| 4 | Is the thief younger than 30? | no | ++---+--------------------------------------------------------+--------+ +| 5 | Does the thief live east of the castle? | yes | ++---+--------------------------------------------------------+--------+ +| 6 | Does the thief have black or brown hair? | yes | ++---+--------------------------------------------------------+--------+ +| 7 | Is the thief taller than 180cm and shorter than 190cm? | no | ++---+--------------------------------------------------------+--------+ +| 8 | Is the thief the oldest person in the village? | no | ++---+--------------------------------------------------------+--------+ + +Hints +^^^^^ + +#. Don't forget to ``import tutorial``! +#. Translate each question into QL separately. Look at the examples above if you get stuck. +#. For question 3, remember that a bald person does not have a hair color. +#. For question 8, note that if a person is *not* the oldest, then there is at least one person who is older than them. +#. Combine the conditions using logical connectives to get a query of the form: + +.. code-block:: ql + + import tutorial + + from Person t + where <condition 1> and + not <condition 2> and + ... + select t + +Once you have finished, you will have a list of possible suspects. One of those people must be the thief! + +➤ `See the answer in the query console on LGTM.com `__ + +.. pull-quote:: + + Note + + In the answer, we used ``/*`` and ``*/`` to label the different parts of the query. Any text surrounded by ``/*`` and ``*/`` is not evaluated as part of the QL code, but is just a *comment*. + +You are getting closer to solving the mystery! Unfortunately, you still have quite a long list of suspects...
To find out which of your suspects is the thief, you must gather more information and refine your query in the next step. + +More advanced queries +--------------------- + +What if you want to find the oldest, youngest, tallest, or shortest person in the village? As mentioned in the previous topic, you can do this using ``exists``. However, there is also a more efficient way to do this in QL using functions like ``max`` and ``min``. These are examples of `aggregates `__. + +In general, an aggregate is a function that performs an operation on multiple pieces of data and returns a single value as its output. Common aggregates are ``count``, ``max``, ``min``, ``avg`` (average) and ``sum``. The general way to use an aggregate is: + +.. code-block:: ql + + <aggregate>(<variable declarations> | <logical formula> | <expression>) + +For example, you can use the ``max`` aggregate to find the age of the oldest person in the village: + +.. code-block:: ql + + max(int i | exists(Person p | p.getAge() = i) | i) + +This aggregate considers all integers ``i``, limits ``i`` to values that match the ages of people in the village, and then returns the largest matching integer. + +But how can you use this in an actual query? + +If the thief is the oldest person in the village, then you know that the thief's age is equal to the maximum age of the villagers: + +.. code-block:: ql + + from Person t + where t.getAge() = max(int i | exists(Person p | p.getAge() = i) | i) + select t + +This general aggregate syntax is quite long and inconvenient. In most cases, you can omit certain parts of the aggregate. A particularly helpful QL feature is *ordered aggregation*. This allows you to order the expression using ``order by``. + +For example, selecting the oldest villager becomes much simpler if you use an ordered aggregate. + +.. code-block:: ql + + select max(Person p | | p order by p.getAge()) + +The ordered aggregate considers every person ``p`` and selects the person with the maximum age.
In this case, there are no restrictions on what people to consider, so the ``<logical formula>`` clause is empty. Note that if there are several people with the same maximum age, the query lists all of them. + +Here are some more examples of aggregates: + ++-------------------------------------------------------------------------+---------------------------------------------------+ +| Example | Result | ++=========================================================================+===================================================+ +| ``min(Person p | p.getLocation() = "east" | p order by p.getHeight())`` | shortest person in the east of the village | ++-------------------------------------------------------------------------+---------------------------------------------------+ +| ``count(Person p | p.getLocation() = "south" | p)`` | number of people in the south of the village | ++-------------------------------------------------------------------------+---------------------------------------------------+ +| ``avg(Person p | | p.getHeight())`` | average height of the villagers | ++-------------------------------------------------------------------------+---------------------------------------------------+ +| ``sum(Person p | p.getHairColor() = "brown" | p.getAge())`` | combined age of all the villagers with brown hair | ++-------------------------------------------------------------------------+---------------------------------------------------+ + +Capture the culprit +------------------- + +You can now translate the remaining questions into QL: + ++-----+--------------------------------------------------------------------+--------+ +| | Question | Answer | ++=====+====================================================================+========+ +| ... | ... | ... | ++-----+--------------------------------------------------------------------+--------+ +| 9 | Is the thief the tallest person in the village?
| no | ++-----+--------------------------------------------------------------------+--------+ +| 10 | Is the thief shorter than the average villager? | yes | ++-----+--------------------------------------------------------------------+--------+ +| 11 | Is the thief the oldest person in the eastern part of the village? | yes | ++-----+--------------------------------------------------------------------+--------+ + +Have you found the thief? + +➤ `See the answer in the query console on LGTM.com `__ + +Further reading +--------------- + +- Help the villagers track down another criminal in the :doc:`next tutorial `. +- Find out more about the concepts you discovered in this tutorial in the `QL language reference `__. +- Explore the libraries that help you get data about code in :doc:`Learning CodeQL <../../index>`. diff --git a/docs/language/learn-ql/beginner/find-thief-1.rst b/docs/language/learn-ql/beginner/find-thief-1.rst deleted file mode 100644 index 3df6c108837..00000000000 --- a/docs/language/learn-ql/beginner/find-thief-1.rst +++ /dev/null @@ -1,71 +0,0 @@ -Find the thief: Introduction -============================ - -There is a small village hidden away in the mountains. The village is divided into four parts—north, south, east, and west—and in the center stands a dark and mysterious castle... Inside the castle, locked away in the highest tower, lies the king's valuable golden crown. One night, a terrible crime is committed. A thief breaks into the tower and steals the crown! - -You know that the thief must live in the village, since nobody else knew about the crown. After some expert detective work, you obtain a list of all the people in the village and some of their personal details. - -+------+-----+------------+--------+----------+ -| Name | Age | Hair color | Height | Location | -+======+=====+============+========+==========+ -| ... | ... | ... | ... | ... 
| -+------+-----+------------+--------+----------+ - -Sadly, you still have no idea who could have stolen the crown so you walk around the village to find clues. The villagers act very suspiciously and you are convinced they have information about the thief. They refuse to share their knowledge with you directly, but they reluctantly agree to answer questions. They are still not very talkative and **only answer questions with 'yes' or 'no'**. - -You start asking some creative questions and making notes of the answers so you can compare them with your information later: - -+------+--------------------------------------------------------------------+--------+ -| | Question | Answer | -+======+====================================================================+========+ -| (1) | Is the thief taller than 150 cm? | yes | -+------+--------------------------------------------------------------------+--------+ -| (2) | Does the thief have blond hair? | no | -+------+--------------------------------------------------------------------+--------+ -| (3) | Is the thief bald? | no | -+------+--------------------------------------------------------------------+--------+ -| (4) | Is the thief younger than 30? | no | -+------+--------------------------------------------------------------------+--------+ -| (5) | Does the thief live east of the castle? | yes | -+------+--------------------------------------------------------------------+--------+ -| (6) | Does the thief have black or brown hair? | yes | -+------+--------------------------------------------------------------------+--------+ -| (7) | Is the thief taller than 180cm and shorter than 190cm? | no | -+------+--------------------------------------------------------------------+--------+ -| (8) | Is the thief the tallest person in the village? | no | -+------+--------------------------------------------------------------------+--------+ -| (9) | Is the thief shorter than the average villager? 
| yes | -+------+--------------------------------------------------------------------+--------+ -| (10) | Is the thief the oldest person in the eastern part of the village? | yes | -+------+--------------------------------------------------------------------+--------+ - -There is too much information to search through by hand, so you decide to use your newly acquired QL skills to help you with your investigation... - -#. Open the `query console `__ to get started. -#. Select a language and a demo project. For this tutorial, any language and project will do. -#. Delete the default code ``import <language> select "hello world"``. - -QL libraries ------------- - -We've defined a number of QL `predicates `__ to help you extract data from your table. A QL predicate is a mini-query that expresses a relation between various pieces of data and describes some of their properties. In this case, the predicates give you information about a person, for example their height or age. - -+--------------------+----------------------------------------------------------------------------------------+ -| Predicate | Description | -+====================+========================================================================================+ -| ``getAge()`` | returns the age of the person (in years) as an ``int`` | -+--------------------+----------------------------------------------------------------------------------------+ -| ``getHairColor()`` | returns the hair color of the person as a ``string`` | -+--------------------+----------------------------------------------------------------------------------------+ -| ``getHeight()`` | returns the height of the person (in cm) as a ``float`` | -+--------------------+----------------------------------------------------------------------------------------+ -| ``getLocation()`` | returns the location of the person's home (north, south, east or west) as a ``string`` |
-+--------------------+----------------------------------------------------------------------------------------+ - -We've stored these predicates in the QL library ``tutorial.qll``. To access this library, type ``import tutorial`` in the query console. - -Libraries are convenient for storing commonly used predicates. This saves you from defining a predicate every time you need it. Instead you can just ``import`` the library and use the predicate directly. Once you have imported the library, you can apply any of these predicates to an expression by appending it. - -For example, ``t.getHeight()`` applies ``getHeight()`` to ``t`` and returns the height of ``t``. - -Continue to the next page to :doc:`start the investigation `. diff --git a/docs/language/learn-ql/beginner/find-thief-2.rst b/docs/language/learn-ql/beginner/find-thief-2.rst deleted file mode 100644 index b31f978d36a..00000000000 --- a/docs/language/learn-ql/beginner/find-thief-2.rst +++ /dev/null @@ -1,141 +0,0 @@ -Find the thief: Start the search -================================ - -The villagers answered "yes" to the question "Is the thief taller than 150cm?" To use this information, you can write the following query to list all villagers taller than 150cm. These are all possible suspects. - -.. code-block:: ql - - from Person t - where t.getHeight() > 150 - select t - -The first line, ``from Person t``, declares that ``t`` must be a ``Person``. We say that the `type `__ of ``t`` is ``Person``. - -Before you use the rest of your answers in your QL search, here are some more tools and examples to help you write your own QL queries: - -Logical connectives -------------------- - -Using `logical connectives `__, you can write more complex queries that combine different pieces of information. - -For example, if you know that the thief is older than 30 *and* has brown hair, you can use the following ``where`` clause to link two predicates: - -.. 
code-block:: ql - - where t.getAge() > 30 and t.getHairColor() = "brown" - -.. pull-quote:: - - Note - - The predicate ``getHairColor()`` returns a ``string``, so we need to include quotation marks around the result ``"brown"``. - -If the thief does *not* live north of the castle, you can use: - -.. code-block:: ql - - where not t.getLocation() = "north" - -If the thief has brown hair *or* black hair, you can use: - -.. code-block:: ql - - where t.getHairColor() = "brown" or t.getHairColor() = "black" - -You can also combine these connectives into longer statements: - -.. code-block:: ql - - where t.getAge() > 30 - and (t.getHairColor() = "brown" or t.getHairColor() = "black") - and not t.getLocation() = "north" - -.. pull-quote:: - - Note - - We've placed parentheses around the ``or`` clause to make sure that the query is evaluated as intended. Without parentheses, the connective ``and`` takes precedence over ``or``. - -Predicates don't always return exactly one value. For example, if a person ``p`` has black hair which is turning gray, ``p.getHairColor()`` will return two values: black and gray. - -What if the thief is bald? In that case, the thief has no hair, so the ``getHairColor()`` predicate simply doesn't return any results! - -If you know that the thief definitely isn't bald, then there must be a color that matches the thief's hair color. One way to express this in QL is to introduce a new variable ``c`` of type ``string`` and select those ``t`` where ``t.getHairColor()`` matches a value of ``c``. - -.. code-block:: ql - - from Person t, string c - where t.getHairColor() = c - select t - -Notice that we have only temporarily introduced the variable ``c`` and we didn't need it at all in the ``select`` clause. In this case, it is better to use ``exists``: - -.. 
code-block:: ql - - from Person t - where exists(string c | t.getHairColor() = c) - select t - -``exists`` introduces a temporary variable ``c`` of type ``string`` and holds only if there is at least one ``string c`` that satisfies ``t.getHairColor() = c``. - -.. pull-quote:: - - Note - - If you are familiar with logic, you may notice that ``exists`` in QL corresponds to the existential `quantifier `__ in logic. QL also has a universal quantifier ``forall(vars | formula 1 | formula 2)`` which is logically equivalent to ``not exists(vars | formula 1 | not formula 2)``. - -The real investigation ----------------------- - -You are now ready to track down the thief! Using the examples above, write a query to find the people who satisfy the answers to the first eight questions: - -+---+--------------------------------------------------------+--------+ -| | Question | Answer | -+===+========================================================+========+ -| 1 | Is the thief taller than 150 cm? | yes | -+---+--------------------------------------------------------+--------+ -| 2 | Does the thief have blond hair? | no | -+---+--------------------------------------------------------+--------+ -| 3 | Is the thief bald? | no | -+---+--------------------------------------------------------+--------+ -| 4 | Is the thief younger than 30? | no | -+---+--------------------------------------------------------+--------+ -| 5 | Does the thief live east of the castle? | yes | -+---+--------------------------------------------------------+--------+ -| 6 | Does the thief have black or brown hair? | yes | -+---+--------------------------------------------------------+--------+ -| 7 | Is the thief taller than 180cm and shorter than 190cm? | no | -+---+--------------------------------------------------------+--------+ -| 8 | Is the thief the oldest person in the village? | no | -+---+--------------------------------------------------------+--------+ - -Hints -^^^^^ - -#. 
Don't forget to ``import tutorial``! -#. Translate each question into QL separately. Look at the examples above if you get stuck. -#. For question 3, remember that a bald person does not have a hair color. -#. For question 8, note that if a person is *not* the oldest, then there is at least one person who is older than them. -#. Combine the conditions using logical connectives to get a query of the form: - -.. code-block:: ql - - import tutorial - - from Person t - where <condition 1> and - not <condition 2> and - ... - select t - -Once you have finished, you will have a list of possible suspects. One of those people must be the thief! - -➤ `See the answer in the query console `__ - -.. pull-quote:: - - Note - - In the answer, we used ``/*`` and ``*/`` to label the different parts of the query. Any text surrounded by ``/*`` and ``*/`` is not evaluated as part of the QL code, but is just a *comment*. - -You are getting closer to solving the mystery! Unfortunately, you still have quite a long list of suspects... To find out which of your suspects is the thief, you must gather more information and refine your query in the :doc:`next step `. diff --git a/docs/language/learn-ql/beginner/find-thief-3.rst b/docs/language/learn-ql/beginner/find-thief-3.rst deleted file mode 100644 index f8323147cf1..00000000000 --- a/docs/language/learn-ql/beginner/find-thief-3.rst +++ /dev/null @@ -1,80 +0,0 @@ -Find the thief: More advanced queries -===================================== - -What if you want to find the oldest, youngest, tallest, or shortest person in the village? As mentioned in the previous topic, you can do this using ``exists``. However, there is also a more efficient way to do this in QL using functions like ``max`` and ``min``. These are examples of `aggregates `__. - -In general, an aggregate is a function that performs an operation on multiple pieces of data and returns a single value as its output. Common aggregates are ``count``, ``max``, ``min``, ``avg`` (average) and ``sum``.
The general way to use an aggregate is: - -.. code-block:: ql - - <aggregate>(<variable declarations> | <logical formula> | <expression>) - -For example, you can use the ``max`` aggregate to find the age of the oldest person in the village: - -.. code-block:: ql - - max(int i | exists(Person p | p.getAge() = i) | i) - -This aggregate considers all integers ``i``, limits ``i`` to values that match the ages of people in the village, and then returns the largest matching integer. - -But how can you use this in an actual query? - -If the thief is the oldest person in the village, then you know that the thief's age is equal to the maximum age of the villagers: - -.. code-block:: ql - - from Person t - where t.getAge() = max(int i | exists(Person p | p.getAge() = i) | i) - select t - -This general aggregate syntax is quite long and inconvenient. In most cases, you can omit certain parts of the aggregate. A particularly helpful QL feature is *ordered aggregation*. This allows you to order the expression using ``order by``. - -For example, selecting the oldest villager becomes much simpler if you use an ordered aggregate. - -.. code-block:: ql - - select max(Person p | | p order by p.getAge()) - -The ordered aggregate considers every person ``p`` and selects the person with the maximum age. In this case, there are no restrictions on what people to consider, so the ``<logical formula>`` clause is empty. Note that if there are several people with the same maximum age, the query lists all of them.
- -Here are some more examples of aggregates: - -+-------------------------------------------------------------------------+---------------------------------------------------+ -| Example | Result | -+=========================================================================+===================================================+ -| ``min(Person p | p.getLocation() = "east" | p order by p.getHeight())`` | shortest person in the east of the village | -+-------------------------------------------------------------------------+---------------------------------------------------+ -| ``count(Person p | p.getLocation() = "south" | p)`` | number of people in the south of the village | -+-------------------------------------------------------------------------+---------------------------------------------------+ -| ``avg(Person p | | p.getHeight())`` | average height of the villagers | -+-------------------------------------------------------------------------+---------------------------------------------------+ -| ``sum(Person p | p.getHairColor() = "brown" | p.getAge())`` | combined age of all the villagers with brown hair | -+-------------------------------------------------------------------------+---------------------------------------------------+ - -Capture the culprit -------------------- - -You can now translate the remaining questions into QL: - -+-----+--------------------------------------------------------------------+--------+ -| | Question | Answer | -+=====+====================================================================+========+ -| ... | ... | ... | -+-----+--------------------------------------------------------------------+--------+ -| 9 | Is the thief the tallest person in the village? | no | -+-----+--------------------------------------------------------------------+--------+ -| 10 | Is the thief shorter than the average villager? 
| yes | -+-----+--------------------------------------------------------------------+--------+ -| 11 | Is the thief the oldest person in the eastern part of the village? | yes | -+-----+--------------------------------------------------------------------+--------+ - -Have you found the thief? - -➤ `See the answer in the query console `__ - -What next? ----------- - -- Help the villagers track down another criminal in the :doc:`next tutorial `. -- Find out more about the concepts you discovered in this tutorial in the `QL language handbook `__. -- Explore the libraries that help you get data about code in :doc:`Learning CodeQL <../../index>`. diff --git a/docs/language/learn-ql/beginner/fire-2.rst b/docs/language/learn-ql/beginner/fire-2.rst deleted file mode 100644 index 0f5312e69f0..00000000000 --- a/docs/language/learn-ql/beginner/fire-2.rst +++ /dev/null @@ -1,43 +0,0 @@ -Catch the fire starter: Bald bandits -==================================== - -You ask the northerners if they have any more information about the fire starters. Luckily, you have a witness! The farmer living next to the field saw two people run away just after the fire started. He only saw the tops of their heads, and noticed that they were both bald. - -This is a very helpful clue. Remember that you wrote a QL query to select all bald people: - -.. code-block:: ql - - from Person p - where not exists (string c | p.getHairColor() = c) - select p - -To avoid having to type ``not exists (string c | p.getHairColor() = c)`` every time you want to select a bald person, you can instead define another new predicate ``isBald``. - -.. code-block:: ql - - predicate isBald(Person p) { - not exists (string c | p.getHairColor() = c) - } - -The property ``isBald(p)`` holds whenever ``p`` is bald, so you can replace the previous query with: - -.. 
code-block:: ql - - from Person p - where isBald(p) - select p - -The predicate ``isBald`` is defined to take a ``Person``, so it can also take a ``Southerner``, as ``Southerner`` is a subtype of ``Person``. It can't take an ``int`` for example—that would cause an error. - -You can now write a query to select the bald southerners who are allowed into the north. - -➤ `See the answer in the query console `__ - -You have found the two fire starters! They are arrested and the villagers are once again impressed with your work. - -What next? ----------- - -- Find out who will be the new ruler of the village in the :doc:`next tutorial `. -- Learn more about predicates and classes in the `QL language handbook `__. -- Explore the libraries that help you get data about code in :doc:`Learning CodeQL <../../index>`. diff --git a/docs/language/learn-ql/beginner/ql-tutorials.rst b/docs/language/learn-ql/beginner/ql-tutorials.rst index 2f3d6cc5e5d..a8dd35ff617 100644 --- a/docs/language/learn-ql/beginner/ql-tutorials.rst +++ b/docs/language/learn-ql/beginner/ql-tutorials.rst @@ -1,27 +1,19 @@ -QL detective tutorials -====================== +QL tutorials +============ + +Solve puzzles to learn the basics of QL before you analyze code with CodeQL. The tutorials teach you how to write queries and introduce you to key logic concepts along the way. .. toctree:: - :glob: :hidden: - ./* + ../introduction-to-ql + find-the-thief + catch-the-fire-starter + crown-the-rightful-heir + cross-the-river -Welcome to the detective tutorials! These are aimed at complete beginners who would like to learn the basics of QL, -before analyzing code with CodeQL. -The tutorials teach you how to write queries and introduce you to key logic concepts along the way. - -We recommend you first read the :doc:`Introduction to QL <../introduction-to-ql>` page for a description of the language and -some simple examples. 
- -Currently the following detective tutorials are available: - -- :doc:`Find the thief `—a three part mystery that introduces logical connectives, quantifiers, and aggregates -- :doc:`Catch the fire starter `—an intriguing search that introduces predicates and classes -- :doc:`Crown the rightful heir `—a detective puzzle that introduces recursion - -Further resources ------------------ - -- For a summary of available learning resources, see :doc:`Learning CodeQL <../../index>`. -- For an overview of the important concepts in QL, see the `QL language handbook `__. +- :doc:`Introduction to QL <../introduction-to-ql>`: Work through some simple exercises and examples to learn about the basics of QL and CodeQL. +- :doc:`Find the thief `: Take on the role of a detective to find the thief in this fictional village. You will learn how to use logical connectives, quantifiers, and aggregates in QL along the way. +- :doc:`Catch the fire starter `: Learn about QL predicates and classes to solve your second mystery as a QL detective. +- :doc:`Crown the rightful heir `: This is a QL detective puzzle that shows you how to use recursion in QL to write more complex queries. +- :doc:`Cross the river `: Use common QL features to write a query that finds a solution to the "River crossing" logic puzzle. 
diff --git a/docs/language/learn-ql/ql-etudes/river-crossing-1.ql b/docs/language/learn-ql/beginner/river-crossing-1.ql similarity index 100% rename from docs/language/learn-ql/ql-etudes/river-crossing-1.ql rename to docs/language/learn-ql/beginner/river-crossing-1.ql diff --git a/docs/language/learn-ql/ql-etudes/river-crossing.ql b/docs/language/learn-ql/beginner/river-crossing.ql similarity index 100% rename from docs/language/learn-ql/ql-etudes/river-crossing.ql rename to docs/language/learn-ql/beginner/river-crossing.ql diff --git a/docs/language/learn-ql/cpp/conversions-classes.rst b/docs/language/learn-ql/cpp/conversions-classes.rst index 016efa50d67..553423bb47e 100644 --- a/docs/language/learn-ql/cpp/conversions-classes.rst +++ b/docs/language/learn-ql/cpp/conversions-classes.rst @@ -1,15 +1,14 @@ -Tutorial: Conversions and classes -================================= +Conversions and classes in C and C++ +==================================== -Overview --------- - -This topic contains worked examples of how to write queries using the CodeQL library classes for C/C++ conversions and classes. +You can use the standard CodeQL libraries for C and C++ to detect when the type of an expression is changed. Conversions ----------- -Let us take a look at the ``Conversion`` class in the standard library: +In C and C++, conversions change the type of an expression. They may be implicit conversions generated by the compiler, or explicit conversions requested by the user. + +Let's take a look at the `Conversion `__ class in the standard library: - ``Expr`` @@ -25,8 +24,6 @@ Let us take a look at the ``Conversion`` class in the standard library: - ``ArrayToPointerConversion`` - ``VirtualMemberToFunctionPointerConversion`` -All conversions change the type of an expression. They may be implicit conversions (generated by the compiler) or explicit conversions (requested by the user). 
- Exploring the subexpressions of an assignment ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -166,7 +163,7 @@ Our starting point for the query is pairs of a base class and a derived class, c where derived.getABaseClass+() = base select base, derived, "The second class is derived from the first." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ Note that the transitive closure symbol ``+`` indicates that ``Class.getABaseClass()`` may be followed one or more times, rather than only accepting a direct base class. @@ -178,7 +175,7 @@ A lot of the results are uninteresting template parameters. You can remove those and not exists(base.getATemplateArgument()) and not exists(derived.getATemplateArgument()) -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ Finding derived classes with destructors ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -197,7 +194,7 @@ Now we can extend the query to find derived classes with destructors, using the and d2 = derived.getDestructor() select base, derived, "The second class is derived from the first, and both have a destructor." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ Notice that getting the destructor implicitly asserts that one exists. As a result, this version of the query returns fewer results than before. @@ -217,17 +214,17 @@ Our last change is to use ``Function.isVirtual()`` to find cases where the base and not d1.isVirtual() select d1, "This destructor should probably be virtual." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ That completes the query. -There is a similar built-in LGTM `query `__ that finds classes in a C/C++ project with virtual functions but no virtual destructor. You can take a look at the code for this query by clicking **Open in query console** at the top of that page. 
+There is a similar built-in `query `__ on LGTM.com that finds classes in a C/C++ project with virtual functions but no virtual destructor. You can take a look at the code for this query by clicking **Open in query console** at the top of that page. -What next? ----------- +Further reading +--------------- - Explore other ways of querying classes using examples from the `C/C++ cookbook `__. -- Take a look at the :doc:`Analyzing data flow in C/C++ ` tutorial. -- Try the worked examples in the following topics: :doc:`Example: Checking that constructors initialize all private fields `, and :doc:`Example: Checking for allocations equal to 'strlen(string)' without space for a null terminator `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Take a look at the :doc:`Analyzing data flow in C and C++ ` tutorial. +- Try the worked examples in the following topics: :doc:`Refining a query to account for edge cases `, and :doc:`Detecting a potential buffer overflow `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. diff --git a/docs/language/learn-ql/cpp/dataflow.rst b/docs/language/learn-ql/cpp/dataflow.rst index 686759c049f..2b12986ce94 100644 --- a/docs/language/learn-ql/cpp/dataflow.rst +++ b/docs/language/learn-ql/cpp/dataflow.rst @@ -1,13 +1,12 @@ -Analyzing data flow in C/C++ -============================ +Analyzing data flow in C and C++ +================================ -Overview --------- +You can use data flow analysis to track the flow of potentially malicious or insecure data that can cause vulnerabilities in your codebase. -This topic describes how data flow analysis is implemented in the CodeQL libraries for C/C++ and includes examples to help you write your own data flow queries. 
-The following sections describe how to utilize the libraries for local data flow, global data flow, and taint tracking. +About data flow +--------------- -For a more general introduction to modeling data flow, see :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`. +Data flow analysis computes the possible values that a variable can hold at various points in a program, determining how those values propagate through the program, and where they are used. In CodeQL, you can model both local data flow and global data flow. For a more general introduction to modeling data flow, see :doc:`About data flow analysis <../intro-to-data-flow>`. Local data flow --------------- @@ -296,12 +295,12 @@ Exercise 3: Write a class that represents flow sources from ``getenv``. (`Answer Exercise 4: Using the answers from 2 and 3, write a query which finds all global data flows from ``getenv`` to ``gethostbyname``. (`Answer <#exercise-4>`__) -What next? ----------- +Further reading +--------------- -- Try the worked examples in the following topics: :doc:`Example: Checking that constructors initialize all private fields ` and :doc:`Example: Checking for allocations equal to 'strlen(string)' without space for a null terminator `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Try the worked examples in the following topics: :doc:`Refining a query to account for edge cases ` and :doc:`Detecting a potential buffer overflow `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. 
Answers ------- diff --git a/docs/language/learn-ql/cpp/expressions-types.rst b/docs/language/learn-ql/cpp/expressions-types.rst index 1595d594683..39da25330d3 100644 --- a/docs/language/learn-ql/cpp/expressions-types.rst +++ b/docs/language/learn-ql/cpp/expressions-types.rst @@ -1,13 +1,10 @@ -Tutorial: Expressions, types and statements -=========================================== +Expressions, types, and statements in C and C++ +=============================================== -Overview --------- +You can use CodeQL to explore expressions, types, and statements in C and C++ code to find, for example, incorrect assignments. -This topic contains worked examples of how to write queries using the standard CodeQL library classes for C/C++ expressions, types, and statements. - -Expressions and types ---------------------- +Expressions and types in CodeQL +------------------------------- Each part of an expression in C becomes an instance of the ``Expr`` class. For example, the C code ``x = x + 1`` becomes an ``AssignExpr``, an ``AddExpr``, two instances of ``VariableAccess`` and a ``Literal``. All of these CodeQL classes extend ``Expr``. @@ -24,7 +21,7 @@ In the following example we find instances of ``AssignExpr`` which assign the co where e.getRValue().getValue().toInt() = 0 select e, "Assigning the value 0 to something." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ The ``where`` clause in this example gets the expression on the right side of the assignment, ``getRValue()``, and compares it with zero. Notice that there are no checks to make sure that the right side of the assignment is an integer or that it has a value (that is, it is compile-time constant, rather than a variable). For expressions where either of these assumptions is wrong, the associated predicate simply does not return anything and the ``where`` clause will not produce a result. 
You could think of it as if there is an implicit ``exists(e.getRValue().getValue().toInt())`` at the beginning of this line. @@ -34,7 +31,7 @@ It is also worth noting that the query above would find this C code: yPtr = NULL; -This is because the database contains a representation of the code base after the preprocessor transforms have run (for more information, see `Database generation `__). This means that any macro invocations, such as the ``NULL`` define used here, are expanded during the creation of the database. If you want to write queries about macros then there are some special library classes that have been designed specifically for this purpose (for example, the ``Macro``, ``MacroInvocation`` classes and predicates like ``Element.isInMacroExpansion()``). In this case, it is good that macros are expanded, but we do not want to find assignments to pointers. +This is because the database contains a representation of the code base after the preprocessor transforms have run. This means that any macro invocations, such as the ``NULL`` define used here, are expanded during the creation of the database. If you want to write queries about macros then there are some special library classes that have been designed specifically for this purpose (for example, the ``Macro``, ``MacroInvocation`` classes and predicates like ``Element.isInMacroExpansion()``). In this case, it is good that macros are expanded, but we do not want to find assignments to pointers. For more information, see `Database generation `__ on LGTM.com. Finding assignments of 0 to an integer ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -50,7 +47,7 @@ We can make the query more specific by defining a condition for the left side of and e.getLValue().getType().getUnspecifiedType() instanceof IntegralType select e, "Assigning the value 0 to an integer." 
-➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ This checks that the left side of the assignment has a type that is some kind of integer. Note the call to ``Type.getUnspecifiedType()``. This resolves ``typedef`` types to their underlying types so that the query finds assignments like this one: @@ -61,8 +58,8 @@ This checks that the left side of the assignment has a type that is some kind of i = 0; -Statements ----------- +Statements in CodeQL +-------------------- We can refine the query further using statements. In this case we use the class ``ForStmt``: @@ -110,7 +107,7 @@ Unfortunately this would not quite work, because the loop initialization is actu and e.getLValue().getType().getUnspecifiedType() instanceof IntegralType select e, "Assigning the value 0 to an integer, inside a for loop initialization." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ Finding assignments of 0 within the loop body ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -128,14 +125,14 @@ We can find assignments inside the loop body using similar code with the predica and e.getLValue().getType().getUnderlyingType() instanceof IntegralType select e, "Assigning the value 0 to an integer, inside a for loop body." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ Note that we replaced ``e.getEnclosingStmt()`` with ``e.getEnclosingStmt().getParentStmt*()``, to find an assignment expression that is deeply nested inside the loop body. The transitive closure modifier ``*`` here indicates that ``Stmt.getParentStmt()`` may be followed zero or more times, rather than just once, giving us the statement, its parent statement, its parent's parent statement etc. -What next? ----------- +Further reading +--------------- - Explore other ways of finding types and statements using examples from the C/C++ cookbook for `types `__ and `statements `__. 
-- Take a look at the :doc:`Conversions and classes ` and :doc:`Analyzing data flow in C/C++ ` tutorials. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Take a look at the :doc:`Conversions and classes in C and C++ ` and :doc:`Analyzing data flow in C and C++ ` tutorials. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. diff --git a/docs/language/learn-ql/cpp/function-classes.rst b/docs/language/learn-ql/cpp/function-classes.rst index 51f07a8d0f0..841add0d4b0 100644 --- a/docs/language/learn-ql/cpp/function-classes.rst +++ b/docs/language/learn-ql/cpp/function-classes.rst @@ -1,10 +1,12 @@ -Tutorial: Function classes -========================== +Functions in C and C++ +======================= + +You can use CodeQL to explore functions in C and C++ code. Overview -------- -The standard CodeQL library for C and C++ represents functions using the ``Function`` class (see :doc:`Introducing the C/C++ libraries `). +The standard CodeQL library for C and C++ represents functions using the ``Function`` class (see :doc:`CodeQL libraries for C and C++ `). The example queries in this topic explore some of the most useful library predicates for querying functions. @@ -26,7 +28,7 @@ This query is very general, so there are probably too many results to be interes Finding functions that are not called ------------------------------------- -It might be more interesting to find functions that are not called, using the standard CodeQL ``FunctionCall`` class from the **abstract syntax tree** category (see :doc:`Introducing the C/C++ libraries `). The ``FunctionCall`` class can be used to identify places where a function is actually used, and it is related to ``Function`` through the ``FunctionCall.getTarget()`` predicate. 
+It might be more interesting to find functions that are not called, using the standard CodeQL ``FunctionCall`` class from the **abstract syntax tree** category (see :doc:`CodeQL libraries for C and C++ `). The ``FunctionCall`` class can be used to identify places where a function is actually used, and it is related to ``Function`` through the ``FunctionCall.getTarget()`` predicate. .. code-block:: ql @@ -36,7 +38,7 @@ It might be more interesting to find functions that are not called, using the st where not exists(FunctionCall fc | fc.getTarget() = f) select f, "This function is never called." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ The new query finds functions that are not the target of any ``FunctionCall``—in other words, functions that are never called. You may be surprised by how many results the query finds. However, if you examine the results, you can see that many of the functions it finds are used indirectly. To create a query that finds only unused functions, we need to refine the query and exclude other ways of using a function. @@ -54,7 +56,7 @@ You can modify the query to remove functions where a function pointer is used to and not exists(FunctionAccess fa | fa.getTarget() = f) select f, "This function is never called, or referenced with a function pointer." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ This query returns fewer results. However, if you examine the results then you can probably still find potential refinements. @@ -76,7 +78,7 @@ This query uses ``Function`` and ``FunctionCall`` to find calls to the function and not fc.getArgument(1) instanceof StringLiteral select fc, "sprintf called with variable format string." 
-➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ This uses: @@ -87,10 +89,10 @@ Note that we could have used ``Declaration.getName()``, but ``Declaration.getQua The LGTM version of this query is considerably more complicated, but if you look carefully you will find that its structure is the same. See `Non-constant format string `__ and click **Open in query console** at the top of the page. -What next? ----------- +Further reading +--------------- - Explore other ways of finding functions using examples from the `C/C++ cookbook `__. -- Take a look at some of the other tutorials: :doc:`Expressions, types and statements `, :doc:`Conversions and classes `, and :doc:`Analyzing data flow in C/C++ `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Take a look at some other tutorials: :doc:`Expressions, types, and statements in C and C++ `, :doc:`Conversions and classes in C and C++ `, and :doc:`Analyzing data flow in C and C++ `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. diff --git a/docs/language/learn-ql/cpp/guards.rst b/docs/language/learn-ql/cpp/guards.rst index f21a7f866e5..409df0a8f81 100644 --- a/docs/language/learn-ql/cpp/guards.rst +++ b/docs/language/learn-ql/cpp/guards.rst @@ -1,8 +1,10 @@ Using the guards library in C and C++ ===================================== -Overview --------- +You can use the CodeQL guards library to identify conditional expressions that control the execution of other parts of a program in C and C++ codebases. + +About the guards library +------------------------ The guards library (defined in ``semmle.code.cpp.controlflow.Guards``) provides a class `GuardCondition `__ representing Boolean values that are used to make control flow decisions.
A ``GuardCondition`` is considered to guard a basic block if the block can only be reached if the ``GuardCondition`` is evaluated a certain way. For instance, in the following code, ``x < 10`` is a ``GuardCondition``, and it guards all the code before the return statement. @@ -20,7 +22,7 @@ A ``GuardCondition`` is considered to guard a basic block if the block can only The ``controls`` predicate ------------------------------------------------- +-------------------------- The ``controls`` predicate helps determine which blocks are only run when the ``GuardCondition`` evaluates a certain way. ``guard.controls(block, testIsTrue)`` holds if ``block`` is only entered if the value of this condition is ``testIsTrue``. diff --git a/docs/language/learn-ql/cpp/introduce-libraries-cpp.rst b/docs/language/learn-ql/cpp/introduce-libraries-cpp.rst index 9e7e910e78e..4960cfc5dba 100644 --- a/docs/language/learn-ql/cpp/introduce-libraries-cpp.rst +++ b/docs/language/learn-ql/cpp/introduce-libraries-cpp.rst @@ -1,10 +1,13 @@ -Introducing the CodeQL libraries for C/C++ -========================================== +CodeQL library for C and C++ +============================ -Overview --------- +When analyzing C or C++ code, you can use the large collection of classes in the CodeQL library for C and C++. -There is an extensive library for analyzing CodeQL databases extracted from C/C++ projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``cpp.qll`` imports all the core C/C++ library modules, so you can include the complete library by beginning your query with: +About the CodeQL library for C and C++ +-------------------------------------- + +There is an extensive library for analyzing CodeQL databases extracted from C/C++ projects. 
The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. +The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``cpp.qll`` imports all the core C/C++ library modules, so you can include the complete library by beginning your query with: .. code-block:: ql @@ -12,9 +15,7 @@ There is an extensive library for analyzing CodeQL databases extracted from C/C+ The rest of this topic summarizes the available CodeQL classes and corresponding C/C++ constructs. -NOTE: You can find related classes and features using the query console's auto-complete feature. You can also press *F3* to jump to the definition of any element; library files are opened in new tabs in the console. - -Summary of the library classes +Commonly-used library classes ------------------------------ The most commonly used standard library classes are listed below. The listing is broken down by functionality. Each library class is annotated with a C/C++ construct it corresponds to. @@ -521,9 +522,9 @@ This table lists `Preprocessor `, :doc:`Expressions, types and statements `, :doc:`Conversions and classes `, and :doc:`Analyzing data flow in C/C++ `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Experiment with the worked examples in the CodeQL for C/C++ topics: :doc:`Functions in C and C++ `, :doc:`Expressions, types, and statements in C and C++ `, :doc:`Conversions and classes in C and C++ `, and :doc:`Analyzing data flow in C and C++ `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. 
diff --git a/docs/language/learn-ql/cpp/private-field-initialization.rst b/docs/language/learn-ql/cpp/private-field-initialization.rst index 26b57660142..c1f9224a145 100644 --- a/docs/language/learn-ql/cpp/private-field-initialization.rst +++ b/docs/language/learn-ql/cpp/private-field-initialization.rst @@ -1,13 +1,15 @@ -Example: Checking that constructors initialize all private fields -================================================================= +Refining a query to account for edge cases +========================================== + +You can improve the results generated by a CodeQL query by adding conditions to remove false positive results caused by common edge cases. Overview -------- -This topic describes how a C++ query was developed. The example introduces recursive predicates and demonstrates the typical workflow used to refine a query. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C/C++ `. +This topic describes how a C++ query was developed. The example introduces recursive predicates and demonstrates the typical workflow used to refine a query. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C and C++ `. -Problem—finding every private field and checking for initialization -------------------------------------------------------------------- +Finding every private field and checking for initialization +----------------------------------------------------------- Writing a query to check if a constructor initializes all private fields seems like a simple problem, but there are several edge cases to account for. @@ -100,7 +102,7 @@ You may also wish to consider methods called by constructors that assign to the int m_value; }; -This case can be excluded by creating a recursive predicate. The recursive predicate is given a function and a field, then checks whether the function assigns to the field. 
The predicate runs itself on all the functions called by the function that it has been given. By passing the constructor to this predicate, we can check for assignments of a field in all functions called by the constructor, and then do the same for all functions called by those functions all the way down the tree of function calls (see `Recursion `__ for more information). +This case can be excluded by creating a recursive predicate. The recursive predicate is given a function and a field, then checks whether the function assigns to the field. The predicate runs itself on all the functions called by the function that it has been given. By passing the constructor to this predicate, we can check for assignments of a field in all functions called by the constructor, and then do the same for all functions called by those functions all the way down the tree of function calls. For more information, see `Recursion `__ in the QL language reference. .. code-block:: ql @@ -124,7 +126,7 @@ This case can be excluded by creating a recursive predicate. The recursive predi Refinement 4—simplifying the query ---------------------------------- -Finally we can simplify the query by using the `transitive closure operator `__. In this final version of the query, ``c.calls*(fun)`` resolves to the set of all functions that are ``c`` itself, are called by ``c``, are called by a function that is called by ``c``, and so on. This eliminates the need to make a new predicate all together. +Finally we can simplify the query by using the transitive closure operator. In this final version of the query, ``c.calls*(fun)`` resolves to the set of all functions that are ``c`` itself, are called by ``c``, are called by a function that is called by ``c``, and so on. This eliminates the need to make a new predicate altogether. For more information, see `Transitive closures `__ in the QL language reference. ..
code-block:: ql @@ -142,11 +144,11 @@ Finally we can simplify the query by using the `transitive closure operator `__ +➤ `See this in the query console on LGTM.com `__ -What next? ----------- +Further reading +--------------- -- Take a look at another example: :doc:`Checking for allocations equal to 'strlen(string)' without space for a null terminator `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Take a look at another example: :doc:`Detecting a potential buffer overflow `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. diff --git a/docs/language/learn-ql/cpp/ql-for-cpp.rst b/docs/language/learn-ql/cpp/ql-for-cpp.rst index b1596c7de92..bd52291a7f8 100644 --- a/docs/language/learn-ql/cpp/ql-for-cpp.rst +++ b/docs/language/learn-ql/cpp/ql-for-cpp.rst @@ -1,8 +1,9 @@ -CodeQL for C/C++ -================ +CodeQL for C and C++ +==================== + +Experiment and learn how to write effective and efficient queries for CodeQL databases generated from C and C++ codebases. .. toctree:: - :glob: :hidden: introduce-libraries-cpp @@ -12,47 +13,36 @@ CodeQL for C/C++ dataflow private-field-initialization zero-space-terminator - -These topics provide an overview of the CodeQL libraries for C/C++ and show examples of how to write queries that use them. - -- `Basic C/C++ query `__ describes how to write and run queries using LGTM. - -- :doc:`Introducing the CodeQL libraries for C/C++ ` introduces the standard libraries used to write queries for C and C++ code. - -- :doc:`Tutorial: Function classes ` demonstrates how to write queries using the standard CodeQL library classes for C/C++ functions. 
- -- :doc:`Tutorial: Expressions, types and statements ` demonstrates how to write queries using the standard CodeQL library classes for C/C++ expressions, types and statements. - -- :doc:`Tutorial: Conversions and classes ` demonstrates how to write queries using the standard CodeQL library classes for C/C++ conversions and classes. - -- :doc:`Tutorial: Analyzing data flow in C/C++ ` demonstrates how to write queries using the standard data flow and taint tracking libraries for C/C++. - -- :doc:`Example: Checking that constructors initialize all private fields ` works through the development of a query. It introduces recursive predicates and shows the typical workflow used to refine a query. - -- :doc:`Example: Checking for allocations equal to strlen(string) without space for a null terminator ` shows how a query to detect this particular buffer issue was developed. - -Advanced libraries ----------------------------------- - -.. toctree:: - :hidden: - guards range-analysis value-numbering-hash-cons -- :doc:`Using the guards library in C and C++ ` demonstrates how to identify conditional expressions that control the execution of other code and what guarantees they provide. -- :doc:`Using range analysis for C and C++ ` demonstrates how to determine constant upper and lower bounds and possible overflow or underflow of expressions. +- `Basic C/C++ query `__: Learn to write and run a simple CodeQL query using LGTM. -- :doc:`Using hash consing and value numbering for C and C++ ` demonstrates how to recognize expressions that are syntactically identical or compute the same value at runtime. +- :doc:`CodeQL library for C and C++ `: When analyzing C or C++ code, you can use the large collection of classes in the CodeQL library for C and C++. +- :doc:`Functions in C and C++ `: You can use CodeQL to explore functions in C and C++ code. 
-Other resources +- :doc:`Expressions, types, and statements in C and C++ `: You can use CodeQL to explore expressions, types, and statements in C and C++ code to find, for example, incorrect assignments. + +- :doc:`Conversions and classes in C and C++ `: You can use the standard CodeQL libraries for C and C++ to detect when the type of an expression is changed. + +- :doc:`Analyzing data flow in C and C++ `: You can use data flow analysis to track the flow of potentially malicious or insecure data that can cause vulnerabilities in your codebase. + +- :doc:`Refining a query to account for edge cases `: You can improve the results generated by a CodeQL query by adding conditions to remove false positive results caused by common edge cases. + +- :doc:`Detecting a potential buffer overflow `: You can use CodeQL to detect potential buffer overflows by checking for allocations equal to ``strlen`` in C and C++. + +- :doc:`Using the guards library in C and C++ `: You can use the CodeQL guards library to identify conditional expressions that control the execution of other parts of a program in C and C++ codebases. + +- :doc:`Using range analysis for C and C++ `: You can use range analysis to determine the upper or lower bounds on an expression, or whether an expression could potentially over or underflow. + +- :doc:`Hash consing and value numbering `: You can use specialized CodeQL libraries to recognize expressions that are syntactically identical or compute the same value at runtime in C and C++ codebases. + +Further reading --------------- -.. TODO: Rename the cookbooks: C/C++ cookbook, or C/C++ CodeQL cookbook, or CodeQL cookbook for C/C++, or...? - - For examples of how to query common C/C++ elements, see the `C/C++ cookbook `__. - For the queries used in LGTM, display a `C/C++ query `__ and click **Open in query console** to see the code used to find alerts. - For more information about the library for C/C++ see the `CodeQL library for C/C++ `__. 
diff --git a/docs/language/learn-ql/cpp/range-analysis.rst b/docs/language/learn-ql/cpp/range-analysis.rst index c60465e131d..ba324e86ac9 100644 --- a/docs/language/learn-ql/cpp/range-analysis.rst +++ b/docs/language/learn-ql/cpp/range-analysis.rst @@ -1,10 +1,10 @@ Using range analysis for C and C++ ================================== -Overview --------- +You can use range analysis to determine the upper or lower bounds on an expression, or whether an expression could potentially over or underflow. -Range analysis determines upper and lower bounds for an expression. +About the range analysis library +-------------------------------- The range analysis library (defined in ``semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis``) provides a set of predicates for determining constant upper and lower bounds on expressions, as well as recognizing integer overflows. For performance, the library performs automatic widening and therefore may not provide the tightest possible bounds. diff --git a/docs/language/learn-ql/cpp/value-numbering-hash-cons.rst b/docs/language/learn-ql/cpp/value-numbering-hash-cons.rst index e05f93dbae1..de102a15f6d 100644 --- a/docs/language/learn-ql/cpp/value-numbering-hash-cons.rst +++ b/docs/language/learn-ql/cpp/value-numbering-hash-cons.rst @@ -1,15 +1,14 @@ Hash consing and value numbering -================================================= +================================ -Overview --------- +You can use specialized CodeQL libraries to recognize expressions that are syntactically identical or compute the same value at runtime in C and C++ codebases. + +About the hash consing and value numbering libraries +---------------------------------------------------- In C and C++ databases, each node in the abstract syntax tree is represented by a separate object. This allows both analysis and results display to refer to specific appearances of a piece of syntax. 
However, it is frequently useful to determine whether two expressions are equivalent, either syntactically or semantically. -The `hash consing `__ library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. The `global value numbering `__ library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime. - -Both libraries partition the expressions in each function into equivalence classes represented by objects. Each ``HashCons`` object represents a set of expressions with identical parse trees, while ``GVN`` objects represent sets of expressions that will always compute the same value. - +The hash consing library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. The global value numbering library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime. Both libraries partition the expressions in each function into equivalence classes represented by objects. Each ``HashCons`` object represents a set of expressions with identical parse trees, while ``GVN`` objects represent sets of expressions that will always compute the same value. For more information, see `Hash consing `__ and `Value numbering `__ on Wikipedia. 
Example C code -------------- @@ -111,4 +110,3 @@ Example query hashCons(outer.getCondition()) = hashCons(inner.getCondition()) select inner.getCondition(), "The condition of this if statement duplicates the condition of $@", outer.getCondition(), "an enclosing if statement" - diff --git a/docs/language/learn-ql/cpp/zero-space-terminator.rst b/docs/language/learn-ql/cpp/zero-space-terminator.rst index a9370f7828d..a25437865f7 100644 --- a/docs/language/learn-ql/cpp/zero-space-terminator.rst +++ b/docs/language/learn-ql/cpp/zero-space-terminator.rst @@ -1,10 +1,7 @@ -Example: Checking for allocations equal to ``strlen(string)`` without space for a null terminator -================================================================================================= +Detecting a potential buffer overflow +===================================== -Overview --------- - -This topic describes how a C/C++ query for detecting a potential buffer overflow was developed. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C/C++ `. +You can use CodeQL to detect potential buffer overflows by checking for allocations equal to ``strlen`` in C and C++. This topic describes how a C/C++ query for detecting a potential buffer overflow was developed. Problem—detecting memory allocation that omits space for a null termination character ------------------------------------------------------------------------------------- @@ -98,7 +95,7 @@ When you have defined the basic query then you can refine the query to include f Improving the query using the 'SSA' library ------------------------------------------- -The ``SSA`` library represents variables in `static single assignment `__ (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. The use of SSA variables simplifies queries considerably as much of the local data flow analysis has been done for us. 
+The ``SSA`` library represents variables in static single assignment (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. The use of SSA variables simplifies queries considerably as much of the local data flow analysis has been done for us. For more information, see `Static single assignment `__ on Wikipedia. Including examples where the string size is stored before use ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -224,8 +221,8 @@ The completed query will now identify cases where the result of ``strlen`` is st where malloc.getAllocatedSize() instanceof StrlenCall select malloc, "This allocation does not include space to null-terminate the string." -What next? ----------- +Further reading +--------------- -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. diff --git a/docs/language/learn-ql/csharp/dataflow.rst b/docs/language/learn-ql/csharp/dataflow.rst index 3b9bb69d7da..c4b597428df 100644 --- a/docs/language/learn-ql/csharp/dataflow.rst +++ b/docs/language/learn-ql/csharp/dataflow.rst @@ -1,13 +1,14 @@ Analyzing data flow in C# ========================= -Overview --------- +You can use CodeQL to track the flow of data through a C# program to its use. -This topic describes how data flow analysis is implemented in the CodeQL libraries for C# and includes examples to help you write your own data flow queries. -The following sections describe how to utilize the libraries for local data flow, global data flow, and taint tracking. +About this article +------------------ -For a more general introduction to modeling data flow, see :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`. 
+This article describes how data flow analysis is implemented in the CodeQL libraries for C# and includes examples to help you write your own data flow queries. +The following sections describe how to use the libraries for local data flow, global data flow, and taint tracking. +For a more general introduction to modeling data flow, see :doc:`About data flow analysis <../intro-to-data-flow>`. Local data flow --------------- @@ -17,7 +18,7 @@ Local data flow is data flow within a single method or callable. Local data flow Using local data flow ~~~~~~~~~~~~~~~~~~~~~ -The local data flow library is in the module ``DataFlow``, which defines the class ``Node`` denoting any element that data can flow through. ``Node``\ s are divided into expression nodes (``ExprNode``) and parameter nodes (``ParameterNode``). It is possible to map between data flow nodes and expressions/parameters using the member predicates ``asExpr`` and ``asParameter``: +The local data flow library is in the module ``DataFlow``, which defines the class ``Node`` denoting any element that data can flow through. ``Node``\ s are divided into expression nodes (``ExprNode``) and parameter nodes (``ParameterNode``). You can map between data flow nodes and expressions/parameters using the member predicates ``asExpr`` and ``asParameter``: .. code-block:: ql @@ -45,9 +46,9 @@ or using the predicates ``exprNode`` and ``parameterNode``: */ ParameterNode parameterNode(Parameter p) { ... } -The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``. The predicate can be applied recursively (using the ``+`` and ``*`` operators), or it is possible to use the predefined recursive predicate ``localFlow``. +The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``. 
You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``. -For example, finding flow from a parameter ``source`` to an expression ``sink`` in zero or more local steps can be achieved as follows: +For example, you can find flow from a parameter ``source`` to an expression ``sink`` in zero or more local steps: .. code-block:: ql @@ -65,9 +66,9 @@ Local taint tracking extends local data flow by including non-value-preserving f If ``x`` is a tainted string then ``y`` is also tainted. -The local taint tracking library is in the module ``TaintTracking``. Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``. The predicate can be applied recursively (using the ``+`` and ``*`` operators), or it is possible to use the predefined recursive predicate ``localTaint``. +The local taint tracking library is in the module ``TaintTracking``. Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``. You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``. -For example, finding taint propagation from a parameter ``source`` to an expression ``sink`` in zero or more local steps can be achieved as follows: +For example, you can find taint propagation from a parameter ``source`` to an expression ``sink`` in zero or more local steps: .. code-block:: ql @@ -76,7 +77,7 @@ For example, finding taint propagation from a parameter ``source`` to an express Examples ~~~~~~~~ -The following query finds the filename passed to ``System.IO.File.Open``: +This query finds the filename passed to ``System.IO.File.Open``: .. 
code-block:: ql @@ -99,7 +100,7 @@ Unfortunately this will only give the expression in the argument, not the values and DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0))) select src -Then we can make the source more specific, for example an access to a public parameter. The following query finds where a public parameter is used to open a file: +Then we can make the source more specific, for example an access to a public parameter. This query finds instances where a public parameter is used to open a file: .. code-block:: ql @@ -112,7 +113,7 @@ Then we can make the source more specific, for example an access to a public par and call.getEnclosingCallable().(Member).isPublic() select p, "Opening a file from a public method." -The following example finds calls to ``String.Format`` where the format string isn't hard-coded: +This query finds calls to ``String.Format`` where the format string isn't hard-coded: .. code-block:: ql @@ -139,7 +140,7 @@ Global data flow tracks data flow throughout the entire program, and is therefor Using global data flow ~~~~~~~~~~~~~~~~~~~~~~ -The global data flow library is used by extending the class ``DataFlow::Configuration`` as follows: +The global data flow library is used by extending the class ``DataFlow::Configuration``: .. code-block:: ql @@ -157,12 +158,12 @@ The global data flow library is used by extending the class ``DataFlow::Configur } } -The following predicates are defined in the configuration: +These predicates are defined in the configuration: -- ``isSource`` - defines where data may flow from -- ``isSink`` - defines where data may flow to -- ``isBarrier`` - optionally, restricts the data flow -- ``isAdditionalFlowStep`` - optionally, adds additional flow steps +- ``isSource`` - defines where data may flow from. +- ``isSink`` - defines where data may flow to. +- ``isBarrier`` - optionally, restricts the data flow. +- ``isAdditionalFlowStep`` - optionally, adds additional flow steps. 
The characteristic predicate (``MyDataFlowConfiguration()``) defines the name of the configuration, so ``"..."`` must be replaced with a unique name. @@ -177,7 +178,7 @@ The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node Using global taint tracking ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Global taint tracking is to global data flow what local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by extending the class ``TaintTracking::Configuration`` as follows: +Global taint tracking is to global data flow what local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by extending the class ``TaintTracking::Configuration``: .. code-block:: ql @@ -195,12 +196,12 @@ Global taint tracking is to global data flow what local taint tracking is to loc } } -The following predicates are defined in the configuration: +These predicates are defined in the configuration: -- ``isSource`` - defines where taint may flow from -- ``isSink`` - defines where taint may flow to -- ``isSanitizer`` - optionally, restricts the taint flow -- ``isAdditionalTaintStep`` - optionally, adds additional taint steps +- ``isSource`` - defines where taint may flow from. +- ``isSink`` - defines where taint may flow to. +- ``isSanitizer`` - optionally, restricts the taint flow. +- ``isAdditionalTaintStep`` - optionally, adds additional taint steps. Similar to global data flow, the characteristic predicate (``MyTaintTrackingConfiguration()``) defines the unique name of the configuration and the taint analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``. @@ -214,7 +215,7 @@ The class ``RemoteSourceFlow`` (defined in module ``semmle.code.csharp.dataflow. 
Example ~~~~~~~ -The following example shows a data flow configuration that uses all public API parameters as data sources. +This query shows a data flow configuration that uses all public API parameters as data sources: .. code-block:: ql @@ -236,30 +237,30 @@ The following example shows a data flow configuration that uses all public API p Class hierarchy ~~~~~~~~~~~~~~~ -- ``DataFlow::Configuration`` - base class for custom global data flow analysis -- ``DataFlow::Node`` - an element behaving as a data flow node +- ``DataFlow::Configuration`` - base class for custom global data flow analysis. +- ``DataFlow::Node`` - an element behaving as a data flow node. - - ``DataFlow::ExprNode`` - an expression behaving as a data flow node - - ``DataFlow::ParameterNode`` - a parameter data flow node representing the value of a parameter at function entry + - ``DataFlow::ExprNode`` - an expression behaving as a data flow node. + - ``DataFlow::ParameterNode`` - a parameter data flow node representing the value of a parameter at function entry. - - ``PublicCallableParameter`` - a parameter to a public method/callable in a public class + - ``PublicCallableParameter`` - a parameter to a public method/callable in a public class. - - ``RemoteSourceFlow`` - data flow from network/remote input + - ``RemoteSourceFlow`` - data flow from network/remote input. - - ``AspNetRemoteFlowSource`` - data flow from remote ASP.NET user input + - ``AspNetRemoteFlowSource`` - data flow from remote ASP.NET user input. - - ``AspNetQueryStringRemoteFlowSource`` - data flow from ``System.Web.HttpRequest`` - - ``AspNetUserInputRemoveFlowSource`` - data flow from ``System.Web.IO.WebControls.TextBox`` + - ``AspNetQueryStringRemoteFlowSource`` - data flow from ``System.Web.HttpRequest``. + - ``AspNetUserInputRemoveFlowSource`` - data flow from ``System.Web.IO.WebControls.TextBox``. 
- - ``WcfRemoteFlowSource`` - data flow from a WCF web service - - ``AspNetServiceRemoteFlowSource`` - data flow from an ASP.NET web service + - ``WcfRemoteFlowSource`` - data flow from a WCF web service. + - ``AspNetServiceRemoteFlowSource`` - data flow from an ASP.NET web service. -- ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis +- ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis. Examples ~~~~~~~~ -The following data flow configuration tracks data flow from environment variables to opening files: +This data flow configuration tracks data flow from environment variables to opening files: .. code-block:: ql @@ -300,7 +301,7 @@ Exercise 4: Using the answers from 2 and 3, write a query to find all global dat Extending library data flow --------------------------- -*Library* data flow defines how data flows through libraries where the source code is not available, such as the .NET Framework, third-party libraries or proprietary libraries. +Library data flow defines how data flows through libraries where the source code is not available, such as the .NET Framework, third-party libraries or proprietary libraries. To define new library data flow, extend the class ``LibraryTypeDataFlow`` from the module ``semmle.code.csharp.dataflow.LibraryTypeDataFlow``. Override the predicate ``callableFlow`` to define how data flows through the methods in the class. 
``callableFlow`` has the signature @@ -308,9 +309,9 @@ To define new library data flow, extend the class ``LibraryTypeDataFlow`` from t predicate callableFlow(CallableFlowSource source, CallableFlowSink sink, SourceDeclarationCallable callable, boolean preservesValue) -- ``callable`` - the ``Callable`` (such as a method, constructor, property getter or setter) performing the data flow -- ``source`` - the data flow input -- ``sink`` - the data flow output +- ``callable`` - the ``Callable`` (such as a method, constructor, property getter or setter) performing the data flow. +- ``source`` - the data flow input. +- ``sink`` - the data flow output. - ``preservesValue`` - whether the flow step preserves the value, for example if ``x`` is a string then ``x.ToString()`` preserves the value where as ``x.ToLower()`` does not. Class hierarchy @@ -318,24 +319,24 @@ Class hierarchy - ``Callable`` - a callable (methods, accessors, constructors etc.) - - ``SourceDeclarationCallable`` - an unconstructed callable + - ``SourceDeclarationCallable`` - an unconstructed callable. -- ``CallableFlowSource`` - the input of data flow into the callable +- ``CallableFlowSource`` - the input of data flow into the callable. - - ``CallableFlowSourceQualifier`` - the data flow comes from the object itself - - ``CallableFlowSourceArg`` - the data flow comes from an argument to the call + - ``CallableFlowSourceQualifier`` - the data flow comes from the object itself. + - ``CallableFlowSourceArg`` - the data flow comes from an argument to the call. -- ``CallableFlowSink`` - the output of data flow from the callable +- ``CallableFlowSink`` - the output of data flow from the callable. 
- - ``CallableFlowSinkQualifier`` - the output is to the object itself - - ``CallableFlowSinkReturn`` - the output is returned from the call - - ``CallableFlowSinkArg`` - the output is an argument - - ``CallableFlowSinkDelegateArg`` - the output flows through a delegate argument (for example, LINQ) + - ``CallableFlowSinkQualifier`` - the output is to the object itself. + - ``CallableFlowSinkReturn`` - the output is returned from the call. + - ``CallableFlowSinkArg`` - the output is an argument. + - ``CallableFlowSinkDelegateArg`` - the output flows through a delegate argument (for example, LINQ). Example ~~~~~~~ -The following example is adapted from ``LibraryTypeDataFlow.qll``. It declares data flow through the class ``System.Uri``, including the constructor, the ``ToString`` method, and the properties ``Query``, ``OriginalString``, and ``PathAndQuery``. +This example is adapted from ``LibraryTypeDataFlow.qll``. It declares data flow through the class ``System.Uri``, including the constructor, the ``ToString`` method, and the properties ``Query``, ``OriginalString``, and ``PathAndQuery``. .. code-block:: ql @@ -489,7 +490,7 @@ Exercise 4 Exercise 5 ~~~~~~~~~~ -All properties can flow data. We can declare this as follows: +All properties can flow data: .. code-block:: ql @@ -545,9 +546,9 @@ This can be adapted from the ``SystemUriFlow`` class: } } -What next? ----------- +Further reading +--------------- - Learn about the standard libraries used to write queries for C# in :doc:`Introducing the C# libraries `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. 
diff --git a/docs/language/learn-ql/csharp/introduce-libraries-csharp.rst b/docs/language/learn-ql/csharp/introduce-libraries-csharp.rst index 32a782d5034..f262478dc44 100644 --- a/docs/language/learn-ql/csharp/introduce-libraries-csharp.rst +++ b/docs/language/learn-ql/csharp/introduce-libraries-csharp.rst @@ -1,18 +1,20 @@ -Introducing the CodeQL libraries for C# -======================================= +CodeQL library for C# +===================== -Overview --------- +When you're analyzing a C# program, you can make use of the large collection of classes in the CodeQL library for C#. -There is an extensive library for analyzing CodeQL databases extracted from C# projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``csharp.qll`` imports all the core C# library modules, so you can include the complete library by beginning your query with: +About the CodeQL libraries for C# +--------------------------------- + +There is an extensive core library for analyzing CodeQL databases extracted from C# projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``csharp.qll`` imports all the core C# library modules, so you can include the complete library by beginning your query with: .. code-block:: ql import csharp -Since this is required for all C# queries, it is omitted from code snippets below. +Since this is required for all C# queries, it's omitted from code snippets below. 
-The core library contains all the program elements, including `files <#files>`__, `types <#types>`__, methods, `variables <#variables>`__, `statements <#statements>`__, and `expressions <#expressions>`__. This is sufficient for most queries, however additional libraries can be imported for bespoke functionality such as control flow and data flow. See :doc:`CodeQL for C# ` for information about these additional libraries. +The core library contains all the program elements, including `files <#files>`__, `types <#types>`__, methods, `variables <#variables>`__, `statements <#statements>`__, and `expressions <#expressions>`__. This is sufficient for most queries, however additional libraries can be imported for bespoke functionality such as control flow and data flow. For information about these additional libraries, see :doc:`CodeQL for C# `. Class hierarchies ~~~~~~~~~~~~~~~~~ @@ -42,7 +44,7 @@ Each section contains a class hierarchy, showing the inheritance structure betwe - ``AddExpr``, ``SubExpr``, ``MulExpr``, ``DivExpr``, ``RemExpr`` -This means that the class ``AddExpr`` extends class ``BinaryArithmeticOperation``, which in turn extends class ``ArithmeticOperation`` and so on. If you want to query any arithmetic operation, then use the class ``ArithmeticOperation``, but if you specifically want to limit the query to addition operations, then use the class ``AddExpr``. +This means that the class ``AddExpr`` extends class ``BinaryArithmeticOperation``, which in turn extends class ``ArithmeticOperation`` and so on. If you want to query any arithmetic operation, use the class ``ArithmeticOperation``, but if you specifically want to limit the query to addition operations, use the class ``AddExpr``. Classes can also be considered to be *sets*, and the ``extends`` relation between classes defines a subset. Every member of class ``AddExpr`` is also in the class ``BinaryArithmeticOperation``. In general, classes overlap and an entity can be a member of several classes. 
@@ -50,14 +52,14 @@ This overview omits some of the less important or intermediate classes from the Each class has predicates, which are logical propositions about that class. They also define navigable relationships between classes. Predicates are inherited, so for example the ``AddExpr`` class inherits the predicates ``getLeftOperand()`` and ``getRightOperand()`` from ``BinaryArithmeticOperation``, and ``getType()`` from class ``Expr``. This is similar to how methods are inherited in object-oriented programming languages. -In this overview, we present the most common and useful predicates. Consult the `reference `__, the CodeQL source code, and autocomplete in the editor for the complete list of predicates available on each class. +In this overview, we present the most common and useful predicates. For the complete list of predicates available on each class, you can look in the CodeQL source code, use autocomplete in the editor, or see the `C# reference `__. Exercises ~~~~~~~~~ Each section in this topic contains exercises to check your understanding. -Exercise 1: Simplify the following query: +Exercise 1: Simplify this query: .. code-block:: ql @@ -84,11 +86,11 @@ Class hierarchy Predicates ~~~~~~~~~~ -- ``getName()`` - gets the full path of the file (for example, ``C:\Temp\test.cs``) -- ``getNumberOfLines()`` - gets the number of lines (for source files only) -- ``getShortName()`` - gets the name of the file without the extension (for example, ``test``) -- ``getBaseName()`` - gets the name and extension of the file (for example, ``test.cs``) -- ``getParent()`` - gets the parent directory +- ``getName()`` - gets the full path of the file (for example, ``C:\Temp\test.cs``). +- ``getNumberOfLines()`` - gets the number of lines (for source files only). +- ``getShortName()`` - gets the name of the file without the extension (for example, ``test``). +- ``getBaseName()`` - gets the name and extension of the file (for example, ``test.cs``). 
+- ``getParent()`` - gets the parent directory. Examples ~~~~~~~~ @@ -116,7 +118,7 @@ Exercise 2: Write a query to find the source file with the largest number of lin Elements -------- -The class `Element `__ is the base class for all parts of a C# program, and it is the root of the element class hierarchy. All program elements (such as types, methods, statements, and expressions) ultimately derive from this common base class. +The class `Element `__ is the base class for all parts of a C# program, and it's the root of the element class hierarchy. All program elements (such as types, methods, statements, and expressions) ultimately derive from this common base class. ``Element`` forms a hierarchical structure of the program, which can be navigated using the ``getParent()`` and ``getChild()`` predicates. This is much like an abstract syntax tree, and also applies to elements in assemblies. @@ -125,10 +127,10 @@ Predicates The ``Element`` class provides common functionality for all program elements, including: -- ``getLocation()`` - gets the text span in the source code -- ``getFile()`` - gets the ``File`` containing the ``Element`` -- ``getParent()`` - gets the parent ``Element``, if any -- ``getAChild()`` - gets a child ``Element`` of this element, if any +- ``getLocation()`` - gets the text span in the source code. +- ``getFile()`` - gets the ``File`` containing the ``Element``. +- ``getParent()`` - gets the parent ``Element``, if any. +- ``getAChild()`` - gets a child ``Element`` of this element, if any. Examples ~~~~~~~~ @@ -163,11 +165,11 @@ Predicates Some predicates of ``Location`` include: -- ``getFile()`` - gets the ``File`` -- ``getStartLine()`` - gets the first line of the text -- ``getEndLine()`` - gets the last line of the text -- ``getStartColumn()`` - gets the column of the start of the text -- ``getEndColumn()`` - gets the column of the end of the text +- ``getFile()`` - gets the ``File``. +- ``getStartLine()`` - gets the first line of the text. 
+- ``getEndLine()`` - gets the last line of the text. +- ``getStartColumn()`` - gets the column of the start of the text. +- ``getEndColumn()`` - gets the column of the end of the text. Examples ~~~~~~~~ @@ -213,10 +215,10 @@ Predicates Useful member predicates on ``Declaration`` include: -- ``getDeclaringType()`` - gets the type containing the declaration, if any -- ``getName()``/``hasName(string)`` - gets the name of the declared entity -- ``isSourceDeclaration()`` - whether the declaration is source code and is not a constructed type/method -- ``getSourceDeclaration()`` - gets the original (unconstructed) declaration +- ``getDeclaringType()`` - gets the type containing the declaration, if any. +- ``getName()``/``hasName(string)`` - gets the name of the declared entity. +- ``isSourceDeclaration()`` - whether the declaration is source code and is not a constructed type/method. +- ``getSourceDeclaration()`` - gets the original (unconstructed) declaration. Examples ~~~~~~~~ @@ -262,10 +264,10 @@ Predicates Some common predicates on ``Variable`` are: -- ``getType()`` - gets the ``Type`` of this variable -- ``getAnAccess()`` - gets an expression that accesses (reads or writes) this variable, if any -- ``getAnAssignedValue()`` - gets an expression that is assigned to this variable, if any -- ``getInitializer()`` - gets the expression used to initialize the variable, if any +- ``getType()`` - gets the ``Type`` of this variable. +- ``getAnAccess()`` - gets an expression that accesses (reads or writes) this variable, if any. +- ``getAnAssignedValue()`` - gets an expression that is assigned to this variable, if any. +- ``getInitializer()`` - gets the expression used to initialize the variable, if any. 
Examples ~~~~~~~~ @@ -309,7 +311,7 @@ Class hierarchy - ``VoidType`` - ``void`` - ``PointerType`` - a pointer type -The ``ValueType`` class extends further as follows: +The ``ValueType`` class extends further: - ``ValueType`` - a value type @@ -345,7 +347,7 @@ The ``ValueType`` class extends further as follows: - ``NullableType`` - ``ArrayType`` -The ``RefType`` class extends further as follows: +The ``RefType`` class extends further: - ``RefType`` @@ -369,19 +371,19 @@ Predicates Useful members of ``ValueOrRefType`` include: -- ``getQualifiedName()/hasQualifiedName(string)`` - gets the qualified name of the type (for example, ``"System.String"``) -- ``getABaseInterface()`` - gets an immediate interface of this type, if any -- ``getABaseType()`` - gets an immediate base class or interface of this type, if any -- ``getBaseClass()`` - gets the immediate base class of this type, if any -- ``getASubType()`` - gets an immediate subtype, a type which directly inherits from this type, if any -- ``getAMember()`` - gets any member (field/method/property etc), if any -- ``getAMethod()`` - gets a method, if any -- ``getAProperty()`` - gets a property, if any -- ``getAnIndexer()`` - gets an indexer, if any -- ``getAnEvent()`` - gets an event, if any -- ``getAnOperator()`` - gets an operator, if any -- ``getANestedType()`` - gets a nested type -- ``getNamespace()`` - gets the enclosing namespace +- ``getQualifiedName()/hasQualifiedName(string)`` - gets the qualified name of the type (for example, ``"System.String"``). +- ``getABaseInterface()`` - gets an immediate interface of this type, if any. +- ``getABaseType()`` - gets an immediate base class or interface of this type, if any. +- ``getBaseClass()`` - gets the immediate base class of this type, if any. +- ``getASubType()`` - gets an immediate subtype, a type which directly inherits from this type, if any. +- ``getAMember()`` - gets any member (field/method/property etc), if any. +- ``getAMethod()`` - gets a method, if any. 
+- ``getAProperty()`` - gets a property, if any. +- ``getAnIndexer()`` - gets an indexer, if any. +- ``getAnEvent()`` - gets an event, if any. +- ``getAnOperator()`` - gets an operator, if any. +- ``getANestedType()`` - gets a nested type. +- ``getNamespace()`` - gets the enclosing namespace. Examples ~~~~~~~~ @@ -492,12 +494,12 @@ Predicates Here are a few useful predicates on the ``Callable`` class: -- ``getParameter(int)``/``getAParameter()`` - gets a parameter -- ``calls(Callable)`` - whether there's a direct call from one callable to another -- ``getReturnType()`` - gets the return type -- ``getBody()``/``getExpressionBody()`` - gets the body of the callable +- ``getParameter(int)``/``getAParameter()`` - gets a parameter. +- ``calls(Callable)`` - whether there's a direct call from one callable to another. +- ``getReturnType()`` - gets the return type. +- ``getBody()``/``getExpressionBody()`` - gets the body of the callable. -Since ``Callable`` extends ``Declaration``, it also has predicates from ``Declaration``, such as +Since ``Callable`` extends ``Declaration``, it also has predicates from ``Declaration``, such as: - ``getName()``/``hasName(string)`` - ``getSourceDeclaration()`` @@ -506,10 +508,10 @@ Since ``Callable`` extends ``Declaration``, it also has predicates from ``Declar Methods have additional predicates, including: -- ``getAnOverridee()`` - gets a method that is immediately overridden by this method -- ``getAnOverrider()`` - gets a method that immediately overrides this method -- ``getAnImplementee()`` - gets an interface method that is immediately implemented by this method -- ``getAnImplementor()`` - gets a method that immediately implements this interface method +- ``getAnOverridee()`` - gets a method that is immediately overridden by this method. +- ``getAnOverrider()`` - gets a method that immediately overrides this method. +- ``getAnImplementee()`` - gets an interface method that is immediately implemented by this method. 
+- ``getAnImplementor()`` - gets a method that immediately implements this interface method. Examples ~~~~~~~~ @@ -665,7 +667,7 @@ Find an ``if`` statement with a constant condition: where ifStmt.getCondition().hasValue() select ifStmt, "This 'if' statement is constant." -Find an ``if`` statement with an empty "then" clause: +Find an ``if`` statement with an empty "then" block: .. code-block:: ql @@ -680,7 +682,7 @@ Exercises Exercise 6: Write a query to list all empty methods. (`Answer <#exercise-6>`__) -Exercise 7: Modify the last example to also detect empty statements (``;``) in the then block. (`Answer <#exercise-7>`__) +Exercise 7: Modify the last example to also detect empty statements (``;``) in the "then" block. (`Answer <#exercise-7>`__) Exercise 8: Modify the last example to exclude chains of ``if`` statements, where the ``else`` part is another ``if`` statement. (`Answer <#exercise-8>`__) @@ -874,13 +876,13 @@ Predicates Useful predicates on ``Expr`` include: -- ``getType()`` - gets the ``Type`` of the expression -- ``getValue()`` - gets the compile-time constant, if any -- ``hasValue()`` - whether the expression has a compile-time constant -- ``getEnclosingStmt()`` - gets the statement containing the expression, if any -- ``getEnclosingCallable()`` - gets the callable containing the expression, if any -- ``stripCasts()`` - remove all explicit or implicit casts -- ``isImplicit()`` - whether the expression was implicit, such as an implicit ``this`` qualifier (``ThisAccess``) +- ``getType()`` - gets the ``Type`` of the expression. +- ``getValue()`` - gets the compile-time constant, if any. +- ``hasValue()`` - whether the expression has a compile-time constant. +- ``getEnclosingStmt()`` - gets the statement containing the expression, if any. +- ``getEnclosingCallable()`` - gets the callable containing the expression, if any. +- ``stripCasts()`` - remove all explicit or implicit casts. 
+- ``isImplicit()`` - whether the expression was implicit, such as an implicit ``this`` qualifier (``ThisAccess``). Examples ~~~~~~~~ @@ -922,7 +924,7 @@ Attributes C# attributes are represented by the class `Attribute `__. They can be present on many C# elements, such as classes, methods, fields, and parameters. The database contains attributes from the source code and all assembly references. -The attribute of any ``Element`` can be obtained via ``getAnAttribute()``, whereas if you have an attribute, you can find its element via ``getTarget()``. The following two query fragments are identical: +The attribute of any ``Element`` can be obtained via ``getAnAttribute()``, whereas if you have an attribute, you can find its element via ``getTarget()``. These two query fragments are identical: .. code-block:: ql @@ -939,8 +941,8 @@ Class hierarchy Predicates ~~~~~~~~~~ -- ``getTarget()`` - gets the ``Element`` to which this attribute applies -- ``getArgument(int)`` - gets the given argument of the attribute +- ``getTarget()`` - gets the ``Element`` to which this attribute applies. +- ``getArgument(int)`` - gets the given argument of the attribute. - ``getType()`` - gets the type of this attribute. Note that the class name must end in ``"Attribute"``. Examples @@ -1117,9 +1119,9 @@ Here is the fixed version: else reason = "(not given)" select e, "This is obsolete because " + reason -What next? ----------- +Further reading +--------------- -- Visit :doc:`Tutorial: Analyzing data flow in C# ` to learn more about writing queries using the standard data flow and taint tracking libraries. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. \ No newline at end of file +- Visit :doc:`Analyzing data flow in C# ` to learn more about writing queries using the standard data flow and taint tracking libraries. +- Find out more about QL in the `QL language reference `__. 
+- Learn more about the query console in `Using the query console `__ on LGTM.com. diff --git a/docs/language/learn-ql/csharp/ql-for-csharp.rst b/docs/language/learn-ql/csharp/ql-for-csharp.rst index bc5cc9e9959..6eb3567e808 100644 --- a/docs/language/learn-ql/csharp/ql-for-csharp.rst +++ b/docs/language/learn-ql/csharp/ql-for-csharp.rst @@ -1,27 +1,21 @@ CodeQL for C# ============= +Experiment and learn how to write effective and efficient queries for CodeQL databases generated from C# codebases. + .. toctree:: - :glob: :hidden: introduce-libraries-csharp dataflow -These topics provide an overview of the CodeQL libraries for C# and show examples of how to use them. +- `Basic C# query `__: Learn to write and run a simple CodeQL query using LGTM. -- `Basic C# query `__ describes how to write and run queries using LGTM. +- :doc:`CodeQL library for C# `: When you're analyzing a C# program, you can make use of the large collection of classes in the CodeQL library for C#. -- :doc:`Introducing the CodeQL libraries for C# ` introduces the standard libraries used to write queries for C# code. +- :doc:`Analyzing data flow in C# `: You can use CodeQL to track the flow of data through a C# program to its use. -.. raw:: html - - - -- :doc:`Tutorial: Analyzing data flow in C# ` demonstrates how to write queries using the standard data flow and taint tracking libraries for C#. - - -Other resources +Further reading --------------- - For examples of how to query common C# elements, see the `C# cookbook `__. diff --git a/docs/language/learn-ql/database.rst b/docs/language/learn-ql/database.rst index fe8e2e273ef..369b1d73fb0 100644 --- a/docs/language/learn-ql/database.rst +++ b/docs/language/learn-ql/database.rst @@ -1,7 +1,7 @@ What's in a CodeQL database? ============================ -A CodeQL database contains a variety of data related to a particular code base at a particular point in time. For details of how the database is generated see `Database generation `__. 
+A CodeQL database contains a variety of data related to a particular code base at a particular point in time. For details of how the database is generated see `Database generation `__ on LGTM.com. The database contains a full, hierarchical representation of the program defined by the code base. The database schema varies according to the language analyzed. The schema provides an interface between the initial lexical analysis during the extraction process, and the actual complex analysis using CodeQL. When the source code languages being analyzed change (such as Java 7 evolving into Java 8), this interface between the analysis phases can also change. diff --git a/docs/language/learn-ql/go/ast.dot b/docs/language/learn-ql/go/ast.dot new file mode 100644 index 00000000000..fbf32b744ed --- /dev/null +++ b/docs/language/learn-ql/go/ast.dot @@ -0,0 +1,22 @@ +digraph ast { + graph [dpi=300]; + "x" [shape=rect]; + "y" [shape=rect]; + "x + y" [shape=rect]; + "(x + y)" [shape=rect]; + "z" [shape=rect]; + "(x + y) * z" [shape=rect]; + invis1 [style=invis]; + invis2 [style=invis]; + invis3 [style=invis]; + + "(x + y) * z" -> "(x + y)" [label=" 0"]; + "(x + y) * z" -> "z" [label=" 1"]; + "(x + y)" -> "x + y" [label=" 0"]; + "x + y" -> "x" [label=" 0"]; + "x + y" -> "y" [label=" 1"]; + + "z" -> invis1 [style=invis]; + invis1 -> invis2 [style=invis]; + invis1 -> invis3 [style=invis]; +} diff --git a/docs/language/learn-ql/go/ast.png b/docs/language/learn-ql/go/ast.png new file mode 100644 index 00000000000..61a9f29b80a Binary files /dev/null and b/docs/language/learn-ql/go/ast.png differ diff --git a/docs/language/learn-ql/go/cfg.dot b/docs/language/learn-ql/go/cfg.dot new file mode 100644 index 00000000000..47db3df4c57 --- /dev/null +++ b/docs/language/learn-ql/go/cfg.dot @@ -0,0 +1,8 @@ +digraph cfg { + graph [dpi=300]; + rankdir=LR; + "x := 0" -> "p != nil"; + "p != nil" -> "x = p.f"; + "p != nil" -> "return x"; + "x = p.f" -> "return x"; +} diff --git 
a/docs/language/learn-ql/go/cfg.png b/docs/language/learn-ql/go/cfg.png new file mode 100644 index 00000000000..1290dadb870 Binary files /dev/null and b/docs/language/learn-ql/go/cfg.png differ diff --git a/docs/language/learn-ql/go/cfg2.dot b/docs/language/learn-ql/go/cfg2.dot new file mode 100644 index 00000000000..b8dcd71ee25 --- /dev/null +++ b/docs/language/learn-ql/go/cfg2.dot @@ -0,0 +1,14 @@ +digraph cfg2 { + graph [dpi=300]; + rankdir=LR; + + "p != nil is true" [shape=box]; + "p != nil is false" [shape=box]; + + "x := 0" -> "p != nil"; + "p != nil" -> "p != nil is true"; + "p != nil is true" -> "x = p.f"; + "p != nil" -> "p != nil is false"; + "p != nil is false" -> "return x"; + "x = p.f" -> "return x"; +} diff --git a/docs/language/learn-ql/go/cfg2.png b/docs/language/learn-ql/go/cfg2.png new file mode 100644 index 00000000000..617fe4fe4dc Binary files /dev/null and b/docs/language/learn-ql/go/cfg2.png differ diff --git a/docs/language/learn-ql/go/dfg.dot b/docs/language/learn-ql/go/dfg.dot new file mode 100644 index 00000000000..82253c9b133 --- /dev/null +++ b/docs/language/learn-ql/go/dfg.dot @@ -0,0 +1,11 @@ +digraph dfg { + graph [dpi=300]; + rankdir=LR; + + "x" [shape=diamond]; + "return x" [label=x>]; + + "0" -> "x"; + "p.f" -> "x"; + "x" -> "return x"; +} diff --git a/docs/language/learn-ql/go/dfg.png b/docs/language/learn-ql/go/dfg.png new file mode 100644 index 00000000000..6727af7b4ac Binary files /dev/null and b/docs/language/learn-ql/go/dfg.png differ diff --git a/docs/language/learn-ql/go/introduce-libraries-go.rst b/docs/language/learn-ql/go/introduce-libraries-go.rst new file mode 100644 index 00000000000..e47ce2b94c4 --- /dev/null +++ b/docs/language/learn-ql/go/introduce-libraries-go.rst @@ -0,0 +1,621 @@ +CodeQL library for Go +===================== + +When you're analyzing a Go program, you can make use of the large collection of classes in the CodeQL library for Go. 
+ +Overview +-------- + +CodeQL ships with an extensive library for analyzing Go code. The classes in this library present +the data from a CodeQL database in an object-oriented form and provide abstractions and predicates +to help you with common analysis tasks. + +The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The +module ``go.qll`` imports most other standard library modules, so you can include the complete +library by beginning your query with: + +.. code-block:: ql + + import go + +Broadly speaking, the CodeQL library for Go provides two views of a Go code base: at the `syntactic +level`, source code is represented as an `abstract syntax tree +`__ (AST), while at the `data-flow level` it is +represented as a `data-flow graph `__ (DFG). In +between, there is also an intermediate representation of the program as a control-flow graph (CFG), +though this representation is rarely useful on its own and mostly used to construct the higher-level +DFG representation. + +The AST representation captures the syntactic structure of the program. You can use it to reason +about syntactic properties such as the nesting of statements within each other, but also about the +types of expressions and which variable a name refers to. + +The DFG, on the other hand, provides an approximation of how data flows through variables and +operations at runtime. It is used, for example, by the security queries to model the way +user-controlled input can propagate through the program. Additionally, the DFG contains information +about which function may be invoked by a given call (taking virtual dispatch through interfaces into +account), as well as control-flow information about the order in which different operations may be +executed at runtime. + +As a rule of thumb, you normally want to use the AST only for superficial syntactic queries. Any +analysis involving deeper semantic properties of the program should be done on the DFG. 
+ +The rest of this tutorial briefly summarizes the most important classes and predicates provided by +this library, including references to the `detailed API documentation +`__ where applicable. We start by giving an overview of the AST +representation, followed by an explanation of names and entities, which are used to represent +name-binding information, and of types and type information. Then we move on to control flow and the +data-flow graph, and finally the call graph and a few advanced topics. + +Abstract syntax +--------------- + +The AST presents the program as a hierarchical structure of nodes, each of which corresponds to a +syntactic element of the program source text. For example, there is an AST node for each expression +and each statement in the program. These AST nodes are arranged into a parent-child relationship +reflecting the nesting of syntactic elements and the order in which inner elements appear in +enclosing ones. + +For example, this is the AST for the expression ``(x + y) * z``: + +|ast| + +It is composed of six AST nodes, representing ``x``, ``y``, ``x + y``, ``(x + y)``, ``z`` and the +entire expression ``(x + y) * z``, respectively. The AST nodes representing ``x`` and ``y`` are +children of the AST node representing ``x + y``, ``x`` being the zeroth child and ``y`` being the +first child, reflecting their order in the program text. Similarly, ``x + y`` is the only child of +``(x + y)``, which is the zeroth child of ``(x + y) * z``, whose first child is ``z``. + +All AST nodes belong to class `AstNode +`__, which defines generic +tree traversal predicates: + +- ``getChild(i)``: returns the ``i``\ th child of this AST node. +- ``getAChild()``: returns any child of this AST node. +- ``getParent()``: returns the parent node of this AST node, if any. + +These predicates should only be used to perform generic AST traversal. To access children of +specific AST node types, the specialized predicates introduced below should be used instead. 
In +particular, queries should not rely on the numeric indices of child nodes relative to their parent +nodes: these are considered an implementation detail that may change between versions of the +library. + +The predicate ``toString()`` in class ``AstNode`` gives a short description of the AST node, +usually just indicating what kind of node it is. The ``toString()`` predicate does `not` provide +access to the source text corresponding to an AST node. The source text is not stored in the +dataset, and hence is not directly accessible to CodeQL queries. + +The predicate ``getLocation()`` in class ``AstNode`` returns a `Location +`__ entity +describing the source location of the program element represented by the AST node. You can use its +member predicates ``getFile()``, ``getStartLine()``, ``getStartColumn()``, ``getEndLine()``, and +``getEndColumn()`` to obtain information about its file, start line and column, and end line and +column. + +The most important subclasses of `AstNode +`__ are `Stmt +`__ and `Expr +`__, which represent +statements and expressions, respectively. This section briefly discusses some of their more +important subclasses and predicates. For a full reference of all the subclasses of `Stmt +`__ and `Expr +`__ and their API, see +`Stmt.qll `__ and `Expr.qll +`__. 
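+
+As a minimal sketch of the generic traversal and location predicates described above (an
+illustrative query only, not one of the standard queries), the following lists each AST node
+together with its parent and the line on which it starts:
+
+.. code-block:: ql
+
+   import go
+
+   // Generic traversal only: getParent() relates each node to its enclosing node.
+   from AstNode node, AstNode parent
+   where parent = node.getParent()
+   select node, parent, node.getLocation().getStartLine()
+
+On a real code base this produces a very large result set, so in practice you would usually
+restrict ``node`` to one of the more specific statement or expression classes.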
+ +Statements +~~~~~~~~~~ + +- ``ExprStmt``: an expression statement; use ``getExpr()`` to access the expression itself +- ``Assignment``: an assignment statement; use ``getLhs(i)`` to access the ``i``\ th left-hand side + and ``getRhs(i)`` to access the ``i``\ th right-hand side; if there is only a single left-hand side + you can use ``getLhs()`` instead, and similar for the right-hand side + + - ``SimpleAssignStmt``: an assignment statement that does not involve a compound operator + + - ``AssignStmt``: a plain assignment statement of the form ``lhs = rhs`` + - ``DefineStmt``: a short-hand variable declaration of the form ``lhs := rhs`` + + - ``CompoundAssignStmt``: an assignment statement with a compound operator, such as ``lhs += rhs`` + +- ``IncStmt``, ``DecStmt``: an increment statement or a decrement statement, respectively; use + ``getOperand()`` to access the expression being incremented or decremented +- ``BlockStmt``: a block of statements between curly braces; use ``getStmt(i)`` to access the + ``i``\ th statement in a block +- ``IfStmt``: an ``if`` statement; use ``getInit()``, ``getCond()``, ``getThen()``, and + ``getElse()`` to access the (optional) init statement, the condition being checked, the "then" + branch to evaluate if the condition is true, and the (optional) "else" branch to evaluate + otherwise, respectively +- ``LoopStmt``: a loop; use ``getBody()`` to access its body + + - ``ForStmt``: a ``for`` statement; use ``getInit()``, ``getCond()``, and ``getPost()`` to access + the init statement, loop condition, and post statement, respectively, all of which are optional + + - ``RangeStmt``: a ``range`` statement; use ``getDomain()`` to access the iteration domain, and + ``getKey()`` and ``getValue()`` to access the expressions to which successive keys and values + are assigned, if any + +- ``GoStmt``: a ``go`` statement; use ``getCall()`` to access the call expression that is evaluated + in the new goroutine +- ``DeferStmt``: a ``defer`` 
statement; use ``getCall()`` to access the call expression being + deferred +- ``SendStmt``: a send statement; use ``getChannel()`` and ``getValue()`` to access the channel and + the value being sent over the channel, respectively +- ``ReturnStmt``: a ``return`` statement; use ``getExpr(i)`` to access the ``i``\ th returned + expression; if there is only a single returned expression you can use ``getExpr()`` instead +- ``BranchStmt``: a statement that interrupts structured control flow; use ``getLabel()`` to get the + optional target label + + - ``BreakStmt``: a ``break`` statement + - ``ContinueStmt``: a ``continue`` statement + - ``FallthroughStmt``: a ``fallthrough`` statement at the end of a switch case + - ``GotoStmt``: a ``goto`` statement + +- ``DeclStmt``: a declaration statement, use ``getDecl()`` to access the declaration in this + statement; note that one rarely needs to deal with declaration statements directly, since + reasoning about the entities they declare is usually easier +- ``SwitchStmt``: a ``switch`` statement; use ``getInit()`` to access the (optional) init statement, + and ``getCase(i)`` to access the ``i``\ th ``case`` or ``default`` clause + + - ``ExpressionSwitchStmt``: a ``switch`` statement examining the value of an expression + - ``TypeSwitchStmt``: a ``switch`` statement examining the type of an expression + +- ``CaseClause``: a ``case`` or ``default`` clause in a ``switch`` statement; use ``getExpr(i)`` to + access the ``i``\ th expression, and ``getStmt(i)`` to access the ``i``\ th statement in the body + of this clause +- ``SelectStmt``: a ``select`` statement; use ``getCommClause(i)`` to access the ``i``\ th ``case`` + or ``default`` clause +- ``CommClause``: a ``case`` or ``default`` clause in a ``select`` statement; use ``getComm()`` to + access the send/receive statement of this clause (not defined for ``default`` clauses), and + ``getStmt(i)`` to access the ``i``\ th statement in the body of this clause +- ``RecvStmt``: a 
receive statement in a ``case`` clause of a ``select`` statement; use + ``getLhs(i)`` to access the ``i``\ th left-hand side of this statement, and ``getExpr()`` to + access the underlying receive expression + +Expressions +~~~~~~~~~~~ + +Class ``Expr`` has a predicate ``isConst()`` that holds if the expression is a compile-time +constant. For such constant expressions, ``getNumericValue()`` and ``getStringValue()`` can be used +to determine their numeric value and string value, respectively. Note that these predicates are not +defined for expressions whose value cannot be determined at compile time. Also note that the result +type of ``getNumericValue()`` is the QL type ``float``. If an expression has a numeric value that +cannot be represented as a QL ``float``, this predicate is also not defined. In such cases, you can +use ``getExactValue()`` to obtain a string representation of the value of the constant. + +- ``Ident``: an identifier; use ``getName()`` to access its name +- ``SelectorExpr``: a selector of the form ``base.sel``; use ``getBase()`` to access the part before + the dot, and ``getSelector()`` for the identifier after the dot +- ``BasicLit``: a literal of a basic type; subclasses ``IntLit``, ``FloatLit``, ``ImagLit``, + ``RuneLit``, and ``StringLit`` represent various specific kinds of literals +- ``FuncLit``: a function literal; use ``getBody()`` to access the body of the function +- ``CompositeLit``: a composite literal; use ``getKey(i)`` and ``getValue(i)`` to access the + ``i``\ th key and the ``i``\ th value, respectively +- ``ParenExpr``: a parenthesized expression; use ``getExpr()`` to access the expression between the + parentheses +- ``IndexExpr``: an index expression ``base[idx]``; use ``getBase()`` and ``getIndex()`` to access + ``base`` and ``idx``, respectively +- ``SliceExpr``: a slice expression ``base[lo:hi:max]``; use ``getBase()``, ``getLow()``, + ``getHigh()``, and ``getMax()`` to access ``base``, ``lo``, ``hi``, and ``max``, 
respectively; + note that ``lo``, ``hi``, and ``max`` can be omitted, in which case the corresponding predicates are not defined +- ``ConversionExpr``: a conversion expression ``T(e)``; use ``getTypeExpr()`` and ``getOperand()`` + to access ``T`` and ``e``, respectively +- ``TypeAssertExpr``: a type assertion ``e.(T)``; use ``getExpr()`` and ``getTypeExpr()`` to access + ``e`` and ``T``, respectively +- ``CallExpr``: a call expression ``callee(arg0, ..., argn)``; use ``getCalleeExpr()`` to access + ``callee``, and ``getArg(i)`` to access the ``i``\ th argument +- ``StarExpr``: a star expression, which may be either a pointer-type expression or a + pointer-dereference expression, depending on context; use ``getBase()`` to access the operand of + the star +- ``TypeExpr``: an expression that denotes a type +- ``OperatorExpr``: an expression with a unary or binary operator; use ``getOperator()`` to access + the operator + + - ``UnaryExpr``: an expression with a unary operator; use ``getAnOperand()`` to access the operand + of the operator + - ``BinaryExpr``: an expression with a binary operator; use ``getLeftOperand()`` and + ``getRightOperand()`` to access the left and the right operand, respectively + + - ``ComparisonExpr``: a binary expression that performs a comparison, including both equality + tests and relational comparisons + + - ``EqualityTestExpr``: an equality test, that is, either ``==`` or ``!=``; the predicate + ``getPolarity()`` has result ``true`` for the former and ``false`` for the latter + - ``RelationalComparisonExpr``: a relational comparison; use ``getLesserOperand()`` and + ``getGreaterOperand()`` to access the lesser and greater operand of the comparison, + respectively; ``isStrict()`` holds if this is a strict comparison using ``<`` or ``>``, + as opposed to ``<=`` or ``>=`` + +Names +~~~~~ + +While ``Ident`` and ``SelectorExpr`` are very useful classes, they are often too general: ``Ident`` +covers all identifiers in a program, including both 
identifiers appearing in a declaration as well +as references, and does not distinguish between names referring to packages, types, variables, +constants, functions, or statement labels. Similarly, a ``SelectorExpr`` might refer to a package, a +type, a function, or a method. + +Class ``Name`` and its subclasses provide a more fine-grained mapping of this space, organized along +the two axes of structure and namespace. In terms of structure, a name can be a ``SimpleName``, +meaning that it is a simple identifier (and hence an ``Ident``), or it can be a ``QualifiedName``, +meaning that it is a qualified identifier (and hence a ``SelectorExpr``). In terms of namespacing, a +``Name`` can be a ``PackageName``, ``TypeName``, ``ValueName``, or ``LabelName``. A ``ValueName``, +in turn, can be either a ``ConstantName``, a ``VariableName``, or a ``FunctionName``, depending on +what sort of entity the name refers to. + +A related abstraction is provided by class ``ReferenceExpr``: a reference expression is an +expression that refers to a variable, a constant, a function, a field, or an element of an array or +a slice. Use predicates ``isLvalue()`` and ``isRvalue()`` to determine whether a reference +expression appears in a syntactic context where it is assigned to or read from, respectively. + +Finally, ``ValueExpr`` generalizes ``ReferenceExpr`` to include all other kinds of expressions that +can be evaluated to a value (as opposed to expressions that refer to a package, a type, or a +statement label). + +Functions +~~~~~~~~~ + +At the syntactic level, functions appear in two forms: in function declarations (represented by +class ``FuncDecl``) and as function literals (represented by class ``FuncLit``). 
Since it is often +convenient to reason about functions of either kind, these two classes share a common superclass +``FuncDef``, which defines a few useful member predicates: + + - ``getBody()`` provides access to the function body + - ``getName()`` gets the function name; it is undefined for function literals, which do not have a + name + - ``getParameter(i)`` gets the ``i``\ th parameter of the function + - ``getResultVar(i)`` gets the ``i``\ th result variable of the function; if there is only + one result, ``getResultVar()`` can be used to access it + - ``getACall()`` gets a data-flow node (see below) representing a call to this function + +Entities and name binding +------------------------- + +Not all elements of a code base can be represented as AST nodes. For example, functions defined in +the standard library or in a dependency do not have a source-level definition within the source code +of the program itself, and built-in functions like ``len`` do not have a definition at all. Hence +functions cannot simply be identified with their definition, and similarly for variables, types, +and so on. + +To smooth over this difference and provide a unified view of functions no matter where they are +defined, the Go library introduces the concept of an `entity`. An entity is a named program element, +that is, a package, a type, a constant, a variable, a field, a function, or a label. 
All entities +belong to class ``Entity``, which defines a few useful predicates: + + - ``getName()`` gets the name of the entity + - ``hasQualifiedName(pkg, n)`` holds if this entity is declared in package ``pkg`` and has name + ``n``; this predicate is only defined for types, functions, and package-level variables and + constants (but not for methods or local variables) + - ``getDeclaration()`` connects an entity to its declaring identifier, if any + - ``getAReference()`` gets a ``Name`` that refers to this entity + +Conversely, class ``Name`` defines a predicate ``getTarget()`` that gets the entity to which the +name refers. + +Class ``Entity`` has several subclasses representing specific kinds of entities: ``PackageEntity`` +for packages; ``TypeEntity`` for types; ``ValueEntity`` for constants (``Constant``), variables +(``Variable``), and functions (``Function``); and ``Label`` for statement labels. + +Class ``Variable``, in turn, has a few subclasses representing specific kinds of variables: a +``LocalVariable`` is a variable declared in a local scope, that is, not at package level; +``ReceiverVariable``, ``Parameter`` and ``ResultVariable`` describe receivers, parameters and +results, respectively, and define a predicate ``getFunction()`` to access the corresponding +function. Finally, class ``Field`` represents struct fields, and provides a member predicate +``hasQualifiedName(pkg, tp, f)`` that holds if this field has name ``f`` and belongs to type ``tp`` +in package ``pkg``. (Note that due to embedding the same field can belong to multiple types.) + +Class ``Function`` has a subclass ``Method`` representing methods (including both interface methods +and methods defined on a named type). Similar to ``Field``, ``Method`` provides a member predicate +``hasQualifiedName(pkg, tp, m)`` that holds if this method has name ``m`` and belongs to type ``tp`` +in package ``pkg``. 
Predicate ``implements(m2)`` holds if this method implements method ``m2``, that +is, it has the same name and signature as ``m2`` and it belongs to a type that implements the +interface to which ``m2`` belongs. For any function, ``getACall()`` provides access to call sites +that may call this function, possibly through virtual dispatch. + +Finally, module ``Builtin`` provides a convenient way of looking up the entities corresponding to +built-in functions and types. For example, ``Builtin::len()`` is the entity representing the +built-in function ``len``, ``Builtin::bool()`` is the ``bool`` type, and ``Builtin::nil()`` is the +value ``nil``. + +Type information +---------------- + +Types are represented by class ``Type`` and its subclasses, such as ``BoolType`` for the built-in +type ``bool``; ``NumericType`` for the various numeric types including ``IntType``, ``Uint8Type``, +``Float64Type`` and others; ``StringType`` for the type ``string``; ``NamedType``, ``ArrayType``, +``SliceType``, ``StructType``, ``InterfaceType``, ``PointerType``, ``MapType``, ``ChanType`` for +named types, arrays, slices, structs, interfaces, pointers, maps, and channels, respectively. +Finally, ``SignatureType`` represents function types. + +Note that the type ``BoolType`` is distinct from the entity ``Builtin::bool()``: the latter views +``bool`` as a declared entity, the former as a type. You can, however, map from types to their +corresponding entity (if any) using the predicate ``getEntity()``. + +Class ``Expr`` and class ``Entity`` both define a predicate ``getType()`` to determine the type of +an expression or entity. If the type of an expression or entity cannot be determined (for example +because some dependency could not be found during extraction), it will be associated with an invalid +type of class ``InvalidType``. 
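+
+As a minimal sketch of how entities and call sites fit together, the following query looks up the
+entity for the built-in function ``len`` and lists its call sites. It assumes the standard
+``import go`` prelude and uses only predicates described above (``Builtin::len()``,
+``getACall()`` and ``getName()``); the variable names are illustrative:
+
+.. code-block:: ql
+
+   import go
+
+   // `lenFn` is the entity for the built-in function `len`, which has no source-level definition
+   from Function lenFn, DataFlow::CallNode call
+   where
+     lenFn = Builtin::len() and
+     call = lenFn.getACall()
+   select call, "Call to built-in function " + lenFn.getName() + "."
+
+Because the query starts from the entity rather than from a declaration, the same pattern should
+also apply to functions defined in the standard library or in a dependency.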
+ +Control flow +------------ + +Most CodeQL query writers will rarely use the control-flow representation of a program directly, but +it is nevertheless useful to understand how it works. + +Unlike the abstract syntax tree, which views the program as a hierarchy of AST nodes, the +control-flow graph views it as a collection of `control-flow nodes`, each representing a single +operation performed at runtime. These nodes are connected to each other by (directed) edges +representing the order in which operations are performed. + +For example, consider the following code snippet: + +.. code-block:: go + + x := 0 + if p != nil { + x = p.f + } + return x + +In the AST, this is represented as an ``IfStmt`` and a ``ReturnStmt``, with the former having an +``NeqExpr`` and a ``BlockStmt`` as its children, and so on. This provides a very detailed picture of +the syntactic structure of the code, but it does not immediately help us reason about the order +in which the various operations such as the comparison and the assignment are performed. + +In the CFG, there are nodes corresponding to ``x := 0``, ``p != nil``, ``x = p.f``, and ``return +x``, as well as a few others. The edges between these nodes model the possible execution orders of +these statements and expressions, and look as follows (simplified somewhat for presentational +purposes): + +|cfg| + +For example, the edge from ``p != nil`` to ``x = p.f`` models the case where the comparison +evaluates to ``true`` and the "then" branch is evaluated, while the edge from ``p != nil`` to +``return x`` models the case where the comparison evaluates to ``false`` and the "then" branch is +skipped. + +Note, in particular, that a CFG node can have multiple outgoing edges (like from ``p != nil``) as +well as multiple incoming edges (like into ``return x``) to represent control-flow branching at +runtime. + +Also note that only AST nodes that perform some kind of operation on values have a corresponding CFG +node. 
This includes expressions (such as the comparison ``p != nil``), assignment statements (such +as ``x = p.f``) and return statements (such as ``return x``), but not statements that serve a purely +syntactic purpose (such as block statements) and statements whose semantics is already reflected by +the CFG edges (such as ``if`` statements). + +It is important to point out that the control-flow graph provided by the CodeQL libraries for Go +only models `local` control flow, that is, flow within a single function. Flow from function calls +to the function they invoke, for example, is not represented by control-flow edges. + +In CodeQL, control-flow nodes are represented by class ``ControlFlow::Node``, and the edges between +nodes are captured by the member predicates ``getASuccessor()`` and ``getAPredecessor()`` of +``ControlFlow::Node``. In addition to control-flow nodes representing runtime operations, each +function also has a synthetic entry node and an exit node, representing the start and end of an +execution of the function, respectively. These exist to ensure that the control-flow graph +corresponding to a function has a unique entry node and a unique exit node, which is required for +many standard control-flow analysis algorithms. + +Data flow +--------- + +At the data-flow level, the program is thought of as a collection of `data-flow nodes`. These nodes +are connected to each other by (directed) edges representing the way data flows through the program +at runtime. + +For example, there are data-flow nodes corresponding to expressions and other data-flow nodes +corresponding to variables (`SSA variables +`__, to be precise). Here is the +data-flow graph corresponding to the code snippet shown above, ignoring SSA conversion for +simplicity: + +|dfg| + +Note that unlike in the control-flow graph, the assignments ``x := 0`` and ``x = p.f`` are not +represented as nodes. 
Instead, they are expressed as edges between the node representing the +right-hand side of the assignment and the node representing the variable on the left-hand side. For +any subsequent uses of that variable, there is a data-flow edge from the variable to that use, so by +following the edges in the data-flow graph we can trace the flow of values through variables at +runtime. + +It is important to point out that the data-flow graph provided by the CodeQL libraries for Go only +models `local` flow, that is, flow within a single function. Flow from arguments in a function call +to the corresponding function parameters, for example, is not represented by data-flow edges. + +In CodeQL, data-flow nodes are represented by class ``DataFlow::Node``, and the edges between nodes +are captured by the predicate ``DataFlow::localFlowStep``. The predicate ``DataFlow::localFlow`` +generalizes this from a single flow step to zero or more flow steps. + +Most expressions have a corresponding data-flow node; exceptions include type expressions, statement +labels and other expressions that do not have a value, as well as short-circuiting operators. To map +from the AST node of an expression to the corresponding DFG node, use ``DataFlow::exprNode``. Note +that the AST node and the DFG node are different entities and cannot be used interchangeably. + +There is also a predicate ``asExpr()`` on ``DataFlow::Node`` that allows you to recover the +expression underlying a DFG node. However, this predicate should be used with caution, since many +data-flow nodes do not correspond to an expression, and so this predicate will not be defined for +them. + +Similar to ``Expr``, ``DataFlow::Node`` has a member predicate ``getType()`` to determine the type +of a node, as well as predicates ``getNumericValue()``, ``getStringValue()``, and +``getExactValue()`` to retrieve the value of a node if it is constant. 
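+
+The following sketch, which assumes the standard ``import go`` prelude, combines the predicates
+described above: it maps an AST expression to its data-flow node with ``DataFlow::exprNode``,
+keeps nodes whose constant string value is known, and follows local flow to other nodes in the
+same function. The variable names are illustrative:
+
+.. code-block:: ql
+
+   import go
+
+   from Expr e, DataFlow::Node source, DataFlow::Node sink
+   where
+     // map the AST node to its corresponding data-flow node
+     source = DataFlow::exprNode(e) and
+     // keep only nodes with a known constant string value
+     exists(source.getStringValue()) and
+     // zero or more local flow steps, excluding the trivial zero-step case
+     DataFlow::localFlow(source, sink) and
+     source != sink
+   select sink, "Constant string " + source.getStringValue() + " flows here."
+
+Since ``DataFlow::localFlow`` only models flow within a single function, this query does not
+report values passed through function calls.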
+ +Important subclasses of ``DataFlow::Node`` include: + + - ``DataFlow::CallNode``: a function call or method call; use ``getArgument(i)`` and + ``getResult(i)`` to obtain the data-flow nodes corresponding to the ``i``\ th argument and the + ``i``\ th result of this call, respectively; if there is only a single result, ``getResult()`` + will return it + - ``DataFlow::ParameterNode``: a parameter of a function; use ``asParameter()`` to access the + corresponding AST node + - ``DataFlow::BinaryOperationNode``: an operation involving a binary operator; each ``BinaryExpr`` + has a corresponding ``BinaryOperationNode``, but there are also binary operations that are not + explicit at the AST level, such as those arising from compound assignments and + increment/decrement statements; at the AST level, ``x + 1``, ``x += 1``, and ``x++`` are + represented by different kinds of AST nodes, while at the DFG level they are all modeled as a + binary operation node with operands ``x`` and ``1`` + - ``DataFlow::UnaryOperationNode``: analogous, but for unary operators + + - ``DataFlow::PointerDereferenceNode``: a pointer dereference, either explicit in an expression + of the form ``*p``, or implicit in a field or method reference through a pointer + - ``DataFlow::AddressOperationNode``: analogous, but for taking the address of an entity + - ``DataFlow::RelationalComparisonNode``, ``DataFlow::EqualityTestNode``: data-flow nodes + corresponding to ``RelationalComparisonExpr`` and ``EqualityTestExpr`` AST nodes + +Finally, classes ``Read`` and ``Write`` represent, respectively, a read or a write of a variable, a +field, or an element of an array, a slice or a map. Use their member predicates ``readsVariable``, +``writesVariable``, ``readsField``, ``writesField``, ``readsElement``, and ``writesElement`` to +determine what the read/write refers to. + +Call graph +---------- + +The call graph connects function (and method) calls to the functions they invoke. 
Call graph +information is made available by two member predicates on ``DataFlow::CallNode``: ``getTarget()`` +returns the declared target of a call, while ``getACallee()`` returns all possible actual functions +a call may invoke at runtime. + +These two predicates differ in how they handle calls to interface methods: while ``getTarget()`` +will return the interface method itself, ``getACallee()`` will return all concrete methods that +implement the interface method. + +Global data flow and taint tracking +----------------------------------- + +The predicates ``DataFlow::localFlowStep`` and ``DataFlow::localFlow`` are useful for reasoning +about the flow of values in a single function. However, more advanced use cases, particularly in +security analysis, will invariably require reasoning about global data flow, including flow into, +out of, and across function calls, and through fields. + +In CodeQL, such reasoning is expressed in terms of `data-flow configurations`. A data-flow +configuration has three ingredients: sources, sinks, and barriers (also called sanitizers), all of +which are sets of data-flow nodes. Given these three sets, CodeQL provides a general mechanism for +finding paths from a source to a sink, possibly going into and out of functions and fields, but +never flowing through a barrier. + +To define a data-flow configuration, you can define a subclass of ``DataFlow::Configuration``, +overriding the member predicates ``isSource``, ``isSink``, and ``isBarrier`` to define the sets of +sources, sinks, and barriers. + +Going beyond pure data flow, many security analyses need to perform more general `taint tracking`, +which also considers flow through value-transforming operations such as string operations. To track +taint, you can define a subclass of ``TaintTracking::Configuration``, which works similarly to +data-flow configurations. + +A detailed exposition of global data flow and taint tracking is out of scope for this brief +introduction.
For a general overview of data flow and taint tracking, see `About data flow analysis `__. + +Advanced libraries +------------------ + +Finally, we briefly describe a few concepts and libraries that are useful for advanced query +writers. + +Basic blocks and dominance +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Many important control-flow analyses organize control-flow nodes into `basic blocks +`__, which are maximal straight-line sequences of +control-flow nodes without any branching. In the CodeQL libraries, basic blocks are represented by +class ``BasicBlock``. Each control-flow node belongs to a basic block. You can use the predicate +``getBasicBlock()`` in class ``ControlFlow::Node`` and the predicate ``getNode(i)`` in +``BasicBlock`` to move from one to the other. + +Dominance is a standard concept in control-flow analysis: a basic block ``dom`` is said to +`dominate` a basic block ``bb`` if any path through the control-flow graph from the entry node to +the first node of ``bb`` must pass through ``dom``. In other words, whenever program execution +reaches the beginning of ``bb``, it must have come through ``dom``. Each basic block is moreover +considered to dominate itself. + +Dually, a basic block ``postdom`` is said to `post-dominate` a basic block ``bb`` if any path +through the control-flow graph from the last node of ``bb`` to the exit node must pass through +``postdom``. In other words, after program execution leaves ``bb``, it must eventually reach +``postdom``. + +These two concepts are captured by two member predicates ``dominates`` and ``postDominates`` of class +``BasicBlock``. + +Condition guard nodes +~~~~~~~~~~~~~~~~~~~~~ + +A condition guard node is a synthetic control-flow node that records the fact that at some point in +the control-flow graph the truth value of a condition is known. For example, consider again the code snippet we saw above: + +.. 
code-block:: go + + x := 0 + if p != nil { + x = p.f + } + return x + +At the beginning of the "then" branch ``p`` is known not to be ``nil``. This knowledge is encoded in +the control-flow graph by a condition guard node preceding the assignment to ``x``, recording the +fact that ``p != nil`` is ``true`` at this point: + +|cfg2| + +A typical use of this information would be in an analysis that looks for ``nil`` dereferences: such +an analysis would be able to conclude that the field read ``p.f`` is safe because it is immediately +preceded by a condition guard node guaranteeing that ``p`` is not ``nil``. + +In CodeQL, condition guard nodes are represented by class ``ControlFlow::ConditionGuardNode``, which +offers a variety of member predicates to reason about which conditions a guard node guarantees. + +Static single-assignment form +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +`Static single-assignment form `__ (SSA +form for short) is a program representation in which the original program variables are mapped onto +more fine-grained `SSA variables`. Each SSA variable has exactly one definition, so program +variables with multiple assignments correspond to multiple SSA variables. + +Most of the time query authors do not have to deal with SSA form directly. The data-flow graph uses +it under the hood, and so most of the benefits derived from SSA can be gained by simply using the +data-flow graph. + +For example, the data-flow graph for our running example actually looks more like this: + +|ssa| + +Note that the program variable ``x`` has been mapped onto three distinct SSA variables ``x1``, +``x2``, and ``x3``. In this case there is not much benefit to such a representation, but in general +SSA form has well-known advantages for data-flow analysis for which we refer to the literature. + +If you do need to work with raw SSA variables, they are represented by the class ``SsaVariable``.
+Class ``SsaDefinition`` represents definitions of SSA variables, which have a one-to-one +correspondence with ``SsaVariable``\ s. Member predicates ``getDefinition()`` and ``getVariable()`` +exist to map from one to the other. You can use member predicate ``getAUse()`` of ``SsaVariable`` to +look for uses of an SSA variable. To access the program variable underlying an SSA variable, use +member predicate ``getSourceVariable()``. + +Global value numbering +~~~~~~~~~~~~~~~~~~~~~~ + +`Global value numbering `__ is a technique for +determining when two computations in a program are guaranteed to yield the same result. This is done +by associating with each data-flow node an abstract representation of its value (conventionally +called a `value number`, even though in practice it is not usually a number) such that identical +computations are represented by identical value numbers. + +Since this is an undecidable problem, global value numbering is `conservative` in the sense that if +two data-flow nodes have the same value number they are guaranteed to have the same value at +runtime, but not conversely. (That is, there may be data-flow nodes that do, in fact, always +evaluate to the same value, but their value numbers are different.) + +In the CodeQL libraries for Go, you can use the ``globalValueNumber(nd)`` predicate to compute the +global value number for a data-flow node ``nd``. Value numbers are represented as an opaque QL type +``GVN`` that provides very little information. Usually, all you need to do with global value numbers +is to compare them to each other to determine whether two data-flow nodes have the same value. + +Further reading +--------------- + +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. + +.. |ast| image:: ast.png +.. |cfg| image:: cfg.png +.. |dfg| image:: dfg.png +.. |cfg2| image:: cfg2.png +.. 
|ssa| image:: ssa.png diff --git a/docs/language/learn-ql/go/ql-for-go.rst b/docs/language/learn-ql/go/ql-for-go.rst index baa390dc7c6..a0e04cf0370 100644 --- a/docs/language/learn-ql/go/ql-for-go.rst +++ b/docs/language/learn-ql/go/ql-for-go.rst @@ -1,12 +1,18 @@ CodeQL for Go ============= -This page provides an overview of the CodeQL for Go documentation that is currently available. +Experiment and learn how to write effective and efficient queries for CodeQL databases generated from Go codebases. -- `Basic Go query `__ describes how to write and run queries using LGTM. +.. toctree:: + :hidden: + introduce-libraries-go -Other resources +- `Basic Go query `__: Learn to write and run a simple CodeQL query using LGTM. + +- :doc:`CodeQL library for Go `: When you're analyzing a Go program, you can make use of the large collection of classes in the CodeQL library for Go. + +Further reading --------------- - For the queries used in LGTM, display a `Go query `__ and click **Open in query console** to see the code used to find alerts. 
diff --git a/docs/language/learn-ql/go/ssa.dot b/docs/language/learn-ql/go/ssa.dot new file mode 100644 index 00000000000..c8bfb8c62a0 --- /dev/null +++ b/docs/language/learn-ql/go/ssa.dot @@ -0,0 +1,15 @@ +digraph ssa { + graph [dpi=300]; + rankdir=LR; + + "x1" [shape=diamond,label=1>]; + "x2" [shape=diamond,label=2>]; + "x3" [shape=diamond,label=3>]; + "return x" [label=x>]; + + "0" -> "x1"; + "p.f" -> "x2"; + "x1" -> "x3"; + "x2" -> "x3"; + "x3" -> "return x"; +} diff --git a/docs/language/learn-ql/go/ssa.png b/docs/language/learn-ql/go/ssa.png new file mode 100644 index 00000000000..cd5ba3f29de Binary files /dev/null and b/docs/language/learn-ql/go/ssa.png differ diff --git a/docs/language/learn-ql/index.rst b/docs/language/learn-ql/index.rst index 5adafde09e9..e5617299436 100644 --- a/docs/language/learn-ql/index.rst +++ b/docs/language/learn-ql/index.rst @@ -4,7 +4,7 @@ Learning CodeQL CodeQL is the code analysis platform used by security researchers to automate variant analysis. You can use CodeQL queries to explore code and quickly find variants of security vulnerabilities and bugs. These queries are easy to write and share–visit the topics below and `our open source repository on GitHub `__ to learn more. -You can also try out CodeQL in the `query console `__ on `LGTM.com `__. +You can also try out CodeQL in the `query console on LGTM.com `__. Here, you can query open source projects directly, without having to download CodeQL databases and libraries. CodeQL is based on a powerful query language called QL. The following topics help you understand QL in general, as well as how to use it when analyzing code with CodeQL. @@ -15,85 +15,26 @@ CodeQL is based on a powerful query language called QL. The following topics hel If you've previously used QL, you may notice slight changes in terms we use to describe some important concepts. For more information, see our note about :doc:`Recent terminology changes `. -.. toctree:: - :hidden: - - terminology-note - - -.. 
_getting-started: - -Getting started -*************** - -If you are new to QL, start by looking at the following topics: - .. toctree:: :maxdepth: 1 - introduction-to-ql - about-ql beginner/ql-tutorials - ql-etudes/river-crossing - -CodeQL training and variant analysis examples -********************************************* - -To start learning how to use CodeQL for variant analysis for code written in a specific language, see: - -.. toctree:: - :maxdepth: -1 - - ql-training - -.. _writing-ql-queries: - -Writing CodeQL queries -********************** - -To learn more about writing your own queries, see: - -.. toctree:: - :maxdepth: 3 - :includehidden: - writing-queries/writing-queries - -For more information on using CodeQL to query code written in a specific language, see: - -.. toctree:: - :maxdepth: 2 - :includehidden: - cpp/ql-for-cpp csharp/ql-for-csharp go/ql-for-go java/ql-for-java javascript/ql-for-javascript python/ql-for-python - -Technical information -********************* - -For more technical information see: + ql-training + technical-info .. toctree:: - :maxdepth: 2 - :includehidden: + :hidden: + + terminology-note - technical-info +Further reading +*************** -Reference topics -**************** - -For a more comprehensive guide to the query language itself, see the following reference topics: - -- `QL language handbook `__—a description of important concepts in QL. -- `QL language specification `__—a formal specification of QL. - -Search -****** - -.. * :ref:`genindex` remove index for the time being as we currently have no tags - -* :ref:`search` +- `QL language reference `__: A description of important concepts in QL and a formal specification of the QL language. 
diff --git a/docs/language/learn-ql/intro-to-data-flow.rst b/docs/language/learn-ql/intro-to-data-flow.rst index f266849dfa7..e509d3a8c16 100644 --- a/docs/language/learn-ql/intro-to-data-flow.rst +++ b/docs/language/learn-ql/intro-to-data-flow.rst @@ -1,10 +1,11 @@ -Introduction to data flow analysis with CodeQL -############################################## +About data flow analysis +######################## + +Data flow analysis is used to compute the possible values that a variable can hold at various points in a program, determining how those values propagate through the program and where they are used. Overview ******** -Data flow analysis computes the possible values that a variable can hold at various points in a program, determining how those values propagate through the program and where they are used. Many CodeQL security queries implement data flow analysis, which can highlight the fate of potentially malicious or insecure data that can cause vulnerabilities in your code base. These queries help you understand if data is used in an insecure way, whether dangerous arguments are passed to functions, or whether sensitive data can leak. As well as highlighting potential security issues, you can also use data flow analysis to understand other aspects of how a program behaves, by finding, for example, uses of uninitialized variables and resource leaks. @@ -17,13 +18,13 @@ See the following tutorials for more information about analyzing data flow in sp - :doc:`Analyzing data flow in C# ` - :doc:`Analyzing data flow in Java ` - :doc:`Analyzing data flow in JavaScript/TypeScript ` -- :doc:`Taint tracking and data flow analysis in Python ` +- :doc:`Analyzing data flow and tracking tainted data in Python ` .. pull-quote:: Note - Data flow analysis is used extensively in path queries. To learn more about path queries, see :doc:`Constructing path queries `. + Data flow analysis is used extensively in path queries. 
To learn more about path queries, see :doc:`Creating path queries `. .. _data-flow-graph: diff --git a/docs/language/learn-ql/introduction-to-ql.rst b/docs/language/learn-ql/introduction-to-ql.rst index 4945d713803..9deb7523661 100644 --- a/docs/language/learn-ql/introduction-to-ql.rst +++ b/docs/language/learn-ql/introduction-to-ql.rst @@ -1,22 +1,21 @@ Introduction to QL ================== -QL is the powerful query language that underlies CodeQL, which is used to analyze code. -Queries written with CodeQL can find errors and uncover variants of important security vulnerabilities. -Visit `GitHub Security Lab `__ to read about examples of vulnerabilities that we have recently found in open source projects. - -Before diving into code analysis with CodeQL, it can be helpful to learn about the underlying language more generally. - -QL is a logic programming language, so it is built up of logical formulas. QL uses common logical connectives (such as ``and``, ``or``, and ``not``), quantifiers (such as ``forall`` and ``exists``), and other important logical concepts such as predicates. - -QL also supports recursion and aggregates. This allows you to write complex recursive queries using simple QL syntax and directly use aggregates such as ``count``, ``sum``, and ``average``. +Work through some simple exercises and examples to learn about the basics of QL and CodeQL. Basic syntax ------------ The basic syntax of QL will look familiar to anyone who has used SQL, but it is used somewhat differently. -A query is defined by a **select** clause, which specifies what the result of the query should be. You can try out the examples and exercises in this topic directly in LGTM. Open the `query console `__. Before you can run a query, you need to select a language and project to query (for these logic examples, any language and project will do). +QL is a logic programming language, so it is built up of logical formulas. 
QL uses common logical connectives (such as ``and``, ``or``, and ``not``), quantifiers (such as ``forall`` and ``exists``), and other important logical concepts such as predicates. + +QL also supports recursion and aggregates. This allows you to write complex recursive queries using simple QL syntax and directly use aggregates such as ``count``, ``sum``, and ``average``. + +Running a query +--------------- + +You can try out the following examples and exercises using `CodeQL for VS Code `__, or you can run them in the `query console on LGTM.com `__. Before you can run a query on LGTM.com, you need to select a language and project to query (for these logic examples, any language and project will do). Once you have selected a language, the query console is populated with the query: @@ -26,7 +25,7 @@ Once you have selected a language, the query console is populated with the query select "hello world" -This query simply returns the string ``"hello world"``. +This query returns the string ``"hello world"``. More complicated queries typically look like this: @@ -49,14 +48,14 @@ Note that ``int`` specifies that the **type** of ``x`` and ``y`` is 'integer'. T Simple exercises ---------------- -You can try to write simple queries using the some of the basic functions that are available for the ``integer``, ``date``, ``float``, ``boolean`` and ``string`` types. To apply a function, simply append it to the argument. For example, ``1.toString()`` converts the value ``1`` to a string. Notice that as you start typing a function, a pop-up is displayed making it easy to select the function that you want. Also note that you can apply multiple functions in succession. For example, ``100.log().sqrt()`` first takes the natural logarithm of 100 and then computes the square root of the result. +You can write simple queries using some of the basic functions that are available for the ``int``, ``date``, ``float``, ``boolean`` and ``string`` types.
To apply a function, append it to the argument. For example, ``1.toString()`` converts the value ``1`` to a string. Notice that as you start typing a function, a pop-up is displayed making it easy to select the function that you want. Also note that you can apply multiple functions in succession. For example, ``100.log().sqrt()`` first takes the natural logarithm of 100 and then computes the square root of the result. Exercise 1 ~~~~~~~~~~ Write a query which returns the length of the string ``"lgtm"``. (Hint: `here `__ is the list of the functions that can be applied to strings.) -➤ `Answer `__ +➤ `See answer in the query console on LGTM.com `__ There is often more than one way to define a query. For example, we can also write the above query in the shorter form: @@ -69,24 +68,24 @@ Exercise 2 Write a query which returns the sine of the minimum of ``3^5`` (``3`` raised to the power ``5``) and ``245.6``. -➤ `Answer `__ +➤ `See answer in the query console on LGTM.com `__ Exercise 3 ~~~~~~~~~~ Write a query which returns the opposite of the boolean ``false``. -➤ `Answer `__ +➤ `See answer in the query console on LGTM.com `__ Exercise 4 ~~~~~~~~~~ Write a query which computes the number of days between June 10 and September 28, 2017. -➤ `Answer `__ +➤ `See answer in the query console on LGTM.com `__ -Example queries ---------------- +Example query with multiple results +----------------------------------- The exercises above all show queries with exactly one result, but in fact many queries have multiple results. For example, the following query computes all `Pythagorean triples `__ between 1 and 10: @@ -97,7 +96,7 @@ The exercises above all show queries with exactly one result, but in fact many q x*x + y*y = z*z select x, y, z -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ To simplify the query, we can introduce a class ``SmallInt`` representing the integers between 1 and 10. 
We can also define a predicate ``square()`` on integers in that class. Defining classes and predicates in this way makes it easy to reuse code without having to repeat it every time. @@ -112,17 +111,18 @@ To simplify the query, we can introduce a class ``SmallInt`` representing the in where x.square() + y.square() = z.square() select x, y, z -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ -Now that you've seen some general examples, let's use the CodeQL libraries to analyze projects. -In particular, LGTM generates a database representing the code and then CodeQL is used to query this database. See `Database generation `__ for more details on how the database is built. +Example CodeQL queries +---------------------- -.. XX: Perhaps a link to the "CodeQL libraries for X"? +The previous examples used the primitive types built in to QL. Although we chose a project to query, we didn't use the information in that project's database. +The following example queries *do* use these databases and give you an idea of how to use CodeQL to analyze projects. -The previous exercises just used the primitive types built in to QL. Although we chose a project to query, they did not use the project-specific database. The following example queries *do* use these databases and give you an idea of what CodeQL can be used for. There are more details about how to use CodeQL `below <#learning-ql>`__, so don't worry if you don't fully understand these examples yet! +Queries using the CodeQL libraries can find errors and uncover variants of important security vulnerabilities in codebases. +Visit `GitHub Security Lab `__ to read about examples of vulnerabilities that we have recently found in open source projects. -Python -~~~~~~ +To import the CodeQL library for a specific programming language, type ``import `` at the start of the query. .. 
code-block:: ql @@ -132,10 +132,7 @@ Python where count(f.getAnArg()) > 7 select f -➤ `See this in the query console `__. The ``from`` clause defines a variable ``f`` representing a function. The ``where`` part limits the functions ``f`` to those with more than 7 arguments. Finally, the ``select`` clause lists these functions. - -JavaScript -~~~~~~~~~~ +➤ `See this in the query console on LGTM.com `__. The ``from`` clause defines a variable ``f`` representing a Python function. The ``where`` part limits the functions ``f`` to those with more than 7 arguments. Finally, the ``select`` clause lists these functions. .. code-block:: ql @@ -145,10 +142,7 @@ JavaScript where c.getText().regexpMatch("(?si).*\\bTODO\\b.*") select c -➤ `See this in the query console `__. The ``from`` clause defines a variable ``c`` representing a comment. The ``where`` part limits the comments ``c`` to those containing the word ``"TODO"``. The ``select`` clause lists these comments. - -Java -~~~~ +➤ `See this in the query console on LGTM.com `__. The ``from`` clause defines a variable ``c`` representing a JavaScript comment. The ``where`` part limits the comments ``c`` to those containing the word ``"TODO"``. The ``select`` clause lists these comments. .. code-block:: ql @@ -158,11 +152,11 @@ Java where not exists(p.getAnAccess()) select p -➤ `See this in the query console `__. The ``from`` clause defines a variable ``p`` representing a parameter. The ``where`` clause finds unused parameters by limiting the parameters ``p`` to those which are not accessed. Finally, the ``select`` clause lists these parameters. +➤ `See this in the query console on LGTM.com `__. The ``from`` clause defines a variable ``p`` representing a Java parameter. The ``where`` clause finds unused parameters by limiting the parameters ``p`` to those which are not accessed. Finally, the ``select`` clause lists these parameters. 
-Learning CodeQL +Further reading --------------- -- To find out more about how to write your own queries, try working through the :doc:`QL detective tutorials `. +- To find out more about how to write your own queries, try working through the :doc:`QL tutorials `. - For an overview of the other available resources, see :doc:`Learning CodeQL <../index>`. -- For a more technical description of the underlying language, see :doc:`About QL `. +- For a more technical description of the underlying language, see the `QL language reference `__. \ No newline at end of file diff --git a/docs/language/learn-ql/java/annotations.rst b/docs/language/learn-ql/java/annotations.rst index 9fbb70776c4..497f20e1ff3 100644 --- a/docs/language/learn-ql/java/annotations.rst +++ b/docs/language/learn-ql/java/annotations.rst @@ -1,19 +1,19 @@ -Tutorial: Annotations -===================== - -Overview --------- +Annotations in Java +=================== CodeQL databases of Java projects contain information about all annotations attached to program elements. -Annotations are represented by the following CodeQL classes: +About working with annotations +------------------------------ + +Annotations are represented by these CodeQL classes: - The class ``Annotatable`` represents all entities that may have an annotation attached to them (that is, packages, reference types, fields, methods, and local variables). - The class ``AnnotationType`` represents a Java annotation type, such as ``java.lang.Override``; annotation types are interfaces. - The class ``AnnotationElement`` represents an annotation element, that is, a member of an annotation type. - The class ``Annotation`` represents an annotation such as ``@Override``; annotation values can be accessed through member predicate ``getValue``. -As an example, recall that the Java standard library defines an annotation ``SuppressWarnings`` that instructs the compiler not to emit certain kinds of warnings. 
It is defined as follows: +For example, the Java standard library defines an annotation ``SuppressWarnings`` that instructs the compiler not to emit certain kinds of warnings: .. code-block:: java @@ -25,7 +25,7 @@ As an example, recall that the Java standard library defines an annotation ``Sup ``SuppressWarnings`` is represented as an ``AnnotationType``, with ``value`` as its only ``AnnotationElement``. -A typical usage of ``SuppressWarnings`` would be the following annotation to prevent a warning about using raw types: +A typical usage of ``SuppressWarnings`` would be this annotation for preventing a warning about using raw types: .. code-block:: java @@ -37,7 +37,7 @@ A typical usage of ``SuppressWarnings`` would be the following annotation to pre The expression ``@SuppressWarnings("rawtypes")`` is represented as an ``Annotation``. The string literal ``"rawtypes"`` is used to initialize the annotation element ``value``, and its value can be extracted from the annotation by means of the ``getValue`` predicate. -We could then write the following query to find all ``@SuppressWarnings`` annotations attached to constructors, and return both the annotation itself and the value of its ``value`` element: +We could then write this query to find all ``@SuppressWarnings`` annotations attached to constructors, and return both the annotation itself and the value of its ``value`` element: .. code-block:: ql @@ -49,7 +49,7 @@ We could then write the following query to find all ``@SuppressWarnings`` annota anntp.hasQualifiedName("java.lang", "SuppressWarnings") select ann, ann.getValue("value") -➤ `See the full query in the query console `__. Several of the LGTM.com demo projects use the ``@SuppressWarnings`` annotation. Looking at the ``value``\ s of the annotation element returned by the query, we can see that the *apache/activemq* project uses the ``"rawtypes"`` value described above. +➤ `See the full query in the query console on LGTM.com `__. 
Several of the LGTM.com demo projects use the ``@SuppressWarnings`` annotation. Looking at the ``value``\ s of the annotation element returned by the query, we can see that the *apache/activemq* project uses the ``"rawtypes"`` value described above. As another example, this query finds all annotation types that only have a single annotation element, which has name ``value``: @@ -64,14 +64,14 @@ As another example, this query finds all annotation types that only have a singl ) select anntp -➤ `See the full query in the query console `__. +➤ `See the full query in the query console on LGTM.com `__. Example: Finding missing ``@Override`` annotations -------------------------------------------------- -In newer versions of Java, it is recommended (though not required) to annotate methods that override another method with an ``@Override`` annotation. These annotations, which are checked by the compiler, serve as documentation, and also help you avoid accidental overloading where overriding was intended. +In newer versions of Java, it's recommended (though not required) that you annotate methods that override another method with an ``@Override`` annotation. These annotations, which are checked by the compiler, serve as documentation, and also help you avoid accidental overloading where overriding was intended. -For example, consider the following example program: +For example, consider this example program: .. code-block:: java @@ -89,9 +89,9 @@ For example, consider the following example program: Here, both ``Sub1.m`` and ``Sub2.m`` override ``Super.m``, but only ``Sub1.m`` is annotated with ``@Override``. -We will now develop a query for finding methods like ``Sub2.m`` that should be annotated with ``@Override``, but are not. +We'll now develop a query for finding methods like ``Sub2.m`` that should be annotated with ``@Override``, but are not. -As a first step, let us write a query that finds all ``@Override`` annotations. 
Annotations are expressions, so their type can be accessed using ``getType``. Annotation types, on the other hand, are interfaces, so their qualified name can be queried using ``hasQualifiedName``. Therefore we can implement the query as follows: +As a first step, let's write a query that finds all ``@Override`` annotations. Annotations are expressions, so their type can be accessed using ``getType``. Annotation types, on the other hand, are interfaces, so their qualified name can be queried using ``hasQualifiedName``. Therefore we can implement the query like this: .. code-block:: ql @@ -111,7 +111,7 @@ As always, it is a good idea to try this query on a CodeQL database for a Java p } } -This makes it very easy to write our query for finding methods that override another method, but do not have an ``@Override`` annotation: we use predicate ``overrides`` to find out whether one method overrides another, and predicate ``getAnAnnotation`` (available on any ``Annotatable``) to retrieve some annotation. +This makes it very easy to write our query for finding methods that override another method, but don't have an ``@Override`` annotation: we use predicate ``overrides`` to find out whether one method overrides another, and predicate ``getAnAnnotation`` (available on any ``Annotatable``) to retrieve some annotation. .. code-block:: ql @@ -122,14 +122,14 @@ This makes it very easy to write our query for finding methods that override ano not overriding.getAnAnnotation() instanceof OverrideAnnotation select overriding, "Method overrides another method, but does not have an @Override annotation." -➤ `See this in the query console `__. In practice, this query may yield many results from compiled library code, which are not very interesting. Therefore, it is a good idea to add another conjunct ``overriding.fromSource()`` to restrict the result to only report methods for which source code is available. +➤ `See this in the query console on LGTM.com `__. 
In practice, this query may yield many results from compiled library code, which aren't very interesting. It's therefore a good idea to add another conjunct ``overriding.fromSource()`` to restrict the result to only report methods for which source code is available. Example: Finding calls to deprecated methods -------------------------------------------- As another example, we can write a query that finds calls to methods marked with a ``@Deprecated`` annotation. -For example, consider the following example program: +For example, consider this example program: .. code-block:: java @@ -147,7 +147,7 @@ For example, consider the following example program: Here, both ``A.m`` and ``A.n`` are marked as deprecated. Methods ``n`` and ``r`` both call ``m``, but note that ``n`` itself is deprecated, so we probably should not warn about this call. -Like in the previous example, we start by defining a class for representing ``@Deprecated`` annotations: +As in the previous example, we'll start by defining a class for representing ``@Deprecated`` annotations: .. code-block:: ql @@ -167,7 +167,7 @@ Now we can define a class for representing deprecated methods: } } -Finally, we use these classes to find calls to deprecated methods, excluding calls that themselves appear in deprecated methods (see :doc:`Tutorial: Navigating the call graph ` for more information on class ``Call``): +Finally, we use these classes to find calls to deprecated methods, excluding calls that themselves appear in deprecated methods: .. code-block:: ql @@ -178,7 +178,9 @@ Finally, we use these classes to find calls to deprecated methods, excluding cal and not call.getCaller() instanceof DeprecatedMethod select call, "This call invokes a deprecated method." -On our example, this query flags the call to ``A.m`` in ``A.r``, but not the one in ``A.n``. +In our example, this query flags the call to ``A.m`` in ``A.r``, but not the one in ``A.n``. 
+ +For more information about the class ``Call``, see :doc:`Navigating the call graph `. Improvements ~~~~~~~~~~~~ @@ -233,11 +235,11 @@ Now we can extend our query to filter out calls in methods carrying a ``Suppress and not call.getCaller().getAnAnnotation() instanceof SuppressDeprecationWarningAnnotation select call, "This call invokes a deprecated method." -➤ `See this in the query console `__. It's fairly common for projects to contain calls to methods that appear to be deprecated. +➤ `See this in the query console on LGTM.com `__. It's fairly common for projects to contain calls to methods that appear to be deprecated. -What next? ----------- +Further reading +--------------- -- Take a look at some of the other tutorials: :doc:`Tutorial: Javadoc ` and :doc:`Tutorial: Working with source locations `. -- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`AST class reference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- Take a look at some of the other articles in this section: :doc:`Javadoc ` and :doc:`Working with source locations `. +- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`Classes for working with Java code `. +- Find out more about QL in the `QL language reference `__. diff --git a/docs/language/learn-ql/java/ast-class-reference.rst b/docs/language/learn-ql/java/ast-class-reference.rst index 268b95acac5..f34e2eec764 100644 --- a/docs/language/learn-ql/java/ast-class-reference.rst +++ b/docs/language/learn-ql/java/ast-class-reference.rst @@ -1,5 +1,7 @@ -AST class reference -=================== +Classes for working with Java code +================================== + +CodeQL has a large selection of classes for working with Java statements and expressions. .. _Expr: https://help.semmle.com/qldoc/java/semmle/code/java/Expr.qll/type.Expr$Expr.html .. 
_Stmt: https://help.semmle.com/qldoc/java/semmle/code/java/Statement.qll/type.Statement$Stmt.html diff --git a/docs/language/learn-ql/java/call-graph.rst b/docs/language/learn-ql/java/call-graph.rst index cfaa33cad9f..6f7874c772f 100644 --- a/docs/language/learn-ql/java/call-graph.rst +++ b/docs/language/learn-ql/java/call-graph.rst @@ -1,8 +1,10 @@ -Tutorial: Navigating the call graph -=================================== +Navigating the call graph +========================= -Call graph API --------------- +CodeQL has classes for identifying code that calls other code, and code that can be called from elsewhere. This allows you to find, for example, methods that are never used. + +Call graph classes +------------------ The CodeQL library for Java provides two abstract classes for representing a program's call graph: ``Callable`` and ``Call``. The former is simply the common superclass of ``Method`` and ``Constructor``, the latter is a common superclass of ``MethodAccess``, ``ClassInstanceExpression``, ``ThisConstructorInvocationStmt`` and ``SuperConstructorInvocationStmt``. Simply put, a ``Callable`` is something that can be invoked, and a ``Call`` is something that invokes a ``Callable``. @@ -56,7 +58,7 @@ Class ``Call`` provides two call graph navigation predicates: For instance, in our example ``getCallee`` of the second call in ``Client.main`` would return ``Super.getX``. At runtime, though, this call would actually invoke ``Sub.getX``. -Class ``Callable`` defines a large number of member predicates; for our purposes, the two most important ones are as follows: +Class ``Callable`` defines a large number of member predicates; for our purposes, the two most important ones are: - ``calls(Callable target)`` succeeds if this callable contains a call whose callee is ``target``. 
- ``polyCalls(Callable target)`` succeeds if this callable may call ``target`` at runtime; this is the case if it contains a call whose callee is either ``target`` or a method that ``target`` overrides. @@ -66,7 +68,7 @@ In our example, ``Client.main`` calls the constructor ``Sub(int)`` and the metho Example: Finding unused methods ------------------------------- -Given this API, we can easily write a query that finds methods that are not called by any other method: +We can use the ``Callable`` class to write a query that finds methods that are not called by any other method: .. code-block:: ql @@ -76,7 +78,7 @@ Given this API, we can easily write a query that finds methods that are not call where not exists(Callable caller | caller.polyCalls(callee)) select callee -➤ `See this in the query console `__. This simple query typically returns a large number of results. +➤ `See this in the query console on LGTM.com `__. This simple query typically returns a large number of results. .. pull-quote:: @@ -84,7 +86,7 @@ Given this API, we can easily write a query that finds methods that are not call We have to use ``polyCalls`` instead of ``calls`` here: we want to be reasonably sure that ``callee`` is not called, either directly or via overriding. -Running this query on a typical Java project results in lots of hits in the Java standard library. This makes sense, since no single client program uses every method of the standard library. More generally, we may want to exclude methods and constructors from compiled libraries. We can use the predicate ``fromSource`` to check whether a compilation unit is a source file, and refine our query as follows: +Running this query on a typical Java project results in lots of hits in the Java standard library. This makes sense, since no single client program uses every method of the standard library. More generally, we may want to exclude methods and constructors from compiled libraries. 
We can use the predicate ``fromSource`` to check whether a compilation unit is a source file, and refine our query: .. code-block:: ql @@ -95,7 +97,7 @@ Running this query on a typical Java project results in lots of hits in the Java callee.getCompilationUnit().fromSource() select callee, "Not called." -➤ `See this in the query console `__. This change reduces the number of results returned for most projects. +➤ `See this in the query console on LGTM.com `__. This change reduces the number of results returned for most projects. We might also notice several unused methods with the somewhat strange name ````: these are class initializers; while they are not explicitly called anywhere in the code, they are called implicitly whenever the surrounding class is loaded. Hence it makes sense to exclude them from our query. While we are at it, we can also exclude finalizers, which are similarly invoked implicitly: @@ -109,7 +111,7 @@ We might also notice several unused methods with the somewhat strange name ``") and not callee.hasName("finalize") select callee, "Not called." -➤ `See this in the query console `__. This also reduces the number of results returned by most projects. +➤ `See this in the query console on LGTM.com `__. This also reduces the number of results returned by most projects. We may also want to exclude public methods from our query, since they may be external API entry points: @@ -124,7 +126,7 @@ We may also want to exclude public methods from our query, since they may be ext not callee.isPublic() select callee, "Not called." -➤ `See this in the query console `__. This should have a more noticeable effect on the number of results returned. +➤ `See this in the query console on LGTM.com `__. This should have a more noticeable effect on the number of results returned. 
A further special case is non-public default constructors: in the singleton pattern, for example, a class is provided with private empty default constructor to prevent it from being instantiated. Since the very purpose of such constructors is their not being called, they should not be flagged up: @@ -140,9 +142,9 @@ A further special case is non-public default constructors: in the singleton patt not callee.(Constructor).getNumberOfParameters() = 0 select callee, "Not called." -➤ `See this in the query console `__. This change has a large effect on the results for some projects but little effect on the results for others. Use of this pattern varies widely between different projects. +➤ `See this in the query console on LGTM.com `__. This change has a large effect on the results for some projects but little effect on the results for others. Use of this pattern varies widely between different projects. -Finally, on many Java projects there are methods that are invoked indirectly by reflection. Thus, while there are no calls invoking these methods, they are, in fact, used. It is in general very hard to identify such methods. A very common special case, however, is JUnit test methods, which are reflectively invoked by a test runner. The QL Java library has support for recognizing test classes of JUnit and other testing frameworks, which we can employ to filter out methods defined in such classes: +Finally, on many Java projects there are methods that are invoked indirectly by reflection. So, while there are no calls invoking these methods, they are, in fact, used. It is in general very hard to identify such methods. A very common special case, however, is JUnit test methods, which are reflectively invoked by a test runner. The QL Java library has support for recognizing test classes of JUnit and other testing frameworks, which we can employ to filter out methods defined in such classes: .. 
code-block:: ql @@ -157,11 +159,11 @@ Finally, on many Java projects there are methods that are invoked indirectly by not callee.getDeclaringType() instanceof TestClass select callee, "Not called." -➤ `See this in the query console `__. This should give a further reduction in the number of results returned. +➤ `See this in the query console on LGTM.com `__. This should give a further reduction in the number of results returned. -What next? ----------- +Further reading +--------------- -- Find out how to query metadata and white space: :doc:`Tutorial: Annotations `, :doc:`Tutorial: Javadoc `, and :doc:`Tutorial: Working with source locations `. -- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`AST class reference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- Find out how to query metadata and white space: :doc:`Annotations in Java `, :doc:`Javadoc `, and :doc:`Working with source locations `. +- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`Classes for working with Java code `. +- Find out more about QL in the `QL language reference `__. diff --git a/docs/language/learn-ql/java/dataflow.rst b/docs/language/learn-ql/java/dataflow.rst index 98b8b97e153..d49128113ba 100644 --- a/docs/language/learn-ql/java/dataflow.rst +++ b/docs/language/learn-ql/java/dataflow.rst @@ -1,13 +1,15 @@ Analyzing data flow in Java -============================ +=========================== -Overview --------- +You can use CodeQL to track the flow of data through a Java program to its use. -This topic describes how data flow analysis is implemented in the CodeQL libraries for Java and includes examples to help you write your own data flow queries. -The following sections describe how to utilize the libraries for local data flow, global data flow, and taint tracking. 
+About this article +------------------ -For a more general introduction to modeling data flow, see :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`. +This article describes how data flow analysis is implemented in the CodeQL libraries for Java and includes examples to help you write your own data flow queries. +The following sections describe how to use the libraries for local data flow, global data flow, and taint tracking. + +For a more general introduction to modeling data flow, see :doc:`About data flow analysis <../intro-to-data-flow>`. Local data flow --------------- @@ -17,7 +19,7 @@ Local data flow is data flow within a single method or callable. Local data flow Using local data flow ~~~~~~~~~~~~~~~~~~~~~ -The local data flow library is in the module ``DataFlow``, which defines the class ``Node`` denoting any element that data can flow through. ``Node``\ s are divided into expression nodes (``ExprNode``) and parameter nodes (``ParameterNode``). It is possible to map between data flow nodes and expressions/parameters using the member predicates ``asExpr`` and ``asParameter``: +The local data flow library is in the module ``DataFlow``, which defines the class ``Node`` denoting any element that data can flow through. ``Node``\ s are divided into expression nodes (``ExprNode``) and parameter nodes (``ParameterNode``). You can map between data flow nodes and expressions/parameters using the member predicates ``asExpr`` and ``asParameter``: .. code-block:: ql @@ -45,9 +47,9 @@ or using the predicates ``exprNode`` and ``parameterNode``: */ ParameterNode parameterNode(Parameter p) { ... } -The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``. The predicate can be applied recursively (using the ``+`` and ``*`` operators), or through the predefined recursive predicate ``localFlow``, which is equivalent to ``localFlowStep*``. 
+The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``. You can apply the predicate recursively by using the ``+`` and ``*`` operators, or by using the predefined recursive predicate ``localFlow``, which is equivalent to ``localFlowStep*``. -For example, finding flow from a parameter ``source`` to an expression ``sink`` in zero or more local steps can be achieved as follows: +For example, you can find flow from a parameter ``source`` to an expression ``sink`` in zero or more local steps: .. code-block:: ql @@ -65,9 +67,9 @@ Local taint tracking extends local data flow by including non-value-preserving f If ``x`` is a tainted string then ``y`` is also tainted. -The local taint tracking library is in the module ``TaintTracking``. Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``. The predicate can be applied recursively (using the ``+`` and ``*`` operators), or through the predefined recursive predicate ``localTaint``, which is equivalent to ``localTaintStep*``. +The local taint tracking library is in the module ``TaintTracking``. Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``. You can apply the predicate recursively by using the ``+`` and ``*`` operators, or by using the predefined recursive predicate ``localTaint``, which is equivalent to ``localTaintStep*``. -For example, finding taint propagation from a parameter ``source`` to an expression ``sink`` in zero or more local steps can be achieved as follows: +For example, you can find taint propagation from a parameter ``source`` to an expression ``sink`` in zero or more local steps: .. 
code-block:: ql @@ -76,7 +78,7 @@ For example, finding taint propagation from a parameter ``source`` to an express Examples ~~~~~~~~ -The following query finds the filename passed to ``new FileReader(..)``. +This query finds the filename passed to ``new FileReader(..)``. .. code-block:: ql @@ -88,7 +90,7 @@ The following query finds the filename passed to ``new FileReader(..)``. call.getCallee() = fileReader select call.getArgument(0) -Unfortunately, this will only give the expression in the argument, not the values which could be passed to it. So we use local data flow to find all expressions that flow into the argument: +Unfortunately, this only gives the expression in the argument, not the values which could be passed to it. So we use local data flow to find all expressions that flow into the argument: .. code-block:: ql @@ -102,7 +104,7 @@ Unfortunately, this will only give the expression in the argument, not the value DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0))) select src -Then we can make the source more specific, for example an access to a public parameter. The following query finds where a public parameter is passed to ``new FileReader(..)``: +Then we can make the source more specific, for example an access to a public parameter. This query finds where a public parameter is passed to ``new FileReader(..)``: .. code-block:: ql @@ -116,7 +118,7 @@ Then we can make the source more specific, for example an access to a public par DataFlow::localFlow(DataFlow::parameterNode(p), DataFlow::exprNode(call.getArgument(0))) select p -The following example finds calls to formatting functions where the format string is not hard-coded. +This query finds calls to formatting functions where the format string is not hard-coded. .. 
code-block:: ql @@ -148,7 +150,7 @@ Global data flow tracks data flow throughout the entire program, and is therefor Using global data flow ~~~~~~~~~~~~~~~~~~~~~~ -The global data flow library is used by extending the class ``DataFlow::Configuration`` as follows: +You use the global data flow library by extending the class ``DataFlow::Configuration``: .. code-block:: ql @@ -166,7 +168,7 @@ The global data flow library is used by extending the class ``DataFlow::Configur } } -The following predicates are defined in the configuration: +These predicates are defined in the configuration: - ``isSource``—defines where data may flow from - ``isSink``—defines where data may flow to @@ -186,7 +188,7 @@ The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node Using global taint tracking ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by extending the class ``TaintTracking::Configuration`` as follows: +Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. You use the global taint tracking library by extending the class ``TaintTracking::Configuration``: .. code-block:: ql @@ -204,7 +206,7 @@ Global taint tracking is to global data flow as local taint tracking is to local } } -The following predicates are defined in the configuration: +These predicates are defined in the configuration: - ``isSource``—defines where taint may flow from - ``isSink``—defines where taint may flow to @@ -223,7 +225,7 @@ The data flow library contains some predefined flow sources. The class ``RemoteF Examples ~~~~~~~~ -The following example shows a taint-tracking configuration that uses remote user input as data sources. 
+This query shows a taint-tracking configuration that uses remote user input as data sources. .. code-block:: ql @@ -251,12 +253,12 @@ Exercise 3: Write a class that represents flow sources from ``java.lang.System.g Exercise 4: Using the answers from 2 and 3, write a query which finds all global data flows from ``getenv`` to ``java.net.URL``. (`Answer <#exercise-4>`__) -What next? ----------- +Further reading +--------------- -- Try the worked examples in the following topics: :doc:`Tutorial: Navigating the call graph ` and :doc:`Tutorial: Working with source locations `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Try the worked examples in these articles: :doc:`Navigating the call graph ` and :doc:`Working with source locations `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. Answers ------- diff --git a/docs/language/learn-ql/java/expressions-statements.rst b/docs/language/learn-ql/java/expressions-statements.rst index 25d605b18b8..24fb943ace5 100644 --- a/docs/language/learn-ql/java/expressions-statements.rst +++ b/docs/language/learn-ql/java/expressions-statements.rst @@ -1,12 +1,14 @@ -Tutorial: Expressions and statements -==================================== +Overflow-prone comparisons in Java +================================== -Overview --------- +You can use CodeQL to check for comparisons in Java code where one side of the comparison is prone to overflow. -This tutorial develops a query for finding comparisons between integers and long integers in loops that may lead to non-termination due to overflow. 
+About this article +------------------ -Specifically, consider the following code snippet: +In this tutorial article you'll write a query for finding comparisons between integers and long integers in loops that may lead to non-termination due to overflow. + +To begin, consider this code snippet: .. code-block:: java @@ -24,12 +26,12 @@ If ``l`` is bigger than 2\ :sup:`31`\ - 1 (the largest positive value of type `` All primitive numeric types have a maximum value, beyond which they will wrap around to their lowest possible value (called an "overflow"). For ``int``, this maximum value is 2\ :sup:`31`\ - 1. Type ``long`` can accommodate larger values up to a maximum of 2\ :sup:`63`\ - 1. In this example, this means that ``l`` can take on a value that is higher than the maximum for type ``int``; ``i`` will never be able to reach this value, instead overflowing and returning to a low value. -We will develop a query that finds code that looks like it might exhibit this kind of behavior. We will be using several of the standard library classes for representing statements and functions, a full list of which can be found in the :doc:`AST class reference `. +We're going to develop a query that finds code that looks like it might exhibit this kind of behavior. We'll be using several of the standard library classes for representing statements and functions. For a full list, see :doc:`Classes for working with Java code `. Initial query ------------- -We start out by writing a query that finds less-than expressions (CodeQL class ``LTExpr``) where the left operand is of type ``int`` and the right operand is of type ``long``: +We'll start by writing a query that finds less-than expressions (CodeQL class ``LTExpr``) where the left operand is of type ``int`` and the right operand is of type ``long``: .. 
code-block:: ql @@ -40,9 +42,9 @@ We start out by writing a query that finds less-than expressions (CodeQL class ` expr.getRightOperand().getType().hasName("long") select expr -➤ `See this in the query console `__. This query usually finds results on most projects. +➤ `See this in the query console on LGTM.com `__. This query usually finds results on most projects. -Notice that we use the predicate ``getType`` (available on all subclasses of ``Expr``) to determine the type of the operands. Types, in turn, define the ``hasName`` predicate, which allows us to identify the primitive types ``int`` and ``long``. As it stands, this query finds *all* less-than expressions comparing ``int`` and ``long``, but in fact we are only interested in comparisons that are part of a loop condition. Also, we want to filter out comparisons where either operand is constant, since these are less likely to be real bugs. The revised query looks as follows: +Notice that we use the predicate ``getType`` (available on all subclasses of ``Expr``) to determine the type of the operands. Types, in turn, define the ``hasName`` predicate, which allows us to identify the primitive types ``int`` and ``long``. As it stands, this query finds *all* less-than expressions comparing ``int`` and ``long``, but in fact we are only interested in comparisons that are part of a loop condition. Also, we want to filter out comparisons where either operand is constant, since these are less likely to be real bugs. The revised query looks like this: .. code-block:: ql @@ -55,7 +57,7 @@ Notice that we use the predicate ``getType`` (available on all subclasses of ``E not expr.getAnOperand().isCompileTimeConstant() select expr -➤ `See this in the query console `__. Notice that fewer results are found. +➤ `See this in the query console on LGTM.com `__. Notice that fewer results are found. The class ``LoopStmt`` is a common superclass of all loops, including, in particular, ``for`` loops as in our example above. 
While different kinds of loops have different syntax, they all have a loop condition, which can be accessed through predicate ``getCondition``. We use the reflexive transitive closure operator ``*`` applied to the ``getAChildExpr`` predicate to express the requirement that ``expr`` should be nested inside the loop condition. In particular, it can be the loop condition itself. @@ -78,7 +80,7 @@ In order to compare the ranges of types, we define a predicate that returns the (pt.hasName("long") and result=64) } -We now want to generalize our query to apply to any comparison where the width of the type on the smaller end of the comparison is less than the width of the type on the greater end. Let us call such a comparison *overflow prone*, and introduce an abstract class to model it: +We now want to generalize our query to apply to any comparison where the width of the type on the smaller end of the comparison is less than the width of the type on the greater end. Let's call such a comparison *overflow prone*, and introduce an abstract class to model it: .. code-block:: ql @@ -118,11 +120,11 @@ Now we rewrite our query to make use of these new classes: not expr.getAnOperand().isCompileTimeConstant() select expr -➤ `See the full query in the query console `__. +➤ `See the full query in the query console on LGTM.com `__. -What next? ----------- +Further reading +--------------- -- Have a look at some of the other tutorials: :doc:`Tutorial: Types and the class hierarchy `, :doc:`Tutorial: Navigating the call graph `, :doc:`Tutorial: Annotations `, :doc:`Tutorial: Javadoc `, and :doc:`Tutorial: Working with source locations `. -- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`AST class reference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. 
+- Have a look at some of the other articles in this section: :doc:`Java types `, :doc:`Navigating the call graph `, :doc:`Annotations in Java `, :doc:`Javadoc `, and :doc:`Working with source locations `. +- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`Classes for working with Java code `. +- Find out more about QL in the `QL language reference `__. diff --git a/docs/language/learn-ql/java/introduce-libraries-java.rst b/docs/language/learn-ql/java/introduce-libraries-java.rst index cf8a2f3c27c..8fd21c2d0c2 100644 --- a/docs/language/learn-ql/java/introduce-libraries-java.rst +++ b/docs/language/learn-ql/java/introduce-libraries-java.rst @@ -1,8 +1,10 @@ -Introducing the CodeQL libraries for Java -========================================= +CodeQL library for Java +======================= -Overview --------- +When you're analyzing a Java program in {{ site.data.variables.product.prodname_dotcom }}, you can make use of the large collection of classes in the CodeQL library for Java. + +About the CodeQL library for Java +--------------------------------- There is an extensive library for analyzing CodeQL databases extracted from Java projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. @@ -12,13 +14,13 @@ The library is implemented as a set of QL modules, that is, files with the exten import java -The rest of this topic briefly summarizes the most important classes and predicates provided by this library. +The rest of this article briefly summarizes the most important classes and predicates provided by this library. .. pull-quote:: Note - The example queries in this topic illustrate the types of results returned by different library classes. The results themselves are not interesting but can be used as the basis for developing a more complex query. 
The tutorial topics show how you can take a simple query and fine-tune it to find precisely the results you're interested in. + The example queries in this article illustrate the types of results returned by different library classes. The results themselves are not interesting but can be used as the basis for developing a more complex query. The other articles in this section of the help show how you can take a simple query and fine-tune it to find precisely the results you're interested in. Summary of the library classes ------------------------------ @@ -40,7 +42,7 @@ These classes represent named program elements: packages (``Package``), compilat Their common superclass is ``Element``, which provides general member predicates for determining the name of a program element and checking whether two elements are nested inside each other. -It is often convenient to refer to an element that might either be a method or a constructor; the class ``Callable``, which is a common superclass of ``Method`` and ``Constructor``, can be used for this purpose. +It's often convenient to refer to an element that might either be a method or a constructor; the class ``Callable``, which is a common superclass of ``Method`` and ``Constructor``, can be used for this purpose. Types ~~~~~ @@ -66,9 +68,9 @@ For example, the following query finds all variables of type ``int`` in the prog pt.hasName("int") select v -➤ `See this in the query console `__. You are likely to get many results when you run this query because most projects contain many variables of type ``int``. +➤ `See this in the query console on LGTM.com `__. You're likely to get many results when you run this query because most projects contain many variables of type ``int``. -Reference types can also be categorized according to their declaration scope: +Reference types are also categorized according to their declaration scope: - ``TopLevelType`` represents a reference type declared at the top-level of a compilation unit. 
- ``NestedType`` is a type declared inside another type. @@ -83,7 +85,7 @@ For instance, this query finds all top-level types whose name is not the same as where tl.getName() != tl.getCompilationUnit().getName() select tl -➤ `See this in the query console `__. This pattern is seen in many projects. When we ran it on the LGTM.com demo projects, most of the projects had at least one instance of this problem in the source code. There were many more instances in the files referenced by the source code. +➤ `See this in the query console on LGTM.com `__. This pattern is seen in many projects. When we ran it on the LGTM.com demo projects, most of the projects had at least one instance of this problem in the source code. There were many more instances in the files referenced by the source code. Several more specialized classes are available as well: @@ -105,7 +107,7 @@ As an example, we can write a query that finds all nested classes that directly where nc.getASupertype() instanceof TypeObject select nc -➤ `See this in the query console `__. You are likely to get many results when you run this query because many projects include nested classes that extend ``Object`` directly. +➤ `See this in the query console on LGTM.com `__. You're likely to get many results when you run this query because many projects include nested classes that extend ``Object`` directly. Generics ~~~~~~~~ @@ -139,7 +141,7 @@ For instance, we could use the following query to find all parameterized instanc pt.getSourceDeclaration() = map select pt -➤ `See this in the query console `__. None of the LGTM.com demo projects contain parameterized instances of ``java.util.Map`` in their source code, but they all have results in reference files. +➤ `See this in the query console on LGTM.com `__. None of the LGTM.com demo projects contain parameterized instances of ``java.util.Map`` in their source code, but they all have results in reference files. 
In general, generic types may restrict which types a type parameter can be bound to. For instance, a type of maps from strings to numbers could be declared as follows: @@ -162,7 +164,7 @@ As an example, the following query finds all type variables with type bound ``Nu tb.getType().hasQualifiedName("java.lang", "Number") select tv -➤ `See this in the query console `__. When we ran it on the LGTM.com demo projects, the *neo4j/neo4j*, *gradle/gradle* and *hibernate/hibernate-orm* projects all contained examples of this pattern. +➤ `See this in the query console on LGTM.com `__. When we ran it on the LGTM.com demo projects, the *neo4j/neo4j*, *gradle/gradle* and *hibernate/hibernate-orm* projects all contained examples of this pattern. For dealing with legacy code that is unaware of generics, every generic type has a "raw" version without any type parameters. In the CodeQL libraries, raw types are represented using class ``RawType``, which has the expected subclasses ``RawClass`` and ``RawInterface``. Again, there is a predicate ``getSourceDeclaration`` for obtaining the corresponding generic type. As an example, we can find variables of (raw) type ``Map``: @@ -175,7 +177,7 @@ For dealing with legacy code that is unaware of generics, every generic type has rt.getSourceDeclaration().hasQualifiedName("java.util", "Map") select v -➤ `See this in the query console `__. Many projects have variables of raw type ``Map``. +➤ `See this in the query console on LGTM.com `__. Many projects have variables of raw type ``Map``. For example, in the following code snippet this query would find ``m1``, but not ``m2``: @@ -194,7 +196,7 @@ The wildcards ``? extends Number`` and ``? super Float`` are represented by clas For dealing with generic methods, there are classes ``GenericMethod``, ``ParameterizedMethod`` and ``RawMethod``, which are entirely analogous to the like-named classes for representing generic types. 
-More information on working with types can be found in the :doc:`tutorial on types and the class hierarchy `. +For more information on working with types, see the :doc:`article on Java types `. Variables ~~~~~~~~~ @@ -208,7 +210,7 @@ Class ``Variable`` represents a variable `in the Java sense ` for an exhaustive list of all expression and statement types available in the standard QL library. +Classes in this category represent abstract syntax tree (AST) nodes, that is, statements (class ``Stmt``) and expressions (class ``Expr``). For a full list of expression and statement types available in the standard QL library, see :doc:`Classes for working with Java code `. Both ``Expr`` and ``Stmt`` provide member predicates for exploring the abstract syntax tree of a program: @@ -226,7 +228,7 @@ For example, the following query finds all expressions whose parents are ``retur where e.getParent() instanceof ReturnStmt select e -➤ `See this in the query console `__. Many projects have examples of ``return`` statements with child statements. +➤ `See this in the query console on LGTM.com `__. Many projects have examples of ``return`` statements with child expressions. Therefore, if the program contains a return statement ``return x + y;``, this query will return ``x + y``. @@ -240,7 +242,7 @@ As another example, the following query finds statements whose parent is an ``if where s.getParent() instanceof IfStmt select s -➤ `See this in the query console `__. Many projects have examples of ``if`` statements with child statements. +➤ `See this in the query console on LGTM.com `__. Many projects have examples of ``if`` statements with child statements. This query will find both ``then`` branches and ``else`` branches of all ``if`` statements in the program. @@ -254,11 +256,11 @@ Finally, here is a query that finds method bodies: where s.getParent() instanceof Method select s -➤ `See this in the query console `__. Most projects have many method bodies.
+➤ `See this in the query console on LGTM.com `__. Most projects have many method bodies. As these examples show, the parent node of an expression is not always an expression: it may also be a statement, for example, an ``IfStmt``. Similarly, the parent node of a statement is not always a statement: it may also be a method or a constructor. To capture this, the QL Java library provides two abstract class ``ExprParent`` and ``StmtParent``, the former representing any node that may be the parent node of an expression, and the latter any node that may be the parent node of a statement. - For more information on working with AST classes, see the :doc:`tutorial on expressions and statements `. +For more information on working with AST classes, see the :doc:`article on overflow-prone comparisons in Java `. Metadata -------- @@ -274,7 +276,7 @@ For annotations, class ``Annotatable`` is a superclass of all program elements t from Constructor c select c.getAnAnnotation() -➤ `See this in the query console `__. The LGTM.com demo projects all use annotations, you can see examples where they are used to suppress warnings and mark code as deprecated. +➤ `See this in the query console on LGTM.com `__. The LGTM.com demo projects all use annotations; you can see examples where they are used to suppress warnings and mark code as deprecated. These annotations are represented by class ``Annotation``. An annotation is simply an expression whose type is an ``AnnotationType``. For example, you can amend this query so that it only reports deprecated constructors: @@ -288,9 +290,9 @@ These annotations are represented by class ``Annotation``. An annotation is simp anntp.hasQualifiedName("java.lang", "Deprecated") select ann -➤ `See this in the query console `__. Only constructors with the ``@deprecated`` annotation are reported this time. +➤ `See this in the query console on LGTM.com `__. Only constructors with the ``@Deprecated`` annotation are reported this time.
-For more information on working with annotations, see the :doc:`tutorial on annotations `. +For more information on working with annotations, see the :doc:`article on annotations `. For Javadoc, class ``Element`` has a member predicate ``getDoc`` that returns a delegate ``Documentable`` object, which can then be queried for its attached Javadoc comments. For example, the following query finds Javadoc comments on private fields: @@ -303,7 +305,7 @@ For Javadoc, class ``Element`` has a member predicate ``getDoc`` that returns a jdoc = f.getDoc().getJavadoc() select jdoc -➤ `See this in the query console `__. You can see this pattern in many projects. +➤ `See this in the query console on LGTM.com `__. You can see this pattern in many projects. Class ``Javadoc`` represents an entire Javadoc comment as a tree of ``JavadocElement`` nodes, which can be traversed using member predicates ``getAChild`` and ``getParent``. For instance, you could edit the query so that it finds all ``@author`` tags in Javadoc comments on private fields: @@ -317,7 +319,7 @@ Class ``Javadoc`` represents an entire Javadoc comment as a tree of ``JavadocEle at.getParent+() = jdoc select at -➤ `See this in the query console `__. None of the LGTM.com demo projects uses the ``@author`` tag on private fields. +➤ `See this in the query console on LGTM.com `__. None of the LGTM.com demo projects uses the ``@author`` tag on private fields. .. pull-quote:: @@ -325,7 +327,7 @@ Class ``Javadoc`` represents an entire Javadoc comment as a tree of ``JavadocEle On line 5 we used ``getParent+`` to capture tags that are nested at any depth within the Javadoc comment. -For more information on working with Javadoc, see the :doc:`tutorial on Javadoc `. +For more information on working with Javadoc, see the :doc:`article on Javadoc `. Metrics ------- @@ -345,7 +347,7 @@ For example, the following query finds methods with a `cyclomatic complexity 40 select m -➤ `See this in the query console `__. 
Most large projects include some methods with a very high cyclomatic complexity. These methods are likely to be difficult to understand and test. +➤ `See this in the query console on LGTM.com `__. Most large projects include some methods with a very high cyclomatic complexity. These methods are likely to be difficult to understand and test. Call graph ---------- @@ -365,7 +367,7 @@ We can use predicate ``Call.getCallee`` to find out which method or constructor m.hasName("println") select c -➤ `See this in the query console `__. The LGTM.com demo projects all include many calls to methods of this name. +➤ `See this in the query console on LGTM.com `__. The LGTM.com demo projects all include many calls to methods of this name. Conversely, ``Callable.getAReference`` returns a ``Call`` that refers to it. So we can find methods and constructors that are never called using this query: @@ -377,13 +379,13 @@ Conversely, ``Callable.getAReference`` returns a ``Call`` that refers to it. So where not exists(c.getAReference()) select c -➤ `See this in the query console `__. The LGTM.com demo projects all appear to have many methods that are not called directly, but this is unlikely to be the whole story. To explore this area further, see :doc:`Navigating the call graph `. +➤ `See this in the query console on LGTM.com `__. The LGTM.com demo projects all appear to have many methods that are not called directly, but this is unlikely to be the whole story. To explore this area further, see :doc:`Navigating the call graph `. -For more information about callables and calls, see the :doc:`call graph tutorial `. +For more information about callables and calls, see the :doc:`article on the call graph `. -What next? 
----------- +Further reading +--------------- -- Experiment with the worked examples in the CodeQL for Java tutorial topics: :doc:`Types and the class hierarchy `, :doc:`Expressions and statements `, :doc:`Navigating the call graph `, :doc:`Annotations `, :doc:`Javadoc ` and :doc:`Working with source locations `. -- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`AST class reference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- Experiment with the worked examples in the CodeQL for Java articles: :doc:`Java types `, :doc:`Overflow-prone comparisons in Java `, :doc:`Navigating the call graph `, :doc:`Annotations in Java `, :doc:`Javadoc ` and :doc:`Working with source locations `. +- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`Classes for working with Java code `. +- Find out more about QL in the `QL language reference `__. diff --git a/docs/language/learn-ql/java/javadoc.rst b/docs/language/learn-ql/java/javadoc.rst index bd0a11b9132..85981b4bd59 100644 --- a/docs/language/learn-ql/java/javadoc.rst +++ b/docs/language/learn-ql/java/javadoc.rst @@ -1,8 +1,10 @@ -Tutorial: Javadoc -================= +Javadoc +======= -Overview --------- +You can use CodeQL to find errors in Javadoc comments in Java code. + +About analyzing Javadoc +----------------------- To access Javadoc associated with a program element, we use member predicate ``getDoc`` of class ``Element``, which returns a ``Documentable``. Class ``Documentable``, in turn, offers a member predicate ``getJavadoc`` to retrieve the Javadoc attached to the element in question, if any. 
@@ -49,9 +51,9 @@ The ``JavadocTag`` has several subclasses representing specific kinds of Javadoc Example: Finding spurious @param tags ------------------------------------- -As an example of using the CodeQL Javadoc API, let us write a query that finds ``@param`` tags that refer to a non-existent parameter. +As an example of using the CodeQL Javadoc API, let's write a query that finds ``@param`` tags that refer to a non-existent parameter. -For example, consider the following program: +For example, consider this program: .. code-block:: java @@ -76,7 +78,7 @@ To begin with, we write a query that finds all callables (that is, methods or co where c.getDoc().getJavadoc() = pt.getParent() select c, pt -It is now easy to add another conjunct to the ``where`` clause, restricting the query to ``@param`` tags that refer to a non-existent parameter: we simply need to require that no parameter of ``c`` has the name ``pt.getParamName()``. +It's now easy to add another conjunct to the ``where`` clause, restricting the query to ``@param`` tags that refer to a non-existent parameter: we simply need to require that no parameter of ``c`` has the name ``pt.getParamName()``. .. code-block:: ql @@ -92,7 +94,7 @@ Example: Finding spurious @throws tags A related, but somewhat more involved, problem is finding ``@throws`` tags that refer to an exception that the method in question cannot actually throw. -For example, consider the following Java program: +For example, consider this Java program: .. code-block:: java @@ -108,9 +110,9 @@ For example, consider the following Java program: } } -Notice that the Javadoc comment of ``A.foo`` documents two thrown exceptions: ``IOException`` and ``RuntimeException``. The former is clearly spurious: ``A.foo`` does not have a ``throws IOException`` clause, and thus cannot throw this kind of exception. On the other hand, ``RuntimeException`` is an unchecked exception, so it can be thrown even if there is no explicit ``throws`` clause listing it. 
Therefore, our query should flag the ``@throws`` tag for ``IOException``, but not the one for ``RuntimeException.`` +Notice that the Javadoc comment of ``A.foo`` documents two thrown exceptions: ``IOException`` and ``RuntimeException``. The former is clearly spurious: ``A.foo`` doesn't have a ``throws IOException`` clause, and therefore can't throw this kind of exception. On the other hand, ``RuntimeException`` is an unchecked exception, so it can be thrown even if there is no explicit ``throws`` clause listing it. So our query should flag the ``@throws`` tag for ``IOException``, but not the one for ``RuntimeException``. -Recall from above that the CodeQL library represents ``@throws`` tags using class ``ThrowsTag``. This class does not provide a member predicate for determining the exception type that is being documented, so we first need to implement our own version. A simple version might look as follows: +Remember that the CodeQL library represents ``@throws`` tags using class ``ThrowsTag``. This class doesn't provide a member predicate for determining the exception type that is being documented, so we first need to implement our own version. A simple version might look like this: .. code-block:: ql @@ -118,7 +120,7 @@ Recall from above that the CodeQL library represents ``@throws`` tags using clas result.hasName(tt.getExceptionName()) } -Similarly, ``Callable`` does not come with a member predicate for querying all exceptions that the method or constructor may possibly throw. We can, however, implement this ourselves by using ``getAnException`` to find all ``throws`` clauses of the callable, and then use ``getType`` to resolve the corresponding exception types: +Similarly, ``Callable`` doesn't come with a member predicate for querying all exceptions that the method or constructor may possibly throw.
We can, however, implement this ourselves by using ``getAnException`` to find all ``throws`` clauses of the callable, and then use ``getType`` to resolve the corresponding exception types: .. code-block:: ql @@ -131,7 +133,7 @@ Note the use of ``getASupertype*`` to find both exceptions declared in a ``throw Now we can write a query for finding all callables ``c`` and ``@throws`` tags ``tt`` such that: - ``tt`` belongs to a Javadoc comment attached to ``c``. -- ``c`` cannot throw the exception documented by ``tt``. +- ``c`` can't throw the exception documented by ``tt``. .. code-block:: ql @@ -145,17 +147,17 @@ Now we can write a query for finding all callables ``c`` and ``@throws`` tags `` not mayThrow(c, exn) select tt, "Spurious @throws tag." -➤ `See this in the query console `__. This finds several results in the LGTM.com demo projects. +➤ `See this in the query console on LGTM.com `__. This finds several results in the LGTM.com demo projects. Improvements ~~~~~~~~~~~~ Currently, there are two problems with this query: -#. ``getDocumentedException`` is too liberal: it will return *any* reference type with the right name, even if it is in a different package and not actually visible in the current compilation unit. -#. ``mayThrow`` is too restrictive: it does not account for unchecked exceptions, which do not need to be declared. +#. ``getDocumentedException`` is too liberal: it will return *any* reference type with the right name, even if it's in a different package and not actually visible in the current compilation unit. +#. ``mayThrow`` is too restrictive: it doesn't account for unchecked exceptions, which do not need to be declared. -To see why the former is a problem, consider the following program: +To see why the former is a problem, consider this program: .. 
code-block:: java @@ -166,9 +168,9 @@ To see why the former is a problem, consider the following program: void bar() throws IOException {} } -This program defines its own class ``IOException``, which is unrelated to the class ``java.io.IOException`` in the standard library: they are in different packages. Our ``getDocumentedException`` predicate does not check packages, however, so it will consider the ``@throws`` clause to refer to both ``IOException`` classes, and thus flag the ``@param`` tag as spurious, since ``B.bar`` cannot actually throw ``java.io.IOException``. +This program defines its own class ``IOException``, which is unrelated to the class ``java.io.IOException`` in the standard library: they are in different packages. Our ``getDocumentedException`` predicate doesn't check packages, however, so it will consider the ``@throws`` clause to refer to both ``IOException`` classes, and thus flag the ``@throws`` tag as spurious, since ``B.bar`` can't actually throw ``java.io.IOException``. -As an example of the second problem, method ``A.foo`` from our previous example was annotated with a ``@throws RuntimeException`` tag. Our current version of ``mayThrow``, however, would think that ``A.foo`` cannot throw a ``RuntimeException``, and thus flag the tag as spurious. +As an example of the second problem, method ``A.foo`` from our previous example was annotated with a ``@throws RuntimeException`` tag. Our current version of ``mayThrow``, however, would think that ``A.foo`` can't throw a ``RuntimeException``, and thus flag the tag as spurious. We can make ``mayThrow`` less restrictive by introducing a new class to represent unchecked exceptions, which are just the subtypes of ``java.lang.RuntimeException`` and ``java.lang.Error``: @@ -196,7 +198,7 @@ Fixing ``getDocumentedException`` is more complicated, but we can easily cover t #. The ``@throws`` tag refers to a type in the same package. #. 
The ``@throws`` tag refers to a type that is imported by the current compilation unit. -The first case can be covered by changing ``getDocumentedException`` to use the qualified name of the ``@throws`` tag. To handle the second and the third case, we can introduce a new predicate ``visibleIn`` that checks whether a reference type is visible in a compilation unit, either by virtue of belonging to the same package or by being explicitly imported. We then rewrite ``getDocumentedException`` as follows: +The first case can be covered by changing ``getDocumentedException`` to use the qualified name of the ``@throws`` tag. To handle the second and the third case, we can introduce a new predicate ``visibleIn`` that checks whether a reference type is visible in a compilation unit, either by virtue of belonging to the same package or by being explicitly imported. We then rewrite ``getDocumentedException`` as: .. code-block:: ql @@ -212,13 +214,13 @@ The first case can be covered by changing ``getDocumentedException`` to use the (result.hasName(tt.getExceptionName()) and visibleIn(tt.getFile(), result)) } -➤ `See this in the query console `__. This finds many fewer, more interesting results in the LGTM.com demo projects. +➤ `See this in the query console on LGTM.com `__. This finds many fewer, more interesting results in the LGTM.com demo projects. -Currently, ``visibleIn`` only considers single-type imports, but it would be possible to extend it with support for other kinds of imports. +Currently, ``visibleIn`` only considers single-type imports, but you could extend it with support for other kinds of imports. -What next? ----------- +Further reading +--------------- -- Find out how you can use the location API to define queries on whitespace: :doc:`Tutorial: Working with source locations `. -- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`AST class reference `. 
-- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- Find out how you can use the location API to define queries on whitespace: :doc:`Working with source locations `. +- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`Classes for working with Java code `. +- Find out more about QL in the `QL language reference `__. diff --git a/docs/language/learn-ql/java/ql-for-java.rst b/docs/language/learn-ql/java/ql-for-java.rst index aa26c5ba6bb..3b5b64dd99b 100644 --- a/docs/language/learn-ql/java/ql-for-java.rst +++ b/docs/language/learn-ql/java/ql-for-java.rst @@ -1,8 +1,9 @@ CodeQL for Java =============== +Experiment and learn how to write effective and efficient queries for CodeQL databases generated from Java codebases. + .. toctree:: - :glob: :hidden: introduce-libraries-java @@ -15,29 +16,28 @@ CodeQL for Java source-locations ast-class-reference -These topics provide an overview of the CodeQL libraries for Java and show examples of how to use them. +- `Basic Java query `__: Learn to write and run a simple CodeQL query using LGTM. -- `Basic Java query `__ describes how to write and run queries using LGTM. +- :doc:`CodeQL library for Java `: When analyzing Java code, you can use the large collection of classes in the CodeQL library for Java. -- :doc:`Introducing the CodeQL libraries for Java ` introduces the standard libraries used to write queries for Java code. +- :doc:`Analyzing data flow in Java `: You can use CodeQL to track the flow of data through a Java program to its use. -- :doc:`Tutorial: Analyzing data flow in Java ` demonstrates how to write queries using the standard data flow and taint tracking libraries for Java. +- :doc:`Java types `: You can use CodeQL to find out information about data types used in Java code. This allows you to write queries to identify specific type-related issues. 
-- :doc:`Tutorial: Types and the class hierarchy ` introduces the classes for representing a program's class hierarchy by means of examples. +- :doc:`Overflow-prone comparisons in Java `: You can use CodeQL to check for comparisons in Java code where one side of the comparison is prone to overflow. -- :doc:`Tutorial: Expressions and statements ` introduces the classes for representing a program's syntactic structure by means of examples. +- :doc:`Navigating the call graph `: CodeQL has classes for identifying code that calls other code, and code that can be called from elsewhere. This allows you to find, for example, methods that are never used. -- :doc:`Tutorial: Navigating the call graph ` is a worked example of how to write a query that navigates a program's call graph to find unused methods. +- :doc:`Annotations in Java `: CodeQL databases of Java projects contain information about all annotations attached to program elements. -- :doc:`Tutorial: Annotations ` introduces the classes for representing annotations by means of examples. +- :doc:`Javadoc `: You can use CodeQL to find errors in Javadoc comments in Java code. -- :doc:`Tutorial: Javadoc ` introduces the classes for representing Javadoc comments by means of examples. +- :doc:`Working with source locations `: You can use the location of entities within Java code to look for potential errors. Locations allow you to deduce the presence, or absence, of white space which, in some cases, may indicate a problem. -- :doc:`Tutorial: Working with source locations ` is a worked example of how to write a query that uses the location information provided in the database for finding likely bugs. +- :doc:`Classes for working with Java code `: CodeQL has a large selection of classes for working with Java statements and expressions. -- :doc:`AST class reference ` gives an overview of all AST classes in the standard CodeQL library for Java. 
-Other resources +Further reading --------------- - For examples of how to query common Java elements, see the `Java cookbook `__. diff --git a/docs/language/learn-ql/java/source-locations.rst b/docs/language/learn-ql/java/source-locations.rst index 6e29777fab9..7d3506b3923 100644 --- a/docs/language/learn-ql/java/source-locations.rst +++ b/docs/language/learn-ql/java/source-locations.rst @@ -1,10 +1,12 @@ -Tutorial: Working with source locations -======================================= +Working with source locations +============================= -Overview --------- +You can use the location of entities within Java code to look for potential errors. Locations allow you to deduce the presence, or absence, of white space which, in some cases, may indicate a problem. -Java offers a rich set of operators with complex precedence rules, which are sometimes confusing to developers. For instance, the class ``ByteBufferCache`` in the OpenJDK Java compiler (which is a member class of ``com.sun.tools.javac.util.BaseFileManager``) contains the following code for allocating a buffer: +About source locations +---------------------- + +Java offers a rich set of operators with complex precedence rules, which are sometimes confusing to developers. For instance, the class ``ByteBufferCache`` in the OpenJDK Java compiler (which is a member class of ``com.sun.tools.javac.util.BaseFileManager``) contains this code for allocating a buffer: .. code-block:: java @@ -14,14 +16,14 @@ Presumably, the author meant to allocate a buffer that is 1.5 times the size ind Note that the source layout gives a fairly clear indication of the intended meaning: there is more white space around ``+`` than around ``>>``, suggesting that the latter is meant to bind more tightly. -We will now develop a query that finds this kind of suspicious nesting, where the operator of the inner expression has more white space around it than the operator of the outer expression. 
This pattern may not necessarily indicate a bug, but at the very least it makes the code hard to read and prone to misinterpretation. +We're going to develop a query that finds this kind of suspicious nesting, where the operator of the inner expression has more white space around it than the operator of the outer expression. This pattern may not necessarily indicate a bug, but at the very least it makes the code hard to read and prone to misinterpretation. -White space is not directly represented in the CodeQL database, but we can deduce its presence from the location information associated with program elements and AST nodes. So we will start by providing an overview of source location management in the standard library for Java. +White space is not directly represented in the CodeQL database, but we can deduce its presence from the location information associated with program elements and AST nodes. So, before we write our query, we need an understanding of source location management in the standard library for Java. Location API ------------ -For every entity that has a representation in Java source code (including, in particular, program elements and AST nodes), the standard CodeQL library provides the following predicates for accessing source location information: +For every entity that has a representation in Java source code (including, in particular, program elements and AST nodes), the standard CodeQL library provides these predicates for accessing source location information: - ``getLocation`` returns a ``Location`` object describing the start and end position of the entity. - ``getFile`` returns a ``File`` object representing the file containing the entity. @@ -29,7 +31,7 @@ For every entity that has a representation in Java source code (including, in pa - ``getNumberOfCommentLines`` returns the number of comment lines. - ``getNumberOfLinesOfCode`` returns the number of non-comment lines. 
-For example, assume the following Java class is defined in compilation unit ``SayHello.java``: +For example, let's assume this Java class is defined in the compilation unit ``SayHello.java``: .. code-block:: java @@ -44,20 +46,20 @@ For example, assume the following Java class is defined in compilation unit ``Sa } } -Invoking ``getFile`` on the expression statement in the body of ``main`` will return a ``File`` object representing the file ``SayHello.java``. The statement spans four lines in total ``(getTotalNumberOfLines``), of which one is a comment line (``getNumberOfCommentLines``), while three lines contain code (``getNumberOfLinesOfCode``). +Invoking ``getFile`` on the expression statement in the body of ``main`` returns a ``File`` object representing the file ``SayHello.java``. The statement spans four lines in total ``(getTotalNumberOfLines``), of which one is a comment line (``getNumberOfCommentLines``), while three lines contain code (``getNumberOfLinesOfCode``). Class ``Location`` defines member predicates ``getStartLine``, ``getEndLine``, ``getStartColumn`` and ``getEndColumn`` to retrieve the line and column number an entity starts and ends at, respectively. Both lines and columns are counted starting from 1 (not 0), and the end position is inclusive, that is, it is the position of the last character belonging to the source code of the entity. In our example, the expression statement starts at line 5, column 3 (the first two characters on the line are tabs, which each count as one character), and it ends at line 8, column 4. -Class ``File`` defines the following member predicates: +Class ``File`` defines these member predicates: - ``getFullName`` returns the fully qualified name of the file. - ``getRelativePath`` returns the path of the file relative to the base directory of the source code. - ``getExtension`` returns the extension of the file. - ``getShortName`` returns the base name of the file, without its extension. 
-In our example, assume file ``A.java`` is located in directory ``/home/testuser/code/pkg``, where ``/home/testuser/code`` is the base directory of the program being analyzed. Then, a ``File`` object for ``A.java`` returns the following: +In our example, assume file ``A.java`` is located in directory ``/home/testuser/code/pkg``, where ``/home/testuser/code`` is the base directory of the program being analyzed. Then, a ``File`` object for ``A.java`` returns: - ``getFullName`` is ``/home/testuser/code/pkg/A.java``. - ``getRelativePath`` is ``pkg/A.java``. @@ -67,7 +69,7 @@ In our example, assume file ``A.java`` is located in directory ``/home/testuser/ Determining white space around an operator ------------------------------------------ -Let us start by considering how to write a predicate that computes the total amount of white space surrounding the operator of a given binary expression. If ``rcol`` is the start column of the expression's right operand and ``lcol`` is the end column of its left operand, then ``rcol - (lcol+1)`` gives us the total number of characters in between the two operands (note that we have to use ``lcol+1`` instead of ``lcol`` because end positions are inclusive). +Let's start by considering how to write a predicate that computes the total amount of white space surrounding the operator of a given binary expression. If ``rcol`` is the start column of the expression's right operand and ``lcol`` is the end column of its left operand, then ``rcol - (lcol+1)`` gives us the total number of characters in between the two operands (note that we have to use ``lcol+1`` instead of ``lcol`` because end positions are inclusive). This number includes the length of the operator itself, which we need to subtract out. For this, we can use predicate ``getOp``, which returns the operator string, surrounded by one white space on either side. 
Overall, the expression for computing the amount of white space around the operator of a binary expression ``expr`` is: @@ -88,12 +90,12 @@ Clearly, however, this only works if the entire expression is on a single line, ) } -Notice that we use an ``exists`` to introduce our temporary variables ``lcol`` and ``rcol``. The predicate could be written without them by just inlining ``lcol`` and ``rcol`` into their use, at some cost in readability. +Notice that we use an ``exists`` to introduce our temporary variables ``lcol`` and ``rcol``. You could write the predicate without them by just inlining ``lcol`` and ``rcol`` into their use, at some cost in readability. Find suspicious nesting ----------------------- -A first version of our query can now be written: +Here's a first version of our query: .. code-block:: ql @@ -108,7 +110,7 @@ A first version of our query can now be written: wsinner > wsouter select outer, "Whitespace around nested operators contradicts precedence." -➤ `See this in the query console `__. This query is likely to find results on most projects. +➤ `See this in the query console on LGTM.com `__. This query is likely to find results on most projects. The first conjunct of the ``where`` clause restricts ``inner`` to be an operand of ``outer``, the second conjunct binds ``wsinner`` and ``wsouter``, while the last conjunct selects the suspicious cases. @@ -123,7 +125,7 @@ If we run this initial query, we might notice some false positives arising from i< start + 100 -Note that our predicate ``operatorWS`` computes the **total** amount of white space around the operator, which, in this case, is one for the ``<`` and two for the ``+``. Ideally, we would like to exclude cases where the amount of white space before and after the operator are not the same. 
Currently, CodeQL databases do not record enough information to figure this out, but as an approximation we could require that the total number of white space characters is even: +Note that our predicate ``operatorWS`` computes the **total** amount of white space around the operator, which, in this case, is one for the ``<`` and two for the ``+``. Ideally, we would like to exclude cases where the amount of white space before and after the operator are not the same. Currently, CodeQL databases don't record enough information to figure this out, but as an approximation we could require that the total number of white space characters is even: .. code-block:: ql @@ -139,7 +141,7 @@ Note that our predicate ``operatorWS`` computes the **total** amount of white sp wsinner > wsouter select outer, "Whitespace around nested operators contradicts precedence." -➤ `See this in the query console `__. Any results will be refined by our changes to the query. +➤ `See this in the query console on LGTM.com `__. Any results will be refined by our changes to the query. Another source of false positives are associative operators: in an expression of the form ``x + y+z``, the first plus is syntactically nested inside the second, since + in Java associates to the left; hence the expression is flagged as suspicious. But since + is associative to begin with, it does not matter which way around the operators are nested, so this is a false positive.To exclude these cases, let us define a new class identifying binary expressions with an associative operator: @@ -171,7 +173,7 @@ Now we can extend our query to discard results where the outer and the inner exp wsinner > wsouter select outer, "Whitespace around nested operators contradicts precedence." -➤ `See this in the query console `__. +➤ `See this in the query console on LGTM.com `__. Notice that we again use ``getOp``, this time to determine whether two binary expressions have the same operator. 
Running our improved query now finds the Java standard library bug described in the Overview. It also flags up the following suspicious code in `Hadoop HBase `__: @@ -181,8 +183,8 @@ Notice that we again use ``getOp``, this time to determine whether two binary ex Whitespace suggests that the programmer meant to toggle ``i`` between zero and one, but in fact the expression is parsed as ``i + (1%2)``, which is the same as ``i + 1``, so ``i`` is simply incremented. -What next? ----------- +Further reading +--------------- -- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`AST class reference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`Classes for working with Java code `. +- Find out more about QL in the `QL language reference `__. diff --git a/docs/language/learn-ql/java/types-class-hierarchy.rst b/docs/language/learn-ql/java/types-class-hierarchy.rst index a438f111cab..3cdafa8389a 100644 --- a/docs/language/learn-ql/java/types-class-hierarchy.rst +++ b/docs/language/learn-ql/java/types-class-hierarchy.rst @@ -1,8 +1,10 @@ -Tutorial: Types and the class hierarchy -======================================= +Java types +========== -Overview --------- +You can use CodeQL to find out information about data types used in Java code. This allows you to write queries to identify specific type-related issues. + +About working with Java types +----------------------------- The standard CodeQL library represents Java types by means of the ``Type`` class and its various subclasses. @@ -30,7 +32,7 @@ To determine ancestor types (including immediate super types, and also *their* s where B.hasName("B") select B.getASupertype+() -➤ `See this in the query console `__. If this query were run on the example snippet above, the query would return ``A``, ``I``, and ``java.lang.Object``. 
+➤ `See this in the query console on LGTM.com `__. If this query were run on the example snippet above, the query would return ``A``, ``I``, and ``java.lang.Object``. .. pull-quote:: @@ -59,7 +61,7 @@ If the expression ``e`` happens to actually evaluate to a ``B[]`` array, on the Object[] o = new String[] { "Hello", "world" }; String[] s = (String[])o; -In this tutorial, we do not try to distinguish these two cases. Our query should simply look for cast expressions ``ce`` that cast from some type ``source`` to another type ``target``, such that: +In this tutorial, we don't try to distinguish these two cases. Our query should simply look for cast expressions ``ce`` that cast from some type ``source`` to another type ``target``, such that: - Both ``source`` and ``target`` are array types. - The element type of ``source`` is a transitive super type of the element type of ``target``. @@ -76,7 +78,7 @@ This recipe is not too difficult to translate into a query: target.getElementType().(RefType).getASupertype+() = source.getElementType() select ce, "Potentially problematic array downcast." -➤ `See this in the query console `__. Many projects return results for this query. +➤ `See this in the query console on LGTM.com `__. Many projects return results for this query. Note that by casting ``target.getElementType()`` to a ``RefType``, we eliminate all cases where the element type is a primitive type, that is, ``target`` is an array of primitive type: the problem we are looking for cannot arise in that case. Unlike in Java, a cast in QL never fails: if an expression cannot be cast to the desired type, it is simply excluded from the query results, which is exactly what we want. @@ -139,12 +141,12 @@ Using these new classes we can extend our query to exclude calls to ``toArray`` not ce.getExpr().(CollectionToArrayCall).getActualReturnType() = target select ce, "Potentially problematic array downcast." -➤ `See this in the query console `__. 
Notice that fewer results are found by this improved query. +➤ `See this in the query console on LGTM.com `__. Notice that fewer results are found by this improved query. Example: Finding mismatched contains checks ------------------------------------------- -As another example, we develop a query that finds uses of ``Collection.contains`` where the type of the queried element is unrelated to the element type of the collection, thus guaranteeing that the test will always return ``false``. +We'll now develop a query that finds uses of ``Collection.contains`` where the type of the queried element is unrelated to the element type of the collection, which guarantees that the test will always return ``false``. For example, `Apache Zookeeper `__ used to have a snippet of code similar to the following in class ``QuorumPeerConfig``: @@ -265,14 +267,14 @@ Now we are ready to write a first version of our query: not haveCommonDescendant(collEltType, argType) select juccc, "Element type " + collEltType + " is incompatible with argument type " + argType -➤ `See this in the query console `__. +➤ `See this in the query console on LGTM.com `__. Improvements ~~~~~~~~~~~~ For many programs, this query yields a large number of false positive results due to type variables and wild cards: if the collection element type is some type variable ``E`` and the argument type is ``String``, for example, CodeQL will consider that the two have no common subtype, and our query will flag the call. An easy way to exclude such false positive results is to simply require that neither ``collEltType`` nor ``argType`` are instances of ``TypeVariable``. -Another source of false positives is autoboxing of primitive types: if, for example, the collection's element type is ``Integer`` and the argument is of type ``int``, predicate ``haveCommonDescendant`` will fail, since ``int`` is not a ``RefType``. Thus, our query should check that ``collEltType`` is not the boxed type of ``argType``. 
+Another source of false positives is autoboxing of primitive types: if, for example, the collection's element type is ``Integer`` and the argument is of type ``int``, predicate ``haveCommonDescendant`` will fail, since ``int`` is not a ``RefType``. To account for this, our query should check that ``collEltType`` is not the boxed type of ``argType``. Finally, ``null`` is special because its type (known as ```` in the CodeQL library) is compatible with every reference type, so we should exclude it from consideration. @@ -292,11 +294,11 @@ Adding these three improvements, our final query becomes: not argType.hasName("") select juccc, "Element type " + collEltType + " is incompatible with argument type " + argType -➤ `See the full query in the query console `__. +➤ `See the full query in the query console on LGTM.com `__. -What next? ----------- +Further reading +--------------- -- Take a look at some of the other tutorials: :doc:`Tutorial: Expressions and statements `, :doc:`Tutorial: Navigating the call graph `, :doc:`Tutorial: Annotations `, :doc:`Tutorial: Javadoc `, and :doc:`Tutorial: Working with source locations `. -- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`AST class reference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- Take a look at some of the other articles in this section: :doc:`Overflow-prone comparisons in Java `, :doc:`Navigating the call graph `, :doc:`Annotations in Java `, :doc:`Javadoc `, and :doc:`Working with source locations `. +- Find out how specific classes in the AST are represented in the standard library for Java: :doc:`Classes for working with Java code `. +- Find out more about QL in the `QL language reference `__. 
diff --git a/docs/language/learn-ql/javascript/ast-class-reference.rst b/docs/language/learn-ql/javascript/ast-class-reference.rst index 1b513b2d4ec..f0a7f88fc17 100644 --- a/docs/language/learn-ql/javascript/ast-class-reference.rst +++ b/docs/language/learn-ql/javascript/ast-class-reference.rst @@ -1,5 +1,7 @@ -AST class reference -=================== +Abstract syntax tree classes for JavaScript and TypeScript +========================================================== + +CodeQL has a large selection of classes for working with JavaScript and TypeScript statements and expressions. Statement classes ----------------- diff --git a/docs/language/learn-ql/javascript/dataflow-cheat-sheet.rst b/docs/language/learn-ql/javascript/dataflow-cheat-sheet.rst index a2ff4309ee0..a78dc116e37 100644 --- a/docs/language/learn-ql/javascript/dataflow-cheat-sheet.rst +++ b/docs/language/learn-ql/javascript/dataflow-cheat-sheet.rst @@ -1,7 +1,7 @@ -Data flow cheat sheet -===================== +Data flow cheat sheet for JavaScript +==================================== -This page describes parts of the JavaScript libraries commonly used for variant analysis and in data flow queries. +This article describes parts of the JavaScript libraries commonly used for variant analysis and in data flow queries. Taint tracking path queries --------------------------- @@ -34,12 +34,12 @@ This query reports flow paths which: - Step through variables, function calls, properties, strings, arrays, promises, exceptions, and steps added by `isAdditionalTaintStep `__. - End at a node matched by `isSink `__. -See also: `Global data flow `__ and :doc:`Constructing path queries <../writing-queries/path-queries>`. +See also: `Global data flow `__ and :doc:`Creating path queries <../writing-queries/path-queries>`. DataFlow module --------------- -Use data flow nodes to match program elements independently of syntax. See also: :doc:`Analyzing data flow in JavaScript/TypeScript `. 
+Use data flow nodes to match program elements independently of syntax. See also: :doc:`Analyzing data flow in JavaScript and TypeScript `. Predicates in the ``DataFlow::`` module: @@ -142,7 +142,7 @@ Files AST nodes --------- -See also: :doc:`AST class reference `. +See also: :doc:`Abstract syntax tree classes for JavaScript and TypeScript `. Conversion between DataFlow and AST nodes: @@ -163,7 +163,7 @@ String matching Type tracking ------------- -See also: :doc:`Type tracking tutorial `. +See also: :doc:`Using type tracking for API modeling `. Use the following template to define forward type tracking predicates: diff --git a/docs/language/learn-ql/javascript/dataflow.rst b/docs/language/learn-ql/javascript/dataflow.rst index 15cd1ad7e46..5009db4756a 100644 --- a/docs/language/learn-ql/javascript/dataflow.rst +++ b/docs/language/learn-ql/javascript/dataflow.rst @@ -1,16 +1,15 @@ Analyzing data flow in JavaScript and TypeScript ================================================ +This topic describes how data flow analysis is implemented in the CodeQL libraries for JavaScript/TypeScript and includes examples to help you write your own data flow queries. + Overview -------- - -This topic describes how data flow analysis is implemented in the CodeQL libraries for JavaScript/TypeScript and includes examples to help you write your own data flow queries. -The following sections describe how to utilize the libraries for local data flow, global data flow, and taint tracking. - +The various sections in this article describe how to utilize the libraries for local data flow, global data flow, and taint tracking. As our running example, we will develop a query that identifies command-line arguments that are passed as a file path to the standard Node.js ``readFile`` function. While this is not a problematic pattern as such, it is typical of the kind of reasoning that is frequently used in security queries. 
-For a more general introduction to modeling data flow, see :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`. +For a more general introduction to modeling data flow, see :doc:`About data flow analysis <../intro-to-data-flow>`. Data flow nodes --------------- @@ -465,12 +464,12 @@ Hint: array indices are properties with numeric names; you can use regular expre Exercise 4: Using the answers from 2 and 3, write a query which finds all global data flows from array elements of the result of a call to the ``tagName`` argument to the ``createElement`` function. (`Answer <#exercise-4>`__) -What next? ----------- +Further reading +--------------- -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. -- Learn about writing more precise data-flow analyses in :doc:`Advanced data-flow analysis using flow labels ` +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. +- Learn about writing more precise data-flow analyses in :doc:`Using flow labels for precise data flow analysis ` Answers ------- diff --git a/docs/language/learn-ql/javascript/flow-labels.rst b/docs/language/learn-ql/javascript/flow-labels.rst index c693319576d..ecd8dec6b29 100644 --- a/docs/language/learn-ql/javascript/flow-labels.rst +++ b/docs/language/learn-ql/javascript/flow-labels.rst @@ -1,8 +1,13 @@ -Tutorial: Precise data-flow analysis using flow labels -====================================================== +Using flow labels for precise data flow analysis +================================================ + +You can associate flow labels with each value tracked by the flow analysis to determine whether the flow contains potential vulnerabilities. 
+ +Overview +-------- You can use basic inter-procedural data-flow analysis and taint tracking as described in -:doc:`Analyzing data flow in JavaScript/TypeScript ` to check whether there is a path in +:doc:`Analyzing data flow in JavaScript and TypeScript ` to check whether there is a path in the data-flow graph from some source node to a sink node that does not pass through any sanitizer nodes. Another way of thinking about this is that it statically models the flow of data through the program, and associates a flag with every data value telling us whether it might have come from a @@ -390,9 +395,9 @@ tainted objects from partially tainted objects. The `Uncontrolled data used in p `_ query uses four flow labels to track whether a user-controlled string may be an absolute path and whether it may contain ``..`` components. -What next? ----------- +Further reading +--------------- -- Learn about the standard CodeQL libraries used to write queries for JavaScript in :doc:`Introducing the JavaScript libraries `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Learn about the standard CodeQL libraries used to write queries for JavaScript in :doc:`CodeQL libraries for JavaScript `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. 
diff --git a/docs/language/learn-ql/javascript/introduce-libraries-js.rst b/docs/language/learn-ql/javascript/introduce-libraries-js.rst index f01a8387a18..65b19f2687d 100644 --- a/docs/language/learn-ql/javascript/introduce-libraries-js.rst +++ b/docs/language/learn-ql/javascript/introduce-libraries-js.rst @@ -1,5 +1,7 @@ -Introducing the CodeQL libraries for JavaScript -=============================================== +CodeQL library for JavaScript +============================= + +When you're analyzing a JavaScript program, you can make use of the large collection of classes in the CodeQL library for JavaScript. Overview -------- @@ -73,7 +75,7 @@ For example, the following query computes, for each folder, the number of JavaSc from Folder d select d.getRelativePath(), count(File f | f = d.getAFile() and f.getExtension() = "js") -➤ `See this in the query console `__. When you run the query on most projects, the results include folders that contain files with a ``js`` extension and folders that don't. +➤ `See this in the query console on LGTM.com `__. When you run the query on most projects, the results include folders that contain files with a ``js`` extension and folders that don't. Locations ^^^^^^^^^ @@ -134,7 +136,7 @@ As an example of a query operating entirely on the lexical level, consider the f where comma.getNextToken() instanceof CommaToken select comma, "Omitted array elements are bad style." -➤ `See this in the query console `__. If the query returns no results, this pattern isn't used in the projects that you analyzed. +➤ `See this in the query console on LGTM.com `__. If the query returns no results, this pattern isn't used in the projects that you analyzed. You can use predicate ``Locatable.getFirstToken()`` and ``Locatable.getLastToken()`` to access the first and last token (if any) belonging to an element with a source location. 
@@ -175,7 +177,7 @@ As an example of a query using only lexical information, consider the following from HtmlLineComment c select c, "Do not use HTML comments." -➤ `See this in the query console `__. When we ran this query on the *mozilla/pdf.js* project in LGTM.com, we found three HTML comments. +➤ `See this in the query console on LGTM.com `__. When we ran this query on the *mozilla/pdf.js* project in LGTM.com, we found three HTML comments. Syntactic level ~~~~~~~~~~~~~~~ @@ -347,7 +349,7 @@ As an example of how to use expression AST nodes, here is a query that finds exp where add = shift.getAnOperand() select add, "This expression should be bracketed to clarify precedence rules." -➤ `See this in the query console `__. When we ran this query on the *meteor/meteor* project in LGTM.com, we found many results where precedence could be clarified using brackets. +➤ `See this in the query console on LGTM.com `__. When we ran this query on the *meteor/meteor* project in LGTM.com, we found many results where precedence could be clarified using brackets. Functions ^^^^^^^^^ @@ -369,7 +371,7 @@ As an example, here is a query that finds all expression closures: where fe.getBody() instanceof Expr select fe, "Use arrow expressions instead of expression closures." -➤ `See this in the query console `__. None of the LGTM.com demo projects uses expression closures, but you may find this query gets results on other projects. +➤ `See this in the query console on LGTM.com `__. None of the LGTM.com demo projects uses expression closures, but you may find this query gets results on other projects. As another example, this query finds functions that have two parameters that bind the same variable: @@ -384,7 +386,7 @@ As another example, this query finds functions that have two parameters that bin p.getAVariable() = q.getAVariable() select fun, "This function has two parameters that bind the same variable." -➤ `See this in the query console `__. 
None of the LGTM.com demo projects has functions where two parameters bind the same variable. +➤ `See this in the query console on LGTM.com `__. None of the LGTM.com demo projects has functions where two parameters bind the same variable. Classes ^^^^^^^ @@ -440,7 +442,7 @@ Here is an example of a query to find declaration statements that declare the sa not ds.getTopLevel().isMinified() select ds, "Variable " + v.getName() + " is declared both $@ and $@.", d1, "here", d2, "here" -➤ `See this in the query console `__. This is not a common problem, so you may not find any results in your own projects. The *angular/angular.js* project on LGTM.com has one instance of this problem at the time of writing. +➤ `See this in the query console on LGTM.com `__. This is not a common problem, so you may not find any results in your own projects. The *angular/angular.js* project on LGTM.com has one instance of this problem at the time of writing. Notice the use of ``not ... isMinified()`` here and in the next few queries. This excludes any results found in minified code. If you delete ``and not ds.getTopLevel().isMinified()`` and re-run the query, two results in minified code in the *meteor/meteor* project are reported. @@ -467,7 +469,7 @@ As an example of a query involving properties, consider the following query that not oe.getTopLevel().isMinified() select oe, "Property " + p1.getName() + " is defined both $@ and $@.", p1, "here", p2, "here" -➤ `See this in the query console `__. Many projects have a few instances of object expressions with two identically named properties. +➤ `See this in the query console on LGTM.com `__. Many projects have a few instances of object expressions with two identically named properties. Modules ^^^^^^^ @@ -533,7 +535,7 @@ As an example, consider the following query which finds distinct function declar not g.getTopLevel().isMinified() select f, g -➤ `See this in the query console `__. 
Some projects declare conflicting functions of the same name and rely on platform-specific behavior to disambiguate the two declarations. +➤ `See this in the query console on LGTM.com `__. Some projects declare conflicting functions of the same name and rely on platform-specific behavior to disambiguate the two declarations. Control flow ~~~~~~~~~~~~ @@ -570,7 +572,7 @@ As an example of an analysis using basic blocks, ``BasicBlock.isLiveAtEntry(v, u not f.getStartBB().isLiveAtEntry(gv, _) select f, "This function uses " + gv + " like a local variable." -➤ `See this in the query console `__. Many projects have some variables which look as if they were intended to be local. +➤ `See this in the query console on LGTM.com `__. Many projects have some variables which look as if they were intended to be local. Data flow ~~~~~~~~~ @@ -595,7 +597,7 @@ As an example, the following query finds definitions of local variables that are not exists (VarUse use | def = use.getADef()) select def, "Dead store of local variable." -➤ `See this in the query console `__. Many projects have some examples of useless assignments to local variables. +➤ `See this in the query console on LGTM.com `__. Many projects have some examples of useless assignments to local variables. SSA ^^^ @@ -638,7 +640,7 @@ For example, here is a query that finds all invocations of a method called ``sen send.getMethodName() = "send" select send -➤ `See this in the query console `__. The query finds HTTP response sends in the `AMP HTML `__ project. +➤ `See this in the query console on LGTM.com `__. The query finds HTTP response sends in the `AMP HTML `__ project. Note that the data flow modeling in this library is intraprocedural, that is, flow across function calls and returns is *not* modeled. Likewise, flow through object properties and global variables is not modeled. 
@@ -703,7 +705,7 @@ As an example of a call-graph-based query, here is a query to find invocations f not exists(invk.getACallee()) select invk, "Unable to find a callee for this invocation." -➤ `See this in the query console `__ +➤ `See this in the query console on LGTM.com `__ Inter-procedural data flow ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -839,7 +841,7 @@ As an example of the use of these classes, here is a query that counts for every from NodeModule m select m, count(m.getAnImportedModule()) -➤ `See this in the query console `__. When you analyze a project, for each module you can see how many other modules it imports. +➤ `See this in the query console on LGTM.com `__. When you analyze a project, for each module you can see how many other modules it imports. NPM ^^^ @@ -868,7 +870,7 @@ As an example of the use of these classes, here is a query that identifies unuse not exists (Require req | req.getTopLevel() = pkg.getAModule() | name = req.getImportedPath().getValue()) select deps, "Unused dependency '" + name + "'." -➤ `See this in the query console `__. It is not uncommon for projects to have some unused dependencies. +➤ `See this in the query console on LGTM.com `__. It is not uncommon for projects to have some unused dependencies. React ^^^^^ @@ -895,7 +897,7 @@ For example, here is a query to find SQL queries that use string concatenation ( where ss instanceof AddExpr select ss, "Use templating instead of string concatenation." -➤ `See this in the query console `__, showing two (benign) results on `strong-arc `__. +➤ `See this in the query console on LGTM.com `__, showing two (benign) results on `strong-arc `__. Miscellaneous ~~~~~~~~~~~~~ @@ -961,7 +963,7 @@ As an example, here is a query that finds ``@param`` tags that do not specify th not exists(t.getName()) select t, "@param tag is missing name." -➤ `See this in the query console `__. 
Of the LGTM.com demo projects analyzed, only *Semantic-Org/Semantic-UI* has an example where the ``@param`` tag omits the name. +➤ `See this in the query console on LGTM.com `__. Of the LGTM.com demo projects analyzed, only *Semantic-Org/Semantic-UI* has an example where the ``@param`` tag omits the name. For full details on these and other classes representing JSDoc comments and type expressions, see `the API documentation `__. @@ -1026,9 +1028,9 @@ Alias nodes are represented by class `YAMLAliasNode `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. +- Learn about the standard CodeQL libraries used to write queries for TypeScript in :doc:`CodeQL libraries for TypeScript `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. diff --git a/docs/language/learn-ql/javascript/introduce-libraries-ts.rst b/docs/language/learn-ql/javascript/introduce-libraries-ts.rst index e0c1ec748ff..672ea369479 100644 --- a/docs/language/learn-ql/javascript/introduce-libraries-ts.rst +++ b/docs/language/learn-ql/javascript/introduce-libraries-ts.rst @@ -1,5 +1,7 @@ -Introducing the CodeQL libraries for TypeScript -=============================================== +CodeQL library for TypeScript +============================= + +When you're analyzing a TypeScript program, you can make use of the large collection of classes in the CodeQL library for TypeScript. Overview -------- @@ -10,7 +12,7 @@ Support for analyzing TypeScript code is bundled with the CodeQL libraries for J import javascript -The :doc:`CodeQL library introduction for JavaScript ` covers most of this library, and is also relevant for TypeScript analysis. This document supplements the JavaScript documentation with the TypeScript-specific classes and predicates. 
+:doc:`CodeQL libraries for JavaScript ` covers most of this library, and is also relevant for TypeScript analysis. This document supplements the JavaScript documentation with the TypeScript-specific classes and predicates. Syntax ------ @@ -119,7 +121,7 @@ Select expressions that cast a value to a type parameter: where assertion.getTypeAnnotation() = param.getLocalTypeName().getAnAccess() select assertion, "Cast to type parameter." -➤ `See this in the query console `__. +➤ `See this in the query console on LGTM.com `__. Classes and interfaces ~~~~~~~~~~~~~~~~~~~~~~ @@ -134,7 +136,7 @@ The CodeQL class `ClassOrInterface `__. -Also see the documentation for classes in the `Introduction to the CodeQL libraries for JavaScript `__. +Also see the documentation for classes in the `CodeQL libraries for JavaScript `__. To select the type references to a class or an interface, use ``getTypeName()``. @@ -405,7 +407,7 @@ It is best to use `TypeName `__. +➤ `See this in the query console on LGTM.com `__. Find imported names that are used as both a type and a value: @@ -418,7 +420,7 @@ Find imported names that are used as both a type and a value: and exists (VarAccess access | access.getVariable().getADeclaration() = spec.getLocal()) select spec, "Used as both variable and type" -➤ `See this in the query console `__. +➤ `See this in the query console on LGTM.com `__. Namespace names ~~~~~~~~~~~~~~~ @@ -444,9 +446,9 @@ A `LocalNamespaceName `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. \ No newline at end of file +- Learn about the standard CodeQL libraries used to write queries for JavaScript in :doc:`CodeQL libraries for JavaScript `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. 
diff --git a/docs/language/learn-ql/javascript/ql-for-javascript.rst b/docs/language/learn-ql/javascript/ql-for-javascript.rst index d2d864cc072..5e10e9f979d 100644 --- a/docs/language/learn-ql/javascript/ql-for-javascript.rst +++ b/docs/language/learn-ql/javascript/ql-for-javascript.rst @@ -1,8 +1,9 @@ CodeQL for JavaScript ===================== +Experiment and learn how to write effective and efficient queries for CodeQL databases generated from JavaScript codebases. + .. toctree:: - :glob: :hidden: introduce-libraries-js @@ -13,23 +14,23 @@ CodeQL for JavaScript ast-class-reference dataflow-cheat-sheet -These documents provide an overview of the CodeQL libraries for JavaScript and TypeScript and show examples of how to use them. +- `Basic JavaScript query `__: Learn to write and run a simple CodeQL query using LGTM. -- `Basic JavaScript query `__ describes how to write and run queries using LGTM. +- :doc:`CodeQL library for JavaScript `: When you're analyzing a JavaScript program, you can make use of the large collection of classes in the CodeQL library for JavaScript. -- :doc:`Introducing the CodeQL libraries for JavaScript ` introduces the standard libraries used to write queries for JavaScript code. There is an extensive CodeQL library for analyzing JavaScript code. This tutorial briefly summarizes the most important classes and predicates provided by this library. +- :doc:`CodeQL library for TypeScript `: When you're analyzing a TypeScript program, you can make use of the large collection of classes in the CodeQL library for TypeScript. -- :doc:`Introducing the CodeQL libraries for TypeScript ` introduces the standard libraries used to write queries for TypeScript code. +- :doc:`Analyzing data flow in JavaScript and TypeScript `: This topic describes how data flow analysis is implemented in the CodeQL libraries for JavaScript/TypeScript and includes examples to help you write your own data flow queries. 
-- :doc:`Analyzing data flow in JavaScript/TypeScript ` demonstrates how to write queries using the standard data flow and taint tracking libraries for JavaScript/TypeScript. +- :doc:`Using flow labels for precise data flow analysis `: You can associate flow labels with each value tracked by the flow analysis to determine whether the flow contains potential vulnerabilities. -- :doc:`Advanced data-flow analysis using flow labels ` shows a more advanced example of data flow analysis using flow labels. +- :doc:`Using type tracking for API modeling `: You can track data through an API by creating a model using the CodeQL type-tracking library for JavaScript. -- :doc:`AST class reference ` gives an overview of all AST classes in the standard CodeQL library for JavaScript. +- :doc:`Abstract syntax tree classes for JavaScript and TypeScript `: CodeQL has a large selection of classes for working with JavaScript and TypeScript statements and expressions. -- :doc:`Data flow cheat sheet ` lists parts of the CodeQL libraries that are commonly used for variant analysis and in data flow queries. +- :doc:`Data flow cheat sheet for JavaScript `: This article describes parts of the JavaScript libraries commonly used for variant analysis and in data flow queries. -Other resources +Further reading --------------- - For examples of how to query common JavaScript elements, see the `JavaScript cookbook `__. diff --git a/docs/language/learn-ql/javascript/type-tracking.rst b/docs/language/learn-ql/javascript/type-tracking.rst index f0dadd9ef70..d192d98472e 100644 --- a/docs/language/learn-ql/javascript/type-tracking.rst +++ b/docs/language/learn-ql/javascript/type-tracking.rst @@ -1,9 +1,10 @@ -Tutorial: API modelling using type tracking -=========================================== +Using type tracking for API modeling +==================================== -This tutorial demonstrates how to build a simple model of the Firebase API -using the CodeQL type-tracking library for JavaScript. 
+You can track data through an API by creating a model using the CodeQL type-tracking library for JavaScript. +Overview +-------- The type-tracking library makes it possible to track values through properties and function calls, usually to recognize method calls and properties accessed on a specific type of object. @@ -489,7 +490,7 @@ Prefer type tracking when: Prefer data-flow configurations when: - Tracking user-controlled data -- use `taint tracking `__. -- Differentiating between different kinds of user-controlled data -- use :doc:`flow labels `. +- Differentiating between different kinds of user-controlled data -- see :doc:`Using flow labels for precise data flow analysis `. - Tracking transformations of a value through generic utility functions. - Tracking values through string manipulation. - Generating a path from source to sink -- see :doc:`constructing path queries <../writing-queries/path-queries>`. @@ -517,9 +518,9 @@ Type tracking is used in a few places in the standard libraries: - The `Firebase `__ and `Socket.io `__ models use type tracking to track objects coming from their respective APIs. -What next? ----------- +Further reading +--------------- -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Learn more about the query console in `Using the query console `__. -- Learn about writing precise data-flow analyses in :doc:`Advanced data-flow analysis using flow labels `. +- Find out more about QL in the `QL language reference `__. +- Learn more about the query console in `Using the query console `__ on LGTM.com. +- Learn about writing precise data-flow analyses in :doc:`Using flow labels for precise data flow analysis `. 
diff --git a/docs/language/learn-ql/locations.rst b/docs/language/learn-ql/locations.rst index df210819ef2..a06f428bef8 100644 --- a/docs/language/learn-ql/locations.rst +++ b/docs/language/learn-ql/locations.rst @@ -1,10 +1,13 @@ -Locations and strings for QL entities +Providing locations in CodeQL queries ===================================== .. Not sure how much of this topic needs to change, and what the title should be -Providing locations -------------------- +CodeQL includes mechanisms for extracting the location of elements in a codebase. Use these mechanisms when writing custom CodeQL queries and libraries to help display information to users. + + +About locations +--------------- When displaying information to the user, LGTM needs to be able to extract location information from the results of a query. In order to do this, all QL classes which can provide location information should do this by using one of the following mechanisms: diff --git a/docs/language/learn-ql/python/control-flow-graph.rst b/docs/language/learn-ql/python/control-flow-graph.rst deleted file mode 100644 index 099c252784b..00000000000 --- a/docs/language/learn-ql/python/control-flow-graph.rst +++ /dev/null @@ -1,9 +0,0 @@ -Python control flow graph -========================= - -:doc:`Back to tutorial: control flow analysis ` - -|Python control flow graph| - -.. |Python control flow graph| image:: ../../images/python-flow-graph.png - diff --git a/docs/language/learn-ql/python/control-flow.rst b/docs/language/learn-ql/python/control-flow.rst index fc41f59c933..9291f4dc907 100644 --- a/docs/language/learn-ql/python/control-flow.rst +++ b/docs/language/learn-ql/python/control-flow.rst @@ -1,7 +1,12 @@ -Tutorial: Control flow analysis -=============================== +Analyzing control flow in Python +================================ -To analyze the `Control-flow graph `__ of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. 
These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. +You can write CodeQL queries to explore the control-flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. + +About analyzing control flow +-------------------------------------- + +To analyze the control-flow graph of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. For more information, see `Control-flow graph `__ on Wikipedia. The ``ControlFlowNode`` class ----------------------------- @@ -19,11 +24,18 @@ To show why this complex relation is required consider the following Python code finally: close_resource() -There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``. (An annotated flow graph can be seen :doc:`here `.) +There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``. + +An annotated flow graph: + +|Python control flow graph| + +.. 
|Python control flow graph| image:: ../../images/python-flow-graph.png The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find unreachable code. There is one ``ControlFlowNode`` per path through any ``AstNode`` and any ``AstNode`` that is unreachable has no paths flowing through it. Therefore, any ``AstNode`` without a corresponding ``ControlFlowNode`` is unreachable. -**Unreachable AST nodes** +Example finding unreachable AST nodes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: ql @@ -33,9 +45,10 @@ The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find u where not exists(node.getAFlowNode()) select node -➤ `See this in the query console `__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements: +➤ `See this in the query console on LGTM.com `__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements. -**Unreachable statements** +Example finding unreachable statements +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: ql @@ -45,15 +58,15 @@ The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find u where not exists(s.getAFlowNode()) select s -➤ `See this in the query console `__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: `Unreachable code `__. +➤ `See this in the query console on LGTM.com `__. 
This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard "Unreachable code" query. For more information, see `Unreachable code `__ on LGTM.com. The ``BasicBlock`` class ------------------------ -The ``BasicBlock`` class represents a `basic block `__ of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as what can reach what and what `dominates `__ what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. +The ``BasicBlock`` class represents a basic block of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as, what can reach what, and what dominates what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. For more information, see `Basic block `__ and `Dominator `__ on Wikipedia. -Example: Finding mutually exclusive basic blocks -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding mutually exclusive basic blocks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Suppose we have the following Python code: @@ -84,7 +97,8 @@ However, by that definition, two basic blocks are mutually exclusive if they are Combining these conditions we get: -**Mutually exclusive blocks within the same function** +Example finding mutually exclusive blocks within the same function +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
code-block:: ql @@ -98,10 +112,11 @@ Combining these conditions we get: ) select b1, b2 -➤ `See this in the query console `__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis which is covered in the next tutorial. +➤ `See this in the query console on LGTM.com `__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis. For more information, see :doc:`Analyzing data flow and tracking tainted data in Python `. -What next? ----------- +Further reading +--------------- -- Experiment with the worked examples in the tutorial topic :doc:`Taint tracking and data flow analysis in Python `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- ":doc:`Analyzing data flow and tracking tainted data in Python `" + +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/functions.rst b/docs/language/learn-ql/python/functions.rst index c3c8a5e6eac..20e47267825 100644 --- a/docs/language/learn-ql/python/functions.rst +++ b/docs/language/learn-ql/python/functions.rst @@ -1,7 +1,9 @@ -Tutorial: Functions +Functions in Python =================== -This example uses the standard CodeQL class ``Function`` (see :doc:`Introducing the Python libraries `). +You can use syntactic classes from the standard CodeQL library to find Python functions and identify calls to them. + +These examples use the standard CodeQL class `Function `__. For more information, see ":doc:`Introducing the Python libraries `." Finding all functions called "get..." 
------------------------------------- @@ -24,7 +26,7 @@ Using the member predicate ``Function.getName()``, we can list all of the getter where f.getName().matches("get%") select f, "This is a function called get..." -➤ `See this in the query console `__. This query typically finds a large number of results. Usually, many of these results are for functions (rather than methods) which we are not interested in. +➤ `See this in the query console on LGTM.com `__. This query typically finds a large number of results. Usually, many of these results are for functions (rather than methods) which we are not interested in. Finding all methods called "get..." ----------------------------------- @@ -39,7 +41,7 @@ You can modify the query above to return more interesting results. As we are onl where f.getName().matches("get%") and f.isMethod() select f, "This is a method called get..." -➤ `See this in the query console `__. This finds methods whose name starts with ``"get"``, but many of those are not the sort of simple getters we are interested in. +➤ `See this in the query console on LGTM.com `__. This finds methods whose name starts with ``"get"``, but many of those are not the sort of simple getters we are interested in. Finding one line methods called "get..." ---------------------------------------- @@ -55,7 +57,7 @@ We can modify the query further to include only methods whose body consists of a and count(f.getAStmt()) = 1 select f, "This function is (probably) a getter." -➤ `See this in the query console `__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in :doc:`Tutorial: Statements and expressions `. +➤ `See this in the query console on LGTM.com `__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Expressions and statements in Python `." 
Finding a call to a specific function ------------------------------------- @@ -70,14 +72,18 @@ This query uses ``Call`` and ``Name`` to find calls to the function ``eval`` - w where call.getFunc() = name and name.getId() = "eval" select call, "call to 'eval'." -➤ `See this in the query console `__. Some of the demo projects on LGTM.com use this function. +➤ `See this in the query console on LGTM.com `__. Some of the demo projects on LGTM.com use this function. The ``Call`` class represents calls in Python. The ``Call.getFunc()`` predicate gets the expression being called. ``Name.getId()`` gets the identifier (as a string) of the ``Name`` expression. Due to the dynamic nature of Python, this query will select any call of the form ``eval(...)`` regardless of whether it is a call to the built-in function ``eval`` or not. In a later tutorial we will see how to use the type-inference library to find calls to the built-in function ``eval`` regardless of the name of the variable called. -What next? ---------- +Further reading +--------------- -- Experiment with the worked examples in the following tutorial topics: :doc:`Statements and expressions `, :doc:`Control flow `, and :doc:`Points-to analysis and type inference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- ":doc:`Expressions and statements in Python `" +- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" + +.. 
include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/introduce-libraries-python.rst b/docs/language/learn-ql/python/introduce-libraries-python.rst index 54276aedd8e..d124277d0b5 100644 --- a/docs/language/learn-ql/python/introduce-libraries-python.rst +++ b/docs/language/learn-ql/python/introduce-libraries-python.rst @@ -1,17 +1,19 @@ -Introducing the CodeQL libraries for Python -=========================================== +CodeQL library for Python +========================= -There is an extensive library for analyzing CodeQL databases extracted from Python projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: +When you need to analyze a Python program, you can make use of the large collection of classes in the CodeQL library for Python. + +About the CodeQL library for Python +----------------------------------- + +The CodeQL library for each programming language uses classes with abstractions and predicates to present data in an object-oriented form. + +Each CodeQL library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``python.qll`` imports all the core Python library modules, so you can include the complete library by beginning your query with: .. code-block:: ql import python -The rest of this tutorial summarizes the contents of the standard libraries for Python. We recommend that you read this and then work through the practical examples in the tutorials shown at the end of the page. - -Overview of the library ------------------------ - The CodeQL library for Python incorporates a large number of classes. 
Each class corresponds either to one kind of entity in Python source code or to an entity that can be derived from the source code using static analysis. These classes can be divided into four categories: - **Syntactic** - classes that represent entities in the Python source code. @@ -20,16 +22,14 @@ The CodeQL library for Python incorporates a large number of classes. Each class - **Taint tracking** - classes that represent the source, sinks and kinds of taint used to implement taint-tracking queries. Syntactic classes -~~~~~~~~~~~~~~~~~ +----------------- -This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. Collectively these classes are subclasses of ``AstNode`` and form an `Abstract syntax tree `__ (AST). The root of each AST is a ``Module``. - -`Symbolic information `__ is attached to the AST in the form of variables (represented by the class ``Variable``). +This part of the library represents the Python source code. The ``Module``, ``Class``, and ``Function`` classes correspond to Python modules, classes, and functions respectively, collectively these are known as ``Scope`` classes. Each ``Scope`` contains a list of statements each of which is represented by a subclass of the class ``Stmt``. Statements themselves can contain other statements or expressions which are represented by subclasses of ``Expr``. Finally, there are a few additional classes for the parts of more complex expressions such as list comprehensions. 
Collectively these classes are subclasses of ``AstNode`` and form an Abstract syntax tree (AST). The root of each AST is a ``Module``. Symbolic information is attached to the AST in the form of variables (represented by the class ``Variable``). For more information, see `Abstract syntax tree `__ and `Symbolic information `__ on Wikipedia. Scope ^^^^^ -A Python program is a group of modules. Technically a module is just a list of statements, but we often think of it as composed of classes and functions. These top-level entities, the module, class, and function are represented by the three CodeQL classes (`Module `__, `Class `__ and `Function `__ which are all subclasses of ``Scope``. +A Python program is a group of modules. Technically a module is just a list of statements, but we often think of it as composed of classes and functions. These top-level entities, the module, class, and function are represented by the three CodeQL classes `Module `__, `Class `__ and `Function `__ which are all subclasses of ``Scope``. - ``Scope`` @@ -47,7 +47,7 @@ All scopes are basically a list of statements, although ``Scope`` classes have a where f.getScope() instanceof Function select f -➤ `See this in the query console `__. Many projects have nested functions. +➤ `See this in the query console on LGTM.com `__. Many projects have nested functions. Statement ^^^^^^^^^ @@ -89,7 +89,7 @@ As an example, to find expressions of the form ``a+2`` where the left is a simpl where bin.getLeft() instanceof Name and bin.getRight() instanceof Num select bin -➤ `See this in the query console `__. Many projects include examples of this pattern. +➤ `See this in the query console on LGTM.com `__. Many projects include examples of this pattern. Variable ^^^^^^^^ @@ -120,7 +120,7 @@ For our first example, we can find all ``finally`` blocks by using the ``Try`` c from Try t select t.getFinalbody() -➤ `See this in the query console `__. Many projects include examples of this pattern. 
+➤ `See this in the query console on LGTM.com `__. Many projects include examples of this pattern. 2. Finding ``except`` blocks that do nothing '''''''''''''''''''''''''''''''''''''''''''' @@ -151,7 +151,7 @@ Both forms are equivalent. Using the positive expression, the whole query looks where forall(Stmt s | s = ex.getAStmt() | s instanceof Pass) select ex -➤ `See this in the query console `__. Many projects include pass-only ``except`` blocks. +➤ `See this in the query console on LGTM.com `__. Many projects include pass-only ``except`` blocks. Summary ^^^^^^^ @@ -237,11 +237,14 @@ Other - ``Comment`` – A comment Control flow classes -~~~~~~~~~~~~~~~~~~~~ +-------------------- -This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into `basic blocks `__. +This part of the library represents the control flow graph of each ``Scope`` (classes, functions, and modules). Each ``Scope`` contains a graph of ``ControlFlowNode`` elements. Each scope has a single entry point and at least one (potentially many) exit points. To speed up control and data flow analysis, control flow nodes are grouped into basic blocks. For more information, see `Basic block `__ on Wikipedia. -As an example, we might want to find the longest sequence of code without any branches. A ``BasicBlock`` is, by definition, a sequence of code without any branches, so we just need to find the longest ``BasicBlock``. +Example +^^^^^^^ + +If we want to find the longest sequence of code without any branches, we need to consider control flow. A ``BasicBlock`` is, by definition, a sequence of code without any branches, so we just need to find the longest ``BasicBlock``. 
First of all we introduce a simple predicate ``bb_length()`` which relates ``BasicBlock``\ s to their length. @@ -269,7 +272,7 @@ Using this predicate we can select the longest ``BasicBlock`` by selecting the ` where bb_length(b) = max(bb_length(_)) select b -➤ `See this in the query console `__. When we ran it on the LGTM.com demo projects, the *openstack/nova* and *ytdl-org/youtube-dl* projects both contained source code results for this query. +➤ `See this in the query console on LGTM.com `__. When we ran it on the LGTM.com demo projects, the *openstack/nova* and *ytdl-org/youtube-dl* projects both contained source code results for this query. .. pull-quote:: @@ -289,7 +292,12 @@ The classes in the control-flow part of the library are: Type-inference classes ---------------------- -The CodeQL library for Python also supplies some classes for accessing the inferred types of values. The classes ``Value`` and ``ClassValue`` allow you to query the possible classes that an expression may have at runtime. For example, which ``ClassValue``\ s are iterable can be determined using the query: +The CodeQL library for Python also supplies some classes for accessing the inferred types of values. The classes ``Value`` and ``ClassValue`` allow you to query the possible classes that an expression may have at runtime. + +Example +^^^^^^^ + +For example, which ``ClassValue``\ s are iterable can be determined using the query: **Find iterable "ClassValue"s** @@ -301,10 +309,10 @@ The CodeQL library for Python also supplies some classes for accessing the infer where cls.hasAttribute("__iter__") select cls -➤ `See this in the query console `__ This query returns a list of classes for the projects analyzed. If you want to include the results for `builtin classes `__, which do not have any Python source code, show the non-source results. +➤ `See this in the query console on LGTM.com `__ This query returns a list of classes for the projects analyzed. 
If you want to include the results for ``builtin`` classes, which do not have any Python source code, show the non-source results. For more information, see `builtin classes `__ in the Python documentation. Summary -~~~~~~~ +^^^^^^^ - `Value `__ @@ -312,7 +320,7 @@ Summary - ``CallableValue`` - ``ModuleValue`` -These classes are explained in more detail in :doc:`Tutorial: Points-to analysis and type inference `. +For more information about these classes, see ":doc:`Pointer analysis and type inference in Python `." Taint-tracking classes ---------------------- @@ -321,16 +329,21 @@ The CodeQL library for Python also supplies classes to specify taint-tracking an Summary -~~~~~~~ +^^^^^^^ - `TaintKind `__ - `Configuration `__ -These classes are explained in more detail in :doc:`Tutorial: Taint tracking and data flow analysis in Python `. +For more information about these classes, see ":doc:`Analyzing data flow and tracking tainted data in Python `." -What next? ----------- +Further reading +--------------- -- Experiment with the worked examples in the following tutorial topics: :doc:`Functions `, :doc:`Statements and expressions `, :doc:`Control flow `, :doc:`Points-to analysis and type inference `, and :doc:`Taint tracking and data flow analysis in Python `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- ":doc:`Functions in Python `" +- ":doc:`Expressions and statements in Python `" +- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" + +.. 
include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/pointsto-type-infer.rst b/docs/language/learn-ql/python/pointsto-type-infer.rst index 7ae9368d02c..8fbde0d9b35 100644 --- a/docs/language/learn-ql/python/pointsto-type-infer.rst +++ b/docs/language/learn-ql/python/pointsto-type-infer.rst @@ -1,7 +1,7 @@ -Tutorial: Points-to analysis and type inference -=============================================== +Pointer analysis and type inference in Python +============================================= -This topic contains worked examples of how to write queries using the standard CodeQL library classes for Python type inference. +At runtime, each Python expression has a value with an associated type. You can learn how an expression behaves at runtime by using type-inference classes from the standard CodeQL library. The ``Value`` class -------------------- @@ -9,7 +9,7 @@ The ``Value`` class The ``Value`` class and its subclasses ``FunctionValue``, ``ClassValue``, and ``ModuleValue`` represent the values an expression may hold at runtime. Summary -~~~~~~~ +^^^^^^^ Class hierarchy for ``Value``: @@ -22,9 +22,7 @@ Class hierarchy for ``Value``: Points-to analysis and type inference ------------------------------------- -Points-to analysis, sometimes known as `pointer analysis `__, allows us to determine which objects an expression may "point to" at runtime. - -`Type inference `__ allows us to infer what the types (classes) of an expression may be at runtime. +Points-to analysis, sometimes known as pointer analysis, allows us to determine which objects an expression may "point to" at runtime. Type inference allows us to infer what the types (classes) of an expression may be at runtime. For more information, see `Pointer analysis `__ and `Type inference `__ on Wikipedia. The predicate ``ControlFlowNode.pointsTo(...)`` shows which object a control flow node may "point to" at runtime. 
@@ -76,7 +74,7 @@ First we can write a query to find ordered pairs of ``except`` blocks for a ``tr ) select t, ex1, ex2 -➤ `See this in the query console `__. Many projects contain ordered ``except`` blocks in a ``try`` statement. +➤ `See this in the query console on LGTM.com `__. Many projects contain ordered ``except`` blocks in a ``try`` statement. Here ``ex1`` and ``ex2`` are both ``except`` handlers in the ``try`` statement ``t``. By using the indices ``i`` and ``j`` we can also ensure that ``ex1`` precedes ``ex2``. @@ -123,7 +121,7 @@ Combining the parts of the query we get this: ) select t, ex1, ex2 -➤ `See this in the query console `__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl `__). The result is also highlighted by the standard query: `Unreachable 'except' block `__. +➤ `See this in the query console on LGTM.com `__. This query finds only one result in the demo projects on LGTM.com (`youtube-dl `__). The result is also highlighted by the standard "Unreachable 'except' block" query. For more information, see `Unreachable 'except' block `__ on LGTM.com. .. pull-quote:: @@ -158,7 +156,7 @@ Then we need to determine if the object ``iter`` is iterable. We can test ``Clas not exists(cls.lookup("__iter__")) select loop, cls -➤ `See this in the query console `__. Many projects use a non-iterable as a loop iterator. +➤ `See this in the query console on LGTM.com `__. Many projects use a non-iterable as a loop iterator. Many of the results shown will have ``cls`` as ``NoneType``. It is more informative to show where these ``None`` values may come from. To do this we use the final field of ``pointsTo``, as follows: @@ -174,7 +172,7 @@ Many of the results shown will have ``cls`` as ``NoneType``. It is more informat not cls.hasAttribute("__iter__") select loop, cls, origin -➤ `See this in the query console `__. This reports the same results, but with a third column showing the source of the ``None`` values. 
+➤ `See this in the query console on LGTM.com `__. This reports the same results, but with a third column showing the source of the ``None`` values. Finding calls using call-graph analysis ---------------------------------------------------- @@ -183,7 +181,7 @@ The ``Value`` class has a method ``getACall()`` which allows us to find calls to If we wish to restrict the callables to actual functions we can use the ``FunctionValue`` class, which is a subclass of ``Value`` and corresponds to function objects in Python, in much the same way as the ``ClassValue`` class corresponds to class objects in Python. -Returning to an example from :doc:`Tutorial: Functions `, we wish to find calls to the ``eval`` function. +Returning to an example from ":doc:`Functions in Python `," we wish to find calls to the ``eval`` function. The original query looked this: @@ -195,7 +193,7 @@ The original query looked this: where call.getFunc() = name and name.getId() = "eval" select call, "call to 'eval'." -➤ `See this in the query console `__. Some of the demo projects on LGTM.com have calls that match this pattern. +➤ `See this in the query console on LGTM.com `__. Some of the demo projects on LGTM.com have calls that match this pattern. There are two problems with this query: @@ -223,10 +221,12 @@ Then we can use ``Value.getACall()`` to identify calls to the ``eval`` function, call = eval.getACall() select call, "call to 'eval'." -➤ `See this in the query console `__. This accurately identifies calls to the builtin ``eval`` function even when they are referred to using an alternative name. Any false positive results with calls to other ``eval`` functions, reported by the original query, have been eliminated. +➤ `See this in the query console on LGTM.com `__. This accurately identifies calls to the builtin ``eval`` function even when they are referred to using an alternative name. 
Any false positive results with calls to other ``eval`` functions, reported by the original query, have been eliminated. -What next? ----------- +Further reading +--------------- -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. -- Read a description of the CodeQL database in :doc:`What's in a CodeQL database? <../database>` +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" + +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/ql-for-python.rst b/docs/language/learn-ql/python/ql-for-python.rst index 680c0c374b5..c9aa0a241e6 100644 --- a/docs/language/learn-ql/python/ql-for-python.rst +++ b/docs/language/learn-ql/python/ql-for-python.rst @@ -1,37 +1,35 @@ CodeQL for Python ================= +Experiment and learn how to write effective and efficient queries for CodeQL databases generated from Python codebases. + .. toctree:: - :glob: :hidden: introduce-libraries-python functions statements-expressions - control-flow - control-flow-graph - taint-tracking pointsto-type-infer + control-flow + taint-tracking -The following tutorials and worked examples are designed to help you learn how to write effective and efficient queries for Python projects. You should work through these topics in the order displayed. +- `Basic Python query `__ : Learn to write and run a simple CodeQL query using LGTM. -- `Basic Python query `__ describes how to write and run queries using LGTM. +- :doc:`CodeQL library for Python `: When you need to analyze a Python program, you can make use of the large collection of classes in the CodeQL library for Python. -- :doc:`Introducing the CodeQL libraries for Python ` introduces the standard libraries used to write queries for Python code. +- :doc:`Functions in Python `: You can use syntactic classes from the standard CodeQL library to find Python functions and identify calls to them. 
-- :doc:`Tutorial: Functions ` demonstrates how to write queries using the standard CodeQL library classes for Python functions. +- :doc:`Expressions and statements in Python `: You can use syntactic classes from the CodeQL library to explore how Python expressions and statements are used in a codebase. -- :doc:`Tutorial: Statements and expressions ` demonstrates how to write queries using the standard CodeQL library classes for Python statements and expressions. +- :doc:`Analyzing control flow in Python `: You can write CodeQL queries to explore the control-flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code. -- :doc:`Tutorial: Control flow ` demonstrates how to write queries using the standard CodeQL library classes for Python control flow. +- :doc:`Pointer analysis and type inference in Python `: At runtime, each Python expression has a value with an associated type. You can learn how an expression behaves at runtime by using type-inference classes from the standard CodeQL library. -- :doc:`Tutorial: Points-to analysis and type inference ` demonstrates how to write queries using the standard CodeQL library classes for Python type inference. +- :doc:`Analyzing data flow and tracking tainted data in Python `: You can use CodeQL to track the flow of data through a Python program. Tracking user-controlled, or tainted, data is a key technique for security researchers. -- :doc:`Taint tracking and data flow analysis in Python ` demonstrates how to write queries using the standard taint tracking and data flow libraries for Python. - -Other resources +Further reading --------------- -- For examples of how to query common Python elements, see the `Python cookbook `__. -- For the queries used in LGTM, display a `Python query `__ and click **Open in query console** to see the code used to find alerts. -- For more information about the library for Python see the `CodeQL library for Python `__. 
+- For examples of how to query common Python elements, see the `Python cookbook `__. +- For the queries used in LGTM, display a `Python query `__ and click **Open in query console** to see the code used to find alerts. +- For more information about the library for Python see the `CodeQL library for Python `__. diff --git a/docs/language/learn-ql/python/statements-expressions.rst b/docs/language/learn-ql/python/statements-expressions.rst index d3b4e68af6c..9e817d5c5c6 100644 --- a/docs/language/learn-ql/python/statements-expressions.rst +++ b/docs/language/learn-ql/python/statements-expressions.rst @@ -1,6 +1,8 @@ -Tutorial: Statements and expressions +Expressions and statements in Python ==================================== +You can use syntactic classes from the CodeQL library to explore how Python expressions and statements are used in a code base. + Statements ---------- @@ -37,13 +39,11 @@ Here is the full class hierarchy: - ``While`` – A ``while`` statement - ``With`` – A ``with`` statement -Example: Finding redundant 'global' statements -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding redundant 'global' statements +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``global`` statement in Python declares a variable with a global (module-level) scope, when it would otherwise be local. Using the ``global`` statement outside a class or function is redundant as the variable is already global. -**Finding redundant global statements** - .. code-block:: ql import python @@ -52,17 +52,15 @@ The ``global`` statement in Python declares a variable with a global (module-lev where g.getScope() instanceof Module select g -➤ `See this in the query console `__. None of the demo projects on LGTM.com has a global statement that matches this pattern. +➤ `See this in the query console on LGTM.com `__. None of the demo projects on LGTM.com has a global statement that matches this pattern.
The line: ``g.getScope() instanceof Module`` ensures that the ``Scope`` of ``Global g`` is a ``Module``, rather than a class or function. -Example: Finding 'if' statements with redundant branches --------------------------------------------------------- +Example finding 'if' statements with redundant branches +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ An ``if`` statement where one branch is composed of just ``pass`` statements could be simplified by negating the condition and dropping the ``else`` clause. -**An 'if' statement that could be simplified** - .. code-block:: python if cond(): @@ -70,9 +68,7 @@ An ``if`` statement where one branch is composed of just ``pass`` statements cou else: do_something -To find statements like this we can run the following query: - -**Find 'if' statements with empty branches** +To find statements like this that could be simplified we can write a query. .. code-block:: ql @@ -83,7 +79,7 @@ To find statements like this we can run the following query: and forall(Stmt p | p = l.getAnItem() | p instanceof Pass) select i -➤ `See this in the query console `__. Many projects have some ``if`` statements that match this pattern. +➤ `See this in the query console on LGTM.com `__. Many projects have some ``if`` statements that match this pattern. The line: ``(l = i.getBody() or l = i.getOrelse())`` restricts the ``StmtList l`` to branches of the ``if`` statement. @@ -131,8 +127,8 @@ Each kind of Python expression has its own class. 
Here is the full class hierarc - ``Yield`` – A ``yield`` expression - ``YieldFrom`` – A ``yield from`` expression (Python 3.3+) -Example: Finding comparisons to integer or string literals using 'is' -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding comparisons to integer or string literals using 'is' +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Python implementations commonly cache small integers and single character strings, which means that comparisons such as the following often work correctly, but this is not guaranteed and we might want to check for them. @@ -141,9 +137,7 @@ Python implementations commonly cache small integers and single character string x is 10 x is "A" -We can check for these as follows: - -**Find comparisons to integer or string literals using** ``is`` +We can check for these using a query. .. code-block:: ql @@ -154,7 +148,7 @@ We can check for these as follows: and cmp.getOp(0) instanceof Is and cmp.getComparator(0) = literal select cmp -➤ `See this in the query console `__. Two of the demo projects on LGTM.com use this pattern: *saltstack/salt* and *openstack/nova*. +➤ `See this in the query console on LGTM.com `__. Two of the demo projects on LGTM.com use this pattern: *saltstack/salt* and *openstack/nova*. The clause ``cmp.getOp(0) instanceof Is and cmp.getComparator(0) = literal`` checks that the first comparison operator is "is" and that the first comparator is a literal. @@ -164,15 +158,11 @@ The clause ``cmp.getOp(0) instanceof Is and cmp.getComparator(0) = literal`` che We have to use ``cmp.getOp(0)`` and ``cmp.getComparator(0)``\ as there is no ``cmp.getOp()`` or ``cmp.getComparator()``. The reason for this is that a ``Compare`` expression can have multiple operators. For example, the expression ``3 < x < 7`` has two operators and two comparators. 
You use ``cmp.getComparator(0)`` to get the first comparator (in this example the ``3``) and ``cmp.getComparator(1)`` to get the second comparator (in this example the ``7``). -Example: Duplicates in dictionary literals -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding duplicates in dictionary literals +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If there are duplicate keys in a Python dictionary, then the second key will overwrite the first, which is almost certainly a mistake. We can find these duplicates with CodeQL, but the query is more complex than previous examples and will require us to write a ``predicate`` as a helper. -Here is the query: - -**Find duplicate dictionary keys** - .. code-block:: ql import python @@ -188,7 +178,7 @@ Here is the query: and k1 != k2 and same_key(k1, k2) select k1, "Duplicate key in dict literal" -➤ `See this in the query console `__. When we ran this query on LGTM.com, the source code of the *saltstack/salt* project contained an example of duplicate dictionary keys. The results were also highlighted as alerts by the standard `Duplicate key in dict literal `__ query. Two of the other demo projects on LGTM.com refer to duplicate dictionary keys in library files. +➤ `See this in the query console on LGTM.com `__. When we ran this query on LGTM.com, the source code of the *saltstack/salt* project contained an example of duplicate dictionary keys. The results were also highlighted as alerts by the standard "Duplicate key in dict literal" query. Two of the other demo projects on LGTM.com refer to duplicate dictionary keys in library files. For more information, see `Duplicate key in dict literal `__ on LGTM.com. The supporting predicate ``same_key`` checks that the keys have the same identifier. Separating this part of the logic into a supporting predicate, instead of directly including it in the query, makes it easier to understand the query as a whole. 
The casts defined in the predicate restrict the expression to the type specified and allow predicates to be called on the type that is cast-to. For example: @@ -204,12 +194,10 @@ is equivalent to The short version is usually used as this is easier to read. -Example: Finding Java-style getters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Example finding Java-style getters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Returning to the example from :doc:`Tutorial: Functions `, the query identified all methods with a single line of code and a name starting with ``get``: - -**Basic: Find Java-style getters** +Returning to the example from ":doc:`Functions in Python `," the query identified all methods with a single line of code and a name starting with ``get``. .. code-block:: ql @@ -220,9 +208,7 @@ Returning to the example from :doc:`Tutorial: Functions `, the query and count(f.getAStmt()) = 1 select f, "This function is (probably) a getter." -This basic query can be improved by checking that the one line of code is of the form ``return self.attr`` - -**Improved: Find Java-style getters** +This basic query can be improved by checking that the one line of code is a Java-style getter of the form ``return self.attr``. .. code-block:: ql @@ -234,23 +220,19 @@ This basic query can be improved by checking that the one line of code is of the and attr.getObject() = self and self.getId() = "self" select f, "This function is a Java-style getter." -➤ `See this in the query console `__. Of the demo projects on LGTM.com, only the *openstack/nova* project has examples of functions that appear to be Java-style getters. - -In this query, the condition: +➤ `See this in the query console on LGTM.com `__. Of the demo projects on LGTM.com, only the *openstack/nova* project has examples of functions that appear to be Java-style getters. .. 
code-block:: ql ret = f.getStmt(0) and ret.getValue() = attr -checks that the first line in the method is a return statement and that the expression returned (``ret.getValue()``) is an ``Attribute`` expression. Note that the equality ``ret.getValue() = attr`` means that ``ret.getValue()`` is restricted to ``Attribute``\ s, since ``attr`` is an ``Attribute``. - -The condition: +This condition checks that the first line in the method is a return statement and that the expression returned (``ret.getValue()``) is an ``Attribute`` expression. Note that the equality ``ret.getValue() = attr`` means that ``ret.getValue()`` is restricted to ``Attribute``\ s, since ``attr`` is an ``Attribute``. .. code-block:: ql attr.getObject() = self and self.getId() = "self" -checks that the value of the attribute (the expression to the left of the dot in ``value.attr``) is an access to a variable called ``"self"``. +This condition checks that the value of the attribute (the expression to the left of the dot in ``value.attr``) is an access to a variable called ``"self"``. Class and function definitions ------------------------------ @@ -271,8 +253,12 @@ Here is the relevant part of the class hierarchy: - ``Class`` - ``Function`` -What next? ----------- +Further reading +--------------- -- Experiment with the worked examples in the following tutorial topics: :doc:`Control flow ` and :doc:`Points-to analysis and type inference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- ":doc:`Functions in Python `" +- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" + +.. 
include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/python/taint-tracking.rst b/docs/language/learn-ql/python/taint-tracking.rst index 2ea24369bf4..bfdae7aa4eb 100644 --- a/docs/language/learn-ql/python/taint-tracking.rst +++ b/docs/language/learn-ql/python/taint-tracking.rst @@ -1,8 +1,10 @@ -Taint tracking and data flow analysis in Python -=============================================== +Analyzing data flow and tracking tainted data in Python +======================================================= -Overview --------- +You can use CodeQL to track the flow of data through a Python program. Tracking user-controlled, or tainted, data is a key technique for security researchers. + +About data flow and taint tracking +---------------------------------- Taint tracking is used to analyze how potentially insecure, or 'tainted' data flows throughout a program at runtime. You can use taint tracking to find out whether user-controlled input can be used in a malicious way, @@ -14,12 +16,12 @@ For example, in the assignment ``dir = path + "/"``, if ``path`` is tainted then even though there is no data flow from ``path`` to ``path + "/"``. Separate CodeQL libraries have been written to handle 'normal' data flow and taint tracking in :doc:`C/C++ <../cpp/dataflow>`, :doc:`C# <../csharp/dataflow>`, :doc:`Java <../java/dataflow>`, and :doc:`JavaScript <../javascript/dataflow>`. You can access the appropriate classes and predicates that reason about these different modes of data flow by importing the appropriate library in your query. -In Python analysis, we can use the same taint tracking library to model both 'normal' data flow and taint flow, but we are still able make the distinction between steps that preserve value and those that don't by defining additional data flow properties. 
+In Python analysis, we can use the same taint tracking library to model both 'normal' data flow and taint flow, but we are still able to make the distinction between steps that preserve values and those that don't by defining additional data flow properties. -For further information on data flow and taint tracking with CodeQL, see :doc:`Introduction to data flow <../intro-to-data-flow>`. +For further information on data flow and taint tracking with CodeQL, see ":doc:`Introduction to data flow <../intro-to-data-flow>`." -Fundamentals of taint tracking and data flow analysis -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Fundamentals of taint tracking using data flow analysis +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The taint tracking library is in the `TaintTracking `__ module. Any taint tracking or data flow analysis query has three explicit components, one of which is optional, and an implicit component. @@ -39,7 +41,7 @@ The kind of taint determines which non-value-preserving steps are possible, in a In the above example ``dir = path + "/"``, taint flows from ``path`` to ``dir`` if the taint represents a string, but not if the taint is ``None``. Limitations -~~~~~~~~~~~ +^^^^^^^^^^^ Although taint tracking is a powerful technique, it is worth noting that it depends on the underlying data flow graphs. Creating a data flow graph that is both accurate and covers a large enough part of a program is a challenge, @@ -79,6 +81,9 @@ A simple taint tracking query has the basic form: where config.hasFlow(src, sink) select sink, "Alert message, including reference to $@.", src, "string describing the source" +Example +^^^^^^^ + As a contrived example, here is a query that looks for flow from a HTTP request to a function called ``"unsafe"``. The sources are predefined and accessed by importing library ``semmle.python.web.HttpRequest``. The sink is defined by using a custom ``TaintTracking::Sink`` class.
@@ -126,8 +131,8 @@ The sink is defined by using a custom ``TaintTracking::Sink`` class. -Implementing path queries -~~~~~~~~~~~~~~~~~~~~~~~~~ +Converting a taint-tracking query to a path query +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Although the taint tracking query above tells which sources flow to which sinks, it doesn't tell us how. For that we need a path query. @@ -202,8 +207,8 @@ Thus, our example query becomes: -Custom taint kinds and flows ----------------------------- +Tracking custom taint kinds and flows +------------------------------------- In the above examples, we have assumed the existence of a suitable ``TaintKind``, but sometimes it is necessary to model the flow of other objects, such as database connections, or ``None``. @@ -226,8 +231,8 @@ The ``TaintKind`` itself is just a string (a QL string, not a CodeQL entity repr which provides methods to extend flow and allow the kind of taint to change along the path. The ``TaintKind`` class has many predicates allowing flow to be modified. This simplest ``TaintKind`` does not override any predicates, meaning that it only flows as opaque data. -An example of this is the `Hard-coded credentials query `_, -which defines the simplest possible taint kind class, ``HardcodedValue``, and custom source and sink classes. +An example of this is the "Hard-coded credentials" query, +which defines the simplest possible taint kind class, ``HardcodedValue``, and custom source and sink classes. For more information, see `Hard-coded credentials `_ on LGTM.com. .. code-block:: ql @@ -251,8 +256,11 @@ which defines the simplest possible taint kind class, ``HardcodedValue``, and cu } } -What next? ----------- +Further reading +--------------- -- Experiment with the worked examples in the following tutorial topics: :doc:`Control flow ` and :doc:`Points-to analysis and type inference `. -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. 
+- ":doc:`Pointer analysis and type inference in Python `" +- ":doc:`Analyzing control flow in Python `" +- ":doc:`Analyzing data flow and tracking tainted data in Python `" + +.. include:: ../../reusables/python-other-resources.rst diff --git a/docs/language/learn-ql/ql-training.rst b/docs/language/learn-ql/ql-training.rst index 55c3d33f705..d0eac290a56 100644 --- a/docs/language/learn-ql/ql-training.rst +++ b/docs/language/learn-ql/ql-training.rst @@ -57,9 +57,8 @@ CodeQL and variant analysis for Java - `Exercise: Apache Struts <../ql-training/java/apache-struts-java.html>`__–an example demonstrating how to develop a data flow query. - `Introduction to global data flow <../ql-training/java/global-data-flow-java.html>`__–an introduction to analyzing global data flow in Java using CodeQL. -More resources -~~~~~~~~~~~~~~ +Further reading +~~~~~~~~~~~~~~~ - If you are completely new to CodeQL, look at our introductory topics in :doc:`Learning CodeQL `. -- To find more detailed information about how to write queries for specific languages, visit the links in :ref:`Writing CodeQL queries `. -- To see examples of CodeQL queries that have been used to find security vulnerabilities and bugs in open-source software projects, visit the `GitHub Security Lab website `__ and the associated `repository `__. \ No newline at end of file +- To see examples of CodeQL queries that have been used to find security vulnerabilities and bugs in open source software projects, visit the `GitHub Security Lab website `__ and the associated `repository `__. 
\ No newline at end of file diff --git a/docs/language/learn-ql/writing-queries/debugging-queries.rst b/docs/language/learn-ql/writing-queries/debugging-queries.rst index b0b03321354..9eee078d18c 100644 --- a/docs/language/learn-ql/writing-queries/debugging-queries.rst +++ b/docs/language/learn-ql/writing-queries/debugging-queries.rst @@ -1,12 +1,17 @@ -Query writing: common performance issues -======================================== +Troubleshooting query performance +================================= + +Improve the performance of your CodeQL queries by following a few simple guidelines. + +About query performance +----------------------- This topic offers some simple tips on how to avoid common problems that can affect the performance of your queries. Before reading the tips below, it is worth reiterating a few important points about CodeQL and the QL language: - CodeQL `predicates `__ and `classes `__ are evaluated to database `tables `__. Large predicates generate large tables with many rows, and are therefore expensive to compute. -- The QL language is implemented using standard database operations and `relational algebra `__ (such as join, projection, and union). For further information about query languages and databases, see :doc:`About QL <../about-ql>`. -- Queries are evaluated *bottom-up*, which means that a predicate is not evaluated until *all* of the predicates that it depends on are evaluated. For more information on query evaluation, see `Evaluation of QL programs `__ in the QL handbook. +- The QL language is implemented using standard database operations and `relational algebra `__ (such as join, projection, and union). For further information about query languages and databases, see `About the QL language `__. +- Queries are evaluated *bottom-up*, which means that a predicate is not evaluated until *all* of the predicates that it depends on are evaluated. For more information on query evaluation, see `Evaluation of QL programs `__. 
Performance tips ---------------- @@ -19,9 +24,7 @@ Eliminate cartesian products The performance of a predicate can often be judged by considering roughly how many results it has. One way of creating badly performing predicates is by using two variables without relating them in any way, or only relating them using a negation. This leads to computing the `Cartesian product `__ between the sets of possible values for each variable, potentially generating a huge table of results. - This can occur if you don't specify restrictions on your variables. - For instance, consider the following predicate that checks whether a Java method ``m`` may access a field ``f``:: predicate mayAccess(Method m, Field f) { @@ -148,4 +151,4 @@ Now the structure we want is clearer. We've separated out the easy part into its Further information ------------------- -- Find out more about QL in the `QL language handbook `__ and `QL language specification `__. +- Find out more about QL in the `QL language reference `__. diff --git a/docs/language/learn-ql/writing-queries/introduction-to-queries.rst b/docs/language/learn-ql/writing-queries/introduction-to-queries.rst index 76b419d4966..fc17a2e8d36 100644 --- a/docs/language/learn-ql/writing-queries/introduction-to-queries.rst +++ b/docs/language/learn-ql/writing-queries/introduction-to-queries.rst @@ -1,10 +1,12 @@ -Introduction to query files -########################### +About CodeQL queries +#################### + +CodeQL queries are used to analyze code for issues related to security, correctness, maintainability, and readability. Overview ******** -Queries are programs written with CodeQL. They are designed to highlight issues related to the security, correctness, maintainability, and readability of a code base. You can also write custom queries to find specific issues relevant to your own project. Three important types of query are: +CodeQL includes queries to find the most relevant and interesting problems for each supported language. 
You can also write custom queries to find specific issues relevant to your own project. The important types of query are: - **Alert queries**: queries that highlight issues in specific locations in your code. - **Path queries**: queries that describe the flow of information between a source and a sink in your code. @@ -21,9 +23,8 @@ You can add custom queries to `custom query packs `__ and in the `Results view `__ in VS Code. -This topic is a basic introduction to structuring query files. You can find further information on writing queries for specific programming languages `here `__, and detailed technical information about QL in the `QL language handbook `__ and the `QL language specification `__. -For information on how to format your code when contributing queries to the GitHub repository, see the `CodeQL style guide `__. - +This topic is a basic introduction to query files. You can find more information on writing queries for specific programming languages `here `__, and detailed technical information about QL in the `QL language reference `__. +For more information on how to format your code when contributing queries to the GitHub repository, see the `CodeQL style guide `__. Basic query structure ********************* @@ -44,7 +45,7 @@ Basic query structure where /* ... logical formula ... */ select /* ... expressions ... */ -The following sections describe the information that is typically included in a query file for alerts and metrics. Path queries are discussed in more detail in :doc:`Constructing path queries `. +The following sections describe the information that is typically included in a query file for alerts and metrics. Path queries are discussed in more detail in :doc:`Creating path queries `. Query metadata ============== @@ -54,7 +55,7 @@ Query metadata is used to identify your custom queries when they are added to th - If you are contributing a query to the GitHub repository, please read the `query metadata style guide `__. 
- If you are adding a custom query to a query pack for analysis using LGTM , see `Writing custom queries to include in LGTM analysis `__. - If you are analyzing a database using the `CodeQL CLI `__, your query metadata must contain ``@kind``. -- If you are running a query in the query console on LGTM or with the CodeQL extension for VS Code, metadata is not mandatory. However, if you want your results to be displayed as either an 'alert' or a 'path', you must specify the correct ``@kind`` property, as explained below. See `Using the query console `__ and `Using the extension `__ for further information. +- If you are running a query in the query console on LGTM or with the CodeQL extension for VS Code, metadata is not mandatory. However, if you want your results to be displayed as either an 'alert' or a 'path', you must specify the correct ``@kind`` property, as explained below. For more information, see `Using the query console `__ on LGTM.com and `Using the extension `__ in the CodeQL for VS Code help. .. pull-quote:: @@ -83,21 +84,20 @@ When writing your own alert queries, you would typically import the standard lib - JavaScript/TypeScript: ``javascript`` - Python: ``python`` -There are also libraries containing commonly used predicates, types, and other modules associated with different analyses, including data flow, control flow, and taint-tracking. In order to calculate path graphs, path queries require you to import a data flow library into the query file. See :doc:`Constructing path queries ` for further information. +There are also libraries containing commonly used predicates, types, and other modules associated with different analyses, including data flow, control flow, and taint-tracking. In order to calculate path graphs, path queries require you to import a data flow library into the query file. For more information, see :doc:`Creating path queries `. 
You can explore the contents of all the standard libraries in the `CodeQL library reference documentation `__ or in the `GitHub repository `__. - Optional CodeQL classes and predicates -------------------------------------- -You can customize your analysis by defining your own predicates and classes in the query. See `Defining a predicate `__ and `Defining a class `__ for further details. +You can customize your analysis by defining your own predicates and classes in the query. For further information, see `Defining a predicate `__ and `Defining a class `__. From clause =========== The ``from`` clause declares the variables that are used in the query. Each declaration must be of the form `` ``. -For more information on the available `types `__, and to learn how to define your own types using `classes `__, see the `QL language handbook `__. +For more information on the available `types `__, and to learn how to define your own types using `classes `__, see the `QL language reference `__. Where clause ============ @@ -117,9 +117,9 @@ Select clauses for alert queries (``@kind problem``) consist of two 'columns', w - ``element``: a code element that is identified by the query, which defines where the alert is displayed. - ``string``: a message, which can also include links and placeholders, explaining why the alert was generated. -The alert message defined in the final column of the ``select`` statement can be developed to give more detail about the alert or path found by the query using links and placeholders. For further information, see :doc:`Defining 'select' statements `. +You can modify the alert message defined in the final column of the ``select`` statement to give more detail about the alert or path found by the query using links and placeholders. For further information, see :doc:`Defining the results of a query `. -Select clauses for path queries (``@kind path-problem``) are crafted to display both an alert and the source and sink of an associated path graph. 
See :doc:`Constructing path queries ` for further information. +Select clauses for path queries (``@kind path-problem``) are crafted to display both an alert and the source and sink of an associated path graph. For more information, see :doc:`Creating path queries `. Select clauses for metric queries (``@kind metric``) consist of two 'columns', with the following structure:: @@ -128,16 +128,34 @@ Select clauses for metric queries (``@kind metric``) consist of two 'columns', w - ``element``: a code element that is identified by the query, which defines where the alert is displayed. - ``metric``: the result of the metric that the query computes. +Viewing the standard CodeQL queries +*********************************** + +One of the easiest ways to get started writing your own queries is to modify an existing query. To view the standard CodeQL queries, or to try out other examples, visit the `CodeQL `__ and `CodeQL for Go `__ repositories on GitHub. + +You can also find examples of queries developed to find security vulnerabilities and bugs in open source software projects on the `GitHub Security Lab website `__ and in the associated `repository `__. + +Contributing queries +******************** + +Contributions to the standard queries and libraries are very welcome. For more information, see our `contributing guidelines `__. +If you are contributing a query to the open source GitHub repository, writing a custom query for LGTM, or using a custom query in an analysis with the CodeQL CLI, then you need to include extra metadata in your query to ensure that the query results are interpreted and displayed correctly. See the following topics for more information on query metadata: + +- :doc:`Metadata for CodeQL queries ` +- `Query metadata style guide on GitHub `__ + +Query contributions to the open source GitHub repository may also have an accompanying query help file to provide information about their purpose for other users. 
For more information on writing query help, see the `Query help style guide on GitHub `__ and the :doc:`Query help files `. + Query help files **************** -When you write a custom query, we also recommend that you write a query help file to explain the purpose of the query to other users. For more information, see the `Query help style guide `__ on GitHub, and the :doc:`Query help reference `. +When you write a custom query, we also recommend that you write a query help file to explain the purpose of the query to other users. For more information, see the `Query help style guide `__ on GitHub, and the :doc:`Query help files `. What next? ========== - See the queries used in real-life variant analysis on the `GitHub Security Lab website `__. -- To learn more about writing path queries, see :doc:`Constructing path queries `. +- To learn more about writing path queries, see :doc:`Creating path queries `. - Take a look at the `built-in queries `__ to see examples of the queries included in CodeQL. - Explore the `query cookbooks `__ to see how to access the basic language elements contained in the CodeQL libraries. - For a full list of resources to help you learn CodeQL, including beginner tutorials and language-specific examples, visit `Learning CodeQL `__. diff --git a/docs/language/learn-ql/writing-queries/path-queries.rst b/docs/language/learn-ql/writing-queries/path-queries.rst index 7b3d52515c3..41e9d0742eb 100644 --- a/docs/language/learn-ql/writing-queries/path-queries.rst +++ b/docs/language/learn-ql/writing-queries/path-queries.rst @@ -1,5 +1,7 @@ -Constructing path queries -######################### +Creating path queries +##################### + +You can create path queries to visualize the flow of information through a codebase. 
Overview ======== @@ -24,7 +26,7 @@ For more language-specific information on analyzing data flow, see: - :doc:`Analyzing data flow in C# <../csharp/dataflow>` - :doc:`Analyzing data flow in Java <../java/dataflow>` - :doc:`Analyzing data flow in JavaScript/TypeScript <../javascript/dataflow>` -- :doc:`Taint tracking and data flow analysis in Python <../python/taint-tracking>` +- :doc:`Analyzing data flow and tracking tainted data in Python <../python/taint-tracking>` Path query examples ******************* @@ -95,7 +97,7 @@ Path query metadata ******************* Path query metadata must contain the property ``@kind path-problem``–this ensures that query results are interpreted and displayed correctly. -The other metadata requirements depend on how you intend to run the query. See the section on query metadata in :doc:`Introduction to query files ` for further information. +The other metadata requirements depend on how you intend to run the query. For more information, see `Query metadata `__. Generating path explanations **************************** @@ -185,7 +187,7 @@ Each result generated by your query is displayed at a single location in the sam The ``element`` that you select in the first column depends on the purpose of the query and the type of issue that it is designed to find. This is particularly important for security issues. For example, if you believe the ``source`` value to be globally invalid or malicious it may be best to display the alert at the ``source``. In contrast, you should consider displaying the alert at the ``sink`` if you believe it is the element that requires sanitization. -The alert message defined in the final column in the ``select`` statement can be developed to give more detail about the alert or path found by the query using links and placeholders. For further information, see :doc:`Defining 'select' statements `. 
+The alert message defined in the final column in the ``select`` statement can be developed to give more detail about the alert or path found by the query using links and placeholders. For more information, see :doc:`Defining the results of a query `. What next? ********** diff --git a/docs/language/learn-ql/writing-queries/query-help.rst b/docs/language/learn-ql/writing-queries/query-help.rst index c3692b7fac0..2bfc0db4b7e 100644 --- a/docs/language/learn-ql/writing-queries/query-help.rst +++ b/docs/language/learn-ql/writing-queries/query-help.rst @@ -1,5 +1,7 @@ -Query help reference -******************** +Query help files +**************** + +Query help files tell users the purpose of a query, and recommend how to solve the potential problem the query finds. This topic provides detailed information on the structure of query help files. For more information about how to write useful query help in a style that is consistent with the standard CodeQL queries, see the `Query help style guide `__ on GitHub. @@ -24,7 +26,7 @@ Each query help file provides detailed information about the purpose and use of Structure ========= -Query help files are written using an XML format called Qhelp (``.qhelp``). Query help files must have the same base name as the query they describe, and must be located in the same directory. The basic structure is as follows: +Query help files are written using a custom XML format, and stored in a file with a ``.qhelp`` extension. Query help files must have the same base name as the query they describe, and must be located in the same directory. The basic structure is as follows: .. code-block:: xml @@ -42,32 +44,32 @@ Section-level elements Section-level elements are used to group the information in the help file into sections. Many sections have a heading, either defined by a ``title`` attribute or a default value. The following section-level elements are optional child elements of the ``qhelp`` element. 
-+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| Element | Attributes | Children | Purpose of section | -+====================+====================================+========================+===============================================================================================================================================+ -| ``example`` | None | Any block element | Demonstrate an example of code that violates the rule implemented by the query with guidance on how to fix it. Default heading. | -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``fragment`` | None | Any block element | See :ref:`Qhelp inclusion ` below. No heading. | -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``hr`` | None | None | A horizontal rule. No heading. | -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``include`` | ``src`` The Qhelp file to include. | None | Include a Qhelp file at the location of this element. See :ref:`Qhelp inclusion ` below. No heading. 
| -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``overview`` | None | Any block element | Overview of the purpose of the query. Typically this is the first section in a query document. No heading. | -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``recommendation`` | None | Any block element | Recommend how to address any alerts that this query identifies. Default heading. | -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``references`` | None | ``li`` elements | Reference list. Typically this is the last section in a query document. Default heading. | -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``section`` | ``title`` Title of the section | Any block element | General-purpose section with a heading defined by the ``title`` attribute. | -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ -| ``semmleNotes`` | None | Any block element | Semmle-specific notes about the query. This section is used only for queries that implement a rule defined by a third party. Default heading. 
| -+--------------------+------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| Element | Attributes | Children | Purpose of section | ++====================+=========================================+========================+===============================================================================================================================================+ +| ``example`` | None | Any block element | Demonstrate an example of code that violates the rule implemented by the query with guidance on how to fix it. Default heading. | ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``fragment`` | None | Any block element | See :ref:`Query help inclusion ` below. No heading. | ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``hr`` | None | None | A horizontal rule. No heading. | ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``include`` | ``src`` The query help file to include. | None | Include a query help file at the location of this element. See :ref:`Query help inclusion ` below. No heading. 
| ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``overview`` | None | Any block element | Overview of the purpose of the query. Typically this is the first section in a query document. No heading. | ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``recommendation`` | None | Any block element | Recommend how to address any alerts that this query identifies. Default heading. | ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``references`` | None | ``li`` elements | Reference list. Typically this is the last section in a query document. Default heading. | ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``section`` | ``title`` Title of the section | Any block element | General-purpose section with a heading defined by the ``title`` attribute. | ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ +| ``semmleNotes`` | None | Any block element | Semmle-specific notes about the query. This section is used only for queries that implement a rule defined by a third party. Default heading. 
| ++--------------------+-----------------------------------------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ Block elements ============== -The following elements are optional child elements of the ``section``, ``example``, ``fragment``, ``recommendation``, ``overview`` and ``semmleNotes`` elements. +The following elements are optional child elements of the ``section``, ``example``, ``fragment``, ``recommendation``, ``overview``, and ``semmleNotes`` elements. .. table:: :widths: 7 20 10 25 @@ -82,7 +84,7 @@ The following elements are optional child elements of the ``section``, ``example | | | ``height`` Optional, height of the image. | | | | | | ``width`` Optional, the width of the image. | | | +----------------+----------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | ``include`` | ``src`` The Qhelp file to include. | None | Include a Qhelp file at the location of this element. See :ref:`Qhelp inclusion ` below for more information. | + | ``include`` | ``src`` The query help file to include. | None | Include a query help file at the location of this element. See :ref:`Query help inclusion ` below for more information. 
| +----------------+----------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``ol`` | None | ``li`` | Display an ordered list. See List elements below. | +----------------+----------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ @@ -103,12 +105,12 @@ The following elements are optional child elements of the ``section``, ``example List elements ============= -Qhelp files support two types of block elements for lists: ``ul`` and ``ol``. Both block elements support only one child elements of the type ``li``. Each ``li`` element contains either inline content or a block element. +Query help files support two types of block elements for lists: ``ul`` and ``ol``. Both block elements support only one child elements of the type ``li``. Each ``li`` element contains either inline content or a block element. Table elements ============== -The ``table`` block element is used to include a table in a Qhelp file. Each table includes a number of rows, each of which includes a number of cells. The data in the cells will be rendered as a grid. +The ``table`` block element is used to include a table in a query help file. Each table includes a number of rows, each of which includes a number of cells. The data in the cells will be rendered as a grid. 
+-----------+------------+--------------------+-------------------------------------------+ | Element | Attributes | Children | Purpose | @@ -157,12 +159,12 @@ Inline content is used to define the content for paragraphs, list items, table c .. _qhelp-inclusion: -Qhelp inclusion -=============== +Query help inclusion +==================== -To enable the reuse of content between different help topics, shared content can be stored in one Qhelp file and then included in a number of other Qhelp files using the ``include`` element. The shared content can stored either in the same directory as the including files, or in ``SEMMLE_DIST/docs/include``. +To reuse content between different help topics, you can store shared content in one query help file and then include it in a number of other query help files using the ``include`` element. The shared content can be stored either in the same directory as the including files, or in ``SEMMLE_DIST/docs/include``. -The ``include`` element can be used as a section or block element, the content of the Qhelp file defined by the ``src`` attribute must contain elements that are appropriate to the location of the ``include`` element. +The ``include`` element can be used as a section or block element. The content of the query help file defined by the ``src`` attribute must contain elements that are appropriate to the location of the ``include`` element. Section-level include elements ------------------------------ @@ -175,7 +177,7 @@ Section-level ``include`` elements can be located beneath the top-level ``qhelp` -In this example, the `XSS.qhelp `__ file must conform to the standard for a full Qhelp file as described above. That is, the ``qhelp`` element may only contain non-``fragment``, section-level elements. +In this example, the `XSS.qhelp `__ file must conform to the standard for a full query help file as described above. That is, the ``qhelp`` element may only contain non-``fragment``, section-level elements. 
Block-level include elements ---------------------------- diff --git a/docs/language/learn-ql/writing-queries/query-metadata.rst b/docs/language/learn-ql/writing-queries/query-metadata.rst index 51af3018fdd..362b3c54405 100644 --- a/docs/language/learn-ql/writing-queries/query-metadata.rst +++ b/docs/language/learn-ql/writing-queries/query-metadata.rst @@ -1,5 +1,10 @@ -Query metadata -============== +Metadata for CodeQL queries +=========================== + +Metadata tells users important information about CodeQL queries. You must include the correct query metadata in a query to be able to view query results in source code. + +About query metadata +-------------------- Any query that is run as part of an analysis includes a number of properties, known as query metadata. Metadata is included at the top of each query file as the content of a `QLDoc `__ comment. For alerts and path queries, this metadata tells LGTM and the CodeQL `extension for VS Code `__ how to handle the query and display its results correctly. @@ -10,31 +15,31 @@ You can also add metric queries to LGTM, but the results are not shown. To see t Note - The exact metadata requirement depends on how you are going to run your query. For more information, see the section on query metadata in :doc:`Introduction to query files `. + The exact metadata requirement depends on how you are going to run your query. For more information, see the section on query metadata in :doc:`About CodeQL queries `. 
Core properties --------------- The following properties are supported by all query files: -+-----------------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Property | Value | Description | -+=======================+===========================+=============================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ -| ``@description`` | ```` | A sentence or short paragraph to describe the purpose of the query and *why* the result is useful or important. The description is written in plain text, and uses single quotes (``'``) to enclose code elements. 
| -+-----------------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| ``@id`` | ```` | A sequence of words composed of lowercase letters or digits, delimited by ``/`` or ``-``, identifying and classifying the query. Each query must have a **unique** ID. To ensure this, it may be helpful to use a fixed structure for each ID. For example, the standard LGTM queries have the following format: ``/``. | -+-----------------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| ``@kind`` | | ``problem`` | Identifies the query is an alert (``@kind problem``), a path (``@kind path-problem``), or a metric (``@kind metric``). 
For further information on these query types, see :doc:`Introduction to query files ` | -| | | ``path-problem`` | | -| | | ``metric`` | | -+-----------------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| ``@name`` | ```` | A statement that defines the label of the query. The name is written in plain text, and uses single quotes (``'``) to enclose code elements. | -+-----------------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| ``@tags`` | | ``correctness`` | These tags group queries together in broad categories to make it easier to search for them and identify them. You can also `filter alerts `__ based on their tags. In addition to the common tags listed here, there are also a number of more specific categories. For more information about some of the tags that are already used and what they mean, see `Query tags `__. 
| -| | | ``mantainability`` | | -| | | ``readability`` | | -| | | ``security`` | | -+-----------------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++-----------------------+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Property | Value | Description | ++=======================+===========================+==============================================================================================================================================================================================================================================================================================================================================================================+ +| ``@description`` | ```` | A sentence or short paragraph to describe the purpose of the query and *why* the result is useful or important. The description is written in plain text, and uses single quotes (``'``) to enclose code elements. 
| ++-----------------------+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``@id`` | ```` | A sequence of words composed of lowercase letters or digits, delimited by ``/`` or ``-``, identifying and classifying the query. Each query must have a **unique** ID. To ensure this, it may be helpful to use a fixed structure for each ID. For example, the standard LGTM queries have the following format: ``/``. | ++-----------------------+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``@kind`` | | ``problem`` | Identifies the query is an alert (``@kind problem``), a path (``@kind path-problem``), or a metric (``@kind metric``). For further information on these query types, see :doc:`About CodeQL queries `. | +| | | ``path-problem`` | | +| | | ``metric`` | | ++-----------------------+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``@name`` | ```` | A statement that defines the label of the query. 
The name is written in plain text, and uses single quotes (``'``) to enclose code elements. | ++-----------------------+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``@tags`` | | ``correctness`` | These tags group queries together in broad categories to make it easier to search for them and identify them. In addition to the common tags listed here, there are also a number of more specific categories. For more information about some of the tags that are already used and what they mean, see `Query tags `__ on LGTM.com. | +| | | ``mantainability`` | | +| | | ``readability`` | | +| | | ``security`` | | ++-----------------------+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ Additional properties for problem and path-problem queries diff --git a/docs/language/learn-ql/writing-queries/select-statement.rst b/docs/language/learn-ql/writing-queries/select-statement.rst index c974f8a21d6..5531a958e3e 100644 --- a/docs/language/learn-ql/writing-queries/select-statement.rst +++ b/docs/language/learn-ql/writing-queries/select-statement.rst @@ -1,5 +1,10 @@ -Defining 'select' statements -============================ +Defining the results of a query +=============================== + +You can control how analysis results are displayed in source code by modifying a query's ``select`` 
statement. + +About query results +------------------- The information contained in the results of a query is controlled by the ``select`` statement. Part of the process of developing a useful query is to make the results clear and easy for other users to understand. When you write your own queries in the query console or in the CodeQL `extension for VS Code `__ there are no constraints on what can be selected. @@ -22,7 +27,7 @@ If you look at some of the LGTM queries, you'll see that they can select extra e Note - An in-depth discussion of ``select`` statements for path and metric queries is not included in this topic. However, you can develop the string column of the ``select`` statement in the same way as for alert queries. For more specific information about path queries, see :doc:`Constructing path queries `. + An in-depth discussion of ``select`` statements for path and metric queries is not included in this topic. However, you can develop the string column of the ``select`` statement in the same way as for alert queries. For more specific information about path queries, see :doc:`Creating path queries `. Developing a select statement ----------------------------- diff --git a/docs/language/learn-ql/writing-queries/writing-queries.rst b/docs/language/learn-ql/writing-queries/writing-queries.rst index e08048953c0..bdf702ded1c 100644 --- a/docs/language/learn-ql/writing-queries/writing-queries.rst +++ b/docs/language/learn-ql/writing-queries/writing-queries.rst @@ -1,57 +1,25 @@ -Writing CodeQL queries -###################### +CodeQL queries +############## -If you are familiar with CodeQL, you can modify the existing queries or write custom queries to analyze, improve, and secure your own projects. Get started by reading the information for query writers and viewing the examples provided below. 
- -Information for query writers -***************************** +CodeQL queries are used in code scanning analyses to find problems in source code, including potential security vulnerabilities. .. toctree:: - :glob: :hidden: introduction-to-queries - path-queries - ../intro-to-data-flow - select-statement - ../locations - debugging-queries - - -Visit `Learning CodeQL `__ to find basic information about CodeQL. This includes information about the underlying query language QL, as well as help and advice on writing queries for specific programming languages. -To learn more about the structure of query files, the key information to include when writing your own queries, and how to format them for clarity and consistency, see the following topics: - -- :doc:`Introduction to query files `–an introduction to the information contained in a basic query file. -- :doc:`Constructing path queries `–a quick guide to structuring path queries to use in security research. -- :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`–a brief introduction to modeling data flow using CodeQL. -- :doc:`Defining 'select' statements `–further detail on developing query alert messages to provide extra information in your query results. -- :doc:`Locations and strings for CodeQL entities <../locations>`–further detail on providing location information in query results. -- `CodeQL style guide on GitHub `__–a guide to formatting your queries for consistency and clarity. - -Viewing existing CodeQL queries -******************************* - -The easiest way to get started writing your own queries is to modify an existing query. To see these queries, or to try out the CodeQL query cookbooks, visit `Exploring CodeQL queries `__. -You can also find all the CodeQL queries in our `open source repository on GitHub `__. 
- -You can also find examples of queries developed to find security vulnerabilities and bugs in open-source software projects on the `GitHub Security Lab website `__ and in the associated `repository `__. - -Contributing queries -******************** - -.. toctree:: - :glob: - :hidden: - query-metadata query-help + select-statement + ../locations + ../intro-to-data-flow + path-queries + debugging-queries -Contributions to the standard queries and libraries are very welcome–see our `contributing guidelines `__ for further information. -If you are contributing a query to the open source GitHub repository, writing a custom query for LGTM, or using a custom query in an analysis with our command-line tools, then you need to include extra metadata in your query to ensure that the query results are interpreted and displayed correctly. See the following topics for more information on query metadata: - -.. TODO: Change "command-line tools" to a link to the CodeQL CLI? - -- :doc:`Query metadata reference ` -- `Query metadata style guide on GitHub `__ - -Query contributions to the open source GitHub repository may also have an accompanying query help file to provide information about their purpose for other users. For more information on writing query help, see the `Query help style guide on GitHub `__ and the :doc:`Query help reference `. \ No newline at end of file +- :doc:`About CodeQL queries `: CodeQL queries are used to analyze code for issues related to security, correctness, maintainability, and readability. +- :doc:`Metadata for CodeQL queries `: Metadata tells users important information about CodeQL queries. You must include the correct query metadata in a query to be able to view query results in source code. +- :doc:`Query help files `: Query help files tell users the purpose of a query, and recommend how to solve the potential problem the query finds. 
+- :doc:`Defining the results of a query `: You can control how analysis results are displayed in source code by modifying a query's ``select`` statement. +- :doc:`Providing locations in CodeQL queries <../locations>`: CodeQL includes mechanisms for extracting the location of elements in a codebase. Use these mechanisms when writing custom CodeQL queries and libraries to help display information to users. +- :doc:`About data flow analysis <../intro-to-data-flow>`: Data flow analysis is used to compute the possible values that a variable can hold at various points in a program, determining how those values propagate through the program and where they are used. +- :doc:`Creating path queries `: You can create path queries to visualize the flow of information through a codebase. +- :doc:`Troubleshooting query performance `: Improve the performance of your CodeQL queries by following a few simple guidelines. diff --git a/docs/language/learn-ql/about-ql.rst b/docs/language/ql-handbook/about-the-ql-language.rst similarity index 82% rename from docs/language/learn-ql/about-ql.rst rename to docs/language/ql-handbook/about-the-ql-language.rst index 5f5362b8fd8..8cb38d05df3 100644 --- a/docs/language/learn-ql/about-ql.rst +++ b/docs/language/ql-handbook/about-the-ql-language.rst @@ -1,17 +1,15 @@ -About QL -======== +About the QL language +###################### -This section is aimed at users with a background in general purpose programming as well as in databases. For a basic introduction and information on how to get started, see :doc:`Introduction to QL ` and :doc:`Learning CodeQL <../index>`. - -QL is a declarative, object-oriented query language that is optimized to enable efficient analysis of hierarchical data structures, in particular, databases representing software artifacts. - -The queries and metrics used in LGTM are implemented using CodeQL, which uses QL to analyze code. 
This ensures that they can be extended or revised easily to keep up with changes in definitions of best coding practice. We continually improve existing queries as we work towards the ultimate goal of 100% precision. - -You can write queries to identify security vulnerabilities, find coding errors and bugs, or find code that breaks your team's guidelines for best practice. You can also create customized versions of the default queries to accommodate a new framework. +QL is the powerful query language that underlies CodeQL, which is used to analyze code. About query languages and databases ----------------------------------- +This section is aimed at users with a background in general purpose programming as well as in databases. For a basic introduction and information on how to get started, see `Learning CodeQL `__. + +QL is a declarative, object-oriented query language that is optimized to enable efficient analysis of hierarchical data structures, in particular, databases representing software artifacts. + A database is an organized collection of data. The most commonly used database model is a relational model which stores data in tables and SQL (Structured Query Language) is the most commonly used query language for relational databases. The purpose of a query language is to provide a programming platform where you can ask questions about information stored in a database. A database management system manages the storage and administration of data and provides the querying mechanism. A query typically refers to the relevant database entities and specifies various conditions (called predicates) that must be satisfied by the results. Query evaluation involves checking these predicates and generating the results. Some of the desirable properties of a good query language and its implementation include: @@ -41,6 +39,11 @@ When you write this process in QL, it closely resembles the above structure. 
Not result = count(getADescendant(p)) } +For more information about the important concepts and syntactic constructs of QL, see the individual reference topics such as :doc:`Expressions ` and :doc:`Recursion `. +The explanations and examples help you understand how the language works, and how to write more advanced QL code. + +For formal specifications of the QL language and QLDoc comments, see the :doc:`QL language specification ` and :doc:`QLDoc comment specification `. + QL and object orientation ------------------------- @@ -56,12 +59,10 @@ Here are a few prominent conceptual and functional differences between general p - QL's set-based semantics makes it very natural to process collections of values without having to worry about efficiently storing, indexing and traversing them. - In object oriented programming languages, instantiating a class involves creating an object by allocating physical memory to hold the state of that instance of the class. In QL, classes are just logical properties describing sets of already existing values. -These topics are discussed in detail in the `QL language handbook `__. +Further reading +--------------- -References ----------- - -Academic references available from the `Semmle website `__ also provide an overview of QL and its semantics. Other useful references on database query languages and Datalog: +`Academic references `__ also provide an overview of QL and its semantics. Other useful references on database query languages and Datalog: - `Database theory: Query languages `__ - `Logic Programming and Databases book - Amazon page `__ diff --git a/docs/language/ql-handbook/aliases.rst b/docs/language/ql-handbook/aliases.rst index 88675370081..03bdc835a30 100644 --- a/docs/language/ql-handbook/aliases.rst +++ b/docs/language/ql-handbook/aliases.rst @@ -4,8 +4,9 @@ Aliases ####### -An alias is an alternative name for an existing QL entity. 
Once you've defined an alias, -you can use that new name to refer to the entity in the current module's :ref:`namespace `. +An alias is an alternative name for an existing QL entity. + +Once you've defined an alias, you can use that new name to refer to the entity in the current module's :ref:`namespace `. Defining an alias ***************** diff --git a/docs/language/ql-handbook/annotations.rst b/docs/language/ql-handbook/annotations.rst index 8673b8194f6..816b5ea5310 100644 --- a/docs/language/ql-handbook/annotations.rst +++ b/docs/language/ql-handbook/annotations.rst @@ -4,6 +4,7 @@ Annotations ########### An annotation is a string that you can place directly before the declaration of a QL entity or name. + For example, to declare a module ``M`` as private, you could use:: private module M { diff --git a/docs/language/ql-handbook/conf.py b/docs/language/ql-handbook/conf.py index 1b4887c0f06..10bda010a7e 100644 --- a/docs/language/ql-handbook/conf.py +++ b/docs/language/ql-handbook/conf.py @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- # -# QL language handbook build configuration file, created by +# QL language reference build configuration file, created by # sphinx-quickstart on Wed Feb 28 12:01:34 2018. # # This file is execfile()d with the current directory set to its @@ -17,9 +17,10 @@ ################################################################################ # -# Modified 22052019. - -# The configuration values below are specific to the QL handbook +# Modified 02042020 to rename "handbook" to "reference". +# This Sphinx project now contains both the handbook articles and the specifications. +# +# The configuration values below are specific to the QL language reference project. # To amend html_theme_options, update version/release number, or add more sphinx extensions, # refer to code/documentation/ql-documentation/global-sphinx-files/global-conf.py @@ -41,7 +42,7 @@ highlight_language = 'ql' master_doc = 'index' # Project-specific information. 
-project = u'QL language handbook' +project = u'QL language reference' # The version info for this project, if different from version and release in main conf.py file. # The short X.Y version. @@ -53,10 +54,10 @@ project = u'QL language handbook' # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". -html_title = 'QL language handbook' +html_title = 'QL language reference' # Output file base name for HTML help builder. -htmlhelp_basename = 'QL language handbook' +htmlhelp_basename = 'QL language reference' # -- Currently unused, but potentially useful, configs-------------------------------------- diff --git a/docs/language/ql-handbook/evaluation.rst b/docs/language/ql-handbook/evaluation.rst index 251b951c0e1..42520f634a2 100644 --- a/docs/language/ql-handbook/evaluation.rst +++ b/docs/language/ql-handbook/evaluation.rst @@ -3,6 +3,8 @@ Evaluation of QL programs ######################### +A QL program is evaluated in a number of different steps. + Process ******* diff --git a/docs/language/ql-handbook/expressions.rst b/docs/language/ql-handbook/expressions.rst index 57a814b958a..729bd9e4876 100644 --- a/docs/language/ql-handbook/expressions.rst +++ b/docs/language/ql-handbook/expressions.rst @@ -3,11 +3,10 @@ Expressions ########### -An expression evaluates to a set of values in QL. For example, the expression ``1 + 2`` -evaluates to the integer ``3`` and the expression ``"QL"`` evaluates to the string ``"QL"``. +An expression evaluates to a set of values and has a type. -A valid expression also has a :ref:`type `. -In the above examples, ``1 + 2`` has type ``int`` and ``"QL"`` has type ``string``. +For example, the expression ``1 + 2`` +evaluates to the integer ``3`` and the expression ``"QL"`` evaluates to the string ``"QL"``. ``1 + 2`` has :ref:`type ` ``int`` and ``"QL"`` has type ``string``. The following sections describe the expressions that are available in QL. 
diff --git a/docs/language/ql-handbook/formulas.rst b/docs/language/ql-handbook/formulas.rst index 45ef1302fb7..cedbfa4a4cf 100644 --- a/docs/language/ql-handbook/formulas.rst +++ b/docs/language/ql-handbook/formulas.rst @@ -3,10 +3,9 @@ Formulas ######## -Formulas define logical relations between the :ref:`free variables ` used in -:ref:`expressions `. +Formulas define logical relations between the free variables used in expressions. -Depending on the values assigned to those free variables, a formula can be true or false. +Depending on the values assigned to those :ref:`free variables `, a formula can be true or false. When a formula is true, we often say that the formula *holds*. For example, the formula ``x = 4 + 5`` holds if the value ``9`` is assigned to ``x``, but it doesn't hold for other assignments to ``x``. diff --git a/docs/language/ql-handbook/index.rst b/docs/language/ql-handbook/index.rst index 701d7d02237..ec4ab6d671e 100644 --- a/docs/language/ql-handbook/index.rst +++ b/docs/language/ql-handbook/index.rst @@ -1,29 +1,12 @@ -QL language handbook -#################### +QL language reference +##################### -Welcome to the QL language handbook! - -This document describes the main features of the QL language. The explanations and examples -help you understand how the language works, and how to write more advanced QL code. - -Each section describes an important concept or syntactic construct of QL. For an overview, see -the table of contents below. - -If you are just getting started with QL, see `Learning CodeQL `_ -for a list of the available resources. - -.. index:: specification - -For a formal specification of the QL language, see the `QL language specification -`_. - -Table of contents -***************** +Learn all about QL, the powerful query language that underlies the code scanning tool CodeQL. .. 
toctree:: - :numbered: 3 - :maxdepth: 3 + :maxdepth: 1 + about-the-ql-language predicates queries types @@ -37,9 +20,5 @@ Table of contents lexical-syntax name-resolution evaluation - -Index and search -**************** - -* :ref:`genindex` -* :ref:`search` + language + qldoc \ No newline at end of file diff --git a/docs/language/ql-spec/language.rst b/docs/language/ql-handbook/language.rst similarity index 99% rename from docs/language/ql-spec/language.rst rename to docs/language/ql-handbook/language.rst index d3c1cb48d48..5bb7370d11c 100644 --- a/docs/language/ql-spec/language.rst +++ b/docs/language/ql-handbook/language.rst @@ -1,12 +1,16 @@ QL language specification ========================= +This is a formal specification for the QL language. It provides a comprehensive reference for terminology, syntax, and other technical details about QL. + +.. This ``highlight`` directive prevents code blocks in this file being highlighted as QL (the default language for this Sphinx project). + +.. highlight:: none + Introduction ------------ -This document specifies the QL language. It provides a comprehensive reference for terminology, syntax, and other technical details about QL. - -QL is a query language for Semmle databases. The data is relational: named relations hold sets of tuples. The query language is a dialect of Datalog, using stratified semantics, and it includes object-oriented classes. +QL is a query language for CodeQL databases. The data is relational: named relations hold sets of tuples. The query language is a dialect of Datalog, using stratified semantics, and it includes object-oriented classes. 
Notation -------- diff --git a/docs/language/ql-handbook/lexical-syntax.rst b/docs/language/ql-handbook/lexical-syntax.rst index d5f152e829e..481b0bf87ea 100644 --- a/docs/language/ql-handbook/lexical-syntax.rst +++ b/docs/language/ql-handbook/lexical-syntax.rst @@ -3,10 +3,10 @@ Lexical syntax ############## +The QL syntax includes different kinds of keywords, identifiers, and comments. + For an overview of the lexical syntax, see `Lexical syntax -`_ -in the QL language specification. In particular, you can find the list of QL keywords, the -different kinds of identifiers, and a description of comments. +`_ in the QL language specification. .. index:: comment, QLDoc .. _comments: @@ -19,7 +19,7 @@ All standard one-line and multiline comments, as described in the `QL language s compiler and are only visible in the source code. You can also write another kind of comment, namely **QLDoc comments**. These comments describe QL entities and are displayed as pop-up information in QL editors. For information about QLDoc -comments, see the `QLDoc specification `_. +comments, see the `QLDoc comment specification `_. The following example uses these three different kinds of comments:: diff --git a/docs/language/ql-handbook/name-resolution.rst b/docs/language/ql-handbook/name-resolution.rst index 6c6bd47e6fe..3e44140640e 100644 --- a/docs/language/ql-handbook/name-resolution.rst +++ b/docs/language/ql-handbook/name-resolution.rst @@ -3,6 +3,8 @@ Name resolution ############### +The QL compiler resolves names to program elements. + As in other programming languages, there is a distinction between the names used in QL code, and the underlying QL entities they refer to. @@ -240,7 +242,7 @@ and the global namespaces. 
(You can think of global namespaces as the enclosing Let's see what the module, type, and predicate namespaces look like in a concrete example: For example, you could define a library module ``Villagers`` containing some of the classes and predicates that -were defined in the `QL detective tutorials `_: +were defined in the `QL tutorials `_: **Villagers.qll** diff --git a/docs/language/ql-spec/qldoc.rst b/docs/language/ql-handbook/qldoc.rst similarity index 81% rename from docs/language/ql-spec/qldoc.rst rename to docs/language/ql-handbook/qldoc.rst index 62987574c12..d2ca924598e 100644 --- a/docs/language/ql-spec/qldoc.rst +++ b/docs/language/ql-handbook/qldoc.rst @@ -1,7 +1,12 @@ -QLDoc specification -=================== +QLDoc comment specification +=========================== -This document is a specification for QLDoc comments in QL source files. +This document is a formal specification for QLDoc comments. + +About QLDoc comments +-------------------- + +You can provide documentation for a QL entity by adding a QLDoc comment in the source file. The QLDoc comment is displayed as pop-up information in QL editors, for example when you hover over a predicate name. Notation -------- @@ -36,7 +41,7 @@ Content The content of a QLDoc comment is interpreted as standard Markdown, with the following extensions: -- Fenced code blocks using \`s. +- Fenced code blocks using backticks. - Automatic interpretation of links and email addresses. - Use of appropriate characters for ellipses, dashes, apostrophes, and quotes. diff --git a/docs/language/ql-handbook/queries.rst b/docs/language/ql-handbook/queries.rst index 1c21482af35..c6ef62d6b13 100644 --- a/docs/language/ql-handbook/queries.rst +++ b/docs/language/ql-handbook/queries.rst @@ -4,8 +4,7 @@ Queries ####### -Queries are the output of a QL program: they evaluate to sets of results. Indeed, we -often refer to the whole QL program as a *query*. +Queries are the output of a QL program. They evaluate to sets of results. 
There are two kinds of queries. For a given :ref:`query module `, the queries in that module are: - The :ref:`select clause `, if any, defined in that module. @@ -13,6 +12,8 @@ There are two kinds of queries. For a given :ref:`query module `, :ref:`namespace `. That is, they can be defined in the module itself, or imported from a different module. +We often also refer to the whole QL program as a query. + .. index:: from, where, select .. _select-clauses: diff --git a/docs/language/ql-handbook/types.rst b/docs/language/ql-handbook/types.rst index dfa7a8cd30f..92ac01bd42b 100644 --- a/docs/language/ql-handbook/types.rst +++ b/docs/language/ql-handbook/types.rst @@ -5,7 +5,9 @@ Types ##### -QL is a statically typed language, so each variable must have a declared **type**. A type is a set of values. +QL is a statically typed language, so each variable must have a declared type. + +A type is a set of values. For example, the type ``int`` is the set of integers. Note that a value can belong to more than one of these sets, which means that it can have more than one type. diff --git a/docs/language/ql-handbook/variables.rst b/docs/language/ql-handbook/variables.rst index 25b4fff1ad2..a74d492b61c 100644 --- a/docs/language/ql-handbook/variables.rst +++ b/docs/language/ql-handbook/variables.rst @@ -5,11 +5,11 @@ Variables ######### Variables in QL are used in a similar way to variables in algebra or logic. They represent sets -of values, and those values are usually restricted by a :ref:`formula `. +of values, and those values are usually restricted by a formula. This is different from variables in some other programming languages, where variables represent memory locations that may contain data. That data can also change over time. For example, in -QL, ``n = n + 1`` is an equality formula that holds only +QL, ``n = n + 1`` is an equality :ref:`formula ` that holds only if ``n`` is equal to ``n + 1`` (so in fact it does not hold for any numeric value). 
In Java, ``n = n + 1`` is not an equality, but an assignment that changes the value of ``n`` by adding ``1`` to the current value. diff --git a/docs/language/ql-spec/index.rst b/docs/language/ql-spec/index.rst index 4cbdc834936..cba06a87746 100644 --- a/docs/language/ql-spec/index.rst +++ b/docs/language/ql-spec/index.rst @@ -1,24 +1,5 @@ -QL specifications -################# +README +###### -.. index:: specification - - -For a formal specification of the QL language, including a description of syntax, terminology, and other technical details, please consult the QL language specification. - -For information on the terminology and syntax used in QLDoc comments, please consult the QLDoc specification. - -.. toctree:: - :maxdepth: 2 - - language - - qldoc - - -Search -****** - -.. * :ref:`genindex` - -* :ref:`search` +The specifications have moved to ``ql/docs/language/ql-handbook``. +See https://github.com/github/semmle-docs/issues/21 for details of the restructuring. \ No newline at end of file diff --git a/docs/language/ql-training/cpp/intro-ql-cpp.rst b/docs/language/ql-training/cpp/intro-ql-cpp.rst index 6beff0b708e..7f398da8d4b 100644 --- a/docs/language/ql-training/cpp/intro-ql-cpp.rst +++ b/docs/language/ql-training/cpp/intro-ql-cpp.rst @@ -68,7 +68,7 @@ A simple CodeQL query We are going to write a simple query which finds “if statements” with empty “then” blocks, so we can highlight the results like those on the previous slide. The query can be run in the `query console on LGTM `__, or in your `IDE `__. - A `query `__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language handbook `__), see `Introduction to query files `__. 
+ A `query `__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language reference `__), see `Introduction to query files `__. In our example here, the first line of the query imports the `CodeQL library for C/C++ `__, which defines concepts like ``IfStmt`` and ``Block``. The query proper starts by declaring two variables–ifStmt and block. These variables represent sets of values in the database, according to the type of each of the variables. For example, ifStmt has the type IfStmt, which means it represents the set of all if statements in the program. diff --git a/docs/language/ql-training/java/intro-ql-java.rst b/docs/language/ql-training/java/intro-ql-java.rst index f93a619c142..66c41df44b0 100644 --- a/docs/language/ql-training/java/intro-ql-java.rst +++ b/docs/language/ql-training/java/intro-ql-java.rst @@ -68,7 +68,7 @@ A simple CodeQL query We are going to write a simple query which finds “if statements” with empty “then” blocks, so we can highlight the results like those on the previous slide. The query can be run in the `query console on LGTM `__, or in your `IDE `__. - A `query `__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language handbook `__), see `Introduction to query files `__. + A `query `__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. 
For more information on the structure of query files (including links to useful topics in the `QL language reference `__), see `Introduction to query files `__. In our example here, the first line of the query imports the `CodeQL library for Java `__, which defines concepts like ``IfStmt`` and ``Block``. The query proper starts by declaring two variables–ifStmt and block. These variables represent sets of values in the database, according to the type of each of the variables. For example, ``ifStmt`` has the type ``IfStmt``, which means it represents the set of all if statements in the program. diff --git a/docs/language/ql-training/slide-snippets/intro-ql-general.rst b/docs/language/ql-training/slide-snippets/intro-ql-general.rst index 8d6a25afad7..2dc64517465 100644 --- a/docs/language/ql-training/slide-snippets/intro-ql-general.rst +++ b/docs/language/ql-training/slide-snippets/intro-ql-general.rst @@ -124,7 +124,7 @@ QL is: .. note:: - QL is the high-level, object-oriented logic language that underpins all CodeQL libraries and analyses. You can learn lots more about QL by visiting `Introduction to the QL language `__ and `About QL `__. + QL is the high-level, object-oriented logic language that underpins all CodeQL libraries and analyses. You can learn lots more about QL by visiting the `QL language reference `__. The key features of QL are: - All common logic connectives are available, including quantifiers like ``exist``, which can also introduce new variables. 
diff --git a/docs/language/ql-training/slide-snippets/local-data-flow.rst b/docs/language/ql-training/slide-snippets/local-data-flow.rst index ed681b0398a..0bbb2c20ba4 100644 --- a/docs/language/ql-training/slide-snippets/local-data-flow.rst +++ b/docs/language/ql-training/slide-snippets/local-data-flow.rst @@ -111,8 +111,8 @@ So all references will need to be qualified (that is, ``DataFlow::Node``) A **module** is a way of organizing QL code by grouping together related predicates, classes, and (sub-)modules. They can be either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file. - For further information on libraries and modules in QL, see the chapter on `Modules `__ in the QL language handbook. - For further information on importing QL libraries and modules, see the chapter on `Name resolution `__ in the QL language handbook. + For further information on libraries and modules in QL, see the chapter on `Modules `__ in the QL language reference. + For further information on importing QL libraries and modules, see the chapter on `Name resolution `__ in the QL language reference. 
Data flow graph =============== diff --git a/docs/language/reusables/python-other-resources.rst b/docs/language/reusables/python-other-resources.rst new file mode 100644 index 00000000000..8e9482cf230 --- /dev/null +++ b/docs/language/reusables/python-other-resources.rst @@ -0,0 +1,3 @@ +- "`QL language reference `__" +- `Python cookbook queries `__ in the Semmle wiki +- `Python queries in action `__ on LGTM.com diff --git a/docs/language/support/language-support.rst b/docs/language/support/language-support.rst index 1dea250d430..4ffd28f3797 100644 --- a/docs/language/support/language-support.rst +++ b/docs/language/support/language-support.rst @@ -10,18 +10,4 @@ Customers with any questions should contact their usual Semmle contact with any If you're not a customer yet, contact us at info@semmle.com with any questions you have about language and compiler support. -.. csv-table:: - :file: versions-compilers.csv - :header-rows: 1 - :widths: auto - :stub-columns: 1 - -.. container:: footnote-group - - .. [1] Support for the clang-cl compiler is preliminary. - .. [2] Support for the Arm Compiler (armcc) is preliminary. - .. [3] In addition, support is included for the preview features of C# 8.0 and .NET Core 3.0. - .. [4] Builds that execute on Java 6 to 12 can be analyzed. The analysis understands Java 12 language features. - .. [5] ECJ is supported when the build invokes it via the Maven Compiler plugin or the Takari Lifecycle plugin. - .. [6] JSX and Flow code, YAML, JSON, HTML, and XML files may also be analyzed with JavaScript files. - .. [7] TypeScript analysis is performed by running the JavaScript extractor with TypeScript enabled. This is the default for LGTM. +.. 
include:: versions-compilers.rst diff --git a/docs/language/support/versions-compilers.csv b/docs/language/support/versions-compilers.csv deleted file mode 100644 index 8b60f8ee3a9..00000000000 --- a/docs/language/support/versions-compilers.csv +++ /dev/null @@ -1,18 +0,0 @@ -Language,Variants,Compilers,Extensions -C/C++,"C89, C99, C11, C18, C++98, C++03, C++11, C++14, C++17","Clang (and clang-cl [1]_) extensions (up to Clang 9.0), - -GNU extensions (up to GCC 9.2), - -Microsoft extensions (up to VS 2019), - -Arm Compiler 5 [2]_","``.cpp``, ``.c++``, ``.cxx``, ``.hpp``, ``.hh``, ``.h++``, ``.hxx``, ``.c``, ``.cc``, ``.h``" -C#,C# up to 8.0. with .NET up to 4.8 [3]_,"Microsoft Visual Studio up to 2019, - -.NET Core up to 3.0","``.sln``, ``.csproj``, ``.cs``, ``.cshtml``, ``.xaml``" -Go (aka Golang), "Go up to 1.14", "Go 1.11 or more recent", ``.go`` -Java,"Java 6 to 13 [4]_","javac (OpenJDK and Oracle JDK), - -Eclipse compiler for Java (ECJ) [5]_",``.java`` -JavaScript,ECMAScript 2019 or lower,Not applicable,"``.js``, ``.jsx``, ``.mjs``, ``.es``, ``.es6``, ``.htm``, ``.html``, ``.xhm``, ``.xhtml``, ``.vue``, ``.json``, ``.yaml``, ``.yml``, ``.raml``, ``.xml`` [6]_" -Python,"2.7, 3.5, 3.6, 3.7, 3.8",Not applicable,``.py`` -TypeScript [7]_,"2.6-3.7",Standard TypeScript compiler,"``.ts``, ``.tsx``" diff --git a/docs/language/support/versions-compilers.rst b/docs/language/support/versions-compilers.rst new file mode 100644 index 00000000000..3b244e592bd --- /dev/null +++ b/docs/language/support/versions-compilers.rst @@ -0,0 +1,33 @@ +.. 
csv-table:: + :header-rows: 1 + :widths: auto + :stub-columns: 1 + + Language,Variants,Compilers,Extensions + C/C++,"C89, C99, C11, C18, C++98, C++03, C++11, C++14, C++17","Clang (and clang-cl [1]_) extensions (up to Clang 9.0), + + GNU extensions (up to GCC 9.2), + + Microsoft extensions (up to VS 2019), + + Arm Compiler 5 [2]_","``.cpp``, ``.c++``, ``.cxx``, ``.hpp``, ``.hh``, ``.h++``, ``.hxx``, ``.c``, ``.cc``, ``.h``" + C#,C# up to 8.0. with .NET up to 4.8 [3]_,"Microsoft Visual Studio up to 2019, + + .NET Core up to 3.0","``.sln``, ``.csproj``, ``.cs``, ``.cshtml``, ``.xaml``" + Go (aka Golang), "Go up to 1.14", "Go 1.11 or more recent", ``.go`` + Java,"Java 6 to 14 [4]_","javac (OpenJDK and Oracle JDK), + + Eclipse compiler for Java (ECJ) [5]_",``.java`` + JavaScript,ECMAScript 2019 or lower,Not applicable,"``.js``, ``.jsx``, ``.mjs``, ``.es``, ``.es6``, ``.htm``, ``.html``, ``.xhm``, ``.xhtml``, ``.vue``, ``.json``, ``.yaml``, ``.yml``, ``.raml``, ``.xml`` [6]_" + Python,"2.7, 3.5, 3.6, 3.7, 3.8",Not applicable,``.py`` + TypeScript [7]_,"2.6-3.7",Standard TypeScript compiler,"``.ts``, ``.tsx``" + +.. container:: footnote-group + + .. [1] Support for the clang-cl compiler is preliminary. + .. [2] Support for the Arm Compiler (armcc) is preliminary. + .. [3] In addition, support is included for the preview features of C# 8.0 and .NET Core 3.0. + .. [4] Builds that execute on Java 6 to 14 can be analyzed. The analysis understands Java 14 standard language features. + .. [5] ECJ is supported when the build invokes it via the Maven Compiler plugin or the Takari Lifecycle plugin. + .. [6] JSX and Flow code, YAML, JSON, HTML, and XML files may also be analyzed with JavaScript files. + .. [7] TypeScript analysis is performed by running the JavaScript extractor with TypeScript enabled. This is the default for LGTM. 
diff --git a/docs/query-help-style-guide.md b/docs/query-help-style-guide.md index 58e2d93f95e..c56cd885fce 100644 --- a/docs/query-help-style-guide.md +++ b/docs/query-help-style-guide.md @@ -16,7 +16,7 @@ Query help files must have the same base name as the query they describe and mus ### File structure and layout -Query files are written using an XML format called Qhelp, and stored in a file with a `.qhelp` extension. The basic structure is as follows: +Query help files are written using a custom XML format, and stored in a file with a `.qhelp` extension. The basic structure is as follows: ``` @@ -25,7 +25,7 @@ Query files are written using an XML format called Qhelp, and stored in a file w ``` -The header and single top-level `qhelp` element are both mandatory. +The header and single top-level `` element are both mandatory. ### Section-level elements @@ -36,7 +36,7 @@ Section-level elements are used to group the information within the query help f 3. `example`—an example of code showing the problem. Where possible, this section should also include a solution to the issue. 4. `references`—relevant references, such as authoritative sources on language semantics and best practice. -For further information about the other section-level, block, list and table elements supported by the qhelp format, see the [Query help reference](https://help.semmle.com/QL/learn-ql/ql/writing-queries/query-help.html) on help.semmle.com. +For further information about the other section-level, block, list and table elements supported by query help files, see the [Query help reference](https://help.semmle.com/QL/learn-ql/ql/writing-queries/query-help.html) on help.semmle.com. ## English style @@ -86,7 +86,7 @@ For example: >W. C. Wake, _Refactoring Workbook_, pp. 93 – 94, Addison-Wesley Professional, 2004. -Note, & symbols need to be replaced by \&. The symbol will be displayed correctly in the html files generated from the qhelp files. +Note, & symbols need to be replaced by \&. 
The symbol will be displayed correctly in the HTML files generated from the query help files. ### Academic papers @@ -107,11 +107,11 @@ For example: ### Referencing potential security weaknesses -If your query checks code for a CWE weakness, you should use the `@tags` element in the query file to reference the associated CWEs, as explained [here](query-metadata-style-guide.md). When you use these tags, a link to the appropriate entry from the [MITRE.org](https://cwe.mitre.org/scoring/index.html) site will automatically appear as a reference in the qhelp file. +If your query checks code for a CWE weakness, you should use the `@tags` element in the query file to reference the associated CWEs, as explained [here](query-metadata-style-guide.md). When you use these tags, a link to the appropriate entry from the [MITRE.org](https://cwe.mitre.org/scoring/index.html) site will automatically appear as a reference in the output HTML file. ## Query help example -The following example is a qhelp file for a query from the standard query suite for Java: +The following example is a query help file for a query from the standard query suite for Java: ``` -``` \ No newline at end of file +``` diff --git a/java/ql/src/codeql-suites/java-code-scanning.qls b/java/ql/src/codeql-suites/java-code-scanning.qls new file mode 100644 index 00000000000..7dc29ab8049 --- /dev/null +++ b/java/ql/src/codeql-suites/java-code-scanning.qls @@ -0,0 +1,4 @@ +- description: Standard Code Scanning queries for Java +- qlpack: codeql-java +- apply: code-scanning-selectors.yml + from: codeql-suite-helpers diff --git a/java/ql/src/config/semmlecode.dbscheme b/java/ql/src/config/semmlecode.dbscheme index 054d7e823b2..2a682863863 100755 --- a/java/ql/src/config/semmlecode.dbscheme +++ b/java/ql/src/config/semmlecode.dbscheme @@ -268,6 +268,10 @@ classes( int sourceid: @class ref ); +isRecord( + unique int id: @class ref +); + interfaces( unique int id: @interface, string nodeName: string ref, diff --git 
a/java/ql/src/config/semmlecode.dbscheme.stats b/java/ql/src/config/semmlecode.dbscheme.stats index 165db4674b5..ae0f5e3b34f 100644 --- a/java/ql/src/config/semmlecode.dbscheme.stats +++ b/java/ql/src/config/semmlecode.dbscheme.stats @@ -9263,6 +9263,17 @@ +isRecord +100 + + +id +100 + + + + + interfaces 249736 diff --git a/java/ql/src/semmle/code/java/Expr.qll b/java/ql/src/semmle/code/java/Expr.qll index 37cc608d878..73475f6cc70 100755 --- a/java/ql/src/semmle/code/java/Expr.qll +++ b/java/ql/src/semmle/code/java/Expr.qll @@ -1076,8 +1076,6 @@ class ConditionalExpr extends Expr, @conditionalexpr { } /** - * PREVIEW FEATURE in Java 13. Subject to removal in a future release. - * * A `switch` expression. */ class SwitchExpr extends Expr, @switchexpr { @@ -1132,7 +1130,25 @@ deprecated class ParExpr extends Expr, @parexpr { /** An `instanceof` expression. */ class InstanceOfExpr extends Expr, @instanceofexpr { /** Gets the expression on the left-hand side of the `instanceof` operator. */ - Expr getExpr() { result.isNthChildOf(this, 0) } + Expr getExpr() { + if isPattern() + then result = getLocalVariableDeclExpr().getInit() + else result.isNthChildOf(this, 0) + } + + /** + * PREVIEW FEATURE in Java 14. Subject to removal in a future release. + * + * Holds if this `instanceof` expression uses pattern matching. + */ + predicate isPattern() { exists(getLocalVariableDeclExpr()) } + + /** + * PREVIEW FEATURE in Java 14. Subject to removal in a future release. + * + * Gets the local variable declaration of this `instanceof` expression if pattern matching is used. + */ + LocalVariableDeclExpr getLocalVariableDeclExpr() { result.isNthChildOf(this, 0) } /** Gets the access to the type on the right-hand side of the `instanceof` operator. 
*/ Expr getTypeName() { result.isNthChildOf(this, 1) } @@ -1163,6 +1179,8 @@ class LocalVariableDeclExpr extends Expr, @localvariabledeclexpr { exists(ForStmt fs | fs.getAnInit() = this | result.isNthChildOf(fs, 0)) or exists(EnhancedForStmt efs | efs.getVariable() = this | result.isNthChildOf(efs, -1)) + or + exists(InstanceOfExpr ioe | this.getParent() = ioe | result.isNthChildOf(ioe, 1)) } /** Gets the name of the variable declared by this local variable declaration expression. */ diff --git a/java/ql/src/semmle/code/java/Statement.qll b/java/ql/src/semmle/code/java/Statement.qll index 2a5f94ef781..fa303a7c3ee 100755 --- a/java/ql/src/semmle/code/java/Statement.qll +++ b/java/ql/src/semmle/code/java/Statement.qll @@ -417,8 +417,6 @@ class SwitchCase extends Stmt, @case { SwitchStmt getSwitch() { result.getACase() = this } /** - * PREVIEW FEATURE in Java 13. Subject to removal in a future release. - * * Gets the switch expression to which this case belongs, if any. */ SwitchExpr getSwitchExpr() { result.getACase() = this } @@ -432,8 +430,6 @@ class SwitchCase extends Stmt, @case { } /** - * PREVIEW FEATURE in Java 13. Subject to removal in a future release. - * * Holds if this `case` is a switch labeled rule of the form `... -> ...`. */ predicate isRule() { @@ -443,15 +439,11 @@ class SwitchCase extends Stmt, @case { } /** - * PREVIEW FEATURE in Java 13. Subject to removal in a future release. - * * Gets the expression on the right-hand side of the arrow, if any. */ Expr getRuleExpression() { result.getParent() = this and result.getIndex() = -1 } /** - * PREVIEW FEATURE in Java 13. Subject to removal in a future release. - * * Gets the statement on the right-hand side of the arrow, if any. */ Stmt getRuleStatement() { result.getParent() = this and result.getIndex() = -1 } @@ -465,8 +457,6 @@ class ConstCase extends SwitchCase { Expr getValue() { result.getParent() = this and result.getIndex() = 0 } /** - * PREVIEW FEATURE in Java 13. 
Subject to removal in a future release. - * * Gets the `case` constant at the specified index. */ Expr getValue(int i) { result.getParent() = this and result.getIndex() = i and i >= 0 } @@ -624,8 +614,6 @@ class BreakStmt extends Stmt, @breakstmt { } /** - * PREVIEW FEATURE in Java 13. Subject to removal in a future release. - * * A `yield` statement. */ class YieldStmt extends Stmt, @yieldstmt { diff --git a/java/ql/src/semmle/code/java/Type.qll b/java/ql/src/semmle/code/java/Type.qll index f08c5a5eea4..35f35d44928 100755 --- a/java/ql/src/semmle/code/java/Type.qll +++ b/java/ql/src/semmle/code/java/Type.qll @@ -615,6 +615,15 @@ class Class extends RefType, @class { } } +/** + * PREVIEW FEATURE in Java 14. Subject to removal in a future release. + * + * A record declaration. + */ +class Record extends Class { + Record() { isRecord(this) } +} + /** An intersection type. */ class IntersectionType extends RefType, @class { IntersectionType() { diff --git a/java/ql/test/library-tests/guards12/options b/java/ql/test/library-tests/guards12/options index 99827347b32..3f12170222c 100644 --- a/java/ql/test/library-tests/guards12/options +++ b/java/ql/test/library-tests/guards12/options @@ -1 +1 @@ -//semmle-extractor-options: --javac-args --enable-preview -source 13 -target 13 +//semmle-extractor-options: --javac-args -source 14 -target 14 diff --git a/java/ql/test/library-tests/structure/DeclaresMember.expected b/java/ql/test/library-tests/structure/DeclaresMember.expected index f4a9c678cc3..32c5f05891b 100644 --- a/java/ql/test/library-tests/structure/DeclaresMember.expected +++ b/java/ql/test/library-tests/structure/DeclaresMember.expected @@ -28,7 +28,6 @@ | LocalClass | LocalClass | | LocalClass | n | | MemberClass | MemberClass | -| Object | | | Object | Object | | Object | clone | | Object | equals | diff --git a/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/old.dbscheme b/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/old.dbscheme new file mode 
100755 index 00000000000..054d7e823b2 --- /dev/null +++ b/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/old.dbscheme @@ -0,0 +1,950 @@ +/** + * An invocation of the compiler. Note that more than one file may be + * compiled per invocation. For example, this command compiles three + * source files: + * + * javac A.java B.java C.java + * + * The `id` simply identifies the invocation, while `cwd` is the working + * directory from which the compiler was invoked. + */ +compilations( + /** + * An invocation of the compiler. Note that more than one file may + * be compiled per invocation. For example, this command compiles + * three source files: + * + * javac A.java B.java C.java + */ + unique int id : @compilation, + string cwd : string ref +); + +/** + * The arguments that were passed to the extractor for a compiler + * invocation. If `id` is for the compiler invocation + * + * javac A.java B.java C.java + * + * then typically there will be rows for + * + * num | arg + * --- | --- + * 0 | *path to extractor* + * 1 | `--javac-args` + * 2 | A.java + * 3 | B.java + * 4 | C.java + */ +#keyset[id, num] +compilation_args( + int id : @compilation ref, + int num : int ref, + string arg : string ref +); + +/** + * The source files that are compiled by a compiler invocation. + * If `id` is for the compiler invocation + * + * javac A.java B.java C.java + * + * then there will be rows for + * + * num | arg + * --- | --- + * 0 | A.java + * 1 | B.java + * 2 | C.java + */ +#keyset[id, num] +compilation_compiling_files( + int id : @compilation ref, + int num : int ref, + int file : @file ref +); + +/** + * The time taken by the extractor for a compiler invocation. 
+ * + * For each file `num`, there will be rows for + * + * kind | seconds + * ---- | --- + * 1 | CPU seconds used by the extractor frontend + * 2 | Elapsed seconds during the extractor frontend + * 3 | CPU seconds used by the extractor backend + * 4 | Elapsed seconds during the extractor backend + */ +#keyset[id, num, kind] +compilation_time( + int id : @compilation ref, + int num : int ref, + /* kind: + 1 = frontend_cpu_seconds + 2 = frontend_elapsed_seconds + 3 = extractor_cpu_seconds + 4 = extractor_elapsed_seconds + */ + int kind : int ref, + float seconds : float ref +); + +/** + * An error or warning generated by the extractor. + * The diagnostic message `diagnostic` was generated during compiler + * invocation `compilation`, and is the `file_number_diagnostic_number`th + * message generated while extracting the `file_number`th file of that + * invocation. + */ +#keyset[compilation, file_number, file_number_diagnostic_number] +diagnostic_for( + unique int diagnostic : @diagnostic ref, + int compilation : @compilation ref, + int file_number : int ref, + int file_number_diagnostic_number : int ref +); + +/** + * If extraction was successful, then `cpu_seconds` and + * `elapsed_seconds` are the CPU time and elapsed time (respectively) + * that extraction took for compiler invocation `id`. 
+ */ +compilation_finished( + unique int id : @compilation ref, + float cpu_seconds : float ref, + float elapsed_seconds : float ref +); + +diagnostics( + unique int id: @diagnostic, + int severity: int ref, + string error_tag: string ref, + string error_message: string ref, + string full_error_message: string ref, + int location: @location_default ref +); + +/* + * External artifacts + */ + +externalData( + int id : @externalDataElement, + string path : string ref, + int column: int ref, + string value : string ref +); + +snapshotDate( + unique date snapshotDate : date ref +); + +sourceLocationPrefix( + string prefix : string ref +); + +/* + * Duplicate code + */ + +duplicateCode( + unique int id : @duplication, + string relativePath : string ref, + int equivClass : int ref +); + +similarCode( + unique int id : @similarity, + string relativePath : string ref, + int equivClass : int ref +); + +@duplication_or_similarity = @duplication | @similarity + +tokens( + int id : @duplication_or_similarity ref, + int offset : int ref, + int beginLine : int ref, + int beginColumn : int ref, + int endLine : int ref, + int endColumn : int ref +); + +/* + * Locations and files + */ + +@location = @location_default ; + +locations_default( + unique int id: @location_default, + int file: @file ref, + int beginLine: int ref, + int beginColumn: int ref, + int endLine: int ref, + int endColumn: int ref +); + +hasLocation( + int locatableid: @locatable ref, + int id: @location ref +); + +@sourceline = @locatable ; + +#keyset[element_id] +numlines( + int element_id: @sourceline ref, + int num_lines: int ref, + int num_code: int ref, + int num_comment: int ref +); + +files( + unique int id: @file, + string name: string ref, + string simple: string ref, + string ext: string ref, + int fromSource: int ref // deprecated +); + +folders( + unique int id: @folder, + string name: string ref, + string simple: string ref +); + +@container = @folder | @file + +containerparent( + int parent: 
@container ref, + unique int child: @container ref +); + +/* + * Java + */ + +cupackage( + unique int id: @file ref, + int packageid: @package ref +); + +#keyset[fileid,keyName] +jarManifestMain( + int fileid: @file ref, + string keyName: string ref, + string value: string ref +); + +#keyset[fileid,entryName,keyName] +jarManifestEntries( + int fileid: @file ref, + string entryName: string ref, + string keyName: string ref, + string value: string ref +); + +packages( + unique int id: @package, + string nodeName: string ref +); + +primitives( + unique int id: @primitive, + string nodeName: string ref +); + +modifiers( + unique int id: @modifier, + string nodeName: string ref +); + +classes( + unique int id: @class, + string nodeName: string ref, + int parentid: @package ref, + int sourceid: @class ref +); + +interfaces( + unique int id: @interface, + string nodeName: string ref, + int parentid: @package ref, + int sourceid: @interface ref +); + +fielddecls( + unique int id: @fielddecl, + int parentid: @reftype ref +); + +#keyset[fieldId] #keyset[fieldDeclId,pos] +fieldDeclaredIn( + int fieldId: @field ref, + int fieldDeclId: @fielddecl ref, + int pos: int ref +); + +fields( + unique int id: @field, + string nodeName: string ref, + int typeid: @type ref, + int parentid: @reftype ref, + int sourceid: @field ref +); + +constrs( + unique int id: @constructor, + string nodeName: string ref, + string signature: string ref, + int typeid: @type ref, + int parentid: @reftype ref, + int sourceid: @constructor ref +); + +methods( + unique int id: @method, + string nodeName: string ref, + string signature: string ref, + int typeid: @type ref, + int parentid: @reftype ref, + int sourceid: @method ref +); + +#keyset[parentid,pos] +params( + unique int id: @param, + int typeid: @type ref, + int pos: int ref, + int parentid: @callable ref, + int sourceid: @param ref +); + +paramName( + unique int id: @param ref, + string nodeName: string ref +); + +isVarargsParam( + int param: 
@param ref +); + +exceptions( + unique int id: @exception, + int typeid: @type ref, + int parentid: @callable ref +); + +isAnnotType( + int interfaceid: @interface ref +); + +isAnnotElem( + int methodid: @method ref +); + +annotValue( + int parentid: @annotation ref, + int id2: @method ref, + unique int value: @expr ref +); + +isEnumType( + int classid: @class ref +); + +isEnumConst( + int fieldid: @field ref +); + +#keyset[parentid,pos] +typeVars( + unique int id: @typevariable, + string nodeName: string ref, + int pos: int ref, + int kind: int ref, // deprecated + int parentid: @typeorcallable ref +); + +wildcards( + unique int id: @wildcard, + string nodeName: string ref, + int kind: int ref +); + +#keyset[parentid,pos] +typeBounds( + unique int id: @typebound, + int typeid: @reftype ref, + int pos: int ref, + int parentid: @boundedtype ref +); + +#keyset[parentid,pos] +typeArgs( + int argumentid: @reftype ref, + int pos: int ref, + int parentid: @typeorcallable ref +); + +isParameterized( + int memberid: @member ref +); + +isRaw( + int memberid: @member ref +); + +erasure( + unique int memberid: @member ref, + int erasureid: @member ref +); + +#keyset[classid] #keyset[parent] +isAnonymClass( + int classid: @class ref, + int parent: @classinstancexpr ref +); + +#keyset[classid] #keyset[parent] +isLocalClass( + int classid: @class ref, + int parent: @localclassdeclstmt ref +); + +isDefConstr( + int constructorid: @constructor ref +); + +#keyset[exprId] +lambdaKind( + int exprId: @lambdaexpr ref, + int bodyKind: int ref +); + +arrays( + unique int id: @array, + string nodeName: string ref, + int elementtypeid: @type ref, + int dimension: int ref, + int componenttypeid: @type ref +); + +enclInReftype( + unique int child: @reftype ref, + int parent: @reftype ref +); + +extendsReftype( + int id1: @reftype ref, + int id2: @classorinterface ref +); + +implInterface( + int id1: @classorarray ref, + int id2: @interface ref +); + +hasModifier( + int id1: @modifiable ref, 
+ int id2: @modifier ref +); + +imports( + unique int id: @import, + int holder: @typeorpackage ref, + string name: string ref, + int kind: int ref +); + +#keyset[parent,idx] +stmts( + unique int id: @stmt, + int kind: int ref, + int parent: @stmtparent ref, + int idx: int ref, + int bodydecl: @callable ref +); + +@stmtparent = @callable | @stmt | @switchexpr; + +case @stmt.kind of + 0 = @block +| 1 = @ifstmt +| 2 = @forstmt +| 3 = @enhancedforstmt +| 4 = @whilestmt +| 5 = @dostmt +| 6 = @trystmt +| 7 = @switchstmt +| 8 = @synchronizedstmt +| 9 = @returnstmt +| 10 = @throwstmt +| 11 = @breakstmt +| 12 = @continuestmt +| 13 = @emptystmt +| 14 = @exprstmt +| 15 = @labeledstmt +| 16 = @assertstmt +| 17 = @localvariabledeclstmt +| 18 = @localclassdeclstmt +| 19 = @constructorinvocationstmt +| 20 = @superconstructorinvocationstmt +| 21 = @case +| 22 = @catchclause +| 23 = @yieldstmt +; + +#keyset[parent,idx] +exprs( + unique int id: @expr, + int kind: int ref, + int typeid: @type ref, + int parent: @exprparent ref, + int idx: int ref +); + +callableEnclosingExpr( + unique int id: @expr ref, + int callable_id: @callable ref +); + +statementEnclosingExpr( + unique int id: @expr ref, + int statement_id: @stmt ref +); + +isParenthesized( + unique int id: @expr ref, + int parentheses: int ref +); + +case @expr.kind of + 1 = @arrayaccess +| 2 = @arraycreationexpr +| 3 = @arrayinit +| 4 = @assignexpr +| 5 = @assignaddexpr +| 6 = @assignsubexpr +| 7 = @assignmulexpr +| 8 = @assigndivexpr +| 9 = @assignremexpr +| 10 = @assignandexpr +| 11 = @assignorexpr +| 12 = @assignxorexpr +| 13 = @assignlshiftexpr +| 14 = @assignrshiftexpr +| 15 = @assignurshiftexpr +| 16 = @booleanliteral +| 17 = @integerliteral +| 18 = @longliteral +| 19 = @floatingpointliteral +| 20 = @doubleliteral +| 21 = @characterliteral +| 22 = @stringliteral +| 23 = @nullliteral +| 24 = @mulexpr +| 25 = @divexpr +| 26 = @remexpr +| 27 = @addexpr +| 28 = @subexpr +| 29 = @lshiftexpr +| 30 = @rshiftexpr +| 31 = 
@urshiftexpr +| 32 = @andbitexpr +| 33 = @orbitexpr +| 34 = @xorbitexpr +| 35 = @andlogicalexpr +| 36 = @orlogicalexpr +| 37 = @ltexpr +| 38 = @gtexpr +| 39 = @leexpr +| 40 = @geexpr +| 41 = @eqexpr +| 42 = @neexpr +| 43 = @postincexpr +| 44 = @postdecexpr +| 45 = @preincexpr +| 46 = @predecexpr +| 47 = @minusexpr +| 48 = @plusexpr +| 49 = @bitnotexpr +| 50 = @lognotexpr +| 51 = @castexpr +| 52 = @newexpr +| 53 = @conditionalexpr +| 54 = @parexpr // deprecated +| 55 = @instanceofexpr +| 56 = @localvariabledeclexpr +| 57 = @typeliteral +| 58 = @thisaccess +| 59 = @superaccess +| 60 = @varaccess +| 61 = @methodaccess +| 62 = @unannotatedtypeaccess +| 63 = @arraytypeaccess +| 64 = @packageaccess +| 65 = @wildcardtypeaccess +| 66 = @declannotation +| 67 = @uniontypeaccess +| 68 = @lambdaexpr +| 69 = @memberref +| 70 = @annotatedtypeaccess +| 71 = @typeannotation +| 72 = @intersectiontypeaccess +| 73 = @switchexpr +; + +@classinstancexpr = @newexpr | @lambdaexpr | @memberref + +@annotation = @declannotation | @typeannotation +@typeaccess = @unannotatedtypeaccess | @annotatedtypeaccess + +@assignment = @assignexpr + | @assignop; + +@unaryassignment = @postincexpr + | @postdecexpr + | @preincexpr + | @predecexpr; + +@assignop = @assignaddexpr + | @assignsubexpr + | @assignmulexpr + | @assigndivexpr + | @assignremexpr + | @assignandexpr + | @assignorexpr + | @assignxorexpr + | @assignlshiftexpr + | @assignrshiftexpr + | @assignurshiftexpr; + +@literal = @booleanliteral + | @integerliteral + | @longliteral + | @floatingpointliteral + | @doubleliteral + | @characterliteral + | @stringliteral + | @nullliteral; + +@binaryexpr = @mulexpr + | @divexpr + | @remexpr + | @addexpr + | @subexpr + | @lshiftexpr + | @rshiftexpr + | @urshiftexpr + | @andbitexpr + | @orbitexpr + | @xorbitexpr + | @andlogicalexpr + | @orlogicalexpr + | @ltexpr + | @gtexpr + | @leexpr + | @geexpr + | @eqexpr + | @neexpr; + +@unaryexpr = @postincexpr + | @postdecexpr + | @preincexpr + | @predecexpr + | 
@minusexpr + | @plusexpr + | @bitnotexpr + | @lognotexpr; + +@caller = @classinstancexpr + | @methodaccess + | @constructorinvocationstmt + | @superconstructorinvocationstmt; + +callableBinding( + unique int callerid: @caller ref, + int callee: @callable ref +); + +memberRefBinding( + unique int id: @expr ref, + int callable: @callable ref +); + +@exprparent = @stmt | @expr | @callable | @field | @fielddecl | @class | @interface | @param | @localvar | @typevariable; + +variableBinding( + unique int expr: @varaccess ref, + int variable: @variable ref +); + +@variable = @localscopevariable | @field; + +@localscopevariable = @localvar | @param; + +localvars( + unique int id: @localvar, + string nodeName: string ref, + int typeid: @type ref, + int parentid: @localvariabledeclexpr ref +); + +@namedexprorstmt = @breakstmt + | @continuestmt + | @labeledstmt + | @literal; + +namestrings( + string name: string ref, + string value: string ref, + unique int parent: @namedexprorstmt ref +); + +/* + * Modules + */ + +#keyset[name] +modules( + unique int id: @module, + string name: string ref +); + +isOpen( + int id: @module ref +); + +#keyset[fileId] +cumodule( + int fileId: @file ref, + int moduleId: @module ref +); + +@directive = @requires + | @exports + | @opens + | @uses + | @provides + +#keyset[directive] +directives( + int id: @module ref, + int directive: @directive ref +); + +requires( + unique int id: @requires, + int target: @module ref +); + +isTransitive( + int id: @requires ref +); + +isStatic( + int id: @requires ref +); + +exports( + unique int id: @exports, + int target: @package ref +); + +exportsTo( + int id: @exports ref, + int target: @module ref +); + +opens( + unique int id: @opens, + int target: @package ref +); + +opensTo( + int id: @opens ref, + int target: @module ref +); + +uses( + unique int id: @uses, + string serviceInterface: string ref +); + +provides( + unique int id: @provides, + string serviceInterface: string ref +); + +providesWith( + int 
id: @provides ref, + string serviceImpl: string ref +); + +/* + * Javadoc + */ + +javadoc( + unique int id: @javadoc +); + +isNormalComment( + int commentid : @javadoc ref +); + +isEolComment( + int commentid : @javadoc ref +); + +hasJavadoc( + int documentableid: @member ref, + int javadocid: @javadoc ref +); + +#keyset[parentid,idx] +javadocTag( + unique int id: @javadocTag, + string name: string ref, + int parentid: @javadocParent ref, + int idx: int ref +); + +#keyset[parentid,idx] +javadocText( + unique int id: @javadocText, + string text: string ref, + int parentid: @javadocParent ref, + int idx: int ref +); + +@javadocParent = @javadoc | @javadocTag; +@javadocElement = @javadocTag | @javadocText; + +@typeorpackage = @type | @package; + +@typeorcallable = @type | @callable; +@classorinterface = @interface | @class; +@boundedtype = @typevariable | @wildcard; +@reftype = @classorinterface | @array | @boundedtype; +@classorarray = @class | @array; +@type = @primitive | @reftype; +@callable = @method | @constructor; +@element = @file | @package | @primitive | @class | @interface | @method | @constructor | @modifier | @param | @exception | @field | + @annotation | @boundedtype | @array | @localvar | @expr | @stmt | @import | @fielddecl; + +@modifiable = @member_modifiable| @param | @localvar ; + +@member_modifiable = @class | @interface | @method | @constructor | @field ; + +@member = @method | @constructor | @field | @reftype ; + +@locatable = @file | @class | @interface | @fielddecl | @field | @constructor | @method | @param | @exception + | @boundedtype | @typebound | @array | @primitive + | @import | @stmt | @expr | @localvar | @javadoc | @javadocTag | @javadocText + | @xmllocatable; + +@top = @element | @locatable | @folder; + +/* + * XML Files + */ + +xmlEncoding( + unique int id: @file ref, + string encoding: string ref +); + +xmlDTDs( + unique int id: @xmldtd, + string root: string ref, + string publicId: string ref, + string systemId: string ref, + int 
fileid: @file ref +); + +xmlElements( + unique int id: @xmlelement, + string name: string ref, + int parentid: @xmlparent ref, + int idx: int ref, + int fileid: @file ref +); + +xmlAttrs( + unique int id: @xmlattribute, + int elementid: @xmlelement ref, + string name: string ref, + string value: string ref, + int idx: int ref, + int fileid: @file ref +); + +xmlNs( + int id: @xmlnamespace, + string prefixName: string ref, + string URI: string ref, + int fileid: @file ref +); + +xmlHasNs( + int elementId: @xmlnamespaceable ref, + int nsId: @xmlnamespace ref, + int fileid: @file ref +); + +xmlComments( + unique int id: @xmlcomment, + string text: string ref, + int parentid: @xmlparent ref, + int fileid: @file ref +); + +xmlChars( + unique int id: @xmlcharacters, + string text: string ref, + int parentid: @xmlparent ref, + int idx: int ref, + int isCDATA: int ref, + int fileid: @file ref +); + +@xmlparent = @file | @xmlelement; +@xmlnamespaceable = @xmlelement | @xmlattribute; + +xmllocations( + int xmlElement: @xmllocatable ref, + int location: @location_default ref +); + +@xmllocatable = @xmlcharacters | @xmlelement | @xmlcomment | @xmlattribute | @xmldtd | @file | @xmlnamespace; + +/* + * configuration files with key value pairs + */ + +configs( + unique int id: @config +); + +configNames( + unique int id: @configName, + int config: @config ref, + string name: string ref +); + +configValues( + unique int id: @configValue, + int config: @config ref, + string value: string ref +); + +configLocations( + int locatable: @configLocatable ref, + int location: @location_default ref +); + +@configLocatable = @config | @configName | @configValue; diff --git a/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/semmlecode.dbscheme b/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/semmlecode.dbscheme new file mode 100755 index 00000000000..2a682863863 --- /dev/null +++ b/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/semmlecode.dbscheme @@ -0,0 +1,954 @@ +/** 
+ * An invocation of the compiler. Note that more than one file may be + * compiled per invocation. For example, this command compiles three + * source files: + * + * javac A.java B.java C.java + * + * The `id` simply identifies the invocation, while `cwd` is the working + * directory from which the compiler was invoked. + */ +compilations( + /** + * An invocation of the compiler. Note that more than one file may + * be compiled per invocation. For example, this command compiles + * three source files: + * + * javac A.java B.java C.java + */ + unique int id : @compilation, + string cwd : string ref +); + +/** + * The arguments that were passed to the extractor for a compiler + * invocation. If `id` is for the compiler invocation + * + * javac A.java B.java C.java + * + * then typically there will be rows for + * + * num | arg + * --- | --- + * 0 | *path to extractor* + * 1 | `--javac-args` + * 2 | A.java + * 3 | B.java + * 4 | C.java + */ +#keyset[id, num] +compilation_args( + int id : @compilation ref, + int num : int ref, + string arg : string ref +); + +/** + * The source files that are compiled by a compiler invocation. + * If `id` is for the compiler invocation + * + * javac A.java B.java C.java + * + * then there will be rows for + * + * num | arg + * --- | --- + * 0 | A.java + * 1 | B.java + * 2 | C.java + */ +#keyset[id, num] +compilation_compiling_files( + int id : @compilation ref, + int num : int ref, + int file : @file ref +); + +/** + * The time taken by the extractor for a compiler invocation. 
+ * + * For each file `num`, there will be rows for + * + * kind | seconds + * ---- | --- + * 1 | CPU seconds used by the extractor frontend + * 2 | Elapsed seconds during the extractor frontend + * 3 | CPU seconds used by the extractor backend + * 4 | Elapsed seconds during the extractor backend + */ +#keyset[id, num, kind] +compilation_time( + int id : @compilation ref, + int num : int ref, + /* kind: + 1 = frontend_cpu_seconds + 2 = frontend_elapsed_seconds + 3 = extractor_cpu_seconds + 4 = extractor_elapsed_seconds + */ + int kind : int ref, + float seconds : float ref +); + +/** + * An error or warning generated by the extractor. + * The diagnostic message `diagnostic` was generated during compiler + * invocation `compilation`, and is the `file_number_diagnostic_number`th + * message generated while extracting the `file_number`th file of that + * invocation. + */ +#keyset[compilation, file_number, file_number_diagnostic_number] +diagnostic_for( + unique int diagnostic : @diagnostic ref, + int compilation : @compilation ref, + int file_number : int ref, + int file_number_diagnostic_number : int ref +); + +/** + * If extraction was successful, then `cpu_seconds` and + * `elapsed_seconds` are the CPU time and elapsed time (respectively) + * that extraction took for compiler invocation `id`. 
+ */ +compilation_finished( + unique int id : @compilation ref, + float cpu_seconds : float ref, + float elapsed_seconds : float ref +); + +diagnostics( + unique int id: @diagnostic, + int severity: int ref, + string error_tag: string ref, + string error_message: string ref, + string full_error_message: string ref, + int location: @location_default ref +); + +/* + * External artifacts + */ + +externalData( + int id : @externalDataElement, + string path : string ref, + int column: int ref, + string value : string ref +); + +snapshotDate( + unique date snapshotDate : date ref +); + +sourceLocationPrefix( + string prefix : string ref +); + +/* + * Duplicate code + */ + +duplicateCode( + unique int id : @duplication, + string relativePath : string ref, + int equivClass : int ref +); + +similarCode( + unique int id : @similarity, + string relativePath : string ref, + int equivClass : int ref +); + +@duplication_or_similarity = @duplication | @similarity + +tokens( + int id : @duplication_or_similarity ref, + int offset : int ref, + int beginLine : int ref, + int beginColumn : int ref, + int endLine : int ref, + int endColumn : int ref +); + +/* + * Locations and files + */ + +@location = @location_default ; + +locations_default( + unique int id: @location_default, + int file: @file ref, + int beginLine: int ref, + int beginColumn: int ref, + int endLine: int ref, + int endColumn: int ref +); + +hasLocation( + int locatableid: @locatable ref, + int id: @location ref +); + +@sourceline = @locatable ; + +#keyset[element_id] +numlines( + int element_id: @sourceline ref, + int num_lines: int ref, + int num_code: int ref, + int num_comment: int ref +); + +files( + unique int id: @file, + string name: string ref, + string simple: string ref, + string ext: string ref, + int fromSource: int ref // deprecated +); + +folders( + unique int id: @folder, + string name: string ref, + string simple: string ref +); + +@container = @folder | @file + +containerparent( + int parent: 
@container ref, + unique int child: @container ref +); + +/* + * Java + */ + +cupackage( + unique int id: @file ref, + int packageid: @package ref +); + +#keyset[fileid,keyName] +jarManifestMain( + int fileid: @file ref, + string keyName: string ref, + string value: string ref +); + +#keyset[fileid,entryName,keyName] +jarManifestEntries( + int fileid: @file ref, + string entryName: string ref, + string keyName: string ref, + string value: string ref +); + +packages( + unique int id: @package, + string nodeName: string ref +); + +primitives( + unique int id: @primitive, + string nodeName: string ref +); + +modifiers( + unique int id: @modifier, + string nodeName: string ref +); + +classes( + unique int id: @class, + string nodeName: string ref, + int parentid: @package ref, + int sourceid: @class ref +); + +isRecord( + unique int id: @class ref +); + +interfaces( + unique int id: @interface, + string nodeName: string ref, + int parentid: @package ref, + int sourceid: @interface ref +); + +fielddecls( + unique int id: @fielddecl, + int parentid: @reftype ref +); + +#keyset[fieldId] #keyset[fieldDeclId,pos] +fieldDeclaredIn( + int fieldId: @field ref, + int fieldDeclId: @fielddecl ref, + int pos: int ref +); + +fields( + unique int id: @field, + string nodeName: string ref, + int typeid: @type ref, + int parentid: @reftype ref, + int sourceid: @field ref +); + +constrs( + unique int id: @constructor, + string nodeName: string ref, + string signature: string ref, + int typeid: @type ref, + int parentid: @reftype ref, + int sourceid: @constructor ref +); + +methods( + unique int id: @method, + string nodeName: string ref, + string signature: string ref, + int typeid: @type ref, + int parentid: @reftype ref, + int sourceid: @method ref +); + +#keyset[parentid,pos] +params( + unique int id: @param, + int typeid: @type ref, + int pos: int ref, + int parentid: @callable ref, + int sourceid: @param ref +); + +paramName( + unique int id: @param ref, + string nodeName: string 
ref +); + +isVarargsParam( + int param: @param ref +); + +exceptions( + unique int id: @exception, + int typeid: @type ref, + int parentid: @callable ref +); + +isAnnotType( + int interfaceid: @interface ref +); + +isAnnotElem( + int methodid: @method ref +); + +annotValue( + int parentid: @annotation ref, + int id2: @method ref, + unique int value: @expr ref +); + +isEnumType( + int classid: @class ref +); + +isEnumConst( + int fieldid: @field ref +); + +#keyset[parentid,pos] +typeVars( + unique int id: @typevariable, + string nodeName: string ref, + int pos: int ref, + int kind: int ref, // deprecated + int parentid: @typeorcallable ref +); + +wildcards( + unique int id: @wildcard, + string nodeName: string ref, + int kind: int ref +); + +#keyset[parentid,pos] +typeBounds( + unique int id: @typebound, + int typeid: @reftype ref, + int pos: int ref, + int parentid: @boundedtype ref +); + +#keyset[parentid,pos] +typeArgs( + int argumentid: @reftype ref, + int pos: int ref, + int parentid: @typeorcallable ref +); + +isParameterized( + int memberid: @member ref +); + +isRaw( + int memberid: @member ref +); + +erasure( + unique int memberid: @member ref, + int erasureid: @member ref +); + +#keyset[classid] #keyset[parent] +isAnonymClass( + int classid: @class ref, + int parent: @classinstancexpr ref +); + +#keyset[classid] #keyset[parent] +isLocalClass( + int classid: @class ref, + int parent: @localclassdeclstmt ref +); + +isDefConstr( + int constructorid: @constructor ref +); + +#keyset[exprId] +lambdaKind( + int exprId: @lambdaexpr ref, + int bodyKind: int ref +); + +arrays( + unique int id: @array, + string nodeName: string ref, + int elementtypeid: @type ref, + int dimension: int ref, + int componenttypeid: @type ref +); + +enclInReftype( + unique int child: @reftype ref, + int parent: @reftype ref +); + +extendsReftype( + int id1: @reftype ref, + int id2: @classorinterface ref +); + +implInterface( + int id1: @classorarray ref, + int id2: @interface ref +); + 
+hasModifier( + int id1: @modifiable ref, + int id2: @modifier ref +); + +imports( + unique int id: @import, + int holder: @typeorpackage ref, + string name: string ref, + int kind: int ref +); + +#keyset[parent,idx] +stmts( + unique int id: @stmt, + int kind: int ref, + int parent: @stmtparent ref, + int idx: int ref, + int bodydecl: @callable ref +); + +@stmtparent = @callable | @stmt | @switchexpr; + +case @stmt.kind of + 0 = @block +| 1 = @ifstmt +| 2 = @forstmt +| 3 = @enhancedforstmt +| 4 = @whilestmt +| 5 = @dostmt +| 6 = @trystmt +| 7 = @switchstmt +| 8 = @synchronizedstmt +| 9 = @returnstmt +| 10 = @throwstmt +| 11 = @breakstmt +| 12 = @continuestmt +| 13 = @emptystmt +| 14 = @exprstmt +| 15 = @labeledstmt +| 16 = @assertstmt +| 17 = @localvariabledeclstmt +| 18 = @localclassdeclstmt +| 19 = @constructorinvocationstmt +| 20 = @superconstructorinvocationstmt +| 21 = @case +| 22 = @catchclause +| 23 = @yieldstmt +; + +#keyset[parent,idx] +exprs( + unique int id: @expr, + int kind: int ref, + int typeid: @type ref, + int parent: @exprparent ref, + int idx: int ref +); + +callableEnclosingExpr( + unique int id: @expr ref, + int callable_id: @callable ref +); + +statementEnclosingExpr( + unique int id: @expr ref, + int statement_id: @stmt ref +); + +isParenthesized( + unique int id: @expr ref, + int parentheses: int ref +); + +case @expr.kind of + 1 = @arrayaccess +| 2 = @arraycreationexpr +| 3 = @arrayinit +| 4 = @assignexpr +| 5 = @assignaddexpr +| 6 = @assignsubexpr +| 7 = @assignmulexpr +| 8 = @assigndivexpr +| 9 = @assignremexpr +| 10 = @assignandexpr +| 11 = @assignorexpr +| 12 = @assignxorexpr +| 13 = @assignlshiftexpr +| 14 = @assignrshiftexpr +| 15 = @assignurshiftexpr +| 16 = @booleanliteral +| 17 = @integerliteral +| 18 = @longliteral +| 19 = @floatingpointliteral +| 20 = @doubleliteral +| 21 = @characterliteral +| 22 = @stringliteral +| 23 = @nullliteral +| 24 = @mulexpr +| 25 = @divexpr +| 26 = @remexpr +| 27 = @addexpr +| 28 = @subexpr +| 29 = 
@lshiftexpr +| 30 = @rshiftexpr +| 31 = @urshiftexpr +| 32 = @andbitexpr +| 33 = @orbitexpr +| 34 = @xorbitexpr +| 35 = @andlogicalexpr +| 36 = @orlogicalexpr +| 37 = @ltexpr +| 38 = @gtexpr +| 39 = @leexpr +| 40 = @geexpr +| 41 = @eqexpr +| 42 = @neexpr +| 43 = @postincexpr +| 44 = @postdecexpr +| 45 = @preincexpr +| 46 = @predecexpr +| 47 = @minusexpr +| 48 = @plusexpr +| 49 = @bitnotexpr +| 50 = @lognotexpr +| 51 = @castexpr +| 52 = @newexpr +| 53 = @conditionalexpr +| 54 = @parexpr // deprecated +| 55 = @instanceofexpr +| 56 = @localvariabledeclexpr +| 57 = @typeliteral +| 58 = @thisaccess +| 59 = @superaccess +| 60 = @varaccess +| 61 = @methodaccess +| 62 = @unannotatedtypeaccess +| 63 = @arraytypeaccess +| 64 = @packageaccess +| 65 = @wildcardtypeaccess +| 66 = @declannotation +| 67 = @uniontypeaccess +| 68 = @lambdaexpr +| 69 = @memberref +| 70 = @annotatedtypeaccess +| 71 = @typeannotation +| 72 = @intersectiontypeaccess +| 73 = @switchexpr +; + +@classinstancexpr = @newexpr | @lambdaexpr | @memberref + +@annotation = @declannotation | @typeannotation +@typeaccess = @unannotatedtypeaccess | @annotatedtypeaccess + +@assignment = @assignexpr + | @assignop; + +@unaryassignment = @postincexpr + | @postdecexpr + | @preincexpr + | @predecexpr; + +@assignop = @assignaddexpr + | @assignsubexpr + | @assignmulexpr + | @assigndivexpr + | @assignremexpr + | @assignandexpr + | @assignorexpr + | @assignxorexpr + | @assignlshiftexpr + | @assignrshiftexpr + | @assignurshiftexpr; + +@literal = @booleanliteral + | @integerliteral + | @longliteral + | @floatingpointliteral + | @doubleliteral + | @characterliteral + | @stringliteral + | @nullliteral; + +@binaryexpr = @mulexpr + | @divexpr + | @remexpr + | @addexpr + | @subexpr + | @lshiftexpr + | @rshiftexpr + | @urshiftexpr + | @andbitexpr + | @orbitexpr + | @xorbitexpr + | @andlogicalexpr + | @orlogicalexpr + | @ltexpr + | @gtexpr + | @leexpr + | @geexpr + | @eqexpr + | @neexpr; + +@unaryexpr = @postincexpr + | @postdecexpr 
+ | @preincexpr + | @predecexpr + | @minusexpr + | @plusexpr + | @bitnotexpr + | @lognotexpr; + +@caller = @classinstancexpr + | @methodaccess + | @constructorinvocationstmt + | @superconstructorinvocationstmt; + +callableBinding( + unique int callerid: @caller ref, + int callee: @callable ref +); + +memberRefBinding( + unique int id: @expr ref, + int callable: @callable ref +); + +@exprparent = @stmt | @expr | @callable | @field | @fielddecl | @class | @interface | @param | @localvar | @typevariable; + +variableBinding( + unique int expr: @varaccess ref, + int variable: @variable ref +); + +@variable = @localscopevariable | @field; + +@localscopevariable = @localvar | @param; + +localvars( + unique int id: @localvar, + string nodeName: string ref, + int typeid: @type ref, + int parentid: @localvariabledeclexpr ref +); + +@namedexprorstmt = @breakstmt + | @continuestmt + | @labeledstmt + | @literal; + +namestrings( + string name: string ref, + string value: string ref, + unique int parent: @namedexprorstmt ref +); + +/* + * Modules + */ + +#keyset[name] +modules( + unique int id: @module, + string name: string ref +); + +isOpen( + int id: @module ref +); + +#keyset[fileId] +cumodule( + int fileId: @file ref, + int moduleId: @module ref +); + +@directive = @requires + | @exports + | @opens + | @uses + | @provides + +#keyset[directive] +directives( + int id: @module ref, + int directive: @directive ref +); + +requires( + unique int id: @requires, + int target: @module ref +); + +isTransitive( + int id: @requires ref +); + +isStatic( + int id: @requires ref +); + +exports( + unique int id: @exports, + int target: @package ref +); + +exportsTo( + int id: @exports ref, + int target: @module ref +); + +opens( + unique int id: @opens, + int target: @package ref +); + +opensTo( + int id: @opens ref, + int target: @module ref +); + +uses( + unique int id: @uses, + string serviceInterface: string ref +); + +provides( + unique int id: @provides, + string serviceInterface: 
string ref +); + +providesWith( + int id: @provides ref, + string serviceImpl: string ref +); + +/* + * Javadoc + */ + +javadoc( + unique int id: @javadoc +); + +isNormalComment( + int commentid : @javadoc ref +); + +isEolComment( + int commentid : @javadoc ref +); + +hasJavadoc( + int documentableid: @member ref, + int javadocid: @javadoc ref +); + +#keyset[parentid,idx] +javadocTag( + unique int id: @javadocTag, + string name: string ref, + int parentid: @javadocParent ref, + int idx: int ref +); + +#keyset[parentid,idx] +javadocText( + unique int id: @javadocText, + string text: string ref, + int parentid: @javadocParent ref, + int idx: int ref +); + +@javadocParent = @javadoc | @javadocTag; +@javadocElement = @javadocTag | @javadocText; + +@typeorpackage = @type | @package; + +@typeorcallable = @type | @callable; +@classorinterface = @interface | @class; +@boundedtype = @typevariable | @wildcard; +@reftype = @classorinterface | @array | @boundedtype; +@classorarray = @class | @array; +@type = @primitive | @reftype; +@callable = @method | @constructor; +@element = @file | @package | @primitive | @class | @interface | @method | @constructor | @modifier | @param | @exception | @field | + @annotation | @boundedtype | @array | @localvar | @expr | @stmt | @import | @fielddecl; + +@modifiable = @member_modifiable| @param | @localvar ; + +@member_modifiable = @class | @interface | @method | @constructor | @field ; + +@member = @method | @constructor | @field | @reftype ; + +@locatable = @file | @class | @interface | @fielddecl | @field | @constructor | @method | @param | @exception + | @boundedtype | @typebound | @array | @primitive + | @import | @stmt | @expr | @localvar | @javadoc | @javadocTag | @javadocText + | @xmllocatable; + +@top = @element | @locatable | @folder; + +/* + * XML Files + */ + +xmlEncoding( + unique int id: @file ref, + string encoding: string ref +); + +xmlDTDs( + unique int id: @xmldtd, + string root: string ref, + string publicId: string ref, + 
string systemId: string ref, + int fileid: @file ref +); + +xmlElements( + unique int id: @xmlelement, + string name: string ref, + int parentid: @xmlparent ref, + int idx: int ref, + int fileid: @file ref +); + +xmlAttrs( + unique int id: @xmlattribute, + int elementid: @xmlelement ref, + string name: string ref, + string value: string ref, + int idx: int ref, + int fileid: @file ref +); + +xmlNs( + int id: @xmlnamespace, + string prefixName: string ref, + string URI: string ref, + int fileid: @file ref +); + +xmlHasNs( + int elementId: @xmlnamespaceable ref, + int nsId: @xmlnamespace ref, + int fileid: @file ref +); + +xmlComments( + unique int id: @xmlcomment, + string text: string ref, + int parentid: @xmlparent ref, + int fileid: @file ref +); + +xmlChars( + unique int id: @xmlcharacters, + string text: string ref, + int parentid: @xmlparent ref, + int idx: int ref, + int isCDATA: int ref, + int fileid: @file ref +); + +@xmlparent = @file | @xmlelement; +@xmlnamespaceable = @xmlelement | @xmlattribute; + +xmllocations( + int xmlElement: @xmllocatable ref, + int location: @location_default ref +); + +@xmllocatable = @xmlcharacters | @xmlelement | @xmlcomment | @xmlattribute | @xmldtd | @file | @xmlnamespace; + +/* + * configuration files with key value pairs + */ + +configs( + unique int id: @config +); + +configNames( + unique int id: @configName, + int config: @config ref, + string name: string ref +); + +configValues( + unique int id: @configValue, + int config: @config ref, + string value: string ref +); + +configLocations( + int locatable: @configLocatable ref, + int location: @location_default ref +); + +@configLocatable = @config | @configName | @configValue; diff --git a/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/upgrade.properties b/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/upgrade.properties new file mode 100644 index 00000000000..d42c96673ad --- /dev/null +++ 
b/java/upgrades/054d7e823b2c5b93bf2a14d5c22a107934fbc133/upgrade.properties @@ -0,0 +1,2 @@ +description: Java 14: add `isRecord` relation +compatibility: backwards diff --git a/javascript/ql/src/Declarations/DeadStoreOfLocal.ql b/javascript/ql/src/Declarations/DeadStoreOfLocal.ql index 8fcd88724af..1002ac0a4e3 100644 --- a/javascript/ql/src/Declarations/DeadStoreOfLocal.ql +++ b/javascript/ql/src/Declarations/DeadStoreOfLocal.ql @@ -63,8 +63,10 @@ where ( // To avoid confusion about the meaning of "definition" and "declaration" we avoid // the term "definition" when the alert location is a variable declaration. - if dead instanceof VariableDeclarator + if + dead instanceof VariableDeclarator and + not exists(SsaImplicitInit init | init.getVariable().getSourceVariable() = v) // the variable is dead at the hoisted implicit initialization. then msg = "The initial value of " + v.getName() + " is unused, since it is always overwritten." - else msg = "This definition of " + v.getName() + " is useless, since its value is never read." + else msg = "The value assigned to " + v.getName() + " here is unused." 
) select dead, msg diff --git a/javascript/ql/src/Expressions/MissingAwait.ql b/javascript/ql/src/Expressions/MissingAwait.ql index be40eef0d4b..e4c092af5c9 100644 --- a/javascript/ql/src/Expressions/MissingAwait.ql +++ b/javascript/ql/src/Expressions/MissingAwait.ql @@ -45,7 +45,8 @@ predicate isBadPromiseContext(Expr expr) { or exists(UnaryExpr e | expr = e.getOperand() and - not e instanceof VoidExpr + not e instanceof VoidExpr and + not e instanceof DeleteExpr ) or expr = any(UpdateExpr e).getOperand() diff --git a/javascript/ql/src/Security/CWE-020/IncompleteUrlSchemeCheck.ql b/javascript/ql/src/Security/CWE-020/IncompleteUrlSchemeCheck.ql index 86ab1744086..0a9314382f2 100644 --- a/javascript/ql/src/Security/CWE-020/IncompleteUrlSchemeCheck.ql +++ b/javascript/ql/src/Security/CWE-020/IncompleteUrlSchemeCheck.ql @@ -12,11 +12,46 @@ */ import javascript -import semmle.javascript.dataflow.internal.AccessPaths /** A URL scheme that can be used to represent executable code. */ class DangerousScheme extends string { DangerousScheme() { this = "data:" or this = "javascript:" or this = "vbscript:" } + + /** Gets the name of this scheme without the `:`. */ + string getWithoutColon() { this = result + ":" } + + /** Gets the name of this scheme, with or without the `:`. */ + string getWithOrWithoutColon() { result = this or result = getWithoutColon() } +} + +/** Returns a node that refers to the scheme of `url`. 
*/ +DataFlow::SourceNode schemeOf(DataFlow::Node url) { + // url.split(":")[0] + exists(DataFlow::MethodCallNode split | + split.getMethodName() = "split" and + split.getArgument(0).getStringValue() = ":" and + result = split.getAPropertyRead("0") and + url = split.getReceiver() + ) + or + // url.getScheme(), url.getProtocol(), getScheme(url), getProtocol(url) + exists(DataFlow::CallNode call | + result = call and + (call.getCalleeName() = "getScheme" or call.getCalleeName() = "getProtocol") + | + call.getNumArgument() = 1 and + url = call.getArgument(0) + or + call.getNumArgument() = 0 and + url = call.getReceiver() + ) + or + // url.scheme, url.protocol + exists(DataFlow::PropRead prop | + result = prop and + (prop.getPropertyName() = "scheme" or prop.getPropertyName() = "protocol") and + url = prop.getBase() + ) } /** Gets a data-flow node that checks `nd` against the given `scheme`. */ @@ -27,6 +62,20 @@ DataFlow::Node schemeCheck(DataFlow::Node nd, DangerousScheme scheme) { sw.getSubstring().mayHaveStringValue(scheme) ) or + // check of the form `array.includes(getScheme(nd))` + exists(InclusionTest test, DataFlow::ArrayCreationNode array | test = result | + schemeOf(nd).flowsTo(test.getContainedNode()) and + array.flowsTo(test.getContainerNode()) and + array.getAnElement().mayHaveStringValue(scheme.getWithOrWithoutColon()) + ) + or + // check of the form `getScheme(nd) === scheme` + exists(EqualityTest test, Expr op1, Expr op2 | test.flow() = result | + test.hasOperands(op1, op2) and + schemeOf(nd).flowsToExpr(op1) and + op2.mayHaveStringValue(scheme.getWithOrWithoutColon()) + ) + or // propagate through trimming, case conversion, and regexp replace exists(DataFlow::MethodCallNode stringop | stringop.getMethodName().matches("trim%") or @@ -42,14 +91,14 @@ DataFlow::Node schemeCheck(DataFlow::Node nd, DangerousScheme scheme) { } /** Gets a data-flow node that checks an instance of `ap` against the given `scheme`. 
*/ -DataFlow::Node schemeCheckOn(AccessPath ap, DangerousScheme scheme) { - result = schemeCheck(ap.getAnInstance().flow(), scheme) +DataFlow::Node schemeCheckOn(DataFlow::SourceNode root, string path, DangerousScheme scheme) { + result = schemeCheck(AccessPath::getAReferenceTo(root, path), scheme) } -from AccessPath ap, int n +from DataFlow::SourceNode root, string path, int n where n = strictcount(DangerousScheme s) and - strictcount(DangerousScheme s | exists(schemeCheckOn(ap, s))) < n -select schemeCheckOn(ap, "javascript:"), + strictcount(DangerousScheme s | exists(schemeCheckOn(root, path, s))) < n +select schemeCheckOn(root, path, "javascript:"), "This check does not consider " + - strictconcat(DangerousScheme s | not exists(schemeCheckOn(ap, s)) | s, " and ") + "." + strictconcat(DangerousScheme s | not exists(schemeCheckOn(root, path, s)) | s, " and ") + "." diff --git a/javascript/ql/src/codeql-suites/javascript-code-scanning.qls b/javascript/ql/src/codeql-suites/javascript-code-scanning.qls new file mode 100644 index 00000000000..f87a55157a2 --- /dev/null +++ b/javascript/ql/src/codeql-suites/javascript-code-scanning.qls @@ -0,0 +1,4 @@ +- description: Standard Code Scanning queries for JavaScript +- qlpack: codeql-javascript +- apply: code-scanning-selectors.yml + from: codeql-suite-helpers diff --git a/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPath.qll b/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPath.qll index 06b0a6eb9e6..3a8a20895d8 100644 --- a/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPath.qll +++ b/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPath.qll @@ -206,6 +206,20 @@ module TaintedPath { dstlabel.isNormalized() ) or + // foo.replace(/(\.\.\/)*/, "") and similar + exists(DotDotSlashPrefixRemovingReplace call | + src = call.getInput() and + dst = call.getOutput() + | + // the 4 possible combinations of normalized + relative for `srclabel`, and the possible values for 
`dstlabel` in each case. + srclabel.isNonNormalized() and srclabel.isRelative() // raw + relative -> any() + or + srclabel.isNormalized() and srclabel.isAbsolute() and srclabel = dstlabel // normalized + absolute -> normalized + absolute + or + srclabel.isNonNormalized() and srclabel.isAbsolute() and dstlabel.isAbsolute() // raw + absolute -> raw/normalized + absolute + // normalized + relative -> none() + ) + or // path.join() exists(DataFlow::CallNode join, int n | join = NodeJSLib::Path::moduleMember("join").getACall() diff --git a/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPathCustomizations.qll b/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPathCustomizations.qll index c55fa8bad33..66708b4630e 100644 --- a/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPathCustomizations.qll +++ b/javascript/ql/src/semmle/javascript/security/dataflow/TaintedPathCustomizations.qll @@ -225,7 +225,8 @@ module TaintedPath { term.getAMatchedString() = "/" or term.getAMatchedString() = "." or term.getAMatchedString() = ".." - ) + ) and + not this instanceof DotDotSlashPrefixRemovingReplace } /** @@ -239,6 +240,57 @@ module TaintedPath { DataFlow::Node getOutput() { result = output } } + /** + * A call that removes all instances of "../" in the prefix of the string. 
+ */ + class DotDotSlashPrefixRemovingReplace extends DataFlow::CallNode { + DataFlow::Node input; + DataFlow::Node output; + + DotDotSlashPrefixRemovingReplace() { + this.getCalleeName() = "replace" and + input = getReceiver() and + output = this and + exists(RegExpLiteral literal, RegExpTerm term | + getArgument(0).getALocalSource().asExpr() = literal and + (term instanceof RegExpStar or term instanceof RegExpPlus) and + term.getChild(0) = getADotDotSlashMatcher() + | + literal.getRoot() = term + or + exists(RegExpSequence seq | seq.getNumChild() = 2 and literal.getRoot() = seq | + seq.getChild(0) instanceof RegExpCaret and + seq.getChild(1) = term + ) + ) + } + + /** + * Gets the input path to be sanitized. + */ + DataFlow::Node getInput() { result = input } + + /** + * Gets the path where prefix "../" has been removed. + */ + DataFlow::Node getOutput() { result = output } + } + + /** + * Gets a RegExpTerm that matches a variation of "../". + */ + private RegExpTerm getADotDotSlashMatcher() { + result.getAMatchedString() = "../" + or + exists(RegExpSequence seq | seq = result | + seq.getChild(0).getConstantValue() = "." and + seq.getChild(1).getConstantValue() = "." and + seq.getAChild().getAMatchedString() = "/" + ) + or + exists(RegExpGroup group | result = group | group.getChild(0) = getADotDotSlashMatcher()) + } + /** * A call that removes all "." or ".." from a path, without also removing all forward slashes. */ diff --git a/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/DeadStoreOfLocal.expected b/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/DeadStoreOfLocal.expected index 52143fd3d7b..88b5fc55b25 100644 --- a/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/DeadStoreOfLocal.expected +++ b/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/DeadStoreOfLocal.expected @@ -1,12 +1,13 @@ -| overload.ts:10:12:10:14 | baz | This definition of baz is useless, since its value is never read. 
| +| overload.ts:10:12:10:14 | baz | The value assigned to baz here is unused. | | tst2.js:26:9:26:14 | x = 23 | The initial value of x is unused, since it is always overwritten. | -| tst2.js:28:9:28:14 | x = 42 | This definition of x is useless, since its value is never read. | -| tst3.js:2:1:2:36 | exports ... a: 23 } | This definition of exports is useless, since its value is never read. | -| tst3b.js:2:18:2:36 | exports = { a: 23 } | This definition of exports is useless, since its value is never read. | -| tst.js:6:2:6:7 | y = 23 | This definition of y is useless, since its value is never read. | +| tst2.js:28:9:28:14 | x = 42 | The value assigned to x here is unused. | +| tst3.js:2:1:2:36 | exports ... a: 23 } | The value assigned to exports here is unused. | +| tst3b.js:2:18:2:36 | exports = { a: 23 } | The value assigned to exports here is unused. | +| tst.js:6:2:6:7 | y = 23 | The value assigned to y here is unused. | | tst.js:13:6:13:11 | a = 23 | The initial value of a is unused, since it is always overwritten. | -| tst.js:13:14:13:19 | a = 42 | This definition of a is useless, since its value is never read. | +| tst.js:13:14:13:19 | a = 42 | The value assigned to a here is unused. | | tst.js:45:6:45:11 | x = 23 | The initial value of x is unused, since it is always overwritten. | | tst.js:51:6:51:11 | x = 23 | The initial value of x is unused, since it is always overwritten. | | tst.js:132:7:132:13 | {x} = o | The initial value of x is unused, since it is always overwritten. | | tst.js:162:6:162:14 | [x] = [0] | The initial value of x is unused, since it is always overwritten. | +| tst.js:172:7:172:17 | nSign = foo | The value assigned to nSign here is unused. 
| diff --git a/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/tst.js b/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/tst.js index c398cf3370e..f19b1656da2 100644 --- a/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/tst.js +++ b/javascript/ql/test/query-tests/Declarations/DeadStoreOfLocal/tst.js @@ -166,3 +166,11 @@ function v() { x; y; }); + +(function() { + if (something()) { + var nSign = foo; + } else { + console.log(nSign); + } +})() diff --git a/javascript/ql/test/query-tests/Expressions/MissingAwait/tsTest.ts b/javascript/ql/test/query-tests/Expressions/MissingAwait/tsTest.ts new file mode 100644 index 00000000000..4362c11a8e6 --- /dev/null +++ b/javascript/ql/test/query-tests/Expressions/MissingAwait/tsTest.ts @@ -0,0 +1,5 @@ +declare let cache: { [x: string]: Promise }; + +function deleteCache(x: string) { + delete cache[x]; // OK +} diff --git a/javascript/ql/test/query-tests/Expressions/MissingAwait/tsconfig.json b/javascript/ql/test/query-tests/Expressions/MissingAwait/tsconfig.json new file mode 100644 index 00000000000..82194fc7ab0 --- /dev/null +++ b/javascript/ql/test/query-tests/Expressions/MissingAwait/tsconfig.json @@ -0,0 +1,3 @@ +{ + "include": ["."] +} diff --git a/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.expected b/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.expected index de7800f58fd..05b255fad02 100644 --- a/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.expected +++ b/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.expected @@ -1 +1,5 @@ -| IncompleteUrlSchemeCheck.js:3:9:3:35 | u.start ... ript:") | This check does not consider data: and vbscript:. | +| IncompleteUrlSchemeCheck.js:5:9:5:35 | u.start ... ript:") | This check does not consider data: and vbscript:. | +| IncompleteUrlSchemeCheck.js:16:9:16:39 | badProt ... otocol) | This check does not consider vbscript:. 
| +| IncompleteUrlSchemeCheck.js:23:9:23:43 | badProt ... scheme) | This check does not consider vbscript:. | +| IncompleteUrlSchemeCheck.js:30:9:30:43 | badProt ... scheme) | This check does not consider vbscript:. | +| IncompleteUrlSchemeCheck.js:37:9:37:31 | scheme ... script" | This check does not consider data: and vbscript:. | diff --git a/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.js b/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.js index 270cff2d821..617bb224da9 100644 --- a/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.js +++ b/javascript/ql/test/query-tests/Security/CWE-020/IncompleteUrlSchemeCheck.js @@ -1,6 +1,47 @@ +import * as dummy from 'dummy'; + function sanitizeUrl(url) { let u = decodeURI(url).trim().toLowerCase(); - if (u.startsWith("javascript:")) + if (u.startsWith("javascript:")) // NOT OK + return "about:blank"; + return url; +} + +let badProtocols = ['javascript:', 'data:']; +let badProtocolNoColon = ['javascript', 'data']; +let badProtocolsGood = ['javascript:', 'data:', 'vbscript:']; + +function test2(url) { + let protocol = new URL(url).protocol; + if (badProtocols.includes(protocol)) // NOT OK + return "about:blank"; + return url; +} + +function test3(url) { + let scheme = goog.uri.utils.getScheme(url); + if (badProtocolNoColon.includes(scheme)) // NOT OK + return "about:blank"; + return url; +} + +function test4(url) { + let scheme = url.split(':')[0]; + if (badProtocolNoColon.includes(scheme)) // NOT OK + return "about:blank"; + return url; +} + +function test5(url) { + let scheme = url.split(':')[0]; + if (scheme === "javascript") // NOT OK + return "about:blank"; + return url; +} + +function test6(url) { + let protocol = new URL(url).protocol; + if (badProtocolsGood.includes(protocol)) // OK return "about:blank"; return url; } diff --git a/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.expected 
b/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.expected index ec4531fde13..fb4fcb3503e 100644 --- a/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.expected +++ b/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.expected @@ -1277,6 +1277,48 @@ nodes | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | +| TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:29:202:54 | pathMod ... 
e(path) | +| TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:202:50:202:53 | path | | normalizedPaths.js:11:7:11:27 | path | | normalizedPaths.js:11:7:11:27 | path | | normalizedPaths.js:11:7:11:27 | path | @@ -4385,6 +4427,22 @@ edges | TaintedPath.js:173:7:173:48 | path | TaintedPath.js:186:29:186:32 | path | | TaintedPath.js:173:7:173:48 | path | TaintedPath.js:186:29:186:32 | path | | TaintedPath.js:173:7:173:48 | path | TaintedPath.js:186:29:186:32 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:201:40:201:43 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:173:7:173:48 | path | 
TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:202:50:202:53 | path | +| TaintedPath.js:173:7:173:48 | path | TaintedPath.js:202:50:202:53 | path | | TaintedPath.js:173:14:173:37 | url.par ... , true) | TaintedPath.js:173:14:173:43 | url.par ... ).query | | TaintedPath.js:173:14:173:37 | url.par ... , true) | TaintedPath.js:173:14:173:43 | url.par ... ).query | | TaintedPath.js:173:14:173:37 | url.par ... , true) | TaintedPath.js:173:14:173:43 | url.par ... ).query | @@ -4561,6 +4619,62 @@ edges | TaintedPath.js:186:29:186:32 | path | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | | TaintedPath.js:186:29:186:32 | path | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | | TaintedPath.js:186:29:186:32 | path | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... 
+/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:43 | path | TaintedPath.js:201:40:201:73 | path.re ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... 
+/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:201:40:201:73 | path.re ... +/, '') | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | +| TaintedPath.js:202:29:202:54 | pathMod ... e(path) | TaintedPath.js:202:29:202:84 | pathMod ... 
+/, '') | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | +| TaintedPath.js:202:50:202:53 | path | TaintedPath.js:202:29:202:54 | pathMod ... e(path) | | normalizedPaths.js:11:7:11:27 | path | normalizedPaths.js:13:19:13:22 | path | | normalizedPaths.js:11:7:11:27 | path | normalizedPaths.js:13:19:13:22 | path | | normalizedPaths.js:11:7:11:27 | path | normalizedPaths.js:13:19:13:22 | path | @@ -6391,6 +6505,8 @@ edges | TaintedPath.js:184:29:184:53 | path.re ... /g, '') | TaintedPath.js:173:24:173:30 | req.url | TaintedPath.js:184:29:184:53 | path.re ... /g, '') | This path depends on $@. | TaintedPath.js:173:24:173:30 | req.url | a user-provided value | | TaintedPath.js:185:29:185:51 | path.re ... /g, '') | TaintedPath.js:173:24:173:30 | req.url | TaintedPath.js:185:29:185:51 | path.re ... /g, '') | This path depends on $@. | TaintedPath.js:173:24:173:30 | req.url | a user-provided value | | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | TaintedPath.js:173:24:173:30 | req.url | TaintedPath.js:186:29:186:57 | path.re ... /g, '') | This path depends on $@. | TaintedPath.js:173:24:173:30 | req.url | a user-provided value | +| TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | TaintedPath.js:173:24:173:30 | req.url | TaintedPath.js:201:29:201:73 | "prefix ... +/, '') | This path depends on $@. 
| TaintedPath.js:173:24:173:30 | req.url | a user-provided value | +| TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | TaintedPath.js:173:24:173:30 | req.url | TaintedPath.js:202:29:202:84 | pathMod ... +/, '') | This path depends on $@. | TaintedPath.js:173:24:173:30 | req.url | a user-provided value | | normalizedPaths.js:13:19:13:22 | path | normalizedPaths.js:11:14:11:27 | req.query.path | normalizedPaths.js:13:19:13:22 | path | This path depends on $@. | normalizedPaths.js:11:14:11:27 | req.query.path | a user-provided value | | normalizedPaths.js:14:19:14:29 | './' + path | normalizedPaths.js:11:14:11:27 | req.query.path | normalizedPaths.js:14:19:14:29 | './' + path | This path depends on $@. | normalizedPaths.js:11:14:11:27 | req.query.path | a user-provided value | | normalizedPaths.js:15:19:15:38 | path + '/index.html' | normalizedPaths.js:11:14:11:27 | req.query.path | normalizedPaths.js:15:19:15:38 | path + '/index.html' | This path depends on $@. | normalizedPaths.js:11:14:11:27 | req.query.path | a user-provided value | diff --git a/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.js b/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.js index 0ecf680367f..506dc280a3d 100644 --- a/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.js +++ b/javascript/ql/test/query-tests/Security/CWE-022/TaintedPath/TaintedPath.js @@ -191,4 +191,13 @@ var server = http.createServer(function(req, res) { res.write(fs.readFileSync(path.replace(/\./g, ''))); // OK res.write(fs.readFileSync(path.replace(/\.\.|BLA/g, ''))); // OK } + + // removing of "../" from prefix. 
+ res.write(fs.readFileSync("prefix" + pathModule.normalize(path).replace(/^(\.\.[\/\\])+/, ''))); // OK + res.write(fs.readFileSync("prefix" + pathModule.normalize(path).replace(/(\.\.[\/\\])+/, ''))); // OK + res.write(fs.readFileSync("prefix" + pathModule.normalize(path).replace(/(\.\.\/)+/, ''))); // OK + res.write(fs.readFileSync("prefix" + pathModule.normalize(path).replace(/(\.\.\/)*/, ''))); // OK + + res.write(fs.readFileSync("prefix" + path.replace(/^(\.\.[\/\\])+/, ''))); // NOT OK - not normalized + res.write(fs.readFileSync(pathModule.normalize(path).replace(/^(\.\.[\/\\])+/, ''))); // NOT OK (can be absolute) }); \ No newline at end of file diff --git a/misc/suite-helpers/code-scanning-selectors.yml b/misc/suite-helpers/code-scanning-selectors.yml new file mode 100644 index 00000000000..ffa40d8e4b1 --- /dev/null +++ b/misc/suite-helpers/code-scanning-selectors.yml @@ -0,0 +1,16 @@ +- description: Selectors for selecting the Code-Scanning-relevant queries for a language +- include: + kind: + - problem + - path-problem + precision: + - high + - very-high + problem.severity: + - error + - warning + tags contain: + - security +- exclude: + deprecated: // + diff --git a/python/ql/src/Functions/IterReturnsNonSelf.ql b/python/ql/src/Functions/IterReturnsNonSelf.ql index 7ca63493015..095685b749a 100644 --- a/python/ql/src/Functions/IterReturnsNonSelf.ql +++ b/python/ql/src/Functions/IterReturnsNonSelf.ql @@ -12,9 +12,7 @@ import python -Function iter_method(ClassObject t) { - result = t.lookupAttribute("__iter__").(FunctionObject).getFunction() -} +Function iter_method(ClassValue t) { result = t.lookup("__iter__").(FunctionValue).getScope() } predicate is_self(Name value, Function f) { value.getVariable() = f.getArg(0).(Name).getVariable() } @@ -26,7 +24,7 @@ predicate returns_non_self(Function f) { exists(Return r | r.getScope() = f and not exists(r.getValue())) } -from ClassObject t, Function iter +from ClassValue t, Function iter where t.isIterator() and 
iter = iter_method(t) and returns_non_self(iter) select t, "Class " + t.getName() + " is an iterator but its $@ method does not return 'self'.", iter, iter.getName() diff --git a/python/ql/src/Functions/OverlyComplexDelMethod.ql b/python/ql/src/Functions/OverlyComplexDelMethod.ql index 2503f7ac6a7..b709af7fb11 100644 --- a/python/ql/src/Functions/OverlyComplexDelMethod.ql +++ b/python/ql/src/Functions/OverlyComplexDelMethod.ql @@ -15,10 +15,10 @@ import python -from FunctionObject method +from FunctionValue method where - exists(ClassObject c | + exists(ClassValue c | c.declaredAttribute("__del__") = method and - method.getFunction().getMetrics().getCyclomaticComplexity() > 3 + method.getScope().getMetrics().getCyclomaticComplexity() > 3 ) select method, "Overly complex '__del__' method." diff --git a/python/ql/src/codeql-suites/python-code-scanning.qls b/python/ql/src/codeql-suites/python-code-scanning.qls new file mode 100644 index 00000000000..f9f9a5425b6 --- /dev/null +++ b/python/ql/src/codeql-suites/python-code-scanning.qls @@ -0,0 +1,4 @@ +- description: Standard Code Scanning queries for Python +- qlpack: codeql-python +- apply: code-scanning-selectors.yml + from: codeql-suite-helpers diff --git a/python/ql/src/semmle/python/objects/ObjectAPI.qll b/python/ql/src/semmle/python/objects/ObjectAPI.qll index 66d3950b1ed..9772a77ed8b 100644 --- a/python/ql/src/semmle/python/objects/ObjectAPI.qll +++ b/python/ql/src/semmle/python/objects/ObjectAPI.qll @@ -430,6 +430,29 @@ class ClassValue extends Value { this.hasAttribute("__getitem__") } + /** Holds if this class is an iterator. */ + predicate isIterator() { + this.hasAttribute("__iter__") and + ( + major_version() = 3 and this.hasAttribute("__next__") + or + /* + * Because 'next' is a common method name we need to check that an __iter__ + * method actually returns this class. This is not needed for Py3 as the + * '__next__' method exists to define a class as an iterator. 
+ */ + + major_version() = 2 and + this.hasAttribute("next") and + exists(ClassValue other, FunctionValue iter | other.declaredAttribute("__iter__") = iter | + iter.getAnInferredReturnType() = this + ) + ) + or + /* This will be redundant when we have C class information */ + this = ClassValue::generator() + } + /** Holds if this class is a container(). That is, does it have a __getitem__ method. */ predicate isContainer() { exists(this.lookup("__getitem__")) } @@ -583,11 +606,7 @@ abstract class FunctionValue extends CallableValue { } /** Gets a class that this function may return */ - ClassValue getAnInferredReturnType() { - result = TBuiltinClassObject(this.(BuiltinFunctionObjectInternal).getReturnType()) - or - result = TBuiltinClassObject(this.(BuiltinMethodObjectInternal).getReturnType()) - } + abstract ClassValue getAnInferredReturnType(); } /** Class representing Python functions */ @@ -616,6 +635,13 @@ class PythonFunctionValue extends FunctionValue { /** Gets a control flow node corresponding to a return statement in this function */ ControlFlowNode getAReturnedNode() { result = this.getScope().getAReturnValueFlowNode() } + + override ClassValue getAnInferredReturnType() { + /* We have to do a special version of this because builtin functions have no + * explicit return nodes that we can query and get the class of. + */ + result = this.getAReturnedNode().pointsTo().getClass() + } } /** Class representing builtin functions, such as `len` or `print` */ @@ -627,6 +653,13 @@ class BuiltinFunctionValue extends FunctionValue { override int minParameters() { none() } override int maxParameters() { none() } + + override ClassValue getAnInferredReturnType() { + /* We have to do a special version of this because builtin functions have no + * explicit return nodes that we can query and get the class of. 
+ */ + result = TBuiltinClassObject(this.(BuiltinFunctionObjectInternal).getReturnType()) + } } /** Class representing builtin methods, such as `list.append` or `set.add` */ @@ -644,6 +677,10 @@ class BuiltinMethodValue extends FunctionValue { override int minParameters() { none() } override int maxParameters() { none() } + + override ClassValue getAnInferredReturnType() { + result = TBuiltinClassObject(this.(BuiltinMethodObjectInternal).getReturnType()) + } } /** diff --git a/python/ql/test/query-tests/Functions/general/OverlyComplexDelMethod.expected b/python/ql/test/query-tests/Functions/general/OverlyComplexDelMethod.expected index 84c08d89426..2eff178d972 100644 --- a/python/ql/test/query-tests/Functions/general/OverlyComplexDelMethod.expected +++ b/python/ql/test/query-tests/Functions/general/OverlyComplexDelMethod.expected @@ -1 +1 @@ -| protocols.py:74:5:74:22 | Function __del__ | Overly complex '__del__' method. | +| protocols.py:74:5:74:22 | Function MegaDel.__del__ | Overly complex '__del__' method. |