* An Operational View of CodeQL These are notes used to develop the slides in [[./operational-view.key]] and its [[./operational-view.pdf][pdf version]]. They may be handy for running the examples, and are a simpler read than the slides. A diagram view of the use of codeql is in [[./notes/codeql-build.drawio]]. To edit / view / print these, use the open-source version of [[https://www.drawio.com][drawio]]. It can be used in the browser or downloaded. For simpler viewing, a [[./notes/codeql-build.drawio.pdf][pdf]] is provided. * The codeql / C compiler comparison *** sql injection problem, compiler view Think Compiler (C) with library: #+BEGIN_SRC sh # Prepare System ./admin -c # Convert data if needed cat users.txt # Edit your code edit add-user.c # Compile & run your code clang -Wall add-user.c -lsqlite3 -o add-user for user in `cat input.txt` ; do echo "$user" | ./add-user 2>> users.log ; done # Examine results ./admin -s #+END_SRC *** sql injection problem, codeql view Think Compiler (CodeQL) with library: #+BEGIN_SRC sh # Prepare System export PATH=$HOME/local/vmsync/codeql250:"$PATH" # Convert data if needed SRCDIR=. DB=add-user.db cd $SRCDIR && \ codeql database create --language=cpp \ -s . -j 8 -v \ $DB \ --command='clang -Wall add-user.c -lsqlite3 -o add-user' # Edit your code edit SqlInjection.ql # Compile & run your code RESULTS=cpp-sqli.sarif codeql database analyze \ -v --ram=14000 -j12 --rerun \ --search-path ~/local/vmsync/ql \ --format=sarif-latest \ --output=$RESULTS \ -- \ $DB \ $SRCDIR/SqlInjection.ql # Examine results # Plain text, look for # "results" : [ { # and # "codeFlows" : [ { edit $RESULTS # Or jq --raw-output --join-output -f sarif-summary.jq < cpp-sqli.sarif | less # Or use vs code's sarif viewer # Or use the GHAS integration via actions #+END_SRC ** Connecting to the compiler core - IDEs: use vs code for full functionality, any lsp-using editor for completion/jump to source - choose a repository layout that best fits your custom queries' development model - check for library X support in the ql/ library. Better yet, check for particular function names. ** Best practice CodeQL ** Key Ideas All of following ideas follow from one simple observation: *The CodeQL CLI is a compiler and you should treat it as such* - ghas setup and integration are almost completely independent of query customization; take advantage of this. If you can build your code on your desktop/laptop/own server, you don't have to wait for GHAS integration to produce codeql databases. In fact, you should *start on your desktop/laptop/server* to find issues around the build: memory / thread requirements, ensuring the build system runs correctly when invoked from codeql, etc. - use desktop-based code scanning earlier in workflow - *cli setup / analysis should be done as prototype* for your github admins to work off - *customize scanning tools to actually get results:* - bug bounty programs - known entry / exit points for services - Just like your CI/CD pipeline encapsulates your compiler cli tools, github and GHAS encapsulate the codeql cli tools. So you can always think about what makes sense for the cli, try it there, and then update your GHAS workflow. ** Some Q&A via compiler analogy *** Q: Is the C standard library supported? A: Much of it, typically from a conceptual level. To find the supported APIs, search the [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/][=ql/=]] library source tree. For example, for a top-down search start with =cpp.qll= and notice the import =import semmle.code.cpp.commons.Printf=. Follow this to find the [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/semmle/code/cpp/commons/][=cpp.commons=]] module and see what it models: # /Users/hohn/local/vmsync/ql/cpp/ql/src/semmle/code/cpp/commons: #+BEGIN_SRC text Alloc.qll Dependency.qll NullTermination.qll StringAnalysis.qll Assertions.qll Environment.qll PolymorphicClass.qll StructLikeClass.qll Buffer.qll Exclusions.qll Printf.qll Synchronization.qll CommonType.qll File.qll Scanf.qll VoidContext.qll DateTime.qll NULL.qll Strcat.qll unix/ #+END_SRC *** Q: Is library X supported? A: If it is, you'll find it in the [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/][=ql/=]] library source tree. A whole-tree search, =grep=-style, is easiest. # /Users/hohn/local/vmsync/ql/cpp/ql/src: For example, to check support for sqlite: #+BEGIN_SRC text hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src 0:$ grep -l -R sqlite * Security/CWE/CWE-313/CleartextSqliteDatabase.ql Security/CWE/CWE-313/CleartextSqliteDatabase.c semmle/code/cpp/security/Security.qll #+END_SRC So we have a query (=.ql=) and a library (=.qll=); look at both to get some ideas: **** =Security/CWE/CWE-313/CleartextSqliteDatabase.ql= has some info [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql#L2][in the header]] #+begin_src javascript /** ,* @name Cleartext storage of sensitive information in an SQLite database ,* @description Storing sensitive information in a non-encrypted ,* database can expose it to an attacker. ,*/ #+end_src and [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql#L25][a promising class]]: #+begin_src javascript class SqliteFunctionCall extends FunctionCall { SqliteFunctionCall() { this.getTarget().getName().matches("sqlite%") } Expr getASource() { result = this.getAnArgument() } } #+end_src **** =semmle/code/cpp/security/Security.qll= has [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/semmle/code/cpp/security/Security.qll#L12][some very promising entries]] #+begin_src javascript /** ,* Extend this class to customize the security queries for ,* a particular code base. Provide no constructor in the ,* subclass, and override any methods that need customizing. ,*/ class SecurityOptions extends string { ;; predicate sqlArgument(string function, int arg) { ;; // SQLite3 C API function = "sqlite3_exec" and arg = 1 } ;; /** ,* The argument of the given function is filled in from user input. ,*/ predicate userInputArgument(FunctionCall functionCall, int arg) { ;; fname = "scanf" and arg >= 1 ;; } ;; } #+end_src This is a library, so some sample uses would be nice. Another search via : grep -nH -R SecurityOptions * [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/docs/codeql/ql-training/cpp/global-data-flow-cpp.rst#L59][finds documentation]]: #+begin_src text docs/codeql/ql-training/cpp/global-data-flow-cpp.rst:59:The library class ``SecurityOptions`` provides a (configurable) model of what counts as user-controlled data: #+end_src and an [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/semmle/code/cpp/security/SecurityOptions.qll#L16][extension point]]: #+begin_src text cpp/ql/src/semmle/code/cpp/security/SecurityOptions.qll:16:class CustomSecurityOptions extends SecurityOptions #+end_src #+begin_src javascript /** ,* This class overrides `SecurityOptions` and can be used to add project ,* specific customization. ,*/ class CustomSecurityOptions extends SecurityOptions {...} #+end_src *** Q: How should we go about modeling our libraries with CodeQL? A: Follow the way you use a C library, say =sqlite3=. Your code includes only =sqlite3.h=; you use, but don't care about, =libsqlite3.a=. Thus for CodeQL: don't try to model the library internals, only model the parts of the API you actually use. For other languages, you need also only model the exposed API. *** Q: Should we use the most recent version of codeql at all times? A: Follow the way you use your compiler. Do you use the most recent version of compiler at all times, or do you use a rolling release cycle? To get your current version's info: #+BEGIN_SRC sh hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src 0:$ codeql --version CodeQL command-line toolchain release 2.5.0. Copyright (C) 2019-2021 GitHub, Inc. Unpacked in: /Users/hohn/local/vmsync/codeql250 Analysis results depend critically on separately distributed query and extractor modules. To list modules that are visible to the toolchain, use 'codeql resolve qlpacks' and 'codeql resolve languages'. #+END_SRC You should match the CodeQL cli version to the CodeQL library version; the [[https://github.com/github/codeql/releases][library releases]] have =codeql-cli/= tags to allow matching with the [[https://github.com/github/codeql-cli-binaries/releases/tag/v2.6.2][binaries]]. When using git for the library, you should check out the appropriate version via, e.g., : cd $HOME/local/vmsync/ql && git checkout codeql-cli/v2.5.9