2023-08-08 19:38:08 -07:00

* An Operational View of CodeQL
These are notes used to develop the slides in [[./operational-view.key]]
and their [[./operational-view.pdf][pdf version]]. They may be handy for running the examples, and are a
simpler read than the slides.
A diagram of how codeql is used is in [[./notes/codeql-build.drawio]]. To edit,
view, or print it, use the open-source version of [[https://www.drawio.com][drawio]], which runs in
the browser or can be downloaded. For simpler viewing, a [[./notes/codeql-build.drawio.pdf][pdf]] is provided.
* The codeql / C compiler comparison
*** sql injection problem, compiler view
Think Compiler (C) with library:
#+BEGIN_SRC sh
# Prepare System
./admin -c
# Convert data if needed
cat users.txt
# Edit your code
edit add-user.c
# Compile & run your code
clang -Wall add-user.c -lsqlite3 -o add-user
# Read one user per line; a plain `for` over `cat` would split lines on spaces
while IFS= read -r user ; do echo "$user" | ./add-user 2>> users.log ; done < input.txt
# Examine results
./admin -s
#+END_SRC
*** sql injection problem, codeql view
Think Compiler (CodeQL) with library:
#+BEGIN_SRC sh
# Prepare System
export PATH=$HOME/local/vmsync/codeql250:"$PATH"
# Convert data if needed
SRCDIR=.
DB=add-user.db
cd "$SRCDIR" && \
    codeql database create --language=cpp \
           -s . -j 8 -v \
           "$DB" \
           --command='clang -Wall add-user.c -lsqlite3 -o add-user'
# Edit your code
edit SqlInjection.ql
# Compile & run your code
RESULTS=cpp-sqli.sarif
codeql database analyze \
       -v --ram=14000 -j12 --rerun \
       --search-path ~/local/vmsync/ql \
       --format=sarif-latest \
       --output="$RESULTS" \
       -- \
       "$DB" \
       "$SRCDIR/SqlInjection.ql"
# Examine results
# Plain text, look for
# "results" : [ {
# and
# "codeFlows" : [ {
edit $RESULTS
# Or
jq --raw-output --join-output -f sarif-summary.jq < "$RESULTS" | less
# Or use VS Code's SARIF viewer
# Or use the GHAS integration via Actions
#+END_SRC
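The =sarif-summary.jq= filter used above is not reproduced in these notes. As
a rough stand-in, the sketch below prints one line per result using fields
from the SARIF format (=ruleId=, =message.text=, and the first entry of
=locations=). The function name =sarif_summary= is made up for illustration,
and =jq= is assumed to be installed.
#+BEGIN_SRC sh
# Hypothetical helper (not part of codeql): print one tab-separated line
# per SARIF result: rule id, file, start line, message.
sarif_summary() {
    jq --raw-output '
        .runs[].results[]
        | [ .ruleId,
            (.locations[0].physicalLocation.artifactLocation.uri // "?"),
            (.locations[0].physicalLocation.region.startLine // 0 | tostring),
            .message.text ]
        | @tsv' "$1"
}
#+END_SRC
A possible use, with the file name from the example above:
: sarif_summary cpp-sqli.sarif | less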
** Connecting to the compiler core
- IDEs: use VS Code for full functionality, or any LSP-capable editor for
  completion and jump-to-source
- choose a repository layout that best fits the development model of your
  custom queries
- check for library X support in the =ql/= library. Better yet, check for
  particular function names.
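The last point can be scripted. A minimal sketch, assuming a local checkout of
the =ql/= library; the function name =ql_mentions= and the function names
passed to it are illustrative only.
#+BEGIN_SRC sh
# Hypothetical helper: for each function name, list the .ql/.qll files
# under a library tree that mention it as a whole word.
ql_mentions() {
    tree=$1; shift
    for fn in "$@"; do
        echo "== $fn"
        grep -R -l -w --include='*.ql' --include='*.qll' -- "$fn" "$tree" \
            || echo "   (no mentions)"
    done
}
#+END_SRC
For example:
: ql_mentions ~/local/vmsync/ql/cpp/ql/src sqlite3_exec sqlite3_prepare_v2
An empty result is only a hint: a library may also match functions by pattern
rather than by their full name.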
** Best practice CodeQL
** Key Ideas
All of the following ideas follow from one simple observation: *The CodeQL CLI
is a compiler and you should treat it as such*.
- GHAS setup and integration are almost completely independent of query
  customization; take advantage of this.
  If you can build your code on your desktop/laptop/own server, you don't have
  to wait for GHAS integration to produce codeql databases.
  In fact, you should *start on your desktop/laptop/server* to find issues
  around the build: memory / thread requirements, ensuring the build system
  runs correctly when invoked from codeql, etc.
- use desktop-based code scanning earlier in the workflow
- *CLI setup / analysis should be done as a prototype* for your GitHub admins to
  work from
- *customize scanning tools to actually get results:*
  - bug bounty programs
  - known entry / exit points for services

Just like your CI/CD pipeline encapsulates your compiler CLI tools,
GitHub and GHAS encapsulate the codeql CLI tools.
So you can always think about what makes sense for the CLI, try it there, and
then update your GHAS workflow.
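One way to act on this is to keep the whole CLI run in a single script that
your GitHub admins can later translate into a workflow. This is a sketch only:
it reuses the commands and flags from the comparison above, while the function
names (=run=, =codeql_prototype=) and the variables =THREADS=, =RAM_MB=,
=QL_LIB=, =BUILD_CMD=, and =DRY_RUN= are assumptions made for this illustration.
#+BEGIN_SRC sh
# Hypothetical prototype wrapper. All values come from the environment so
# the same script can be tuned on a laptop and then handed to CI.
THREADS=${THREADS:-8}
RAM_MB=${RAM_MB:-14000}
DB=${DB:-add-user.db}
RESULTS=${RESULTS:-cpp-sqli.sarif}
QL_LIB=${QL_LIB:-$HOME/local/vmsync/ql}
BUILD_CMD=${BUILD_CMD:-clang -Wall add-user.c -lsqlite3 -o add-user}

# Print the command when DRY_RUN=1 (quoting is not preserved in the
# printout), otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Build the database, then analyze it; stop if the build step fails.
codeql_prototype() {
    run codeql database create --language=cpp -s . -j "$THREADS" \
        "$DB" --command="$BUILD_CMD" &&
    run codeql database analyze --ram="$RAM_MB" -j "$THREADS" --rerun \
        --search-path "$QL_LIB" \
        --format=sarif-latest --output="$RESULTS" \
        -- "$DB" SqlInjection.ql
}
#+END_SRC
Setting =DRY_RUN=1= before calling =codeql_prototype= prints the two commands
without running them, which is a convenient form to hand to whoever maintains
the GHAS workflow.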
** Some Q&A via compiler analogy
***
Q: Is the C standard library supported?
A: Much of it, typically from a conceptual level.
To find the supported APIs, search the [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/][=ql/=]] library source tree.
For example, for a top-down search start with =cpp.qll= and notice the import
=import semmle.code.cpp.commons.Printf=. Follow this to find the
[[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/semmle/code/cpp/commons/][=cpp.commons=]] module and see what it models:
# /Users/hohn/local/vmsync/ql/cpp/ql/src/semmle/code/cpp/commons:
#+BEGIN_SRC text
Alloc.qll Dependency.qll NullTermination.qll StringAnalysis.qll
Assertions.qll Environment.qll PolymorphicClass.qll StructLikeClass.qll
Buffer.qll Exclusions.qll Printf.qll Synchronization.qll
CommonType.qll File.qll Scanf.qll VoidContext.qll
DateTime.qll NULL.qll Strcat.qll unix/
#+END_SRC
***
Q: Is library X supported?
A: If it is, you'll find it in the [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/][=ql/=]] library source tree. A whole-tree
search, =grep=-style, is easiest.
# /Users/hohn/local/vmsync/ql/cpp/ql/src:
For example, to check support for sqlite:
#+BEGIN_SRC text
hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src
0:$ grep -l -R sqlite *
Security/CWE/CWE-313/CleartextSqliteDatabase.ql
Security/CWE/CWE-313/CleartextSqliteDatabase.c
semmle/code/cpp/security/Security.qll
#+END_SRC
So we have a query (=.ql=) and a library (=.qll=); look at both to get
some ideas:
**** =Security/CWE/CWE-313/CleartextSqliteDatabase.ql= has some info [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql#L2][in the header]]
#+begin_src javascript
/**
,* @name Cleartext storage of sensitive information in an SQLite database
,* @description Storing sensitive information in a non-encrypted
,* database can expose it to an attacker.
,*/
#+end_src
and [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/Security/CWE/CWE-313/CleartextSqliteDatabase.ql#L25][a promising class]]:
#+begin_src javascript
class SqliteFunctionCall extends FunctionCall {
SqliteFunctionCall() { this.getTarget().getName().matches("sqlite%") }
Expr getASource() { result = this.getAnArgument() }
}
#+end_src
**** =semmle/code/cpp/security/Security.qll= has [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/semmle/code/cpp/security/Security.qll#L12][some very promising entries]]
#+begin_src javascript
/**
,* Extend this class to customize the security queries for
,* a particular code base. Provide no constructor in the
,* subclass, and override any methods that need customizing.
,*/
class SecurityOptions extends string {
;;
predicate sqlArgument(string function, int arg) {
;;
// SQLite3 C API
function = "sqlite3_exec" and arg = 1
}
;;
/**
,* The argument of the given function is filled in from user input.
,*/
predicate userInputArgument(FunctionCall functionCall, int arg) {
;;
fname = "scanf" and arg >= 1
;;
}
;;
}
#+end_src
This is a library, so some sample uses would be nice. Another search via
: grep -nH -R SecurityOptions *
[[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/docs/codeql/ql-training/cpp/global-data-flow-cpp.rst#L59][finds documentation]]:
#+begin_src text
docs/codeql/ql-training/cpp/global-data-flow-cpp.rst:59:The library class ``SecurityOptions`` provides a (configurable) model of what counts as user-controlled data:
#+end_src
and an [[https://github.com/github/codeql/blob/87ee7849a929fff00343071315fa8108976d5c70/cpp/ql/src/semmle/code/cpp/security/SecurityOptions.qll#L16][extension point]]:
#+begin_src text
cpp/ql/src/semmle/code/cpp/security/SecurityOptions.qll:16:class CustomSecurityOptions extends SecurityOptions
#+end_src
#+begin_src javascript
/**
,* This class overrides `SecurityOptions` and can be used to add project
,* specific customization.
,*/
class CustomSecurityOptions extends SecurityOptions {...}
#+end_src
***
Q: How should we go about modeling our libraries with CodeQL?
A: Follow the way you use a C library, say =sqlite3=. Your code includes only
=sqlite3.h=; you use, but don't care about, =libsqlite3.a=.
Thus for CodeQL: don't try to model the library internals, only model the
parts of the API you actually use.
The same holds for other languages: you only need to model the exposed API.
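To see which part of an API your code actually uses, and therefore what is
worth modeling, a plain text search over your own sources is often enough. A
rough sketch: the function name =used_api= is invented here, and the prefix
match is a heuristic that will also pick up names in comments or strings.
#+BEGIN_SRC sh
# Hypothetical helper: list the distinct identifiers with a given prefix
# (e.g. sqlite3_) that appear in the given source files.
used_api() {
    prefix=$1; shift
    grep -ohE "${prefix}[A-Za-z0-9_]+" "$@" | sort -u
}
#+END_SRC
For the running example:
: used_api sqlite3_ add-user.c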
***
Q: Should we use the most recent version of codeql at all times?
A: Follow the way you use your compiler. Do you use the most recent version
of your compiler at all times, or do you use a rolling release cycle?
To get your current version's info:
#+BEGIN_SRC sh
hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src
0:$ codeql --version
CodeQL command-line toolchain release 2.5.0.
Copyright (C) 2019-2021 GitHub, Inc.
Unpacked in: /Users/hohn/local/vmsync/codeql250
Analysis results depend critically on separately distributed query and
extractor modules. To list modules that are visible to the toolchain,
use 'codeql resolve qlpacks' and 'codeql resolve languages'.
#+END_SRC
You should match the CodeQL cli version to the CodeQL library version;
the [[https://github.com/github/codeql/releases][library releases]] have =codeql-cli/<VERSION>= tags to allow matching with
the [[https://github.com/github/codeql-cli-binaries/releases/tag/v2.6.2][binaries]].
When using git for the library, you should check out the appropriate version
via, e.g.,
: cd $HOME/local/vmsync/ql && git checkout codeql-cli/v2.5.9
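The matching can be derived from the CLI itself rather than typed by hand. A
small sketch, assuming the first line of =codeql --version= has the form shown
above; the function name =cli_tag= is hypothetical.
#+BEGIN_SRC sh
# Hypothetical helper: read `codeql --version` output on stdin and print
# the corresponding library tag, e.g. codeql-cli/v2.5.0.
cli_tag() {
    sed -n 's|^CodeQL command-line toolchain release \([0-9][0-9.]*[0-9]\)\.$|codeql-cli/v\1|p'
}
#+END_SRC
It could then be used as
: tag=$(codeql --version | cli_tag)
: cd $HOME/local/vmsync/ql && git checkout "$tag"
Before relying on this, it is worth confirming that the tag exists, e.g. with
=git tag -l 'codeql-cli/*'=.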