* Using sqlite to illustrate models-as-data This description uses / recycles a codeql workshop. The original instructions are below: [[*SQL injection example][SQL injection example]] ** Build the codeql database To get started, build the codeql database (adjust paths to your setup): #+BEGIN_SRC sh # Build the db with source commit id. SRCDIR=$(pwd) DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db echo $DB test -d "$DB" && rm -fR "$DB" mkdir -p "$DB" # Use the correct codeql export PATH="$(cd ../codeql && pwd):$PATH" codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh' # Check for AddUser in the db unzip -v $DB/src.zip | grep AddUser #+END_SRC Then add this database directory to your VS Code =DATABASES= tab. ** Tests using a default query You can run the stdlib query [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] but will get no results. It does point at classes to inspect -- in particular, the source and sink classes. Run [[./Illustrations.ql]]; from the command line or vs studio code. Via cli: #+BEGIN_SRC sh # run query codeql query run \ -v \ --database java-sqlite-e2e555c.db \ --output result.bqrs \ --threads=12 \ --ram=14000 \ Illustrations.ql # format results codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p' #+END_SRC This shows #+BEGIN_SRC text Result set: #select | ui | qsi | +------+-------+ | args | query | #+END_SRC In the editor, these link to 1. =main(ARGS)= and 2. =conn.createStatement().executeUpdate(QUERY);= The second is correct, but =System.console().readLine();= is not found. Thus, =SqlTainted.ql= will not find anything. ** TODO supplement sources via the model editor - [ ] We have no flow + check source, sink + we have a sink + but ActiveThreatModelSource finds no source - [ ] We can supplement in different ways - supplement codeql: Write full manual query: already in workshop - supplement codeql: Add to FlowSource or a subclass Note: this /one area/ that just has to be known. Browsing source will *not* help you. CodeQL reading hint: : class ActiveThreatModelSource extends DataFlow::Node uses : this.(SourceNode).getThreatModel() So following the cast (SourceNode) may be useful: #+BEGIN_SRC java /** ,* A data flow source. ,*/ abstract class SourceNode extends DataFlow::Node #+END_SRC Following the =abstract class= is promising: #+BEGIN_SRC java abstract class RemoteFlowSource extends SourceNode #+END_SRC and others. In [[../ql/java/ql/lib/Customizations.qll]] notice the comments mentioning RemoteFlowSource. Use imports from [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] but note that there are conflicts. you will use : private import semmle.code.java.dataflow.FlowSources Follow this to FlowSources, and find the mentioned RemoteFlowSource : abstract class RemoteFlowSource extends SourceNode Add the custom source. The modified [[../ql/java/ql/lib/Customizations.qll]] is #+BEGIN_SRC java private import semmle.code.java.dataflow.FlowSources class ReadLine extends RemoteFlowSource { ReadLine() { exists(Call read | read.getCallee().getName() = "readLine" and read = this.asExpr() ) } override string getSourceType() { result = "Console readline" } } #+END_SRC Note that the predicate #+BEGIN_SRC java module QueryInjectionFlowConfig implements DataFlow::ConfigSig { predicate isSource(DataFlow::Node src) { src instanceof ActiveThreatModelSource } ...; } #+END_SRC now also returns the readLine() result -- although we extended RemoteFlowSource, not ActiveThreatModelSource + [ ] customizations in staging repo - supplement codeql: Add to models-as-data - schema in codeql: [[../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll]] - data sample: [[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml]] In the model editor, we see a java.io.*Console.*readline' #+BEGIN_SRC sh 1:$ rg -i 'java.io.*Console.*readline' ql/java ql/java/ql/lib/ext/generated/java.io.model.yml 16: - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"] 17: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"] 18: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"] 19: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"] #+END_SRC - [ ] checkkn - [ ] Also check RemoteFlowSource, from : import semmle.code.cpp.security.FlowSources The goal now is to supplement sources via the model editor. * SQL injection example This directory contains the problematic Java source code. The rest of this README describes - the [[*Setup and sample run][Setup and sample run]] for the problem, - briefly describes how to [[*Identify the problem][Identify the problem]] and - instructions to [[*Build the codeql database][Build the codeql database]] The codeql query is developed in [[../session/README.org]]. ** Setup and sample run The jdbc connector at https://github.com/xerial/sqlite-jdbc, from [[https://github.com/xerial/sqlite-jdbc/releases/download/3.36.0.1/sqlite-jdbc-3.36.0.1.jar][here]] is included in the git repository. #+BEGIN_SRC sh # Use a simple headline prompt PS1=' \033[32m---- SQL injection demo ----\[\033[33m\033[0m\] $?:$ ' # Build ./build.sh # Prepare db ./admin -r ./admin -c ./admin -s # Add regular user interactively ./add-user 2>> users.log First User # Check ./admin -s # Add Johnny Droptable ./add-user 2>> users.log Johnny'); DROP TABLE users; -- # And the problem: ./admin -s # Check the log tail users.log #+END_SRC ** Identify the problem =./add-user= is reading from =STDIN=, and writing to a database; looking at the code in [[./AddUser.java]] leads to : System.console().readLine(); for the read and : conn.createStatement().executeUpdate(query); for the write. This problem is thus a dataflow problem; in codeql terminology we have - a /source/ at the =System.console().readLine();= - a /sink/ at the =conn.createStatement().executeUpdate(query);= We write codeql to identify these two, and then connect them via - a /dataflow configuration/ -- for this problem, the more general /taintflow configuration/. ** Build the codeql database To get started, build the codeql database (adjust paths to your setup): #+BEGIN_SRC sh # Build the db with source commit id. SRCDIR=$(pwd) DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db echo $DB test -d "$DB" && rm -fR "$DB" mkdir -p "$DB" # Use the correct codeql export PATH="$(cd ../codeql && pwd):$PATH" codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh' # Check for AddUser in the db unzip -v $DB/src.zip | grep AddUser #+END_SRC Then add this database directory to your VS Code =DATABASES= tab. ** (optional) Build the codeql database in steps For larger projects, using a single command to build everything is costly when any part of the build fails. The sequence here is also used by the GHAS default setup, so familiarity with it helps in reviewing logs. The purpose of these sections is to illustrate the codeql commands used in default setup and making the connection between the GHAS default action and the CodeQL CLI explicit. After running default setup and downloading the log, you will see the following entries embedded in the full log. They are repeated here for completeness; you can skip the command-line options for now. #+BEGIN_SRC sh codeql version --format=json codeql resolve languages --format=betterjson --extractor-options-verbosity=4 --extractor-include-aliases codeql database init --force-overwrite --db-cluster /home/runner/work/_temp/codeql_databases --source-root=/home/runner/work/codeql-workshop-sql-injection-java/codeql-workshop-sql-injection-java --extractor-include-aliases --language=java --codescanning-config=/home/runner/work/_temp/user-config.yaml --build-mode=none --calculate-language-specific-baseline --sublanguage-file-coverage codeql database trace-command --use-build-mode --working-dir /home/runner/work/codeql-workshop-sql-injection-java/codeql-workshop-sql-injection-java /home/runner/work/_temp/codeql_databases/java codeql database finalize --finalize-dataset --threads=4 --ram=14567 /home/runner/work/_temp/codeql_databases/java codeql database run-queries --ram=14567 --threads=4 /home/runner/work/_temp/codeql_databases/java --expect-discarded-cache --min-disk-free=1024 -v --intra-layer-parallelism codeql database cleanup /home/runner/work/_temp/codeql_databases/java --cache-cleanup=brutal codeql database bundle /home/runner/work/_temp/codeql_databases/java --output=/home/runner/work/_temp/codeql_databases/java.zip --name=java #+END_SRC To build a database in steps locally, use the following sequence, adjusting paths to your setup: #+BEGIN_SRC sh # Build the db with source commit id. SRCDIR=$HOME/local/codeql-workshop-sql-injection-java/src DB=$SRCDIR/java-sqli-$(cd $SRCDIR && git rev-parse --short HEAD) # Check paths echo "DB will be: $DB" echo "SRC is in: $SRCDIR" # Prepare db directory test -d "$DB" && rm -fR "$DB" mkdir -p "$DB" # Run the build, without --db-cluster # Init database cd $SRCDIR codeql database init \ --language=java \ --build-mode=none \ --source-root=. \ -v $DB # Repeat trace-command as needed to cover all targets codeql database trace-command \ --use-build-mode \ --working-dir . \ $DB # Finalize database codeql database finalize \ --finalize-dataset \ --threads=4 \ --ram=14567 \ $DB # Use the database; get the location echo $DB # /Users/hohn/local/codeql-workshop-sql-injection-java/src/java-sqli-161a1d5 #+END_SRC To also analyze the database just built, we use the log's command but add an explicit query name: #+BEGIN_SRC sh codeql database run-queries \ --ram=14567 \ --threads=4 $DB \ --expect-discarded-cache \ --min-disk-free=1024 \ -v \ --intra-layer-parallelism \ -- \ ../session/simple.ql #+END_SRC This only gives us a bqrs file, we want sarif. Checking help: #+BEGIN_SRC text codeql database run-queries --help Usage: codeql database run-queries [OPTIONS] -- [...] [Plumbing] Run a set of queries together. Run one or more queries against a CodeQL database, saving the results to the results subdirectory of the database directory. The results can later be converted to readable formats by codeql database interpret-results, or query-for-query by with codeql bqrs decode or codeql bqrs interpret. #+END_SRC So we run the following #+BEGIN_SRC sh VERSION=$(cd $SRCDIR && git rev-parse --short HEAD) codeql database interpret-results \ --format=sarifv2.1.0 \ -o simple-$VERSION.sarif \ -- $DB ../session/simple.ql echo "Results in simple-$VERSION.sarif" #+END_SRC We kept the output for this sample in [[./simple-161a1d5.sarif]]