[[https://imgs.xkcd.com/comics/exploits_of_a_mom.png]]
(from https://xkcd.com/327/)

* Using sqlite to illustrate models-as-data
** Build codeql database
To get started, build the codeql database (adjust paths to your setup):
#+BEGIN_SRC sh
  # Build the db with source commit id.
  # export PATH=$HOME/local/vmsync/codeql250:"$PATH"
  SRCDIR=$(pwd)
  DB=$SRCDIR/cpp-sqli-$(cd "$SRCDIR" && git rev-parse --short HEAD)

  echo "$DB"
  test -d "$DB" && rm -fR "$DB"
  mkdir -p "$DB"

  cd "$SRCDIR" && codeql database create --language=cpp -s . -j 8 -v "$DB" --command='./build.sh'
#+END_SRC

Then add this database directory to your VS Code =DATABASES= tab.

** Tests using a default query

** TODO supplement sources via the model editor

** TODO supplement codeql: Add to FlowSource or a subclass
Note: this is /one area/ that just has to be known. Browsing source will
*not* help you.

CodeQL reading hint:
: class ActiveThreatModelSource extends DataFlow::Node
uses
: this.(SourceNode).getThreatModel()
so following the cast (SourceNode) may be useful:
#+BEGIN_SRC java
  /**
  ,* A data flow source.
  ,*/
  abstract class SourceNode extends DataFlow::Node
#+END_SRC

Following the =abstract class= is promising:
#+BEGIN_SRC java
  abstract class RemoteFlowSource extends SourceNode
#+END_SRC
and others.

XX: no java, use C

In [[../ql/java/ql/lib/Customizations.qll]] notice the comments mentioning
RemoteFlowSource.

Use imports from [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]]
but note that there are conflicts. You will use
: private import semmle.code.java.dataflow.FlowSources

Follow this to FlowSources, and find the mentioned RemoteFlowSource
: abstract class RemoteFlowSource extends SourceNode

Add the custom source.
The modified [[../ql/java/ql/lib/Customizations.qll]] is
#+BEGIN_SRC java
  import java
  private import semmle.code.java.dataflow.FlowSources

  class ReadLine extends RemoteFlowSource {
    ReadLine() {
      exists(Call read |
        read.getCallee().getName() = "readLine" and
        read = this.asExpr()
      )
    }

    override string getSourceType() { result = "Console readline" }
  }
#+END_SRC

Note that the predicate
#+BEGIN_SRC java
  module QueryInjectionFlowConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node src) { src instanceof ActiveThreatModelSource }
    ...;
  }
#+END_SRC
now also returns the readLine() result, although we extended
RemoteFlowSource rather than ActiveThreatModelSource:
ActiveThreatModelSource selects nodes through the =(SourceNode)= cast, and
RemoteFlowSource is a SourceNode.

** TODO supplement codeql: Add to models-as-data
- schema in codeql: [[../ql/cpp/ql/lib/semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll]]
  #+BEGIN_SRC java
  extensible predicate sourceModel(
    string namespace, string type, boolean subtypes, string name, string signature, string ext,
    string output, string kind, string provenance, QlBuiltins::ExtensionId madId
  );
  #+END_SRC

- schema in json: ../tmp.bundle/codeql/qlpacks/codeql/cpp-queries/1.3.0/.codeql/libraries/codeql/cpp-all/3.0.0/.packinfo
  #+BEGIN_SRC sh
  ../bin/hovjson < ../tmp.bundle/codeql/qlpacks/codeql/cpp-queries/1.3.0/.codeql/libraries/codeql/cpp-all/3.0.0/.packinfo
  {
    "extensible_predicate_metadata": {
      "extensible_predicates": [
        {
          "name": "sourceModel",
          "parameters": [
            {"name": "namespace","type": "string"},
            {"name": "type","type": "string"},
            {"name": "subtypes","type": "boolean"},
            {"name": "name","type": "string"},
            {"name": "signature","type": "string"},
            {"name": "ext","type": "string"},
            {"name": "output","type": "string"},
            {"name": "kind","type": "string"},
            {"name": "provenance","type": "string"}
          ],
          "has_origin": true,
          "path": "semmle/code/cpp/dataflow/internal/ExternalFlowExtensions.qll",
          "start_line": 8,
          "start_column": 1,
          "end_line": 11,
          "end_column": 3
        },
        ....
      ]
    }
  }
  #+END_SRC

- note: =QlBuiltins::ExtensionId madId= is only in ql, not json.

- file format sample: ../ql/cpp/ql/lib/ext/empty.model.yml

- data sample:
  #+begin_src yaml
  # partial model of windows system calls
  extensions:
    - addsTo:
        pack: codeql/cpp-all
        extensible: sourceModel
      data: # namespace, type, subtypes, name, signature, ext, output, kind, provenance
        # processenv.h
        - ["", "", False, "GetCommandLineA", "", "", "ReturnValue[*]", "local", "manual"]
  #+end_src

- add a =sourceModel=
  #+BEGIN_SRC yaml
  extensions:
    - addsTo:
        pack: codeql/cpp-all
        extensible: sourceModel
      data:
        - ["", "", False, "get_user_info", "", "", "ReturnValue[*]", "remote", "manual"]
    - addsTo:
        pack: codeql/cpp-all
        extensible: sinkModel
      data: []
    - addsTo:
        pack: codeql/cpp-all
        extensible: summaryModel
      data: []
  #+END_SRC

  #+BEGIN_SRC sh
  0:$ ls .github/codeql/extensions/
  jedis-db-local-java/  sqlite-db/

  (venv) hohn@ghm3 ~/work-gh/codeql-lab
  0:$ cp -r .github/codeql/extensions/sqlite-db .github/codeql/extensions/sqlite-db-c
  pushd .github/codeql/extensions/sqlite-db-c
  sed -i -e 's/java-all/cpp-all/g;' codeql-pack.yml
  # TODO also replace pack name

  0:$ cat > models/sqlite.model.yml
  extensions:
    - addsTo:
        pack: codeql/cpp-all
        extensible: sourceModel
      data:
        - ["", "", False, "get_user_info", "", "", "ReturnValue[*]", "remote", "manual"]
    - addsTo:
        pack: codeql/cpp-all
        extensible: sinkModel
      data: []
    - addsTo:
        pack: codeql/cpp-all
        extensible: summaryModel
      data: []
  #+END_SRC

- back to SqlTainted.ql

- In the model editor, we see an entry matching
  =java.io.*Console.*readline= (using the =show already modeled= option)
  #+BEGIN_SRC sh
  1:$ rg -i 'java.io.*Console.*readline' ql/java
  ql/java/ql/lib/ext/generated/java.io.model.yml
  16:      - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
  17:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
  18:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
  19:      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
  #+END_SRC

Note: this file is in the generated/ tree.

The current readline modeling is in the =summaryModel= section; we need it in
a =sourceModel=
#+BEGIN_SRC yaml
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: summaryModel
    data:
      ...
      - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument
#+END_SRC

The model editor will not show this because it's already modeled. To
illustrate text-based additions, we'll use plain text.

Starting from
#+BEGIN_SRC yaml
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: summaryModel
    data:
      ...
      - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"]
      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"]
      - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument
#+END_SRC
and the field information
#+BEGIN_SRC java
  extensible predicate sourceModel(
    string package, string type, boolean subtypes, string name, string signature, string ext,
    string output, string kind, string provenance, QlBuiltins::ExtensionId madId
  );
#+END_SRC

Starting from =summaryModel=
#+BEGIN_SRC yaml
# summaryModel
# string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
- ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
we can construct the =sourceModel=
#+BEGIN_SRC yaml
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      # sourceModel
      # string package, string type, boolean subtypes, string name, string signature, string ext, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
      - ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]

      # # from original
      # # summaryModel
      # # string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId
      # - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"]
#+END_SRC
and move this into
[[../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml]].

To ensure that these model extensions are applied during query runs, include
this setting
#+begin_src javascript
{
    ...,
    "settings": {
        ...,
        "codeQL.runningQueries.useExtensionPacks": "all"
    }
}
#+end_src
in the workspace configuration file [[../qllab.code-workspace]].

In some environments (e.g., older VS Code versions), you may also need to
replicate this setting in [[../.vscode/settings.json]]; there it simplifies to
#+begin_src javascript
"codeQL.runningQueries.useExtensionPacks": "all"
#+end_src

Now we can run [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] again.
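
As a cross-check on the hand-edited row, the step from a =summaryModel= row
to a =sourceModel= row can be sketched in Python. This is only an
illustration: the helper name =summary_to_source= is hypothetical, the column
names are copied from the predicate signatures quoted earlier, and =madId= is
left out because it does not appear in the YAML rows.
#+BEGIN_SRC python
# Column names follow the extensible predicate signatures quoted earlier;
# madId is omitted because the YAML rows do not carry it.
SUMMARY_COLUMNS = ["package", "type", "subtypes", "name", "signature",
                   "ext", "input", "output", "kind", "provenance"]
SOURCE_COLUMNS = ["package", "type", "subtypes", "name", "signature",
                  "ext", "output", "kind", "provenance"]


def summary_to_source(row, kind="remote", provenance="manual"):
    """Hypothetical helper: derive a sourceModel row from a summaryModel row."""
    if len(row) != len(SUMMARY_COLUMNS):
        raise ValueError(
            f"expected {len(SUMMARY_COLUMNS)} columns, got {len(row)}")
    fields = dict(zip(SUMMARY_COLUMNS, row))
    # A source row has no input column; kind and provenance are replaced.
    fields["kind"] = kind
    fields["provenance"] = provenance
    return [fields[column] for column in SOURCE_COLUMNS]


summary_row = ["java.io", "Console", False, "readLine", "()", "",
               "Argument[this]", "ReturnValue", "taint", "df-generated"]
source_row = summary_to_source(summary_row)
print(source_row)
#+END_SRC
The printed list should match the =sourceModel= row constructed by hand
above; a mismatch in length is the quickest sign that a column was dropped
or duplicated while editing.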