* Using sqlite to illustrate models-as-data This section demonstrates the use of the models-as-data system by analyzing a small Java application that uses the SQLite JDBC driver. The example is adapted from a CodeQL workshop. ** Build the CodeQL Database To get started, build the CodeQL database for the SQLite-backed Java sample. Adjust paths as needed. #+BEGIN_SRC sh SRCDIR=$(pwd) DB=$SRCDIR/java-sqlite-$(cd $SRCDIR && git rev-parse --short HEAD).db echo $DB test -d "$DB" && rm -fR "$DB" mkdir -p "$DB" # Ensure the correct CodeQL version is in your PATH export PATH="$(cd ../codeql && pwd):$PATH" codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh' # Check for presence of AddUser.java in the resulting database unzip -v $DB/src.zip | grep AddUser #+END_SRC Then add this database directory to your VS Code =DATABASES= tab. ** Tests Using a Default Query You can run the standard SQL injection query: [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] but it will return no results. However, it does help identify which classes are being analyzed as potential sources and sinks. Instead, run the diagnostic query: [[./Illustrations.ql]] You can run it from the CLI: #+BEGIN_SRC sh codeql query run \ -v \ --database java-sqlite-e2e555c.db \ --output result.bqrs \ --threads=12 \ --ram=14000 \ Illustrations.ql codeql bqrs decode --format=text result.bqrs | sed -n '/^Result set: #select/,$p' #+END_SRC The result will look like: #+BEGIN_SRC text Result set: #select | ui | qsi | +------+-------+ | args | query | #+END_SRC In the editor, these correspond to: 1. =main(String[] args)= — source-like 2. =conn.createStatement().executeUpdate(query)= — sink However, =System.console().readLine()= is not detected as a source. Therefore, =SqlTainted.ql= cannot find a complete flow. ** Supplement Sources via the Model Editor - [ ] We observe no flow from source to sink - A sink exists (=executeUpdate=) - But no recognized source is found - [ ] There are two ways to fix this: 1. Add a new source in =Customizations.qll= 2. Add a new source in the models-as-data YAML format ** Supplement CodeQL: Write a Full Manual Query A manual dataflow query is already available: [[./full-query.ql]] This can trace the data manually even when standard configuration fails. ** Supplement CodeQL: Add to FlowSource or a Subclass Sometimes, the only way to identify how to extend a source is to understand how CodeQL internally resolves source nodes. Key class hierarchies: #+BEGIN_SRC java abstract class SourceNode extends DataFlow::Node abstract class RemoteFlowSource extends SourceNode #+END_SRC Follow usage in: - [[../ql/java/ql/lib/Customizations.qll]] - [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] Then modify =Customizations.qll= by adding the custom source. The modified [[../ql/java/ql/lib/Customizations.qll]] is #+BEGIN_SRC java import java private import semmle.code.java.dataflow.FlowSources class ReadLine extends RemoteFlowSource { ReadLine() { exists(Call read | read.getCallee().getName() = "readLine" and read = this.asExpr() ) } override string getSourceType() { result = "Console readline" } } #+END_SRC This allows #+BEGIN_SRC java predicate isSource(DataFlow::Node src) { src instanceof ActiveThreatModelSource } #+END_SRC to include =readLine()= even though we extended =RemoteFlowSource=. ** Supplement CodeQL: Add to models-as-data To modify the dataflow configuration using the models-as-data mechanism, we will explicitly define a new source model for =java.io.Console.readLine=. This function is already modeled in CodeQL—but only as a =summaryModel=. For SQL injection tracking, we want to treat it as a =sourceModel=. Since it’s already covered in auto-generated data, it does not appear in the model editor interface. We begin by locating the existing model: - The model schema used for extensions is defined here: [[../ql/java/ql/lib/semmle/code/java/dataflow/internal/ExternalFlowExtensions.qll]] - For reference, see a sample manually written model YAML: [[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml]] - To verify that readline is already modeled, use ripgrep: #+BEGIN_SRC sh 1:$ rg -i 'java.io.*Console.*readline' ql/java ql/java/ql/lib/ext/generated/java.io.model.yml 16: - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"] 17: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"] 18: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"] 19: - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[this]", "ReturnValue", "taint", "df-generated"] #+END_SRC Note: this model is auto-generated (=df-generated=) and appears under =summaryModel=. Here is an example of that structure: #+BEGIN_SRC yaml extensions: - addsTo: pack: codeql/java-all extensible: summaryModel data: ... - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"] - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[0]", "Argument[this]", "taint", "df-generated"] - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument[1].ArrayElement", "Argument[this]", "taint", "df-generated"] - ["java.io", "Console", False, "readLine", "(String,Object[])", "", "Argument #+END_SRC Because this modeling is already present, the model editor UI will hide the function. To override it, we’ll define a new source manually in plain YAML. First, recall the schema definition for =sourceModel=: #+BEGIN_SRC java extensible predicate sourceModel( string package, string type, boolean subtypes, string name, string signature, string ext, string output, string kind, string provenance, QlBuiltins::ExtensionId madId ); #+END_SRC Starting from the existing =summaryModel=, #+BEGIN_SRC yaml # summaryModel # string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"] #+END_SRC we construct the following =sourceModel= definition instead: #+BEGIN_SRC yaml extensions: - addsTo: pack: codeql/java-all extensible: sourceModel data: # sourceModel # string package, string type, boolean subtypes, string name, string signature, string ext, string output, string kind, string provenance, QlBuiltins::ExtensionId madId - ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"] # # from original # # summaryModel # # string package, string type, boolean subtypes, string name, string signature, string ext, string input, string output, string kind, string provenance, QlBuiltins::ExtensionId madId # - ["java.io", "Console", False, "readLine", "()", "", "Argument[this]", "ReturnValue", "taint", "df-generated"] #+END_SRC Place this in: [[../.github/codeql/extensions/sqlite-db/models/sqlite.model.yml]] To ensure the model extension is applied, you must instruct the CodeQL extension to include all extension packs. In =qllab.code-workspace=, add: #+BEGIN_SRC javascript { ..., "settings": { ..., "codeQL.runningQueries.useExtensionPacks": "all" } } #+END_SRC If needed, also include this setting in =.vscode/settings.json=: #+BEGIN_SRC javascript "codeQL.runningQueries.useExtensionPacks": "all" #+END_SRC Now re-run the query: [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] You should see flows that originate at =readLine()= and reach the SQL sink. This confirms that your manual =sourceModel= extension is effective.