codeql-lab/codeql-jedis-java/README.org

* Jedis CodeQL Setup
  - fork at https://github.com/hohn/jedis
  - github db build: enable code scanning, advanced config
    - only java-kotlin, build-mode: none.
    - creates https://github.com/hohn/jedis/blob/master/.github/workflows/codeql.yml
    - action run at https://github.com/hohn/jedis/actions/workflows/codeql.yml
    - db download
      #+BEGIN_SRC sh
        # list dbs
        curl -H "Authorization: token $GITHUB_TOKEN" \
             https://api.github.com/repos/hohn/jedis/code-scanning/codeql/databases


        # Get DB via curl
        cd ~/work-gh/codeql-lab/assets
        curl -H "Authorization: token $GITHUB_TOKEN" \
             -H "Accept: application/zip" \
             -L \
             https://api.github.com/repos/hohn/jedis/code-scanning/codeql/databases/java \
             -o jedis-database-gh.zip
      #+END_SRC
    - db at ~/work-gh/codeql-lab/assets/jedis-database-gh.zip
  - local db build:
    #+BEGIN_SRC sh
      cd ~/work-gh/codeql-lab/

      # Add the submodule
      git submodule add https://github.com/hohn/jedis extern/jedis

      # Initialize and clone the submodule
      git submodule update --init --recursive


      # Build directly once to resolve any errors
      cd ~/work-gh/codeql-lab/extern/jedis
      mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V

      # Build under codeql
      # Step 1: Clean any prior Maven builds
      cd ~/work-gh/codeql-lab/extern/jedis
      mvn clean

      # Step 2: Run CodeQL DB creation with mvn install
      cd ~/work-gh/codeql-lab
      codeql database create assets/jedis-db-local \
             --overwrite \
             --language=java \
             --command="mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V" \
             --source-root=extern/jedis
    #+END_SRC

* Jedis CodeQL Modeling
** Setup and Start
  #+BEGIN_SRC sh
    # Step 1: Go to your CodeQL lab directory
    cd ~/work-gh/codeql-lab

    # Step 2: Extract the prebuilt CodeQL database for the Jedis project
    unzip -q assets/jedis-db-local.zip

    # Step 3: Extract the CodeQL command-line tools (platform-specific)
    unzip -q assets/codeql-osx64.zip

    # Step 4: Change directory to the unpacked CodeQL CLI tools
    cd ~/work-gh/codeql-lab/codeql

    # Step 5: Add the CodeQL CLI directory to your shell's PATH
    # This allows you to run `codeql` from any location
    export PATH="$(pwd):$PATH"

    # Step 6: Launch Visual Studio Code with the lab workspace
    # (the workspace file is in the lab root, not in the codeql/ directory)
    code ~/work-gh/codeql-lab/qllab.code-workspace

    # In VS Code, perform the following setup manually:
    # - Set the current database to: jedis-db-local
    #   (Usually from the CodeQL extension pane – this connects the UI to your analysis DB)
    # - Set the CodeQL CLI executable to: ~/work-gh/codeql-lab/codeql/codeql
    #   (Tell the extension where to find the CLI you just extracted)
    # - In the CodeQL extension tab, scroll to the bottom and select:
    #   'CodeQL: Method modeling' to begin a guided modeling tutorial

  #+END_SRC
** Using the Editor
   Note that just by starting =CodeQL: Method modeling=, the new file
   : .github/codeql/extensions/jedis-db-local-java/codeql-pack.yml
   is created.

** Relevant Queries
   A quick =grep= over a file listing of the checkout (=files= below) shows the
   model-generator sources:
   #+BEGIN_SRC text
     grep 'java.*modelgen' files  |grep -v test/

     ql/java/ql/src/utils/modelgenerator
     ql/java/ql/src/utils/modelgenerator/CaptureNeutralModels.ql
     ql/java/ql/src/utils/modelgenerator/CaptureTypeBasedSummaryModels.ql
     ql/java/ql/src/utils/modelgenerator/CaptureSinkModels.ql
     ql/java/ql/src/utils/modelgenerator/CaptureContentSummaryModels.ql
     ql/java/ql/src/utils/modelgenerator/internal
     ql/java/ql/src/utils/modelgenerator/internal/CaptureModels.qll
     ql/java/ql/src/utils/modelgenerator/internal/CaptureTypeBasedSummaryModels.qll
     ql/java/ql/src/utils/modelgenerator/internal/CaptureModelsPrinting.qll
     ql/java/ql/src/utils/modelgenerator/CaptureSummaryModels.ql
     ql/java/ql/src/utils/modelgenerator/RegenerateModels.py
     ql/java/ql/src/utils/modelgenerator/CaptureSourceModels.ql
     ql/java/ql/src/utils/modelgenerator/debug
     ql/java/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPartialPath.ql
     ql/java/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPath.ql
     ql/java/ql/src/utils/modelgenerator/debug/README.md
   #+END_SRC

** Primary Query File
   The primary query file is
   : ../ql/java/ql/src/utils/modelgenerator/internal/CaptureModels.qll
   This acts as the backbone, exposing modules and predicates such as:

   - SummaryModelGeneratorInput
   - ModelGeneratorCommonInput
   - isPrimitiveTypeUsedForBulkData(...)
   - Likely common predicates such as:
     + hasNoSideEffects(...)
     + isNeutralReturn(...)
     + isBulkGetterLike(...)

   These are imported by:
     - CaptureSinkModels.ql
     - CaptureSummaryModels.ql
     - CaptureContentSummaryModels.ql
     - CaptureHeuristicSummaryModels.ql

   Design: Three Modeling Targets
     | Module                       | Implements                      | Purpose                                          |
     |------------------------------+---------------------------------+--------------------------------------------------|
     | =SummaryModelGeneratorInput= | =SummaryModelGeneratorInputSig= | Models pass-through or computed summaries        |
     | =SourceModelGeneratorInput=  | =SourceModelGeneratorInputSig=  | Models user-controlled or origin taint sources   |
     | =SinkModelGeneratorInput=    | =SinkModelGeneratorInputSig=    | Models taint sinks (e.g., logging, SQL, network) |

   Shared Input System
     ModelGeneratorCommonInput provides:
     - Name formatting
     - Type filtering (isRelevantType)
     - Signature stringification
     - “Approximate output” helpers like Argument[pos].Element

     This gives a stable data interface to the rest of the system.

   Filtering logic
     #+BEGIN_SRC java
       private predicate relevant(Callable api) {
         api.isPublic() and
         api.getDeclaringType().isPublic() and
         api.fromSource() and
         not isUninterestingForModels(api) and
         not isInfrequentlyUsed(api.getCompilationUnit())
       }
     #+END_SRC

** Experiment with test clone
   The needed imports are private, so clone
   : ql/java/ql/test/utils/modelgenerator/dataflow/CaptureSourceModels.ql
   and experiment there.

   #+BEGIN_SRC java
     import java
     import utils.modelgenerator.internal.CaptureModels
     import SourceModels
     import utils.test.InlineMadTest

     module InlineMadTestConfig implements InlineMadTestConfigSig {
       string getCapturedModel(Callable c) { result = Heuristic::captureSource(c) }

       string getKind() { result = "source" }
     }

     import InlineMadTest<InlineMadTestConfig>


   #+END_SRC

* Modeling Jedis as a Dependency in Model Editor
** Set up and run Editor
   To model =jedis= for taint analysis using the /model editor/, select the /"model
   as dependency"/ option.

   When this mode is active, the model editor runs the query
   : ../ql/java/ql/src/utils/modeleditor/FrameworkModeEndpoints.ql

   The body of this query is:
   #+BEGIN_SRC java
     from PublicEndpointFromSource endpoint, boolean supported, string type
     where
         supported = isSupported(endpoint) and
         type = supportedType(endpoint)
     select endpoint, endpoint.getPackageName(), endpoint.getTypeName(), endpoint.getName(),
         endpoint.getParameterTypes(), supported,
         endpoint.getCompilationUnit().getParentContainer().getBaseName(), type
   #+END_SRC

   The =supported= column of this query maps directly onto the status shown in
   the model editor:
   - =supported = true= → shown as /"Method already modeled"/
   - =supported = false= → shown as /"Unmodeled"/

** Files Created or Modified by the Modeling Workflow
   - Upon launching =CodeQL: Method modeling=, a new pack manifest is created:
     [[../.github/codeql/extensions/jedis-db-local-java/codeql-pack.yml][codeql-pack.yml]]
   - After selecting methods and saving, modeling results are written to:
     [[../.github/codeql/extensions/jedis-db-local-java/models/redis.clients.jedis.model.yml][redis.clients.jedis.model.yml]]

** Workspace Configuration Required
   To ensure that these model extensions are applied during query runs, include
   the setting
   : "codeQL.runningQueries.useExtensionPacks": "all"
   in the workspace configuration file [[../qllab.code-workspace]].

   In some environments (e.g., older VS Code versions), you may also need to
   replicate this setting in [[../.vscode/settings.json]].
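
   A small shell check can confirm that the setting is present in both files.
   This is only a sketch: the helper name =check_extension_packs= is made up
   for illustration, and it assumes the key and value sit on one line in the
   form shown above. Run it from the lab root.

```shell
# Hypothetical helper: report whether a VS Code config file sets
# codeQL.runningQueries.useExtensionPacks to "all".
check_extension_packs() {
  pattern='"codeQL\.runningQueries\.useExtensionPacks"[[:space:]]*:[[:space:]]*"all"'
  if grep -q "$pattern" "$1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Paths are relative to the lab root (~/work-gh/codeql-lab in this README).
check_extension_packs qllab.code-workspace
check_extension_packs .vscode/settings.json
```

   A file that does not exist is reported as =missing=, so the check is safe to
   run before =.vscode/settings.json= has been created.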

* Verifying the Modeled Sink
  Once the modeling is in place, a dataflow query like the following can be used
  to confirm the modeled sinks:

  #+BEGIN_SRC java
    import java
    private import semmle.code.java.dataflow.ExternalFlow
    private import semmle.code.java.dataflow.DataFlow

    from DataFlow::Node n, string type
    where sinkNode(n, type) and type = "code-injection"
    select n, type
  #+END_SRC

  Sample query result (run on the =jedis-db-local= database):
  - example.ql on jedis-db-local - finished in 2 seconds (14 results)
    |  1 | script                           | code-injection |
    |  2 | getBytes(...)                    | code-injection |
    |  3 | script                           | code-injection |
    |  4 | script                           | code-injection |
    |  5 | script                           | code-injection |
    |  6 | script                           | code-injection |
    |  7 | "return redis.call('get','foo')" | code-injection |
    |  8 | "return redis.call('get','foo')" | code-injection |
    |  9 | encode(...)                      | code-injection |
    | 10 | encode(...)                      | code-injection |
    | 11 | "return redis.call('get','foo')" | code-injection |
    | 12 | "return redis.call('get','foo')" | code-injection |
    | 13 | script                           | code-injection |
    | 14 | "return {}"                      | code-injection |

* Identify usage of injection-related models in existing queries
  To verify whether existing CodeQL queries make use of the injection-related
  models, we can search for files in the =ql/java= and =ql/cpp= directories that
  contain the string =-injection=. This string often appears in taint-tracking
  configuration or query metadata.

** Java Queries

   The following command locates =.ql= and =.qll= files in the Java query suite that reference =-injection=:

   #+BEGIN_SRC sh
     rg -l -- '-injection' ql/java | grep -E '\.qll?$'
   #+END_SRC

   Example output:

   #+BEGIN_SRC text
     ql/java/ql/src/Security/CWE/CWE-643/XPathInjection.ql
     ql/java/ql/src/Security/CWE/CWE-078/ExecTainted.ql
     ql/java/ql/src/Security/CWE/CWE-022/TaintedPath.ql
     ql/java/ql/src/Security/CWE/CWE-117/LogInjection.ql
     ql/java/ql/src/Security/CWE/CWE-470/FragmentInjection.ql
     ql/java/ql/src/Security/CWE/CWE-470/FragmentInjectionInPreferenceActivity.ql
     ql/java/ql/src/Security/CWE/CWE-730/RegexInjection.ql
     ql/java/ql/lib/semmle/code/java/security/XsltInjection.qll
     ql/java/ql/src/Security/CWE/CWE-090/LdapInjection.ql
     ql/java/ql/lib/semmle/code/java/security/GroovyInjection.qll
     ql/java/ql/lib/semmle/code/java/security/XPath.qll
     ql/java/ql/lib/semmle/code/java/security/TaintedEnvironmentVariableQuery.qll
     ql/java/ql/src/Security/CWE/CWE-074/XsltInjection.ql
     ql/java/ql/src/Security/CWE/CWE-074/JndiInjection.ql
     ...
     ql/java/ql/src/utils/modelgenerator/internal/CaptureModels.qll
   #+END_SRC

   These files include both top-level queries (under =src/Security/...=) and reusable model libraries (under =lib/semmle/...=). Experimental and framework-specific queries are also included.

** C++ Queries
   Likewise, to check for C++ queries that reference =-injection=, use:

   #+BEGIN_SRC sh
     rg -l -- '-injection' ql/cpp | grep -E '\.qll?$'
   #+END_SRC

   Example output:

   #+BEGIN_SRC text
     ql/cpp/ql/src/Security/CWE/CWE-078/ExecTainted.ql
     ql/cpp/ql/src/Security/CWE/CWE-022/TaintedPath.ql
     ql/cpp/ql/src/experimental/Security/CWE/CWE-078/WordexpTainted.ql
     ql/cpp/ql/src/Security/CWE/CWE-089/SqlTainted.ql
   #+END_SRC

   These files indicate active use of injection-related taint tracking in the C++ suite as well.

* TODO Modeling Gaps in SqlTainted.ql (Java)
  The built-in SQL injection query
  [[../ql/java/ql/src/Security/CWE/CWE-089/SqlTainted.ql]] correctly identifies the
  sink in the Jedis sample, but not the source. This is because
  =java.io.Console.readLine()= is modeled as a taint *step*, not a *source*. Since
  the model editor hides methods that are already modeled in any capacity,
  this method cannot be selected for editing there.

  To detect the source, we must supplement the model manually, either with a
  models-as-data extension file or by extending =Customizations.qll= with a
  new source declaration.
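
  As a sketch of the models-as-data route, the following writes a data
  extension that declares the return value of =java.io.Console.readLine= as a
  source. Several details are assumptions rather than part of the lab: the file
  name =java.io.model.yml= is made up, the target directory is the extension
  pack that the model editor created for the Jedis database, that pack's
  =codeql-pack.yml= is assumed to list =models/**/*.yml= as data extensions, and
  the kind =remote= is chosen only so that queries running with the default
  threat model pick the source up. Check the row against the CodeQL
  documentation for your version before relying on it.

```shell
# Run from the lab root (~/work-gh/codeql-lab in this README).
models_dir=.github/codeql/extensions/jedis-db-local-java/models
mkdir -p "$models_dir"

# Assumption: java.io.model.yml is a hypothetical file name; any *.yml file
# that the pack lists as a data extension should work the same way.
cat > "$models_dir/java.io.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      # Columns: package, type, subtypes, name, signature, ext, output, kind, provenance
      # An empty signature matches every overload of readLine.
      # Assumption: kind "remote" is used so the default threat model applies.
      - ["java.io", "Console", False, "readLine", "", "", "ReturnValue", "remote", "manual"]
EOF
```

  After writing the file, rerun the query from VS Code with extension packs
  enabled to see whether the source is now reported.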

* TODO Modeling SQLite as a Dependency
  The directory [[../codeql-sqlite-java/]] contains a minimal Java sample derived from
  a prior workshop. It uses =sqlite-jdbc-3.36.0.1.jar= and serves as a small-scale
  test case for dependency-based modeling. This example is especially useful for
  illustrating subtle modeling issues.

  In particular, it uses =java.io.Console.readLine()=, which is already modeled as
  a taint *step*. However, for SQL injection tracking, we need it to act as a
  *source*. Because it is already modeled, it does not appear in the model
  editor. To handle this, we must add a manual source override, either as a raw
  YAML model or as a hardcoded entry via =Customizations.qll=.

* TODO Creating a Vulnerable SQLite Sample for Query Visibility
  To ensure that taint-based queries (e.g., =SqlTainted.ql=) identify vulnerable
  behavior, the sink function, such as =.eval()= or =sqlite3_exec()=, must
  actually be called in application code. It is not sufficient for the function
  to merely exist in a linked library or dependency: CodeQL reports a flow only
  when the sink call appears in source code extracted into the database.

  To address this, we modify the file [[../codeql-sqlite-java/AddUser.java]] to
  include a realistic, vulnerable flow that mimics typical usage patterns. For
  example, the program should:

  1. Accept user input (e.g., via =System.in=, =BufferedReader=, or
     =Console.readLine()=),
  2. Store it in a variable without sanitization,
  3. Construct an SQL query using string concatenation,
  4. Call =eval()= or =sqlite3_exec()= with the tainted query.

  This guarantees that the sink is both *present* and *exercised*, allowing
  built-in and custom CodeQL queries to detect the dataflow path from source to
  sink.
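
  The four steps above might look like the following. This is a sketch, not the
  contents of the lab's =AddUser.java=: the file name =AddUserSketch.java=, the
  =users= table, and the database file =users.db= are made up, and it assumes
  the JDBC call =Statement.executeUpdate= as the sink in place of =eval()= or
  =sqlite3_exec()=. It is written to a separate file so that the existing
  sample is not overwritten.

```shell
# Run from the lab root (~/work-gh/codeql-lab in this README).
mkdir -p codeql-sqlite-java

# Assumption: AddUserSketch.java is a hypothetical file name.
cat > codeql-sqlite-java/AddUserSketch.java <<'EOF'
import java.io.Console;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class AddUserSketch {
    public static void main(String[] args) throws SQLException {
        // 1. Accept user input.
        Console console = System.console();
        if (console == null) {
            System.err.println("No console available");
            return;
        }
        String name = console.readLine("User name: ");

        // 2. and 3. No sanitization: build the statement by concatenation.
        String query = "INSERT INTO users (name) VALUES ('" + name + "')";

        // 4. Pass the tainted string to the sink.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:users.db");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(query);
        }
    }
}
EOF
```

  Compiling the generated file needs only the JDK. Running it also needs the
  =sqlite-jdbc= jar on the classpath and an existing =users= table.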

  The same flow structure used in the Jedis version can be reused here. That way,
  we maintain consistency across modeling examples while switching the underlying
  dependency from Redis to SQLite.