From 3869a61388435dd373417086e2c6923f17704407 Mon Sep 17 00:00:00 2001
From: Michael Hohn
Date: Wed, 30 Jul 2025 16:34:54 -0700
Subject: [PATCH] major revisions

---
 README.org | 78 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 52 insertions(+), 26 deletions(-)

diff --git a/README.org b/README.org
index 53375b2..38b9287 100644
--- a/README.org
+++ b/README.org
@@ -74,9 +74,27 @@
     - [[./codeql-sqlite-java/TaintFlowDebugging.ql]]
     - [[./codeql-sqlite-java/TaintFlowDebugging.md]]

-*** Debugging data flow config (instead of taint flow), C
+*** TODO Debugging data flow config (instead of taint flow), C
+    A corresponding example for C is planned, using a simplified query to trace
+    value propagation in [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c]].
+    Unlike Java, C may require manual modeling even to visualize basic flows.

 ** Modeling
+   There are two primary approaches to modeling: direct use of CodeQL predicates
+   and the models-as-data system. The models-as-data system is implemented in QL
+   but relies on external YAML files that are interpreted at query evaluation
+   time.
+
+   The model editor provides a GUI for managing YAML-based models, but the
+   underlying format is identical to that used by the models-as-data system. In C
+   and other cases where GUI support is limited or unavailable, we write these
+   YAML models manually and invoke them directly from queries.
+
+   When YAML models are written directly, the use of GPT-based tooling becomes
+   very natural. GPTs can extract function signatures, parameter semantics, and
+   flow annotations from documentation or code examples, then generate valid YAML
+   model entries automatically.
+
 *** Review: SQLite Injection Workshop, Java
     We begin with a recap of the Java-based injection example, focusing on the
     vulnerable code in [[./codeql-sqlite-java/AddUser.java][AddUser.java]]. Following that, we examine a fully manual
@@ -128,31 +146,10 @@


 *** Review: SQLite Injection Workshop (C)
-    This is the C version of the workshop.
-
-*** Extending Queries with Customizations.qll for C
-    While most CodeQL-supported languages provide out-of-the-box support for
-    =Customizations.qll=, C and C++ do not include this by default. However, it is
-    possible to enable such support by building a custom CodeQL bundle. This can
-    be done using the CLI tool at
-    https://github.com/advanced-security/codeql-bundle. Since the tool functions
-    largely as a black box, we provide a more detailed illustration of the
-    underlying steps.
-
-    A working demonstration is available in
-    [[./codeql-dataflow-sql-injection-c/README.org]]. In languages like Java,
-    =Customizations.qll= is included automatically via imports from
-    =.qll=, such as [[./ql/java/ql/lib/java.qll][java.qll]] importing [[./ql/java/ql/lib/Customizations.qll][Customizations.qll]], which defines
-    user-extensible predicates for flow modeling.
-
-    For C/C++, the process requires explicit modification:
-    1. Modify =ql/cpp/ql/lib/cpp.qll= to import =Customizations.qll=.
-    2. Create and populate =ql/cpp/ql/lib/Customizations.qll= with custom sources/sinks or extensions.
-    3. Rebuild the CodeQL bundle to include these changes.
-
-    This customization enables consistent user-defined flow modeling across
-    languages, making it possible to reuse modeling patterns from Java or Python
-    in C/C++ contexts.
+    This is the C version of the injection workshop, based on
+    [[file:~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c]]. It
+    serves as the basis for both the "models-as-data" manual modeling and the
+    extension via =Customizations.qll=.

 *** Use models-as-data QL code directly (no graphical editor)
     This section focuses on applying the models-as-data system without using the
@@ -179,6 +176,35 @@
     calls a function already modeled as a source—to illustrate how user-defined
     extensions propagate through the query logic.

+*** Extending Queries with Customizations.qll for C
+    The manual YAML modeling approach from the previous section works well for
+    isolated cases. However, to integrate seamlessly with idiomatic CodeQL
+    queries, we show how to extend the standard QL libraries via
+    =Customizations.qll=.
+
+    While most CodeQL-supported languages provide out-of-the-box support for
+    =Customizations.qll=, C and C++ do not include this by default. However, it is
+    possible to enable such support by building a custom CodeQL bundle. This can
+    be done using the CLI tool at
+    https://github.com/advanced-security/codeql-bundle. Since the tool functions
+    largely as a black box, we provide a more detailed illustration of the
+    underlying steps.
+
+    A working demonstration is available in
+    [[./codeql-dataflow-sql-injection-c/README.org]]. In languages like Java,
+    =Customizations.qll= is included automatically via imports from
+    =.qll=, such as [[./ql/java/ql/lib/java.qll][java.qll]] importing [[./ql/java/ql/lib/Customizations.qll][Customizations.qll]], which defines
+    user-extensible predicates for flow modeling.
+
+    For C/C++, the process requires explicit modification:
+    1. Modify =ql/cpp/ql/lib/cpp.qll= to import =Customizations.qll=.
+    2. Create and populate =ql/cpp/ql/lib/Customizations.qll= with custom sources/sinks or extensions.
+    3. Rebuild the CodeQL bundle to include these changes.
+
+    This customization enables consistent user-defined flow modeling across
+    languages, making it possible to reuse modeling patterns from Java or Python
+    in C/C++ contexts.
+
 ** TODO codeql-bundling
    TBD: detailed description of https://github.com/advanced-security/codeql-bundle, in