From 8d1d29fe1016b9170854aab29aaf51f91e9a813e Mon Sep 17 00:00:00 2001
From: michael hohn
Date: Mon, 1 Sep 2025 21:45:53 -0700
Subject: [PATCH] Add prompt support files generated from rst doc

---
 codeql-docs/README.org                        | 19 ++++
 ...e-classes-for-working-with-go-programs.gpt | 90 +++++++++++++++++++
 codeql-docs/analyzing-data-flow-in-go.gpt     | 50 +++++++++++
 codeql-docs/basic-query-for-go-code.gpt       | 36 ++++++++
 codeql-docs/codeql-for-go.gpt                 | 21 +++++
 codeql-docs/codeql-library-for-go.gpt         | 37 ++++++++
 .../customizing-library-models-for-go.gpt     | 44 +++++++++
 7 files changed, 297 insertions(+)
 create mode 100644 codeql-docs/README.org
 create mode 100644 codeql-docs/abstract-syntax-tree-classes-for-working-with-go-programs.gpt
 create mode 100644 codeql-docs/analyzing-data-flow-in-go.gpt
 create mode 100644 codeql-docs/basic-query-for-go-code.gpt
 create mode 100644 codeql-docs/codeql-for-go.gpt
 create mode 100644 codeql-docs/codeql-library-for-go.gpt
 create mode 100644 codeql-docs/customizing-library-models-for-go.gpt

diff --git a/codeql-docs/README.org b/codeql-docs/README.org
new file mode 100644
index 0000000..d304bc6
--- /dev/null
+++ b/codeql-docs/README.org
@@ -0,0 +1,19 @@
+* TODO Direct Conversion RST -> Prompt by GPT
+** For Go
+  + [[../ql/docs/codeql/codeql-language-guides/abstract-syntax-tree-classes-for-working-with-go-programs.rst]]
+    - ./abstract-syntax-tree-classes-for-working-with-go-programs.gpt
+  + [[../ql/docs/codeql/codeql-language-guides/analyzing-data-flow-in-go.rst]]
+    - ./analyzing-data-flow-in-go.gpt
+  + [[../ql/docs/codeql/codeql-language-guides/basic-query-for-go-code.rst]]
+    - ./basic-query-for-go-code.gpt
+  + [[../ql/docs/codeql/codeql-language-guides/codeql-for-go.rst]]
+    - ./codeql-for-go.gpt
+  + [[../ql/docs/codeql/codeql-language-guides/codeql-library-for-go.rst]]
+    - ./codeql-library-for-go.gpt
+  + [[../ql/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst]]
+    - ./customizing-library-models-for-go.gpt
+
+** For Python
+
+** For C/C++
+
diff --git a/codeql-docs/abstract-syntax-tree-classes-for-working-with-go-programs.gpt b/codeql-docs/abstract-syntax-tree-classes-for-working-with-go-programs.gpt
new file mode 100644
index 0000000..c494963
--- /dev/null
+++ b/codeql-docs/abstract-syntax-tree-classes-for-working-with-go-programs.gpt
@@ -0,0 +1,90 @@
+Purpose
+- Write CodeQL queries over Go by navigating the Go AST classes.
+- Model: Syntax → CodeQL class hierarchy; use predicates to access parts (condition, body, operands).
+- Pattern: get(), getA(), getOperand(), getAnArgument(), getCalleeExpr().
+
+Core Namespaces
+- Statements: subclasses of Stmt.
+- Expressions: subclasses of Expr (literals, unary, binary, calls, selectors, etc.).
+- Declarations: FuncDecl, GenDecl (+ ImportSpec, TypeSpec, ValueSpec).
+- Types: TypeExpr nodes (ArrayTypeExpr, StructTypeExpr, FuncTypeExpr, InterfaceTypeExpr, MapTypeExpr, ChanTypeExpr variants).
+- Names/Selectors: SimpleName, SelectorExpr; Name hierarchy: PackageName, TypeName, ValueName, LabelName.
+
+Statements (Stmt)
+- EmptyStmt “;”; ExprStmt expression-as-stmt; BlockStmt “{…}”.
+- IfStmt: if cond then [else]; supports init; Then/Else are blocks or statements.
+- ForStmt: classic init/cond/post; LoopStmt superclass. RangeStmt: “for k,v := range expr { … }”.
+- SwitchStmt/ExpressionSwitchStmt; TypeSwitchStmt; CaseClause inside switch.
+- SelectStmt with CommClause; SendStmt “ch <- x”; RecvStmt “x = <-ch”.
+- DeclStmt; Assignment family: SimpleAssignStmt (=), DefineStmt (:=), CompoundAssignStmt (+=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=, &^=).
+- IncStmt x++, DecStmt x--. GoStmt “go f()”; DeferStmt “defer f()”. LabeledStmt, BreakStmt, ContinueStmt, GotoStmt, FallthroughStmt, BadStmt.
+
+Expressions (Expr)
+Literals
+- BasicLit subclasses: IntLit, FloatLit, ImagLit, CharLit/RuneLit, StringLit.
+- CompositeLit: StructLit (T{…}), MapLit (map[K]V{…}).
+- FuncLit: function literal (FuncDef).
+
+UnaryExpr (UnaryExpr)
+- PlusExpr “+x”, MinusExpr “-x”, NotExpr “!x”, ComplementExpr “^x”, AddressExpr “&x”, RecvExpr “<-x”.
+
+BinaryExpr (BinaryExpr)
+- Arithmetic: MulExpr, QuoExpr, RemExpr, AddExpr, SubExpr.
+- Shift: ShlExpr “<<”, ShrExpr “>>”.
+- Logical: LandExpr “&&”, LorExpr “||”.
+- Relational: LssExpr “<”, GtrExpr “>”, LeqExpr “<=”, GeqExpr “>=”.
+- Equality: EqlExpr “==”, NeqExpr “!=”.
+- Bitwise: AndExpr “&”, OrExpr “|”, XorExpr “^”, AndNotExpr “&^”.
+
+Type expressions (no common superclass)
+- ArrayTypeExpr “[N]T”/“[]T”; StructTypeExpr “struct{…}”; FuncTypeExpr “func(…) …”.
+- InterfaceTypeExpr; MapTypeExpr; ChanTypeExpr variants: SendChanTypeExpr, RecvChanTypeExpr, SendRecvChanTypeExpr.
+
+Name/Selector/Call
+- Name subclasses: SimpleName, QualifiedName; ValueName → ConstantName, VariableName, FunctionName.
+- SelectorExpr “X.Y” for pkg qualifiers and field/method access.
+- CallExpr: getCalleeExpr(), getAnArgument(); method calls often have a SelectorExpr as callee.
+- IndexExpr “a[i]”; SliceExpr “a[i:j:k]”; KeyValueExpr in CompositeLit.
+- ParenExpr; StarExpr pointer deref/type; TypeAssertExpr “x.(T)”; ConversionExpr “T(x)”.
+
+Declarations
+- FuncDecl/FuncLit via FuncDef: getBody(), getName(), getParameter(i), getResultVar(i), getACall().
+- GenDecl with ImportSpec/TypeSpec/ValueSpec; Field/FieldList for params, results, struct/interface fields.
+
+Concurrency
+- SelectStmt with CommClause; SendStmt; RecvExpr/RecvStmt; GoStmt; DeferStmt.
+
+Navigation Idioms
+- If: getCondition(), getThen(), getElse(); For/Range: inspect init/cond/post or range expr.
+- Calls: from CallExpr c, SelectorExpr s | c.getCalleeExpr() = s and s.getSelector().getName() = "Foo".
+- Method vs function: SelectorExpr callee vs SimpleName callee.
+- Switch/TypeSwitch: use CaseClause, getExpr(i)/getStmt(i); Select: CommClause.
+- Assign: match AssignStmt subclasses; short var define is DefineStmt.
+- Binary/Unary: use specific subclasses or operator accessors.
+- Literals: filter BasicLit subclasses; CompositeLit elements via keys/values.
+
+Selection Patterns (QL sketches)
+- Method calls by name:
+  from CallExpr call, SelectorExpr sel
+  where call.getCalleeExpr() = sel and sel.getSelector().getName() = "Close"
+  select call
+- Range over map/slice:
+  from RangeStmt r select r
+- Short var with channel receive:
+  from RecvStmt rs select rs
+- Struct literal of type Point:
+  from StructLit lit where lit.getType().getName() = "Point" select lit
+- Defer call:
+  from DeferStmt d, CallExpr c where d.getCall() = c select d, c
+
+Tips
+- Prefer class tests over string parsing. Disambiguate type conversions (CallExpr callee is a TypeExpr).
+- Inc/Dec are statements, not expressions. Handle ":=" vs "=" separately. Exclude BadStmt/BadExpr.
+
+Cheatsheet (syntax → class)
+- If: IfStmt; For: ForStmt; Range: RangeStmt; Switch: SwitchStmt/ExpressionSwitchStmt; Type switch: TypeSwitchStmt; Select: SelectStmt; Case: CaseClause; Select case: CommClause.
+- Assign: SimpleAssignStmt (=), DefineStmt (:=), CompoundAssignStmt; Inc/Dec: IncStmt, DecStmt.
+- Call: CallExpr; Selector: SelectorExpr; Index/Slice: IndexExpr/SliceExpr; Type assert: TypeAssertExpr; Unary/Binary: UnaryExpr/BinaryExpr subtypes.
+- Literals: IntLit, FloatLit, ImagLit, CharLit/RuneLit, StringLit, StructLit, MapLit, FuncLit.
+- Types: ArrayTypeExpr, StructTypeExpr, FuncTypeExpr, InterfaceTypeExpr, MapTypeExpr, ChanTypeExpr.
+- Names/Entities: Name, ValueName, FunctionName; FuncDef, FuncDecl, FuncLit.
diff --git a/codeql-docs/analyzing-data-flow-in-go.gpt b/codeql-docs/analyzing-data-flow-in-go.gpt
new file mode 100644
index 0000000..a35d3c4
--- /dev/null
+++ b/codeql-docs/analyzing-data-flow-in-go.gpt
@@ -0,0 +1,50 @@
+Purpose
+- Use CodeQL’s Go data-flow libraries to find how values and taint propagate.
+- Cover local flow/taint (intra-procedural) and global flow/taint (inter-procedural), with configurable sources/sinks/barriers.
+
+Local Data Flow (DataFlow)
+- Node hierarchy: Node (ExprNode, ParameterNode, InstructionNode). Map to/from AST/IR via asExpr/asParameter/asInstruction and exprNode/parameterNode/instructionNode.
+- localFlowStep(a,b): immediate edge; localFlow(a,b) is its reflexive transitive closure (localFlowStep*).
+- Example: find all expressions that flow to call arg 0 of os.Open:
+  import go
+  from Function osOpen, CallExpr call, Expr src
+  where osOpen.hasQualifiedName("os","Open") and call.getTarget() = osOpen and
+  DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0)))
+  select src
+
+Local Taint (TaintTracking)
+- localTaintStep / localTaint are analogous to the DataFlow predicates but also include non-value-preserving steps (e.g., concatenation).
+- Example: parameter → sink taint check with TaintTracking::localTaint.
+
+Global Data Flow (DataFlow::Global)
+- Implement DataFlow::ConfigSig:
+  - isSource(Node): where flow originates.
+  - isSink(Node): where flow ends.
+  - isBarrier(Node) [optional]: blocks flow.
+  - isAdditionalFlowStep(a,b) [optional]: add extra edges.
+- Apply module: module MyFlow = DataFlow::Global<MyConfig> (MyConfig is the module implementing ConfigSig).
+- Query via MyFlow::flow(source, sink).
+
+Global Taint (TaintTracking::Global)
+- Same signature as Global data flow; includes taint-style non-value-preserving steps.
+- Good for security queries (untrusted → sink).
+
+Predefined Sources
+- RemoteFlowSource: user-controllable inputs; use as source for security findings.
+
+Idioms
+- Targeted call/arg sink: define isSink by matching call.getTarget() and sink.asExpr() = call.getArgument(i).
+- Literal-only filter: require source.asExpr() instanceof StringLit (or other BasicLit subclass).
+- Env source example: class GetenvSource extends CallExpr where getTarget().hasQualifiedName("os","Getenv").
+- Compose flows: define MyFlow for literal→url.Parse, or taint from getenv→url.Parse using ConfigSig and Global.
+
+Exercises (patterns to emulate)
+- Hard-coded strings → url.Parse (local/global).
+- Sources from os.Getenv.
+- Full path query from getenv to url.Parse.
+
+Tips
+- Prefer DataFlow/TaintTracking APIs over string matching; use .asExpr() to recover expressions when defined.
+- Be explicit about package-qualified targets with hasQualifiedName.
+- For better performance and precision, start with localFlow/localTaint and expand to Global only when needed.
+- Use select source, "... $@", sink to show path endpoints in results; add path explanation with path queries (outside this scope).
diff --git a/codeql-docs/basic-query-for-go-code.gpt b/codeql-docs/basic-query-for-go-code.gpt
new file mode 100644
index 0000000..af5e2dc
--- /dev/null
+++ b/codeql-docs/basic-query-for-go-code.gpt
@@ -0,0 +1,36 @@
+Purpose
+- Minimal Go query in VS Code; variables, constraints, and results for a concrete bug pattern.
+
+Target Pattern
+- A method with a value receiver that writes to a field of that receiver has no effect on the caller's value (the receiver is a copy).
+- Safer alternative: method should use a pointer receiver.
+
+Query
+  import go
+  from Method m, Variable recv, Write w, Field f
+  where recv = m.getReceiver() and
+  w.writesField(recv.getARead(), f, _) and
+  not recv.getType() instanceof PointerType
+  select w, "This update to " + f + " has no effect, because " + recv + " is not a pointer."
+
+Structure (analogy to SQL)
+- import: include standard Go library (import go).
+- from: declare typed variables to range over (Method, Variable, Write, Field).
+- where: constrain relationships among variables with predicates.
+- select: emit results; message can concatenate strings and AST entities.
+
+Key Predicates/Classes
+- Method.getReceiver(): receiver variable of a method.
+- Write.writesField(baseRead, field, idx): a write whose LHS writes field of a base expression.
+- Variable.getARead(): a read expression of the variable (used to match Write receiver base).
+- PointerType: type test to exclude pointer receivers.
+
+Usage Hints
+- Use hasQualifiedName(pkg, name) to narrow functions/methods by package.
+- Start with quick query in the VS Code CodeQL extension; paste query under "import go".
+- Click results to jump to the write site; refine constraints if needed.
+
+Extensions
+- Add a guard to exclude writes to fields of temporary copies (e.g., values returned from functions).
+- Restrict to exported methods/types, or to specific packages.
+- Convert to a path query to show flows leading to the write (optional).
diff --git a/codeql-docs/codeql-for-go.gpt b/codeql-docs/codeql-for-go.gpt
new file mode 100644
index 0000000..aa71971
--- /dev/null
+++ b/codeql-docs/codeql-for-go.gpt
@@ -0,0 +1,21 @@
+Purpose
+- Orientation page for Go query authors; links and core concepts.
+
+What to Learn (roadmap)
+- Basic query for Go code: variables, predicates, select formatting.
+- CodeQL library for Go: AST, entities/names, types, DFG/CFG, calls.
+- AST classes for Go: concrete syntax → CodeQL classes mapping and accessors.
+- Analyzing data flow in Go: local/global flow and taint.
+- Customizing library models for Go: data extensions (sources/sinks/summaries) and model packs.
+
+Core Import
+- Use "import go" to bring in the standard CodeQL library for Go (go.qll and friends).
+
+Best Practices
+- Start syntactic (AST) for structure; switch to DFG for semantic flow.
+- Use hasQualifiedName for stable matching of stdlib/framework APIs.
+- Prefer library predicates over string parsing; rely on classes and accessors.
+- Keep queries specific and cheap first; generalize after validation.
+
+Next Steps
+- Follow each linked topic for details and examples. Combine AST selections with DataFlow/TaintTracking when moving from structure to behavior.
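
The "start syntactic, then switch to data flow" advice in codeql-for-go.gpt can be sketched as a single query. This is a minimal illustration rather than text from the generated files: it uses the AST classes CallExpr and StringLit to select candidates and DataFlow::localFlow to connect them. The target net/url.Parse follows the exercises listed in analyzing-data-flow-in-go.gpt; the result message is an assumption.

```ql
import go

// Sketch: hard-coded string literals that reach the first argument of
// net/url.Parse within a single function (local data flow only).
from Function urlParse, CallExpr call, StringLit lit
where
  // Entity/AST step: pin the callee by its package-qualified name.
  urlParse.hasQualifiedName("net/url", "Parse") and
  call.getTarget() = urlParse and
  // Data-flow step: the literal's value reaches argument 0 of the call.
  DataFlow::localFlow(DataFlow::exprNode(lit), DataFlow::exprNode(call.getArgument(0)))
select call, "url.Parse receives the hard-coded string $@.", lit, "this literal"
```

Keeping the flow local makes the query cheap to run; the same source and sink conditions can later be moved into a ConfigSig module if inter-procedural results are needed.
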
diff --git a/codeql-docs/codeql-library-for-go.gpt b/codeql-docs/codeql-library-for-go.gpt
new file mode 100644
index 0000000..28d4e53
--- /dev/null
+++ b/codeql-docs/codeql-library-for-go.gpt
@@ -0,0 +1,37 @@
+Purpose
+- Quick reference to the standard CodeQL library for Go.
+
+Views
+- AST (syntactic): statements/expressions, names, declarations.
+- CFG/IR: control flow, instructions (rarely used directly by queries).
+- DFG (data-flow): value and taint propagation, call/callee mapping.
+
+AST Essentials
+- AstNode: getChild(i), getAChild(), getParent() for generic traversal (avoid index reliance).
+- Statements: IfStmt, ForStmt, RangeStmt, SwitchStmt/ExpressionSwitchStmt, TypeSwitchStmt, SelectStmt, CaseClause, CommClause, BlockStmt, DeclStmt, Assign variants, Inc/Dec, GoStmt, DeferStmt, Labeled/Break/Continue/Goto/Fallthrough.
+- Expressions: Ident, SelectorExpr (base/selector), BasicLit (IntLit/FloatLit/ImagLit/RuneLit/StringLit), FuncLit, CompositeLit (getKey/getValue), ParenExpr, IndexExpr, SliceExpr, ConversionExpr, TypeAssertExpr, CallExpr (getCalleeExpr/getArgument), StarExpr, TypeExpr, OperatorExpr → UnaryExpr/BinaryExpr (ComparisonExpr with EqualityTestExpr/RelationalComparisonExpr).
+- Statement accessors: per-class getters (getCondition, getThen, getElse, getInit, getPost, getExpr(i), getStmt(i), getComm(), etc.).
+
+Names/Entities/Types
+- Name hierarchy: SimpleName vs QualifiedName; namespaces: PackageName, TypeName, ValueName, LabelName; ValueName → ConstantName, VariableName, FunctionName.
+- ReferenceExpr: lvalue/rvalue; ValueExpr: expressions with values.
+- Entity: PackageEntity, TypeEntity, ValueEntity (Constant/Variable/Function), Label; hasQualifiedName, getDeclaration, getAReference.
+- Variable subclasses: LocalVariable, ReceiverVariable, Parameter, ResultVariable; Field with hasQualifiedName(pkg,type,field).
+- Function/Method: FuncDef unifies FuncDecl/FuncLit; getBody, getName, getParameter(i), getResultVar(i), getACall. Method.hasQualifiedName(pkg,type,method); implements(m2).
+
+Data Flow Graph (DFG)
+- DataFlow::Node ↔ optional AST via asExpr (use cautiously). getType(), getNumericValue/getStringValue/getExactValue for constants.
+- Nodes: CallNode (getArgument(i), getResult(i), getTarget(), getACallee()), ParameterNode (asParameter), BinaryOperationNode (covers x+1, x+=1, x++), UnaryOperationNode; PointerDereferenceNode, AddressOperationNode, RelationalComparisonNode, EqualityTestNode.
+- Read/Write: readsVariable/Field/Element, writesVariable/Field/Element.
+
+Call Graph
+- getTarget(): declared (may be interface method). getACallee(): all possible dynamic callees.
+
+Global Flow/Taint (overview)
+- Define ConfigSig with isSource/isSink/[isBarrier]; apply DataFlow::Global<..> or TaintTracking::Global<..>.
+
+Advanced
+- Basic blocks/dominance for CFG-based reasoning (rare for standard queries).
+
+Guidance
+- Prefer AST for structure, DFG for semantics. Use qualified names. Rely on library types/predicates over string parsing. Start local, move to global only as needed.
diff --git a/codeql-docs/customizing-library-models-for-go.gpt b/codeql-docs/customizing-library-models-for-go.gpt
new file mode 100644
index 0000000..619a820
--- /dev/null
+++ b/codeql-docs/customizing-library-models-for-go.gpt
@@ -0,0 +1,44 @@
+Purpose
+- Customize data-flow/taint analysis for Go by modeling frameworks/libraries via data extensions (YAML) and model packs.
+
+Data Extensions (YAML)
+- Structure:
+  extensions:
+    - addsTo:
+        pack: codeql/go-all
+        extensible:
+      data:
+        -
+        -
+- Union semantics across files: rows are combined; duplicates removed.
+
+Extensible Predicates (Go)
+- sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)
+  - Define sources (e.g., user input). kind maps to threat model; provenance tags origin (manual/ai-manual/etc.).
+- sinkModel(package, type, subtypes, name, signature, ext, input, kind, provenance)
+  - Define sinks (dangerous use).
+- summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)
+  - Define through-flow when dependency code isn’t in repo.
+- neutralModel(package, type, name, signature, kind, provenance)
+  - Low-impact flows (weaker than summaries) to reduce over-taint/noise.
+
+Access Paths (examples)
+- Argument[i], Argument[i].ArrayElement, ReturnValue, ReturnValue.ArrayElement, Receiver, Qualifier, Field["name"], Field[].
+- kind: "value" moves whole values; "taint" propagates taint only.
+
+Examples
+- slices.Max: elements of arg[0] → ReturnValue.
+  - ["slices","",False,"Max","","","Argument[0].ArrayElement","ReturnValue","value","manual"]
+- slices.Concat: elements of all args → elements of ReturnValue.
+  - ["slices","",False,"Concat","","","Argument[0].ArrayElement.ArrayElement","ReturnValue.ArrayElement","value","manual"]
+- Threat models: use kind/provenance and pack wiring to include/exclude sets of sources.
+
+Model Packs
+- Group YAML files into a CodeQL model pack; publish to GHCR.
+- Consumers add the pack to their query suite or CLI invocation to apply models.
+
+Workflow Tips
+- Start with summaries for common library flows that otherwise break paths (builders, containers, helpers).
+- Add specific sources/sinks tied to your threat model.
+- Keep models narrow to avoid false positives (match by hasQualifiedName).
+- Validate with path queries and unit tests; iterate on precision vs recall.
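
The tip to validate models with path queries can be sketched as follows. This is a minimal example under stated assumptions: the source (results of os.Getenv) and the sink (first argument of net/url.Parse) are borrowed from the exercises in analyzing-data-flow-in-go.gpt, and the module names, the @id, and the message are hypothetical. If a library call between source and sink has no summary model, the expected path may be missing from the results, which suggests that a summaryModel row is needed.

```ql
/**
 * @name Environment value reaches url.Parse (model validation sketch)
 * @kind path-problem
 * @problem.severity warning
 * @id go/example/getenv-to-url-parse
 */

import go

// Hypothetical configuration: taint from os.Getenv results to url.Parse input.
module GetenvToParseConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    exists(Function getenv |
      getenv.hasQualifiedName("os", "Getenv") and
      source = getenv.getACall().getResult()
    )
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Function parse |
      parse.hasQualifiedName("net/url", "Parse") and
      sink = parse.getACall().getArgument(0)
    )
  }
}

// Taint tracking also follows non-value-preserving steps such as concatenation.
module GetenvToParseFlow = TaintTracking::Global<GetenvToParseConfig>;

import GetenvToParseFlow::PathGraph

from GetenvToParseFlow::PathNode source, GetenvToParseFlow::PathNode sink
where GetenvToParseFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "This URL is derived from $@.", source.getNode(),
  "an environment variable"
```

Running the query before and after adding a model pack gives a quick check of whether a new summary, source, or sink row changes the reported paths.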