Add prompt support files generated from rst doc

2025-09-01 21:45:53 -07:00
committed by =michael hohn
parent d82a957df0
commit 8d1d29fe10
7 changed files with 297 additions and 0 deletions

codeql-docs/README.org Normal file

@@ -0,0 +1,19 @@
* TODO Direct Conversion RST -> Prompt by GPT
** For Go
+ [[../ql/docs/codeql/codeql-language-guides/abstract-syntax-tree-classes-for-working-with-go-programs.rst]]
- ./abstract-syntax-tree-classes-for-working-with-go-programs.gpt
+ [[../ql/docs/codeql/codeql-language-guides/analyzing-data-flow-in-go.rst]]
- ./analyzing-data-flow-in-go.gpt
+ [[../ql/docs/codeql/codeql-language-guides/basic-query-for-go-code.rst]]
- ./basic-query-for-go-code.gpt
+ [[../ql/docs/codeql/codeql-language-guides/codeql-for-go.rst]]
- ./codeql-for-go.gpt
+ [[../ql/docs/codeql/codeql-language-guides/codeql-library-for-go.rst]]
- ./codeql-library-for-go.gpt
+ [[../ql/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst]]
- ./customizing-library-models-for-go.gpt
** For Python
** For C/C++


@@ -0,0 +1,90 @@
Purpose
- Write CodeQL queries over Go by navigating the Go AST classes.
- Model: Syntax → CodeQL class hierarchy; use predicates to access parts (condition, body, operands).
- Pattern: get<Part>(), getA<Part>(), get<Left/Right>Operand(), getAnArgument(), getCalleeExpr().
Core Namespaces
- Statements: subclasses of Stmt.
- Expressions: subclasses of Expr (literals, unary, binary, calls, selectors, etc.).
- Declarations: FuncDecl, GenDecl (+ ImportSpec, TypeSpec, ValueSpec).
- Types: TypeExpr nodes (ArrayTypeExpr, StructTypeExpr, FuncTypeExpr, InterfaceTypeExpr, MapTypeExpr, ChanTypeExpr variants).
- Names/Selectors: SimpleName, SelectorExpr; Name hierarchy: PackageName, TypeName, ValueName, LabelName.
Statements (Stmt)
- EmptyStmt “;”; ExprStmt expression-as-stmt; BlockStmt “{…}”.
- IfStmt: if cond then [else]; supports init; Then/Else are blocks or statements.
- ForStmt: classic init/cond/post; LoopStmt superclass. RangeStmt: “for k,v := range expr { … }”.
- SwitchStmt/ExpressionSwitchStmt; TypeSwitchStmt; CaseClause inside switch.
- SelectStmt with CommClause; SendStmt “ch <- x”; RecvStmt “x = <-ch”.
- DeclStmt; Assignment family: SimpleAssignStmt (=), DefineStmt (:=), CompoundAssignStmt (+=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=, &^=).
- IncStmt x++, DecStmt x--. GoStmt “go f()”; DeferStmt “defer f()”. LabeledStmt, BreakStmt, ContinueStmt, GotoStmt, FallthroughStmt, BadStmt.
Expressions (Expr)
Literals
- BasicLit subclasses: IntLit, FloatLit, ImagLit, CharLit/RuneLit, StringLit.
- CompositeLit: StructLit (T{…}), MapLit (map[K]V{…}).
- FuncLit: function literal (FuncDef).
UnaryExpr (UnaryExpr)
- PlusExpr “+x”, MinusExpr “-x”, NotExpr “!x”, ComplementExpr “^x”, AddressExpr “&x”, RecvExpr “<-x”.
BinaryExpr (BinaryExpr)
- Arithmetic: MulExpr, QuoExpr, RemExpr, AddExpr, SubExpr.
- Shift: ShlExpr “<<”, ShrExpr “>>”.
- Logical: LandExpr “&&”, LorExpr “||”.
- Relational: LssExpr “<”, GtrExpr “>”, LeqExpr “<=”, GeqExpr “>=”.
- Equality: EqlExpr “==”, NeqExpr “!=”.
- Bitwise: AndExpr “&”, OrExpr “|”, XorExpr “^”, AndNotExpr “&^”.
Type expressions (no common superclass)
- ArrayTypeExpr “[N]T”/“[]T”; StructTypeExpr “struct{…}”; FuncTypeExpr “func(…) …”.
- InterfaceTypeExpr; MapTypeExpr; ChanTypeExpr variants: SendChanTypeExpr, RecvChanTypeExpr, SendRecvChanTypeExpr.
Name/Selector/Call
- Name subclasses: SimpleName, QualifiedName; ValueName → ConstantName, VariableName, FunctionName.
- SelectorExpr “X.Y” for pkg qualifiers and field/method access.
- CallExpr: getCalleeExpr(), getArgument(i), getAnArgument(); a method call usually has a SelectorExpr as its callee.
- IndexExpr “a[i]”; SliceExpr “a[i:j:k]”; KeyValueExpr in CompositeLit.
- ParenExpr; StarExpr pointer deref/type; TypeAssertExpr “x.(T)”; Conversion “T(x)”.
Declarations
- FuncDecl/FuncLit via FuncDef: getBody(), getName(), getParameter(i), getResultVar(i), getACall().
- GenDecl with ImportSpec/TypeSpec/ValueSpec; Field/FieldList for params, results, struct/interface fields.
Concurrency
- SelectStmt with CommClause; SendStmt; RecvExpr/RecvStmt; GoStmt; DeferStmt.
Navigation Idioms
- If: getCondition(), getThen(), getElse(); For/Range: inspect init/cond/post or range expr.
- Calls: from CallExpr c, SelectorExpr s where c.getCalleeExpr() = s and s.getSelector().getName() = "Foo".
- Method vs function: SelectorExpr callee vs SimpleName callee.
- Switch/TypeSwitch: use CaseClause, getExpr(i)/getStmt(i); Select: CommClause.
- Assign: match AssignStmt subclasses; short var define is DefineStmt.
- Binary/Unary: use specific subclasses or operator accessors.
- Literals: filter BasicLit subclasses; CompositeLit elements via keys/values.
Selection Patterns (QL sketches)
- Method calls by name:
  from CallExpr call, SelectorExpr sel
  where call.getCalleeExpr() = sel and sel.getSelector().getName() = "Close"
  select call
- Range over map/slice:
  from RangeStmt r select r
- Short var with channel receive:
  from RecvStmt rs select rs
- Struct literal of type Point:
  from StructLit lit where lit.getType().getName() = "Point" select lit
- Defer call:
  from DeferStmt d, CallExpr c where d.getCall() = c select d, c
Tips
- Prefer class tests over string parsing. Disambiguate type conversions (CallExpr callee is a TypeExpr).
- Inc/Dec are statements, not expressions. Handle ":=" vs "=" separately. Exclude BadStmt/BadExpr.
Cheatsheet (syntax → class)
- If: IfStmt; For: ForStmt; Range: RangeStmt; Switch: SwitchStmt/ExpressionSwitchStmt; Type switch: TypeSwitchStmt; Select: SelectStmt; Case: CaseClause; Select case: CommClause.
- Assign: SimpleAssignStmt (=), DefineStmt (:=), CompoundAssignStmt; Inc/Dec: IncStmt, DecStmt.
- Call: CallExpr; Selector: SelectorExpr; Index/Slice: IndexExpr/SliceExpr; Type assert: TypeAssertExpr; Unary/Binary: UnaryExpr/BinaryExpr subtypes.
- Literals: IntLit, FloatLit, ImagLit, CharLit/RuneLit, StringLit, StructLit, MapLit, FuncLit.
- Types: ArrayTypeExpr, StructTypeExpr, FuncTypeExpr, InterfaceTypeExpr, MapTypeExpr, ChanTypeExpr.
- Names/Entities: Name, ValueName, FunctionName; FuncDef, FuncDecl, FuncLit.
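To make the syntax → class mapping concrete, here is a small hypothetical Go program, written for illustration and not taken from the CodeQL documentation. Each construct is annotated with the class the cheatsheet above assigns to it. The type `Point` and the function `sum` are made-up names.

```go
package main

import "fmt"

// Point is a hypothetical struct, used only to show a StructLit.
type Point struct{ X, Y int }

// sum is a FuncDecl; its body is a BlockStmt.
func sum(values []int) int {
	// DefineStmt (:=) whose right-hand side is an IntLit.
	total := 0
	// RangeStmt.
	for _, v := range values {
		// IfStmt whose condition is an LssExpr.
		if v < 0 {
			continue // ContinueStmt
		}
		// CompoundAssignStmt (+=).
		total += v
	}
	return total
}

func main() {
	// DefineStmt; the right-hand side is a StructLit with KeyValueExpr elements.
	p := Point{X: 1, Y: 2}
	ch := make(chan int, 1)
	// DeferStmt wrapping a CallExpr.
	defer close(ch)
	// SendStmt; p.X and p.Y are SelectorExprs, -5 is a MinusExpr.
	ch <- sum([]int{p.X, p.Y, -5})
	// ForStmt; i++ is an IncStmt (a statement, not an expression).
	for i := 0; i < 1; i++ {
		// CallExpr whose callee is the SelectorExpr fmt.Println; <-ch is a RecvExpr.
		fmt.Println(<-ch)
	}
}
```

Running the program prints 3, since the negative element is skipped. A query over IfStmt, RangeStmt, or DeferStmt should find the corresponding lines in a database built from this file.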


@@ -0,0 +1,50 @@
Purpose
- Use CodeQL's Go data-flow libraries to find how values and taint propagate.
- Cover local flow/taint (intra-procedural) and global flow/taint (inter-procedural), with configurable sources/sinks/barriers.
Local Data Flow (DataFlow)
- Node hierarchy: Node (ExprNode, ParameterNode, InstructionNode). Map to/from AST/IR via asExpr/asParameter/asInstruction and exprNode/parameterNode/instructionNode.
- localFlowStep(a,b): immediate edge; localFlow(a,b) is transitive closure (localFlowStep*).
- Example: find all expressions that flow to call arg 0 of os.Open:
  import go
  from Function osOpen, CallExpr call, Expr src
  where
    osOpen.hasQualifiedName("os", "Open") and
    call.getTarget() = osOpen and
    DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0)))
  select src
Local Taint (TaintTracking)
- localTaintStep / localTaint analogous to DataFlow but includes non-value-preserving steps (e.g., concatenation).
- Example: parameter → sink taint check with TaintTracking::localTaint.
Global Data Flow (DataFlow::Global)
- Implement DataFlow::ConfigSig:
- isSource(Node): where flow originates.
- isSink(Node): where flow ends.
- isBarrier(Node) [optional]: blocks flow.
- isAdditionalFlowStep(a,b) [optional]: add extra edges.
- Apply module: module MyFlow = DataFlow::Global<MyConfig>.
- Query via MyFlow::flow(source, sink).
Global Taint (TaintTracking::Global)
- Same signature as Global data flow; includes taint-style non-value-preserving steps.
- Good for security queries (untrusted → sink).
Predefined Sources
- RemoteFlowSource: user-controllable inputs; use as source for security findings.
Idioms
- Targeted call/arg sink: define isSink by matching call.getTarget() and sink.asExpr() = call.getArgument(i).
- Literal-only filter: require source.asExpr() instanceof StringLit (or other BasicLit subclass).
- Env source example: class GetenvSource extends CallExpr where getTarget().hasQualifiedName("os","Getenv").
- Compose flows: define MyFlow for literal→url.Parse, or taint from getenv→url.Parse using ConfigSig and Global.
Exercises (patterns to emulate)
- Hard-coded strings → url.Parse (local/global).
- Sources from os.Getenv.
- Full path query from getenv to url.Parse.
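The exercise patterns above can be tried against a small target program. The sketch below is hypothetical Go code whose function names are invented for illustration. It contains one hard-coded string reaching url.Parse, one os.Getenv result reaching url.Parse unchanged, and one that reaches it only through string concatenation, which is a non-value-preserving step and therefore a taint-tracking case.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// parseHardCoded passes a string literal straight to url.Parse:
// the "hard-coded strings → url.Parse" pattern.
func parseHardCoded() (*url.URL, error) {
	return url.Parse("https://example.com/api")
}

// parseFromEnv passes the os.Getenv result to url.Parse unchanged,
// so the flow from source to sink is value-preserving.
func parseFromEnv(key string) (*url.URL, error) {
	raw := os.Getenv(key)
	return url.Parse(raw)
}

// parseFromEnvWithPath concatenates before parsing. Concatenation does not
// preserve the value, so this is the case where taint tracking is needed.
func parseFromEnvWithPath(key string) (*url.URL, error) {
	raw := os.Getenv(key) + "/health"
	return url.Parse(raw)
}

func main() {
	// SERVICE_URL is a made-up variable name for this example.
	os.Setenv("SERVICE_URL", "https://example.com:8443")
	u, err := parseFromEnvWithPath("SERVICE_URL")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(u.Host, u.Path)
}
```

A data-flow configuration with os.Getenv as source and argument 0 of url.Parse as sink would be expected to report parseFromEnv. Reporting parseFromEnvWithPath as well would need the taint-tracking variant.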
Tips
- Prefer DataFlow/TaintTracking APIs over string matching; use .asExpr() to recover expressions when defined.
- Be explicit about package-qualified targets with hasQualifiedName.
- For better perf/precision, start with localFlow/localTaint and expand to Global only when needed.
- Use select source, "... $@", sink to show path endpoints in results; add path explanation with path queries (outside this scope).


@@ -0,0 +1,36 @@
Purpose
- Minimal Go query in VS Code; variables, constraints, and results for a concrete bug pattern.
Target Pattern
- Methods defined on value receivers that write to a field have no effect (receiver is copied).
- Safer alternative: method should use a pointer receiver.
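For reference, this is roughly what the target pattern looks like in Go source. The sketch is a hypothetical example: the type `Counter` and both method names are invented. It contrasts a value receiver, whose field write is lost, with a pointer receiver, whose write persists.

```go
package main

import "fmt"

// Counter is a hypothetical type used only for this illustration.
type Counter struct {
	n int
}

// IncByValue has a value receiver: c is a copy of the caller's Counter,
// so the field write below never reaches the original.
func (c Counter) IncByValue() {
	c.n = c.n + 1
}

// IncByPointer has a pointer receiver, so the write updates the caller's value.
func (c *Counter) IncByPointer() {
	c.n = c.n + 1
}

func main() {
	var c Counter
	c.IncByValue()
	fmt.Println(c.n) // 0: the update was applied to a copy
	c.IncByPointer()
	fmt.Println(c.n) // 1: the update reached c
}
```

The write inside IncByValue is the kind of result the pattern describes. The write inside IncByPointer is not, because its receiver has a pointer type.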
Query
import go
from Method m, Variable recv, Write w, Field f
where recv = m.getReceiver() and
w.writesField(recv.getARead(), f, _) and
not recv.getType() instanceof PointerType
select w, "This update to " + f + " has no effect, because " + recv + " is not a pointer."
Structure (analogy to SQL)
- import: pull in the standard CodeQL library for Go (import go).
- from: declare typed variables to range over (Method, Variable, Write, Field).
- where: constrain relationships among variables with predicates.
- select: emit results; message can concatenate strings and AST entities.
Key Predicates/Classes
- Method.getReceiver(): receiver variable of a method.
- Write.writesField(base, field, rhs): a write that sets field on the base node to the value rhs.
- Variable.getARead(): a read expression of the variable (used to match Write receiver base).
- PointerType: type test to exclude pointer receivers.
Usage Hints
- Use hasQualifiedName(pkg, name) to narrow functions/methods by package.
- Start with a quick query in the VS Code CodeQL extension; paste the query body under "import go".
- Click results to jump to the write site; refine constraints if needed.
Extensions
- Add a guard to exclude writes to fields of temporary copies (e.g., values returned from functions).
- Restrict to exported methods/types, or to specific packages.
- Convert to a path query to show flows leading to the write (optional).


@@ -0,0 +1,21 @@
Purpose
- Orientation page for Go query authors; links and core concepts.
What to Learn (roadmap)
- Basic query for Go code: variables, predicates, select formatting.
- CodeQL library for Go: AST, entities/names, types, DFG/CFG, calls.
- AST classes for Go: concrete syntax → CodeQL classes mapping and accessors.
- Analyzing data flow in Go: local/global flow and taint.
- Customizing library models for Go: data extensions (sources/sinks/summaries) and model packs.
Core Import
- Use "import go" to bring in the standard CodeQL library for Go (go.qll and friends).
Best Practices
- Start syntactic (AST) for structure; switch to DFG for semantic flow.
- Use hasQualifiedName for stable matching of stdlib/framework APIs.
- Prefer library predicates over string parsing; rely on classes and accessors.
- Keep queries specific and cheap first; generalize after validation.
Next Steps
- Follow each linked topic for details and examples. Combine AST selections with DataFlow/TaintTracking when moving from structure to behavior.


@@ -0,0 +1,37 @@
Purpose
- Quick reference to the standard CodeQL library for Go, for use when writing queries.
Views
- AST (syntactic): statements/expressions, names, declarations.
- CFG/IR: control flow, instructions (rarely used directly by queries).
- DFG (data-flow): value and taint propagation, call/callee mapping.
AST Essentials
- AstNode: getChild(i), getAChild(), getParent() for generic traversal (avoid index reliance).
- Statements: IfStmt, ForStmt, RangeStmt, SwitchStmt/ExpressionSwitchStmt, TypeSwitchStmt, SelectStmt, CaseClause, CommClause, BlockStmt, DeclStmt, Assign variants, Inc/Dec, GoStmt, DeferStmt, Labeled/Break/Continue/Goto/Fallthrough.
- Expressions: Ident, SelectorExpr (base/selector), BasicLit (IntLit/FloatLit/ImagLit/RuneLit/StringLit), FuncLit, CompositeLit (getKey/getValue), ParenExpr, IndexExpr, SliceExpr, ConversionExpr, TypeAssertExpr, CallExpr (getCalleeExpr/getArgument(i)), StarExpr, TypeExpr, OperatorExpr → UnaryExpr/BinaryExpr (ComparisonExpr with EqualityTestExpr/RelationalComparisonExpr).
- Statement accessors: per-class getters (getCondition, getThen, getElse, getInit, getPost, getExpr(i), getStmt(i), getComm(), etc.).
Names/Entities/Types
- Name hierarchy: SimpleName vs QualifiedName; namespaces: PackageName, TypeName, ValueName, LabelName; ValueName → ConstantName, VariableName, FunctionName.
- ReferenceExpr: lvalue/rvalue; ValueExpr: expressions with values.
- Entity: PackageEntity, TypeEntity, ValueEntity (Constant/Variable/Function), Label; hasQualifiedName, getDeclaration, getAReference.
- Variable subclasses: LocalVariable, ReceiverVariable, Parameter, ResultVariable; Field with hasQualifiedName(pkg,type,field).
- Function/Method: FuncDef unifies FuncDecl/FuncLit; getBody, getName, getParameter(i), getResultVar(i), getACall. Method.hasQualifiedName(pkg,type,method); implements(m2).
Data Flow Graph (DFG)
- DataFlow::Node ↔ optional AST via asExpr (use cautiously). getType(), getNumericValue/getStringValue/getExactValue for constants.
- Nodes: CallNode (getArgument(i), getResult(i), getTarget(), getACallee()), ParameterNode (asParameter), BinaryOperationNode (covers x+1, x+=1, x++), UnaryOperationNode; PointerDereferenceNode, AddressOperationNode, RelationalComparisonNode, EqualityTestNode.
- Read/Write: readsVariable/Field/Element, writesVariable/Field/Element.
Call Graph
- getTarget(): declared (may be interface method). getACallee(): all possible dynamic callees.
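The difference between the declared target and the possible dynamic callees is easiest to see on an interface call. The following is a hypothetical Go sketch in which all type and function names are invented. The call `s.Area()` is declared against the interface method, while at run time it may dispatch to either concrete implementation.

```go
package main

import "fmt"

// Shape, Square, and Rect are hypothetical types for this illustration.
type Shape interface {
	Area() float64
}

type Square struct{ Side float64 }

type Rect struct{ W, H float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

func (r Rect) Area() float64 { return r.W * r.H }

// totalArea calls Area through the interface.
func totalArea(shapes []Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		// Declared target of this call: the interface method Shape.Area.
		// Possible dynamic callees: Square.Area and Rect.Area.
		total += s.Area()
	}
	return total
}

func main() {
	fmt.Println(totalArea([]Shape{Square{Side: 2}, Rect{W: 2, H: 3}}))
}
```

For the call inside totalArea, getTarget() would be expected to give the interface method, and getACallee() the two concrete methods.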
Global Flow/Taint (overview)
- Define ConfigSig with isSource/isSink/[isBarrier]; apply DataFlow::Global<..> or TaintTracking::Global<..>.
Advanced
- Basic blocks/dominance for CFG-based reasoning (rare for standard queries).
Guidance
- Prefer AST for structure, DFG for semantics. Use qualified names. Rely on library types/predicates over string parsing. Start local, move to global only as needed.


@@ -0,0 +1,44 @@
Purpose
- Customize data-flow/taint analysis for Go by modeling frameworks/libraries via data extensions (YAML) and model packs.
Data Extensions (YAML)
- Structure:
  extensions:
    - addsTo:
        pack: codeql/go-all
        extensible: <extensible-predicate>
      data:
        - <tuple1>
        - <tuple2>
- Union semantics across files: rows are combined; duplicates removed.
Extensible Predicates (Go)
- sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)
- Define sources (e.g., user input). kind maps to threat model; provenance tags origin (manual/ai-manual/etc.).
- sinkModel(package, type, subtypes, name, signature, ext, input, kind, provenance)
- Define sinks (dangerous use).
- summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)
- Define through-flow when the dependency's code isn't in the repo.
- neutralModel(package, type, name, signature, kind, provenance)
- Low-impact flows (weaker than summaries) to reduce over-taint/noise.
Access Paths (examples)
- Argument[i], Argument[i].ArrayElement, ReturnValue, ReturnValue.ArrayElement, Receiver, Qualifier, Field["name"], Field[<index>].
- kind: "value" moves whole values; "taint" propagates taint only.
Examples
- slices.Max: elements of arg[0] → ReturnValue.
- ["slices","",False,"Max","","","Argument[0].ArrayElement","ReturnValue","value","manual"]
- slices.Concat: elements of all args → elements of ReturnValue.
- ["slices","",False,"Concat","","","Argument[0].ArrayElement.ArrayElement","ReturnValue.ArrayElement","value","manual"]
- Threat models: use kind/provenance and pack wiring to include/exclude sets of sources.
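The access paths in the slices.Max and slices.Concat rows above follow from how the two functions behave in Go. The sketch below is a hypothetical illustration in which the wrapper names `largest` and `joinAll` are invented. It assumes Go 1.22 or later, since slices.Concat was added in that release.

```go
package main

import (
	"fmt"
	"slices"
)

// largest returns one of the elements of xs, which is why the model maps
// "Argument[0].ArrayElement" to "ReturnValue".
func largest(xs []int) int {
	return slices.Max(xs)
}

// joinAll forwards to the variadic slices.Concat. The variadic arguments
// arrive as a slice of slices, so an input element sits two levels deep
// ("Argument[0].ArrayElement.ArrayElement") and ends up as an element of
// the result ("ReturnValue.ArrayElement").
func joinAll(parts ...[]string) []string {
	return slices.Concat(parts...)
}

func main() {
	fmt.Println(largest([]int{3, 9, 4}))
	fmt.Println(joinAll([]string{"a"}, []string{"b", "c"}))
}
```

Both models use kind "value" because the elements themselves are passed through unchanged.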
Model Packs
- Group YAML files into a CodeQL model pack; publish to GHCR.
- Consumers add the pack to their query suite or CLI invocation to apply models.
Workflow Tips
- Start with summaries for common library flows that otherwise break paths (builders, containers, helpers).
- Add specific sources/sinks tied to your threat model.
- Keep models narrow to avoid false positives (pin the package, type, and name columns precisely).
- Validate with path queries and unit tests; iterate on precision vs recall.