Commit Graph

87260 Commits

Author SHA1 Message Date
Taus
bc48ea4e3c Yeast: Skip unnamed tokens in AST dump output
Unnamed tokens (keywords, operators, punctuation) in the unnamed
children bucket are no longer shown in dump output. They still appear
if they are inside a named field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
2113ba7f61 Yeast: Add AST dumper for human-readable tree output
Add yeast::dump::dump_ast() which produces indented text output:

    program
      assignment
        left:
          left_assignment_list
            identifier "x"
            identifier "y"
        right:
          call
            method: identifier "foo"

Named fields are shown with "field:" labels, unnamed children are
indented under their parent. Leaf nodes show their text content.
Locations are optional via DumpOptions.
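
The layout described above can be sketched roughly as follows. This is a
minimal illustration, not the actual yeast::dump code: the Node type, its
constructors, and the dump() signature are hypothetical, and the rule that a
field holding a single leaf is printed inline is an assumption read off the
sample output.

```rust
/// Hypothetical, simplified AST node (not the real yeast data structure).
struct Node {
    kind: String,
    text: Option<String>,             // Some(..) for leaf nodes
    fields: Vec<(String, Vec<Node>)>, // named fields, in declaration order
    children: Vec<Node>,              // children in the unnamed bucket
}

impl Node {
    fn leaf(kind: &str, text: &str) -> Node {
        Node {
            kind: kind.to_string(),
            text: Some(text.to_string()),
            fields: Vec::new(),
            children: Vec::new(),
        }
    }

    fn branch(kind: &str, fields: Vec<(&str, Vec<Node>)>, children: Vec<Node>) -> Node {
        Node {
            kind: kind.to_string(),
            text: None,
            fields: fields.into_iter().map(|(n, v)| (n.to_string(), v)).collect(),
            children,
        }
    }

    fn is_leaf(&self) -> bool {
        self.text.is_some()
    }

    /// The node kind, followed by the quoted text for leaves.
    fn header(&self) -> String {
        match &self.text {
            Some(t) => format!("{} \"{}\"", self.kind, t),
            None => self.kind.clone(),
        }
    }
}

/// Append an indented dump of `node` to `out`, two spaces per level.
fn dump(node: &Node, depth: usize, out: &mut String) {
    let pad = "  ".repeat(depth);
    out.push_str(&format!("{}{}\n", pad, node.header()));
    for (name, values) in &node.fields {
        if values.len() == 1 && values[0].is_leaf() {
            // Assumption: a field holding one leaf is printed on a single line.
            out.push_str(&format!("{}  {}: {}\n", pad, name, values[0].header()));
        } else {
            out.push_str(&format!("{}  {}:\n", pad, name));
            for value in values {
                dump(value, depth + 2, out);
            }
        }
    }
    for child in &node.children {
        dump(child, depth + 1, out);
    }
}
```

Calling dump(&root, 0, &mut out) on a tree shaped like the example fills out
with the same indentation as the sample, without the leading four-space
offset of the commit message.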

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
83739f6eaf Yeast: Add shorthand rule! syntax for capture-to-field mapping
rule!(query => kind_name) is a shorthand for rules that simply gather
query captures into fields on a new node type. Each capture name
becomes a field: single captures produce single-valued fields, repeated
captures produce multi-valued fields.

    rule!((foo f: (boo (_) @blah) (_)* @blop) => bar)

is equivalent to:

    rule!((foo ...) => (bar blah: {blah} blop: {..blop}))

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
79f00b87a3 Yeast: Remove unnecessary braces in rule! closure
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
9893c45148 Yeast: Add rule! macro combining query and transform
rule! declares a desugaring rule with query pattern and transform
template in a single expression:

    rule!(
        (for pattern: (_) @pat value: (in (_) @val) body: (do (_)* @body))
        =>
        (call receiver: {val} method: (identifier "each") ...)
    )

Captures become Rust variables automatically: @name binds as Id
(single capture) or Vec<Id> (after * or +). The BuildCtx is created
implicitly. tree! and trees! can also be used without an explicit
context inside rule! transforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
116c13a920 Yeast: Remove unnecessary .collect() from {..} splice
{..expr} calls extend(), which accepts any IntoIterator — no need to
collect into a Vec first.
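
A small sketch of why the collect() is redundant, assuming the splice ends up
in a Vec::extend call as stated above. The Id alias and the Children type are
hypothetical stand-ins, not the code generated by the macro.

```rust
/// Hypothetical node handle; the real Id type lives in the yeast crate.
type Id = usize;

/// Hypothetical child list that a template expansion might fill in.
#[derive(Default)]
struct Children {
    ids: Vec<Id>,
}

impl Children {
    /// Counterpart of `{expr}`: push exactly one Id.
    fn insert(&mut self, id: Id) -> &mut Self {
        self.ids.push(id);
        self
    }

    /// Counterpart of `{..expr}`: extend from any IntoIterator of Ids.
    fn splice<I: IntoIterator<Item = Id>>(&mut self, ids: I) -> &mut Self {
        self.ids.extend(ids);
        self
    }
}
```

Because splice() is generic over IntoIterator, both a Vec<Id> and a lazy
iterator such as (4..6).map(|i| i * 10) can be passed directly.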

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
66cde4719e Yeast: Update documentation for tree!/trees! macros
Rewrite the builder language section to document the current API:
tree!(ctx, ...) / trees!(ctx, ...) with BuildCtx, {expr} for embedded
Rust, {..expr} for splicing, #{expr} for computed literals, and $name
for fresh identifiers. Remove all references to the old TreeBuilder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
2f53acac82 Yeast: Remove old TreeBuilder infrastructure
Delete TreeBuilder, TreeChildBuilder, TreesBuilder enums and all their
methods, along with the tree_builder! and trees_builder! proc macros.
All building is now done through tree!/trees! with BuildCtx.

The tree_builder module is kept for FreshScope only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
1613372191 Yeast: Make tree! always return Vec<Id>
tree! now consistently returns Vec<Id>, even for a single element.
This matches the Rule transform signature and removes the need to
wrap single-node results in vec![].

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
b0e2e5eb0d Yeast: Unify tree! and trees! into a single tree! macro
tree! now returns Id for a single element or Vec<Id> for multiple,
determined by how many top-level elements appear in the template.
trees! is kept as an alias for backward compatibility.

- tree!(ctx, (single_node ...))        → Id
- tree!(ctx, (node1 ...) (node2 ...))  → Vec<Id>
- tree!(ctx, (node) {..splice})        → Vec<Id>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
e54d84c29f Yeast: Add {..expr} splice syntax and inline assignment rule
{..expr} in tree!/trees! splices a Vec<Id> (extend), while {expr}
inserts a single Id (push). The .. mirrors Rust's struct update syntax.

The assignment rule is now a single trees! expression with the loop
inlined via {..iter.map(|...| tree!(...)).collect()}.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
4b666a7e4b Yeast: Use #{expr} for computed literals in tree templates
Change computed literal syntax from (kind {expr}) to (kind #{expr})
to avoid ambiguity with child node splicing ({expr}). The # sigil
mirrors Rust format string syntax.

- (kind "static")  — static string content
- (kind #{expr})   — computed content via expr.to_string()
- (kind $name)     — fresh identifier
- {expr}           — child node (always Id, unambiguous)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
7d561f13c6 Yeast: Support computed literals in tree templates
(kind {expr}) creates a leaf node whose content is expr.to_string().
This eliminates ctx.literal() calls for computed values like loop
counters: (integer {i}) instead of {ctx.literal("integer", &i.to_string())}.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
99d1730ad1 Yeast: Add tree!/trees! macros with embedded Rust expressions
New tree!(ctx, ...) and trees!(ctx, ...) macros that directly build AST
nodes through a BuildCtx, replacing the intermediate TreeBuilder data
structures for new code.

Key features:
- {expr} embeds Rust expressions inline in templates
- @name references captures from the query match
- $fresh generates unique identifiers (shared across the template)
- (kind "literal") creates literal leaf nodes
- (@name)* splices repeated captures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
01506688aa Yeast: Add documentation for the desugaring framework
Covers architecture, the query and builder languages, capture semantics,
fresh identifiers, and extractor integration, with a complete for-loop
desugaring example.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
1378390063 Yeast: Remove unnecessary end-token deletion rule
The end token rule was deleting orphaned "end" tokens after
desugaring. This is no longer needed since the for-loop rule replaces
the entire for node (including its end token), and unnamed tokens are
skipped automatically during matching.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
c6f362d13e Yeast: Remove unnecessary token patterns from queries
Since (_) and (_)* skip unnamed tokens automatically, explicit patterns
for discarding tokens are no longer needed.

Before: (left_assignment_list ((identifier) @left (",")?)*  )
After:  (left_assignment_list (identifier)* @left)

Before: (do "do"? (_)* @body)
After:  (do (_)* @body)

Also fix proc macro parsing of (node_kind)* to correctly treat the
group as a repeated single node pattern rather than a repeated list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
1b74721af9 Yeast: Require captures to follow a pattern (tree-sitter style)
Align capture syntax with tree-sitter queries: @name must always follow
a pattern, never appear standalone. This eliminates the ambiguity where
`@name` could be confused with a positional wildcard.

Before: right: @right, (@body)*
After:  right: (_) @right, (_)* @body

For repeated patterns, `(_)* @body` captures each matched node into
the repeated capture variable, matching tree-sitter semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
185a637915 Yeast: Make (_) match only named nodes (tree-sitter semantics)
In tree-sitter queries, (_) matches any named node but skips unnamed
tokens. Implement the same semantics: QueryNode::Any and captures
wrapping Any now skip unnamed children in positional matching.

This allows writing `(in (_) @val)` to match the first named child of
an `in` node, skipping the "in" keyword token. Previously this
required the awkward `(in "in" @val)`.
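
The named-only behaviour can be illustrated with a simplified matcher. This
is a sketch under assumptions: the Child record and both helper functions are
hypothetical, and the real matcher works on QueryNode values rather than
plain slices.

```rust
/// Hypothetical child record: `named` is false for keyword and punctuation tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Child {
    kind: &'static str,
    named: bool,
}

/// Rough analogue of `(_)`: the first named child, skipping unnamed tokens.
fn match_any(children: &[Child]) -> Option<Child> {
    children.iter().copied().find(|c| c.named)
}

/// Rough analogue of `(_)*`: every named child, in source order.
fn match_any_repeated(children: &[Child]) -> Vec<Child> {
    children.iter().copied().filter(|c| c.named).collect()
}
```

For an `in` node whose children are the "in" token followed by an identifier,
match_any() returns the identifier, which is what `(in (_) @val)` relies on.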

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
0b601290b1 Yeast: Document the YAML node-types format
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
cd6aab62a6 Yeast: Always quote unnamed node types in YAML output
Unnamed tokens are now always wrapped in double quotes in the YAML
output, making them visually distinct from named node references.
YAML treats both forms as equivalent strings, so this is purely
a readability improvement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
782181aeae Yeast: Add JSON-to-YAML conversion for node types
Add convert_from_json() to convert tree-sitter node-types.json back to
the YAML format. The CLI gains a --from-json flag.

The round-trip is tested: YAML → JSON → YAML → JSON produces identical
JSON output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
1340c516c7 Yeast: Add YAML node-types format and converter
Add a human-friendly YAML format for specifying node types, with a
converter to tree-sitter's node-types.json format.

YAML format has three top-level sections:
- supertypes: union types mapping to lists of member types
- named: concrete AST nodes with fields (?, *, + suffixes for
  multiplicity) and $children for unnamed children
- unnamed: list of token strings

Type references are resolved automatically: if a name appears only as
unnamed, it's treated as unnamed; otherwise named. Use {unnamed: name}
for explicit disambiguation.
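
A possible reading of the suffix and resolution rules, written as a
standalone sketch. FieldSpec, parse_field_key() and is_named_ref() are
hypothetical names, and the mapping of the suffixes onto the required and
multiple flags of node-types.json is an assumption, not taken from the
converter itself.

```rust
use std::collections::HashSet;

/// Hypothetical result of parsing a YAML field key such as "body*".
#[derive(Debug, PartialEq)]
struct FieldSpec {
    name: String,
    required: bool,
    multiple: bool,
}

/// Split a field key into its name and multiplicity.
/// Assumed mapping: no suffix = exactly one, `?` = optional,
/// `*` = zero or more, `+` = one or more.
fn parse_field_key(key: &str) -> FieldSpec {
    let (name, required, multiple) = if let Some(n) = key.strip_suffix('?') {
        (n, false, false)
    } else if let Some(n) = key.strip_suffix('*') {
        (n, false, true)
    } else if let Some(n) = key.strip_suffix('+') {
        (n, true, true)
    } else {
        (key, true, false)
    };
    FieldSpec { name: name.to_string(), required, multiple }
}

/// A reference is unnamed only if the name is declared solely as unnamed;
/// in every other case it is treated as named.
fn is_named_ref(name: &str, named: &HashSet<&str>, unnamed: &HashSet<&str>) -> bool {
    !(unnamed.contains(name) && !named.contains(name))
}
```

Under these assumptions, a name declared in both sections resolves to the
named node, which is the case where the explicit {unnamed: name} form would
be needed to pick the token.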

Includes a library API (yeast::node_types_yaml::convert) and a CLI
binary (node_types_yaml) that reads YAML and outputs JSON.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
93fea91f76 Shared extractor: Support separate output node types
Add an optional output_node_types field to Language (generator) and
LanguageSpec (extractor). When set, the generator produces dbscheme/QL
from the output types, and the extractor validates TRAP against them.

This enables desugaring transforms that produce AST shapes different
from the tree-sitter grammar. When unset (None), behavior is unchanged
— the tree-sitter node_types are used for both input and output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
dc86d9118f Yeast: Auto-manage FreshScope in build_tree/build_trees
build_tree() and build_trees() now create their own FreshScope
internally. For the rare case where a shared scope is needed across
multiple build calls (e.g. the assignment rule), _with_fresh variants
are available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
c064b4a369 Yeast: Add literal and fresh identifier syntax to tree builders
Add two new TreeBuilder variants for declarative identifier creation:

- Literal: (kind "value") — creates a leaf node with fixed content
  Example: (identifier "each") creates an identifier node with text 'each'

- Fresh: (kind $name) — creates a leaf node with an auto-generated
  unique name. All occurrences of the same $name within one rule
  application share the same generated value.
  Example: (identifier $tmp) creates 'tmp-0', 'tmp-1', etc.

FreshScope tracks generated names per rule application. This eliminates
the need for manual Rc<Cell> counters and create_named_token calls.

The for-loop rule is now fully declarative (no imperative code in the
transform closure). The assignment rule still needs a closure for the
index counter in repeated captures.
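
The sharing behaviour of $name can be sketched as below. This is not the
actual FreshScope: the borrowed counter, the resolve() method, and the choice
of one counter shared by all names are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical per-rule-application scope for fresh names.
struct FreshScope<'a> {
    counter: &'a mut usize,         // assumed to be shared across rule applications
    names: HashMap<String, String>, // $name -> generated text within this application
}

impl<'a> FreshScope<'a> {
    fn new(counter: &'a mut usize) -> Self {
        FreshScope { counter, names: HashMap::new() }
    }

    /// Return the generated text for `name`, creating it on first use so that
    /// later occurrences in the same scope get the same value.
    fn resolve(&mut self, name: &str) -> String {
        if let Some(existing) = self.names.get(name) {
            return existing.clone();
        }
        let generated = format!("{}-{}", name, *self.counter);
        *self.counter += 1;
        self.names.insert(name.to_string(), generated.clone());
        generated
    }
}
```

Two occurrences of $tmp resolved through one scope yield the same text, while
a new scope created for the next rule application yields the next number.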

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
2f18ad6f31 Shared extractor: Add extract_and_desugar function
Add extract_and_desugar() which accepts yeast desugaring rules.
The original extract() function is unchanged and delegates to
extract_and_desugar with empty rules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
99d4f391cb Yeast: Remove unnecessary parens around literal with quantifier
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
7d4d145626 Yeast: Implicit child field for bare patterns
Bare patterns inside a node (not preceded by 'field:') are now
automatically assigned to the synthetic 'child' field. This removes the
need for explicit 'child*: (...)' syntax.

Before: (do child*: (("do")? (@body)*))
After:  (do ("do")? (@body)*)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
9beab6d1fc Yeast: Replace macro_rules with procedural macros
Replace the macro_rules!-based macros with procedural macros in a new
yeast-macros crate. The proc macros parse a tree-sitter-inspired syntax
and generate the same runtime data structures.

Key improvements:
- Better error messages with source spans
- Cleaner syntax closer to tree-sitter query notation
- Captures use @name after patterns (tree-sitter style)
- Fields with bare @capture no longer need wrapping parens
- Removed ~10 interdependent macro_rules! definitions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:58 +00:00
Taus
7efb03d4cc Yeast: AST desugaring framework (rebased from hackathon-desugaring)
Add the yeast crate (Yet another Elaborator for Abstract Syntax Trees),
a framework for tree-sitter AST transformations/desugaring. Integrate it
into the shared tree-sitter extractor.

Key components:
- shared/yeast/: New crate with query/match/transform pipeline for
  tree-sitter ASTs, with Ruby desugaring rules as an example
- shared/tree-sitter-extractor: Pass parsed trees through yeast before
  TRAP extraction, applying language-specific desugaring rules

Updated from the original hackathon branch to work with tree-sitter 0.24
and current main dependencies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:58 +00:00
Mathias Vorreiter Pedersen
154d213fd2 Merge pull request #21768 from github/speed-up-unchecked-leap-year-after-modification
C++: Speed up `cpp/leap-year/unchecked-after-arithmetic-year-modification`
2026-04-30 16:06:17 +01:00
Michael Nebel
4446f42846 Merge pull request #21684 from michaelnebel/csharp/improve-reachability-checks
C#: Improve BMN feed checking & handling.
2026-04-30 15:53:52 +02:00
Owen Mansel-Chan
87c35e6401 Merge pull request #21654 from MarkLee131/fix/sensitive-log-hash-sanitizer
Java: treat hash/encrypt/digest methods as sensitive-log sanitizers
2026-04-30 13:21:03 +01:00
Tom Hvitved
a473fdb709 Merge pull request #21759 from hvitved/csharp/cfg-params
C#: Include parameters and their defaults in the CFG
2026-04-30 11:31:06 +02:00
Owen Mansel-Chan
fed42d655f Merge pull request #21656 from MarkLee131/fix/trust-boundary-regexp-barrier
Java: add RegexpCheckBarrier to trust-boundary-violation sanitizers
2026-04-29 14:59:01 +01:00
Michael Nebel
03d70b9f94 C#: Add another nuget.config integration test.
2026-04-29 15:47:32 +02:00
Michael Nebel
e29770c2b5 C#: Fix missing slash in comments.
2026-04-29 15:27:47 +02:00
MarkLee131
28a6ff208c Merge remote-tracking branch 'origin/main' into fix/sensitive-log-hash-sanitizer
# Conflicts:
#	java/ql/test/query-tests/security/CWE-532/SensitiveLogInfo.expected
#	java/ql/test/query-tests/security/CWE-532/Test.java
2026-04-29 20:59:59 +08:00
Tom Hvitved
e14b654e8a Update shared/controlflow/codeql/controlflow/ControlFlowGraph.qll
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
2026-04-29 14:57:35 +02:00
MarkLee131
51e2a5418b Java: move EncryptedSensitiveMethodCall into Sanitizers.qll
Address review feedback by moving the shared method-name-based encryption/hash/digest
check into Sanitizers.qll, and reference it from both CleartextStorageQuery.qll and
SensitiveLoggingQuery.qll instead of duplicating the definition.
2026-04-29 20:56:36 +08:00
MarkLee131
75162bb9eb Update java/ql/test/query-tests/security/CWE-532/Test.java
Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com>
2026-04-29 20:53:58 +08:00
MarkLee131
49d014cbac Merge branch 'main' into fix/trust-boundary-regexp-barrier
2026-04-29 20:48:22 +08:00
MarkLee131
d27ee86242 Java: refactor trust-boundary sanitizers into TrustBoundaryValidationSanitizer subclasses
Address review feedback by introducing dedicated subclasses of
TrustBoundaryValidationSanitizer for SimpleTypeSanitizer, RegexpCheckBarrier,
and the HttpServletSession type check, so isBarrier only references the
abstract class.
2026-04-29 20:46:11 +08:00
Jack Nørskov Jørgensen
0192ffab07 Merge pull request #21751 from github/jacknojo/move_java_generated_mads
Move generated MaDs into modelgenerator/
2026-04-29 14:33:58 +02:00
Tom Hvitved
99b5cecb18 Java: Adapt to changes in shared CFG library
2026-04-29 14:03:06 +02:00
Tom Hvitved
99023f8b59 C#: Add upgrade/downgrade scripts
2026-04-29 14:03:05 +02:00
Tom Hvitved
b6c464281b C#: Move internal logic into internal/ControlFlowGraph.qll
2026-04-29 14:01:14 +02:00
Tom Hvitved
d4a32476da C#: No need to special-case default arguments in nullness analysis
2026-04-29 14:01:13 +02:00
Tom Hvitved
6c42418faf C#: Use parameter CFG nodes in SSA
2026-04-29 14:01:11 +02:00