1. Depth guard now only counts rule rewrites, not tree traversal.
Deep ASTs (>100 levels) no longer trigger false "non-terminating
cycle" errors. Only actual rule-rewrite chains are depth-limited.
2. Repeated matcher detects zero-width matches and breaks the loop.
Patterns like ((_)?)* no longer infinite-loop — if the iterator
does not advance after a successful match, the repetition stops.
3. Shorthand rule! syntax now propagates source_range to synthetic
nodes via create_node_with_range, matching the full template path.
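
The zero-width guard from item 2 can be pictured with the sketch below. It is a minimal illustration, not yeast's matcher: match_repeated, the closure-based step function, and optional_x are hypothetical names introduced here.

```rust
/// Hypothetical sketch: repeat `step` over `items` starting at `pos`.
/// `step` returns Some(new_pos) on a successful match, None on failure.
/// Returns the final position and the number of repetitions that
/// actually consumed input.
fn match_repeated<T>(
    items: &[T],
    mut pos: usize,
    step: impl Fn(&[T], usize) -> Option<usize>,
) -> (usize, usize) {
    let mut count = 0;
    while let Some(next) = step(items, pos) {
        // A successful match that did not advance is zero-width:
        // stop here instead of looping forever.
        if next == pos {
            break;
        }
        pos = next;
        count += 1;
    }
    (pos, count)
}

/// An optional matcher in the spirit of `(x)?`: consumes one "x" if
/// present, otherwise succeeds without consuming anything.
fn optional_x(items: &[&str], pos: usize) -> Option<usize> {
    if items.get(pos) == Some(&"x") {
        Some(pos + 1)
    } else {
        Some(pos)
    }
}
```

Repeating optional_x over ["x", "x", "y"] stops at the "y" because the third attempt succeeds without advancing.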
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Captures on ? patterns now bind as Option<Id> instead of Id, correctly
handling the case where the optional pattern matches zero times.
Capture multiplicity is now tracked as a three-way enum:
- Single (no quantifier) → Id
- Optional (?) → Option<Id>
- Repeated (* or +) → Vec<Id>
Added Captures::get_opt() which returns None for unmatched captures.
The shorthand rule! syntax also handles optional fields correctly,
only inserting the child when Some.
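
A rough sketch of the three-way multiplicity and of get_opt, under stated assumptions: the Multiplicity name, the multiplicity_of helper, the bind method, and the internal map layout are hypothetical, and the real signature of Captures::get_opt may differ.

```rust
use std::collections::HashMap;

/// Node ids are plain indices in this sketch.
type Id = usize;

/// How many nodes a capture can bind, derived from its quantifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Multiplicity {
    Single,   // no quantifier -> Id
    Optional, // `?`           -> Option<Id>
    Repeated, // `*` or `+`    -> Vec<Id>
}

/// Map a quantifier suffix to the multiplicity of the capture it wraps.
fn multiplicity_of(quantifier: Option<char>) -> Multiplicity {
    match quantifier {
        None => Multiplicity::Single,
        Some('?') => Multiplicity::Optional,
        Some('*') | Some('+') => Multiplicity::Repeated,
        Some(other) => panic!("unknown quantifier: {}", other),
    }
}

/// Hypothetical stand-in for the capture map.
#[derive(Default)]
struct Captures {
    bound: HashMap<String, Vec<Id>>,
}

impl Captures {
    /// Record that `name` matched node `id`.
    fn bind(&mut self, name: &str, id: Id) {
        self.bound.entry(name.to_string()).or_default().push(id);
    }

    /// Returns None when the capture never matched, for example an
    /// optional pattern that matched zero times.
    fn get_opt(&self, name: &str) -> Option<Id> {
        self.bound.get(name).and_then(|ids| ids.first().copied())
    }
}
```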
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The field*: syntax was a leftover from the old explicit child*: pattern.
Fields now always match a single node pattern. Bare children (unnamed
positional matches) are expressed as patterns after all named fields.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#8: Reject * after non-capture template groups with a compile error.
Previously (foo (bar)*) silently dropped the *, behaving like (bar).
#9: Verify inner token streams are exhausted after parsing query nodes.
Unconsumed tokens inside a parenthesized group now produce a compile
error. Fixed a test using the old redundant (pattern)* syntax inside
a field*: group.
#10: Use ast.get_root() instead of hardcoded 0 for the root node id
in apply_rules calls.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Non-terminating rule cycles now produce an error instead of a stack
overflow.
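
As an illustration of the cycle guard, the sketch below counts rule rewrites only and returns an error once a limit is exceeded. The toy Rule type, the rewrite function, and the limit value are assumptions made for this example, not yeast's implementation.

```rust
/// Hypothetical limit; the actual value is not stated in this log.
const MAX_REWRITE_DEPTH: usize = 100;

/// A toy rule that rewrites one node kind into another.
struct Rule {
    from: &'static str,
    to: &'static str,
}

/// Apply rules to `kind` until none matches. `depth` counts rewrites,
/// not tree levels, so deep trees alone never trip the guard.
fn rewrite(kind: &str, rules: &[Rule], depth: usize) -> Result<String, String> {
    if depth > MAX_REWRITE_DEPTH {
        return Err(format!("non-terminating rule cycle at `{}`", kind));
    }
    match rules.iter().find(|r| r.from == kind) {
        // A rule fired: recurse on its output with depth + 1.
        Some(rule) => rewrite(rule.to, rules, depth + 1),
        // No rule fired: this is the final form.
        None => Ok(kind.to_string()),
    }
}
```

A pair of rules that rewrite a to b and b back to a ends in Err rather than exhausting the stack.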
Captures inside a repeated group (e.g. ((_) @x)*) are now correctly
marked as repeated by passing a parent_repeated flag through recursion.
Empty repetition children are handled as a special case before the
loop.
Fixed: stale captures could remain in the map after a failed sub-query match.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Runner::run and run_from_tree now return Result<Ast, String> instead
of panicking on errors. Errors from query matching (unknown node kinds
or field names), parser setup, and unexpected result counts are all
propagated.
The extractor's extract_and_desugar gracefully falls back to the
un-desugared AST on error, logging the failure via tracing::error.
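
The fallback behaviour might look roughly like this. Ast, run, and desugar_or_fallback are simplified stand-ins invented for the sketch, and eprintln! replaces tracing::error so the example needs no external crate.

```rust
/// Stand-in for the AST type in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Ast(String);

/// Hypothetical desugaring step that fails on an unknown node kind.
fn run(ast: &Ast) -> Result<Ast, String> {
    if ast.0.contains("unknown_kind") {
        return Err(format!("unknown node kind in `{}`", ast.0));
    }
    Ok(Ast(ast.0.replace("for", "call")))
}

/// Desugar if possible; otherwise log and keep the original AST.
fn desugar_or_fallback(ast: Ast) -> Ast {
    match run(&ast) {
        Ok(desugared) => desugared,
        Err(err) => {
            // The real code logs via tracing::error; eprintln! keeps
            // this sketch dependency-free.
            eprintln!("desugaring failed, using original AST: {}", err);
            ast
        }
    }
}
```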
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Synthetic nodes created by desugaring rules now inherit the source
range of the original matched node. This fixes invalid TRAP locations
(previously (0,0)-(0,0)) for desugared nodes.
- Node gains a source_range field used as fallback for position/byte
methods when content is not a Range
- BuildCtx stores the matched node's range and passes it to all
created nodes
- Rule::try_rule extracts the source range from the matched node
and passes it through the transform closure
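
A simplified model of the range fallback described above. The type names follow the commit text, but the fields, the Content enum, and the byte_range and literal methods are assumptions made for illustration.

```rust
use std::ops::Range;

/// Node content: a byte range into the source, or synthetic text.
#[derive(Debug, Clone)]
enum Content {
    Range(Range<usize>),
    Synthetic(String),
}

#[derive(Debug, Clone)]
struct Node {
    content: Content,
    /// Range inherited from the node the rule matched, if any.
    source_range: Option<Range<usize>>,
}

impl Node {
    /// Prefer the node's own range; fall back to the inherited one.
    fn byte_range(&self) -> Option<Range<usize>> {
        match &self.content {
            Content::Range(r) => Some(r.clone()),
            Content::Synthetic(_) => self.source_range.clone(),
        }
    }
}

/// Build context that stamps the matched node's range on new nodes.
struct BuildCtx {
    matched_range: Option<Range<usize>>,
}

impl BuildCtx {
    /// Create a synthetic leaf that inherits the matched range.
    fn literal(&self, text: &str) -> Node {
        Node {
            content: Content::Synthetic(text.to_string()),
            source_range: self.matched_range.clone(),
        }
    }
}
```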
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an AstNode trait abstracting over tree_sitter::Node and yeast::Node,
implemented by both types. The Visitor methods (enter_node, leave_node,
record_parse_error_for_node, complex_node, sliced_source_arg,
location_for) are now generic over AstNode.
traverse_yeast uses yeast's AstCursor (which now iterates in source
order) to drive the same generic Visitor. extract_and_desugar is now
fully functional — it can parse, apply yeast rules, and emit TRAP.
sliced_source_arg uses opt_string_content() for yeast nodes with
synthetic content (from desugaring), falling back to source byte range.
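
To show the shape of the abstraction, here is a small sketch of a visitor that is generic over a node trait. TsNode and YeastNode are toy stand-ins for tree_sitter::Node and yeast::Node, and the two trait methods are assumptions; the actual AstNode trait likely exposes more.

```rust
/// Minimal interface the visitor needs from a node (hypothetical).
trait AstNode {
    fn kind(&self) -> &str;
    fn is_named(&self) -> bool;
}

/// Toy stand-in for tree_sitter::Node.
struct TsNode {
    kind: &'static str,
    named: bool,
}

/// Toy stand-in for yeast::Node.
struct YeastNode {
    kind: String,
    named: bool,
}

impl AstNode for TsNode {
    fn kind(&self) -> &str { self.kind }
    fn is_named(&self) -> bool { self.named }
}

impl AstNode for YeastNode {
    fn kind(&self) -> &str { &self.kind }
    fn is_named(&self) -> bool { self.named }
}

#[derive(Default)]
struct Visitor {
    entered: Vec<String>,
}

impl Visitor {
    /// One code path for both node types; unnamed tokens are skipped.
    fn enter_node<N: AstNode>(&mut self, node: &N) -> bool {
        if !node.is_named() {
            return false;
        }
        self.entered.push(node.kind().to_string());
        true
    }
}
```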
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix the critical traversal order bug: extract() was routing through
yeast even with empty rules, which changed child ordering from
tree-sitter source order to BTreeMap field-id order. This affected
parent_index values in TRAP output for ALL languages.
extract() is now restored to main's implementation using tree-sitter's
native Node and TreeCursor. extract_and_desugar() remains available for
languages that need desugaring, with a fallback to extract() when no
rules are provided.
The yeast-based TRAP traversal (traverse_yeast) is stubbed as
unimplemented — it needs a proper adapter between yeast::Node and
the Visitor before it can be used. No language currently uses it.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The fresh identifier counter is now shared across all rule applications
within a single Runner::run call. Each rule application gets a fresh
"scope" (resolved names are cleared) but the counter keeps incrementing.
This ensures that $tmp in different rules produces distinct names:
the for-rule gets $tmp-0, the assignment rule gets $tmp-1.
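
The counter and scope behaviour can be sketched as follows. The FreshScope name comes from this log, but the fields, the resolve and next_scope methods, and the exact format of generated names are assumptions.

```rust
use std::collections::HashMap;

/// Sketch of a fresh-name generator whose counter outlives each scope.
#[derive(Default)]
struct FreshScope {
    counter: usize,
    resolved: HashMap<String, String>,
}

impl FreshScope {
    /// The same name within one rule application resolves to the same
    /// generated identifier; a new name takes the next counter value.
    fn resolve(&mut self, name: &str) -> String {
        if let Some(existing) = self.resolved.get(name) {
            return existing.clone();
        }
        let generated = format!("{}-{}", name, self.counter);
        self.counter += 1;
        self.resolved.insert(name.to_string(), generated.clone());
        generated
    }

    /// Start a new rule application: forget names, keep the counter.
    fn next_scope(&mut self) {
        self.resolved.clear();
    }
}
```

Because next_scope leaves the counter untouched, the same placeholder used in two different rules never yields the same identifier.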
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests that both desugaring rules fire correctly on input like
"for a, b in list do x end" — the for-loop is rewritten to .each
and the multiple assignment pattern is expanded within the block body.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace fixture-file-based tests with self-contained tests using
dump_ast for assertions. Move Ruby desugaring rules from rules.rs
into the test file. Delete all fixture files and rules.rs.
Test coverage:
- Parsing: simple assignment, multiple assignment, for loop
- Queries: match, no-match, repeated captures
- Tree building: swap fields via BuildCtx
- Rules: multiple assignment desugaring, for-loop desugaring,
shorthand rule! syntax
- Cursor: navigation (parent/child/sibling)
11 tests, all self-contained with inline inputs and expected outputs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Unnamed tokens (keywords, operators, punctuation) in the unnamed
children bucket are no longer shown in dump output. They still appear
if they are inside a named field.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add yeast::dump::dump_ast() which produces indented text output:
    program
      assignment
        left:
          left_assignment_list
            identifier "x"
            identifier "y"
        right:
          call
            method: identifier "foo"
Named fields are shown with "field:" labels, unnamed children are
indented under their parent. Leaf nodes show their text content.
Locations are optional via DumpOptions.
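
A toy version of such a dump, to make the layout rules concrete. This is not the yeast::dump implementation: the Node struct, the head and dump helpers, and the choice to print leaf field values on the label's line are assumptions made for this sketch.

```rust
/// Toy tree for this sketch: kind, optional leaf text, named fields,
/// and unnamed children.
struct Node {
    kind: &'static str,
    text: Option<&'static str>,
    fields: Vec<(&'static str, Node)>,
    children: Vec<Node>,
}

/// First line of a node: its kind, plus quoted text for leaves.
fn head(node: &Node) -> String {
    match node.text {
        Some(text) => format!("{} \"{}\"", node.kind, text),
        None => node.kind.to_string(),
    }
}

/// Append an indented dump of `node` to `out`, two spaces per level.
fn dump(node: &Node, indent: usize, out: &mut String) {
    let pad = "  ".repeat(indent);
    out.push_str(&format!("{}{}\n", pad, head(node)));
    for (name, value) in &node.fields {
        if value.fields.is_empty() && value.children.is_empty() {
            // Leaf field values stay on the label's line.
            out.push_str(&format!("{}  {}: {}\n", pad, name, head(value)));
        } else {
            // Larger values go on their own lines below the label.
            out.push_str(&format!("{}  {}:\n", pad, name));
            dump(value, indent + 2, out);
        }
    }
    for child in &node.children {
        // Unnamed children are indented directly under the parent.
        dump(child, indent + 1, out);
    }
}
```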
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rule!(query => kind_name) is a shorthand for rules that simply gather
query captures into fields on a new node type. Each capture name
becomes a field: single captures produce single-valued fields, repeated
captures produce multi-valued fields.
    rule!((foo f: (boo (_) @blah) (_)* @blop) => bar)
is equivalent to:
    rule!((foo ...) => (bar blah: {blah} blop: {..blop}))
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rule! declares a desugaring rule with query pattern and transform
template in a single expression:
    rule!(
        (for pattern: (_) @pat value: (in (_) @val) body: (do (_)* @body))
        =>
        (call receiver: {val} method: (identifier "each") ...)
    )
Captures become Rust variables automatically: @name binds as Id
(single capture) or Vec<Id> (after * or +). The BuildCtx is created
implicitly. tree! and trees! can also be used without an explicit
context inside rule! transforms.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewrite the builder language section to document the current API:
tree!(ctx, ...) / trees!(ctx, ...) with BuildCtx, {expr} for embedded
Rust, {..expr} for splicing, #{expr} for computed literals, and $name
for fresh identifiers. Remove all references to the old TreeBuilder.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Delete TreeBuilder, TreeChildBuilder, TreesBuilder enums and all their
methods, along with the tree_builder! and trees_builder! proc macros.
All building is now done through tree!/trees! with BuildCtx.
The tree_builder module is kept for FreshScope only.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tree! now consistently returns Vec<Id>, even for a single element.
This matches the Rule transform signature and removes the need to
wrap single-node results in vec![].
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tree! now returns Id for a single element or Vec<Id> for multiple,
determined by how many top-level elements appear in the template.
trees! is kept as an alias for backward compatibility.
- tree!(ctx, (single_node ...)) → Id
- tree!(ctx, (node1 ...) (node2 ...)) → Vec<Id>
- tree!(ctx, (node) {..splice}) → Vec<Id>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
{..expr} in tree!/trees! splices a Vec<Id> (extend), while {expr}
inserts a single Id (push). The .. mirrors Rust spread syntax.
The assignment rule is now a single trees! expression with the loop
inlined via {..iter.map(|...| tree!(...)).collect()}.
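
The push-versus-extend distinction can be modelled in a few lines. Item and collect_children are hypothetical names for this sketch; the macros generate their own code rather than going through an enum like this.

```rust
/// Node ids are plain indices in this sketch.
type Id = usize;

/// One element of a template body.
enum Item {
    /// `{expr}`: a single id, pushed.
    One(Id),
    /// `{..expr}`: a list of ids, spliced in.
    Splice(Vec<Id>),
}

/// Flatten template items into the final child list.
fn collect_children(items: Vec<Item>) -> Vec<Id> {
    let mut children = Vec::new();
    for item in items {
        match item {
            Item::One(id) => children.push(id),
            Item::Splice(ids) => children.extend(ids),
        }
    }
    children
}
```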
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
(kind {expr}) creates a leaf node whose content is expr.to_string().
This eliminates ctx.literal() calls for computed values like loop
counters: (integer {i}) instead of {ctx.literal("integer", &i.to_string())}.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New tree!(ctx, ...) and trees!(ctx, ...) macros that directly build AST
nodes through a BuildCtx, replacing the intermediate TreeBuilder data
structures for new code.
Key features:
- {expr} embeds Rust expressions inline in templates
- @name references captures from the query match
- $fresh generates unique identifiers (shared across the template)
- (kind "literal") creates literal leaf nodes
- (@name)* splices repeated captures
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers architecture, the query and builder languages, capture semantics,
fresh identifiers, and extractor integration, with a complete for-loop
desugaring example.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The end token rule was deleting orphaned "end" tokens after
desugaring. This is no longer needed since the for-loop rule replaces
the entire for node (including its end token), and unnamed tokens are
skipped automatically during matching.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Since (_) and (_)* skip unnamed tokens automatically, explicit patterns
for discarding tokens are no longer needed.
Before: (left_assignment_list ((identifier) @left (",")?)* )
After: (left_assignment_list (identifier)* @left)
Before: (do "do"? (_)* @body)
After: (do (_)* @body)
Also fix proc macro parsing of (node_kind)* to correctly treat the
group as a repeated single node pattern rather than a repeated list.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Align capture syntax with tree-sitter queries: @name must always follow
a pattern, never appear standalone. This eliminates the ambiguity where
`@name` could be confused with a positional wildcard.
Before: right: @right, (@body)*
After: right: (_) @right, (_)* @body
For repeated patterns, `(_)* @body` captures each matched node into
the repeated capture variable, matching tree-sitter semantics.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In tree-sitter queries, (_) matches any named node but skips unnamed
tokens. Implement the same semantics: QueryNode::Any and captures
wrapping Any now skip unnamed children in positional matching.
This allows writing `(in (_) @val)` to match the first named child of
an `in` node, skipping the "in" keyword token. Previously this
required the awkward `(in "in" @val)`.
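
A minimal sketch of the skipping behaviour, assuming children are stored as a flat list with a named flag. Child and match_any are hypothetical names, not the QueryNode::Any implementation.

```rust
/// A child node: named (e.g. `identifier`) or an unnamed token
/// (e.g. the "in" keyword).
#[derive(Debug, Clone, PartialEq)]
struct Child {
    kind: &'static str,
    named: bool,
}

/// Wildcard `(_)`: return the index of the first named child at or
/// after `pos`, skipping unnamed tokens along the way.
fn match_any(children: &[Child], pos: usize) -> Option<usize> {
    children
        .iter()
        .enumerate()
        .skip(pos)
        .find(|(_, c)| c.named)
        .map(|(i, _)| i)
}
```

For the children of an `in` node, the wildcard lands on the value expression at index 1 and never binds the keyword at index 0.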
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Unnamed tokens are now always wrapped in double quotes in the YAML
output, making them visually distinct from named node references.
YAML treats both forms as equivalent strings, so this is purely
a readability improvement.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add convert_from_json() to convert tree-sitter node-types.json back to
the YAML format. The CLI gains a --from-json flag.
The round-trip is tested: YAML → JSON → YAML → JSON produces identical
JSON output.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a human-friendly YAML format for specifying node types, with a
converter to tree-sitter's node-types.json format.
YAML format has three top-level sections:
- supertypes: union types mapping to lists of member types
- named: concrete AST nodes with fields (?, *, + suffixes for
multiplicity) and $children for unnamed children
- unnamed: list of token strings
Type references are resolved automatically: if a name appears only as
unnamed, it's treated as unnamed; otherwise named. Use {unnamed: name}
for explicit disambiguation.
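
The resolution rule for bare type references might be expressed like this. TypeRef and resolve are names invented for the sketch, and the converter's real data structures are not shown in this log.

```rust
use std::collections::HashSet;

/// Result of resolving a bare type name.
#[derive(Debug, PartialEq)]
enum TypeRef {
    Named(String),
    Unnamed(String),
}

/// A bare name is unnamed only if it is declared solely in the
/// `unnamed` section; otherwise it is treated as named.
fn resolve(name: &str, named: &HashSet<&str>, unnamed: &HashSet<&str>) -> TypeRef {
    if unnamed.contains(name) && !named.contains(name) {
        TypeRef::Unnamed(name.to_string())
    } else {
        TypeRef::Named(name.to_string())
    }
}
```

A name declared in both sections resolves to named, which is the case where the explicit {unnamed: name} form is needed.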
Includes a library API (yeast::node_types_yaml::convert) and a CLI
binary (node_types_yaml) that reads YAML and outputs JSON.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an optional output_node_types field to Language (generator) and
LanguageSpec (extractor). When set, the generator produces dbscheme/QL
from the output types, and the extractor validates TRAP against them.
This enables desugaring transforms that produce AST shapes different
from the tree-sitter grammar. When unset (None), behavior is unchanged
— the tree-sitter node_types are used for both input and output.
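
The fallback can be summarised in a few lines. The struct below is a hypothetical subset of LanguageSpec, and effective_output_types is a helper name made up for this sketch.

```rust
/// Hypothetical subset of the language description.
struct LanguageSpec {
    /// node-types.json contents from the tree-sitter grammar.
    node_types: &'static str,
    /// Optional node types describing the desugared output.
    output_node_types: Option<&'static str>,
}

impl LanguageSpec {
    /// Node types that describe the extractor's output: the override
    /// when set, otherwise the grammar's own node types.
    fn effective_output_types(&self) -> &'static str {
        self.output_node_types.unwrap_or(self.node_types)
    }
}
```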
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
build_tree() and build_trees() now create their own FreshScope
internally. For the rare case where a shared scope is needed across
multiple build calls (e.g. the assignment rule), _with_fresh variants
are available.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add two new TreeBuilder variants for declarative identifier creation:
- Literal: (kind "value") — creates a leaf node with fixed content
Example: (identifier "each") creates an identifier node with text 'each'
- Fresh: (kind $name) — creates a leaf node with an auto-generated
unique name. All occurrences of the same $name within one rule
application share the same generated value.
Example: (identifier $tmp) creates 'tmp-0', 'tmp-1', etc.
FreshScope tracks generated names per rule application. This eliminates
the need for manual Rc<Cell> counters and create_named_token calls.
The for-loop rule is now fully declarative (no imperative code in the
transform closure). The assignment rule still needs a closure for the
index counter in repeated captures.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add extract_and_desugar() which accepts yeast desugaring rules.
The original extract() function is unchanged and delegates to
extract_and_desugar with empty rules.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bare patterns inside a node (not preceded by 'field:') are now
automatically assigned to the synthetic 'child' field. This removes the
need for explicit 'child*: (...)' syntax.
Before: (do child*: (("do")? (@body)*))
After: (do ("do")? (@body)*)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the macro_rules!-based macros with procedural macros in a new yeast-macros crate. The proc macros parse
a tree-sitter-inspired syntax and generate the same runtime data structures.
Key improvements:
- Better error messages with source spans
- Cleaner syntax closer to tree-sitter query notation
- Captures use @name after patterns (tree-sitter style)
- Fields with bare @capture no longer need wrapping parens
- Removed ~10 interdependent macro_rules! definitions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add the yeast crate (Yet another Elaborator for Abstract Syntax Trees),
a framework for tree-sitter AST transformations/desugaring. Integrate it
into the shared tree-sitter extractor.
Key components:
- shared/yeast/: New crate with query/match/transform pipeline for
tree-sitter ASTs, with Ruby desugaring rules as an example
- shared/tree-sitter-extractor: Pass parsed trees through yeast before
TRAP extraction, applying language-specific desugaring rules
Updated from the original hackathon branch to work with tree-sitter 0.24
and current main dependencies.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>