Both accessors returned the same private `kind_name: &'static str`
field; `kind_name()` is widely used (mainly by dump.rs and schema
diagnostics) and `kind()` had only 2 internal callers in lib.rs and
a handful in tests. Pick the more descriptive name and update the
callers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In the initial implementation of yeast, the splice syntax was needed to
distinguish between splicing multiple nodes and splicing just a single
node. However, this was always an ugly "wart" in the syntax, since the
user shouldn't have to worry about these things.
To fix this, we add an `IntoFieldIds` trait that dispatches on the
value's type: `Id` pushes a single id, and a blanket impl for
`IntoIterator<Item: Into<Id>>` handles `Vec<Id>`, `Option<Id>`, and
arbitrary iterator chains.
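A minimal sketch of this dispatch, assuming `Id` is a newtype local to the
crate (which is what lets the two impls coexist). The method name
`push_into` and the helper `field_ids` are hypothetical and only serve
the illustration:

```rust
/// Stand-in for yeast's node id type (assumed to be a local newtype).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Id(pub usize);

/// Anything that can contribute zero or more ids to a field.
pub trait IntoFieldIds {
    /// Hypothetical method name: append this value's ids to `out`.
    fn push_into(self, out: &mut Vec<Id>);
}

/// A single id pushes itself.
impl IntoFieldIds for Id {
    fn push_into(self, out: &mut Vec<Id>) {
        out.push(self);
    }
}

/// Blanket impl: `Vec<Id>`, `Option<Id>`, and iterator chains all land here.
/// `Id` itself is not `IntoIterator`, so this does not overlap the impl above.
impl<I> IntoFieldIds for I
where
    I: IntoIterator,
    I::Item: Into<Id>,
{
    fn push_into(self, out: &mut Vec<Id>) {
        out.extend(self.into_iter().map(Into::into));
    }
}

/// Hypothetical helper: collect the ids that a value contributes to one field.
pub fn field_ids(value: impl IntoFieldIds) -> Vec<Id> {
    let mut out = Vec::new();
    value.push_into(&mut out);
    out
}
```

With this shape, the call site looks the same whether it passes one id or
many, which is what makes a separate splice syntax unnecessary.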
With this, we no longer need to use the special splice syntax, and hence
we can get rid of it.
Previously, the `Id` type was a bare usize alias. The `NodeRef` newtype
existed solely to carry the AST-aware `YeastDisplay` /
`YeastSourceRange` impls (so that `#{captured_node}` rendered source
text rather than the numeric id) without colliding with the impls for
raw integer types.
This commit promotes `Id` itself to a (transparent) newtype struct and
moves the AST-aware trait impls directly onto it. With `Id` and `usize`
now being different types, the integer-display impl (for `usize`) and
the source-text impl (for `Id`) coexist without conflict, and `NodeRef`
becomes redundant (and so we remove it).
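A rough sketch of why the two impls no longer collide. The `Ast` layout
here (a source string plus one byte range per node) is an assumption made
for the example, not the real yeast data structure:

```rust
/// Simplified stand-in for the real Ast: source text plus a byte range per node.
pub struct Ast {
    pub source: String,
    pub ranges: Vec<(usize, usize)>,
}

/// `Id` as a transparent newtype rather than a `usize` alias.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Id(pub usize);

/// Display-like trait whose method also receives the Ast.
pub trait YeastDisplay {
    fn yeast_to_string(&self, ast: &Ast) -> String;
}

/// Plain integers render as numbers.
impl YeastDisplay for usize {
    fn yeast_to_string(&self, _ast: &Ast) -> String {
        self.to_string()
    }
}

/// Node ids render as the source text of the node they refer to.
impl YeastDisplay for Id {
    fn yeast_to_string(&self, ast: &Ast) -> String {
        let (start, end) = ast.ranges[self.0];
        ast.source[start..end].to_string()
    }
}

/// Hypothetical fixture: the three tokens of `1 + 2`.
pub fn sample_ast() -> Ast {
    Ast {
        source: "1 + 2".to_string(),
        ranges: vec![(0, 1), (2, 3), (4, 5)],
    }
}
```

Because `Id` and `usize` are distinct types, `Id(1)` resolves to the token
text while `1usize` still prints as a number.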
- unified/swift: Mark `binding_kind` as a raw `@@` capture in the
  property_declaration rule. It is only used to read its source text
  (`ctx.ast.source_text`), never as a translated node. With `@` the
  auto-translate prefix would route the unnamed `let`/`var` token
  through the catch-all `_ @node => {node}` fallback for a no-op
  roundtrip; `@@` makes the intent explicit and removes that reliance.
- shared/yeast/tests: Reword a stale comment in test_raw_capture_marker.
  The text claimed a "second assertion" exists in this test, but the
  explicit-translation check actually lives in the companion
  test_raw_capture_marker_explicit_translate.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The `@@name` capture marker in `rule!` queries skips the
auto-translate prefix for that specific capture, letting the body see
the original capture (and thus delay its translation using
`ctx.translate` until it becomes convenient).
Regular `@name` captures continue to be auto-translated as before.
Specifically these are translated _eagerly_, before the main body of the
rewrite rule is run.
I settled on `@@` as the syntax because it did not add new symbols that
the user has to keep track of (it's still a kind of capture), but it's
still visually distinct enough that the user should be able to tell that
there's something special going on. In principle one could accidentally
write one form of capture where the other was intended, but in practice
this would result in code that did not compile (because the types would
not match).
Format the touched Rust crates (shared/tree-sitter-extractor,
shared/yeast, shared/yeast-macros, unified/extractor) so the
tree-sitter-extractor CI fmt check passes. No functional changes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Renames what was previously called `__yeast_ctx` to just `ctx`, and
adds a new field `user_ctx` to this context. This field can hold a
struct of any user-defined type (which requires making various parts of
the implementation generic in that type).
Through some Deref magic, field accesses are delegated to the inner
struct (assuming they are not already defined on `ctx`), which should
hopefully make the interface a bit more ergonomic.
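The delegation can be sketched roughly as follows. Only `ctx` and
`user_ctx` come from this change; `Ctx`, `SwiftState`, and the field
names `fresh_counter` and `in_protocol` are hypothetical:

```rust
use std::ops::Deref;

/// Simplified context, generic in the user-supplied state type `U`.
pub struct Ctx<U> {
    pub fresh_counter: usize,
    pub user_ctx: U,
}

/// Field and method lookups that fail on `Ctx` fall through to `U`.
impl<U> Deref for Ctx<U> {
    type Target = U;

    fn deref(&self) -> &U {
        &self.user_ctx
    }
}

/// Hypothetical user state. It deliberately reuses the name `fresh_counter`
/// to show that a field defined on `Ctx` itself takes precedence.
pub struct SwiftState {
    pub in_protocol: bool,
    pub fresh_counter: usize,
}

pub fn sample_ctx() -> Ctx<SwiftState> {
    Ctx {
        fresh_counter: 7,
        user_ctx: SwiftState {
            in_protocol: true,
            fresh_counter: 99,
        },
    }
}
```

Here `ctx.in_protocol` reaches the user struct through auto-deref, while
`ctx.fresh_counter` resolves to the field on `Ctx`; the shadowed user
field stays reachable as `ctx.user_ctx.fresh_counter`.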
Previously, a synthesized node always took its location from the node
that matched the current rule, which resulted in overly broad
locations.
For `(foo #{bar})` we now take the location of the `bar` node.
For non-leaf nodes we merge the locations of all their child nodes.
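One plausible reading of "merge" is the smallest span that covers every
child. The sketch below assumes locations are byte-offset ranges; the
`Location` struct and `merge_locations` are hypothetical names, and the
real representation may differ (for example, line/column pairs):

```rust
/// Hypothetical location type: a half-open byte range in the source.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Location {
    pub start: usize,
    pub end: usize,
}

/// Smallest location covering all children, or `None` if there are no children.
pub fn merge_locations(children: &[Location]) -> Option<Location> {
    let start = children.iter().map(|loc| loc.start).min()?;
    let end = children.iter().map(|loc| loc.end).max()?;
    Some(Location { start, end })
}
```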
Introduce NodeRef as a typed wrapper around node arena IDs. Captures in
desugaring rules are now bound as NodeRef instead of raw usize, which
prevents accidental misuse and enables source-text-aware rendering.
Add the YeastDisplay trait as an alternative to Display: its
yeast_to_string method receives the Ast, allowing NodeRef to resolve to
the captured node's source text instead of printing a numeric ID.
Store the original source bytes in the Ast so that NodeContent::Range
values (from synthesized literal nodes) can be resolved back to text.
Update yeast-macros to emit NodeRef-typed capture bindings and use
Into::<usize>::into where raw IDs are needed. The #{expr} template
syntax now uses YeastDisplay instead of Display.
The effect is visible in the corpus tests: operator nodes now correctly
render as e.g. operator "+" instead of operator "3".
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
apply_rules_inner used to handle the "child was rewritten, so the
parent needs new field IDs" case by cloning the parent node, swapping
in the new fields, pushing the clone onto the arena, and returning the
new Id. Every ancestor on the path from the rewrite up to the root was
duplicated this way, with the originals retained as garbage in the
arena.
Switch to in-place mutation: assign `ast.nodes[id].fields = new_fields`
and return the same Id. Rule firings still produce genuinely new nodes
via BuildCtx (their structure differs from the input), but the
ancestor-rebuild spine no longer copies anything.
This is safe because apply_rules_inner already works entirely by Id:
the field snapshot is cloned out before recursing, no &Node references
are held across mutations of the arena, and captures are scoped to a
single rule firing so the now-stable Ids do not break anything.
Memory effect: a desugaring pass that rewrites R leaves of a tree of
average depth d previously appended R*d ancestor clones to the arena.
It now appends none.
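A toy model of the two strategies, to make the arena growth concrete.
This is not the real apply_rules_inner: `Node`, `Ast`, and `apply` are
simplified stand-ins, and the only "rule" rewrites a leaf of kind `a`
into a fresh leaf of kind `b`:

```rust
#[derive(Clone, Debug)]
pub struct Node {
    pub kind: &'static str,
    pub fields: Vec<usize>,
}

/// Arena of nodes; ids are indices into `nodes`.
pub struct Ast {
    pub nodes: Vec<Node>,
}

/// Rewrites every `a` leaf below `id` into a new `b` leaf and returns the id
/// of the resulting subtree root. `in_place` selects the new behaviour.
pub fn apply(ast: &mut Ast, id: usize, in_place: bool) -> usize {
    // Rule firing: always produces a genuinely new node.
    if ast.nodes[id].kind == "a" {
        ast.nodes.push(Node { kind: "b", fields: Vec::new() });
        return ast.nodes.len() - 1;
    }
    // Snapshot the fields before recursing, so no borrow is held across mutation.
    let old_fields = ast.nodes[id].fields.clone();
    let mut new_fields = Vec::with_capacity(old_fields.len());
    for &child in &old_fields {
        new_fields.push(apply(ast, child, in_place));
    }
    if new_fields == old_fields {
        return id;
    }
    if in_place {
        // New behaviour: mutate the parent and keep its id.
        ast.nodes[id].fields = new_fields;
        id
    } else {
        // Old behaviour: clone the parent and leave the original as garbage.
        let mut copy = ast.nodes[id].clone();
        copy.fields = new_fields;
        ast.nodes.push(copy);
        ast.nodes.len() - 1
    }
}

/// Fixture: root -> mid -> a.
pub fn chain() -> Ast {
    Ast {
        nodes: vec![
            Node { kind: "root", fields: vec![1] },
            Node { kind: "mid", fields: vec![2] },
            Node { kind: "a", fields: vec![] },
        ],
    }
}
```

On this three-node chain the cloning variant grows the arena to six nodes
and returns a new root id, while the in-place variant adds only the
rewritten leaf and the root keeps id 0.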
With Ids stable for the lifetime of an Ast, the Node::id field becomes
truly redundant and is removed (along with the Node::id() accessor).
AstCursor switches from caching `node: &Node` to tracking `node_id:
Id` and looking the node up via the arena on each access; ChildrenIter
now yields Ids directly. A new AstCursor::node_id() method gives
callers access to the cursor position by Id.
Previously, a bare child pattern in a query took whatever the next
child of the iterator was and either matched or failed: it would not
scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose
implicit `child` field was `["bar", "baz"]` would fail (the pattern
took "bar" first).
Switch to forward-scan semantics: a SingleNode matcher advances through
the iterator until it finds a child that matches its sub-query. Patterns
that are named-only continue to skip past unnamed children for free.
Order is preserved across multiple bare patterns at the same level —
each pattern advances the shared iterator past whatever it consumed —
so a query cannot match children out of source order.
Captures from a failed match attempt are rolled back via a snapshot, so
partial captures from a complex sub-query do not leak across attempts.
Add two regression tests against the `do` body wrapper in a Ruby
for-loop, whose implicit `child` field contains [do, identifier, end]:
- a query for ("end") matches by skipping past `do` and the identifier
- a query for ("end") then ("do") fails, demonstrating order preservation
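The forward-scan behaviour can be sketched over plain strings. This is a
simplification: children and patterns are bare token names, the function
`match_in_order` is hypothetical, and capture rollback is not modelled
because a failed comparison here captures nothing:

```rust
/// Matches each pattern against the children in order, scanning forward.
/// Returns the index of the child each pattern matched, or `None` if some
/// pattern found no match in the children that remained.
pub fn match_in_order(children: &[&str], patterns: &[&str]) -> Option<Vec<usize>> {
    // One iterator shared by all patterns: whatever a pattern consumes
    // (skipped children included) is gone for the patterns after it.
    let mut remaining = children.iter().enumerate();
    let mut matched = Vec::new();
    for pattern in patterns {
        let (index, _) = remaining.find(|&(_, child)| child == pattern)?;
        matched.push(index);
    }
    Some(matched)
}
```

Against `["do", "identifier", "end"]`, the pattern list `["end"]` matches
at index 2, while `["end", "do"]` fails because the shared iterator is
already past `do` by the time the second pattern runs.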
Three improvements to the query parser, all aimed at allowing query
patterns to refer to unnamed tokens:
1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token,
   matching the parenthesized form `("=") @op`. Previously the literal
   branch in parse_query_list skipped the maybe_wrap_capture call, so
   the `@op` was a leftover token and would error.
2. Bare `_` matches any node, named or unnamed. Previously bare `_` and
   `(_)` both produced QueryNode::Any with the same matches_named_only
   behaviour, so bare `_` would skip unnamed children. Now Any carries a
   match_unnamed flag: false for `(_)` (named-only, tree-sitter default)
   and true for bare `_` (any node).
3. Named fields and bare child patterns may be intermixed in any order.
   Previously, once parse_query_fields saw a bare pattern it would stop
   accepting named fields. The fix accumulates bare patterns into the
   implicit `child` field and keeps parsing.
Each named field independently selects its target field for matching, so
the source-order of fields in the query is purely cosmetic and intermixing
is safe.
Add tests covering parenthesized capture, bare-literal capture, and the
named-vs-any distinction between `(_)` and bare `_`. Update query-syntax
docs to reflect all three.
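A small sketch of the named-vs-any distinction. `QueryNode::Any` and its
`match_unnamed` flag are from this change; the `Child` struct, the
`Literal` variant, and the `matches` method are simplified assumptions
for illustration:

```rust
/// Simplified child node: its kind plus whether tree-sitter considers it named.
pub struct Child {
    pub kind: &'static str,
    pub named: bool,
}

pub enum QueryNode {
    /// `(_)` sets `match_unnamed: false`; bare `_` sets it to `true`.
    Any { match_unnamed: bool },
    /// A quoted literal such as `"="`, which targets an unnamed token.
    Literal(&'static str),
}

impl QueryNode {
    pub fn matches(&self, child: &Child) -> bool {
        match self {
            // Named nodes always match; unnamed ones only when the flag is set.
            QueryNode::Any { match_unnamed } => child.named || *match_unnamed,
            // Literals match unnamed tokens with exactly that text.
            QueryNode::Literal(text) => !child.named && child.kind == *text,
        }
    }
}
```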
Extend the desugaring config from a single flat list of rules to an
ordered sequence of named Phases. Each phase runs to completion (a
full traversal applying its rules) before the next phase starts.
Rules in different phases never compete for matches.
The config is built via the new chainable API:
    DesugaringConfig::new()
        .add_phase("cleanup", cleanup_rules)
        .add_phase("desugar", desugar_rules)
        .with_output_node_types_yaml(yaml);
Single-phase configs are just .add_phase(...) called once.
A single FreshScope is shared across phases so generated identifier
names (e.g. $tmp-N) are unique throughout the run.
Phase names appear in error messages, e.g. "Phase `desugar`:
exceeded maximum rewrite depth".
Add two regression tests: one verifying basic two-phase chained
desugaring, and one verifying that errors include the failing phase
name.
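A toy model of the phase ordering, assuming rules are plain functions
over token strings rather than tree queries. `Config`, `Phase`, `Rule`,
and the sample rules are hypothetical stand-ins; the shared FreshScope
and the error reporting are left out:

```rust
/// Toy rule: maps a token to a replacement, or `None` if it does not match.
pub type Rule = fn(&str) -> Option<&'static str>;

pub struct Phase {
    pub name: String,
    pub rules: Vec<Rule>,
}

pub struct Config {
    pub phases: Vec<Phase>,
}

impl Config {
    pub fn new() -> Self {
        Config { phases: Vec::new() }
    }

    /// Chainable builder, mirroring the shape of `add_phase` above.
    pub fn add_phase(mut self, name: &str, rules: Vec<Rule>) -> Self {
        self.phases.push(Phase { name: name.to_string(), rules });
        self
    }

    /// Runs each phase over all tokens before the next phase starts.
    pub fn run(&self, mut tokens: Vec<String>) -> Vec<String> {
        for phase in &self.phases {
            for token in tokens.iter_mut() {
                // First matching rule of this phase wins.
                let replacement = phase.rules.iter().find_map(|rule| rule(token.as_str()));
                if let Some(out) = replacement {
                    *token = out.to_string();
                }
            }
        }
        tokens
    }
}

fn a_to_b(t: &str) -> Option<&'static str> {
    if t == "a" { Some("b") } else { None }
}

fn a_to_z(t: &str) -> Option<&'static str> {
    if t == "a" { Some("z") } else { None }
}

fn b_to_c(t: &str) -> Option<&'static str> {
    if t == "b" { Some("c") } else { None }
}

pub fn sample_config() -> Config {
    Config::new()
        .add_phase("cleanup", vec![a_to_b as Rule])
        .add_phase("desugar", vec![a_to_z as Rule, b_to_c as Rule])
}
```

In this model the `a_to_z` rule in the second phase never fires on an
input `a`, because the first phase has already rewritten every `a` by the
time the second phase starts.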
Previously, after a rule fired the engine would always re-try that
same rule on the result root. A rule whose output matched its own
query (intentionally or by accident) would loop until the global
MAX_REWRITE_DEPTH safety net kicked in.
Make the default behavior fire-once-per-node: after a rule fires on
node N, the engine no longer tries that same rule on the result root.
Other rules and child traversal are unaffected. Rules that
intentionally rewrite iteratively can opt into the old behavior via
the new Rule::repeated() builder method.
Add two regression tests using a self-swapping assignment rule:
- with .repeated(), the swap loops and trips the depth limit
- without it (default), the swap fires once and terminates
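The difference between the two modes can be sketched on a single node.
`Rule::repeated()` and `MAX_REWRITE_DEPTH` are named in this change, but
the depth value of 100, the `Pair` node type, and the `apply` function
are assumptions made for the example:

```rust
/// Assumed value; the real limit may differ.
pub const MAX_REWRITE_DEPTH: usize = 100;

/// Toy node: the two sides of an assignment.
pub type Pair = (i32, i32);

pub struct Rule {
    rewrite: fn(Pair) -> Option<Pair>,
    repeated: bool,
}

impl Rule {
    /// Rules fire once per node by default.
    pub fn new(rewrite: fn(Pair) -> Option<Pair>) -> Self {
        Rule { rewrite, repeated: false }
    }

    /// Opt back into re-trying the same rule on its own output.
    pub fn repeated(mut self) -> Self {
        self.repeated = true;
        self
    }
}

/// Applies one rule to one node, honouring the fire-once default.
pub fn apply(rule: &Rule, mut node: Pair) -> Result<Pair, String> {
    let mut depth = 0;
    while let Some(next) = (rule.rewrite)(node) {
        node = next;
        if !rule.repeated {
            break;
        }
        depth += 1;
        if depth >= MAX_REWRITE_DEPTH {
            return Err("exceeded maximum rewrite depth".to_string());
        }
    }
    Ok(node)
}

/// Self-swapping rule: its output always matches its own query.
pub fn swap(pair: Pair) -> Option<Pair> {
    Some((pair.1, pair.0))
}
```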
Adds a regression test verifying that desugaring rules can chain across
output-only node kinds: a first rule rewrites an input kind to an
output-only kind, and a second rule then rewrites that output-only
kind into another output-only kind. This exercises the schema lookup
for query patterns whose root kind is not present in the input
tree-sitter grammar.
12 tests covering parsing, queries, tree building, desugaring rules,
cursor navigation, and the shorthand rule! syntax.
Tests use a custom output node-types.yml with named fields for all
children (parameter, stmt, index), loaded via
schema_from_yaml_with_language.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>