Commit Graph

47 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
041a8e6adc Fix source_text call in @@raw_lhs documentation example 2026-06-29 11:26:07 +00:00
Taus
fb424020af yeast: Delete the Cursor trait, inline its methods on AstCursor
The trait had a single implementor (`AstCursor`), three type parameters
of which one (`T`) was never used in any method signature, and one
external consumer that needed `use yeast::Cursor;` in scope just to
call methods on the cursor. The abstraction was overhead without a
second implementor to justify it.

Move the six trait methods to an inherent `impl AstCursor` block;
delete `shared/yeast/src/cursor.rs`, the `pub mod cursor;` and
`pub use cursor::Cursor;` lines in `lib.rs`, and the `use yeast::Cursor;`
in `tree-sitter-extractor`'s `traverse_yeast`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-29 10:34:36 +00:00
Taus
807bb51df7 yeast: Unify Node::kind() and Node::kind_name()
Both accessors returned the same private `kind_name: &'static str`
field; `kind_name()` is widely used (mainly by dump.rs and schema
diagnostics) and `kind()` had only 2 internal callers in lib.rs and
a handful in tests. Pick the more descriptive name and update the
callers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-29 10:34:36 +00:00
Taus
b6abfe6e5c yeast: Remove dead prepend_field / prepend_field_child
`BuildCtx::prepend_field` and the underlying `Ast::prepend_field_child`
existed to support the create-then-mutate pattern in swift.rs (build
an output node, then prepend modifiers to its `modifier:` field). The
SwiftContext-based refactor on the previous branches eliminated all
such call sites: every emitted declaration now carries its modifiers
from birth, so the in-place prepend operation has no users.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-29 10:34:35 +00:00
Taus
b3dc7009a4 yeast: Remove dead BuildCtx::translate_opt
`translate_opt` was a convenience for the manual_rule! body code,
collapsing `Option<I>` to `Option<Id>` via `translate`. Since the
`@@` raw-capture migration replaced manual_rule! with rule!, no
callers remain — the auto-translate prefix handles `Option<Id>`
captures directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-29 10:34:35 +00:00
Taus
e59f646870 yeast: Remove dead Captures methods
`Captures::map_captures`, `Captures::map_captures_to`, and
`Captures::try_map_all_captures` had no callers. The last one was
subsumed by `try_map_captures_except` (which takes a skip list and
degenerates to the old behaviour when the list is empty).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-29 10:34:35 +00:00
Taus
cc3c232631 yeast: Replace {..expr} splice syntax with trait-dispatched {expr}
In the initial implementation of yeast, the splice syntax was needed do
distinguish between splicing multiple nodes or just a single node.
However, this was always an ugly "wart" in the syntax, since the user
shouldn't have to worry about these things.

To fix this, we add an `IntoFieldIds` trait that dispatches on the
value's type: `Id` pushes a single id, and a blanket impl for
`IntoIterator<Item: Into<Id>>` handles `Vec<Id>`, `Option<Id>`, and
arbitrary iterator chains.

With this, we no longer need to use the special splice syntax, and hence
we can get rid of it.
2026-06-29 10:34:35 +00:00
Taus
9a5cc3c5e3 yeast: Make Id a newtype, delete NodeRef
Previously, the `Id` type  was a bare usize alias. The `NodeRef` newtype
existed solely to carry the AST-aware `YeastDisplay` /
`YeastSourceRange` impls (so that `#{captured_node}` rendered source
text rather than the numeric id) without colliding with the impls for
raw integer types.

This commit promotes `Id` itself to a (transparent) newtype struct and
moves the AST-aware trait impls directly onto it. With `Id` and `usize`
now being different types, the integer-display impl (for `usize`) and
the source-text impl (for `Id`) coexist without conflict, and `NodeRef`
becomes redundant (and so we remove it).
2026-06-29 10:33:32 +00:00
Taus
70ca7af04c Address PR review comments
- unified/swift: Mark `binding_kind` as a raw `@@` capture in the
  property_declaration rule. It is only used to read its source text
  (`ctx.ast.source_text`), never as a translated node. With `@` the
  auto-translate prefix would route the unnamed `let`/`var` token
  through the catch-all `_ @node => {node}` fallback for a no-op
  roundtrip; `@@` makes the intent explicit and removes that reliance.

- shared/yeast/tests: Reword a stale comment in test_raw_capture_marker.
  The text claimed a "second assertion" exists in this test, but the
  explicit-translation check actually lives in the companion
  test_raw_capture_marker_explicit_translate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-26 13:30:01 +00:00
Taus
664f0125b9 yeast: Remove now-unused manual_rule!
The `manual_rule!` macro is now fully subsumed by `rule!` + `@@name`, so
this commit simply gets rid of the now no longer needed code.
2026-06-26 12:07:22 +00:00
Taus
1b7f589000 unified/swift: Migrate manual_rule! sites to rule! + @@
With `@@name` available, there's no longer a need to use `manual_rule!`.
Every place where it is used, we can instead just mark the relevant raw
captures as such. This results in quite a lot of cleanup! (Also, to me
at least, it makes these rules a lot easier to reason about.)

A first iteration of this approach resulted in a lot of
`.map(Into::into)` being needed, because `SwiftContext` stores `Id`s,
but captures produce `NodeRef`s. To avoid this, I swapped it around so
that the context stores `NodeRef`s. This does require adding `.into()`
in a few places, but it makes the rest of the code a lot more ergonomic.
2026-06-26 12:07:22 +00:00
Taus
eb7f8cc43d yeast: Add @@name raw-capture syntax to rule!
The `@@name` capture marker in `rule!` queries skips the
auto-translate prefix for that specific capture, letting the body see
the original capture (and thus delay its translation using
`ctx.translate` until it becomes convenient).

Regular `@name` captures continue to be auto-translated as before.
Specifically these are translated _eagerly_, before the main body of the
rewrite rule is run.

I settled on `@@` as the syntax because it did not add new symbols that
the user has to keep track of (it's still a kind of capture), but it's
still visually distinct enough that the user should be able to tell that
there's something special going on. In principle one could accidentally
write one form of capture where the other was intended, but in practice
this would result in code that did not compile (because the types would
not match).
2026-06-26 12:07:21 +00:00
Taus
af7ae8c4cb Apply rustfmt
Format the touched Rust crates (shared/tree-sitter-extractor,
shared/yeast, shared/yeast-macros, unified/extractor) so the
tree-sitter-extractor CI fmt check passes. No functional changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-25 17:28:24 +02:00
Taus
1c4552edb0 unified/swift: Use tree! instead of ctx.node
Cleans up a few places where we were constructing trees piece by piece
rather than using the `tree!` macro.

In the process, Copilot noticed an issue that should probably be
addressed: the labeled_statement rule can never fire, since there are no
such nodes in the input. This is possibly a simple as making
_labeled_statement (which _does_ exist) named, but I haven't attempted
this.

Finally, a small change to yeast makes it so that the contents of a {}
interpolation can be a Rust block (previously it could only be a single
expression). This avoids the need to double-wrap instances where you
want to interpolate a single node produced as the final value of some
block.
2026-06-25 17:28:24 +02:00
Taus
85c39c04e0 yeast: Hide desugaring behind Desugarer trait
This was necessary since otherwise the generic type of the
user-specified context (which should only be a concern for yeast) starts
to bleed out into the shared extractor. Instead, we type-erase it by
putting it inside the aforementioned trait.
2026-06-25 17:28:24 +02:00
Taus
1ee142d8bd yeast: Add macro for fine-grained rules
Adds `manual_rule!` which provides a more low-level interface for
defining rewrites. (I'm not entirely sold on the name, so any
suggestions would be welcome.)

Notably, the captures bound in the body of such rules have _not_ been
translated yet -- they still come from the _input_ tree. It is the
user's duty to call ctx.translate on these (which has the effect of
recursively invoking the translation) before substituting them into the
output.

For _truly_ low-level access, the user can still construct a Rule
directly, but this is now somewhat cumbersome as the closure contained
therein takes quite a few parameters. Still, the possibility remains.
2026-06-25 17:28:24 +02:00
Taus
a523c7f47f yeast: Pass raw captures to Rule::new rules
This enables users to specify how and when these captures get
translated. In conjunction with the context mechanism, this can be used
to e.g. translate some piece of information (e.g. the type of
something), record it in the context, and then recursively translate
some other capture that relies on this information. This allows
information to be cleanly passed into descendants (which can be written
using context accesses in the `rule!` macro form).

As a consequence of this change, we now need to pass around a
TranslatorHandle to perform the manual translation. For Repeating rules,
it doesn't really make sense to translate things, so in this case we
simply signal an error.

Also, the implementation of the `rule!` macro changes slightly (without
changing semantics): it now essentially delegates to `Rule::new`,
receiving raw captures, but then immediately applies the translation to
those captures (which, for the majority of cases, is likely the desired
behaviour).
2026-06-25 17:28:24 +02:00
Taus
5f73754b95 yeast: Make transforms return Result
This will enable us to actually capture and log errors in complicated
rules (e.g. ones written in Rust) rather than just panicking.
2026-06-25 17:28:24 +02:00
Taus
e0fa6cf785 yeast: Reify the context and allow user-defined data in it
Renames what was previously called `__yeast_ctx` into just `ctx`, and
adds a new field `user_ctx` to this context. Said field can contain a
struct of any user type (necessitating making various parts of the
implementation generic in said type).

Through some Deref magic, field accesses are delegated to the inner
struct (assuming they are not already defined on `ctx`), which should
hopefully make the interface a bit more ergonomic.
2026-06-25 17:28:24 +02:00
Asger F
6c74cd31e4 Yeast: use child locations instead of rule target
Previously, when a node was synthesized it would always take the
location from the node that matched the current rule. This resulted
in overly broad locations however.

For (foo #{bar}) we now take the location of the 'bar' node.

For non-leaf nodes we merge all its child node locations.
2026-06-18 14:26:30 +02:00
Asger F
0baa126473 Add ability to prepend fields in Yeast 2026-06-15 10:49:35 +02:00
Asger F
ddc9516e92 Yeast: better support for rewriting unnamed nodes
- Ensure the full wildcard _ supports quantifiers
- Also rewrite unnamed nodes in one-shot phases
2026-06-15 10:49:31 +02:00
Asger F
3f3bed62d3 yeast: type-check for missing required fields
Add FieldCardinality to Schema to track required/multiple per field,
populated from the ast_types.yml suffixes (bare = required single,
? = optional single, + = required multiple, * = optional multiple).

dump_ast_with_type_errors now emits:
  <-- ERROR: missing required field 'name'
for any node in the output AST whose declared schema requires a field
that is absent from the actual node.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-01 14:18:37 +02:00
Asger F
1ecdc3614f Yeast: Fix matching against extras like comments 2026-06-01 14:18:37 +02:00
Asger F
e3b3888bee Yeast: Fix handling of captures with multiple results 2026-06-01 14:18:36 +02:00
Asger F
6b58482dfb Yeast: Fix text associated with synthesized nodes 2026-05-13 10:35:22 +02:00
Asger F
5772ee4d9b YEAST: add NodeRef type, YeastDisplay trait, and source text storage
Introduce NodeRef as a typed wrapper around node arena IDs. Captures in
desugaring rules are now bound as NodeRef instead of raw usize, which
prevents accidental misuse and enables source-text-aware rendering.

Add the YeastDisplay trait as an alternative to Display: its
yeast_to_string method receives the Ast, allowing NodeRef to resolve to
the captured node's source text instead of printing a numeric ID.

Store the original source bytes in the Ast so that NodeContent::Range
values (from synthesized literal nodes) can be resolved back to text.

Update yeast-macros to emit NodeRef-typed capture bindings and use
Into::<usize>::into where raw IDs are needed. The #{expr} template
syntax now uses YeastDisplay instead of Display.

The effect is visible in the corpus tests: operator nodes now correctly
render as e.g. operator "+" instead of operator "3".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:17 +02:00
Asger F
5d0cb9e805 YEAST: fix one-shot rules for unnamed nodes and self-captures
One-shot desugaring rules now skip unnamed nodes (punctuation, keywords,
etc.) since rules are intended to target named nodes only.

Also prevent infinite recursion when a capture refers to the root node of
the matched tree (e.g. an @_ capture on the pattern root).

Additionally fix the swift.rs add_phase call to match the updated 3-arg
signature introduced by the one-shot phase kind commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:12 +02:00
Asger F
c3a9218dcf Yeast: Add one-shot phase kind 2026-05-13 10:35:09 +02:00
Asger F
a049850c51 Yeast: add type-checking errors in AST dump 2026-05-13 10:35:07 +02:00
Asger F
49f19092fb Yeast: add reachable_node_ids() 2026-05-13 10:35:05 +02:00
Taus
15936a5f8d yeast: Take fields by ownership in apply_rules_inner
Previously, apply_rules_inner snapshotted a node's fields by cloning
the BTreeMap into a Vec<(FieldId, Vec<Id>)>, then built a fresh
BTreeMap of new_fields for the rewritten Ids. For a node with N
fields, this allocated 2N+1 things per visit (the snapshot Vec, N
cloned children Vecs, the new BTreeMap entries) — even when nothing
in the subtree was rewritten.

Use std::mem::take to swap the parent's fields out by ownership: the
recursion can mutate the AST (including pushing new nodes from rule
firings) without any conflict, since we hold the owned BTreeMap
locally. Iterate values_mut() and only allocate a fresh children Vec
on the first divergence (lazy alloc): unchanged children stay in the
existing slot. When done, swap the fields back.

For a subtree with no rewrites, this is now zero allocations per node
(modulo the recursion itself). For nodes with rewrites, it's one Vec
allocation per field that contains a rewritten child, instead of two
plus the BTreeMap rebuild.
2026-05-08 12:48:10 +00:00
Taus
7bd27b83e0 yeast: Mutate parent fields in place; remove redundant Node::id
apply_rules_inner used to handle the "child was rewritten, so the
parent needs new field IDs" case by cloning the parent node, swapping
in the new fields, pushing the clone onto the arena, and returning the
new Id. Every ancestor on the path from the rewrite up to the root was
duplicated this way, with the originals retained as garbage in the
arena.

Switch to in-place mutation: assign `ast.nodes[id].fields = new_fields`
and return the same Id. Rule firings still produce genuinely new nodes
via BuildCtx (their structure differs from the input), but the
ancestor-rebuild spine no longer copies anything.

This is safe because apply_rules_inner already works entirely by Id:
the field snapshot is cloned out before recursing, no &Node references
are held across mutations of the arena, and captures are scoped to a
single rule firing so the now-stable Ids do not break anything.

Memory effect: a desugaring pass that rewrites R leaves of a tree of
average depth d previously appended R*d ancestor clones to the arena.
Now appends 0.

With Ids stable for the lifetime of an Ast, the Node::id field becomes
truly redundant and is removed (along with the Node::id() accessor).
AstCursor switches from caching `node: &Node` to tracking `node_id:
Id` and looking the node up via the arena on each access; ChildrenIter
now yields Ids directly. A new AstCursor::node_id() method gives
callers access to the cursor position by Id.
2026-05-08 12:47:22 +00:00
Taus
5a4dee50f7 Merge pull request #21810 from github/tausbn/yeast-forward-scan-queries
yeast: Align query semantics more closely with tree-sitter
2026-05-08 13:30:43 +02:00
Taus
af6e921da5 yeast: Forward-scan bare child patterns instead of strict positional
Previously, a bare child pattern in a query took whatever the next
child of the iterator was and either matched or failed: it would not
scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose
implicit `child` field was `["bar", "baz"]` would fail (the pattern
took "bar" first).

Switch to forward-scan semantics: a SingleNode matcher advances through
the iterator until it finds a child that matches its sub-query. Patterns
that are named-only continue to skip past unnamed children for free.
Order is preserved across multiple bare patterns at the same level —
each pattern advances the shared iterator past whatever it consumed —
so a query cannot match children out of source order.

Captures from a failed match attempt are rolled back via a snapshot, so
partial captures from a complex sub-query do not leak across attempts.

Add two regression tests against the `do` body wrapper in a Ruby
for-loop, whose implicit `child` field contains [do, identifier, end]:
- a query for ("end") matches by skipping past `do` and the identifier
- a query for ("end") then ("do") fails, demonstrating order preservation
2026-05-07 15:08:22 +00:00
Taus
6f643a3604 yeast: Use canonical ID when registering unnamed kinds in Schema
Schema::from_language registered unnamed kinds via or_insert(id), where
`id` came from iterating 0..node_kind_count. For names with multiple
unnamed IDs (notably "end" in tree-sitter-ruby has IDs 0 and 13, where
ID 0 is the reserved error token), this picked the first encountered
ID — typically the wrong one.

The visitor sets node.kind via language.id_for_node_kind(name, false),
which returns the canonical ID. So a query for ("end") would compare
node.kind=13 against schema=0 and silently fail to match, with no
diagnostic.

Use language.id_for_node_kind(name, false) to obtain the canonical ID
when registering, mirroring the named-kind path that already does the
same with id_for_node_kind(name, true).
2026-05-07 15:08:21 +00:00
Taus
a4df96aad6 yeast: Support capturing unnamed nodes in queries
Three improvements to the query parser, all aimed at allowing query
patterns to refer to unnamed tokens:

1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token,
   matching the parenthesized form `("=") @op`. Previously the literal
   branch in parse_query_list skipped the maybe_wrap_capture call, so
   the `@op` was a leftover token and would error.

2. Bare `_` matches any node, named or unnamed. Previously bare `_` and
   `(_)` both produced QueryNode::Any with the same matches_named_only
   behaviour, so bare `_` would skip unnamed children. Now Any carries a
   match_unnamed flag: false for `(_)` (named-only, tree-sitter default)
   and true for bare `_` (any node).

3. Named fields and bare child patterns may be intermixed in any order.
   Previously, once parse_query_fields saw a bare pattern it would stop
   accepting named fields. The fix accumulates bare patterns into the
   implicit `child` field and keeps parsing.

Each named field independently selects its target field for matching, so
the source-order of fields in the query is purely cosmetic and intermixing
is safe.

Add tests covering parenthesized capture, bare-literal capture, and the
named-vs-any distinction between `(_)` and bare `_`. Update query-syntax
docs to reflect all three.
2026-05-07 15:08:21 +00:00
copilot-swe-agent[bot]
e0d663f79b yeast: address review wording in phase docs
Agent-Logs-Url: https://github.com/github/codeql/sessions/6d23db05-a6e9-4de4-8951-b465980fd0ef

Co-authored-by: tausbn <1104778+tausbn@users.noreply.github.com>
2026-05-07 12:35:46 +00:00
Taus
957c89b478 yeast: Support multi-phase desugaring via DesugaringConfig::add_phase
Extend the desugaring config from a single flat list of rules to an
ordered sequence of named Phases. Each phase runs to completion (a
full traversal applying its rules) before the next phase starts.
Rules in different phases never compete for matches.

The config is built via the new chainable API:

    DesugaringConfig::new()
        .add_phase("cleanup", cleanup_rules)
        .add_phase("desugar", desugar_rules)
        .with_output_node_types_yaml(yaml);

Single-phase configs are just .add_phase(...) called once.

A single FreshScope is shared across phases so generated identifier
names (e.g. $tmp-N) are unique throughout the run.

Phase names appear in error messages, e.g. "Phase `desugar`:
exceeded maximum rewrite depth".

Add two regression tests: one verifying basic two-phase chained
desugaring, and one verifying that errors include the failing phase
name.
2026-05-06 21:17:31 +00:00
Taus
9a94836974 yeast: Add per-rule .repeated() flag to opt into iterative matching
Previously, after a rule fired the engine would always re-try that
same rule on the result root. A rule whose output matched its own
query (intentionally or by accident) would loop until the global
MAX_REWRITE_DEPTH safety net kicked in.

Make the default behavior fire-once-per-node: after a rule fires on
node N, the engine no longer tries that same rule on the result root.
Other rules and child traversal are unaffected. Rules that
intentionally rewrite iteratively can opt into the old behavior via
the new Rule::repeated() builder method.

Add two regression tests using a self-swapping assignment rule:
- with .repeated(), the swap loops and trips the depth limit
- without it (default), the swap fires once and terminates
2026-05-06 12:33:18 +00:00
Taus
a0a0e9e9a7 yeast: Add test for chained rules with output-only kinds
Adds a regression test verifying that desugaring rules can chain across
output-only node kinds: a first rule rewrites an input kind to an
output-only kind, and a second rule then rewrites that output-only
kind into another output-only kind. This exercises the schema lookup
for query patterns whose root kind is not present in the input
tree-sitter grammar.
2026-05-06 11:45:53 +00:00
Taus
60dcf88b50 yeast: Add Bazel build rules for yeast crates
Add BUILD.bazel files for the yeast and yeast-macros crates, register
them as dependencies of the shared tree-sitter extractor, and refresh
the vendored crate dependencies via update_tree_sitter_extractors_deps.sh.
2026-05-06 11:34:09 +00:00
Taus
cc28ff9a48 yeast: Add yeast documentation
Covers architecture, query language, template language
(tree!/trees!/rule!),
capture semantics, fresh identifiers, and extractor integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
6e580446fd yeast: Add yeast test suite
12 tests covering parsing, queries, tree building, desugaring rules,
cursor navigation, and the shorthand rule! syntax.

Tests use a custom output node-types.yml with named fields for all
children (parameter, stmt, index), loaded via
schema_from_yaml_with_language.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
4c5548363c yeast: Add AST dumper for human-readable tree output
Produces indented text showing node kinds, named fields, and leaf
content. Unnamed tokens are hidden unless inside a named field.
Used by tests for readable assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
8a9e53cc58 yeast: Add YAML node-types format and converter
Human-friendly YAML alternative to tree-sitter node-types.json with
three sections: supertypes, named, unnamed. Supports bidirectional
conversion and building Schema objects from YAML.

Includes CLI binary (node_types_yaml) and documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
04f587190e yeast: AST desugaring framework with proc-macro DSL
YEAST (YEAST Elaborates Abstract Syntax Trees) is a framework for
transforming tree-sitter parse trees before CodeQL extraction.

Core components:
- shared/yeast/ — Ast, Node, Schema, query matching engine, captures,
  FreshScope, BuildCtx
- shared/yeast-macros/ — proc macros: query!, tree!, trees!, rule!

The query language is inspired by tree-sitter queries:
  (assignment left: (_) @lhs right: (_) @rhs)

Templates support embedded Rust ({expr}), splicing ({..expr}),
computed literals (#{expr}), and fresh identifiers ($name).

The rule! macro combines query and transform:
  rule!((for pattern: (_) @pat ...) => (call receiver: {val} ...))

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00