87272 Commits

Author SHA1 Message Date
Taus
112b38f78e Yeast: Fix three issues from second review round
1. Depth guard now only counts rule rewrites, not tree traversal.
   Deep ASTs (>100 levels) no longer trigger false "non-terminating
   cycle" errors. Only actual rule-rewrite chains are depth-limited.

2. Repeated matcher detects zero-width matches and breaks the loop.
   Patterns like ((_)?)* no longer infinite-loop — if the iterator
   does not advance after a successful match, the repetition stops.

3. Shorthand rule! syntax now propagates source_range to synthetic
   nodes via create_node_with_range, matching the full template path.
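
The zero-width guard from item 2 can be sketched as below. This is an illustration only, not Yeast's matcher: `Matcher`, `match_repeated` and `optional_any` are hypothetical names, and the input is a flat slice rather than an AST.

```rust
/// Hypothetical matcher: tries to match at `pos` and returns the new
/// position on success. It may succeed without consuming any input.
type Matcher = fn(&[&str], usize) -> Option<usize>;

/// Apply `inner` repeatedly, stopping when it fails or stops advancing.
/// Returns the final position and the number of matches that consumed input.
fn match_repeated(input: &[&str], start: usize, inner: Matcher) -> (usize, usize) {
    let mut pos = start;
    let mut count = 0;
    while let Some(next) = inner(input, pos) {
        if next == pos {
            // Zero-width match: looping again would never terminate.
            break;
        }
        pos = next;
        count += 1;
    }
    (pos, count)
}

/// Optional matcher in the spirit of `(_)?`: consumes one item if present,
/// otherwise succeeds without consuming anything.
fn optional_any(input: &[&str], pos: usize) -> Option<usize> {
    if pos < input.len() {
        Some(pos + 1)
    } else {
        Some(pos)
    }
}
```

Without the `next == pos` check, repeating `optional_any` at the end of the input would succeed forever.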

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:43:21 +00:00
Taus
15f45dd43e Yeast: Support Option<Id> for ? captures in rule!
Captures on ? patterns now bind as Option<Id> instead of Id, correctly
handling the case where the optional pattern matches zero times.

Capture multiplicity is now tracked as a three-way enum:
- Single (no quantifier) → Id
- Optional (?) → Option<Id>
- Repeated (* or +) → Vec<Id>

Added Captures::get_opt() which returns None for unmatched captures.
The shorthand rule! syntax also handles optional fields correctly,
only inserting the child when Some.
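
A minimal sketch of the three-way multiplicity and an optional lookup follows. Only the names `Captures` and `get_opt` come from this message; the fields, `insert`, `multiplicity_for` and the `Id` alias are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Assumed node id type; the real `Id` may differ.
type Id = usize;

/// How many nodes a capture may bind, derived from the quantifier.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Multiplicity {
    Single,   // no quantifier -> Id
    Optional, // ?             -> Option<Id>
    Repeated, // * or +        -> Vec<Id>
}

/// Map a quantifier character to a multiplicity (None for unknown input).
fn multiplicity_for(quantifier: Option<char>) -> Option<Multiplicity> {
    match quantifier {
        None => Some(Multiplicity::Single),
        Some('?') => Some(Multiplicity::Optional),
        Some('*') | Some('+') => Some(Multiplicity::Repeated),
        Some(_) => None,
    }
}

/// Hypothetical capture store: every capture name maps to the ids it bound.
#[derive(Default)]
struct Captures {
    map: HashMap<String, Vec<Id>>,
}

impl Captures {
    fn insert(&mut self, name: &str, id: Id) {
        self.map.entry(name.to_string()).or_default().push(id);
    }

    /// Returns None when the capture matched zero times.
    fn get_opt(&self, name: &str) -> Option<Id> {
        self.map.get(name).and_then(|ids| ids.first().copied())
    }
}
```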

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:42:53 +00:00
Taus
c648b2a493 Yeast: Remove field*: syntax from query parser
The field*: syntax was a leftover from the old explicit child*: pattern.
Fields now always match a single node pattern. Bare children (unnamed
positional matches) are expressed as patterns after all named fields.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:42:23 +00:00
Taus
f5a41c09cb Yeast: Fix remaining review issues (#8-#10)
#8: Reject * after non-capture template groups with a compile error.
Previously (foo (bar)*) silently dropped the *, behaving like (bar).

#9: Verify inner token streams are exhausted after parsing query nodes.
Unconsumed tokens inside a parenthesized group now produce a compile
error. Fixed a test using the old redundant (pattern)* syntax inside
a field*: group.

#10: Use ast.get_root() instead of hardcoded 0 for the root node id
in apply_rules calls.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:42:22 +00:00
Taus
25155298c7 Yeast: Fix four review issues (#4-#7)
Non-terminating rule cycles now produce an error instead of a stack
overflow.
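
One way to turn a rewrite cycle into an error is a depth counter on rule rewrites. The sketch below works on strings instead of AST nodes; `Rule`, `rewrite`, `ping_pong`, `terminating` and the limit of 100 are assumptions, not the crate's API.

```rust
/// Assumed limit on chained rule rewrites.
const MAX_REWRITE_DEPTH: usize = 100;

/// Hypothetical rewrite step: returns the replacement if a rule fires.
type Rule = fn(&str) -> Option<String>;

/// Rewrite `term` until no rule fires, failing once the chain gets too long.
fn rewrite(term: &str, rule: Rule, depth: usize) -> Result<String, String> {
    if depth > MAX_REWRITE_DEPTH {
        return Err(format!("non-terminating cycle suspected at {term:?}"));
    }
    match rule(term) {
        Some(next) => rewrite(&next, rule, depth + 1),
        None => Ok(term.to_string()),
    }
}

/// A cyclic rule set: a -> b -> a -> ...
fn ping_pong(term: &str) -> Option<String> {
    match term {
        "a" => Some("b".to_string()),
        "b" => Some("a".to_string()),
        _ => None,
    }
}

/// A terminating rule set: a -> b, then nothing fires.
fn terminating(term: &str) -> Option<String> {
    match term {
        "a" => Some("b".to_string()),
        _ => None,
    }
}
```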

Captures inside a repeated group (e.g. ((_) @x)*) are now correctly
marked as repeated by passing a parent_repeated flag through recursion.
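
The flag-passing idea can be illustrated on a simplified query tree. The `Query` enum and `collect_captures` below are hypothetical stand-ins; only the `parent_repeated` name comes from this message.

```rust
/// Hypothetical, simplified query node.
enum Query {
    Any,
    Capture(Box<Query>, &'static str),
    Repeated(Box<Query>),
    Group(Vec<Query>),
}

/// Collect (capture name, is_repeated) pairs. `parent_repeated` becomes
/// true as soon as any enclosing node is a repetition.
fn collect_captures(
    query: &Query,
    parent_repeated: bool,
    out: &mut Vec<(&'static str, bool)>,
) {
    match query {
        Query::Any => {}
        Query::Capture(inner, name) => {
            // A capture is repeated if it wraps a repetition directly
            // or sits anywhere inside one.
            let repeated = parent_repeated || matches!(**inner, Query::Repeated(_));
            out.push((*name, repeated));
            collect_captures(inner, parent_repeated, out);
        }
        Query::Repeated(inner) => collect_captures(inner, true, out),
        Query::Group(items) => {
            for item in items {
                collect_captures(item, parent_repeated, out);
            }
        }
    }
}
```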

Empty repetition children are handled as a special case before the
loop.

Stale captures no longer remain in the map after a failed sub-query match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:42:22 +00:00
Taus
a3e7e4cfd2 Yeast: Propagate errors via Result instead of panicking
Runner::run and run_from_tree now return Result<Ast, String> instead
of panicking on errors. Errors from query matching (unknown node kinds
or field names), parser setup, and unexpected result counts are all
propagated.

The extractor's extract_and_desugar gracefully falls back to the
un-desugared AST on error, logging the failure via tracing::error.
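
The fallback pattern described here looks roughly like the sketch below. The `Ast` type, `run_rules` and `extract_with_fallback` are hypothetical, and `eprintln!` stands in for `tracing::error` so the example needs no external crates.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Ast {
    nodes: Vec<String>,
}

/// Stand-in for a fallible desugaring run: fails on an unknown node kind.
fn run_rules(ast: &Ast, known_kinds: &[&str]) -> Result<Ast, String> {
    for kind in &ast.nodes {
        if !known_kinds.contains(&kind.as_str()) {
            return Err(format!("unknown node kind: {kind}"));
        }
    }
    let nodes = ast.nodes.iter().map(|k| format!("desugared_{k}")).collect();
    Ok(Ast { nodes })
}

/// On error, log the failure and keep the original (un-desugared) AST.
fn extract_with_fallback(original: Ast, known_kinds: &[&str]) -> Ast {
    match run_rules(&original, known_kinds) {
        Ok(desugared) => desugared,
        Err(message) => {
            eprintln!("desugaring failed, using original AST: {message}");
            original
        }
    }
}
```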

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:41:55 +00:00
Taus
23011671ff Yeast: Propagate source locations to synthetic nodes
Synthetic nodes created by desugaring rules now inherit the source
range of the original matched node. This fixes invalid TRAP locations
(previously (0,0)-(0,0)) for desugared nodes.

- Node gains a source_range field used as fallback for position/byte
  methods when content is not a Range
- BuildCtx stores the matched node's range and passes it to all
  created nodes
- Rule::try_rule extracts the source range from the matched node
  and passes it through the transform closure
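
A sketch of the fallback, assuming a node whose content is either a source range or synthetic text. The `Content` enum, the field layout and `byte_range` are guesses for illustration; only the `source_range` field name comes from this message.

```rust
use std::ops::Range;

/// Hypothetical node content: a slice of the source, or synthetic text.
enum Content {
    Source(Range<usize>),
    Synthetic(String),
}

struct Node {
    content: Content,
    /// Range inherited from the matched node, used when content has none.
    source_range: Option<Range<usize>>,
}

impl Node {
    /// Prefer the content's own range, then fall back to the inherited one.
    fn byte_range(&self) -> Range<usize> {
        match (&self.content, &self.source_range) {
            (Content::Source(range), _) => range.clone(),
            (Content::Synthetic(_), Some(range)) => range.clone(),
            // No location at all: the degenerate case this commit avoids.
            (Content::Synthetic(_), None) => 0..0,
        }
    }
}
```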

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:41:20 +00:00
Taus
afcedea877 Shared extractor: Implement traverse_yeast via AstNode trait
Add an AstNode trait abstracting over tree_sitter::Node and yeast::Node,
implemented by both types. The Visitor methods (enter_node, leave_node,
record_parse_error_for_node, complex_node, sliced_source_arg,
location_for) are now generic over AstNode.

traverse_yeast uses yeast's AstCursor (which now iterates in source
order) to drive the same generic Visitor. extract_and_desugar is now
fully functional — it can parse, apply yeast rules, and emit TRAP.

sliced_source_arg uses opt_string_content() for yeast nodes with
synthetic content (from desugaring), falling back to source byte range.
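
The shape of the abstraction can be sketched with two toy node types. The trait name `AstNode` and the method name `enter_node` come from this message; the trait's methods, `ParsedNode` and `RewrittenNode` are assumptions and do not mirror `tree_sitter::Node` or `yeast::Node`.

```rust
/// Minimal node interface shared by both tree representations.
trait AstNode {
    fn kind(&self) -> &str;
    fn start_byte(&self) -> usize;
    fn end_byte(&self) -> usize;
}

/// Stand-in for a node straight from the parser.
struct ParsedNode {
    kind: &'static str,
    start: usize,
    end: usize,
}

/// Stand-in for a node produced by desugaring.
struct RewrittenNode {
    kind: String,
    range: (usize, usize),
}

impl AstNode for ParsedNode {
    fn kind(&self) -> &str { self.kind }
    fn start_byte(&self) -> usize { self.start }
    fn end_byte(&self) -> usize { self.end }
}

impl AstNode for RewrittenNode {
    fn kind(&self) -> &str { &self.kind }
    fn start_byte(&self) -> usize { self.range.0 }
    fn end_byte(&self) -> usize { self.range.1 }
}

struct Visitor {
    log: Vec<String>,
}

impl Visitor {
    /// One generic method serves both node types.
    fn enter_node<N: AstNode>(&mut self, node: &N) {
        self.log
            .push(format!("{} {}..{}", node.kind(), node.start_byte(), node.end_byte()));
    }
}
```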

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:02 +00:00
Taus
e51338cadd Shared extractor: Restore extract() to use tree-sitter traversal
Fix the critical traversal order bug: extract() was routing through
yeast even with empty rules, which changed child ordering from
tree-sitter source order to BTreeMap field-id order. This affected
parent_index values in TRAP output for ALL languages.
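
The ordering difference is easy to reproduce with a `BTreeMap`, which always iterates in key order. In this sketch children are `(field id, start byte)` pairs; the function names and the data layout are illustrative assumptions.

```rust
use std::collections::BTreeMap;

/// Children as (field id, start byte), listed in source order.
fn source_order(children: &[(u16, usize)]) -> Vec<usize> {
    children.iter().map(|&(_, start)| start).collect()
}

/// The same children after being grouped by field id in a BTreeMap.
fn field_id_order(children: &[(u16, usize)]) -> Vec<usize> {
    let mut by_field: BTreeMap<u16, Vec<usize>> = BTreeMap::new();
    for &(field, start) in children {
        by_field.entry(field).or_default().push(start);
    }
    // Iteration follows ascending field id, not position in the source.
    by_field.into_values().flatten().collect()
}
```

Any index derived from the second ordering can differ from one derived from the first, even when no rewrite rule has run.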

extract() is now restored to main's implementation using tree-sitter's
native Node and TreeCursor. extract_and_desugar() remains available for
languages that need desugaring, with a fallback to extract() when no
rules are provided.

The yeast-based TRAP traversal (traverse_yeast) is stubbed as
unimplemented — it needs a proper adapter between yeast::Node and
the Visitor before it can be used. No language currently uses it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:02 +00:00
Taus
86972fa04f Yeast: Share FreshScope across rule applications
The fresh identifier counter is now shared across all rule applications
within a single Runner::run call. Each rule application gets a fresh
"scope" (resolved names are cleared) but the counter keeps incrementing.

This ensures that $tmp in different rules produces distinct names:
the for-rule gets $tmp-0, the assignment rule gets $tmp-1.
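
A minimal sketch of a scope with a shared counter follows. Only the `FreshScope` name is taken from this message; the fields, `next_scope`, `resolve` and the exact name format are assumptions.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct FreshScope {
    /// Shared across all rule applications in one run.
    counter: usize,
    /// Names already resolved within the current rule application.
    resolved: HashMap<String, String>,
}

impl FreshScope {
    /// Start a new rule application: forget names, keep the counter.
    fn next_scope(&mut self) {
        self.resolved.clear();
    }

    /// Same name within one scope yields the same generated identifier.
    fn resolve(&mut self, name: &str) -> String {
        if let Some(existing) = self.resolved.get(name) {
            return existing.clone();
        }
        let generated = format!("${name}-{}", self.counter);
        self.counter += 1;
        self.resolved.insert(name.to_string(), generated.clone());
        generated
    }
}
```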

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:02 +00:00
Taus
147ca943b5 Yeast: Add combined for-loop + multiple assignment test
Tests that both desugaring rules fire correctly on input like
"for a, b in list do x end" — the for-loop is rewritten to .each
and the multiple assignment pattern is expanded within the block body.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:02 +00:00
Taus
378af3103a Yeast: Rewrite test suite with inline dump-based assertions
Replace fixture-file-based tests with self-contained tests using
dump_ast for assertions. Move Ruby desugaring rules from rules.rs
into the test file. Delete all fixture files and rules.rs.

Test coverage:
- Parsing: simple assignment, multiple assignment, for loop
- Queries: match, no-match, repeated captures
- Tree building: swap fields via BuildCtx
- Rules: multiple assignment desugaring, for-loop desugaring,
  shorthand rule! syntax
- Cursor: navigation (parent/child/sibling)

11 tests, all self-contained with inline inputs and expected outputs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
bc48ea4e3c Yeast: Skip unnamed tokens in AST dump output
Unnamed tokens (keywords, operators, punctuation) in the unnamed
children bucket are no longer shown in dump output. They still appear
if they are inside a named field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
2113ba7f61 Yeast: Add AST dumper for human-readable tree output
Add yeast::dump::dump_ast() which produces indented text output:

    program
      assignment
        left:
          left_assignment_list
            identifier "x"
            identifier "y"
        right:
          call
            method: identifier "foo"

Named fields are shown with "field:" labels, unnamed children are
indented under their parent. Leaf nodes show their text content.
Locations are optional via DumpOptions.
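
A simplified dumper in the same spirit is sketched below. It is not the `dump_ast` implementation: the `Node` type, `leaf` and `dump` are hypothetical, locations are omitted, and field values always go on their own line (the real output above puts a single leaf value next to its label).

```rust
/// Hypothetical tree node with named fields and unnamed children.
struct Node {
    kind: &'static str,
    text: Option<&'static str>,
    fields: Vec<(&'static str, Vec<Node>)>,
    children: Vec<Node>,
}

/// Convenience constructor for a leaf with text content.
fn leaf(kind: &'static str, text: &'static str) -> Node {
    Node { kind, text: Some(text), fields: vec![], children: vec![] }
}

/// Append an indented rendering of `node` to `out`.
fn dump(node: &Node, indent: usize, out: &mut String) {
    out.push_str(&" ".repeat(indent));
    out.push_str(node.kind);
    if let Some(text) = node.text {
        // Leaf text is shown quoted after the kind.
        out.push_str(&format!(" {text:?}"));
    }
    out.push('\n');
    for (name, values) in &node.fields {
        out.push_str(&format!("{}{name}:\n", " ".repeat(indent + 2)));
        for value in values {
            dump(value, indent + 4, out);
        }
    }
    for child in &node.children {
        dump(child, indent + 2, out);
    }
}
```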

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
83739f6eaf Yeast: Add shorthand rule! syntax for capture-to-field mapping
rule!(query => kind_name) is a shorthand for rules that simply gather
query captures into fields on a new node type. Each capture name
becomes a field: single captures produce single-valued fields, repeated
captures produce multi-valued fields.

    rule!((foo f: (boo (_) @blah) (_)* @blop) => bar)

is equivalent to:

    rule!((foo ...) => (bar blah: {blah} blop: {..blop}))

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
79f00b87a3 Yeast: Remove unnecessary braces in rule! closure
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
9893c45148 Yeast: Add rule! macro combining query and transform
rule! declares a desugaring rule with query pattern and transform
template in a single expression:

    rule!(
        (for pattern: (_) @pat value: (in (_) @val) body: (do (_)* @body))
        =>
        (call receiver: {val} method: (identifier "each") ...)
    )

Captures become Rust variables automatically: @name binds as Id
(single capture) or Vec<Id> (after * or +). The BuildCtx is created
implicitly. tree! and trees! can also be used without an explicit
context inside rule! transforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
116c13a920 Yeast: Remove unnecessary .collect() from {..} splice
{..expr} calls extend(), which accepts any IntoIterator — no need to
collect into a Vec first.
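
For reference, `Vec::extend` takes any `IntoIterator`, so a lazy iterator can be passed directly. The helper name and the doubling closure below are only for illustration.

```rust
/// Append transformed ids to `children` without an intermediate Vec.
fn splice_doubled(children: &mut Vec<u32>, extra: &[u32]) {
    // `extend` consumes the iterator directly; no `.collect()` needed.
    children.extend(extra.iter().map(|id| id * 2));
}
```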

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
66cde4719e Yeast: Update documentation for tree!/trees! macros
Rewrite the builder language section to document the current API:
tree!(ctx, ...) / trees!(ctx, ...) with BuildCtx, {expr} for embedded
Rust, {..expr} for splicing, #{expr} for computed literals, and $name
for fresh identifiers. Remove all references to the old TreeBuilder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
2f53acac82 Yeast: Remove old TreeBuilder infrastructure
Delete TreeBuilder, TreeChildBuilder, TreesBuilder enums and all their
methods, along with the tree_builder! and trees_builder! proc macros.
All building is now done through tree!/trees! with BuildCtx.

The tree_builder module is kept for FreshScope only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
1613372191 Yeast: Make tree! always return Vec<Id>
tree! now consistently returns Vec<Id>, even for a single element.
This matches the Rule transform signature and removes the need to
wrap single-node results in vec![].

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:01 +00:00
Taus
b0e2e5eb0d Yeast: Unify tree! and trees! into a single tree! macro
tree! now returns Id for a single element or Vec<Id> for multiple,
determined by how many top-level elements appear in the template.
trees! is kept as an alias for backward compatibility.

- tree!(ctx, (single_node ...))        → Id
- tree!(ctx, (node1 ...) (node2 ...))  → Vec<Id>
- tree!(ctx, (node) {..splice})        → Vec<Id>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
e54d84c29f Yeast: Add {..expr} splice syntax and inline assignment rule
{..expr} in tree!/trees! splices a Vec<Id> (extend), while {expr}
inserts a single Id (push). The .. mirrors Rust spread syntax.

The assignment rule is now a single trees! expression with the loop
inlined via {..iter.map(|...| tree!(...)).collect()}.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
4b666a7e4b Yeast: Use #{expr} for computed literals in tree templates
Change computed literal syntax from (kind {expr}) to (kind #{expr})
to avoid ambiguity with child node splicing ({expr}). The # sigil
mirrors Rust format string syntax.

- (kind "static")  — static string content
- (kind #{expr})   — computed content via expr.to_string()
- (kind $name)     — fresh identifier
- {expr}           — child node (always Id, unambiguous)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
7d561f13c6 Yeast: Support computed literals in tree templates
(kind {expr}) creates a leaf node whose content is expr.to_string().
This eliminates ctx.literal() calls for computed values like loop
counters: (integer {i}) instead of {ctx.literal("integer", &i.to_string())}.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
99d1730ad1 Yeast: Add tree!/trees! macros with embedded Rust expressions
New tree!(ctx, ...) and trees!(ctx, ...) macros that directly build AST
nodes through a BuildCtx, replacing the intermediate TreeBuilder data
structures for new code.

Key features:
- {expr} embeds Rust expressions inline in templates
- @name references captures from the query match
- $fresh generates unique identifiers (shared across the template)
- (kind "literal") creates literal leaf nodes
- (@name)* splices repeated captures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
01506688aa Yeast: Add documentation for the desugaring framework
Covers architecture, the query and builder languages, capture semantics,
fresh identifiers, and extractor integration, with a complete for-loop
desugaring example.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
1378390063 Yeast: Remove unnecessary end-token deletion rule
The end token rule was deleting orphaned "end" tokens after
desugaring. This is no longer needed since the for-loop rule replaces
the entire for node (including its end token), and unnamed tokens are
skipped automatically during matching.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
c6f362d13e Yeast: Remove unnecessary token patterns from queries
Since (_) and (_)* skip unnamed tokens automatically, explicit patterns
for discarding tokens are no longer needed.

Before: (left_assignment_list ((identifier) @left (",")?)*  )
After:  (left_assignment_list (identifier)* @left)

Before: (do "do"? (_)* @body)
After:  (do (_)* @body)

Also fix proc macro parsing of (node_kind)* to correctly treat the
group as a repeated single node pattern rather than a repeated list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
1b74721af9 Yeast: Require captures to follow a pattern (tree-sitter style)
Align capture syntax with tree-sitter queries: @name must always follow
a pattern, never appear standalone. This eliminates the ambiguity where
`@name` could be confused with a positional wildcard.

Before: right: @right, (@body)*
After:  right: (_) @right, (_)* @body

For repeated patterns, `(_)* @body` captures each matched node into
the repeated capture variable, matching tree-sitter semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
185a637915 Yeast: Make (_) match only named nodes (tree-sitter semantics)
In tree-sitter queries, (_) matches any named node but skips unnamed
tokens. Implement the same semantics: QueryNode::Any and captures
wrapping Any now skip unnamed children in positional matching.

This allows writing `(in (_) @val)` to match the first named child of
an `in` node, skipping the "in" keyword token. Previously this
required the awkward `(in "in" @val)`.
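
The named-only rule amounts to filtering children before positional matching. In this sketch `Child` and `first_named` are hypothetical; the real matcher works on `QueryNode` and AST ids.

```rust
/// Hypothetical child entry: unnamed tokens have `named == false`.
struct Child {
    kind: &'static str,
    named: bool,
}

/// First child that a `(_)`-style wildcard would accept.
fn first_named<'a>(children: &'a [Child]) -> Option<&'a Child> {
    children.iter().find(|child| child.named)
}
```

For an `in` node whose children are the `"in"` keyword followed by an identifier, the wildcard lands on the identifier.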

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:39:00 +00:00
Taus
0b601290b1 Yeast: Document the YAML node-types format
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
cd6aab62a6 Yeast: Always quote unnamed node types in YAML output
Unnamed tokens are now always wrapped in double quotes in the YAML
output, making them visually distinct from named node references.
YAML treats both forms as equivalent strings, so this is purely
a readability improvement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
782181aeae Yeast: Add JSON-to-YAML conversion for node types
Add convert_from_json() to convert tree-sitter node-types.json back to
the YAML format. The CLI gains a --from-json flag.

The round-trip is tested: YAML → JSON → YAML → JSON produces identical
JSON output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
1340c516c7 Yeast: Add YAML node-types format and converter
Add a human-friendly YAML format for specifying node types, with a
converter to tree-sitter's node-types.json format.

YAML format has three top-level sections:
- supertypes: union types mapping to lists of member types
- named: concrete AST nodes with fields (?, *, + suffixes for
  multiplicity) and $children for unnamed children
- unnamed: list of token strings

Type references are resolved automatically: if a name appears only as
unnamed, it's treated as unnamed; otherwise named. Use {unnamed: name}
for explicit disambiguation.
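
The resolution rule can be written down in a few lines. `TypeRef` and `resolve` are hypothetical names; the `explicit_unnamed` flag stands for a reference written as {unnamed: name}.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum TypeRef {
    Named(String),
    Unnamed(String),
}

/// A name is unnamed only if marked explicitly or if it appears
/// exclusively in the unnamed section; otherwise it is named.
fn resolve(
    name: &str,
    named: &HashSet<&str>,
    unnamed: &HashSet<&str>,
    explicit_unnamed: bool,
) -> TypeRef {
    let only_unnamed = unnamed.contains(name) && !named.contains(name);
    if explicit_unnamed || only_unnamed {
        TypeRef::Unnamed(name.to_string())
    } else {
        TypeRef::Named(name.to_string())
    }
}
```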

Includes a library API (yeast::node_types_yaml::convert) and a CLI
binary (node_types_yaml) that reads YAML and outputs JSON.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
93fea91f76 Shared extractor: Support separate output node types
Add an optional output_node_types field to Language (generator) and
LanguageSpec (extractor). When set, the generator produces dbscheme/QL
from the output types, and the extractor validates TRAP against them.

This enables desugaring transforms that produce AST shapes different
from the tree-sitter grammar. When unset (None), behavior is unchanged
— the tree-sitter node_types are used for both input and output.
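
The unset-means-unchanged behavior is a plain `Option` fallback. The struct below borrows the `LanguageSpec` and field names from this message, but its layout, the string types and `effective_output_types` are assumptions.

```rust
/// Hypothetical spec: node type definitions are kept as JSON text here.
struct LanguageSpec {
    node_types: &'static str,
    output_node_types: Option<&'static str>,
}

impl LanguageSpec {
    /// Output types default to the tree-sitter node types when unset.
    fn effective_output_types(&self) -> &'static str {
        self.output_node_types.unwrap_or(self.node_types)
    }
}
```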

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
dc86d9118f Yeast: Auto-manage FreshScope in build_tree/build_trees
build_tree() and build_trees() now create their own FreshScope
internally. For the rare case where a shared scope is needed across
multiple build calls (e.g. the assignment rule), _with_fresh variants
are available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
c064b4a369 Yeast: Add literal and fresh identifier syntax to tree builders
Add two new TreeBuilder variants for declarative identifier creation:

- Literal: (kind "value") — creates a leaf node with fixed content
  Example: (identifier "each") creates an identifier node with text 'each'

- Fresh: (kind $name) — creates a leaf node with an auto-generated
  unique name. All occurrences of the same $name within one rule
  application share the same generated value.
  Example: (identifier $tmp) creates 'tmp-0', 'tmp-1', etc.

FreshScope tracks generated names per rule application. This eliminates
the need for manual Rc<Cell> counters and create_named_token calls.

The for-loop rule is now fully declarative (no imperative code in the
transform closure). The assignment rule still needs a closure for the
index counter in repeated captures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
2f18ad6f31 Shared extractor: Add extract_and_desugar function
Add extract_and_desugar() which accepts yeast desugaring rules.
The original extract() function is unchanged and delegates to
extract_and_desugar with empty rules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
99d4f391cb Yeast: Remove unnecessary parens around literal with quantifier
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
7d4d145626 Yeast: Implicit child field for bare patterns
Bare patterns inside a node (not preceded by 'field:') are now
automatically assigned to the synthetic 'child' field. This removes the
need for explicit 'child*: (...)' syntax.

Before: (do child*: (("do")? (@body)*))
After:  (do ("do")? (@body)*)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:59 +00:00
Taus
9beab6d1fc Yeast: Replace macro_rules with procedural macros
Replace the macro_rules!-based query and builder macros with procedural
macros in a new yeast-macros crate. The proc macros parse
a tree-sitter-inspired syntax and generate the same runtime data structures.

Key improvements:
- Better error messages with source spans
- Cleaner syntax closer to tree-sitter query notation
- Captures use @name after patterns (tree-sitter style)
- Fields with bare @capture no longer need wrapping parens
- Removed ~10 interdependent macro_rules! definitions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:58 +00:00
Taus
7efb03d4cc Yeast: AST desugaring framework (rebased from hackathon-desugaring)
Add the yeast crate (Yet another Elaborator for Abstract Syntax Trees),
a framework for tree-sitter AST transformations/desugaring. Integrate it
into the shared tree-sitter extractor.

Key components:
- shared/yeast/: New crate with query/match/transform pipeline for
  tree-sitter ASTs, with Ruby desugaring rules as an example
- shared/tree-sitter-extractor: Pass parsed trees through yeast before
  TRAP extraction, applying language-specific desugaring rules

Updated from the original hackathon branch to work with tree-sitter 0.24
and current main dependencies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:38:58 +00:00
Mathias Vorreiter Pedersen
154d213fd2 Merge pull request #21768 from github/speed-up-unchecked-leap-year-after-modification
C++: Speed up `cpp/leap-year/unchecked-after-arithmetic-year-modification`
2026-04-30 16:06:17 +01:00
Michael Nebel
4446f42846 Merge pull request #21684 from michaelnebel/csharp/improve-reachability-checks
C#: Improve BMN feed checking & handling.
2026-04-30 15:53:52 +02:00
Owen Mansel-Chan
87c35e6401 Merge pull request #21654 from MarkLee131/fix/sensitive-log-hash-sanitizer
Java: treat hash/encrypt/digest methods as sensitive-log sanitizers
2026-04-30 13:21:03 +01:00
Tom Hvitved
a473fdb709 Merge pull request #21759 from hvitved/csharp/cfg-params
C#: Include parameters and their defaults in the CFG
2026-04-30 11:31:06 +02:00
Owen Mansel-Chan
fed42d655f Merge pull request #21656 from MarkLee131/fix/trust-boundary-regexp-barrier
Java: add RegexpCheckBarrier to trust-boundary-violation sanitizers
2026-04-29 14:59:01 +01:00
Michael Nebel
03d70b9f94 C#: Add another nuget.config integration test.
2026-04-29 15:47:32 +02:00
Michael Nebel
e29770c2b5 C#: Fix missing slash in comments.
2026-04-29 15:27:47 +02:00