66 Commits

Author SHA1 Message Date
Asger F
1751d70c62 Fix parsing of corpus tests when --- delimiter is missing 2026-06-01 14:18:37 +02:00
Asger F
313500e581 Unified: update test outputs 2026-05-27 21:27:09 +02:00
Taus
d6e7e38e1c Unified: merge in main
Keeps our version of the conflicting files. They will be regenerated in
the next commit.
2026-05-27 15:01:03 +00:00
Taus
fbc861e7a4 unified: Clarify grammar comment
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-26 16:19:02 +02:00
Taus
f4f85b58ca unified: Remove some pointless fields
All of these fields have contents that are uniquely determined by the
node they appear on, so they convey no information.
2026-05-13 14:22:06 +00:00
Taus
caef72b047 unified: Introduced named property_binding node
This groups together a bunch of related values that would otherwise be
impossible to match up correctly.
2026-05-13 13:54:21 +00:00
Taus
9787a8b072 unified: Group enum entries
Same as in the preceding commit.
2026-05-13 13:51:25 +00:00
Taus
c8f7c3d7f2 unified: Group more paired items
Same as in the preceding commit, these items do not make sense as
separate fields on the parent node, so we materialise (or create new)
intermediate nodes to group them together.
2026-05-13 13:49:30 +00:00
Taus
ea6f3a9568 unified: Encapsulate function parameters
The field representation would have made it difficult to figure out
which parameters correspond to which default values and attributes, so
instead we now encapsulate these in a new `function_parameter` node.
2026-05-13 13:20:58 +00:00
Taus
5d6dc5c3c3 unified: Clean up statements/block mess
Introduces (by making it named) a `block` node, and conversely makes
`statements` anonymous. This enables us to sensibly distinguish between
the "then" and "else" branch of an `if_statement`, which we were not
able to previously.
2026-05-13 13:06:34 +00:00
Asger F
7fa6c4e4a3 Unified: Update test output after rebasing on grammar changes
The branch was rebased on the grammar changes, but rewriting the history was too difficult, so I'm just updating the test output here.
2026-05-13 10:35:34 +02:00
Asger F
600a4969c9 Unified: Simplify concatenation of arguments 2026-05-13 10:35:33 +02:00
Asger F
55194dd757 Unified: Support for calls and member access 2026-05-13 10:35:31 +02:00
Asger F
cbe4c81ca6 Unified: add tuple_pattern and sequence_condition; refine if-let/guard mapping
ast_types.yml additions:
- tuple_pattern { element*: pattern } in the pattern supertype.
- sequence_condition { stmt*: stmt, condition: condition } in the
  condition supertype.

swift.rs:
- Map Swift tuple destructuring (e.g. `let (a, b) = pair`) to the new
  tuple_pattern instead of synthesizing an apply_pattern.
- if-let / guard-let: explicitly match the value_binding_pattern
  (the `let` keyword) and bind the source expression as the next
  condition child, so `let` no longer leaks into the output.
2026-05-13 10:35:29 +02:00
Asger F
ccc1dd5d3e Unified: Add tuple_pattern 2026-05-13 10:35:26 +02:00
Asger F
a966dff76e Unified: Add more patterns and some fixes to the AST 2026-05-13 10:35:24 +02:00
Asger F
6b58482dfb Yeast: Fix text associated with synthesized nodes 2026-05-13 10:35:22 +02:00
Asger F
2307839050 Yeast: Change how patterns with repetition are parsed 2026-05-13 10:35:21 +02:00
Asger F
92838011dd Unified: Add some more AST nodes and rules 2026-05-13 10:35:19 +02:00
Asger F
5772ee4d9b YEAST: add NodeRef type, YeastDisplay trait, and source text storage
Introduce NodeRef as a typed wrapper around node arena IDs. Captures in
desugaring rules are now bound as NodeRef instead of raw usize, which
prevents accidental misuse and enables source-text-aware rendering.

Add the YeastDisplay trait as an alternative to Display: its
yeast_to_string method receives the Ast, allowing NodeRef to resolve to
the captured node's source text instead of printing a numeric ID.

Store the original source bytes in the Ast so that NodeContent::Range
values (from synthesized literal nodes) can be resolved back to text.

Update yeast-macros to emit NodeRef-typed capture bindings and use
Into::<usize>::into where raw IDs are needed. The #{expr} template
syntax now uses YeastDisplay instead of Display.

The effect is visible in the corpus tests: operator nodes now correctly
render as e.g. operator "+" instead of operator "3".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:17 +02:00
Asger F
72b683d63c Unified: Add Swift corpus tests
Add corpus test cases for Swift covering closures, collections, control
flow, functions, literals, loops, operators, optionals/errors, types,
and variables. Update existing desugar.txt with raw parse sections.

Note: operator nodes currently render their node ID instead of the actual
operator text (e.g. operator "3" instead of operator "+"). This will be
fixed in the next commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:16 +02:00
Asger F
8a2a48d2dd Unified extractor: add AST schema, swift translation rules, and corpus framework
Add ast_types.yml defining the unified output AST schema with supertypes
(expr, stmt, condition, pattern) and named nodes (top_level, binary_expr,
name_expr, etc.).

Rewrite swift translation rules to map from tree-sitter Swift grammar to
the unified AST, using one-shot phase rules.

Update the generator to use the output AST schema for dbscheme/QL
generation, and normalize the extraction table prefix to 'unified'.

Improve the corpus test framework to include raw tree-sitter parse output,
type-error checking against the output schema, and better failure
reporting.

Regenerate Ast.qll, unified.dbscheme, and update BasicTest accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:14 +02:00
Asger F
5d0cb9e805 YEAST: fix one-shot rules for unnamed nodes and self-captures
One-shot desugaring rules now skip unnamed nodes (punctuation, keywords,
etc.) since rules are intended to target named nodes only.

Also prevent infinite recursion when a capture refers to the root node of
the matched tree (e.g. an @_ capture on the pattern root).

Additionally fix the swift.rs add_phase call to match the updated 3-arg
signature introduced by the one-shot phase kind commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:12 +02:00
Asger F
f668b99d6d Unified: Add support for tree-sitter-style corpus tests
This adds tests consisting of source code and a printout of its rewritten AST.
2026-05-13 10:35:02 +02:00
Taus
52d72836f9 unified: Fix multiline_comment issue
This named node (which is in fact emitted by the scanner as an
`external`) was appearing as a child of `class_body` because of inlining
via `_class_member_separator`. This, in itself, appears to be somewhat
of a hack, to handle cases where a multiline comment signals the end of
a class member.

To fix this, we make the external node _unnamed_, but keep the `extras`
node _named_ (so we can still extract it from the parse tree), and we
add a new rule `multiline_comment` that mediates between the two. That
way, the use inside `_class_member_separator` can use the unnamed
variant, and no node is pushed into $children.
2026-05-12 15:59:18 +00:00
Taus
eb480d1de4 unified: Make parenthesized_type named
I'm not entirely happy about this solution, but it seemed to be the most
straightforward way of avoiding various kinds of token bleeding.
2026-05-12 15:38:29 +00:00
Taus
2eee2e50dc unified: clean up patterns
Mostly by materialising a bunch of (useful) intermediate nodes.
2026-05-12 15:23:26 +00:00
Taus
2010844b1e unified: Add fields to property_declaration
Not entirely sure about the `binding?` field on `pattern`, but it looks
like that might actually be useful.
2026-05-12 15:14:35 +00:00
Taus
406a02fa49 unified: Add fields to switch_entry
Of note: this involved un-inlining where_clause.
2026-05-12 15:09:02 +00:00
Taus
6e5e650b42 unified: Add fields for macro_declaration 2026-05-12 15:03:29 +00:00
Taus
eba9f35673 unified: Get rid of $children* on key_path_expression
Doing this involved materialising a lot of previously anonymous nodes,
and I'm not entirely sure it's the best solution, but the node types
look decent enough.
2026-05-12 15:01:10 +00:00
Taus
e1a0e204b1 unified: Promote enum_type_parameter to named and add fields 2026-05-12 14:55:43 +00:00
Taus
5e14a7574e unified: make compilation_condition named and add fields 2026-05-12 14:55:42 +00:00
Taus
6ff404a6d0 unified: More miscellaneous field additions 2026-05-12 14:50:01 +00:00
Taus
9902beddec unified: add proper fields for availability_condition 2026-05-12 14:47:58 +00:00
Taus
e6eac3784a unified: Consolidate fields in if_let_binding 2026-05-12 14:43:13 +00:00
Taus
5784ef22f6 unified: Unify more fields
Not entirely happy about the mixed nature of the `kind` filed (having
both tokens and the named node `throw_keyword` in there), but that's a
problem for a different time.
2026-05-12 14:40:17 +00:00
Taus
bc96ae6e47 unified: Add lambda and arguments fields 2026-05-12 14:29:23 +00:00
Taus
15d84b3e53 unified: More $children fixes
Some nodes with a single child (arguably redundant to do, but I think
it's nice to have the types be consistent), and also an instance of
ensuring that all branches of a `choice` expose consistent field names.
2026-05-12 14:15:36 +00:00
Taus
0499932ba0 unified: Fix fields in await_expression
This required a change in a different place, due to aliasing.
2026-05-12 14:10:38 +00:00
Taus
732cc7bee0 unified: Add fields to inheritance specifiers and calls 2026-05-12 14:07:58 +00:00
Taus
d6ef467fba unified: Add more fields
A lot of changes, but for the most part these are just adding named
fields in places where they make sense.

After this, there are still ~20 instances of unnamed children appearing.
2026-05-12 13:59:56 +00:00
Taus
c75d819a92 unified: Add effect field
I ended up also aliasing `_async_keyword` to a named node to make it
more consistent with the other node kinds that can be in this field (as
it would be awkward to have two named types and a token here).

Elsewhere in the node types, we'll still have `async?: "async"`, and I
think that's okay.
2026-05-12 13:46:25 +00:00
Taus
ff5c0b40f1 unified: add supertypes for various kinds of declarations
Hides a bunch of huge unions under (hopefully) sensible supertypes.
2026-05-12 12:57:26 +00:00
Taus
9dddd93460 unified: add field declarations for statements and members
Part 1 of N of "getting rid of $children" in node-types.yml

Note: in one of the cases the affected node still has the $children
field present. This is because there's some weirdness about recording
multiline comments as class member separators that I did not want to
figure out how to address right now.
2026-05-12 12:57:26 +00:00
Taus
2608db9fd9 unified: Prevent field bleed-through from _if_let_binding
Same procedure as before -- we change the anonymous node to a named
node, and the problem magically goes away.
2026-05-12 12:57:25 +00:00
Taus
31386f566c unified: drop element field on _parenthesized_type
Same pattern we've seen many times before: a field on an anonymous node
gets attached to the parent node instead.

I'm not 100% sure this is the right solution, but it seemed wrong to
just make `_parenthesized_type` named instead (we don't usually name
parentheticals). At the very least, this cleans up the spurious
navigation_expression.element and tuple_type_item.element fields.
2026-05-12 12:57:25 +00:00
Taus
994b27bdbd unified: convert _type into a named rule
Because `_type` was anonymous, its body was inlined in all of the places
it appeared. Because this body contained a `name` field, this field was
_also_ inlined. This caused a bunch of nodes to have spurious `name`
fields, and for some of them (that already had such a field) it caused
that field have multiplicity greater than one.

To fix this, we make the `_type` node named, which prevents the errant
field from escaping.
2026-05-12 12:57:25 +00:00
Taus
8b977ef8e1 unified: Get rid of some "." bleed
Adds a new type `nested_type_identifier`, which contains the
choice-branch that previously allowed those tokens to bleed through into
the closest parent field.
2026-05-12 12:57:25 +00:00
Taus
91a46f0340 unified: stop "!" bleeding through
You know the drill. We just make an anonymous node named instead. In
this case, however, we have to be a bit more clever about how to rewrite
it. We turn the sequence of a type followed by an optional ! into a
_choice_ between mere type or type followed by bang (the latter being
our new named node).
2026-05-12 12:57:24 +00:00