mirror of
https://github.com/github/codeql.git
synced 2026-05-14 11:19:27 +02:00
yeast: Add YAML node-types format and converter
Human-friendly YAML alternative to tree-sitter node-types.json with three sections: supertypes, named, unnamed. Supports bidirectional conversion and building Schema objects from YAML. Includes CLI binary (node_types_yaml) and documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
241
shared/yeast/doc/node-types-yaml.md
Normal file
@@ -0,0 +1,241 @@
# YAML Node Types Format

The YAML node-types format is a human-friendly alternative to tree-sitter's
`node-types.json`. It can be converted to and from JSON using the
`node_types_yaml` tool.

## Overview

A YAML node-types file has three top-level sections:

```yaml
supertypes:
  # Abstract union types

named:
  # Concrete AST nodes and leaf tokens

unnamed:
  # Punctuation and keyword tokens
```

All three sections are optional. If omitted, they default to empty.

## Supertypes

Supertypes are abstract groupings of node types (unions). Each supertype maps
to a list of its members:

```yaml
supertypes:
  _expression:
    - assignment
    - binary
    - identifier
    - call
```

This corresponds to the following JSON:

```json
{
  "type": "_expression",
  "named": true,
  "subtypes": [
    { "type": "assignment", "named": true },
    { "type": "binary", "named": true },
    { "type": "identifier", "named": true },
    { "type": "call", "named": true }
  ]
}
```

Members are resolved as named or unnamed using the
[type reference rules](#type-references) described below.

## Named nodes

Named nodes are concrete AST node types. Each entry is a node kind mapping to
its fields. A node with no fields (a leaf token like `identifier`) uses an
empty value:

```yaml
named:
  identifier:
  constant:
```

```json
{"type": "identifier", "named": true, "fields": {}},
{"type": "constant", "named": true, "fields": {}}
```

### Fields

Each field has a name, a multiplicity suffix, and a list of allowed types.

| Suffix | Meaning      | JSON `multiple` | JSON `required` |
| ------ | ------------ | --------------- | --------------- |
| (none) | exactly one  | `false`         | `true`          |
| `?`    | zero or one  | `false`         | `false`         |
| `+`    | one or more  | `true`          | `true`          |
| `*`    | zero or more | `true`          | `false`         |

Example:

```yaml
named:
  assignment:
    left: _lhs
    right: _expression
```

```json
{
  "type": "assignment",
  "named": true,
  "fields": {
    "left": {
      "multiple": false,
      "required": true,
      "types": [{ "type": "_lhs", "named": true }]
    },
    "right": {
      "multiple": false,
      "required": true,
      "types": [{ "type": "_expression", "named": true }]
    }
  }
}
```

A field with multiple allowed types uses a list:

```yaml
named:
  binary:
    left: [_expression, _simple_numeric]
    operator: ["!=", "+", "&&"]
    right: _expression
```

A singleton list can be written as a bare value (as shown with `right` above).

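The suffix table in this section can be read as a small decoding function. The following is a minimal sketch using only the standard library; `Multiplicity` and `split_suffix` are hypothetical names chosen for illustration, and the sketch assumes that only the final character of a key acts as a suffix.

```rust
/// Multiplicity decoded from a field-name suffix (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Multiplicity {
    multiple: bool,
    required: bool,
}

/// Split a raw YAML key such as `right?` into its base name and multiplicity.
/// Assumption: only the last character is interpreted as a suffix.
fn split_suffix(raw: &str) -> (&str, Multiplicity) {
    let (base, multiple, required) = if let Some(base) = raw.strip_suffix('?') {
        (base, false, false) // zero or one
    } else if let Some(base) = raw.strip_suffix('*') {
        (base, true, false) // zero or more
    } else if let Some(base) = raw.strip_suffix('+') {
        (base, true, true) // one or more
    } else {
        (raw, false, true) // no suffix: exactly one
    };
    (base, Multiplicity { multiple, required })
}
```

For instance, under these assumptions `split_suffix("right?")` yields the base name `right` with both `multiple` and `required` set to `false`, matching the `?` row of the table.
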
### Unnamed children

Unnamed children (nodes that appear as children without a field name) are
specified using the special `$children` field name, with the same suffixes:

```yaml
named:
  argument_list:
    $children*: [_expression, block_argument, splat_argument]
```

```json
{
  "type": "argument_list",
  "named": true,
  "fields": {},
  "children": {
    "multiple": true,
    "required": false,
    "types": [
      { "type": "_expression", "named": true },
      { "type": "block_argument", "named": true },
      { "type": "splat_argument", "named": true }
    ]
  }
}
```

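To illustrate how `$children` ends up outside the `fields` object, here is a small sketch that partitions the keys of one node entry. `RawEntry` and `split_children` are hypothetical names, type references are simplified to plain strings, and the multiplicity suffix is discarded to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of one named node's YAML entry:
/// raw key (possibly with a suffix) -> allowed type names.
type RawEntry = BTreeMap<String, Vec<String>>;

/// Separate the `$children` key from ordinary fields.
/// Returns (fields keyed by base name, types listed under `$children` if present).
fn split_children(entry: &RawEntry) -> (BTreeMap<String, Vec<String>>, Option<Vec<String>>) {
    let mut fields = BTreeMap::new();
    let mut children = None;
    for (raw_key, types) in entry {
        // Strip any trailing multiplicity marker before comparing names.
        let base = raw_key.trim_end_matches(|c: char| matches!(c, '?' | '*' | '+'));
        if base == "$children" {
            children = Some(types.clone());
        } else {
            fields.insert(base.to_string(), types.clone());
        }
    }
    (fields, children)
}
```

With the `argument_list` entry above, this sketch would return an empty field map and the three child types, which corresponds to `"fields": {}` plus a separate `"children"` object in the JSON.
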
## Unnamed tokens

Unnamed tokens are punctuation, operators, and keywords that appear in the
parse tree but don't have their own AST node type. They are listed as simple
strings:

```yaml
unnamed:
  - "="
  - "end"
  - "+"
  - "&&"
```

```json
{"type": "=", "named": false},
{"type": "end", "named": false},
{"type": "+", "named": false},
{"type": "&&", "named": false}
```

When converting to YAML, unnamed tokens are always wrapped in quotes for
visual clarity. This is purely cosmetic: YAML treats `end` and `"end"` as
the same string.

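The quoting step can be sketched as a single escaping function. This mirrors the approach of the `force_quote` helper in the converter source from this commit; `quote_token` is a hypothetical name used here so the sketch stands alone.

```rust
/// Wrap a token in double quotes, escaping backslashes first and then
/// embedded double quotes, so the result is a valid double-quoted YAML scalar.
fn quote_token(token: &str) -> String {
    let escaped = token.replace('\\', "\\\\").replace('"', "\\\"");
    format!("\"{escaped}\"")
}
```

Backslashes are escaped before quotes so that the backslash introduced for an embedded quote is not escaped a second time.
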
## Type references

When a type name appears in a field's type list or a supertype's member list,
it needs to be resolved as either named or unnamed. The rules are:

1. If the name only appears in `named` or `supertypes`, it is **named**.
2. If the name only appears in `unnamed`, it is **unnamed**.
3. If the name appears in both, it defaults to **named**.
4. To explicitly reference an unnamed type in the ambiguous case, use the
   map form:

   ```yaml
   named:
     example:
       field: { unnamed: foo }
   ```

In practice, ambiguity is rare: names like `end`, `+`, `if` are almost
always only unnamed, while names like `identifier`, `assignment` are only
named.

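The four rules above can be sketched as one lookup over two sets. `TypeRef` and `resolve` here are simplified, hypothetical stand-ins rather than the crate's own types. The sketch also treats a name found in neither set as named, which is the fallback the converter source in this commit uses and is not one of the listed rules.

```rust
use std::collections::BTreeSet;

/// How a type reference was written in the YAML file (hypothetical, simplified).
enum TypeRef<'a> {
    /// A bare string such as `identifier` or `"+"`.
    Plain(&'a str),
    /// The map form `{ unnamed: foo }`.
    ExplicitUnnamed(&'a str),
}

/// Resolve a reference to (type name, is_named).
/// `named` is assumed to hold the keys of both `named` and `supertypes`.
fn resolve<'a>(
    reference: &TypeRef<'a>,
    named: &BTreeSet<String>,
    unnamed: &BTreeSet<String>,
) -> (&'a str, bool) {
    match *reference {
        // Rule 4: the map form always means unnamed.
        TypeRef::ExplicitUnnamed(name) => (name, false),
        TypeRef::Plain(name) => {
            // Rule 2: unnamed only if the name is listed exclusively there.
            // Rules 1 and 3 (and unknown names) fall through to named.
            let only_unnamed = unnamed.contains(name) && !named.contains(name);
            (name, !only_unnamed)
        }
    }
}
```

Keeping the explicit form as its own variant means the ambiguous case never needs extra context: a plain name is resolved purely from the two sets.
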
## Complete example

```yaml
supertypes:
  _expression:
    - assignment
    - binary
    - identifier

named:
  assignment:
    left: _expression
    right?: _expression
  binary:
    left: [_expression, _simple_numeric]
    operator: ["!=", "+"]
    right: _expression
  argument_list:
    $children*: [_expression, block_argument]
  identifier:
  constant:

unnamed:
  - "!="
  - "+"
  - "="
  - "end"
```

## CLI usage

Convert YAML to JSON:

```
node_types_yaml input.yaml > node-types.json
```

Convert JSON to YAML:

```
node_types_yaml --from-json node-types.json > node-types.yaml
```

Both commands also accept input from stdin if no file argument is given.
51
shared/yeast/src/bin/node_types_yaml.rs
Normal file
@@ -0,0 +1,51 @@
use clap::Parser;
use std::io::Read;

#[derive(Parser)]
#[clap(
    name = "node-types-yaml",
    about = "Convert between YAML and JSON node-types formats"
)]
struct Cli {
    /// Input file (reads from stdin if not provided)
    input: Option<String>,

    /// Convert from JSON to YAML (default is YAML to JSON)
    #[arg(long)]
    from_json: bool,
}

fn main() {
    let args = Cli::parse();

    let input = match &args.input {
        Some(path) => std::fs::read_to_string(path).unwrap_or_else(|e| {
            eprintln!("Error reading {path}: {e}");
            std::process::exit(1);
        }),
        None => {
            let mut buf = String::new();
            std::io::stdin()
                .read_to_string(&mut buf)
                .unwrap_or_else(|e| {
                    eprintln!("Error reading stdin: {e}");
                    std::process::exit(1);
                });
            buf
        }
    };

    let result = if args.from_json {
        yeast::node_types_yaml::convert_from_json(&input)
    } else {
        yeast::node_types_yaml::convert(&input)
    };

    match result {
        Ok(output) => print!("{output}"),
        Err(e) => {
            eprintln!("Error: {e}");
            std::process::exit(1);
        }
    }
}
722
shared/yeast/src/node_types_yaml.rs
Normal file
@@ -0,0 +1,722 @@
/// Converts a YAML node-types file to the tree-sitter `node-types.json` format.
///
/// # YAML format
///
/// ```yaml
/// supertypes:
///   _expression:
///     - assignment
///     - binary
///
/// named:
///   assignment:
///     left: _lhs
///     right: _expression
///   identifier:
///
/// unnamed:
///   - "+"
///   - "end"
/// ```
///
/// See the crate-level docs for the full format specification.
use std::collections::{BTreeMap, BTreeSet};
use std::fmt::Write;

use serde::Deserialize;
use serde_json::json;

/// Top-level YAML structure.
#[derive(Deserialize, Default)]
struct YamlNodeTypes {
    #[serde(default)]
    supertypes: BTreeMap<String, Vec<TypeRef>>,
    #[serde(default)]
    named: BTreeMap<String, Option<BTreeMap<String, TypeRefOrList>>>,
    #[serde(default)]
    unnamed: Vec<String>,
}

/// A reference to a node type. Can be:
/// - a plain string (resolved by looking up named vs unnamed)
/// - a map `{unnamed: "name"}` to force unnamed interpretation
#[derive(Deserialize, Debug, Clone)]
#[serde(untagged)]
enum TypeRef {
    Name(String),
    Explicit { unnamed: String },
}

/// A field value: either a single type ref or a list of them.
#[derive(Deserialize, Debug, Clone)]
#[serde(untagged)]
enum TypeRefOrList {
    Single(TypeRef),
    List(Vec<TypeRef>),
}

impl TypeRefOrList {
    fn into_vec(self) -> Vec<TypeRef> {
        match self {
            TypeRefOrList::Single(t) => vec![t],
            TypeRefOrList::List(v) => v,
        }
    }
}

/// Parsed field name: base name + multiplicity markers.
struct FieldSpec {
    name: Option<String>, // None for $children
    multiple: bool,
    required: bool,
}

fn parse_field_name(raw: &str) -> FieldSpec {
    let is_children =
        raw == "$children" || raw == "$children?" || raw == "$children*" || raw == "$children+";

    let suffix = raw.chars().last().filter(|c| matches!(c, '?' | '*' | '+'));

    let (multiple, required) = match suffix {
        Some('?') => (false, false),
        Some('*') => (true, false),
        Some('+') => (true, true),
        _ => (false, true), // bare field name = required, single
    };

    let name = if is_children {
        None
    } else {
        let base = raw.trim_end_matches(['?', '*', '+']);
        Some(base.to_string())
    };

    FieldSpec {
        name,
        multiple,
        required,
    }
}

/// Resolve a TypeRef to a (type, named) pair, given the sets of known named
/// and unnamed types.
fn resolve_type_ref(
    type_ref: &TypeRef,
    named_types: &BTreeSet<String>,
    unnamed_types: &BTreeSet<String>,
) -> serde_json::Value {
    match type_ref {
        TypeRef::Explicit { unnamed } => {
            json!({"type": unnamed, "named": false})
        }
        TypeRef::Name(name) => {
            let is_named = named_types.contains(name);
            let is_unnamed = unnamed_types.contains(name);

            if is_named && is_unnamed {
                // Ambiguous: default to named
                json!({"type": name, "named": true})
            } else if is_unnamed {
                json!({"type": name, "named": false})
            } else {
                // Named, or unknown (assume named)
                json!({"type": name, "named": true})
            }
        }
    }
}

/// Convert YAML string to node-types JSON string.
pub fn convert(yaml_input: &str) -> Result<String, String> {
    let yaml: YamlNodeTypes =
        serde_yaml::from_str(yaml_input).map_err(|e| format!("Failed to parse YAML: {e}"))?;

    // Build the sets of known named and unnamed types for resolution.
    let mut named_types = BTreeSet::new();
    for name in yaml.supertypes.keys() {
        named_types.insert(name.clone());
    }
    for name in yaml.named.keys() {
        named_types.insert(name.clone());
    }
    let unnamed_types: BTreeSet<String> = yaml.unnamed.iter().cloned().collect();

    let mut output = Vec::new();

    // 1. Supertypes
    for (name, members) in &yaml.supertypes {
        let subtypes: Vec<_> = members
            .iter()
            .map(|m| resolve_type_ref(m, &named_types, &unnamed_types))
            .collect();
        output.push(json!({
            "type": name,
            "named": true,
            "subtypes": subtypes,
        }));
    }

    // 2. Named nodes
    for (name, fields_opt) in &yaml.named {
        let fields_map = match fields_opt {
            None => {
                // Leaf token: no fields, no children, no subtypes
                output.push(json!({
                    "type": name,
                    "named": true,
                    "fields": {},
                }));
                continue;
            }
            Some(m) if m.is_empty() => {
                output.push(json!({
                    "type": name,
                    "named": true,
                    "fields": {},
                }));
                continue;
            }
            Some(m) => m,
        };

        let mut json_fields = serde_json::Map::new();
        let mut json_children: Option<serde_json::Value> = None;

        for (raw_field_name, type_refs) in fields_map {
            let spec = parse_field_name(raw_field_name);
            // Clone the type refs: `into_vec` consumes its receiver, but
            // `fields_map` is only borrowed here.
            let types: Vec<_> = type_refs
                .clone()
                .into_vec()
                .iter()
                .map(|t| resolve_type_ref(t, &named_types, &unnamed_types))
                .collect();

            let field_info = json!({
                "multiple": spec.multiple,
                "required": spec.required,
                "types": types,
            });

            match spec.name {
                // `$children` has no field name and becomes the "children" entry.
                None => json_children = Some(field_info),
                Some(field_name) => {
                    json_fields.insert(field_name, field_info);
                }
            }
        }

        let mut entry = json!({
            "type": name,
            "named": true,
            "fields": json_fields,
        });

        if let Some(children) = json_children {
            entry
                .as_object_mut()
                .unwrap()
                .insert("children".to_string(), children);
        }

        output.push(entry);
    }

    // 3. Unnamed tokens
    for name in &yaml.unnamed {
        output.push(json!({
            "type": name,
            "named": false,
        }));
    }

    serde_json::to_string_pretty(&output).map_err(|e| format!("Failed to serialize JSON: {e}"))
}

/// Build a Schema from a YAML node-types string.
/// Registers all node kinds and field names found in the YAML.
pub fn schema_from_yaml(yaml_input: &str) -> Result<crate::schema::Schema, String> {
    let yaml: YamlNodeTypes =
        serde_yaml::from_str(yaml_input).map_err(|e| format!("Failed to parse YAML: {e}"))?;

    let mut schema = crate::schema::Schema::new();

    // Register all supertypes as node kinds
    for name in yaml.supertypes.keys() {
        schema.register_kind(name);
    }

    // Register named node kinds and their fields
    for (name, fields_opt) in &yaml.named {
        schema.register_kind(name);
        if let Some(fields) = fields_opt {
            for raw_field_name in fields.keys() {
                let spec = parse_field_name(raw_field_name);
                if let Some(field_name) = &spec.name {
                    schema.register_field(field_name);
                }
            }
        }
    }

    // Register unnamed tokens as node kinds
    for name in &yaml.unnamed {
        schema.register_unnamed_kind(name);
    }

    Ok(schema)
}

/// Build a Schema from a YAML string, extending a tree-sitter Language.
/// The Schema inherits all field/kind names from the Language, plus any
/// additional ones defined in the YAML.
pub fn schema_from_yaml_with_language(
    yaml_input: &str,
    language: &tree_sitter::Language,
) -> Result<crate::schema::Schema, String> {
    let yaml: YamlNodeTypes =
        serde_yaml::from_str(yaml_input).map_err(|e| format!("Failed to parse YAML: {e}"))?;

    let mut schema = crate::schema::Schema::from_language(language);

    // Register supertypes
    for name in yaml.supertypes.keys() {
        schema.register_kind(name);
    }

    // Register named node kinds and their fields
    for (name, fields_opt) in &yaml.named {
        schema.register_kind(name);
        if let Some(fields) = fields_opt {
            for raw_field_name in fields.keys() {
                let spec = parse_field_name(raw_field_name);
                if let Some(field_name) = &spec.name {
                    schema.register_field(field_name);
                }
            }
        }
    }

    // Register unnamed tokens
    for name in &yaml.unnamed {
        schema.register_unnamed_kind(name);
    }

    Ok(schema)
}

// ---------------------------------------------------------------------------
// JSON → YAML conversion
// ---------------------------------------------------------------------------

/// JSON node-types structures (mirrors tree-sitter's format).
#[derive(Deserialize)]
struct JsonNodeInfo {
    #[serde(rename = "type")]
    kind: String,
    named: bool,
    #[serde(default)]
    fields: BTreeMap<String, JsonFieldInfo>,
    children: Option<JsonFieldInfo>,
    #[serde(default)]
    subtypes: Vec<JsonNodeType>,
}

#[derive(Deserialize)]
struct JsonNodeType {
    #[serde(rename = "type")]
    kind: String,
    named: bool,
}

#[derive(Deserialize)]
struct JsonFieldInfo {
    multiple: bool,
    required: bool,
    types: Vec<JsonNodeType>,
}

/// Convert a tree-sitter node-types.json string to the YAML format.
pub fn convert_from_json(json_input: &str) -> Result<String, String> {
    let nodes: Vec<JsonNodeInfo> =
        serde_json::from_str(json_input).map_err(|e| format!("Failed to parse JSON: {e}"))?;

    // Collect all named and unnamed types for disambiguation decisions.
    let mut all_named: BTreeSet<String> = BTreeSet::new();
    let mut all_unnamed: BTreeSet<String> = BTreeSet::new();
    for node in &nodes {
        if node.named {
            all_named.insert(node.kind.clone());
        } else {
            all_unnamed.insert(node.kind.clone());
        }
    }

    let mut supertypes: BTreeMap<String, Vec<JsonNodeType>> = BTreeMap::new();
    let mut named: BTreeMap<String, Option<BTreeMap<String, JsonFieldInfo>>> = BTreeMap::new();
    let mut unnamed: Vec<String> = Vec::new();

    for node in nodes {
        if !node.named {
            unnamed.push(node.kind);
            continue;
        }

        if !node.subtypes.is_empty() {
            supertypes.insert(node.kind, node.subtypes);
            continue;
        }

        if node.fields.is_empty() && node.children.is_none() {
            // Leaf token
            named.insert(node.kind, None);
        } else {
            let mut fields = BTreeMap::new();
            for (name, info) in node.fields {
                fields.insert(name, info);
            }
            if let Some(children) = node.children {
                fields.insert("$children".to_string(), children);
            }
            named.insert(node.kind, Some(fields));
        }
    }

    // Now emit YAML
    let mut out = String::new();

    // Supertypes
    if !supertypes.is_empty() {
        writeln!(out, "supertypes:").unwrap();
        for (name, members) in &supertypes {
            writeln!(out, "  {name}:").unwrap();
            for member in members {
                let ref_str = format_type_ref(&member.kind, member.named, &all_named, &all_unnamed);
                writeln!(out, "    - {ref_str}").unwrap();
            }
        }
        writeln!(out).unwrap();
    }

    // Named
    if !named.is_empty() {
        writeln!(out, "named:").unwrap();
        for (name, fields_opt) in &named {
            match fields_opt {
                None => {
                    writeln!(out, "  {name}:").unwrap();
                }
                Some(fields) => {
                    writeln!(out, "  {name}:").unwrap();
                    for (field_name, info) in fields {
                        let suffix = field_suffix(info.multiple, info.required);
                        let yaml_name = if field_name == "$children" {
                            format!("$children{suffix}")
                        } else {
                            format!("{field_name}{suffix}")
                        };

                        let type_refs: Vec<String> = info
                            .types
                            .iter()
                            .map(|t| format_type_ref(&t.kind, t.named, &all_named, &all_unnamed))
                            .collect();

                        if type_refs.len() == 1 {
                            writeln!(out, "    {yaml_name}: {}", type_refs[0]).unwrap();
                        } else {
                            let list = type_refs
                                .iter()
                                .map(|s| s.as_str())
                                .collect::<Vec<_>>()
                                .join(", ");
                            writeln!(out, "    {yaml_name}: [{list}]").unwrap();
                        }
                    }
                }
            }
        }
        writeln!(out).unwrap();
    }

    // Unnamed
    if !unnamed.is_empty() {
        writeln!(out, "unnamed:").unwrap();
        for name in &unnamed {
            writeln!(out, "  - {}", force_quote(name)).unwrap();
        }
    }

    Ok(out)
}

fn field_suffix(multiple: bool, required: bool) -> &'static str {
    match (multiple, required) {
        (false, true) => "",
        (false, false) => "?",
        (true, true) => "+",
        (true, false) => "*",
    }
}

/// Format a type reference for YAML output. Uses the disambiguation rule:
/// plain string if unambiguous, `{unnamed: name}` if the name exists as both
/// named and unnamed and we need the unnamed interpretation.
fn format_type_ref(
    kind: &str,
    named: bool,
    all_named: &BTreeSet<String>,
    _all_unnamed: &BTreeSet<String>,
) -> String {
    if named {
        quote_yaml(kind)
    } else {
        let is_also_named = all_named.contains(kind);
        if is_also_named {
            format!("{{unnamed: {}}}", force_quote(kind))
        } else {
            force_quote(kind)
        }
    }
}

/// Always wrap in double quotes. Used for unnamed node references so they're
/// visually distinct from named ones; YAML treats both forms as equivalent strings.
fn force_quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Quote a YAML string value if it contains special characters or could be
/// misinterpreted.
fn quote_yaml(s: &str) -> String {
    let needs_quoting = s.is_empty()
        || s.contains(|c: char| {
            matches!(
                c,
                ':' | '{'
                    | '}'
                    | '['
                    | ']'
                    | ','
                    | '&'
                    | '*'
                    | '#'
                    | '?'
                    | '|'
                    | '-'
                    | '<'
                    | '>'
                    | '='
                    | '!'
                    | '%'
                    | '@'
                    | '`'
                    | '"'
                    | '\''
            )
        })
        || s.starts_with(' ')
        || s.ends_with(' ')
        || s == "true"
        || s == "false"
        || s == "null"
        || s == "yes"
        || s == "no"
        || s.parse::<f64>().is_ok();

    if needs_quoting {
        format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
    } else {
        s.to_string()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_conversion() {
        let yaml = r#"
supertypes:
  _expression:
    - assignment
    - binary

named:
  assignment:
    left: _lhs
    right: _expression
  binary:
    left: [_expression, _simple_numeric]
    operator: ["!=", "+"]
    right: _expression
  argument_list:
    $children*: [_expression, block_argument]
  identifier:

unnamed:
  - "!="
  - "+"
  - "end"
"#;

        let json_str = convert(yaml).unwrap();
        let result: Vec<serde_json::Value> = serde_json::from_str(&json_str).unwrap();

        // Check supertype
        let expr = &result[0];
        assert_eq!(expr["type"], "_expression");
        assert_eq!(expr["named"], true);
        assert_eq!(expr["subtypes"].as_array().unwrap().len(), 2);

        // Check assignment
        let assign = result.iter().find(|n| n["type"] == "assignment").unwrap();
        assert_eq!(assign["fields"]["left"]["required"], true);
        assert_eq!(assign["fields"]["left"]["multiple"], false);
        assert_eq!(assign["fields"]["left"]["types"][0]["type"], "_lhs");
        assert_eq!(assign["fields"]["left"]["types"][0]["named"], true);

        // Check binary.operator: "!=" and "+" should resolve to unnamed
        let binary = result.iter().find(|n| n["type"] == "binary").unwrap();
        let op_types = binary["fields"]["operator"]["types"].as_array().unwrap();
        assert_eq!(op_types[0]["type"], "!=");
        assert_eq!(op_types[0]["named"], false);
        assert_eq!(op_types[1]["type"], "+");
        assert_eq!(op_types[1]["named"], false);

        // Check argument_list has children, not a field
        let arg_list = result
            .iter()
            .find(|n| n["type"] == "argument_list")
            .unwrap();
        assert!(arg_list.get("children").is_some());
        assert_eq!(arg_list["children"]["multiple"], true);
        assert_eq!(arg_list["children"]["required"], false);

        // Check identifier is a leaf
        let ident = result.iter().find(|n| n["type"] == "identifier").unwrap();
        assert_eq!(ident["fields"].as_object().unwrap().len(), 0);

        // Check unnamed tokens
        let end = result.iter().find(|n| n["type"] == "end").unwrap();
        assert_eq!(end["named"], false);
    }

    #[test]
    fn test_explicit_unnamed_disambiguation() {
        let yaml = r#"
named:
  foo:
    field: [{unnamed: bar}]

unnamed:
  - bar
"#;

        let json_str = convert(yaml).unwrap();
        let result: Vec<serde_json::Value> = serde_json::from_str(&json_str).unwrap();
        let foo = result.iter().find(|n| n["type"] == "foo").unwrap();
        assert_eq!(foo["fields"]["field"]["types"][0]["named"], false);
    }

    #[test]
    fn test_field_suffixes() {
        let yaml = r#"
named:
  test_node:
    required_single: foo
    optional_single?: foo
    required_multiple+: foo
    optional_multiple*: foo
"#;

        let json_str = convert(yaml).unwrap();
        let result: Vec<serde_json::Value> = serde_json::from_str(&json_str).unwrap();
        let node = result.iter().find(|n| n["type"] == "test_node").unwrap();
        let fields = node["fields"].as_object().unwrap();

        assert_eq!(fields["required_single"]["required"], true);
        assert_eq!(fields["required_single"]["multiple"], false);

        assert_eq!(fields["optional_single"]["required"], false);
        assert_eq!(fields["optional_single"]["multiple"], false);

        assert_eq!(fields["required_multiple"]["required"], true);
        assert_eq!(fields["required_multiple"]["multiple"], true);

        assert_eq!(fields["optional_multiple"]["required"], false);
        assert_eq!(fields["optional_multiple"]["multiple"], true);
    }

    #[test]
    fn test_json_to_yaml() {
        let json = r#"[
            {"type": "_expression", "named": true, "subtypes": [
                {"type": "assignment", "named": true},
                {"type": "identifier", "named": true}
            ]},
            {"type": "assignment", "named": true, "fields": {
                "left": {"multiple": false, "required": true, "types": [
                    {"type": "_expression", "named": true}
                ]},
                "right": {"multiple": false, "required": false, "types": [
                    {"type": "_expression", "named": true}
                ]}
            }, "children": {
                "multiple": true, "required": false, "types": [
                    {"type": "identifier", "named": true}
                ]
            }},
            {"type": "identifier", "named": true, "fields": {}},
            {"type": "=", "named": false},
            {"type": "end", "named": false}
        ]"#;

        let yaml = convert_from_json(json).unwrap();

        // Verify key structures are present
        assert!(yaml.contains("supertypes:"));
        assert!(yaml.contains("_expression:"));
        assert!(yaml.contains("named:"));
        assert!(yaml.contains("assignment:"));
        assert!(yaml.contains("left:"));
        assert!(yaml.contains("right?:"));
        assert!(yaml.contains("$children*:"));
        assert!(yaml.contains("identifier:"));
        assert!(yaml.contains("unnamed:"));
        assert!(yaml.contains("\"=\""));
        assert!(yaml.contains("end"));
    }

    #[test]
    fn test_round_trip() {
        let yaml_input = r#"
supertypes:
  _expression:
    - assignment
    - identifier

named:
  assignment:
    left: _expression
    right?: _expression
    $children*: identifier
  identifier:

unnamed:
  - "="
  - end
"#;

        // YAML → JSON → YAML
        let json = convert(yaml_input).unwrap();
        let yaml_output = convert_from_json(&json).unwrap();
        // YAML → JSON again (should be identical)
        let json2 = convert(&yaml_output).unwrap();

        let v1: serde_json::Value = serde_json::from_str(&json).unwrap();
        let v2: serde_json::Value = serde_json::from_str(&json2).unwrap();
        assert_eq!(v1, v2);
    }
}