codeql

mirror of https://github.com/github/codeql.git synced 2026-07-04 19:15:29 +02:00

Author	SHA1	Message	Date
Taus	807bb51df7	yeast: Unify `Node::kind()` and `Node::kind_name()` Both accessors returned the same private `kind_name: &'static str` field; `kind_name()` is widely used (mainly by dump.rs and schema diagnostics) and `kind()` had only 2 internal callers in lib.rs and a handful in tests. Pick the more descriptive name and update the callers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-29 10:34:36 +00:00
Taus	cc3c232631	yeast: Replace `{..expr}` splice syntax with trait-dispatched `{expr}` In the initial implementation of yeast, the splice syntax was needed to distinguish between splicing multiple nodes or just a single node. However, this was always an ugly "wart" in the syntax, since the user shouldn't have to worry about these things. To fix this, we add an `IntoFieldIds` trait that dispatches on the value's type: `Id` pushes a single id, and a blanket impl for `IntoIterator<Item: Into<Id>>` handles `Vec<Id>`, `Option<Id>`, and arbitrary iterator chains. With this, we no longer need to use the special splice syntax, and hence we can get rid of it.	2026-06-29 10:34:35 +00:00
Taus	9a5cc3c5e3	yeast: Make `Id` a newtype, delete `NodeRef` Previously, the `Id` type was a bare usize alias. The `NodeRef` newtype existed solely to carry the AST-aware `YeastDisplay` / `YeastSourceRange` impls (so that `#{captured_node}` rendered source text rather than the numeric id) without colliding with the impls for raw integer types. This commit promotes `Id` itself to a (transparent) newtype struct and moves the AST-aware trait impls directly onto it. With `Id` and `usize` now being different types, the integer-display impl (for `usize`) and the source-text impl (for `Id`) coexist without conflict, and `NodeRef` becomes redundant (and so we remove it).	2026-06-29 10:33:32 +00:00
Taus	70ca7af04c	Address PR review comments - unified/swift: Mark `binding_kind` as a raw `@@` capture in the property_declaration rule. It is only used to read its source text (`ctx.ast.source_text`), never as a translated node. With `@` the auto-translate prefix would route the unnamed `let`/`var` token through the catch-all `_ @node => {node}` fallback for a no-op roundtrip; `@@` makes the intent explicit and removes that reliance. - shared/yeast/tests: Reword a stale comment in test_raw_capture_marker. The text claimed a "second assertion" exists in this test, but the explicit-translation check actually lives in the companion test_raw_capture_marker_explicit_translate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-26 13:30:01 +00:00
Taus	eb7f8cc43d	yeast: Add `@@name` raw-capture syntax to `rule!` The `@@name` capture marker in `rule!` queries skips the auto-translate prefix for that specific capture, letting the body see the original capture (and thus delay its translation using `ctx.translate` until it becomes convenient). Regular `@name` captures continue to be auto-translated as before. Specifically these are translated _eagerly_, before the main body of the rewrite rule is run. I settled on `@@` as the syntax because it did not add new symbols that the user has to keep track of (it's still a kind of capture), but it's still visually distinct enough that the user should be able to tell that there's something special going on. In principle one could accidentally write one form of capture where the other was intended, but in practice this would result in code that did not compile (because the types would not match).	2026-06-26 12:07:21 +00:00
Taus	af7ae8c4cb	Apply rustfmt Format the touched Rust crates (shared/tree-sitter-extractor, shared/yeast, shared/yeast-macros, unified/extractor) so the tree-sitter-extractor CI fmt check passes. No functional changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-25 17:28:24 +02:00
Taus	e0fa6cf785	yeast: Reify the context and allow user-defined data in it Renames what was previously called `__yeast_ctx` to just `ctx`, and adds a new field `user_ctx` to this context. This field can hold a value of any user-defined type (necessitating making various parts of the implementation generic in that type). Through some Deref magic, field accesses are delegated to the inner struct (assuming they are not already defined on `ctx`), which should hopefully make the interface a bit more ergonomic.	2026-06-25 17:28:24 +02:00
Asger F	6c74cd31e4	Yeast: use child locations instead of rule target Previously, when a node was synthesized it would always take the location from the node that matched the current rule. However, this resulted in overly broad locations. For (foo #{bar}) we now take the location of the 'bar' node. For a non-leaf node, we merge the locations of all its child nodes.	2026-06-18 14:26:30 +02:00
Asger F	ddc9516e92	Yeast: better support for rewriting unnamed nodes - Ensure the full wildcard _ supports quantifiers - Also rewrite unnamed nodes in one-shot phases	2026-06-15 10:49:31 +02:00
Asger F	1ecdc3614f	Yeast: Fix matching against extras like comments	2026-06-01 14:18:37 +02:00
Asger F	5772ee4d9b	YEAST: add NodeRef type, YeastDisplay trait, and source text storage Introduce NodeRef as a typed wrapper around node arena IDs. Captures in desugaring rules are now bound as NodeRef instead of raw usize, which prevents accidental misuse and enables source-text-aware rendering. Add the YeastDisplay trait as an alternative to Display: its yeast_to_string method receives the Ast, allowing NodeRef to resolve to the captured node's source text instead of printing a numeric ID. Store the original source bytes in the Ast so that NodeContent::Range values (from synthesized literal nodes) can be resolved back to text. Update yeast-macros to emit NodeRef-typed capture bindings and use Into::<usize>::into where raw IDs are needed. The #{expr} template syntax now uses YeastDisplay instead of Display. The effect is visible in the corpus tests: operator nodes now correctly render as e.g. operator "+" instead of operator "3". Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 10:35:17 +02:00
Asger F	c3a9218dcf	Yeast: Add one-shot phase kind	2026-05-13 10:35:09 +02:00
Asger F	a049850c51	Yeast: add type-checking errors in AST dump	2026-05-13 10:35:07 +02:00
Asger F	49f19092fb	Yeast: add reachable_node_ids()	2026-05-13 10:35:05 +02:00
Taus	7bd27b83e0	yeast: Mutate parent fields in place; remove redundant Node::id apply_rules_inner used to handle the "child was rewritten, so the parent needs new field IDs" case by cloning the parent node, swapping in the new fields, pushing the clone onto the arena, and returning the new Id. Every ancestor on the path from the rewrite up to the root was duplicated this way, with the originals retained as garbage in the arena. Switch to in-place mutation: assign `ast.nodes[id].fields = new_fields` and return the same Id. Rule firings still produce genuinely new nodes via BuildCtx (their structure differs from the input), but the ancestor-rebuild spine no longer copies anything. This is safe because apply_rules_inner already works entirely by Id: the field snapshot is cloned out before recursing, no &Node references are held across mutations of the arena, and captures are scoped to a single rule firing so the now-stable Ids do not break anything. Memory effect: a desugaring pass that rewrites R leaves of a tree of average depth d previously appended R*d ancestor clones to the arena. Now appends 0. With Ids stable for the lifetime of an Ast, the Node::id field becomes truly redundant and is removed (along with the Node::id() accessor). AstCursor switches from caching `node: &Node` to tracking `node_id: Id` and looking the node up via the arena on each access; ChildrenIter now yields Ids directly. A new AstCursor::node_id() method gives callers access to the cursor position by Id.	2026-05-08 12:47:22 +00:00
Taus	5a4dee50f7	Merge pull request #21810 from github/tausbn/yeast-forward-scan-queries yeast: Align query semantics more closely with tree-sitter	2026-05-08 13:30:43 +02:00
Taus	af6e921da5	yeast: Forward-scan bare child patterns instead of strict positional Previously, a bare child pattern in a query took whatever the next child of the iterator was and either matched or failed: it would not scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose implicit `child` field was `["bar", "baz"]` would fail (the pattern took "bar" first). Switch to forward-scan semantics: a SingleNode matcher advances through the iterator until it finds a child that matches its sub-query. Patterns that are named-only continue to skip past unnamed children for free. Order is preserved across multiple bare patterns at the same level — each pattern advances the shared iterator past whatever it consumed — so a query cannot match children out of source order. Captures from a failed match attempt are rolled back via a snapshot, so partial captures from a complex sub-query do not leak across attempts. Add two regression tests against the `do` body wrapper in a Ruby for-loop, whose implicit `child` field contains [do, identifier, end]: - a query for ("end") matches by skipping past `do` and the identifier - a query for ("end") then ("do") fails, demonstrating order preservation	2026-05-07 15:08:22 +00:00
Taus	a4df96aad6	yeast: Support capturing unnamed nodes in queries Three improvements to the query parser, all aimed at allowing query patterns to refer to unnamed tokens: 1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token, matching the parenthesized form `("=") @op`. Previously the literal branch in parse_query_list skipped the maybe_wrap_capture call, so the `@op` was a leftover token and would error. 2. Bare `_` matches any node, named or unnamed. Previously bare `_` and `(_)` both produced QueryNode::Any with the same matches_named_only behaviour, so bare `_` would skip unnamed children. Now Any carries a match_unnamed flag: false for `(_)` (named-only, tree-sitter default) and true for bare `_` (any node). 3. Named fields and bare child patterns may be intermixed in any order. Previously, once parse_query_fields saw a bare pattern it would stop accepting named fields. The fix accumulates bare patterns into the implicit `child` field and keeps parsing. Each named field independently selects its target field for matching, so the source-order of fields in the query is purely cosmetic and intermixing is safe. Add tests covering parenthesized capture, bare-literal capture, and the named-vs-any distinction between `(_)` and bare `_`. Update query-syntax docs to reflect all three.	2026-05-07 15:08:21 +00:00
Taus	957c89b478	yeast: Support multi-phase desugaring via DesugaringConfig::add_phase Extend the desugaring config from a single flat list of rules to an ordered sequence of named Phases. Each phase runs to completion (a full traversal applying its rules) before the next phase starts. Rules in different phases never compete for matches. The config is built via the new chainable API: DesugaringConfig::new() .add_phase("cleanup", cleanup_rules) .add_phase("desugar", desugar_rules) .with_output_node_types_yaml(yaml); Single-phase configs are just .add_phase(...) called once. A single FreshScope is shared across phases so generated identifier names (e.g. $tmp-N) are unique throughout the run. Phase names appear in error messages, e.g. "Phase `desugar`: exceeded maximum rewrite depth". Add two regression tests: one verifying basic two-phase chained desugaring, and one verifying that errors include the failing phase name.	2026-05-06 21:17:31 +00:00
Taus	9a94836974	yeast: Add per-rule .repeated() flag to opt into iterative matching Previously, after a rule fired the engine would always re-try that same rule on the result root. A rule whose output matched its own query (intentionally or by accident) would loop until the global MAX_REWRITE_DEPTH safety net kicked in. Make the default behavior fire-once-per-node: after a rule fires on node N, the engine no longer tries that same rule on the result root. Other rules and child traversal are unaffected. Rules that intentionally rewrite iteratively can opt into the old behavior via the new Rule::repeated() builder method. Add two regression tests using a self-swapping assignment rule: - with .repeated(), the swap loops and trips the depth limit - without it (default), the swap fires once and terminates	2026-05-06 12:33:18 +00:00
Taus	a0a0e9e9a7	yeast: Add test for chained rules with output-only kinds Adds a regression test verifying that desugaring rules can chain across output-only node kinds: a first rule rewrites an input kind to an output-only kind, and a second rule then rewrites that output-only kind into another output-only kind. This exercises the schema lookup for query patterns whose root kind is not present in the input tree-sitter grammar.	2026-05-06 11:45:53 +00:00
Taus	6e580446fd	yeast: Add yeast test suite 12 tests covering parsing, queries, tree building, desugaring rules, cursor navigation, and the shorthand rule! syntax. Tests use a custom output node-types.yml with named fields for all children (parameter, stmt, index), loaded via schema_from_yaml_with_language. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-06 11:34:09 +00:00
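
Commit cc3c232631 describes dispatching `{expr}` on the value's type through an `IntoFieldIds` trait. The sketch below shows one way such a trait could be shaped. It is an illustration under assumptions, not the code in shared/yeast: the method name `push_into` and the helper `collect_field` are hypothetical, and `Id` is modelled as the transparent newtype described in 9a5cc3c5e3.

```rust
// Illustrative sketch only. `push_into` and `collect_field` are hypothetical
// names; the real trait in shared/yeast may look different.

/// Newtype around an arena index (modelled on the description in 9a5cc3c5e3).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Id(pub usize);

/// Appends zero or more ids to a field's list of children.
pub trait IntoFieldIds {
    fn push_into(self, out: &mut Vec<Id>);
}

/// A single `Id` contributes exactly one child.
impl IntoFieldIds for Id {
    fn push_into(self, out: &mut Vec<Id>) {
        out.push(self);
    }
}

/// Anything iterable over values convertible to `Id` contributes each item in
/// order. This covers `Vec<Id>`, `Option<Id>`, and iterator chains. It does not
/// overlap with the impl above because `Id` is a local type that does not
/// implement `IntoIterator`.
impl<I> IntoFieldIds for I
where
    I: IntoIterator,
    I::Item: Into<Id>,
{
    fn push_into(self, out: &mut Vec<Id>) {
        out.extend(self.into_iter().map(Into::into));
    }
}

/// Hypothetical stand-in for what a `{expr}` placeholder might expand to.
pub fn collect_field<T: IntoFieldIds>(value: T) -> Vec<Id> {
    let mut out = Vec::new();
    value.push_into(&mut out);
    out
}
```

With this shape, `collect_field(Id(1))`, `collect_field(vec![Id(1), Id(2)])`, and `collect_field(Some(Id(3)))` all go through the same call site, which is the property that makes a separate splice syntax unnecessary.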

22 Commits
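
Commit af6e921da5 switches bare child patterns from strict positional matching to forward-scan matching over a shared iterator. The sketch below reproduces that ordering behaviour in a deliberately simplified model: children and patterns are plain kind-name strings, the function name `match_in_order` is hypothetical, and the capture rollback mentioned in the commit is not modelled.

```rust
// Simplified model of forward-scan matching. Real yeast queries match AST
// nodes against sub-queries; here both sides are reduced to kind names.

/// Returns the index of the child matched by each pattern, or `None` if some
/// pattern finds no match in the remaining children.
///
/// All patterns share one iterator, so each pattern resumes scanning after
/// the child consumed by the previous one. A query therefore cannot match
/// children out of source order.
pub fn match_in_order(children: &[&str], patterns: &[&str]) -> Option<Vec<usize>> {
    let mut iter = children.iter().enumerate();
    let mut matched = Vec::with_capacity(patterns.len());
    for pat in patterns {
        // `find` consumes children up to and including the first match,
        // skipping non-matching children instead of failing on them.
        let (idx, _) = iter.find(|(_, kind)| **kind == *pat)?;
        matched.push(idx);
    }
    Some(matched)
}
```

Against the children `["do", "identifier", "end"]` from the commit's regression tests, the single pattern `["end"]` matches by skipping the first two children, while `["end", "do"]` fails because the iterator is already exhausted when `"do"` is tried.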