Commit Graph

87353 Commits

Author SHA1 Message Date
Owen Mansel-Chan
974e7cc319 Merge pull request #21825 from github/dependabot/go_modules/go/extractor/extractor-dependencies-0e0a523006
Bump the extractor-dependencies group in /go/extractor with 2 updates
2026-05-11 11:35:14 +01:00
Asger F
f91482810d Merge pull request #21816 from github/tausbn/yeast-mutate-in-place
yeast: Two minor performance optimisations
2026-05-11 11:08:24 +02:00
dependabot[bot]
8f9d5c5217 Bump the extractor-dependencies group in /go/extractor with 2 updates
Bumps the extractor-dependencies group in /go/extractor with 2 updates: [golang.org/x/mod](https://github.com/golang/mod) and [golang.org/x/tools](https://github.com/golang/tools).


Updates `golang.org/x/mod` from 0.35.0 to 0.36.0
- [Commits](https://github.com/golang/mod/compare/v0.35.0...v0.36.0)

Updates `golang.org/x/tools` from 0.44.0 to 0.45.0
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](https://github.com/golang/tools/compare/v0.44.0...v0.45.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.36.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: extractor-dependencies
- dependency-name: golang.org/x/tools
  dependency-version: 0.45.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: extractor-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-11 03:06:30 +00:00
Taus
15936a5f8d yeast: Take fields by ownership in apply_rules_inner
Previously, apply_rules_inner snapshotted a node's fields by cloning
the BTreeMap into a Vec<(FieldId, Vec<Id>)>, then built a fresh
BTreeMap of new_fields for the rewritten Ids. For a node with N
fields, this allocated 2N+1 things per visit (the snapshot Vec, N
cloned children Vecs, the new BTreeMap entries) — even when nothing
in the subtree was rewritten.

Use std::mem::take to swap the parent's fields out by ownership: the
recursion can mutate the AST (including pushing new nodes from rule
firings) without any conflict, since we hold the owned BTreeMap
locally. Iterate values_mut() and only allocate a fresh children Vec
on the first divergence (lazy alloc): unchanged children stay in the
existing slot. When done, swap the fields back.

For a subtree with no rewrites, this is now zero allocations per node
(modulo the recursion itself). For nodes with rewrites, it's one Vec
allocation per field that contains a rewritten child, instead of two
plus the BTreeMap rebuild.
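
The ownership swap described above can be sketched roughly as follows. This is a toy model, not the yeast implementation: `Node`, `Ast`, the `"old"`/`"new"` kinds, and the `rewrite` function are hypothetical stand-ins, and only `std::mem::take` and `BTreeMap::values_mut` are real standard-library calls.

```rust
use std::collections::BTreeMap;

type Id = usize;
type FieldId = u16;

// Hypothetical, simplified node and arena types (assumptions for illustration).
struct Node {
    kind: &'static str,
    fields: BTreeMap<FieldId, Vec<Id>>,
}

struct Ast {
    nodes: Vec<Node>,
}

/// Rewrites every "old" node into a freshly pushed "new" node and returns the
/// Id that should stand in for `id` afterwards.
fn rewrite(ast: &mut Ast, id: Id) -> Id {
    // Stand-in for a rule firing: push a genuinely new node onto the arena.
    if ast.nodes[id].kind == "old" {
        ast.nodes.push(Node { kind: "new", fields: BTreeMap::new() });
        return ast.nodes.len() - 1;
    }

    // Take the field map out by ownership. The node holds an empty map while
    // we recurse, so mutating the arena cannot conflict with our iteration.
    let mut fields = std::mem::take(&mut ast.nodes[id].fields);
    for children in fields.values_mut() {
        // Stays `None` until the first child that actually changes.
        let mut replaced: Option<Vec<Id>> = None;
        for (i, &child) in children.iter().enumerate() {
            let new_child = rewrite(ast, child);
            if replaced.is_none() && new_child != child {
                // First divergence: copy the unchanged prefix exactly once.
                replaced = Some(children[..i].to_vec());
            }
            if let Some(v) = replaced.as_mut() {
                v.push(new_child);
            }
        }
        if let Some(v) = replaced {
            *children = v;
        }
    }
    // Swap the (possibly updated) fields back into the same node.
    ast.nodes[id].fields = fields;
    id
}
```

In this sketch a subtree without rewrites never allocates a children `Vec`, and the parent keeps its Id either way.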
2026-05-08 12:48:10 +00:00
Taus
7bd27b83e0 yeast: Mutate parent fields in place; remove redundant Node::id
apply_rules_inner used to handle the "child was rewritten, so the
parent needs new field IDs" case by cloning the parent node, swapping
in the new fields, pushing the clone onto the arena, and returning the
new Id. Every ancestor on the path from the rewrite up to the root was
duplicated this way, with the originals retained as garbage in the
arena.

Switch to in-place mutation: assign `ast.nodes[id].fields = new_fields`
and return the same Id. Rule firings still produce genuinely new nodes
via BuildCtx (their structure differs from the input), but the
ancestor-rebuild spine no longer copies anything.

This is safe because apply_rules_inner already works entirely by Id:
the field snapshot is cloned out before recursing, no &Node references
are held across mutations of the arena, and captures are scoped to a
single rule firing so the now-stable Ids do not break anything.

Memory effect: a desugaring pass that rewrites R leaves of a tree of
average depth d previously appended R*d ancestor clones to the arena.
It now appends none.

With Ids stable for the lifetime of an Ast, the Node::id field becomes
truly redundant and is removed (along with the Node::id() accessor).
AstCursor switches from caching `node: &Node` to tracking `node_id:
Id` and looking the node up via the arena on each access; ChildrenIter
now yields Ids directly. A new AstCursor::node_id() method gives
callers access to the cursor position by Id.
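
The difference between rebuilding the ancestor spine and mutating it in place can be illustrated with a toy arena. The `Node` type and both helper functions below are hypothetical and much simpler than the real yeast types; they only show why the cloning variant grows the arena by one node per ancestor.

```rust
// Hypothetical arena node: just a list of child Ids (an assumption).
#[derive(Clone)]
struct Node {
    children: Vec<usize>,
}

/// Old style: clone the parent, patch the clone, push it, return the new Id.
fn replace_child_by_cloning(
    arena: &mut Vec<Node>,
    parent: usize,
    slot: usize,
    new_child: usize,
) -> usize {
    let mut copy = arena[parent].clone();
    copy.children[slot] = new_child;
    arena.push(copy);
    arena.len() - 1
}

/// New style: patch the parent where it is and keep its Id stable.
fn replace_child_in_place(
    arena: &mut Vec<Node>,
    parent: usize,
    slot: usize,
    new_child: usize,
) -> usize {
    arena[parent].children[slot] = new_child;
    parent
}
```

For a rewritten leaf with two ancestors, the cloning helper has to be applied once per ancestor and leaves two extra nodes behind, while the in-place helper leaves the arena size unchanged.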
2026-05-08 12:47:22 +00:00
Owen Mansel-Chan
36554d160c Merge pull request #21741 from MarkLee131/fix/path-injection-read-subkind
Fix/path injection read subkind
2026-05-08 12:38:16 +01:00
Taus
5a4dee50f7 Merge pull request #21810 from github/tausbn/yeast-forward-scan-queries
yeast: Align query semantics more closely with tree-sitter
2026-05-08 13:30:43 +02:00
Asger F
fdef477138 Merge pull request #21812 from asgerf/asgerf/swift-yeast-1
Add tree-sitter-swift extractor scaffolding and YEAST desugaring
2026-05-08 13:21:17 +02:00
Anders Schack-Mulligen
81e1ab7aab Merge pull request #21808 from aschackmull/cfg/switch-pattern-eval
Cfg: Rework CFG for switch case patterns.
2026-05-08 12:48:44 +02:00
Paolo Tranquilli
8cc6d788c5 Merge pull request #21814 from github/codeql-spark-run-25547718006
Update changelog documentation site for codeql-cli-2.25.4
2026-05-08 11:45:26 +02:00
github-actions[bot]
26e13055c8 update codeql documentation
2026-05-08 09:24:10 +00:00
Asger F
33e89ea123 Address review comments
2026-05-08 09:03:18 +02:00
Asger F
9a2b7bac8f Fix Bazel glob to include subdirectories
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-08 08:56:40 +02:00
Anders Schack-Mulligen
048411e168 Apply suggestions from code review
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
2026-05-08 08:11:32 +02:00
Asger F
2802819170 Use new YEAST API after rebasing
2026-05-07 21:37:42 +02:00
Asger F
a1447075e8 Add AGENTS.md with build/test instructions
2026-05-07 21:35:51 +02:00
Asger F
cd457a7d6b Move Swift language into its own module
2026-05-07 21:35:50 +02:00
Asger F
4e12a8c8d2 Add basic YEAST dependency and rule
2026-05-07 21:35:48 +02:00
Asger F
0210c970f2 Add tree-sitter for Swift (called 'unified')
2026-05-07 21:35:46 +02:00
Taus
b027ac3658 Merge pull request #21809 from github/tausbn/yeast-add-support-for-desugaring-phases
Yeast: Two small improvements
2026-05-07 19:00:44 +02:00
MarkLee131
26af52897d Merge branch 'main' into fix/path-injection-read-subkind
2026-05-07 23:48:42 +08:00
Taus
af6e921da5 yeast: Forward-scan bare child patterns instead of strict positional
Previously, a bare child pattern in a query took whatever the next
child of the iterator was and either matched or failed: it would not
scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose
implicit `child` field was `["bar", "baz"]` would fail (the pattern
took "bar" first).

Switch to forward-scan semantics: a SingleNode matcher advances through
the iterator until it finds a child that matches its sub-query. Patterns
that are named-only continue to skip past unnamed children for free.
Order is preserved across multiple bare patterns at the same level —
each pattern advances the shared iterator past whatever it consumed —
so a query cannot match children out of source order.

Captures from a failed match attempt are rolled back via a snapshot, so
partial captures from a complex sub-query do not leak across attempts.

Add two regression tests against the `do` body wrapper in a Ruby
for-loop, whose implicit `child` field contains [do, identifier, end]:
- a query for ("end") matches by skipping past `do` and the identifier
- a query for ("end") then ("do") fails, demonstrating order preservation
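
The forward-scan behaviour can be sketched with plain string kinds standing in for nodes. This is a simplified illustration under that assumption: `scan_for` and `matches_in_order` are hypothetical names, and the capture rollback described above is left out.

```rust
/// Advances the shared iterator until a child equal to `wanted` is found.
/// Everything up to and including the match is consumed.
fn scan_for<'a>(children: &mut impl Iterator<Item = &'a str>, wanted: &str) -> bool {
    children.any(|kind| kind == wanted)
}

/// Matches each bare pattern in turn against one shared iterator, so later
/// patterns can only match children that come after earlier matches.
fn matches_in_order(children: &[&str], patterns: &[&str]) -> bool {
    let mut iter = children.iter().copied();
    patterns.iter().all(|p| scan_for(&mut iter, p))
}
```

With children `["do", "identifier", "end"]`, the pattern list `["end"]` matches by skipping the first two children, while `["end", "do"]` fails because the iterator is already exhausted when `"do"` is looked for.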
2026-05-07 15:08:22 +00:00
Taus
6f643a3604 yeast: Use canonical ID when registering unnamed kinds in Schema
Schema::from_language registered unnamed kinds via or_insert(id), where
`id` came from iterating 0..node_kind_count. For names with multiple
unnamed IDs (notably "end" in tree-sitter-ruby has IDs 0 and 13, where
ID 0 is the reserved error token), this picked the first encountered
ID — typically the wrong one.

The visitor sets node.kind via language.id_for_node_kind(name, false),
which returns the canonical ID. So a query for ("end") would compare
node.kind=13 against schema=0 and silently fail to match, with no
diagnostic.

Use language.id_for_node_kind(name, false) to obtain the canonical ID
when registering, mirroring the named-kind path that already does the
same with id_for_node_kind(name, true).
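
A toy model of the registration bug, for illustration only. `ToyLanguage` is a hypothetical stand-in for the tree-sitter language object, its `id_for_node_kind` takes only a name, and the IDs used are made up; the point is just how `or_insert(id)` keeps the first ID it sees.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a language: index in `kinds` is the kind ID,
// and `canonical` plays the role of the canonical-ID lookup.
struct ToyLanguage {
    kinds: Vec<&'static str>,
    canonical: HashMap<&'static str, u16>,
}

impl ToyLanguage {
    fn id_for_node_kind(&self, name: &str) -> u16 {
        self.canonical[name]
    }
}

/// Buggy variant: the first ID encountered for a name wins.
fn register_first_seen(lang: &ToyLanguage) -> HashMap<&'static str, u16> {
    let mut schema = HashMap::new();
    for (id, name) in lang.kinds.iter().enumerate() {
        schema.entry(*name).or_insert(id as u16);
    }
    schema
}

/// Fixed variant: always ask the language for the canonical ID.
fn register_canonical(lang: &ToyLanguage) -> HashMap<&'static str, u16> {
    let mut schema = HashMap::new();
    for name in &lang.kinds {
        schema
            .entry(*name)
            .or_insert_with(|| lang.id_for_node_kind(name));
    }
    schema
}
```

When a name such as `"end"` appears under two IDs, only the second variant records the ID that the visitor will later compare against.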
2026-05-07 15:08:21 +00:00
Taus
a4df96aad6 yeast: Support capturing unnamed nodes in queries
Three improvements to the query parser, all aimed at allowing query
patterns to refer to unnamed tokens:

1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token,
   matching the parenthesized form `("=") @op`. Previously the literal
   branch in parse_query_list skipped the maybe_wrap_capture call, so
   the `@op` was a leftover token and would error.

2. Bare `_` matches any node, named or unnamed. Previously bare `_` and
   `(_)` both produced QueryNode::Any with the same matches_named_only
   behaviour, so bare `_` would skip unnamed children. Now Any carries a
   match_unnamed flag: false for `(_)` (named-only, tree-sitter default)
   and true for bare `_` (any node).

3. Named fields and bare child patterns may be intermixed in any order.
   Previously, once parse_query_fields saw a bare pattern it would stop
   accepting named fields. The fix accumulates bare patterns into the
   implicit `child` field and keeps parsing.

Each named field independently selects its target field for matching, so
the source-order of fields in the query is purely cosmetic and intermixing
is safe.

Add tests covering parenthesized capture, bare-literal capture, and the
named-vs-any distinction between `(_)` and bare `_`. Update query-syntax
docs to reflect all three.
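
The named-vs-any distinction from point 2 can be sketched like this. The `Child` type and this two-variant `QueryNode` are simplified assumptions; the real enum has more variants and matches against AST nodes rather than a flat struct.

```rust
// Hypothetical flattened view of a child node (an assumption).
#[derive(Clone, Copy)]
struct Child {
    kind: &'static str,
    named: bool,
}

// Simplified query node: only the wildcard and unnamed-literal cases.
enum QueryNode {
    /// `(_)` uses match_unnamed = false, bare `_` uses match_unnamed = true.
    Any { match_unnamed: bool },
    /// A quoted literal such as "=" matches an unnamed token with that text.
    Literal(&'static str),
}

impl QueryNode {
    fn matches(&self, child: Child) -> bool {
        match self {
            QueryNode::Any { match_unnamed } => child.named || *match_unnamed,
            QueryNode::Literal(text) => !child.named && child.kind == *text,
        }
    }
}
```

Under this model `(_)` rejects the unnamed `=` token while bare `_` accepts it, and both accept a named node such as an identifier.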
2026-05-07 15:08:21 +00:00
Owen Mansel-Chan
f9240e7058 Fix QL formatting
2026-05-07 15:57:33 +01:00
Anders Schack-Mulligen
6b6df374fa C#/Java: Accept test changes.
2026-05-07 15:07:31 +02:00
Paolo Tranquilli
f9e42ac443 Merge pull request #21794 from github/post-release-prep/codeql-cli-2.25.4
Post-release preparation for codeql-cli-2.25.4
2026-05-07 14:43:24 +02:00
copilot-swe-agent[bot]
e0d663f79b yeast: address review wording in phase docs
Agent-Logs-Url: https://github.com/github/codeql/sessions/6d23db05-a6e9-4de4-8951-b465980fd0ef

Co-authored-by: tausbn <1104778+tausbn@users.noreply.github.com>
2026-05-07 12:35:46 +00:00
Taus
33fc767782 Merge pull request #21797 from github/tausbn/yeast-desugaring-tool
Shared: Add YEAST desugaring library
2026-05-07 13:48:12 +02:00
Anders Schack-Mulligen
072166ba88 C#/Java: Adjust Guards instantiations.
2026-05-07 13:46:52 +02:00
Anders Schack-Mulligen
48785a0a76 Cfg: Rework CFG for switch case patterns.
2026-05-07 13:07:07 +02:00
MarkLee131
e8553c7449 Merge branch 'main' into fix/path-injection-read-subkind
2026-05-07 18:11:45 +08:00
Owen Mansel-Chan
33035dbfc8 Fix yaml formatting
2026-05-07 11:06:43 +01:00
Taus
957c89b478 yeast: Support multi-phase desugaring via DesugaringConfig::add_phase
Extend the desugaring config from a single flat list of rules to an
ordered sequence of named Phases. Each phase runs to completion (a
full traversal applying its rules) before the next phase starts.
Rules in different phases never compete for matches.

The config is built via the new chainable API:

    DesugaringConfig::new()
        .add_phase("cleanup", cleanup_rules)
        .add_phase("desugar", desugar_rules)
        .with_output_node_types_yaml(yaml);

Single-phase configs are just .add_phase(...) called once.

A single FreshScope is shared across phases so generated identifier
names (e.g. $tmp-N) are unique throughout the run.

Phase names appear in error messages, e.g. "Phase `desugar`:
exceeded maximum rewrite depth".

Add two regression tests: one verifying basic two-phase chained
desugaring, and one verifying that errors include the failing phase
name.
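
The phase semantics can be sketched with a toy model in which a "node" is just its kind string and a rule is a plain function. `ToyConfig`, `Rule`, the three example rules, and the depth limit of 100 are all assumptions for illustration; only the chainable `add_phase` shape and the error wording are taken from the description above.

```rust
const MAX_REWRITE_DEPTH: usize = 100; // assumed value, for illustration only

// Hypothetical rule type: maps a node kind to its replacement, if it matches.
type Rule = fn(&str) -> Option<&'static str>;

struct Phase {
    name: String,
    rules: Vec<Rule>,
}

#[derive(Default)]
struct ToyConfig {
    phases: Vec<Phase>,
}

impl ToyConfig {
    fn new() -> Self {
        Self::default()
    }

    /// Chainable builder: appends a named phase and returns the config.
    fn add_phase(mut self, name: &str, rules: Vec<Rule>) -> Self {
        self.phases.push(Phase { name: name.to_string(), rules });
        self
    }

    /// Runs each phase to completion over all nodes before starting the next.
    fn run(&self, mut nodes: Vec<&'static str>) -> Result<Vec<&'static str>, String> {
        for phase in &self.phases {
            for node in nodes.iter_mut() {
                let mut depth = 0;
                while let Some(next) = phase.rules.iter().find_map(|rule| rule(*node)) {
                    depth += 1;
                    if depth > MAX_REWRITE_DEPTH {
                        return Err(format!(
                            "Phase `{}`: exceeded maximum rewrite depth",
                            phase.name
                        ));
                    }
                    *node = next;
                }
            }
        }
        Ok(nodes)
    }
}

// Example rules (hypothetical kinds).
fn strip_parens(kind: &str) -> Option<&'static str> {
    if kind == "paren_for" { Some("for") } else { None }
}

fn for_to_call(kind: &str) -> Option<&'static str> {
    if kind == "for" { Some("call") } else { None }
}

/// Never settles: flips between "a" and "b" forever.
fn flip(kind: &str) -> Option<&'static str> {
    match kind {
        "a" => Some("b"),
        "b" => Some("a"),
        _ => None,
    }
}
```

Because the `cleanup` phase finishes before `desugar` starts, `for_to_call` only ever sees the output of `strip_parens`, and a runaway rule such as `flip` is reported with the name of the phase it ran in.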
2026-05-06 21:17:31 +00:00
Taus
9a94836974 yeast: Add per-rule .repeated() flag to opt into iterative matching
Previously, after a rule fired the engine would always re-try that
same rule on the result root. A rule whose output matched its own
query (intentionally or by accident) would loop until the global
MAX_REWRITE_DEPTH safety net kicked in.

Make the default behavior fire-once-per-node: after a rule fires on
node N, the engine no longer tries that same rule on the result root.
Other rules and child traversal are unaffected. Rules that
intentionally rewrite iteratively can opt into the old behavior via
the new Rule::repeated() builder method.

Add two regression tests using a self-swapping assignment rule:
- with .repeated(), the swap loops and trips the depth limit
- without it (default), the swap fires once and terminates
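
A rough sketch of the fire-once default and the `.repeated()` opt-in, using strings in place of AST nodes. The `Rule` struct, the `swap` rule, the `rewrite` driver, and the limit of 100 are hypothetical; the real engine works on nodes and also traverses children.

```rust
const MAX_REWRITE_DEPTH: usize = 100; // assumed value, for illustration only

// Hypothetical rule: a rewrite function plus the opt-in flag.
struct Rule {
    apply: fn(&str) -> Option<String>,
    repeated: bool,
}

impl Rule {
    fn new(apply: fn(&str) -> Option<String>) -> Self {
        Rule { apply, repeated: false }
    }

    /// Opts this rule into being retried on its own output.
    fn repeated(mut self) -> Self {
        self.repeated = true;
        self
    }
}

/// Self-swapping assignment rule: "a=b" becomes "b=a", which matches again.
fn swap(node: &str) -> Option<String> {
    let (lhs, rhs) = node.split_once('=')?;
    Some(format!("{rhs}={lhs}"))
}

/// Applies rules to one node until nothing fires or the depth limit trips.
fn rewrite(node: &str, rules: &[Rule]) -> Result<String, String> {
    let mut current = node.to_string();
    let mut fired = vec![false; rules.len()];
    for _ in 0..MAX_REWRITE_DEPTH {
        let hit = rules.iter().enumerate().find_map(|(i, rule)| {
            // A rule that already fired here is skipped unless it is repeated.
            if fired[i] && !rule.repeated {
                return None;
            }
            (rule.apply)(&current).map(|out| (i, out))
        });
        match hit {
            Some((i, out)) => {
                fired[i] = true;
                current = out;
            }
            None => return Ok(current),
        }
    }
    Err("exceeded maximum rewrite depth".to_string())
}
```

In this model `rewrite("a=b", &[Rule::new(swap)])` stops after one swap, whereas the same rule with `.repeated()` keeps swapping until the depth limit reports an error.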
2026-05-06 12:33:18 +00:00
Taus
a0a0e9e9a7 yeast: Add test for chained rules with output-only kinds
Adds a regression test verifying that desugaring rules can chain across
output-only node kinds: a first rule rewrites an input kind to an
output-only kind, and a second rule then rewrites that output-only
kind into another output-only kind. This exercises the schema lookup
for query patterns whose root kind is not present in the input
tree-sitter grammar.
2026-05-06 11:45:53 +00:00
Taus
60dcf88b50 yeast: Add Bazel build rules for yeast crates
Add BUILD.bazel files for the yeast and yeast-macros crates, register
them as dependencies of the shared tree-sitter extractor, and refresh
the vendored crate dependencies via update_tree_sitter_extractors_deps.sh.
2026-05-06 11:34:09 +00:00
Taus
82bbdee832 yeast: Support separate output node types in extractor generator
Language and LanguageSpec gain an optional output_node_types field.
When set, the generator produces dbscheme/QL from the output types
and the extractor validates TRAP against them.

All existing extractors pass None (no behavior change).
Ruby extract() calls gain vec![] for the new rules parameter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
9ad431dea1 yeast: Integrate yeast with shared tree-sitter extractor
extract() gains a rules parameter. When empty, uses tree-sitter native
traversal (no behavior change). When non-empty, runs yeast desugaring
and extracts via traverse_yeast.

Adds AstNode trait abstracting over tree_sitter::Node and yeast::Node,
with minimal changes to existing Visitor methods (Node -> &N in 6
signatures).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
cc28ff9a48 yeast: Add yeast documentation
Covers architecture, query language, template language
(tree!/trees!/rule!), capture semantics, fresh identifiers, and
extractor integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
6e580446fd yeast: Add yeast test suite
12 tests covering parsing, queries, tree building, desugaring rules,
cursor navigation, and the shorthand rule! syntax.

Tests use a custom output node-types.yml with named fields for all
children (parameter, stmt, index), loaded via
schema_from_yaml_with_language.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
4c5548363c yeast: Add AST dumper for human-readable tree output
Produces indented text showing node kinds, named fields, and leaf
content. Unnamed tokens are hidden unless inside a named field.
Used by tests for readable assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
8a9e53cc58 yeast: Add YAML node-types format and converter
Human-friendly YAML alternative to tree-sitter node-types.json with
three sections: supertypes, named, unnamed. Supports bidirectional
conversion and building Schema objects from YAML.

Includes CLI binary (node_types_yaml) and documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Taus
04f587190e yeast: AST desugaring framework with proc-macro DSL
YEAST (YEAST Elaborates Abstract Syntax Trees) is a framework for
transforming tree-sitter parse trees before CodeQL extraction.

Core components:
- shared/yeast/ — Ast, Node, Schema, query matching engine, captures,
  FreshScope, BuildCtx
- shared/yeast-macros/ — proc macros: query!, tree!, trees!, rule!

The query language is inspired by tree-sitter queries:
  (assignment left: (_) @lhs right: (_) @rhs)

Templates support embedded Rust ({expr}), splicing ({..expr}),
computed literals (#{expr}), and fresh identifiers ($name).

The rule! macro combines query and transform:
  rule!((for pattern: (_) @pat ...) => (call receiver: {val} ...))

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 11:34:09 +00:00
Owen Mansel-Chan
e6f587e761 Merge pull request #21715 from knewbury01/knewbury01/adjust-actions-queries-untrusted-checkout
Improve actions/ql/src/Security/CWE-829/UntrustedCheckoutX queries
2026-05-06 11:52:30 +01:00
Jack Nørskov Jørgensen
2d2b690b5d Merge pull request #21799 from github/jacknojo/fix_python_formatting
Fix issue with Python formatting and expand scope of python-tooling
2026-05-06 12:24:21 +02:00
Jack Nørskov Jørgensen
52b02a0581 Fix path to generated models
2026-05-06 08:39:41 +02:00
Tom Hvitved
00fb11b028 Merge pull request #21778 from hvitved/rust/type-inference-verbose-type-path-expectations
Rust: Use verbose type paths in inline expectation comments
2026-05-05 20:23:25 +02:00
Kristen Newbury
6a8f9a950c Fix unit test expected file
2026-05-05 13:27:09 -04:00
Jack Nørskov Jørgensen
ebc759d830 Fix issue with Python formatting and expand scope of python-tooling
2026-05-05 16:14:05 +02:00