Commit Graph

1884 Commits

Author SHA1 Message Date
Copilot
577cf4a630 Shared CFG: support for-else and while-else loops
Add two default predicates to AstSig:

  default AstNode getWhileElse(WhileStmt loop) { none() }
  default AstNode getForeachElse(ForeachStmt loop) { none() }

When defined, the explicit-step rules for While/Do and Foreach
route the loop's normal-completion exits through the else block
before reaching the after-loop node:

  - WhileStmt: after-false condition -> before-else -> after-while
    (instead of directly after-while).
  - ForeachStmt: after-collection [empty] and the LoopHeader exit
    are both routed through before-else -> after-foreach.

Python's Ast module overrides the predicates to return the
synthetic BlockStmt for the orelse slot, replacing the previous
customisations in Input::step. This eliminates parallel direct
successors emitted by the previous Python-side step additions
(verified: multipleSuccessors on a CPython database goes from
1340 to 0).

Java and C# CFG tests are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 16:32:39 +00:00
Óscar San José
996e79131e Merge branch 'main' into post-release-prep/codeql-cli-2.25.5 2026-05-22 16:32:30 +02:00
Tom Hvitved
688695cd57 Merge pull request #21876 from hvitved/dense-rank-short-circuit
Util: Short-circuit `rank` usage in dense ranking library
2026-05-22 16:08:45 +02:00
Tom Hvitved
c70007607a Merge pull request #21850 from hvitved/type-inference-unify-base-type
Type inference: Unify `getABaseTypeMention` and `conditionSatisfiesConstraint`
2026-05-22 13:44:18 +02:00
Tom Hvitved
3ee45ff4b9 Apply suggestion from @geoffw0
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
2026-05-22 10:07:52 +02:00
Tom Hvitved
6d6e9c0d47 Util: Only compute dense ranks when needed 2026-05-22 08:59:01 +02:00
Owen Mansel-Chan
7e6b10e8cf Merge pull request #21879 from owen-mc/shared/cfg/simpleleafnode
Shared CFG: update `simpleLeafNode` to exclude those with additional leaf nodes
2026-05-21 14:58:04 +01:00
Owen Mansel-Chan
149bfd19d3 Merge pull request #21880 from owen-mc/shared/cfg/for-loop-stmt-init-update
Shared CFG: Make the init and update parts of a for loop statements
2026-05-21 14:57:44 +01:00
Owen Mansel-Chan
c3bafc75ab Shared CFG: allow statements for init and update of for loop 2026-05-21 13:40:26 +01:00
Owen Mansel-Chan
19f93cd18b Shared CFG: update simpleLeafNode to exclude those with additional nodes 2026-05-21 13:31:56 +01:00
Paolo Tranquilli
39becfd7e5 Add Windows file path tests for relativize_for_diagnostic
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-21 14:08:50 +02:00
Paolo Tranquilli
a84043b627 Merge pull request #21844 from github/redsun82/issue-21802-ruby-absolute-paths-in-sarif-diagnostics-a02887
Use relative paths in tree-sitter extractor diagnostics
2026-05-21 14:00:32 +02:00
Paolo Tranquilli
06c908756f Merge branch 'main' into redsun82/issue-21802-ruby-absolute-paths-in-sarif-diagnostics-a02887 2026-05-19 13:17:23 +02:00
github-actions[bot]
9f64000962 Post-release preparation for codeql-cli-2.25.5 2026-05-18 15:20:31 +00:00
github-actions[bot]
e38616a2ef Release preparation for version 2.25.5 2026-05-18 12:05:32 +00:00
Tom Hvitved
7f1bebe8ba Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-17 20:29:19 +02:00
Tom Hvitved
3f7b50ebba Type inference: Unify getABaseTypeMention and conditionSatisfiesConstraint 2026-05-13 16:24:36 +02:00
Geoffrey White
c8196e439f Merge branch 'main' into extsensitive 2026-05-13 13:04:48 +01:00
Paolo Tranquilli
c2fc0cf111 Fix Windows path handling in diagnostic relativization
Canonicalize `current_dir()` to match canonicalized file paths (avoids
`\\?\` prefix mismatch on Windows), and normalize backslashes to
forward slashes in relative diagnostic paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:31:48 +02:00
Paolo Tranquilli
c3cf7c2bca Use absolute path fallback instead of file: URI
Drop the `url` crate dependency. When a path can't be relativized
against the source root, emit it as a bare absolute path and let the
CLI's SARIF generator handle URI conversion downstream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:28:27 +02:00
Asger F
cfa175357b Merge pull request #21815 from asgerf/asgerf/missing-node-kind-error
Shared: Nicer panic message if node kind is missing
2026-05-13 10:11:14 +02:00
Paolo Tranquilli
57ac0192c0 Fix formatting
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 09:48:45 +02:00
Paolo Tranquilli
d16bc36e83 Use relative paths in tree-sitter extractor diagnostics
Diagnostic `location.file` entries were using absolute paths (e.g.
`/home/runner/work/...`), causing broken links in the GitHub UI.
Now relativize against CWD (the source root during extraction), falling
back to a properly percent-encoded `file:` URI for paths outside it.

Fixes https://github.com/github/codeql/issues/21802

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 09:45:37 +02:00
Geoffrey White
51dae161a7 Merge branch 'main' into extsensitive 2026-05-12 09:29:32 +01:00
Geoffrey White
af0124f0f1 Merge branch 'main' into extsensitive 2026-05-11 09:47:29 +01:00
Taus
15936a5f8d yeast: Take fields by ownership in apply_rules_inner
Previously, apply_rules_inner snapshotted a node's fields by cloning
the BTreeMap into a Vec<(FieldId, Vec<Id>)>, then built a fresh
BTreeMap of new_fields for the rewritten Ids. For a node with N
fields, this allocated 2N+1 things per visit (the snapshot Vec, N
cloned children Vecs, the new BTreeMap entries) — even when nothing
in the subtree was rewritten.

Use std::mem::take to swap the parent's fields out by ownership: the
recursion can mutate the AST (including pushing new nodes from rule
firings) without any conflict, since we hold the owned BTreeMap
locally. Iterate values_mut() and only allocate a fresh children Vec
on the first divergence (lazy alloc): unchanged children stay in the
existing slot. When done, swap the fields back.

For a subtree with no rewrites, this is now zero allocations per node
(modulo the recursion itself). For nodes with rewrites, it's one Vec
allocation per field that contains a rewritten child, instead of two
plus the BTreeMap rebuild.
2026-05-08 12:48:10 +00:00
Taus
7bd27b83e0 yeast: Mutate parent fields in place; remove redundant Node::id
apply_rules_inner used to handle the "child was rewritten, so the
parent needs new field IDs" case by cloning the parent node, swapping
in the new fields, pushing the clone onto the arena, and returning the
new Id. Every ancestor on the path from the rewrite up to the root was
duplicated this way, with the originals retained as garbage in the
arena.

Switch to in-place mutation: assign `ast.nodes[id].fields = new_fields`
and return the same Id. Rule firings still produce genuinely new nodes
via BuildCtx (their structure differs from the input), but the
ancestor-rebuild spine no longer copies anything.

This is safe because apply_rules_inner already works entirely by Id:
the field snapshot is cloned out before recursing, no &Node references
are held across mutations of the arena, and captures are scoped to a
single rule firing so the now-stable Ids do not break anything.

Memory effect: a desugaring pass that rewrites R leaves of a tree of
average depth d previously appended R*d ancestor clones to the arena.
Now appends 0.

With Ids stable for the lifetime of an Ast, the Node::id field becomes
truly redundant and is removed (along with the Node::id() accessor).
AstCursor switches from caching `node: &Node` to tracking `node_id:
Id` and looking the node up via the arena on each access; ChildrenIter
now yields Ids directly. A new AstCursor::node_id() method gives
callers access to the cursor position by Id.
2026-05-08 12:47:22 +00:00
Asger F
9a1c2da5d9 Fix clippy: inline variable in format string
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-08 14:22:01 +02:00
Owen Mansel-Chan
36554d160c Merge pull request #21741 from MarkLee131/fix/path-injection-read-subkind
Fix/path injection read subkind
2026-05-08 12:38:16 +01:00
Taus
5a4dee50f7 Merge pull request #21810 from github/tausbn/yeast-forward-scan-queries
yeast: Align query semantics more closely with tree-sitter
2026-05-08 13:30:43 +02:00
Asger F
638dc9380c Shared: Nicer panic message if node kind is missing
Still panics, just with a better message
2026-05-08 13:23:35 +02:00
Anders Schack-Mulligen
81e1ab7aab Merge pull request #21808 from aschackmull/cfg/switch-pattern-eval
Cfg: Rework CFG for switch case patterns.
2026-05-08 12:48:44 +02:00
Anders Schack-Mulligen
048411e168 Apply suggestions from code review
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
2026-05-08 08:11:32 +02:00
Taus
b027ac3658 Merge pull request #21809 from github/tausbn/yeast-add-support-for-desugaring-phases
Yeast: Two small improvements
2026-05-07 19:00:44 +02:00
Geoffrey White
36946313d9 Shared: Autoformat. 2026-05-07 17:21:13 +01:00
MarkLee131
26af52897d Merge branch 'main' into fix/path-injection-read-subkind 2026-05-07 23:48:42 +08:00
Taus
af6e921da5 yeast: Forward-scan bare child patterns instead of strict positional
Previously, a bare child pattern in a query took whatever the next
child of the iterator was and either matched or failed: it would not
scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose
implicit `child` field was `["bar", "baz"]` would fail (the pattern
took "bar" first).

Switch to forward-scan semantics: a SingleNode matcher advances through
the iterator until it finds a child that matches its sub-query. Patterns
that are named-only continue to skip past unnamed children for free.
Order is preserved across multiple bare patterns at the same level —
each pattern advances the shared iterator past whatever it consumed —
so a query cannot match children out of source order.

Captures from a failed match attempt are rolled back via a snapshot, so
partial captures from a complex sub-query do not leak across attempts.

Add two regression tests against the `do` body wrapper in a Ruby
for-loop, whose implicit `child` field contains [do, identifier, end]:
- a query for ("end") matches by skipping past `do` and the identifier
- a query for ("end") then ("do") fails, demonstrating order preservation
2026-05-07 15:08:22 +00:00
Taus
6f643a3604 yeast: Use canonical ID when registering unnamed kinds in Schema
Schema::from_language registered unnamed kinds via or_insert(id), where
`id` came from iterating 0..node_kind_count. For names with multiple
unnamed IDs (notably "end" in tree-sitter-ruby has IDs 0 and 13, where
ID 0 is the reserved error token), this picked the first encountered
ID — typically the wrong one.

The visitor sets node.kind via language.id_for_node_kind(name, false),
which returns the canonical ID. So a query for ("end") would compare
node.kind=13 against schema=0 and silently fail to match, with no
diagnostic.

Use language.id_for_node_kind(name, false) to obtain the canonical ID
when registering, mirroring the named-kind path that already does the
same with id_for_node_kind(name, true).
2026-05-07 15:08:21 +00:00
Taus
a4df96aad6 yeast: Support capturing unnamed nodes in queries
Three improvements to the query parser, all aimed at allowing query
patterns to refer to unnamed tokens:

1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token,
   matching the parenthesized form `("=") @op`. Previously the literal
   branch in parse_query_list skipped the maybe_wrap_capture call, so
   the `@op` was a leftover token and would error.

2. Bare `_` matches any node, named or unnamed. Previously bare `_` and
   `(_)` both produced QueryNode::Any with the same matches_named_only
   behaviour, so bare `_` would skip unnamed children. Now Any carries a
   match_unnamed flag: false for `(_)` (named-only, tree-sitter default)
   and true for bare `_` (any node).

3. Named fields and bare child patterns may be intermixed in any order.
   Previously, once parse_query_fields saw a bare pattern it would stop
   accepting named fields. The fix accumulates bare patterns into the
   implicit `child` field and keeps parsing.

Each named field independently selects its target field for matching, so
the source-order of fields in the query is purely cosmetic and intermixing
is safe.

Add tests covering parenthesized capture, bare-literal capture, and the
named-vs-any distinction between `(_)` and bare `_`. Update query-syntax
docs to reflect all three.
2026-05-07 15:08:21 +00:00
Geoffrey White
df37b50051 Shared: Small adjustment to the encrypt not-sensitive regex. 2026-05-07 14:22:31 +01:00
Paolo Tranquilli
f9e42ac443 Merge pull request #21794 from github/post-release-prep/codeql-cli-2.25.4
Post-release preparation for codeql-cli-2.25.4
2026-05-07 14:43:24 +02:00
copilot-swe-agent[bot]
e0d663f79b yeast: address review wording in phase docs
Agent-Logs-Url: https://github.com/github/codeql/sessions/6d23db05-a6e9-4de4-8951-b465980fd0ef

Co-authored-by: tausbn <1104778+tausbn@users.noreply.github.com>
2026-05-07 12:35:46 +00:00
Anders Schack-Mulligen
48785a0a76 Cfg: Rework CFG for switch case patterns. 2026-05-07 13:07:07 +02:00
MarkLee131
e8553c7449 Merge branch 'main' into fix/path-injection-read-subkind 2026-05-07 18:11:45 +08:00
Geoffrey White
809da0f8e7 Shared: Autoformat. 2026-05-07 10:01:56 +01:00
Taus
957c89b478 yeast: Support multi-phase desugaring via DesugaringConfig::add_phase
Extend the desugaring config from a single flat list of rules to an
ordered sequence of named Phases. Each phase runs to completion (a
full traversal applying its rules) before the next phase starts.
Rules in different phases never compete for matches.

The config is built via the new chainable API:

    DesugaringConfig::new()
        .add_phase("cleanup", cleanup_rules)
        .add_phase("desugar", desugar_rules)
        .with_output_node_types_yaml(yaml);

Single-phase configs are just .add_phase(...) called once.

A single FreshScope is shared across phases so generated identifier
names (e.g. $tmp-N) are unique throughout the run.

Phase names appear in error messages, e.g. "Phase `desugar`:
exceeded maximum rewrite depth".

Add two regression tests: one verifying basic two-phase chained
desugaring, and one verifying that errors include the failing phase
name.
2026-05-06 21:17:31 +00:00
Geoffrey White
f2f4f4cce3 Shared: Add 'security_code' sensitive data heuristic. 2026-05-06 14:48:55 +01:00
Geoffrey White
5ed78d1a4a Shared: Fix and simplify the exclusion for 'encrypted' values. 2026-05-06 14:43:52 +01:00
Geoffrey White
6e2fb6f0ff Shared: Fix for 'coauthor'. 2026-05-06 14:34:18 +01:00
Geoffrey White
213ab902cd Shared: Fix for 'api_tok'. 2026-05-06 14:34:15 +01:00