Compare commits

48 Commits

Author SHA1 Message Date
Taus
f75cf95db5 Apply rustfmt
Format the touched Rust crates (shared/tree-sitter-extractor,
shared/yeast, shared/yeast-macros, unified/extractor) so the
tree-sitter-extractor CI fmt check passes. No functional changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-25 12:26:52 +00:00
Taus
8a3bb915e4 unified/swift: Use tree! instead of ctx.node
Cleans up a few places where we were constructing trees piece by piece
rather than using the `tree!` macro.

In the process, Copilot noticed an issue that should probably be
addressed: the labeled_statement rule can never fire, since there are no
such nodes in the input. This is possibly as simple as making
_labeled_statement (which _does_ exist) named, but I haven't attempted
this.

Finally, a small change to yeast makes it so that the contents of a {}
interpolation can be a Rust block (previously it could only be a single
expression). This avoids the need to double-wrap instances where you
want to interpolate a single node produced as the final value of some
block.
2026-06-25 12:02:39 +00:00
Taus
ded5cc2901 unified/swift: Replace reduce_left with Rust helpers
(Both reduce_left and map are still supported, but we could remove them
at this point.)

I think this way of writing things makes the intent a lot clearer -- it
avoids extending the yeast rule language with complicated constructs,
pushing the complexity (such as it is) into Rust instead.
2026-06-25 12:02:39 +00:00
Taus
d9484e6196 unified/swift: Propagate property_declaration modifiers via context
Gets rid of the final uses of mutation (via prepend_field). The approach
is the same as in the preceding commits: we set the appropriate fields
on the context when processing the outer node, and then access these
fields on the inner nodes.

The repeated use of `modifier` fields is a _bit_ clunky, but since we're
likely moving to an out-of-band modifier mechanism at some point, I
think it's good enough for now.
2026-06-25 12:02:39 +00:00
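To illustrate the context-based approach used in d9484e6196 and the preceding commits, here is a minimal Rust sketch. It is not the actual yeast or Swift rule code: the fields of `SwiftContext` and the helpers `translate_property` and `translate_binding` are hypothetical (only the name `SwiftContext` appears in the commits), and plain strings stand in for tree nodes. The outer node records its modifiers on the context, and each inner node reads them while it is being built, so nothing is mutated after creation.

```rust
/// Hypothetical context; the real `SwiftContext` may carry other fields.
#[derive(Debug, Default)]
pub struct SwiftContext {
    /// Modifiers recorded while processing the outer declaration.
    pub modifiers: Vec<String>,
}

/// Translates one inner binding, reading the modifiers from the context.
pub fn translate_binding(ctx: &SwiftContext, name: &str) -> String {
    let mut parts: Vec<&str> = ctx.modifiers.iter().map(String::as_str).collect();
    parts.push(name);
    parts.join(" ")
}

/// Translates the outer declaration: sets the context once, then translates
/// every inner binding against it.
pub fn translate_property(modifiers: &[&str], bindings: &[&str]) -> Vec<String> {
    let ctx = SwiftContext {
        modifiers: modifiers.iter().map(|m| m.to_string()).collect(),
    };
    bindings
        .iter()
        .map(|name| translate_binding(&ctx, name))
        .collect()
}
```

Each output value is complete at the moment it is created, which is the property the prepend_field-based version lacked.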
Taus
7793702e8a unified/swift: Propagate enum_entry outer modifiers via context
As in the preceding commit, we added a test for this syntax beforehand and
verified that its output was unchanged by the cleanup in this commit.
2026-06-25 12:02:39 +00:00
Taus
4730898b2f unified/swift: Translate protocol properties using context
Avoids more "mutation after creation" via prepend_field.

Also adds a test to the corpus for exercising this syntax. Although it's
not evident, the test output was unchanged by this refactoring.
2026-06-25 12:02:39 +00:00
Taus
86feaeff4e unified/swift: Propagate parameter default values via context
Extends the context with a field for keeping track of the default value.

In the process, we also rename the context to SwiftContext, as it is no
longer concerned only with properties.
2026-06-25 12:02:39 +00:00
Taus
0c85c31129 yeast: Simplify Swift rules using the new machinery
Propagates name and type information into various property
declarations, using the context mechanism. This avoids mutating
already-translated nodes in-place, and is generally much easier to read.
2026-06-25 12:02:39 +00:00
Taus
919c5b8c53 yeast: Hide desugaring behind Desugarer trait
This was necessary since otherwise the generic type of the
user-specified context (which should only be a concern for yeast) starts
to bleed out into the shared extractor. Instead, we type-erase it by
putting it inside the aforementioned trait.
2026-06-25 12:02:39 +00:00
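The type erasure described in 919c5b8c53 can be sketched as follows. This is an illustration only, not the yeast implementation: `Runner<C>`, the single `run` method, the string-based "tree", and `boxed_swift_desugarer` are assumptions made for the example (the actual trait exposes other methods, such as `run_from_tree`). The point is that the generic context type `C` appears only inside the implementation, while shared code sees just `dyn Desugarer`.

```rust
use std::fmt::Debug;

/// Hypothetical runner that is generic in a user-defined context type.
pub struct Runner<C> {
    user_ctx: C,
}

/// Object-safe trait exposing only context-independent operations.
pub trait Desugarer {
    fn run(&self, source: &str) -> Result<String, String>;
}

impl<C: Debug> Desugarer for Runner<C> {
    fn run(&self, source: &str) -> Result<String, String> {
        if source.is_empty() {
            return Err("empty input".to_string());
        }
        // The context type is used here, but never leaks into the signature.
        Ok(format!("{source} [ctx: {:?}]", self.user_ctx))
    }
}

/// Hypothetical user context.
#[derive(Debug, Default)]
pub struct SwiftContext {
    pub default_value: Option<String>,
}

/// Shared code: note that no generic parameter appears in this signature.
pub fn extract(source: &str, desugarer: Option<&dyn Desugarer>) -> Result<String, String> {
    match desugarer {
        Some(d) => d.run(source),
        None => Ok(source.to_string()),
    }
}

/// Builds a boxed desugarer, hiding the concrete `Runner<SwiftContext>`.
pub fn boxed_swift_desugarer() -> Box<dyn Desugarer> {
    Box::new(Runner {
        user_ctx: SwiftContext::default(),
    })
}
```

A caller holding a `Box<dyn Desugarer>` can pass `Some(&*boxed)` to `extract`, much as the diff passes `lang.desugar.as_deref()`.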
Taus
c39bfa555d yeast: Add macro for fine-grained rules
Adds `manual_rule!` which provides a more low-level interface for
defining rewrites. (I'm not entirely sold on the name, so any
suggestions would be welcome.)

Notably, the captures bound in the body of such rules have _not_ been
translated yet -- they still come from the _input_ tree. It is the
user's duty to call ctx.translate on these (which has the effect of
recursively invoking the translation) before substituting them into the
output.

For _truly_ low-level access, the user can still construct a Rule
directly, but this is now somewhat cumbersome as the closure contained
therein takes quite a few parameters. Still, the possibility remains.
2026-06-25 12:02:39 +00:00
Taus
03350bf8d7 yeast: Pass raw captures to Rule::new rules
This enables users to specify how and when these captures get
translated. In conjunction with the context mechanism, this can be used
to, for instance, translate some piece of information (e.g. the type of
something), record it in the context, and then recursively translate
some other capture that relies on this information. This allows
information to be cleanly passed into descendants (which can be written
using context accesses in the `rule!` macro form).

As a consequence of this change, we now need to pass around a
TranslatorHandle to perform the manual translation. For Repeating rules,
it doesn't really make sense to translate things, so in this case we
simply signal an error.

Also, the implementation of the `rule!` macro changes slightly (without
changing semantics): it now essentially delegates to `Rule::new`,
receiving raw captures, but then immediately applies the translation to
those captures (which, for the majority of cases, is likely the desired
behaviour).
2026-06-25 12:02:39 +00:00
Taus
d38ffe0ad5 yeast: Make transforms return Result
This will enable us to actually capture and log errors in complicated
rules (e.g. ones written in Rust) rather than just panicking.
2026-06-25 12:02:38 +00:00
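As a rough illustration of why d38ffe0ad5 makes transforms return `Result`, the sketch below collects failures so they can be logged, instead of panicking on the first one. `rename_kind` and `rename_all` are hypothetical helpers operating on plain strings; they are not part of yeast.

```rust
/// Hypothetical fallible transform: returns an error instead of panicking
/// when no rule applies.
pub fn rename_kind(kind: &str) -> Result<String, String> {
    match kind {
        "binding_pattern" => Ok("name_pattern".to_string()),
        "case_pattern" => Ok("constructor_pattern".to_string()),
        other => Err(format!("no rule for node kind `{other}`")),
    }
}

/// Applies the transform to every kind, gathering successes and errors
/// separately so the caller can log the errors and carry on.
pub fn rename_all(kinds: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut renamed = Vec::new();
    let mut errors = Vec::new();
    for kind in kinds {
        match rename_kind(kind) {
            Ok(k) => renamed.push(k),
            Err(e) => errors.push(e),
        }
    }
    (renamed, errors)
}
```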
Taus
d6373eaef7 yeast: Reify the context and allow user-defined data in it
Renames what was previously called `__yeast_ctx` into just `ctx`, and
adds a new field `user_ctx` to this context. Said field can contain a
struct of any user type (necessitating making various parts of the
implementation generic in said type).

Through some Deref magic, field accesses are delegated to the inner
struct (assuming they are not already defined on `ctx`), which should
hopefully make the interface a bit more ergonomic.
2026-06-25 12:02:38 +00:00
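The "Deref magic" mentioned in d6373eaef7 presumably works along the lines of the sketch below. The `Context` struct, its `depth` field, and the `SwiftContext` payload are assumptions for illustration; only the `user_ctx` field name comes from the commit message.

```rust
use std::ops::Deref;

/// Hypothetical framework context, generic in a user-defined payload.
pub struct Context<U> {
    /// Framework-owned state (illustrative).
    pub depth: usize,
    /// User-supplied data of any type.
    pub user_ctx: U,
}

impl<U> Deref for Context<U> {
    type Target = U;

    fn deref(&self) -> &U {
        &self.user_ctx
    }
}

/// Hypothetical user context.
pub struct SwiftContext {
    pub type_name: String,
}

pub fn describe(ctx: &Context<SwiftContext>) -> String {
    // `depth` is found on `Context` itself. `type_name` is not, so the
    // compiler auto-derefs to `SwiftContext` and finds it there.
    format!("{}@{}", ctx.type_name, ctx.depth)
}
```

Because fields defined directly on `Context` are resolved first, a user field with the same name would be shadowed and would have to be reached via `ctx.user_ctx` explicitly.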
Asger F
89cd6770ae Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-25 13:18:27 +02:00
Asger F
66c1f037f5 Add TODO 2026-06-19 12:19:51 +02:00
Asger F
2675070291 unified/swift: Clean up translation of patterns
Patterns have an unusual parse tree, but now the matching should
at least be a bit easier to follow.

The TODO regarding not being able to pass down context to handle
var/let is still relevant, and can't be solved in the mapping alone.
2026-06-19 11:35:06 +02:00
Asger F
c01264d05c Coerce pattern_element.key to be an identifier 2026-06-19 10:31:34 +02:00
Asger F
63e1cc90e9 Test: add corpus test for switch case patterns with labeled arguments
Adds a test case 'Switch with labeled case pattern arguments' covering:
- case .implicit(isAcknowledged: false) — labeled bool literal
- case .thread(threadRowId: _, let rowId) — labeled wildcard + binding

The current output contains type errors: pattern_element::key is being
produced as name_expr instead of identifier. These will be fixed in the
following commit.
2026-06-19 10:27:20 +02:00
Asger F
2182265120 unified/swift: Better source range for inferred_type_expr 2026-06-18 14:57:55 +02:00
Asger F
0b666d47db Preserve the dot token in case patterns 2026-06-18 14:55:54 +02:00
Asger F
142ac47166 Refactor: map switch case patterns to constructor_pattern instead of tuple_pattern
Changed the desugaring rules to properly map case patterns with binding (e.g.,
'case .circle(let r):') to constructor_pattern nodes instead of tuple_pattern.

New rules added:
- tuple_pattern_item → pattern_element (preserves optional name/key)
- pattern.kind: binding_pattern → name_pattern (extracts bound identifier)
- pattern.kind: case_pattern → constructor_pattern (creates proper constructor
  with bound arguments as pattern_elements)

This provides a more semantically correct AST representation:
- Constructor name: name_expr identifier 'circle'
- Elements: pattern_element containing name_pattern identifier 'r'

Instead of the previous tuple_pattern string representation.

Updated control-flow.txt corpus outputs.
2026-06-18 14:54:59 +02:00
Asger F
2470c1388a Fix: preserve switch case patterns in desugared output
The switch_entry rule was capturing switch_pattern wrapper nodes instead of
drilling into them to extract the actual pattern nodes. This caused patterns
from switch cases to be lost during desugaring.

Changed the pattern match from:
  (switch_entry pattern: (switch_pattern)* @pats ...)
to:
  (switch_entry pattern: (switch_pattern pattern: @pats)* ...)

This now correctly extracts the pattern field from each switch_pattern node,
ensuring that patterns from cases like 'case 1:' and 'case .circle(let r):'
are preserved in the switch_case AST nodes.

Updated control-flow.txt corpus outputs to reflect the new behavior.
2026-06-18 14:37:42 +02:00
Asger F
fa98557dd9 Update QL test output 2026-06-18 14:26:49 +02:00
Asger F
1e167dfa6b unified/swift: add type and declaration-family mappings 2026-06-18 14:26:47 +02:00
Asger F
f362707493 unified/swift: Imports 2026-06-18 14:26:45 +02:00
Asger F
15208b70aa Unified: Add import_declaration.scoped_import_kind 2026-06-18 14:26:43 +02:00
Asger F
3522f35ab2 unified/swift: add collections, optionals/errors 2026-06-18 14:26:42 +02:00
Asger F
938396a751 unified/swift: add control-flow and loop mappings 2026-06-18 14:26:40 +02:00
Asger F
790d4f11be unified/swift: add closure and capture mappings 2026-06-18 14:26:38 +02:00
Asger F
8f747a355c unified/swift: add function and parameter mappings 2026-06-18 14:26:37 +02:00
Asger F
d17fd2d964 unified/swift: add variable/property/accessor and enum mappings 2026-06-18 14:26:35 +02:00
Asger F
4e9c3fb436 unified/swift: add literals, names, and operator expression mappings 2026-06-18 14:26:33 +02:00
Asger F
0e9d17b59c unified/swift: add top-level normalization and fallback scaffold 2026-06-18 14:26:31 +02:00
Asger F
6c74cd31e4 Yeast: use child locations instead of rule target
Previously, when a node was synthesized it would always take the
location from the node that matched the current rule. However, this
resulted in overly broad locations.

For (foo #{bar}) we now take the location of the 'bar' node.

For a non-leaf node we merge the locations of all its child nodes.
2026-06-18 14:26:30 +02:00
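The location merging from 6c74cd31e4 can be sketched as below, assuming that "merge" means taking the smallest range covering all children (the commit message does not spell this out). `Span` and `merge_spans` are hypothetical names using byte offsets; the real code works on yeast's own location type.

```rust
/// Hypothetical source range in byte offsets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Returns the smallest span covering all child spans, or `None` when
/// there are no children to take a location from.
pub fn merge_spans(children: &[Span]) -> Option<Span> {
    children.iter().copied().reduce(|a, b| Span {
        start: a.start.min(b.start),
        end: a.end.max(b.end),
    })
}
```

A synthesized node with no children would still need some fallback location, such as that of the node matching the rule; the sketch leaves that choice to the caller.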
Asger F
166406acbb Unified: Elaborate a bit more on AGENTS.md 2026-06-18 14:26:28 +02:00
Asger F
b40cb5dedd Regenerate QL 2026-06-18 14:26:26 +02:00
Asger F
6dd7dedc19 Rewrite AST 2026-06-18 14:26:22 +02:00
Asger F
1d8e682e5f Reset mappings 2026-06-15 10:49:37 +02:00
Asger F
0baa126473 Add ability to prepend fields in Yeast 2026-06-15 10:49:35 +02:00
Asger F
d11b428292 yeast-macros: desugar 'field: @cap' to 'field: _ @cap'
When a field pattern has a bare capture with no preceding pattern
atom (i.e. `foo: @bar`), implicitly use a true wildcard (`_`,
match_unnamed: true) as the node pattern, making it equivalent to
`foo: _ @bar`.

This is a convenience shorthand: in practice every `field: _ @cap`
in the Swift rules can now be written more concisely as `field: @cap`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 10:49:33 +02:00
Asger F
ddc9516e92 Yeast: better support for rewriting unnamed nodes
- Ensure the full wildcard _ supports quantifiers
- Also rewrite unnamed nodes in one-shot phases
2026-06-15 10:49:31 +02:00
Asger F
00068948c1 yeast-macros: add .reduce_left(first -> init, acc, elem -> fold) chain
A left fold over an iterable where the first element seeds the accumulator:
- first -> init  : converts the first element to the initial accumulator
- acc, elem -> fold : fold step; acc = current accumulator, elem = next element
- Empty iterable produces nothing (0-element splice)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 10:49:29 +02:00
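The semantics listed for `.reduce_left` in 00068948c1 correspond to the following plain-Rust helper. This is a sketch of the fold itself, not of the macro: the function signature is an assumption, and the "0-element splice" for an empty iterable is modelled here as `None`.

```rust
/// Left fold in which the first element seeds the accumulator.
///
/// * `init` converts the first element into the initial accumulator.
/// * `fold` combines the current accumulator with the next element.
/// * An empty input yields `None`.
pub fn reduce_left<T, A>(
    items: impl IntoIterator<Item = T>,
    init: impl FnOnce(T) -> A,
    fold: impl FnMut(A, T) -> A,
) -> Option<A> {
    let mut iter = items.into_iter();
    let first = iter.next()?;
    Some(iter.fold(init(first), fold))
}
```

For example, folding `["a", "b", "c"]` with a step that builds `(acc.elem)` nests to the left, giving `((a.b).c)`. This is roughly the kind of Rust helper that ded5cc2901 uses in place of the macro-level construct.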
Asger F
28c879f58c yeast-macros: add .map(p -> tpl) chain syntax for tree templates
After a {expr} or {..expr} placeholder, an optional chain of
.<builtin>() calls may follow. Currently the only builtin is:

  .map(param -> template)

which applies the template to each element of the iterable and
collects the resulting node IDs. A chain auto-splices into the
enclosing field/child position.

Example:
  path: {parts}.map(p -> (identifier #{p}))

The framework is extensible: additional builtins can be added by
matching on the method name in parse_chain_suffix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 10:49:27 +02:00
Asger F
6000c18c24 Unified: also QLDoc for unified.qll 2026-06-12 16:48:25 +02:00
Asger F
e81a3bcbc3 Unified: Add QLDoc 2026-06-12 16:47:06 +02:00
Asger F
7d6d5bfb4a Unified: add test for comments 2026-06-12 16:36:33 +02:00
Asger F
f83adb55ce Unified: regenerate AST 2026-06-12 16:33:51 +02:00
Asger F
5608369abe Extract trivia tokens from original parse tree 2026-06-12 16:32:57 +02:00
49 changed files with 7067 additions and 1201 deletions

@@ -1,22 +0,0 @@
using System.Net.Http;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;
public class CertificateValidation
{
public void Bad()
{
var handler = new HttpClientHandler();
// BAD: the callback always returns true, so every certificate is trusted.
handler.ServerCertificateCustomValidationCallback =
(request, certificate, chain, errors) => true;
}
public void Good()
{
var handler = new HttpClientHandler();
// GOOD: the certificate is only trusted when there are no validation errors.
handler.ServerCertificateCustomValidationCallback =
(request, certificate, chain, errors) => errors == SslPolicyErrors.None;
}
}

@@ -1,52 +0,0 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
A TLS/SSL certificate validation callback that always returns <code>true</code> trusts every certificate,
regardless of any validation errors that were detected. This allows an attacker to perform a machine-in-the-middle
attack against the application, therefore breaking any security that Transport Layer Security (TLS) provides.
</p>
<p>
An attack might look like this:
</p>
<ol>
<li>The vulnerable program connects to <code>https://example.com</code>.</li>
<li>The attacker intercepts this connection and presents a valid, self-signed certificate for <code>https://example.com</code>.</li>
<li>The vulnerable program calls the certificate validation callback to check whether it should trust the certificate.</li>
<li>The callback ignores the <code>SslPolicyErrors</code> argument and returns <code>true</code>.</li>
<li>The vulnerable program accepts the certificate and proceeds with the connection, since the callback indicated that the certificate is trusted.</li>
<li>The attacker can now read the data the program sends to <code>https://example.com</code> and/or alter its replies while the program thinks the connection is secure.</li>
</ol>
</overview>
<recommendation>
<p>
Do not use a certificate validation callback that unconditionally returns <code>true</code>.
Either rely on the default certificate validation, or implement a callback that inspects the
<code>SslPolicyErrors</code> argument and only trusts a specific, known certificate (for example, when
using a self-signed certificate that has been explicitly pinned).
</p>
</recommendation>
<example>
<p>
In the first (bad) example, the callback always returns <code>true</code> and therefore trusts any certificate,
which allows an attacker to perform a machine-in-the-middle attack. In the second (good) example, the callback
returns <code>true</code> only when there are no validation errors.
</p>
<sample src="AcceptAnyCertificate.cs" />
</example>
<references>
<li>Microsoft Learn:
<a href="https://learn.microsoft.com/en-us/dotnet/api/system.net.security.remotecertificatevalidationcallback">RemoteCertificateValidationCallback Delegate</a>.</li>
<li>Microsoft Learn:
<a href="https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca5359">CA5359: Do not disable certificate validation</a>.</li>
<li>OWASP:
<a href="https://owasp.org/www-community/attacks/Manipulator-in-the-middle_attack">Manipulator-in-the-middle attack</a>.</li>
</references>
</qhelp>

@@ -1,101 +0,0 @@
/**
* @name Accepting any TLS certificate during validation
* @description A certificate validation callback that always accepts any certificate
* allows an attacker to perform a machine-in-the-middle attack.
* @kind path-problem
* @problem.severity error
* @security-severity 7.5
* @precision high
* @id cs/accept-any-certificate
* @tags security
* external/cwe/cwe-295
*/
import csharp
import semmle.code.csharp.dataflow.DataFlow::DataFlow
import AcceptAnyCertificate::PathGraph
/**
* Holds if `c` always returns `true` and never returns `false`, i.e. it accepts
* every input it is given.
*/
predicate alwaysReturnsTrue(Callable c) {
c.getReturnType() instanceof BoolType and
// There is at least one returned value, and every returned value is the
// constant `true`.
forex(Expr ret | c.canReturn(ret) | ret.getValue() = "true")
}
/**
* A delegate type used as a TLS/SSL certificate validation callback. Such a
* delegate returns a `bool` (whether the certificate is trusted) and takes a
* `System.Net.Security.SslPolicyErrors` parameter describing any validation
* errors that were found. This covers `RemoteCertificateValidationCallback` as
* well as the `Func<..., SslPolicyErrors, bool>` callbacks used by, for example,
* `HttpClientHandler.ServerCertificateCustomValidationCallback`.
*/
class CertificateValidationCallbackType extends DelegateType {
CertificateValidationCallbackType() {
this.getReturnType() instanceof BoolType and
this.getAParameter().getType().hasFullyQualifiedName("System.Net.Security", "SslPolicyErrors")
}
}
/**
* Gets a callable that always accepts any certificate, referenced by the
* delegate-producing expression `e`.
*/
Callable getAcceptingCallable(Expr e) {
// A lambda or anonymous method, e.g. `(sender, cert, chain, errors) => true`.
result = e and
alwaysReturnsTrue(e)
or
// A method group, e.g. `AcceptAllCertificates`, possibly wrapped in an
// (implicit or explicit) delegate creation.
result = e.(DelegateCreation).getArgument().(CallableAccess).getTarget() and
alwaysReturnsTrue(result)
or
result = e.(CallableAccess).getTarget() and
alwaysReturnsTrue(result)
}
module AcceptAnyCertificateConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
exists(getAcceptingCallable(source.asExpr()))
or
// `HttpClientHandler.DangerousAcceptAnyServerCertificateValidator` is a
// built-in callback that accepts every certificate.
source
.asExpr()
.(PropertyAccess)
.getTarget()
.hasName("DangerousAcceptAnyServerCertificateValidator")
}
predicate isSink(DataFlow::Node sink) {
// The value assigned to a property, field or local of certificate
// validation callback type.
exists(Assignable a |
a.getType() instanceof CertificateValidationCallbackType and
sink.asExpr() = a.getAnAssignedValue()
)
or
// The value passed as a certificate validation callback argument, e.g. to
// the `SslStream` constructor.
exists(Call call, Parameter p |
p = call.getTarget().getAParameter() and
p.getType() instanceof CertificateValidationCallbackType and
sink.asExpr() = call.getArgumentForParameter(p)
)
}
predicate observeDiffInformedIncrementalMode() { any() }
}
module AcceptAnyCertificate = DataFlow::Global<AcceptAnyCertificateConfig>;
from AcceptAnyCertificate::PathNode source, AcceptAnyCertificate::PathNode sink
where AcceptAnyCertificate::flowPath(source, sink)
select sink.getNode(), source, sink,
"This TLS certificate validation $@, which trusts any certificate.", source.getNode(),
"uses a callback"

@@ -1,4 +0,0 @@
---
category: newQuery
---
* Added a new query, `cs/accept-any-certificate`, to detect TLS/SSL certificate validation callbacks that always accept any certificate (CWE-295).

@@ -1,24 +0,0 @@
edges
| Test.cs:64:45:64:52 | access to local variable callback : (...) => ... | Test.cs:67:48:67:55 | access to local variable callback | provenance | |
| Test.cs:65:13:65:56 | (...) => ... : (...) => ... | Test.cs:64:45:64:52 | access to local variable callback : (...) => ... | provenance | |
nodes
| Test.cs:14:13:14:57 | (...) => ... | semmle.label | (...) => ... |
| Test.cs:22:13:25:13 | (...) => ... | semmle.label | (...) => ... |
| Test.cs:33:13:33:74 | access to property DangerousAcceptAnyServerCertificateValidator | semmle.label | access to property DangerousAcceptAnyServerCertificateValidator |
| Test.cs:40:13:40:56 | (...) => ... | semmle.label | (...) => ... |
| Test.cs:52:67:52:75 | delegate creation of type RemoteCertificateValidationCallback | semmle.label | delegate creation of type RemoteCertificateValidationCallback |
| Test.cs:59:13:59:56 | (...) => ... | semmle.label | (...) => ... |
| Test.cs:64:45:64:52 | access to local variable callback : (...) => ... | semmle.label | access to local variable callback : (...) => ... |
| Test.cs:65:13:65:56 | (...) => ... | semmle.label | (...) => ... |
| Test.cs:65:13:65:56 | (...) => ... : (...) => ... | semmle.label | (...) => ... : (...) => ... |
| Test.cs:67:48:67:55 | access to local variable callback | semmle.label | access to local variable callback |
subpaths
#select
| Test.cs:14:13:14:57 | (...) => ... | Test.cs:14:13:14:57 | (...) => ... | Test.cs:14:13:14:57 | (...) => ... | This TLS certificate validation $@, which trusts any certificate. | Test.cs:14:13:14:57 | (...) => ... | uses a callback |
| Test.cs:22:13:25:13 | (...) => ... | Test.cs:22:13:25:13 | (...) => ... | Test.cs:22:13:25:13 | (...) => ... | This TLS certificate validation $@, which trusts any certificate. | Test.cs:22:13:25:13 | (...) => ... | uses a callback |
| Test.cs:33:13:33:74 | access to property DangerousAcceptAnyServerCertificateValidator | Test.cs:33:13:33:74 | access to property DangerousAcceptAnyServerCertificateValidator | Test.cs:33:13:33:74 | access to property DangerousAcceptAnyServerCertificateValidator | This TLS certificate validation $@, which trusts any certificate. | Test.cs:33:13:33:74 | access to property DangerousAcceptAnyServerCertificateValidator | uses a callback |
| Test.cs:40:13:40:56 | (...) => ... | Test.cs:40:13:40:56 | (...) => ... | Test.cs:40:13:40:56 | (...) => ... | This TLS certificate validation $@, which trusts any certificate. | Test.cs:40:13:40:56 | (...) => ... | uses a callback |
| Test.cs:52:67:52:75 | delegate creation of type RemoteCertificateValidationCallback | Test.cs:52:67:52:75 | delegate creation of type RemoteCertificateValidationCallback | Test.cs:52:67:52:75 | delegate creation of type RemoteCertificateValidationCallback | This TLS certificate validation $@, which trusts any certificate. | Test.cs:52:67:52:75 | delegate creation of type RemoteCertificateValidationCallback | uses a callback |
| Test.cs:59:13:59:56 | (...) => ... | Test.cs:59:13:59:56 | (...) => ... | Test.cs:59:13:59:56 | (...) => ... | This TLS certificate validation $@, which trusts any certificate. | Test.cs:59:13:59:56 | (...) => ... | uses a callback |
| Test.cs:65:13:65:56 | (...) => ... | Test.cs:65:13:65:56 | (...) => ... | Test.cs:65:13:65:56 | (...) => ... | This TLS certificate validation $@, which trusts any certificate. | Test.cs:65:13:65:56 | (...) => ... | uses a callback |
| Test.cs:67:48:67:55 | access to local variable callback | Test.cs:65:13:65:56 | (...) => ... : (...) => ... | Test.cs:67:48:67:55 | access to local variable callback | This TLS certificate validation $@, which trusts any certificate. | Test.cs:65:13:65:56 | (...) => ... | uses a callback |

@@ -1 +0,0 @@
Security Features/CWE-295/AcceptAnyCertificate.ql

@@ -1,89 +0,0 @@
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;
public class CertificateValidationTests
{
public void HttpClientHandlerBad()
{
var handler = new HttpClientHandler();
// BAD: always trusts any certificate.
handler.ServerCertificateCustomValidationCallback =
(request, certificate, chain, errors) => true;
}
public void HttpClientHandlerBlockBodyBad()
{
var handler = new HttpClientHandler();
// BAD: always trusts any certificate.
handler.ServerCertificateCustomValidationCallback =
(request, certificate, chain, errors) =>
{
return true;
};
}
public void HttpClientHandlerDangerousBad()
{
var handler = new HttpClientHandler();
// BAD: built-in callback that accepts any certificate.
handler.ServerCertificateCustomValidationCallback =
HttpClientHandler.DangerousAcceptAnyServerCertificateValidator;
}
public void ServicePointManagerBad()
{
// BAD: always trusts any certificate.
ServicePointManager.ServerCertificateValidationCallback =
(sender, certificate, chain, errors) => true;
}
private static bool AcceptAll(object sender, X509Certificate certificate, X509Chain chain,
SslPolicyErrors errors)
{
return true;
}
public void MethodGroupBad()
{
// BAD: the referenced method always returns true.
ServicePointManager.ServerCertificateValidationCallback = AcceptAll;
}
public void SslStreamBad(Stream stream)
{
// BAD: the validation callback always returns true.
var ssl = new SslStream(stream, false,
(sender, certificate, chain, errors) => true);
}
public void IndirectBad(Stream stream)
{
RemoteCertificateValidationCallback callback =
(sender, certificate, chain, errors) => true;
// BAD: the callback flowing here always returns true.
var ssl = new SslStream(stream, false, callback);
}
public void HttpClientHandlerGood()
{
var handler = new HttpClientHandler();
// GOOD: the certificate is only trusted when there are no validation errors.
handler.ServerCertificateCustomValidationCallback =
(request, certificate, chain, errors) => errors == SslPolicyErrors.None;
}
private static bool Validate(object sender, X509Certificate certificate, X509Chain chain,
SslPolicyErrors errors)
{
return errors == SslPolicyErrors.None;
}
public void MethodGroupGood()
{
// GOOD: the referenced method performs real validation.
ServicePointManager.ServerCertificateValidationCallback = Validate;
}
}

@@ -1,2 +0,0 @@
semmle-extractor-options: /nostdlib /noconfig
semmle-extractor-options: --load-sources-from-project:${testdir}/../../../../resources/stubs/_frameworks/Microsoft.NETCore.App/Microsoft.NETCore.App.csproj

@@ -280,10 +280,11 @@ pub fn location_label(writer: &mut trap::Writer, location: trap::Location) -> tr
}
/// Extracts the source file at `path`, which is assumed to be canonicalized.
-/// When `yeast_runner` is `Some`, the parsed tree is first transformed
-/// through the supplied yeast `Runner` before TRAP extraction. Building the
-/// `Runner` (which parses YAML and constructs the schema) is the caller's
-/// responsibility, allowing it to be done once and shared across files.
+/// When `desugarer` is `Some`, the parsed tree is first transformed
+/// through the supplied yeast desugarer before TRAP extraction. Building
+/// the desugarer (which parses YAML and constructs the schema) is the
+/// caller's responsibility, allowing it to be done once and shared across
+/// files.
#[allow(clippy::too_many_arguments)]
pub fn extract(
language: &Language,
@@ -295,7 +296,7 @@ pub fn extract(
path: &Path,
source: &[u8],
ranges: &[Range],
-yeast_runner: Option<&yeast::Runner<'_>>,
+desugarer: Option<&dyn yeast::Desugarer>,
) {
let path_str = file_paths::normalize_and_transform_path(path, transformer);
let source_root = std::env::current_dir()
@@ -328,11 +329,14 @@ pub fn extract(
schema,
);
-if let Some(yeast_runner) = yeast_runner {
-let ast = yeast_runner
+if let Some(desugarer) = desugarer {
+let ast = desugarer
.run_from_tree(&tree, source)
.unwrap_or_else(|e| panic!("Desugaring failed for {path_str}: {e}"));
traverse_yeast(&ast, &mut visitor);
+// Comments and other `extra` nodes are not represented in the desugared
+// AST, so recover them directly from the original parse tree.
+traverse_extras(&tree, &mut visitor);
} else {
traverse(&tree, &mut visitor);
}
@@ -365,6 +369,8 @@ struct Visitor<'a> {
ast_node_parent_table_name: String,
/// Language-specific name of the tokeninfo table
tokeninfo_table_name: String,
+/// Language-specific name of the trivia tokeninfo table
+trivia_tokeninfo_table_name: String,
/// A lookup table from type name to node types
schema: &'a NodeTypeMap,
/// A stack for gathering information from child nodes. Whenever a node is
@@ -395,11 +401,33 @@ impl<'a> Visitor<'a> {
ast_node_location_table_name: format!("{language_prefix}_ast_node_location"),
ast_node_parent_table_name: format!("{language_prefix}_ast_node_parent"),
tokeninfo_table_name: format!("{language_prefix}_tokeninfo"),
+trivia_tokeninfo_table_name: format!("{language_prefix}_trivia_tokeninfo"),
schema,
stack: Vec::new(),
}
}
+/// Emits a `TriviaToken` for the given `extra` node (e.g. a comment) from
+/// the original parse tree. Trivia tokens carry a location and their source
+/// text, but are not attached to a parent in the (possibly desugared) AST.
+fn emit_trivia_token(&mut self, node: &Node) {
+let id = self.trap_writer.fresh_id();
+let loc = location_for(self, self.file_label, node);
+let loc_label = location_label(self.trap_writer, loc);
+self.trap_writer.add_tuple(
+&self.ast_node_location_table_name,
+vec![trap::Arg::Label(id), trap::Arg::Label(loc_label)],
+);
+self.trap_writer.add_tuple(
+&self.trivia_tokeninfo_table_name,
+vec![
+trap::Arg::Label(id),
+trap::Arg::Int(node.kind_id() as usize),
+sliced_source_arg(self.source, node),
+],
+);
+}
fn record_parse_error(&mut self, loc: trap::Label, mesg: &diagnostics::DiagnosticMessage) {
self.diagnostics_writer.write(mesg);
let id = self.trap_writer.fresh_id();
@@ -835,6 +863,24 @@ fn traverse(tree: &Tree, visitor: &mut Visitor) {
}
}
+/// Walks the original tree-sitter tree and emits a `TriviaToken` for every
+/// `extra` node (e.g. a comment). Used to preserve comments that would
+/// otherwise be lost after a desugaring pass rewrites the tree.
+fn traverse_extras(tree: &Tree, visitor: &mut Visitor) {
+emit_extras_in(visitor, tree.root_node());
+}
+fn emit_extras_in(visitor: &mut Visitor, node: Node<'_>) {
+let mut cursor = node.walk();
+for child in node.children(&mut cursor) {
+if child.is_extra() {
+visitor.emit_trivia_token(&child);
+} else {
+emit_extras_in(visitor, child);
+}
+}
+}
fn traverse_yeast(tree: &yeast::Ast, visitor: &mut Visitor) {
use yeast::Cursor;
let mut cursor = tree.walk();

@@ -13,11 +13,14 @@ pub struct LanguageSpec {
pub prefix: &'static str,
pub ts_language: tree_sitter::Language,
pub node_types: &'static str,
-/// Optional yeast desugaring configuration. When set, the parsed
-/// tree is rewritten through yeast before TRAP extraction. The
-/// config's `output_node_types_yaml` (if set) provides the schema
-/// used both at runtime (for the rewriter) and for TRAP validation.
-pub desugar: Option<yeast::DesugaringConfig>,
+/// Optional desugarer. When set, the parsed tree is rewritten through
+/// the desugarer before TRAP extraction. The desugarer's
+/// `output_node_types_yaml()` (if set) provides the schema used both
+/// at runtime (for the rewriter) and for TRAP validation.
+///
+/// `Box<dyn yeast::Desugarer>` so the shared extractor is agnostic to
+/// the user-defined context type the desugarer uses internally.
+pub desugar: Option<Box<dyn yeast::Desugarer>>,
pub file_globs: Vec<String>,
}
@@ -91,35 +94,22 @@ impl Extractor {
.collect();
let mut schemas = vec![];
-let mut yeast_runners = Vec::new();
 for lang in &self.languages {
-let effective_node_types: String =
-match lang.desugar.as_ref().and_then(|c| c.output_node_types_yaml) {
-Some(yaml) => yeast::node_types_yaml::convert(yaml).map_err(|e| {
-std::io::Error::other(format!(
-"Failed to convert YAML node-types to JSON for {}: {e}",
-lang.prefix
-))
-})?,
-None => lang.node_types.to_string(),
-};
-let schema = node_types::read_node_types_str(lang.prefix, &effective_node_types)?;
-schemas.push(schema);
-// Build the yeast runner once per language so the YAML schema
-// isn't re-parsed for every file.
-let yeast_runner = lang
+let effective_node_types: String = match lang
 .desugar
 .as_ref()
-.map(|config| yeast::Runner::from_config(lang.ts_language.clone(), config))
-.transpose()
-.map_err(|e| {
+.and_then(|d| d.output_node_types_yaml())
+{
+Some(yaml) => yeast::node_types_yaml::convert(yaml).map_err(|e| {
 std::io::Error::other(format!(
-"Failed to build desugaring runner for {}: {e}",
+"Failed to convert YAML node-types to JSON for {}: {e}",
 lang.prefix
 ))
-})?;
-yeast_runners.push(yeast_runner);
+})?,
+None => lang.node_types.to_string(),
+};
+let schema = node_types::read_node_types_str(lang.prefix, &effective_node_types)?;
+schemas.push(schema);
}
// Construct a single globset containing all language globs,
@@ -194,7 +184,7 @@ impl Extractor {
&path,
&source,
&[],
yeast_runners[i].as_ref(),
lang.desugar.as_deref(),
);
std::fs::create_dir_all(src_archive_file.parent().unwrap())?;
std::fs::copy(&path, &src_archive_file)?;
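
A minimal sketch of the `Option<Box<dyn Desugarer>>` pattern used above, assuming a simplified `Desugarer` trait with only the schema accessor (the real `yeast::Desugarer` trait is not shown in this diff, and `SwiftLike`, `effective_node_types` and `describe` are hypothetical names). It shows the schema fallback and how `as_deref()` turns the boxed trait object into the `Option<&dyn Desugarer>` handed to the per-file extraction call.

```rust
/// Hypothetical, reduced version of a desugarer trait: only the schema accessor.
trait Desugarer {
    fn output_node_types_yaml(&self) -> Option<&'static str>;
}

/// Hypothetical desugarer that supplies its own output schema.
struct SwiftLike;

impl Desugarer for SwiftLike {
    fn output_node_types_yaml(&self) -> Option<&'static str> {
        Some("swift-like.yaml")
    }
}

/// Reduced `LanguageSpec`: the boxed trait object hides the desugarer's
/// internal context type from the shared extractor.
struct LanguageSpec {
    node_types: &'static str,
    desugar: Option<Box<dyn Desugarer>>,
}

/// Prefer the desugarer's schema when present, else the language's own.
fn effective_node_types(lang: &LanguageSpec) -> &'static str {
    lang.desugar
        .as_ref()
        .and_then(|d| d.output_node_types_yaml())
        .unwrap_or(lang.node_types)
}

/// Stand-in for a callee that takes a borrowed, optional desugarer.
fn describe(desugar: Option<&dyn Desugarer>) -> &'static str {
    match desugar {
        Some(_) => "desugared",
        None => "raw",
    }
}
```

With a `LanguageSpec` value `lang`, the borrowed form is obtained as `describe(lang.desugar.as_deref())`, which leaves ownership of the box with the spec.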


@@ -68,7 +68,12 @@ pub fn generate(
let node_parent_table_name = format!("{}_ast_node_parent", &prefix);
let token_name = format!("{}_token", &prefix);
let tokeninfo_name = format!("{}_tokeninfo", &prefix);
let trivia_token_name = format!("{}_trivia_token", &prefix);
let trivia_tokeninfo_name = format!("{}_trivia_tokeninfo", &prefix);
let reserved_word_name = format!("{}_reserved_word", &prefix);
// When a desugaring is configured, comments and other `extra` nodes are
// preserved from the original parse tree as `TriviaToken`s.
let has_trivia_tokens = language.desugar.is_some();
let effective_node_types: String = match language
.desugar
.as_ref()
@@ -85,28 +90,35 @@ pub fn generate(
let nodes = node_types::read_node_types_str(&prefix, &effective_node_types)?;
let (dbscheme_entries, mut ast_node_members, token_kinds) = convert_nodes(&nodes);
ast_node_members.insert(&token_name);
if has_trivia_tokens {
ast_node_members.insert(&trivia_token_name);
}
writeln!(&mut dbscheme_writer, "/*- {} dbscheme -*/", language.name)?;
dbscheme::write(&mut dbscheme_writer, &dbscheme_entries)?;
let token_case = create_token_case(&token_name, token_kinds);
dbscheme::write(
&mut dbscheme_writer,
&[
dbscheme::Entry::Table(create_tokeninfo(&tokeninfo_name, &token_name)),
dbscheme::Entry::Case(token_case),
dbscheme::Entry::Union(dbscheme::Union {
name: &ast_node_name,
members: ast_node_members,
}),
dbscheme::Entry::Table(create_ast_node_location_table(
&node_location_table_name,
&ast_node_name,
)),
dbscheme::Entry::Table(create_ast_node_parent_table(
&node_parent_table_name,
&ast_node_name,
)),
],
)?;
let mut dbscheme_tail = vec![
dbscheme::Entry::Table(create_tokeninfo(&tokeninfo_name, &token_name)),
dbscheme::Entry::Case(token_case),
];
if has_trivia_tokens {
dbscheme_tail.push(dbscheme::Entry::Table(create_tokeninfo(
&trivia_tokeninfo_name,
&trivia_token_name,
)));
}
dbscheme_tail.push(dbscheme::Entry::Union(dbscheme::Union {
name: &ast_node_name,
members: ast_node_members,
}));
dbscheme_tail.push(dbscheme::Entry::Table(create_ast_node_location_table(
&node_location_table_name,
&ast_node_name,
)));
dbscheme_tail.push(dbscheme::Entry::Table(create_ast_node_parent_table(
&node_parent_table_name,
&ast_node_name,
)));
dbscheme::write(&mut dbscheme_writer, &dbscheme_tail)?;
let mut body = vec![
ql::TopLevel::Class(ql_gen::create_ast_node_class(
@@ -116,6 +128,12 @@ pub fn generate(
)),
ql::TopLevel::Class(ql_gen::create_token_class(&token_name, &tokeninfo_name)),
];
if has_trivia_tokens {
body.push(ql::TopLevel::Class(ql_gen::create_trivia_token_class(
&trivia_token_name,
&trivia_tokeninfo_name,
)));
}
// Only emit the ReservedWord class when there are actually unnamed token
// types in the schema (i.e., @{prefix}_reserved_word exists in the dbscheme).
// When converting from a YEAST YAML schema that has no unnamed tokens, this


@@ -199,6 +199,70 @@ pub fn create_token_class<'a>(token_type: &'a str, tokeninfo: &'a str) -> ql::Cl
}
}
/// Creates the `TriviaToken` class. Trivia tokens (e.g. comments) are
/// `extra` nodes preserved from the original parse tree even when the tree has
/// been rewritten by a desugaring pass. They are not part of the regular
/// `Token` hierarchy because they do not appear in the (possibly desugared)
/// output schema.
pub fn create_trivia_token_class<'a>(
trivia_token_type: &'a str,
trivia_tokeninfo: &'a str,
) -> ql::Class<'a> {
let trivia_tokeninfo_arity = 3; // id, kind, value
let get_value = ql::Predicate {
qldoc: Some(String::from("Gets the source text of this trivia token.")),
name: "getValue",
overridden: false,
is_private: false,
is_final: true,
return_type: Some(ql::Type::String),
formal_parameters: vec![],
body: create_get_field_expr_for_column_storage(
"result",
trivia_tokeninfo,
1,
trivia_tokeninfo_arity,
),
overlay: None,
};
let to_string = ql::Predicate {
qldoc: Some(String::from(
"Gets a string representation of this element.",
)),
name: "toString",
overridden: true,
is_private: false,
is_final: true,
return_type: Some(ql::Type::String),
formal_parameters: vec![],
body: ql::Expression::Equals(
Box::new(ql::Expression::Var("result")),
Box::new(ql::Expression::Dot(
Box::new(ql::Expression::Var("this")),
"getValue",
vec![],
)),
),
overlay: None,
};
ql::Class {
qldoc: Some(String::from(
"A trivia token, such as a comment, preserved from the original parse tree.",
)),
name: "TriviaToken",
is_abstract: false,
supertypes: vec![ql::Type::At(trivia_token_type), ql::Type::Normal("AstNode")]
.into_iter()
.collect(),
characteristic_predicate: None,
predicates: vec![
get_value,
to_string,
create_get_a_primary_ql_class("TriviaToken", false),
],
}
}
// Creates the `ReservedWord` class.
pub fn create_reserved_word_class(db_name: &str) -> ql::Class<'_> {
let class_name = "ReservedWord";


@@ -44,8 +44,19 @@ pub fn query(input: TokenStream) -> TokenStream {
/// {expr} - embed a Rust expression returning Id
/// {..expr} - splice an iterable of Id (in child/field position)
/// field: {..expr} - splice into a named field
/// {expr}.map(p -> tpl) - apply tpl to each element; splice result
/// {expr}.reduce_left(f -> init, acc, e -> fold)
/// - fold with per-element init; splice 0 or 1 result
/// ```
///
/// Chain syntax after `{expr}` or `{..expr}`:
/// - `.map(param -> template)` — one output node per input element.
/// - `.reduce_left(first -> init, acc, elem -> fold)` — fold left; the first
/// element is converted by `init`, subsequent elements are folded by `fold`
/// with the accumulator bound to `acc`. An empty iterable yields nothing.
/// - Chains always splice (the result is iterable).
/// - Multiple chains can be chained, e.g. `.map(...).reduce_left(...)`.
///
/// Can be called with an explicit context or using the implicit context
/// from an enclosing `rule!`:
///
@@ -110,3 +121,37 @@ pub fn rule(input: TokenStream) -> TokenStream {
Err(err) => err.to_compile_error().into(),
}
}
/// Define a desugaring rule whose transform is a hand-written Rust block.
///
/// Use `manual_rule!` when the transform needs control over capture
/// translation timing — for example, when an outer rule needs to set
/// state in `ctx` (the `BuildCtx`'s user context) before recursive
/// translation reaches inner rules that read that state.
///
/// ```text
/// manual_rule!(
/// (query_pattern field: (_) @name)
/// {
/// // `ctx` is a `&mut BuildCtx<'_, C>`; capture variables
/// // (`name: NodeRef`, etc.) are bound from the query.
/// let translated = ctx.translate(name)?;
/// Ok(translated)
/// }
/// )
/// ```
///
/// Differences from [`rule!`]:
/// - Captures are **not** auto-translated before the body runs; they
/// refer to raw input-schema nodes. Use [`BuildCtx::translate`] (or
/// [`BuildCtx::translate_opt`]) to translate them when you choose.
/// - The body is plain Rust that must itself evaluate to
/// `Result<Vec<Id>, String>`: there is no tree template, and the macro
/// does not wrap the result in `Ok(...)`.
#[proc_macro]
pub fn manual_rule(input: TokenStream) -> TokenStream {
let input2: TokenStream2 = input.into();
match parse::parse_manual_rule_top(input2) {
Ok(output) => output.into(),
Err(err) => err.to_compile_error().into(),
}
}


@@ -121,9 +121,9 @@ fn parse_query_fields(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
std::collections::HashMap::new();
let mut bare_children: Vec<TokenStream> = Vec::new();
let push_field_elem = |order: &mut Vec<String>,
map: &mut std::collections::HashMap<String, Vec<TokenStream>>,
name: String,
elem: TokenStream| {
map: &mut std::collections::HashMap<String, Vec<TokenStream>>,
name: String,
elem: TokenStream| {
if !map.contains_key(&name) {
order.push(name.clone());
map.insert(name, vec![elem]);
@@ -141,7 +141,12 @@ fn parse_query_fields(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
// Parse the field's pattern. To support repetition like
// `field: (kind)* @cap`, parse the atom first, then check for
// a quantifier, and lastly handle a trailing `@capture`.
let atom = parse_query_atom(tokens)?;
// `field: @cap` is sugar for `field: _ @cap`.
let atom = if peek_is_at(tokens) {
quote! { yeast::query::QueryNode::Any { match_unnamed: true } }
} else {
parse_query_atom(tokens)?
};
if peek_is_repetition(tokens) {
let rep = expect_repetition(tokens)?;
let elem = quote! {
@@ -155,8 +160,7 @@ fn parse_query_fields(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
} else {
let child = if peek_is_at(tokens) {
tokens.next();
let capture_name =
expect_ident(tokens, "expected capture name after @")?;
let capture_name = expect_ident(tokens, "expected capture name after @")?;
let name_str = capture_name.to_string();
quote! {
yeast::query::QueryNode::Capture {
@@ -259,6 +263,7 @@ fn parse_query_list(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
yeast::query::QueryListElem::SingleNode(#node)
},
)?;
let elem = maybe_wrap_list_capture(tokens, elem)?;
elems.push(elem);
continue;
}
@@ -276,6 +281,7 @@ fn parse_query_list(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
yeast::query::QueryListElem::SingleNode(#node)
},
)?;
let elem = maybe_wrap_list_capture(tokens, elem)?;
elems.push(elem);
continue;
}
@@ -289,10 +295,10 @@ fn parse_query_list(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
// tree! / trees! parsing — direct code generation against BuildCtx
// ---------------------------------------------------------------------------
const IMPLICIT_CTX: &str = "__yeast_ctx";
const IMPLICIT_CTX: &str = "ctx";
/// Determine the context identifier: either explicit `ctx,` or the implicit
/// `__yeast_ctx` from an enclosing `rule!`.
/// `ctx` from an enclosing `rule!`.
fn parse_ctx_or_implicit(tokens: &mut Tokens) -> Ident {
// Check if first token is an ident followed by a comma
let mut lookahead = tokens.clone();
@@ -352,7 +358,7 @@ fn parse_direct_node(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStream> {
Some(TokenTree::Group(g)) if g.delimiter() == Delimiter::Brace => {
let group = expect_group(tokens, Delimiter::Brace)?;
let expr = group.stream();
Ok(quote! { ::std::convert::Into::<usize>::into(#expr) })
Ok(quote! { ::std::convert::Into::<usize>::into({ #expr }) })
}
Some(TokenTree::Group(g)) if g.delimiter() == Delimiter::Parenthesis => {
let group = expect_group(tokens, Delimiter::Parenthesis)?;
@@ -389,8 +395,10 @@ fn parse_direct_node_inner(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStre
let expr = group.stream();
return Ok(quote! {
{
let __value = yeast::YeastDisplay::yeast_to_string(&(#expr), &*#ctx.ast);
#ctx.literal(#kind_str, &__value)
let __expr = { #expr };
let __value = yeast::YeastDisplay::yeast_to_string(&__expr, &*#ctx.ast);
let __source_range = yeast::YeastSourceRange::yeast_source_range(&__expr, &*#ctx.ast);
#ctx.literal_with_source_range(#kind_str, &__value, __source_range)
}
});
}
@@ -411,7 +419,11 @@ fn parse_direct_node_inner(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStre
// Named fields — compute each value into a temp, then reference it
while peek_is_field(tokens) {
let field_name = expect_ident(tokens, "expected field name")?;
let field_str = field_name.to_string().strip_prefix("r#").unwrap_or(&field_name.to_string()).to_string();
let field_str = field_name
.to_string()
.strip_prefix("r#")
.unwrap_or(&field_name.to_string())
.to_string();
expect_punct(tokens, ':', "expected `:` after field name")?;
let temp = Ident::new(
&format!("__field_{field_str}_{field_counter}"),
@@ -419,23 +431,36 @@ fn parse_direct_node_inner(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStre
);
field_counter += 1;
// Check for field: {..expr} — splice a Vec<Id> into the field
// Check for field: {..expr}.chain or field: {expr}.chain — splice a Vec<Id> into the field
if peek_is_group(tokens, Delimiter::Brace) {
let group_clone = tokens.clone().next().unwrap();
if let TokenTree::Group(g) = &group_clone {
let mut inner_check = g.stream().into_iter();
let is_splice = matches!(inner_check.next(), Some(TokenTree::Punct(p)) if p.as_char() == '.')
&& matches!(inner_check.next(), Some(TokenTree::Punct(p)) if p.as_char() == '.');
if is_splice {
// Determine if a chain (.map(..)) follows the `{}` group.
let mut after = tokens.clone();
after.next(); // skip the brace group
let has_chain =
matches!(after.peek(), Some(TokenTree::Punct(p)) if p.as_char() == '.');
if is_splice || has_chain {
let group = expect_group(tokens, Delimiter::Brace)?;
let mut inner = group.stream().into_iter().peekable();
inner.next(); // consume first .
inner.next(); // consume second .
let expr: proc_macro2::TokenStream = inner.collect();
let base: TokenStream = if is_splice {
let mut inner = group.stream().into_iter().peekable();
inner.next(); // consume first .
inner.next(); // consume second .
let expr: TokenStream = inner.collect();
quote! {
{ #expr }.into_iter().map(::std::convert::Into::<usize>::into)
}
} else {
let expr = group.stream();
quote! { { #expr }.into_iter() }
};
let chained = parse_chain_suffix(tokens, ctx, base)?;
stmts.push(quote! {
let #temp: Vec<usize> = (#expr).into_iter()
.map(::std::convert::Into::<usize>::into)
.collect();
let #temp: Vec<usize> = #chained.collect();
});
// An empty splice means the field is absent — skip it
// entirely rather than emitting an empty named field.
@@ -472,6 +497,94 @@ fn parse_direct_node_inner(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStre
})
}
/// Parse a chain of `.method(args)` suffixes after a `{expr}` or `{..expr}`
/// placeholder in tree templates. Currently supports:
///
/// ```text
/// .map(param -> template) -- iterator map: yields one node id per element
/// .reduce_left(first -> init, acc, elem -> fold)
/// -- left fold: yields zero or one node id
/// ```
///
/// The chain may be empty (returns `base` unchanged). Multiple chained calls
/// are supported, e.g. `.map(p -> ...).map(q -> ...)`.
///
/// Each call expects the receiver to be an iterator. The `base` argument
/// should therefore already be an iterator (use `.into_iter()` on it before
/// calling this function).
fn parse_chain_suffix(tokens: &mut Tokens, ctx: &Ident, base: TokenStream) -> Result<TokenStream> {
let mut current = base;
while matches!(tokens.peek(), Some(TokenTree::Punct(p)) if p.as_char() == '.') {
tokens.next(); // consume .
let method = expect_ident(tokens, "expected method name after `.`")?;
let method_str = method.to_string();
let args_group = expect_group(tokens, Delimiter::Parenthesis)?;
match method_str.as_str() {
"map" => {
let mut inner = args_group.stream().into_iter().peekable();
let param = expect_ident(&mut inner, "expected lambda parameter name")?;
expect_punct(&mut inner, '-', "expected `->` after lambda parameter")?;
expect_punct(&mut inner, '>', "expected `->` after lambda parameter")?;
let body = parse_direct_node(&mut inner, ctx)?;
if let Some(tok) = inner.next() {
return Err(syn::Error::new_spanned(
tok,
"unexpected token after lambda body",
));
}
current = quote! {
#current.map(|#param| #body)
};
}
"reduce_left" => {
// Syntax: reduce_left(first -> init_tpl, acc, elem -> fold_tpl)
// - first -> init_tpl : converts the first element to the initial accumulator
// - acc, elem -> fold_tpl : fold step (acc = current accumulator, elem = next element)
// Empty iterator produces an empty iterator; non-empty produces a single-element iterator.
let mut inner = args_group.stream().into_iter().peekable();
let init_param = expect_ident(&mut inner, "expected initial lambda parameter")?;
expect_punct(&mut inner, '-', "expected `->` after init parameter")?;
expect_punct(&mut inner, '>', "expected `->` after init parameter")?;
let init_body = parse_direct_node(&mut inner, ctx)?;
expect_punct(&mut inner, ',', "expected `,` after init template")?;
let acc_param = expect_ident(&mut inner, "expected accumulator parameter")?;
expect_punct(&mut inner, ',', "expected `,` after accumulator parameter")?;
let elem_param = expect_ident(&mut inner, "expected element parameter")?;
expect_punct(&mut inner, '-', "expected `->` after element parameter")?;
expect_punct(&mut inner, '>', "expected `->` after element parameter")?;
let fold_body = parse_direct_node(&mut inner, ctx)?;
if let Some(tok) = inner.next() {
return Err(syn::Error::new_spanned(
tok,
"unexpected token after fold template",
));
}
current = quote! {
{
let mut __iter = #current;
let __result: Option<usize> = if let Some(#init_param) = __iter.next() {
let mut __acc: usize = #init_body;
for #elem_param in __iter {
let #acc_param: usize = __acc;
__acc = #fold_body;
}
Some(__acc)
} else {
None
};
__result.into_iter()
}
};
}
_ => {
return Err(syn::Error::new_spanned(
method,
format!("unknown builtin method `.{method_str}()`"),
));
}
}
}
Ok(current)
}
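
The fold that the `"reduce_left"` arm generates can be read as ordinary Rust. The sketch below is a hypothetical free function, not part of yeast, and works on strings in place of node ids. It has the same control flow as the quoted code: the first element goes through `init`, later elements go through `fold` with the running accumulator, and an empty input yields nothing.

```rust
/// Plain-Rust counterpart of the generated fold (hypothetical helper).
fn reduce_left<T, A>(
    items: impl IntoIterator<Item = T>,
    init: impl FnOnce(T) -> A,
    mut fold: impl FnMut(A, T) -> A,
) -> Option<A> {
    let mut iter = items.into_iter();
    // Empty input: no accumulator is ever created.
    let first = iter.next()?;
    let mut acc = init(first);
    for elem in iter {
        acc = fold(acc, elem);
    }
    Some(acc)
}

/// Builds a left-nested member-access style string from a list of names.
fn nest_left(names: &[&str]) -> Option<String> {
    reduce_left(
        names.iter().copied(),
        |first| first.to_string(),
        |acc, elem| format!("({acc} . {elem})"),
    )
}
```

The macro turns the resulting `Option` into an iterator (`__result.into_iter()`), which is why a `reduce_left` chain splices zero or one node.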
/// Parse the top-level list of a `trees!` template.
/// Each item is a node template or `{expr}` splice.
fn parse_direct_list(tokens: &mut Tokens, ctx: &Ident) -> Result<Vec<TokenStream>> {
@@ -492,23 +605,33 @@ fn parse_direct_list(tokens: &mut Tokens, ctx: &Ident) -> Result<Vec<TokenStream
continue;
}
// {expr} or {..expr} — single node or splice
// {expr} or {..expr} (with optional .chain) — single node or splice
if peek_is_group(tokens, Delimiter::Brace) {
let group = expect_group(tokens, Delimiter::Brace)?;
let has_chain =
matches!(tokens.peek(), Some(TokenTree::Punct(p)) if p.as_char() == '.');
let mut inner = group.stream().into_iter().peekable();
if peek_is_dotdot(&inner) {
inner.next(); // consume first .
inner.next(); // consume second .
let expr: TokenStream = inner.collect();
let is_splice = peek_is_dotdot(&inner);
if is_splice || has_chain {
let base: TokenStream = if is_splice {
inner.next(); // consume first .
inner.next(); // consume second .
let expr: TokenStream = inner.collect();
quote! {
{ #expr }.into_iter().map(::std::convert::Into::<usize>::into)
}
} else {
let expr = group.stream();
quote! { { #expr }.into_iter() }
};
let chained = parse_chain_suffix(tokens, ctx, base)?;
items.push(quote! {
__nodes.extend(
(#expr).into_iter().map(::std::convert::Into::<usize>::into)
);
__nodes.extend(#chained);
});
} else {
let expr = group.stream();
items.push(quote! {
__nodes.push(::std::convert::Into::<usize>::into(#expr));
__nodes.push(::std::convert::Into::<usize>::into({ #expr }));
});
}
continue;
@@ -604,8 +727,11 @@ fn extract_captures_inner(
}
last_mult = CaptureMultiplicity::Single;
}
TokenTree::Punct(p) if matches!(p.as_char(), '*' | '+' | '?') => {
// Keep last_mult — the @capture follows
TokenTree::Punct(p) if p.as_char() == '*' || p.as_char() == '+' => {
last_mult = CaptureMultiplicity::Repeated;
}
TokenTree::Punct(p) if p.as_char() == '?' => {
last_mult = CaptureMultiplicity::Optional;
}
_ => {
last_mult = CaptureMultiplicity::Single;
@@ -763,10 +889,117 @@ pub fn parse_rule_top(input: TokenStream) -> Result<TokenStream> {
Ok(quote! {
{
let __query = #query_code;
yeast::Rule::new(__query, Box::new(|__ast: &mut yeast::Ast, __captures: yeast::captures::Captures, __fresh: &yeast::tree_builder::FreshScope, __source_range: Option<tree_sitter::Range>| {
yeast::Rule::new(__query, Box::new(|__ast: &mut yeast::Ast, mut __captures: yeast::captures::Captures, __fresh: &yeast::tree_builder::FreshScope, __source_range: Option<tree_sitter::Range>, __user_ctx: &mut _, __translator: yeast::TranslatorHandle<'_, _>| {
// Auto-translation prefix: recursively translate every
// captured node before invoking the user's transform body.
// For OneShot rules this preserves the legacy behaviour
// (input-schema captures translated to output-schema
// nodes); for Repeating rules it is a no-op.
__translator.auto_translate_captures(&mut __captures, __ast, __user_ctx)?;
#(#bindings)*
let mut #ctx_ident = yeast::build::BuildCtx::with_source_range(__ast, &__captures, __fresh, __source_range);
#transform_body
let mut #ctx_ident = yeast::build::BuildCtx::with_translator(__ast, &__captures, __fresh, __source_range, __user_ctx, __translator);
let __result: Vec<usize> = { #transform_body };
Ok(__result)
}))
}
})
}
/// Parse `manual_rule!( query { body } )`.
///
/// Like [`parse_rule_top`] but:
/// - Expects a Rust block `{ ... }` after the query (no `=>` arrow).
/// - Generates code that does NOT auto-translate captures before
/// running the body. Capture variables refer to raw (input-schema)
/// nodes; the body is responsible for explicit translation via
/// `ctx.translate(...)`.
/// - The body is included verbatim and must evaluate to
/// `Result<Vec<usize>, String>`.
pub fn parse_manual_rule_top(input: TokenStream) -> Result<TokenStream> {
let mut tokens = input.into_iter().peekable();
// Collect query tokens up to the body block `{ ... }`.
let mut query_tokens = Vec::new();
loop {
match tokens.peek() {
None => {
return Err(syn::Error::new(
Span::call_site(),
"expected a Rust block `{ ... }` after the query in manual_rule!",
))
}
Some(TokenTree::Group(g)) if g.delimiter() == Delimiter::Brace => break,
_ => {
query_tokens.push(tokens.next().unwrap());
}
}
}
let query_stream: TokenStream = query_tokens.into_iter().collect();
// Extract captures from the query (same as in `rule!`).
let captures = extract_captures(&query_stream);
// Parse the query into the QueryNode-building expression.
let query_code = parse_query_top(query_stream)?;
// Generate capture bindings (same as in `rule!`).
let ctx_ident = Ident::new(IMPLICIT_CTX, Span::call_site());
let bindings: Vec<TokenStream> = captures
.iter()
.map(|cap| {
let name = Ident::new(&cap.name, Span::call_site());
let name_str = &cap.name;
match cap.multiplicity {
CaptureMultiplicity::Repeated => quote! {
let #name: Vec<yeast::NodeRef> = __captures.get_all(#name_str)
.into_iter()
.map(yeast::NodeRef)
.collect();
},
CaptureMultiplicity::Optional => quote! {
let #name: Option<yeast::NodeRef> =
__captures.get_opt(#name_str).map(yeast::NodeRef);
},
CaptureMultiplicity::Single => quote! {
let #name: yeast::NodeRef =
yeast::NodeRef(__captures.get_var(#name_str).unwrap());
},
}
})
.collect();
// Consume the body block.
let body_group = match tokens.next() {
Some(TokenTree::Group(g)) if g.delimiter() == Delimiter::Brace => g,
other => {
return Err(syn::Error::new(
Span::call_site(),
format!(
"expected a Rust block `{{ ... }}` after the query in manual_rule!, found: {other:?}"
),
))
}
};
let body_stream = body_group.stream();
// No tokens should follow the body.
if let Some(tok) = tokens.next() {
return Err(syn::Error::new_spanned(
tok,
"unexpected token after manual_rule! body",
));
}
Ok(quote! {
{
let __query = #query_code;
yeast::Rule::new(__query, Box::new(|__ast: &mut yeast::Ast, __captures: yeast::captures::Captures, __fresh: &yeast::tree_builder::FreshScope, __source_range: Option<tree_sitter::Range>, __user_ctx: &mut _, __translator: yeast::TranslatorHandle<'_, _>| {
// No auto-translate prefix for manual rules — the body
// is responsible for translating captures explicitly.
#(#bindings)*
let mut #ctx_ident = yeast::build::BuildCtx::with_translator(__ast, &__captures, __fresh, __source_range, __user_ctx, __translator);
#body_stream
}))
}
})


@@ -265,7 +265,21 @@ occurrences of the same `$name` within one `BuildCtx` share the same value:
)
```
`{..expr}` splices a `Vec<Id>` (or any iterable of `Id`):
The contents of `{…}` are treated as a Rust block, so multi-statement
expressions (with `let` bindings) work too:
```rust
(assignment
left: {tmp}
right: {
let lit = ctx.literal("integer", "0");
tree!((binary_expr op: (operator "+") left: {tmp} right: {lit}))
})
```
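
A miniature version of the interpolation shows why the block form matters. This is a hypothetical `macro_rules!` macro, not the real `tree!` proc macro, but it pastes the tokens inside `{ ... }` before converting the result, as the generated code does:

```rust
/// Hypothetical miniature of `{…}` interpolation: the caller's tokens are
/// placed inside a block, and the block's final value is converted.
macro_rules! interpolate {
    ($($body:tt)*) => {
        ::std::convert::Into::<usize>::into({ $($body)* })
    };
}

/// A single expression still works.
fn single_expression() -> usize {
    interpolate!(7u8)
}

/// `let` statements are accepted because the tokens sit inside a block.
fn multi_statement() -> usize {
    interpolate!(
        let base: u16 = 40;
        let offset: u16 = 2;
        base + offset
    )
}
```

Had the tokens been pasted straight into the function-call argument, the `let` statements in `multi_statement` would not parse.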
`{..expr}` splices a `Vec<Id>` (or any iterable of `Id`); the contents
are likewise a Rust block, so the splice can be the result of arbitrary
computation:
```rust
yeast::trees!(ctx,


@@ -20,7 +20,7 @@ fn main() {
let args = Cli::parse();
let language = get_language(&args.language);
let source = std::fs::read_to_string(&args.file).unwrap();
let runner = yeast::Runner::new(language, &[]);
let runner: yeast::Runner = yeast::Runner::new(language, &[]);
let ast = runner.run(&source).unwrap();
println!("{}", ast.print(&source, ast.get_root()));
}


@@ -2,28 +2,60 @@ use std::collections::BTreeMap;
use crate::captures::Captures;
use crate::tree_builder::FreshScope;
use crate::{Ast, FieldId, Id, NodeContent};
use crate::{Ast, FieldId, Id, NodeContent, TranslatorHandle};
/// Context for building new AST nodes during a transformation.
///
/// Used by the `tree!` and `trees!` macros. Holds a mutable reference to the
/// AST, a reference to the captures from a query match, and a `FreshScope` for
/// generating unique identifiers.
pub struct BuildCtx<'a> {
/// AST, a reference to the captures from a query match, a `FreshScope` for
/// generating unique identifiers, and a mutable reference to a user-defined
/// context of type `C`.
///
/// The user context `C` is shared across rules via the framework's driver:
/// outer rules can write to it before recursive translation, and inner rules
/// can read (or further mutate) it during their transforms. The framework
/// snapshots and restores the user context around each rule application, so
/// mutations made by a rule are visible to its descendants (via recursive
/// translation) but not to its siblings or ancestors.
///
/// `BuildCtx` implements [`Deref`] and [`DerefMut`] targeting `C`, so user
/// context fields are accessible as `ctx.my_field` directly (provided they
/// don't collide with `BuildCtx`'s own fields like `ast`, `captures`, etc.).
///
/// The default `C = ()` means rules that don't need any user context don't
/// pay any cost.
///
/// When constructed by the framework (via the `rule!` or `manual_rule!` macros), `BuildCtx` also
/// carries a [`TranslatorHandle`] that the [`translate`] method delegates
/// to. When constructed by hand (e.g. in tests), the translator is `None`
/// and [`translate`] returns an error.
pub struct BuildCtx<'a, C: 'a = ()> {
pub ast: &'a mut Ast,
pub captures: &'a Captures,
pub fresh: &'a FreshScope,
/// Source range of the matched node, inherited by synthetic nodes.
pub source_range: Option<tree_sitter::Range>,
/// User-supplied context, accessible directly via `ctx.field` (via Deref).
pub user_ctx: &'a mut C,
/// Optional translator handle, populated when the context is built by
/// the framework's rule driver. None when the context is built by hand.
pub(crate) translator: Option<TranslatorHandle<'a, C>>,
}
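
The snapshot-and-restore behaviour described in the doc comment can be sketched as follows. The rule driver itself is not part of this diff, so this is only one way such scoping could work, and every name here (`UserCtx`, `with_scoped_ctx`, `demo`) is hypothetical. It assumes the context is `Clone`, which matches the `C: Clone` bound on the `translate` methods.

```rust
/// Hypothetical user context carrying modifiers for nested rules.
#[derive(Clone, Debug, PartialEq)]
struct UserCtx {
    modifiers: Vec<&'static str>,
}

/// Runs `apply` with access to `ctx`, then restores the state `ctx` had
/// before the call, so the mutations stay local to this application.
fn with_scoped_ctx<C: Clone, R>(ctx: &mut C, apply: impl FnOnce(&mut C) -> R) -> R {
    let snapshot = ctx.clone();
    let result = apply(ctx);
    *ctx = snapshot;
    result
}

/// Returns what the rule body saw, and what remains visible afterwards.
fn demo() -> (Vec<&'static str>, Vec<&'static str>) {
    let mut ctx = UserCtx { modifiers: Vec::new() };
    let seen_inside = with_scoped_ctx(&mut ctx, |c| {
        // The "outer rule" sets state that anything it calls can read.
        c.modifiers.push("public");
        c.modifiers.clone()
    });
    (seen_inside, ctx.modifiers)
}
```

Inside the closure the pushed modifier is visible. Once `with_scoped_ctx` returns, the context is back to its earlier state.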
impl<'a> BuildCtx<'a> {
pub fn new(ast: &'a mut Ast, captures: &'a Captures, fresh: &'a FreshScope) -> Self {
impl<'a, C> BuildCtx<'a, C> {
pub fn new(
ast: &'a mut Ast,
captures: &'a Captures,
fresh: &'a FreshScope,
user_ctx: &'a mut C,
) -> Self {
Self {
ast,
captures,
fresh,
source_range: None,
user_ctx,
translator: None,
}
}
@@ -32,12 +64,35 @@ impl<'a> BuildCtx<'a> {
captures: &'a Captures,
fresh: &'a FreshScope,
source_range: Option<tree_sitter::Range>,
user_ctx: &'a mut C,
) -> Self {
Self {
ast,
captures,
fresh,
source_range,
user_ctx,
translator: None,
}
}
/// Construct a `BuildCtx` carrying a translator handle. Used by the
/// `rule!` and `manual_rule!` macros to enable [`translate`] inside rule transforms.
pub fn with_translator(
ast: &'a mut Ast,
captures: &'a Captures,
fresh: &'a FreshScope,
source_range: Option<tree_sitter::Range>,
user_ctx: &'a mut C,
translator: TranslatorHandle<'a, C>,
) -> Self {
Self {
ast,
captures,
fresh,
source_range,
user_ctx,
translator: Some(translator),
}
}
@@ -82,10 +137,83 @@ impl<'a> BuildCtx<'a> {
.create_named_token_with_range(kind, value.to_string(), self.source_range)
}
/// Create a leaf node with fixed content and an optional preferred source range.
/// If `source_range` is `None`, falls back to this context's inherited range.
pub fn literal_with_source_range(
&mut self,
kind: &'static str,
value: &str,
source_range: Option<tree_sitter::Range>,
) -> Id {
self.ast.create_named_token_with_range(
kind,
value.to_string(),
source_range.or(self.source_range),
)
}
/// Create a leaf node with an auto-generated unique name.
pub fn fresh(&mut self, kind: &'static str, name: &str) -> Id {
let generated = self.fresh.resolve(name);
self.ast
.create_named_token_with_range(kind, generated, self.source_range)
}
/// Prepend a value to a field of an existing node.
pub fn prepend_field(&mut self, node_id: Id, field_name: &str, value_id: Id) {
let field_id = self
.ast
.field_id_for_name(field_name)
.unwrap_or_else(|| panic!("build: field '{field_name}' not found"));
self.ast.prepend_field_child(node_id, field_id, value_id);
}
}
impl<C: Clone> BuildCtx<'_, C> {
/// Recursively translate a node via the framework's rule machinery.
/// In a OneShot phase, applies OneShot rules to the given node and
/// returns the resulting node ids. In a Repeating phase, errors
/// (translation is not meaningful when input and output share a
/// schema).
///
/// Accepts any value convertible to [`Id`] (including [`crate::NodeRef`]),
/// so manual rules can pass capture bindings directly without unwrapping.
///
/// Errors if this `BuildCtx` was constructed by hand (without a
/// translator handle) — for example, in unit tests that don't go
/// through the rule driver.
pub fn translate<I: Into<Id>>(&mut self, id: I) -> Result<Vec<Id>, String> {
let id = id.into();
match &self.translator {
Some(t) => t.translate(self.ast, self.user_ctx, id),
None => Err("translate() called on a BuildCtx without a translator handle".into()),
}
}
/// Translate an optional capture, returning the first translated id or
/// `None`. Convenience for `?`-quantifier captures (`Option<NodeRef>`).
///
/// If the underlying translation produces multiple ids for a single
/// input, only the first is returned. For most use cases (e.g.
/// translating a single type annotation) this is what you want; if
/// you need all ids, use [`translate`] directly.
pub fn translate_opt<I: Into<Id>>(&mut self, id: Option<I>) -> Result<Option<Id>, String> {
match id {
Some(id) => Ok(self.translate(id)?.into_iter().next()),
None => Ok(None),
}
}
}
impl<C> std::ops::Deref for BuildCtx<'_, C> {
type Target = C;
fn deref(&self) -> &C {
&*self.user_ctx
}
}
impl<C> std::ops::DerefMut for BuildCtx<'_, C> {
fn deref_mut(&mut self) -> &mut C {
&mut *self.user_ctx
}
}
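
The `Deref`/`DerefMut` forwarding is what lets rule bodies write `ctx.my_field`. The sketch below shows this with a hypothetical `ToyCtx` wrapper and `Modifiers` context (neither is a yeast type). The wrapper's own field `depth` and the user context's field `pending` are both reached through plain field syntax.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical user context.
#[derive(Default)]
struct Modifiers {
    pending: Vec<String>,
}

/// Toy counterpart of `BuildCtx`: one field of its own plus a borrowed user context.
struct ToyCtx<'a, C> {
    depth: usize,
    user_ctx: &'a mut C,
}

impl<C> Deref for ToyCtx<'_, C> {
    type Target = C;
    fn deref(&self) -> &C {
        &*self.user_ctx
    }
}

impl<C> DerefMut for ToyCtx<'_, C> {
    fn deref_mut(&mut self) -> &mut C {
        &mut *self.user_ctx
    }
}

/// Writes to the user context (via `DerefMut`) and to the wrapper's own field.
fn outer_rule(ctx: &mut ToyCtx<'_, Modifiers>) -> String {
    ctx.pending.push("static".to_string());
    ctx.depth += 1;
    inner_rule(ctx)
}

/// Reads both kinds of field through the same `ctx.` syntax.
fn inner_rule(ctx: &ToyCtx<'_, Modifiers>) -> String {
    format!("depth {} modifiers [{}]", ctx.depth, ctx.pending.join(", "))
}

/// Wires a fresh user context into the wrapper and runs the outer rule.
fn run() -> String {
    let mut user = Modifiers::default();
    let mut ctx = ToyCtx { depth: 0, user_ctx: &mut user };
    outer_rule(&mut ctx)
}
```

A user-context field named like one of the wrapper's own fields would be shadowed, which is the collision the `BuildCtx` doc comment warns about.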


@@ -53,12 +53,7 @@ pub fn dump_ast_with_options(
///
/// Any node that does not match the expected type set for its parent field is
/// rendered with a trailing `" <-- ERROR: ..."` annotation on the same line.
pub fn dump_ast_with_type_errors(
ast: &Ast,
root: usize,
source: &str,
schema: &Schema,
) -> String {
pub fn dump_ast_with_type_errors(ast: &Ast, root: usize, source: &str, schema: &Schema) -> String {
dump_ast_with_type_errors_and_options(ast, root, source, schema, &DumpOptions::default())
}
@@ -74,7 +69,15 @@ pub fn dump_ast_with_type_errors_and_options(
options: &DumpOptions,
) -> String {
let mut out = String::new();
dump_node(ast, root, source, options, 0, Some((schema, None, None)), &mut out);
dump_node(
ast,
root,
source,
options,
0,
Some((schema, None, None)),
&mut out,
);
out
}
@@ -232,8 +235,8 @@ fn dump_node(
}
let field_name = ast.field_name_for_id(field_id).unwrap_or("?");
let child_type_check = type_check.map(|(schema, _, _)| {
let expected = expected_for_field(schema, node.kind_name(), field_id)
.or(Some(EMPTY_NODE_TYPES));
let expected =
expected_for_field(schema, node.kind_name(), field_id).or(Some(EMPTY_NODE_TYPES));
let parent_field = Some((node.kind_name(), field_name));
(schema, expected, parent_field)
});

@@ -16,7 +16,7 @@ pub mod schema;
pub mod tree_builder;
mod visitor;
pub use yeast_macros::{query, rule, tree, trees};
pub use yeast_macros::{manual_rule, query, rule, tree, trees};
use captures::Captures;
pub use cursor::Cursor;
@@ -58,12 +58,30 @@ pub trait YeastDisplay {
fn yeast_to_string(&self, ast: &Ast) -> String;
}
/// Optional source range for values used in `#{expr}` interpolations.
///
/// By default this returns `None`, so synthesized leaves inherit the matched
/// rule's source range. `NodeRef` returns the referenced node's range, letting
/// `(kind #{capture})` carry the captured node's location.
pub trait YeastSourceRange {
fn yeast_source_range(&self, ast: &Ast) -> Option<tree_sitter::Range>;
}
impl YeastDisplay for NodeRef {
fn yeast_to_string(&self, ast: &Ast) -> String {
ast.source_text(self.0)
}
}
impl YeastSourceRange for NodeRef {
fn yeast_source_range(&self, ast: &Ast) -> Option<tree_sitter::Range> {
ast.get_node(self.0).and_then(|n| match &n.content {
NodeContent::Range(r) => Some(r.clone()),
_ => n.source_range,
})
}
}
macro_rules! impl_yeast_display_via_display {
($($t:ty),* $(,)?) => {
$(
@@ -72,6 +90,12 @@ macro_rules! impl_yeast_display_via_display {
::std::string::ToString::to_string(self)
}
}
impl YeastSourceRange for $t {
fn yeast_source_range(&self, _ast: &Ast) -> Option<tree_sitter::Range> {
None
}
}
)*
};
}
@@ -90,6 +114,12 @@ impl<T: YeastDisplay + ?Sized> YeastDisplay for &T {
}
}
impl<T: YeastSourceRange + ?Sized> YeastSourceRange for &T {
fn yeast_source_range(&self, ast: &Ast) -> Option<tree_sitter::Range> {
(**self).yeast_source_range(ast)
}
}
pub const CHILD_FIELD: u16 = u16::MAX;
#[derive(Debug)]
@@ -267,7 +297,9 @@ impl Ast {
/// Returns the source text for `id`, resolving `NodeContent::Range`
/// against the stored source bytes when available.
pub fn source_text(&self, id: Id) -> String {
let Some(node) = self.get_node(id) else { return String::new(); };
let Some(node) = self.get_node(id) else {
return String::new();
};
let read_range = |range: &tree_sitter::Range| {
let start = range.start_byte;
let end = range.end_byte;
@@ -368,6 +400,15 @@ impl Ast {
is_named: bool,
source_range: Option<tree_sitter::Range>,
) -> Id {
let source_range = match &content {
// Parsed nodes already carry an exact source range in their content.
NodeContent::Range(_) => source_range,
// Synthesized nodes derive location from children when possible,
// and fall back to the inherited rule-match range otherwise.
_ => self
.union_source_range_of_children(&fields)
.or(source_range),
};
let id = self.nodes.len();
self.nodes.push(Node {
kind,
@@ -383,10 +424,79 @@ impl Ast {
id
}
fn union_source_range_of_children(
&self,
fields: &BTreeMap<FieldId, Vec<Id>>,
) -> Option<tree_sitter::Range> {
let mut start_byte: Option<usize> = None;
let mut end_byte: Option<usize> = None;
let mut start_point = tree_sitter::Point { row: 0, column: 0 };
let mut end_point = tree_sitter::Point { row: 0, column: 0 };
for child_ids in fields.values() {
for &child_id in child_ids {
let Some(child) = self.get_node(child_id) else {
continue;
};
let child_start_byte = child.start_byte();
let child_end_byte = child.end_byte();
// Skip children that carry no usable location.
if child_start_byte == 0 && child_end_byte == 0 {
continue;
}
match start_byte {
None => {
start_byte = Some(child_start_byte);
start_point = child.start_position();
}
Some(current_start) if child_start_byte < current_start => {
start_byte = Some(child_start_byte);
start_point = child.start_position();
}
_ => {}
}
match end_byte {
None => {
end_byte = Some(child_end_byte);
end_point = child.end_position();
}
Some(current_end) if child_end_byte > current_end => {
end_byte = Some(child_end_byte);
end_point = child.end_position();
}
_ => {}
}
}
}
match (start_byte, end_byte) {
(Some(start_byte), Some(end_byte)) => Some(tree_sitter::Range {
start_byte,
end_byte,
start_point,
end_point,
}),
_ => None,
}
}
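
The range-union logic above can be sketched in isolation. This is an illustrative reduction, assuming simplified `Point` and `Range` stand-ins for the `tree_sitter` types and a hypothetical `span` helper; it keeps the same convention of skipping children whose range is `0..0`.

```rust
/// Stand-in for `tree_sitter::Point`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    row: usize,
    column: usize,
}

/// Stand-in for `tree_sitter::Range`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    start_byte: usize,
    end_byte: usize,
    start_point: Point,
    end_point: Point,
}

/// Hypothetical helper: a single-line range where column == byte offset.
fn span(start: usize, end: usize) -> Range {
    Range {
        start_byte: start,
        end_byte: end,
        start_point: Point { row: 0, column: start },
        end_point: Point { row: 0, column: end },
    }
}

/// Smallest range covering every child that carries a usable location.
fn union_ranges(children: &[Range]) -> Option<Range> {
    let mut acc: Option<Range> = None;
    for child in children {
        // A 0..0 range is treated as "no location", as in the diff.
        if child.start_byte == 0 && child.end_byte == 0 {
            continue;
        }
        acc = Some(match acc {
            None => *child,
            Some(mut r) => {
                if child.start_byte < r.start_byte {
                    r.start_byte = child.start_byte;
                    r.start_point = child.start_point;
                }
                if child.end_byte > r.end_byte {
                    r.end_byte = child.end_byte;
                    r.end_point = child.end_point;
                }
                r
            }
        });
    }
    acc
}
```

Returning `None` when no child has a location is what lets the caller fall back to the inherited rule-match range via `.or(source_range)`.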
pub fn create_named_token(&mut self, kind: &'static str, content: String) -> Id {
self.create_named_token_with_range(kind, content, None)
}
/// Prepend a child id to the given field of the given node.
pub fn prepend_field_child(&mut self, node_id: Id, field_id: FieldId, value_id: Id) {
let node = self
.nodes
.get_mut(node_id)
.expect("prepend_field_child: invalid node id");
node.fields.entry(field_id).or_default().insert(0, value_id);
}
pub fn create_named_token_with_range(
&mut self,
kind: &'static str,
@@ -595,18 +705,118 @@ impl From<tree_sitter::Range> for NodeContent {
}
}
/// The transform function for a rule: takes the AST, captured variables, a
/// fresh-name scope, and the source range of the matched node, and returns
/// the IDs of the replacement nodes.
pub type Transform = Box<
dyn Fn(&mut Ast, Captures, &tree_builder::FreshScope, Option<tree_sitter::Range>) -> Vec<Id>
/// A handle that lets a rule transform recursively translate AST nodes via
/// the framework's rule machinery. Constructed by the driver and passed as
/// the last argument of every [`Transform`] invocation.
///
/// The `rule!` macro uses [`TranslatorHandle::auto_translate_captures`] in
/// its generated prefix to translate captures before running the user's
/// transform body. Manually-written transforms (using [`Rule::new`]
/// directly) can call [`TranslatorHandle::translate`] selectively on
/// specific node ids to control when translation happens.
pub struct TranslatorHandle<'a, C> {
inner: TranslatorImpl<'a, C>,
}
/// Internal phase-specific translation state. Kept private — callers
/// interact with [`TranslatorHandle`] only.
enum TranslatorImpl<'a, C> {
/// OneShot phase translator: recursively applies OneShot rules.
OneShot {
index: &'a RuleIndex<'a, C>,
fresh: &'a tree_builder::FreshScope,
rewrite_depth: usize,
/// The id of the node the current rule is matching. Used by
/// [`auto_translate_captures`] to avoid infinite recursion when a
/// rule captures its own match root (e.g. via `(_) @_`).
matched_root: Id,
},
/// Repeating phase translator: translation is not meaningful here
/// (input and output schemas are the same). [`translate`] errors;
/// [`auto_translate_captures`] is a no-op so the macro's auto-prefix
/// works unchanged for Repeating rules.
Repeating,
}
impl<'a, C: Clone> TranslatorHandle<'a, C> {
/// Recursively apply OneShot rules to `id` and return the resulting
/// node ids. Errors in a Repeating phase (where translation is not
/// meaningful).
pub fn translate(&self, ast: &mut Ast, user_ctx: &mut C, id: Id) -> Result<Vec<Id>, String> {
match &self.inner {
TranslatorImpl::OneShot {
index,
fresh,
rewrite_depth,
..
} => apply_one_shot_rules_inner(index, ast, user_ctx, id, fresh, rewrite_depth + 1),
TranslatorImpl::Repeating => {
Err("translate() is not available in a Repeating phase".into())
}
}
}
/// Translate every captured node in `captures` in place (OneShot phase
/// only). In a Repeating phase this is a no-op — Repeating rules
/// receive raw captures.
///
/// Used by the `rule!` macro's generated prefix to preserve the
/// pre-existing "auto-translate captures before running the transform
/// body" behavior. Manually-written transforms typically translate
/// captures selectively via [`translate`] instead.
///
/// To avoid infinite recursion, a capture whose id matches the rule's
/// matched root (e.g. from a `(_) @_` pattern) is left unchanged.
pub fn auto_translate_captures(
&self,
captures: &mut Captures,
ast: &mut Ast,
user_ctx: &mut C,
) -> Result<(), String> {
match &self.inner {
TranslatorImpl::OneShot { matched_root, .. } => {
let root = *matched_root;
captures.try_map_all_captures(|cid| {
if cid == root {
Ok(vec![cid])
} else {
self.translate(ast, user_ctx, cid)
}
})
}
TranslatorImpl::Repeating => Ok(()),
}
}
}
/// The transform function for a rule.
///
/// Takes the AST, the (raw, untranslated) captured variables, a fresh-name
/// scope, the source range of the matched node, a mutable reference to the
/// user context of type `C`, and a [`TranslatorHandle`] for recursively
/// translating nodes. Returns the IDs of the replacement nodes, or an
/// error message if the transform could not be completed.
///
/// Transforms produced by [`Rule::new`] receive **raw** captures and must
/// translate them themselves (via the handle). Transforms produced by the
/// `rule!` macro have an auto-translation prefix injected for backward
/// compatibility.
pub type Transform<C = ()> = Box<
dyn Fn(
&mut Ast,
Captures,
&tree_builder::FreshScope,
Option<tree_sitter::Range>,
&mut C,
TranslatorHandle<'_, C>,
) -> Result<Vec<Id>, String>
+ Send
+ Sync,
>;
pub struct Rule {
pub struct Rule<C = ()> {
query: QueryNode,
transform: Transform,
transform: Transform<C>,
/// If true, after this rule fires on a node the engine will try to
/// re-apply this same rule on the result root. Defaults to false:
/// each rule fires at most once on a given node, which prevents
@@ -614,8 +824,8 @@ pub struct Rule {
repeated: bool,
}
impl Rule {
pub fn new(query: QueryNode, transform: Transform) -> Self {
impl<C> Rule<C> {
pub fn new(query: QueryNode, transform: Transform<C>) -> Self {
Self {
query,
transform,
@@ -637,9 +847,13 @@ impl Rule {
ast: &mut Ast,
node: Id,
fresh: &tree_builder::FreshScope,
user_ctx: &mut C,
translator: TranslatorHandle<'_, C>,
) -> Result<Option<Vec<Id>>, String> {
match self.try_match(ast, node)? {
Some(captures) => Ok(Some(self.run_transform(ast, captures, node, fresh))),
Some(captures) => Ok(Some(
self.run_transform(ast, captures, node, fresh, user_ctx, translator)?,
)),
None => Ok(None),
}
}
@@ -663,29 +877,31 @@ impl Rule {
captures: Captures,
node: Id,
fresh: &tree_builder::FreshScope,
) -> Vec<Id> {
user_ctx: &mut C,
translator: TranslatorHandle<'_, C>,
) -> Result<Vec<Id>, String> {
fresh.next_scope();
let source_range = ast.get_node(node).and_then(|n| match n.content {
NodeContent::Range(r) => Some(r),
_ => n.source_range,
});
(self.transform)(ast, captures, fresh, source_range)
(self.transform)(ast, captures, fresh, source_range, user_ctx, translator)
}
}
const MAX_REWRITE_DEPTH: usize = 100;
/// Index of rules by their root query kind for fast lookup.
struct RuleIndex<'a> {
struct RuleIndex<'a, C> {
/// Rules indexed by root node kind name.
by_kind: BTreeMap<&'static str, Vec<&'a Rule>>,
by_kind: BTreeMap<&'static str, Vec<&'a Rule<C>>>,
/// Rules with wildcard queries (Any) that apply to all nodes.
wildcard: Vec<&'a Rule>,
wildcard: Vec<&'a Rule<C>>,
}
impl<'a> RuleIndex<'a> {
fn new(rules: &'a [Rule]) -> Self {
let mut by_kind: BTreeMap<&'static str, Vec<&'a Rule>> = BTreeMap::new();
impl<'a, C> RuleIndex<'a, C> {
fn new(rules: &'a [Rule<C>]) -> Self {
let mut by_kind: BTreeMap<&'static str, Vec<&'a Rule<C>>> = BTreeMap::new();
let mut wildcard = Vec::new();
for rule in rules {
match rule.query.root_kind() {
@@ -696,7 +912,7 @@ impl<'a> RuleIndex<'a> {
Self { by_kind, wildcard }
}
fn rules_for_kind(&self, kind: &str) -> impl Iterator<Item = &&'a Rule> {
fn rules_for_kind(&self, kind: &str) -> impl Iterator<Item = &&'a Rule<C>> {
self.by_kind
.get(kind)
.into_iter()
@@ -705,23 +921,25 @@ impl<'a> RuleIndex<'a> {
}
}
fn apply_repeating_rules(
rules: &[Rule],
fn apply_repeating_rules<C: Clone>(
rules: &[Rule<C>],
ast: &mut Ast,
user_ctx: &mut C,
id: Id,
fresh: &tree_builder::FreshScope,
) -> Result<Vec<Id>, String> {
let index = RuleIndex::new(rules);
apply_repeating_rules_inner(&index, ast, id, fresh, 0, None)
apply_repeating_rules_inner(&index, ast, user_ctx, id, fresh, 0, None)
}
fn apply_repeating_rules_inner(
index: &RuleIndex,
fn apply_repeating_rules_inner<C: Clone>(
index: &RuleIndex<C>,
ast: &mut Ast,
user_ctx: &mut C,
id: Id,
fresh: &tree_builder::FreshScope,
rewrite_depth: usize,
skip_rule: Option<*const Rule>,
skip_rule: Option<*const Rule<C>>,
) -> Result<Vec<Id>, String> {
if rewrite_depth > MAX_REWRITE_DEPTH {
return Err(format!(
@@ -732,11 +950,23 @@ fn apply_repeating_rules_inner(
let node_kind = ast.get_node(id).map(|n| n.kind()).unwrap_or("");
for rule in index.rules_for_kind(node_kind) {
let rule_ptr = *rule as *const Rule;
let rule_ptr = *rule as *const Rule<C>;
if Some(rule_ptr) == skip_rule {
continue;
}
if let Some(result_node) = rule.try_rule(ast, id, fresh)? {
// Snapshot the user context before invoking the rule so that any
// mutations the rule makes are visible during recursive translation
// of its result, but not leaked to the parent's siblings.
let snapshot = user_ctx.clone();
// Repeating rules don't need a real translator: their captures
// aren't auto-translated (Repeating preserves the input schema),
// and `ctx.translate(id)` errors if invoked from a Repeating
// transform.
let translator = TranslatorHandle {
inner: TranslatorImpl::Repeating,
};
let try_result = rule.try_rule(ast, id, fresh, user_ctx, translator)?;
if let Some(result_node) = try_result {
// For non-repeated rules, suppress further application of *this*
// rule on the result root, so a rule whose output matches its own
// query doesn't loop. Other rules and child traversal are
@@ -747,14 +977,19 @@ fn apply_repeating_rules_inner(
results.extend(apply_repeating_rules_inner(
index,
ast,
user_ctx,
node,
fresh,
rewrite_depth + 1,
next_skip,
)?);
}
*user_ctx = snapshot;
return Ok(results);
}
// Rule didn't match; restore any speculative changes (none expected
// since try_rule only mutates on match, but be defensive).
*user_ctx = snapshot;
}
// Take the parent's fields by ownership: the recursion will rewrite
@@ -769,7 +1004,15 @@ fn apply_repeating_rules_inner(
for children in fields.values_mut() {
let mut new_children: Option<Vec<Id>> = None;
for (i, &child_id) in children.iter().enumerate() {
let result = apply_repeating_rules_inner(index, ast, child_id, fresh, rewrite_depth, None)?;
let result = apply_repeating_rules_inner(
index,
ast,
user_ctx,
child_id,
fresh,
rewrite_depth,
None,
)?;
let unchanged = result.len() == 1 && result[0] == child_id;
match (&mut new_children, unchanged) {
(None, true) => {} // unchanged so far, no allocation needed
@@ -798,24 +1041,25 @@ fn apply_repeating_rules_inner(
/// each visited node, recursion proceeds only through captured nodes (not
/// through the input node's children directly), and an error is returned if
/// no rule matches a visited node.
fn apply_one_shot_rules(
rules: &[Rule],
fn apply_one_shot_rules<C: Clone>(
rules: &[Rule<C>],
ast: &mut Ast,
user_ctx: &mut C,
id: Id,
fresh: &tree_builder::FreshScope,
) -> Result<Vec<Id>, String> {
let index = RuleIndex::new(rules);
apply_one_shot_rules_inner(&index, ast, id, fresh, 0)
apply_one_shot_rules_inner(&index, ast, user_ctx, id, fresh, 0)
}
fn apply_one_shot_rules_inner(
index: &RuleIndex,
fn apply_one_shot_rules_inner<C: Clone>(
index: &RuleIndex<C>,
ast: &mut Ast,
user_ctx: &mut C,
id: Id,
fresh: &tree_builder::FreshScope,
rewrite_depth: usize,
) -> Result<Vec<Id>, String> {
if rewrite_depth > MAX_REWRITE_DEPTH {
return Err(format!(
"Desugaring exceeded maximum rewrite depth ({MAX_REWRITE_DEPTH}). \
@@ -825,31 +1069,28 @@ fn apply_one_shot_rules_inner(
let node_kind = ast.get_node(id).map(|n| n.kind()).unwrap_or("");
// Don't rewrite unnamed nodes (punctuation, keywords, etc.); leave them
// as-is. Rules target named nodes only.
if let Some(node) = ast.get_node(id) {
if !node.is_named() {
return Ok(vec![id]);
}
}
for rule in index.rules_for_kind(node_kind) {
if let Some(mut captures) = rule.try_match(ast, id)? {
// Recursively translate every captured node before invoking the
// transform. The transform's output uses output-schema kinds, so
// we must translate captured input-schema nodes to their
// output-schema equivalents first.
captures.try_map_all_captures(|captured_id| {
// Avoid infinite recursion when a capture refers to the root
// node of the matched tree (e.g. an `@_` capture on the
// pattern root): re-analyzing it would match the same rule
// again indefinitely.
if captured_id == id {
return Ok(vec![captured_id]);
}
apply_one_shot_rules_inner(index, ast, captured_id, fresh, rewrite_depth + 1)
})?;
return Ok(rule.run_transform(ast, captures, id, fresh));
if let Some(captures) = rule.try_match(ast, id)? {
// Snapshot the user context before invoking the rule so that any
// mutations the rule (or its transitively-translated captures)
// make are visible during this rule's transform, but not leaked
// to the parent's siblings.
let snapshot = user_ctx.clone();
// Build the translator handle the transform will use to
// recursively translate captures (or, for macro-generated
// rules, the auto-translate prefix uses it to translate every
// capture up front, preserving the legacy behavior).
let translator = TranslatorHandle {
inner: TranslatorImpl::OneShot {
index,
fresh,
rewrite_depth,
matched_root: id,
},
};
let result = rule.run_transform(ast, captures, id, fresh, user_ctx, translator)?;
*user_ctx = snapshot;
return Ok(result);
}
}
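
The snapshot-and-restore pattern used around `run_transform` can be illustrated with a small standalone sketch. The `Ctx` and `Node` types and the `translate` function below are hypothetical; they only model the idea that context changes made while handling a node are visible to its descendants but not to its siblings.

```rust
/// Hypothetical user context carrying the modifiers currently in scope.
#[derive(Clone, Debug, Default, PartialEq)]
struct Ctx {
    modifiers: Vec<String>,
}

/// Hypothetical input tree: declarations add a modifier for their children.
enum Node {
    Leaf(&'static str),
    Decl {
        modifier: &'static str,
        children: Vec<Node>,
    },
}

fn translate(node: &Node, ctx: &mut Ctx, out: &mut Vec<String>) {
    // Snapshot before handling the node, mirroring `user_ctx.clone()`.
    let snapshot = ctx.clone();
    match node {
        Node::Leaf(name) => out.push(format!("{} {}", ctx.modifiers.join(" "), name)),
        Node::Decl { modifier, children } => {
            // This mutation is visible to every descendant of the node.
            ctx.modifiers.push(modifier.to_string());
            for child in children {
                translate(child, ctx, out);
            }
        }
    }
    // Restore, so later siblings never observe the mutation.
    *ctx = snapshot;
}

fn demo() -> (Vec<String>, Ctx) {
    let tree = Node::Decl {
        modifier: "public",
        children: vec![
            Node::Leaf("a"),
            Node::Decl {
                modifier: "static",
                children: vec![Node::Leaf("b")],
            },
            Node::Leaf("c"),
        ],
    };
    let mut ctx = Ctx::default();
    let mut out = Vec::new();
    translate(&tree, &mut ctx, &mut out);
    (out, ctx)
}
```

In this sketch `c` does not see `static`, and the caller's context is unchanged after the call. The cost is one clone per handled node, which is presumably why the diff requires `C: Clone` on the traversal functions.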
@@ -877,15 +1118,15 @@ pub enum PhaseKind {
/// starts. Rules within a phase compete for matches as usual; rules in
/// different phases never compete because each traversal only considers the
/// current phase's rules.
pub struct Phase {
pub struct Phase<C = ()> {
/// Name used in error messages.
pub name: String,
pub rules: Vec<Rule>,
pub rules: Vec<Rule<C>>,
pub kind: PhaseKind,
}
impl Phase {
pub fn new(name: impl Into<String>, kind: PhaseKind, rules: Vec<Rule>) -> Self {
impl<C> Phase<C> {
pub fn new(name: impl Into<String>, kind: PhaseKind, rules: Vec<Rule<C>>) -> Self {
Self {
name: name.into(),
rules,
@@ -911,17 +1152,30 @@ impl Phase {
/// .add_phase("desugar", PhaseKind::Repeating, desugar_rules)
/// .with_output_node_types_yaml(yaml);
/// ```
#[derive(Default)]
pub struct DesugaringConfig {
///
/// The optional type parameter `C` is the user context type threaded through
/// rule transforms. Defaults to `()` (no user context).
pub struct DesugaringConfig<C = ()> {
/// Phases of rule application, applied in order.
pub phases: Vec<Phase>,
pub phases: Vec<Phase<C>>,
/// Output node-types in YAML format. If `None`, the input grammar's
/// node types are used (i.e. the desugared AST has the same node types
/// as the tree-sitter grammar).
pub output_node_types_yaml: Option<&'static str>,
}
impl DesugaringConfig {
// Manual `Default` impl so users with a custom `C` that doesn't implement
// `Default` can still construct an empty config.
impl<C> Default for DesugaringConfig<C> {
fn default() -> Self {
Self {
phases: Vec::new(),
output_node_types_yaml: None,
}
}
}
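
Why the manual impl matters can be shown with a reduced example. `#[derive(Default)]` on a generic struct adds a `C: Default` bound to the generated impl, even when no field needs it. The `Config` and `NoDefault` types below are hypothetical stand-ins, not the real `DesugaringConfig`.

```rust
/// Stand-in for a config that is generic over a user context type.
struct Config<C = ()> {
    phases: Vec<fn(&mut C)>,
    output_node_types_yaml: Option<&'static str>,
}

// Hand-written impl: no `C: Default` bound, unlike what
// `#[derive(Default)]` would generate.
impl<C> Default for Config<C> {
    fn default() -> Self {
        Self {
            phases: Vec::new(),
            output_node_types_yaml: None,
        }
    }
}

/// A context type that deliberately has no `Default` impl.
struct NoDefault {
    depth: usize,
}

fn empty_config() -> (usize, bool) {
    // Compiles only because the impl above does not require `C: Default`.
    let config: Config<NoDefault> = Config::default();
    (config.phases.len(), config.output_node_types_yaml.is_none())
}
```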
impl<C> DesugaringConfig<C> {
/// Create an empty configuration. Add phases via [`add_phase`] and an
/// optional output schema via [`with_output_node_types_yaml`].
pub fn new() -> Self {
@@ -933,7 +1187,7 @@ impl DesugaringConfig {
mut self,
name: impl Into<String>,
kind: PhaseKind,
rules: Vec<Rule>,
rules: Vec<Rule<C>>,
) -> Self {
self.phases.push(Phase::new(name, kind, rules));
self
@@ -955,15 +1209,15 @@ impl DesugaringConfig {
}
}
pub struct Runner<'a> {
pub struct Runner<'a, C = ()> {
language: tree_sitter::Language,
schema: schema::Schema,
phases: &'a [Phase],
phases: &'a [Phase<C>],
}
impl<'a> Runner<'a> {
impl<'a, C> Runner<'a, C> {
/// Create a runner using the input grammar's schema for output.
pub fn new(language: tree_sitter::Language, phases: &'a [Phase]) -> Self {
pub fn new(language: tree_sitter::Language, phases: &'a [Phase<C>]) -> Self {
let schema = schema::Schema::from_language(&language);
Self {
language,
@@ -976,7 +1230,7 @@ impl<'a> Runner<'a> {
pub fn with_schema(
language: tree_sitter::Language,
schema: &schema::Schema,
phases: &'a [Phase],
phases: &'a [Phase<C>],
) -> Self {
Self {
language,
@@ -988,7 +1242,7 @@ impl<'a> Runner<'a> {
/// Create a runner from a [`DesugaringConfig`].
pub fn from_config(
language: tree_sitter::Language,
config: &'a DesugaringConfig,
config: &'a DesugaringConfig<C>,
) -> Result<Self, String> {
let schema = config.build_schema(&language)?;
Ok(Self {
@@ -997,11 +1251,17 @@ impl<'a> Runner<'a> {
phases: &config.phases,
})
}
}
pub fn run_from_tree(
impl<'a, C: Clone> Runner<'a, C> {
/// Build an AST from the already-parsed `tree` and its `source`, then run
/// all phases, threading `user_ctx` through every rule transform. The
/// caller owns the initial context state.
pub fn run_from_tree_with_ctx(
&self,
tree: &tree_sitter::Tree,
source: &[u8],
user_ctx: &mut C,
) -> Result<Ast, String> {
let mut ast = Ast::from_tree_with_schema_and_source(
self.schema.clone(),
@@ -1009,11 +1269,13 @@ impl<'a> Runner<'a> {
&self.language,
source.to_vec(),
);
self.run_phases(&mut ast)?;
self.run_phases(&mut ast, user_ctx)?;
Ok(ast)
}
pub fn run(&self, input: &str) -> Result<Ast, String> {
/// Parse `input` and run all phases, threading `user_ctx` through
/// every rule transform. The caller owns the initial context state.
pub fn run_with_ctx(&self, input: &str, user_ctx: &mut C) -> Result<Ast, String> {
let mut parser = tree_sitter::Parser::new();
parser
.set_language(&self.language)
@@ -1027,20 +1289,24 @@ impl<'a> Runner<'a> {
&self.language,
input.as_bytes().to_vec(),
);
self.run_phases(&mut ast)?;
self.run_phases(&mut ast, user_ctx)?;
Ok(ast)
}
/// Apply each phase in turn to the AST, threading the root through.
/// A single `FreshScope` is shared across phases so that fresh
/// identifiers generated in different phases don't collide.
fn run_phases(&self, ast: &mut Ast) -> Result<(), String> {
fn run_phases(&self, ast: &mut Ast, user_ctx: &mut C) -> Result<(), String> {
let fresh = tree_builder::FreshScope::new();
let mut root = ast.get_root();
for phase in self.phases {
let res = match phase.kind {
PhaseKind::Repeating => apply_repeating_rules(&phase.rules, ast, root, &fresh),
PhaseKind::OneShot => apply_one_shot_rules(&phase.rules, ast, root, &fresh),
PhaseKind::Repeating => {
apply_repeating_rules(&phase.rules, ast, user_ctx, root, &fresh)
}
PhaseKind::OneShot => {
apply_one_shot_rules(&phase.rules, ast, user_ctx, root, &fresh)
}
}
.map_err(|e| format!("Phase `{}`: {e}", phase.name))?;
if res.len() != 1 {
@@ -1056,3 +1322,78 @@ impl<'a> Runner<'a> {
Ok(())
}
}
impl<'a, C: Clone + Default> Runner<'a, C> {
/// Build an AST from the already-parsed `tree` and its `source`, then run
/// all phases, using the default context (`C::default()`) as the initial
/// context state.
pub fn run_from_tree(&self, tree: &tree_sitter::Tree, source: &[u8]) -> Result<Ast, String> {
let mut user_ctx = C::default();
self.run_from_tree_with_ctx(tree, source, &mut user_ctx)
}
/// Parse `input` and run all phases, using the default context
/// (`C::default()`) as the initial context state.
pub fn run(&self, input: &str) -> Result<Ast, String> {
let mut user_ctx = C::default();
self.run_with_ctx(input, &mut user_ctx)
}
}
// ---------------------------------------------------------------------------
// Desugarer: type-erased view of a DesugaringConfig + Runner
// ---------------------------------------------------------------------------
/// Type-erased interface to a desugaring pipeline for a single language.
///
/// Consumers (e.g. a generic tree-sitter extractor) hold
/// `Box<dyn Desugarer>` so they can dispatch through the trait without
/// knowing the user context type `C` that's internal to yeast.
///
/// Construct one via [`ConcreteDesugarer::new`] from a
/// [`DesugaringConfig<C>`] and a [`tree_sitter::Language`].
pub trait Desugarer: Send + Sync {
/// The output AST schema (in YAML format), or `None` if the input
/// grammar's schema should be used.
fn output_node_types_yaml(&self) -> Option<&'static str>;
/// Build an AST from the already-parsed `tree` and its `source`, then run
/// the desugaring pipeline. Each call constructs a fresh default user
/// context internally.
fn run_from_tree(&self, tree: &tree_sitter::Tree, source: &[u8]) -> Result<Ast, String>;
}
/// A concrete [`Desugarer`] backed by a [`DesugaringConfig<C>`] for a
/// specific user context type `C`. Stores the language and a pre-built
/// schema so that per-call cost is bounded to constructing a transient
/// [`Runner`] and cloning the schema (no YAML re-parsing).
pub struct ConcreteDesugarer<C: Default + Clone + Send + Sync + 'static> {
language: tree_sitter::Language,
schema: schema::Schema,
config: DesugaringConfig<C>,
}
impl<C: Default + Clone + Send + Sync + 'static> ConcreteDesugarer<C> {
/// Build a desugarer for `language` from `config`. Parses the output
/// schema YAML once (if set) and stores it for reuse across files.
pub fn new(
language: tree_sitter::Language,
config: DesugaringConfig<C>,
) -> Result<Self, String> {
let schema = config.build_schema(&language)?;
Ok(Self {
language,
schema,
config,
})
}
}
impl<C: Default + Clone + Send + Sync + 'static> Desugarer for ConcreteDesugarer<C> {
fn output_node_types_yaml(&self) -> Option<&'static str> {
self.config.output_node_types_yaml
}
fn run_from_tree(&self, tree: &tree_sitter::Tree, source: &[u8]) -> Result<Ast, String> {
let runner = Runner::with_schema(self.language.clone(), &self.schema, &self.config.phases);
runner.run_from_tree(tree, source)
}
}
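
The type-erasure arrangement above (a non-generic trait implemented by a struct that is generic over the context type) can be sketched without yeast or tree-sitter. Everything below is a simplified assumption for illustration: the trait has a single string-based `run` method, and `Concrete`, `Counter`, `count_step`, and `unit_step` are hypothetical names.

```rust
/// Stand-in for the object-safe `Desugarer` trait: no mention of `C`.
trait Desugarer: Send + Sync {
    fn run(&self, input: &str) -> Result<String, String>;
}

/// Stand-in for `ConcreteDesugarer<C>`: generic over the context type.
struct Concrete<C> {
    step: fn(&mut C, &str) -> String,
}

impl<C: Default + Send + Sync + 'static> Desugarer for Concrete<C> {
    fn run(&self, input: &str) -> Result<String, String> {
        // Each call starts from a fresh default context.
        let mut ctx = C::default();
        Ok((self.step)(&mut ctx, input))
    }
}

/// Hypothetical context type for one "language".
#[derive(Default)]
struct Counter {
    seen: usize,
}

fn count_step(ctx: &mut Counter, input: &str) -> String {
    ctx.seen += input.len();
    format!("{}:{}", input, ctx.seen)
}

fn unit_step(_ctx: &mut (), input: &str) -> String {
    input.to_uppercase()
}

fn demo() -> Vec<String> {
    // Pipelines with different context types share one collection.
    let pipelines: Vec<Box<dyn Desugarer>> = vec![
        Box::new(Concrete::<Counter> { step: count_step }),
        Box::new(Concrete::<()> { step: unit_step }),
    ];
    pipelines.iter().map(|p| p.run("ab").unwrap()).collect()
}
```

Because the context is rebuilt on every call, running the first pipeline twice yields the same result both times; no state leaks between files in this sketch.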

@@ -242,10 +242,7 @@ pub fn convert(yaml_input: &str) -> Result<String, String> {
/// Apply YAML node-type definitions to a mutable Schema.
/// Registers all types, fields, and allowed types from the YAML into the schema.
fn apply_yaml_to_schema(
yaml: &YamlNodeTypes,
schema: &mut crate::schema::Schema,
) {
fn apply_yaml_to_schema(yaml: &YamlNodeTypes, schema: &mut crate::schema::Schema) {
// Register all supertypes as node kinds
for name in yaml.supertypes.keys() {
schema.register_kind(name);
@@ -307,7 +304,8 @@ fn apply_yaml_to_schema(
.into_vec()
.into_iter()
.map(|type_ref| {
let (kind, named) = resolve_type_ref_pair(&type_ref, &named_types, &unnamed_types);
let (kind, named) =
resolve_type_ref_pair(&type_ref, &named_types, &unnamed_types);
crate::schema::NodeType { kind, named }
})
.collect::<Vec<_>>();

@@ -198,13 +198,8 @@ impl Schema {
.insert((parent_kind.to_string(), field_id), node_types);
}
pub fn field_types(
&self,
parent_kind: &str,
field_id: FieldId,
) -> Option<&Vec<NodeType>> {
self.field_types
.get(&(parent_kind.to_string(), field_id))
pub fn field_types(&self, parent_kind: &str, field_id: FieldId) -> Option<&Vec<NodeType>> {
self.field_types.get(&(parent_kind.to_string(), field_id))
}
pub fn set_field_cardinality(

@@ -7,7 +7,7 @@ const OUTPUT_SCHEMA_YAML: &str = include_str!("node-types.yml");
/// Helper: parse Ruby source with no rules, return dump.
fn parse_and_dump(input: &str) -> String {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run(input).unwrap();
dump_ast(&ast, ast.get_root(), input)
}
@@ -18,13 +18,23 @@ fn run_and_dump(input: &str, rules: Vec<Rule>) -> String {
run_phased_and_dump(input, vec![Phase::new("test", PhaseKind::Repeating, rules)])
}
/// Helper: parse Ruby source with custom rules and return the transformed AST.
fn run_and_ast(input: &str, rules: Vec<Rule>) -> Ast {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let phases = vec![Phase::new("test", PhaseKind::Repeating, rules)];
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
runner.run(input).unwrap()
}
/// Helper: parse Ruby source with a custom output schema and multiple
/// rule phases, return dump.
fn run_phased_and_dump(input: &str, phases: Vec<Phase>) -> String {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let ast = runner.run(input).unwrap();
dump_ast(&ast, ast.get_root(), input)
}
@@ -36,7 +46,7 @@ fn run_and_get_error(input: &str, rules: Vec<Rule>) -> String {
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let phases = vec![Phase::new("test", PhaseKind::Repeating, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
runner
.run(input)
.expect_err("expected runner to return an error")
@@ -44,7 +54,7 @@ fn run_and_get_error(input: &str, rules: Vec<Rule>) -> String {
/// Helper: parse Ruby source with no rules and dump with schema type errors.
fn parse_and_dump_typed(input: &str, schema_yaml: &str) -> String {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run(input).unwrap();
let schema = yeast::node_types_yaml::schema_from_yaml(schema_yaml).unwrap();
dump_ast_with_type_errors(&ast, ast.get_root(), input, &schema)
@@ -54,10 +64,10 @@ fn parse_and_dump_typed(input: &str, schema_yaml: &str) -> String {
/// building schema with language IDs so field checks align with parser fields.
fn parse_and_dump_typed_with_language(input: &str, schema_yaml: &str) -> String {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let runner = Runner::new(lang.clone(), &[]);
let runner: Runner = Runner::new(lang.clone(), &[]);
let ast = runner.run(input).unwrap();
let schema = yeast::node_types_yaml::schema_from_yaml_with_language(schema_yaml, &lang)
.unwrap();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(schema_yaml, &lang).unwrap();
dump_ast_with_type_errors(&ast, ast.get_root(), input, &schema)
}
@@ -66,7 +76,7 @@ fn run_and_dump_typed(input: &str, rules: Vec<Rule>, schema_yaml: &str) -> Strin
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema = yeast::node_types_yaml::schema_from_yaml(schema_yaml).unwrap();
let phases = vec![Phase::new("test", PhaseKind::Repeating, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let ast = runner.run(input).unwrap();
dump_ast_with_type_errors(&ast, ast.get_root(), input, &schema)
}
@@ -156,7 +166,7 @@ fn test_parse_for_loop() {
#[test]
fn test_dump_highlights_type_errors_inline() {
let schema_yaml = r#"
let schema_yaml = r#"
named:
program:
$children*: assignment
@@ -166,13 +176,13 @@ named:
identifier:
"#;
let dump = parse_and_dump_typed("x = 1", schema_yaml);
assert!(dump.contains("integer \"1\" <-- ERROR:"));
let dump = parse_and_dump_typed("x = 1", schema_yaml);
assert!(dump.contains("integer \"1\" <-- ERROR:"));
}
#[test]
fn test_dump_reports_preserved_unknown_kind_after_transformation() {
let schema_yaml = r#"
let schema_yaml = r#"
named:
program:
$children*: assignment
@@ -182,25 +192,25 @@ named:
identifier:
"#;
// This rewrite runs and preserves the RHS node kind via capture.
// With schema above, preserving `integer` should be reported inline.
let rules = vec![yeast::rule!(
(assignment left: (_) @left right: (_) @right)
=>
(assignment
left: {left}
right: {right}
)
)];
// This rewrite runs and preserves the RHS node kind via capture.
// With schema above, preserving `integer` should be reported inline.
let rules: Vec<Rule> = vec![yeast::rule!(
(assignment left: (_) @left right: (_) @right)
=>
(assignment
left: {left}
right: {right}
)
)];
let dump = run_and_dump_typed("x = 1", rules, schema_yaml);
assert!(dump.contains("integer \"1\" <-- ERROR:"));
assert!(dump.contains("node kind 'integer' not in schema"));
let dump = run_and_dump_typed("x = 1", rules, schema_yaml);
assert!(dump.contains("integer \"1\" <-- ERROR:"));
assert!(dump.contains("node kind 'integer' not in schema"));
}
#[test]
fn test_dump_reports_undeclared_field_on_node() {
let schema_yaml = r#"
let schema_yaml = r#"
named:
program:
$children*: assignment
@@ -209,14 +219,14 @@ named:
identifier:
"#;
let dump = parse_and_dump_typed_with_language("x = y", schema_yaml);
assert!(dump.contains("right: identifier \"y\" <-- ERROR:"));
assert!(dump.contains("the node 'assignment' has no field 'right'"));
let dump = parse_and_dump_typed_with_language("x = y", schema_yaml);
assert!(dump.contains("right: identifier \"y\" <-- ERROR:"));
assert!(dump.contains("the node 'assignment' has no field 'right'"));
}
#[test]
fn test_dump_reports_disallowed_kind_in_field_type() {
let schema_yaml = r#"
let schema_yaml = r#"
named:
program:
$children*: assignment
@@ -227,17 +237,17 @@ named:
integer:
"#;
let dump = parse_and_dump_typed_with_language("x = 1", schema_yaml);
assert!(dump.contains("right: integer \"1\" <-- ERROR:"));
assert!(dump.contains("should contain"));
assert!(dump.contains("but got integer"));
let dump = parse_and_dump_typed_with_language("x = 1", schema_yaml);
assert!(dump.contains("right: integer \"1\" <-- ERROR:"));
assert!(dump.contains("should contain"));
assert!(dump.contains("but got integer"));
}
// ---- Query tests ----
#[test]
fn test_query_match() {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let query = yeast::query!(
@@ -258,7 +268,7 @@ fn test_query_match() {
#[test]
fn test_query_no_match() {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let query = yeast::query!(
@@ -283,7 +293,7 @@ fn test_query_skips_extras_in_positional_match() {
// captured comment to nothing (a common idiom, e.g.
// `(comment) => ()` in Swift) leaves the capture's match-list empty
// and causes the transform to fail with "Variable X has 0 matches".
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("[1, # comment\n2]").unwrap();
// Navigate to the `array` node: program -> array.
@@ -299,15 +309,11 @@ fn test_query_skips_extras_in_positional_match() {
let matched = query.do_match(&ast, array_id, &mut captures).unwrap();
assert!(matched);
assert_eq!(
ast.get_node(captures.get_var("a").unwrap())
.unwrap()
.kind(),
ast.get_node(captures.get_var("a").unwrap()).unwrap().kind(),
"integer"
);
assert_eq!(
ast.get_node(captures.get_var("b").unwrap())
.unwrap()
.kind(),
ast.get_node(captures.get_var("b").unwrap()).unwrap().kind(),
"integer"
);
}
@@ -315,14 +321,14 @@ fn test_query_skips_extras_in_positional_match() {
#[test]
fn test_reachable_nodes_excludes_orphaned_rewrite_nodes() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema = yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang)
.unwrap();
let phases = vec![Phase::new(
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let phases: Vec<Phase> = vec![Phase::new(
"test",
PhaseKind::Repeating,
vec![yeast::rule!((integer) => (identifier "replaced"))],
)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
@@ -340,7 +346,7 @@ fn test_reachable_nodes_excludes_orphaned_rewrite_nodes() {
#[test]
fn test_query_repeated_capture() {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x, y, z = 1").unwrap();
let query = yeast::query!(
@@ -365,7 +371,7 @@ fn test_query_repeated_capture() {
#[test]
fn test_capture_unnamed_node_parenthesized() {
// `("=") @op` captures the unnamed `=` token between left and right.
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let query = yeast::query!(
@@ -389,10 +395,33 @@ fn test_capture_unnamed_node_parenthesized() {
assert!(!op_node.is_named());
}
#[test]
fn test_capture_bare_underscore_repeated() {
// `_` matches named and unnamed nodes in bare-child position. On this
// assignment shape, bare children correspond to unnamed tokens (the `=`).
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let query = yeast::query!((assignment _* @all));
let mut cursor = AstCursor::new(&ast);
cursor.goto_first_child();
let assignment_id = cursor.node_id();
let mut captures = yeast::captures::Captures::new();
let matched = query.do_match(&ast, assignment_id, &mut captures).unwrap();
assert!(matched);
let all = captures.get_all("all");
assert_eq!(all.len(), 1);
assert_eq!(ast.get_node(all[0]).unwrap().kind(), "=");
assert!(!ast.get_node(all[0]).unwrap().is_named());
}
#[test]
fn test_capture_unnamed_node_bare_literal() {
// `"=" @op` (without surrounding parens) is the same as `("=") @op`.
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let query = yeast::query!(
@@ -421,7 +450,7 @@ fn test_bare_underscore_matches_unnamed() {
// Bare `_` matches any node, including unnamed tokens, while `(_)`
// matches only named nodes. Demonstrate by matching the unnamed `=`
// token in the implicit `child` field of an `assignment`.
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let mut cursor = AstCursor::new(&ast);
@@ -460,7 +489,7 @@ fn test_bare_forms_in_field_position() {
// field's value, not just in the bare-children position. This is
// syntactic sugar for `(_)` / `("…")` and goes through the same
// code paths.
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let mut cursor = AstCursor::new(&ast);
@@ -499,7 +528,7 @@ fn test_forward_scan_finds_unnamed_token_late() {
// query for `("end")` skip past the first two and match the third.
// Without forward-scan, the matcher took the first child unconditionally
// and failed.
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("for x in list do\n y\nend").unwrap();
// Navigate: program > for > do (the body wrapper).
@@ -526,7 +555,7 @@ fn test_forward_scan_preserves_order() {
// order. A query for ("end") then ("do") should fail because `do`
// appears before `end` in the source order; once forward-scan has
// consumed `end`, the iterator is exhausted.
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("for x in list do\n y\nend").unwrap();
let mut cursor = AstCursor::new(&ast);
@@ -547,7 +576,7 @@ fn test_forward_scan_preserves_order() {
#[test]
fn test_tree_builder() {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let mut ast = runner.run("x = 1").unwrap();
let input = "x = 1";
@@ -565,7 +594,8 @@ fn test_tree_builder() {
// Swap left and right
let fresh = yeast::tree_builder::FreshScope::new();
let mut ctx = yeast::build::BuildCtx::new(&mut ast, &captures, &fresh);
let mut user_ctx = ();
let mut ctx = yeast::build::BuildCtx::new(&mut ast, &captures, &fresh, &mut user_ctx);
let new_id = yeast::tree!(ctx,
(program
child: (assignment
@@ -593,7 +623,7 @@ fn test_tree_builder() {
// tree-sitter-ruby grammar with named fields for nodes that only have
// unnamed children in tree-sitter (e.g. block_body.stmt, block_parameters.parameter).
fn ruby_rules() -> Vec<Rule> {
let assign_rule = yeast::rule!(
let assign_rule: Rule = yeast::rule!(
(assignment
left: (left_assignment_list
(identifier)* @left
@@ -618,7 +648,7 @@ fn ruby_rules() -> Vec<Rule> {
)}
);
let for_rule = yeast::rule!(
let for_rule: Rule = yeast::rule!(
(for
pattern: (_) @pat
value: (in (_) @val)
@@ -700,7 +730,7 @@ fn test_desugar_for_loop() {
#[test]
fn test_shorthand_rule() {
let rule = yeast::rule!(
let rule: Rule = yeast::rule!(
(assignment
left: (_) @method
right: (_) @receiver
@@ -852,7 +882,7 @@ fn test_phase_error_includes_phase_name() {
PhaseKind::Repeating,
vec![swap_assignment_rule().repeated()],
)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let err = runner
.run("x = 1")
.expect_err("expected runner to return an error");
@@ -895,7 +925,7 @@ fn test_one_shot_phase() {
PhaseKind::OneShot,
one_shot_xeq1_rules(),
)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
@@ -921,7 +951,7 @@ fn test_one_shot_phase_errors_when_no_rule_matches() {
let mut rules = one_shot_xeq1_rules();
rules.pop();
let phases = vec![Phase::new("translate", PhaseKind::OneShot, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let err = runner
.run("x = 1")
@@ -945,7 +975,7 @@ fn test_one_shot_recurses_into_returned_capture() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let rules = vec![
let rules: Vec<Rule> = vec![
yeast::rule!(
(program (_)* @stmts)
=>
@@ -961,7 +991,7 @@ fn test_one_shot_recurses_into_returned_capture() {
yeast::rule!((integer) => (integer "INT")),
];
let phases = vec![Phase::new("translate", PhaseKind::OneShot, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
@@ -987,7 +1017,7 @@ fn test_one_shot_does_not_recurse_into_wrapper_output() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let rules = vec![
let rules: Vec<Rule> = vec![
yeast::rule!(
(program (_)* @stmts)
=>
@@ -1008,7 +1038,7 @@ fn test_one_shot_does_not_recurse_into_wrapper_output() {
yeast::rule!((integer) => (integer "INT")),
];
let phases = vec![Phase::new("translate", PhaseKind::OneShot, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let runner: Runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
@@ -1032,7 +1062,7 @@ fn test_one_shot_does_not_recurse_into_wrapper_output() {
#[test]
fn test_cursor_navigation() {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let runner: Runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run("x = 1").unwrap();
let mut cursor = AstCursor::new(&ast);
@@ -1106,7 +1136,7 @@ fn test_desugar_for_with_multiple_assignment() {
/// resolves to the captured node's source text via `YeastDisplay`.
#[test]
fn test_hash_brace_renders_capture_source_text() {
let rule = rule!(
let rule: Rule = rule!(
(call
method: (identifier) @name
receiver: (identifier) @recv
@@ -1135,7 +1165,7 @@ fn test_hash_brace_renders_capture_source_text() {
/// `Display` impl (covered by `YeastDisplay`'s blanket impls for primitives).
#[test]
fn test_hash_brace_renders_integer_expression() {
let rule = rule!(
let rule: Rule = rule!(
(identifier) @_
=>
(identifier #{1 + 2})
@@ -1149,3 +1179,39 @@ fn test_hash_brace_renders_integer_expression() {
"#,
);
}
/// Regression test: `(kind #{capture})` should inherit the captured node's
/// source location, not the full source range of the matched rule root.
#[test]
fn test_hash_brace_uses_capture_location_for_leaf() {
let rule: Rule = rule!(
(call
method: (identifier) @name
receiver: (identifier) @recv
)
=>
(call
method: (identifier #{name})
receiver: (identifier #{recv})
arguments: (argument_list)
)
);
let ast = run_and_ast("foo.bar()", vec![rule]);
let mut bar_ids: Vec<usize> = Vec::new();
for id in ast.reachable_node_ids() {
let Some(node) = ast.get_node(id) else {
continue;
};
if node.kind() == "identifier" && ast.source_text(id) == "bar" {
bar_ids.push(id);
}
}
assert_eq!(bar_ids.len(), 1, "expected exactly one identifier 'bar'");
let bar = ast.get_node(bar_ids[0]).unwrap();
assert_eq!(bar.start_byte(), 4);
assert_eq!(bar.end_byte(), 7);
}
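The query tests above rely on two behaviours: bare `_` matches any node while `(_)` matches only named nodes, and positional patterns scan forward through the children without ever going back. The following is a minimal sketch of that idea, not yeast's actual matcher. The `Child`, `Pat`, `forward_scan`, and `do_children` names are hypothetical, and the child list used for the `do` body is an assumption based on the test comments.

```rust
/// A toy child node: just a kind and whether the grammar names it.
#[derive(Debug, Clone, PartialEq)]
struct Child {
    kind: &'static str,
    named: bool,
}

/// Hypothetical pattern forms mirroring the query syntax used in the tests.
#[derive(Debug, Clone, Copy)]
enum Pat {
    /// Bare `_`: matches any node, named or unnamed.
    Any,
    /// `(_)`: matches named nodes only.
    AnyNamed,
    /// `("tok")` or `(kind)`: matches a node of exactly this kind.
    Kind(&'static str),
}

impl Pat {
    fn matches(self, c: &Child) -> bool {
        match self {
            Pat::Any => true,
            Pat::AnyNamed => c.named,
            Pat::Kind(k) => c.kind == k,
        }
    }
}

/// Match `pats` against `children` in order. Each pattern scans forward from
/// the current position to the first child it accepts; skipped children are
/// never revisited. Returns the matched indices, or `None` if some pattern
/// runs out of children.
fn forward_scan(children: &[Child], pats: &[Pat]) -> Option<Vec<usize>> {
    let mut pos = 0;
    let mut hits = Vec::with_capacity(pats.len());
    for pat in pats {
        let offset = children[pos..].iter().position(|c| pat.matches(c))?;
        hits.push(pos + offset);
        pos += offset + 1;
    }
    Some(hits)
}

/// Assumed children of the `do` body wrapper for `for x in list do\n y\nend`.
fn do_children() -> Vec<Child> {
    vec![
        Child { kind: "do", named: false },
        Child { kind: "identifier", named: true },
        Child { kind: "end", named: false },
    ]
}
```

With this model, a lone `Kind("end")` skips the first two children and matches the third, while `Kind("end")` followed by `Kind("do")` fails because nothing is left to scan once `end` has been consumed.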

View File

@@ -3,25 +3,21 @@
This is a CodeQL extractor based on tree-sitter.
## Building
To build the extractor, run `scripts/create-extractor-pack.sh`
- To build the extractor, run `scripts/create-extractor-pack.sh`
## Editing the Swift grammar
The vendored tree-sitter-swift grammar lives at
`extractor/tree-sitter-swift/`. After editing `grammar.js` (or any other
grammar source), run `scripts/regenerate-grammar.sh` to:
- regenerate `extractor/tree-sitter-swift/src/{parser.c, grammar.json,
node-types.json}` (and the `src/tree_sitter/*.h` headers) via
`tree-sitter generate`; and
- refresh `extractor/tree-sitter-swift/node-types.yml`, the
human-readable companion to `src/node-types.json` produced by yeast's
`node_types_yaml` binary.
## Swift Parser
- The Swift parser is defined by `extractor/tree-sitter-swift/grammar.js` and can be edited if needed.
`node-types.yml` is the recommended review surface for grammar changes —
it shows the impact of a grammar tweak on the named node kinds, fields,
and child types in a form much easier to read than the raw JSON.
- After editing the grammar, always run `scripts/regenerate-grammar.sh`.
## Extractor Testing
- To run extractor tests, run `cargo test` in the `extractor` directory.
- The raw parse tree is described by `extractor/tree-sitter-swift/node-types.yml` and should be reviewed after grammar changes.
## AST Mapping
- The target AST shape is described by `extractor/ast_types.yml`.
- The mapping from the parse tree to the target AST is found in `extractor/src/languages/swift/swift.rs`
- To run tests for the parser and mapping, run `cargo test` in the `extractor` directory.
- Do not edit the printed ASTs in `extractor/test/corpus` directly. To regenerate the ASTs, run `scripts/update-corpus.sh`.

View File

@@ -2,36 +2,103 @@ supertypes:
expr:
- name_expr
- int_literal
- float_literal
- boolean_literal
- string_literal
- regex_literal
- builtin_expr
- binary_expr
- unary_expr
- call_expr
- member_access_expr
- lambda_expr
- unsupported_node
stmt:
- empty_stmt
- block_stmt
- expr_stmt
- if_stmt
- variable_declaration_stmt
- guard_if_stmt
- unsupported_node
condition:
- expr_condition
- let_pattern_condition
- sequence_condition
- super_expr
- function_expr
- array_literal
- map_literal
- key_value_pair
- tuple_expr
- type_cast_expr
- type_test_expr
- if_expr
- assign_expr
- compound_assign_expr
- pattern_guard_expr
- empty_expr
- block
- break_expr
- continue_expr
- return_expr
- throw_expr
- try_expr
- switch_expr
- unsupported_node
expr_or_pattern:
- expr
- pattern
expr_or_type:
- expr
- type_expr
pattern:
- var_pattern
- apply_pattern
- name_pattern
- tuple_pattern
- constructor_pattern
- ignore_pattern
- expr_equality_pattern
- bulk_importing_pattern
- unsupported_node
# A statement is anything that can appear in a block.
# This type contains all of 'expr' and has partial overlap with 'member'.
# For example, type_alias_declaration can appear either as a stmt or member.
# constructor_declaration and destructor_declaration appear here because
# tree-sitter-swift's error recovery for #if/#endif in class bodies can place
# init/deinit declarations at the wrong (statement) level.
stmt:
- expr
- variable_declaration
- type_alias_declaration
- function_declaration
- import_declaration
- operator_syntax_declaration
- class_like_declaration
- accessor_declaration
- constructor_declaration
- destructor_declaration
- guard_if_stmt
- for_each_stmt
- while_stmt
- do_while_stmt
- labeled_stmt
# A member is anything that can appear in the body of a class-like declaration
member:
- constructor_declaration
- destructor_declaration
- function_declaration
- variable_declaration
- accessor_declaration
- initializer_declaration
- class_like_declaration
- type_alias_declaration
- associated_type_declaration
- unsupported_node
type_expr:
- named_type_expr
- generic_type_expr
- tuple_type_expr
- function_type_expr
- inferred_type_expr
- unsupported_node
type_constraint:
- equality_type_constraint
- bound_type_constraint
operator:
- infix_operator
- prefix_operator
- postfix_operator
named:
# Top-level is the root node, currently containing a list of expressions
# Top-level is the root node, containing a single block of statements
# (which are themselves expressions or declarations).
top_level:
body*: [expr, stmt]
body: block
# An identifier used in the context of an expression
name_expr:
@@ -40,13 +107,28 @@ named:
# An integer literal
int_literal:
# A floating-point literal
float_literal:
# A boolean literal
boolean_literal:
# A literal backed by a keyword such as `nil`, `null`, or `nullptr`.
#
# Although nil/null are keyword literals in many languages there should be
# no attempt to normalize "null-like" named entities, like Python's `None`.
builtin_expr:
# A string literal
string_literal:
# A regex literal
regex_literal:
# Application of a binary operator, such as `a + b`
binary_expr:
left: expr
operator: operator
operator: infix_operator
right: expr
# Application of a unary operator, such as `!x`
@@ -54,86 +136,310 @@ named:
operand: expr
operator: operator
# A function or method call, such as `f(x)` or `obj.m(x)`. Method calls
# are represented as a call whose `function` is a `member_access_expr`.
# Plain assignment
assign_expr:
target: expr_or_pattern
value: expr
# Compound assignment
compound_assign_expr:
target: expr
operator: infix_operator
value: expr
# A function or method call, such as `f(x)` or `obj.m(x)`.
#
# Method calls are represented as a call whose `function` is a `member_access_expr`.
#
# Constructor calls are marked by a language-specific modifier, and the target may be
# a `type_expr` if the parser can deduce that the target is a type.
call_expr:
function: expr
argument*: expr
modifier*: modifier
callee: expr_or_type
argument*: argument
argument:
modifier*: modifier
name?: identifier
value: expr
# Member access, such as `obj.member`.
#
# The base may be a type expression when it is a static member access like `Array<Int>.method`.
# In ambiguous cases where the parser cannot distinguish static and instance member access, the base
# will typically be an expression.
#
# For `super.x` the base will be an instance of `super_expr`.
member_access_expr:
target: expr
base: expr_or_type
member: identifier
lambda_expr:
# A type expression that refers to a type inferred from the contextual type.
# This is used to translate Swift's leading-dot syntax, `.foo`, which means `T.foo` where
# `T` is the contextual type of some enclosing expression. This is translated to a member_access
# with an inferred_type_expr as the base.
inferred_type_expr:
# A `super` token, which can usually only appear as the base of member access.
super_expr:
function_expr:
modifier*: modifier
capture_declaration*: variable_declaration
parameter*: parameter
body: [expr, stmt]
return_type?: type_expr
body: block
# A parameter
array_literal:
element*: expr
map_literal:
element*: expr
# A key-value pair, usually appearing as a named argument or as part of a map literal.
#
# For some languages, the key-value pair is a first class value and this type of expression
# may thus appear anywhere in the general case.
key_value_pair:
key: expr
value: expr
# A tuple expression, such as `(a, b, c)`.
tuple_expr:
element*: expr
# A parameter.
#
# `type` is its declared type annotation (if any)
#
# `pattern` binds the parameter's internal name(s). For a simple parameter this is a
# `name_pattern`, but may be an arbitrary pattern for languages where patterns may appear
# in the parameter list.
#
# `external_name` is the name by which call sites refer to the parameter, if the parameter
# can be passed as a named parameter. For example, the Swift function `func greet(person id: String)`
# would have `person` as the external name and a `name_pattern` wrapping `id` as the parameter's pattern.
parameter:
modifier*: modifier
external_name?: identifier
type?: type_expr
pattern?: pattern
default?: expr
# An expression that does nothing. Used where the grammar permits an
# empty statement (e.g. a stray `;`).
empty_expr:
# A brace-delimited sequence of statements (`{ ... }`). Blocks are the
# only nodes that can directly contain statements; every other body-like
# field holds a single `block`.
block:
stmt*: stmt
if_expr:
condition: expr
then?: expr
else?: expr
# A variable declaration or destructuring assignment that introduces new variables.
#
# Any occurrence of `var_patterns` in 'pattern' results in fresh bindings that are
# in scope for the rest of the enclosing block.
#
# The initializer is optional (but typically cannot be omitted if combined with a non-trivial pattern).
#
# Modifiers should include 'var', 'let', 'const', etc, if they are significant.
# A grouped declaration like `let x = 1, y = 2` is emitted as a sequence of
# `variable_declaration`s directly into the enclosing stmt/member slot; every
# declaration after the first in such a group is tagged with a synthetic
# `chained_declaration` modifier so the grouping can be recovered downstream.
variable_declaration:
modifier*: modifier
pattern: pattern
empty_stmt:
block_stmt:
body*: stmt
expr_stmt:
expr: expr
if_stmt:
condition: condition
then?: stmt
else?: stmt
variable_declaration_stmt:
variable_declarator+: variable_declarator
# A variable declaration, or assignment to a pattern.
# The initializer is optional (but typically only possible in combination with a simple variable pattern).
variable_declarator:
pattern: pattern
type?: type_expr
value?: expr
# Evaluate 'condition', and if false, execute 'else' which must break from the enclosing block scope (return, break, etc).
# Any variables bound by 'condition' will be in scope for the remainder of the enclosing block scope
# (which differs from how if_stmt works).
# (which differs from how if_expr works).
guard_if_stmt:
condition: condition
else: stmt
condition: expr
else: block
# Evaluates the given condition and interprets it as a boolean (by language conventions)
expr_condition:
expr: expr
# `break` (with optional label)
break_expr:
label?: identifier
# A series of statements that are executed before evaluating the trailing condition.
# Useful for languages where a conditional clause may be preceded by side-effecting
# syntactic elements (e.g. binding clauses) that don't themselves form a condition.
sequence_condition:
stmt*: stmt
condition: condition
# `continue` (with optional label)
continue_expr:
label?: identifier
# A labeled statement, such as `outer: for ... { ... }`. The labeled
# statement appears as the `stmt` field; `break`/`continue` may target
# the label.
labeled_stmt:
label: identifier
stmt: stmt
# `return value` or bare `return`
return_expr:
value?: expr
# `throw value`
throw_expr:
value?: expr
# An import declaration.
#
# The semantics of an import are generally:
# - Evaluate the 'imported_expr' to a value (possibly a compile-time value, such as namespace)
# - Filter away possible values based on modifiers (e.g. type-only imports only accept types)
# - Assign the value to the pattern, binding variables and/or type names in scope
#
import_declaration:
modifier*: modifier
imported_expr: expr # Qualified names are encoded as a chain of member_access_expr ending with a name_expr
pattern?: pattern # Binds local names in scope (possibly via bulk_importing_pattern)
# `typealias Name = Type`
type_alias_declaration:
modifier*: modifier
name: identifier
type_parameter*: type_parameter
type_constraint*: type_constraint
type: type_expr
# A top-level function declaration.
function_declaration:
modifier*: modifier
name: identifier
type_parameter*: type_parameter
type_constraint*: type_constraint
parameter*: parameter
return_type?: type_expr
body?: block
# `for pattern in iterable [where guard] { body }`.
for_each_stmt:
modifier*: modifier
pattern: pattern
iterable: expr
guard?: expr
body?: block
# `while condition { body }`.
while_stmt:
modifier*: modifier
condition: expr
body?: block
# `repeat { body } while condition`.
do_while_stmt:
modifier*: modifier
body?: block
condition: expr
# `do { body } catch pattern { ... } catch ...`. Swift uses `do`/`catch`
# for error handling; for languages with `try`/`catch`, this is the same shape.
try_expr:
modifier*: modifier
body: block
catch_clause*: catch_clause
catch_clause:
modifier*: modifier
pattern?: pattern
guard?: expr
body: block
# `switch value { case pattern: body case ...: default: body }`
switch_expr:
modifier*: modifier
value: expr
case*: switch_case
# A single `case ...:` (or `default:`) entry in a switch.
# An entry with multiple `case p1, p2:` patterns has multiple `pattern`s.
# A `default:` entry has no patterns.
# An optional `guard` corresponds to a `where`-clause on the case.
switch_case:
modifier*: modifier
pattern*: pattern
guard?: expr
body: block
# Evaluate 'expr' and match its result against 'pattern', and return true if it matches.
# Variables bound by the pattern will be in scope within the 'true' branch controlled by this condition.
let_pattern_condition:
# Variables bound by the pattern will be in scope within the 'true' branch controlled by this expression.
#
# In Swift, `if case let PATTERN = EXPR` maps to this node
#
# Java: 'if (x instanceof Foo y && w ...) { ... }'
pattern_guard_expr:
pattern: pattern
value: expr
# A pattern matching anything, binding its value to the given variable
var_pattern:
# A type cast expression, such as `x as T`, `x as? T`, or `x as! T`. The
# operator distinguishes between the variants.
type_cast_expr:
expr: expr
operator: infix_operator
type: type_expr
# A type-test expression, such as `x is T`. Yields a boolean indicating
# whether `expr` is an instance of `type`.
type_test_expr:
expr: expr
operator: infix_operator
type: type_expr
# An identifier that introduces a variable.
#
# When used as a pattern, the pattern matches anything and binds its incoming value to the variable
name_pattern:
modifier*: modifier
identifier: identifier
# A pattern matching anything, binding no variables, usually using the syntax "_"
ignore_pattern:
# A pattern such as `Some(x)` where `Some` is the constructor and `x` is an argument
apply_pattern:
constructor: expr
argument*: pattern
# A pattern that matches if the incoming value is equal to the value of the given expression.
# Used for literal patterns in switch (e.g. `case 1:`).
expr_equality_pattern:
expr: expr
# A tuple pattern such as `(a, b)` in `let (a, b) = pair`.
#
# Elements of the tuple pattern can have names, such as Swift's `let (foo: x, bar: y) = tuple`.
tuple_pattern:
element*: pattern
modifier*: modifier
element*: pattern_element
# A pattern such as `Some(x)` where `Some` is the constructor and `x` is an element.
# The element names are interpreted as argument labels and/or field names.
constructor_pattern:
modifier*: modifier
constructor: expr_or_type
element*: pattern_element
# A pattern with an optional associated name.
pattern_element:
modifier*: modifier
key?: identifier
pattern: pattern
# A pattern that checks if the incoming value has the given type, and if so, the
# value is matched against the given nested pattern (and succeeds iff the nested match succeeds).
#
# In Swift: `if let y = x as? Foo` is a pattern_guard_expr containing a type_test_pattern
# In Java: `x instanceof Foo y` is a type_test_pattern wrapping a name_pattern
type_test_pattern:
pattern: pattern
type: type_expr
# A '*' pattern that imports all members of the incoming value into the local scope
# Currently this can only appear in import declarations.
bulk_importing_pattern:
modifier*: modifier
# A simple unqualified identifier token
identifier:
@@ -141,4 +447,129 @@ named:
# A node that we don't yet translate
unsupported_node:
operator:
infix_operator:
prefix_operator:
postfix_operator:
# The fixity of a custom operator declaration (e.g. "prefix", "infix",
# "postfix"). The value is the keyword string.
fixity:
type_parameter:
modifier*: modifier
name: identifier
bound?: type_expr
# A generic constraint of the form `T == U`, requiring two types to be
# equal. Appears in `where` clauses on generic declarations
# (e.g. Swift `func foo<T, U>() where T == U`).
equality_type_constraint:
left: type_expr
right: type_expr
# A generic constraint of the form `T: Bound`, requiring a type parameter
# to conform to (or inherit from) some other type. Appears in `where`
# clauses on generic declarations (e.g. Swift `where T: Equatable`).
bound_type_constraint:
type: type_expr
bound: type_expr
# `infix operator +++` (and the like) — a declaration of a custom operator.
operator_syntax_declaration:
modifier*: modifier
name: identifier
# The fixity specifier (`prefix`, `infix`, `postfix`), when applicable.
fixity?: fixity
# The declared precedence level, when present (e.g. Swift's
# `infix operator +++ : AdditionPrecedence`).
precedence?: expr
# A class-like declaration: class, struct, interface (protocol), enum (or actor).
# The syntactic kind is carried as a `modifier` (e.g. "class", "struct",
# "interface", "enum", "extension"). The `"enum_case"` modifier additionally
# marks a declaration as an enum case with associated values. Extensions are
# represented as a class-like declaration with the `"extension"` modifier and
# no `name`; the extended type appears as a `base_type`.
class_like_declaration:
modifier*: modifier
name?: identifier
type_parameter*: type_parameter
type_constraint*: type_constraint
base_type*: base_type
member*: member
# One of the base types of a class declaration.
#
# If the language has multiple kinds of base classes (e.g. extends/implements) the
# kind should be included as a modifier on this node.
base_type:
modifier*: modifier
type: type_expr
constructor_declaration:
modifier*: modifier
name?: identifier
parameter*: parameter
body: block
# A destructor / finalizer (Swift `deinit`, C++ `~T()`, etc.).
destructor_declaration:
modifier*: modifier
body: block
# Declaration of a single accessor for a property (such as a getter, setter,
# or observer like Swift's `willSet`/`didSet`).
#
# Multiple accessors for the same property are emitted as a sequence of
# accessor_declaration nodes; every accessor after the first is tagged with
# a synthetic `chained_declaration` modifier so the grouping can be recovered
# downstream. Stored properties with observers are emitted as a
# variable_declaration followed by one accessor_declaration per observer
# (each observer also tagged with `chained_declaration`).
accessor_declaration:
modifier*: modifier
name: identifier
accessor_kind: accessor_kind
parameter*: parameter
type?: type_expr
body?: block
# "get", "set", or a language-specific kind like "didSet"
accessor_kind:
# Static or instance initializer block. That is, code that runs at initialization time of either the class or an instance.
initializer_declaration:
modifier*: modifier
body: block
associated_type_declaration:
modifier*: modifier
name: identifier
bound?: type_expr
named_type_expr:
qualifier?: type_expr
name: identifier
generic_type_expr:
base: type_expr
type_argument*: type_expr
# A tuple type such as `(Int, String)` or `(a: A, b: B)`.
tuple_type_expr:
element*: tuple_type_element
# An element of a `tuple_type_expr`, optionally carrying a label.
tuple_type_element:
name?: identifier
type: type_expr
# A function type such as `(Int, String) -> Bool` or `(x: Int) -> Bool`.
function_type_expr:
parameter*: parameter
return_type: type_expr
# A modifier such as 'static', 'public', or 'async'. For now this is just a leaf node with a string value.
modifier:
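The schema comments above say that grouped declarations are flattened into a sequence, with every declaration after the first tagged by a synthetic `chained_declaration` modifier so that the grouping can be recovered downstream. A minimal sketch of that recovery step might look like the following. The `Decl` type and `regroup` function are hypothetical simplifications (the real nodes carry a `pattern` rather than a plain name), and it is assumed here that a chained declaration also keeps its ordinary modifiers.

```rust
/// Hypothetical, simplified stand-in for an emitted declaration node.
#[derive(Debug, Clone, PartialEq)]
struct Decl {
    name: String,
    modifiers: Vec<String>,
}

impl Decl {
    fn new(name: &str, modifiers: &[&str]) -> Self {
        Decl {
            name: name.to_string(),
            modifiers: modifiers.iter().map(|m| m.to_string()).collect(),
        }
    }

    /// True if this declaration carries the synthetic grouping marker.
    fn is_chained(&self) -> bool {
        self.modifiers.iter().any(|m| m == "chained_declaration")
    }
}

/// Split a flat sequence of declarations back into groups: a declaration
/// without the marker starts a new group, one with it joins the previous group.
/// A marked declaration with no predecessor simply starts its own group.
fn regroup(decls: &[Decl]) -> Vec<Vec<&Decl>> {
    let mut groups: Vec<Vec<&Decl>> = Vec::new();
    for d in decls {
        if d.is_chained() {
            if let Some(group) = groups.last_mut() {
                group.push(d);
                continue;
            }
        }
        groups.push(vec![d]);
    }
    groups
}
```

For `let x = 1, y = 2` followed by `var z = 3`, this would yield two groups: one holding `x` and `y`, and one holding `z`.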

View File

@@ -1,9 +1,9 @@
use clap::Args;
use std::path::PathBuf;
use crate::languages;
use codeql_extractor::extractor::simple;
use codeql_extractor::trap;
use crate::languages;
#[derive(Args)]
pub struct Options {
@@ -35,7 +35,9 @@ pub fn run(options: Options) -> std::io::Result<()> {
prefix: "unified".to_string(),
languages,
trap_dir: options.output_dir,
trap_compression: trap::Compression::from_env("CODEQL_EXTRACTOR_UNIFIED_OPTION_TRAP_COMPRESSION"),
trap_compression: trap::Compression::from_env(
"CODEQL_EXTRACTOR_UNIFIED_OPTION_TRAP_COMPRESSION",
),
source_archive_dir: options.source_archive_dir,
file_lists: vec![options.file_list],
};

@@ -22,14 +22,19 @@ pub fn run(options: Options) -> std::io::Result<()> {
// The QL-visible schema is the unified output AST, not the per-language
// input grammars. Pass it via `desugar.output_node_types_yaml` so the
// generator converts the YAML to JSON node-types.
let desugar = yeast::DesugaringConfig::new()
.with_output_node_types_yaml(languages::OUTPUT_AST_SCHEMA);
let desugar =
yeast::DesugaringConfig::new().with_output_node_types_yaml(languages::OUTPUT_AST_SCHEMA);
let languages = vec![Language {
name: "Unified".to_owned(),
node_types: "", // unused: generator picks up output_node_types_yaml above
node_types: "", // unused: generator picks up output_node_types_yaml above
desugar: Some(desugar),
}];
generate(languages, options.dbscheme, options.library, "run unified/scripts/create-extractor-pack.sh")
generate(
languages,
options.dbscheme,
options.library,
"run unified/scripts/create-extractor-pack.sh",
)
}

File diff suppressed because it is too large

@@ -50,6 +50,35 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "f"
value:
function_expr
body:
block
stmt:
binary_expr
operator: infix_operator "*"
left:
name_expr
identifier: identifier "x"
right: int_literal "2"
parameter:
parameter
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
return_type:
named_type_expr
name: identifier "Int"
===
Closure with shorthand parameters
@@ -82,6 +111,26 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "f"
value:
function_expr
body:
block
stmt:
binary_expr
operator: infix_operator "+"
left:
name_expr
identifier: identifier "$0"
right:
name_expr
identifier: identifier "$1"
===
Trailing closure
@@ -114,6 +163,28 @@ source_file
top_level
body:
block
stmt:
call_expr
argument:
argument
value:
function_expr
body:
block
stmt:
binary_expr
operator: infix_operator "*"
left:
name_expr
identifier: identifier "$0"
right: int_literal "2"
callee:
member_access_expr
base:
name_expr
identifier: identifier "xs"
member: identifier "map"
===
Closure with capture list
@@ -163,6 +234,31 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "f"
value:
function_expr
body:
block
stmt:
call_expr
callee:
member_access_expr
base:
name_expr
identifier: identifier "self"
member: identifier "doThing"
capture_declaration:
variable_declaration
modifier: modifier "weak"
pattern:
name_pattern
identifier: identifier "self"
===
Multi-statement closure
@@ -236,3 +332,46 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "f"
value:
function_expr
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "y"
value:
binary_expr
operator: infix_operator "+"
left:
name_expr
identifier: identifier "x"
right: int_literal "1"
return_expr
value:
binary_expr
operator: infix_operator "*"
left:
name_expr
identifier: identifier "y"
right: int_literal "2"
parameter:
parameter
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
return_type:
named_type_expr
name: identifier "Int"

@@ -28,6 +28,19 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "xs"
value:
array_literal
element:
int_literal "1"
int_literal "2"
int_literal "3"
===
Empty array literal with type
@@ -68,6 +81,22 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "xs"
type:
generic_type_expr
base:
named_type_expr
name: identifier "Array"
type_argument:
named_type_expr
name: identifier "Int"
value: array_literal "[]"
===
Dictionary literal
@@ -106,6 +135,14 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "d"
value: map_literal "[\"a\": 1, \"b\": 2]"
===
Set literal
@@ -155,6 +192,22 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "s"
type:
named_type_expr
name: identifier "Set<Int>"
value:
array_literal
element:
int_literal "1"
int_literal "2"
int_literal "3"
===
Tuple literal
@@ -191,6 +244,14 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "t"
value: tuple_expr "(1, \"two\", 3.0)"
===
Subscript access
@@ -232,9 +293,21 @@ source_file
top_level
body:
unsupported_node "// TODO: tree-sitter-swift parses `xs[0]` as a call_expression (same shape"
unsupported_node "// as `xs(0)`), so the mapping currently produces a call_expr. Update the"
unsupported_node "// parser / add a separate subscript_expr node and remap when fixed."
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "first"
value:
call_expr
argument:
argument
value: int_literal "0"
callee:
name_expr
identifier: identifier "xs"
===
Dictionary subscript
@@ -276,8 +349,21 @@ source_file
top_level
body:
unsupported_node "// TODO: same parser issue as the array subscript case above —"
unsupported_node "// `d[\"key\"]` is parsed as `call_expression(d, (\"key\"))`."
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "v"
value:
call_expr
argument:
argument
value: string_literal "\"key\""
callee:
name_expr
identifier: identifier "d"
===
Tuple member access
@@ -309,3 +395,16 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "n"
value:
member_access_expr
base:
name_expr
identifier: identifier "t"
member: identifier "0"

@@ -35,6 +35,28 @@ source_file
top_level
body:
block
stmt:
if_expr
condition:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
then:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "x"
callee:
name_expr
identifier: identifier "print"
===
If-else
@@ -90,6 +112,43 @@ source_file
top_level
body:
block
stmt:
if_expr
condition:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
else:
block
stmt:
call_expr
argument:
argument
value:
unary_expr
operand:
name_expr
identifier: identifier "x"
operator: prefix_operator "-"
callee:
name_expr
identifier: identifier "print"
then:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "x"
callee:
name_expr
identifier: identifier "print"
===
If-else-if chain
@@ -165,6 +224,55 @@ source_file
top_level
body:
block
stmt:
if_expr
condition:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
else:
if_expr
condition:
binary_expr
operator: infix_operator "<"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
else:
block
stmt:
call_expr
argument:
argument
value: int_literal "3"
callee:
name_expr
identifier: identifier "print"
then:
block
stmt:
call_expr
argument:
argument
value: int_literal "2"
callee:
name_expr
identifier: identifier "print"
then:
block
stmt:
call_expr
argument:
argument
value: int_literal "1"
callee:
name_expr
identifier: identifier "print"
===
If-let optional binding
@@ -207,6 +315,39 @@ source_file
top_level
body:
block
stmt:
if_expr
condition:
pattern_guard_expr
pattern:
constructor_pattern
element:
pattern_element
pattern:
name_pattern
identifier: identifier "value"
constructor:
member_access_expr
base:
named_type_expr
name: identifier "Optional"
member: identifier "some"
value:
name_expr
identifier: identifier "optional"
then:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "value"
callee:
name_expr
identifier: identifier "print"
===
Guard let
@@ -240,6 +381,30 @@ source_file
top_level
body:
block
stmt:
guard_if_stmt
condition:
pattern_guard_expr
pattern:
constructor_pattern
element:
pattern_element
pattern:
name_pattern
identifier: identifier "value"
constructor:
member_access_expr
base:
named_type_expr
name: identifier "Optional"
member: identifier "some"
value:
name_expr
identifier: identifier "optional"
else:
block
stmt: return_expr "return"
===
Ternary expression
@@ -277,6 +442,27 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "y"
value:
if_expr
condition:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
else:
unary_expr
operand: int_literal "1"
operator: prefix_operator "-"
then: int_literal "1"
===
Switch statement
@@ -357,6 +543,54 @@ source_file
top_level
body:
block
stmt:
switch_expr
case:
switch_case
body:
block
stmt:
call_expr
argument:
argument
value: string_literal "\"one\""
callee:
name_expr
identifier: identifier "print"
pattern:
expr_equality_pattern
expr: int_literal "1"
switch_case
body:
block
stmt:
call_expr
argument:
argument
value: string_literal "\"two or three\""
callee:
name_expr
identifier: identifier "print"
pattern:
expr_equality_pattern
expr: int_literal "2"
expr_equality_pattern
expr: int_literal "3"
switch_case
body:
block
stmt:
call_expr
argument:
argument
value: string_literal "\"other\""
callee:
name_expr
identifier: identifier "print"
value:
name_expr
identifier: identifier "x"
===
Switch with binding pattern
@@ -396,6 +630,7 @@ source_file
pattern:
pattern
bound_identifier: simple_identifier "r"
dot: .
name: simple_identifier "circle"
statement:
call_expression
@@ -428,6 +663,7 @@ source_file
pattern:
pattern
bound_identifier: simple_identifier "s"
dot: .
name: simple_identifier "square"
statement:
call_expression
@@ -445,3 +681,207 @@ source_file
top_level
body:
block
stmt:
switch_expr
case:
switch_case
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "r"
callee:
name_expr
identifier: identifier "print"
pattern:
constructor_pattern
element:
pattern_element
pattern:
name_pattern
identifier: identifier "r"
constructor:
member_access_expr
base: inferred_type_expr "."
member: identifier "circle"
switch_case
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "s"
callee:
name_expr
identifier: identifier "print"
pattern:
constructor_pattern
element:
pattern_element
pattern:
name_pattern
identifier: identifier "s"
constructor:
member_access_expr
base: inferred_type_expr "."
member: identifier "square"
value:
name_expr
identifier: identifier "shape"
===
Switch with labeled case pattern arguments
===
switch x {
case .implicit(isAcknowledged: false):
print("yes")
case .thread(threadRowId: _, let rowId):
print(rowId)
}
---
source_file
statement:
switch_statement
entry:
switch_entry
pattern:
switch_pattern
pattern:
pattern
kind:
case_pattern
arguments:
tuple_pattern
item:
tuple_pattern_item
name: simple_identifier "isAcknowledged"
pattern:
pattern
kind:
boolean_literal
dot: .
name: simple_identifier "implicit"
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value:
line_string_literal
text: line_str_text "yes"
switch_entry
pattern:
switch_pattern
pattern:
pattern
kind:
case_pattern
arguments:
tuple_pattern
item:
tuple_pattern_item
name: simple_identifier "threadRowId"
pattern:
pattern
kind: wildcard_pattern "_"
tuple_pattern_item
pattern:
pattern
kind:
binding_pattern
binding:
value_binding_pattern
mutability: let
pattern:
pattern
bound_identifier: simple_identifier "rowId"
dot: .
name: simple_identifier "thread"
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "rowId"
expr: simple_identifier "x"
---
top_level
body:
block
stmt:
switch_expr
case:
switch_case
body:
block
stmt:
call_expr
argument:
argument
value: string_literal "\"yes\""
callee:
name_expr
identifier: identifier "print"
pattern:
constructor_pattern
element:
pattern_element
key: identifier "isAcknowledged"
pattern:
expr_equality_pattern
expr: boolean_literal "false"
constructor:
member_access_expr
base: inferred_type_expr "."
member: identifier "implicit"
switch_case
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "rowId"
callee:
name_expr
identifier: identifier "print"
pattern:
constructor_pattern
element:
pattern_element
key: identifier "threadRowId"
pattern: ignore_pattern "_"
pattern_element
pattern:
name_pattern
identifier: identifier "rowId"
constructor:
member_access_expr
base: inferred_type_expr "."
member: identifier "thread"
value:
name_expr
identifier: identifier "x"

@@ -17,6 +17,12 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "+"
left: int_literal "1"
right: int_literal "2"
===
Another additive expression is desugared
@@ -37,3 +43,144 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "+"
left:
name_expr
identifier: identifier "foo"
right:
name_expr
identifier: identifier "bar"
===
Simple import with single name
===
import Foundation
---
source_file
statement:
import_declaration
name:
identifier
part: simple_identifier "Foundation"
---
top_level
body:
block
stmt:
import_declaration
pattern: bulk_importing_pattern "import Foundation"
imported_expr:
name_expr
identifier: identifier "Foundation"
===
Import with dotted path (two parts)
===
import Foundation.Networking
---
source_file
statement:
import_declaration
name:
identifier
part:
simple_identifier "Foundation"
simple_identifier "Networking"
---
top_level
body:
block
stmt:
import_declaration
pattern: bulk_importing_pattern "import Foundation.Networking"
imported_expr:
member_access_expr
base:
name_expr
identifier: identifier "Foundation"
member: identifier "Networking"
===
Import with deeply nested path (three parts)
===
import Foundation.Networking.URLSession
---
source_file
statement:
import_declaration
name:
identifier
part:
simple_identifier "Foundation"
simple_identifier "Networking"
simple_identifier "URLSession"
---
top_level
body:
block
stmt:
import_declaration
pattern: bulk_importing_pattern "import Foundation.Networking.URLSession"
imported_expr:
member_access_expr
base:
member_access_expr
base:
name_expr
identifier: identifier "Foundation"
member: identifier "Networking"
member: identifier "URLSession"
===
Scoped import uses name_pattern
===
import struct Foundation.Date
---
source_file
statement:
import_declaration
name:
identifier
part:
simple_identifier "Foundation"
simple_identifier "Date"
scoped_import_kind: struct
---
top_level
body:
block
stmt:
import_declaration
modifier: modifier "struct"
pattern:
name_pattern
identifier: identifier "Date"
imported_expr:
member_access_expr
base:
name_expr
identifier: identifier "Foundation"
member: identifier "Date"

@@ -31,6 +31,20 @@ source_file
top_level
body:
block
stmt:
function_declaration
body:
block
stmt:
call_expr
argument:
argument
value: string_literal "\"hello\""
callee:
name_expr
identifier: identifier "print"
name: identifier "greet"
===
Function with parameters and return type
@@ -93,6 +107,37 @@ source_file
top_level
body:
block
stmt:
function_declaration
body:
block
stmt:
return_expr
value:
binary_expr
operator: infix_operator "+"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
name: identifier "add"
parameter:
parameter
external_name: identifier "_"
pattern:
name_pattern
identifier: identifier "a"
parameter
external_name: identifier "_"
pattern:
name_pattern
identifier: identifier "b"
return_type:
named_type_expr
name: identifier "Int"
===
Function with named parameters
@@ -138,6 +183,28 @@ source_file
top_level
body:
block
stmt:
function_declaration
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "name"
callee:
name_expr
identifier: identifier "print"
name: identifier "greet"
parameter:
parameter
external_name: identifier "person"
pattern:
name_pattern
identifier: identifier "name"
===
Function with default parameter value
@@ -185,6 +252,28 @@ source_file
top_level
body:
block
stmt:
function_declaration
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "name"
callee:
name_expr
identifier: identifier "print"
name: identifier "greet"
parameter:
parameter
default: string_literal "\"world\""
pattern:
name_pattern
identifier: identifier "name"
===
Variadic function
@@ -249,6 +338,38 @@ source_file
top_level
body:
block
stmt:
function_declaration
body:
block
stmt:
return_expr
value:
call_expr
argument:
argument
value: int_literal "0"
argument
value:
name_expr
identifier: identifier "+"
callee:
member_access_expr
base:
name_expr
identifier: identifier "values"
member: identifier "reduce"
name: identifier "sum"
parameter:
parameter
external_name: identifier "_"
pattern:
name_pattern
identifier: identifier "values"
return_type:
named_type_expr
name: identifier "Int"
===
Function call
@@ -276,6 +397,17 @@ source_file
top_level
body:
block
stmt:
call_expr
argument:
argument
value: int_literal "1"
argument
value: int_literal "2"
callee:
name_expr
identifier: identifier "foo"
===
Function call with labelled arguments
@@ -306,6 +438,16 @@ source_file
top_level
body:
block
stmt:
call_expr
argument:
argument
name: identifier "person"
value: string_literal "\"Bob\""
callee:
name_expr
identifier: identifier "greet"
===
Method call
@@ -336,6 +478,18 @@ source_file
top_level
body:
block
stmt:
call_expr
argument:
argument
value: int_literal "1"
callee:
member_access_expr
base:
name_expr
identifier: identifier "list"
member: identifier "append"
===
Generic function
@@ -387,3 +541,117 @@ source_file
top_level
body:
block
stmt:
function_declaration
body:
block
stmt:
return_expr
value:
name_expr
identifier: identifier "x"
name: identifier "identity"
parameter:
parameter
external_name: identifier "_"
pattern:
name_pattern
identifier: identifier "x"
return_type:
named_type_expr
name: identifier "T"
===
Leading-dot expression value
===
let x = .foo
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
value:
prefix_expression
operation: .
target: simple_identifier "foo"
---
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "x"
value:
member_access_expr
base: inferred_type_expr ".foo"
member: identifier "foo"
===
Leading-dot expression call
===
let y = .some(1)
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "y"
value:
call_expression
function:
prefix_expression
operation: .
target: simple_identifier "some"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "1"
---
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "y"
value:
call_expr
argument:
argument
value: int_literal "1"
callee:
member_access_expr
base: inferred_type_expr ".some"
member: identifier "some"

@@ -13,6 +13,8 @@ source_file
top_level
body:
block
stmt: int_literal "42"
===
Negative integer literal
@@ -32,6 +34,11 @@ source_file
top_level
body:
block
stmt:
unary_expr
operand: int_literal "7"
operator: prefix_operator "-"
===
Floating-point literal
@@ -48,6 +55,8 @@ source_file
top_level
body:
block
stmt: float_literal "3.14"
===
Boolean literals
@@ -67,6 +76,10 @@ source_file
top_level
body:
block
stmt:
boolean_literal "true"
boolean_literal "false"
===
Nil literal
@@ -83,6 +96,8 @@ source_file
top_level
body:
block
stmt: builtin_expr "nil"
===
String literal
@@ -101,6 +116,8 @@ source_file
top_level
body:
block
stmt: string_literal "\"hello\""
===
String with interpolation
@@ -122,3 +139,5 @@ source_file
top_level
body:
block
stmt: string_literal "\"hello \\(name)\""

@@ -37,6 +37,30 @@ source_file
top_level
body:
block
stmt:
for_each_stmt
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "x"
callee:
name_expr
identifier: identifier "print"
pattern:
name_pattern
identifier: identifier "x"
iterable:
array_literal
element:
int_literal "1"
int_literal "2"
int_literal "3"
===
For-in over range
@@ -76,6 +100,29 @@ source_file
top_level
body:
block
stmt:
for_each_stmt
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "i"
callee:
name_expr
identifier: identifier "print"
pattern:
name_pattern
identifier: identifier "i"
iterable:
binary_expr
operator: infix_operator "..<"
left: int_literal "0"
right: int_literal "10"
===
For-in with where clause
@@ -119,6 +166,34 @@ source_file
top_level
body:
block
stmt:
for_each_stmt
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "x"
callee:
name_expr
identifier: identifier "print"
pattern:
name_pattern
identifier: identifier "x"
guard:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
iterable:
name_expr
identifier: identifier "xs"
===
While loop
@@ -154,6 +229,25 @@ source_file
top_level
body:
block
stmt:
while_stmt
body:
block
stmt:
compound_assign_expr
operator: infix_operator "-="
target:
name_expr
identifier: identifier "x"
value: int_literal "1"
condition:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
===
Repeat-while loop
@@ -189,6 +283,25 @@ source_file
top_level
body:
block
stmt:
do_while_stmt
body:
block
stmt:
compound_assign_expr
operator: infix_operator "-="
target:
name_expr
identifier: identifier "x"
value: int_literal "1"
condition:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
===
Break and continue
@@ -252,3 +365,46 @@ source_file
top_level
body:
block
stmt:
for_each_stmt
body:
block
stmt:
if_expr
condition:
binary_expr
operator: infix_operator "<"
left:
name_expr
identifier: identifier "x"
right: int_literal "0"
then:
block
stmt: continue_expr "continue"
if_expr
condition:
binary_expr
operator: infix_operator ">"
left:
name_expr
identifier: identifier "x"
right: int_literal "100"
then:
block
stmt: break_expr "break"
call_expr
argument:
argument
value:
name_expr
identifier: identifier "x"
callee:
name_expr
identifier: identifier "print"
pattern:
name_pattern
identifier: identifier "x"
iterable:
name_expr
identifier: identifier "xs"

@@ -17,6 +17,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "+"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Subtraction
@@ -37,6 +47,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "-"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Multiplication
@@ -57,6 +77,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "*"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Division
@@ -77,6 +107,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "/"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Operator precedence: addition and multiplication
@@ -101,6 +141,22 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "+"
left:
name_expr
identifier: identifier "a"
right:
binary_expr
operator: infix_operator "*"
left:
name_expr
identifier: identifier "b"
right:
name_expr
identifier: identifier "c"
===
Parenthesised expression
@@ -129,6 +185,14 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "*"
left: tuple_expr "(a + b)"
right:
name_expr
identifier: identifier "c"
===
Comparison
@@ -149,6 +213,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "<"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Equality
@@ -169,6 +243,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "=="
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Logical and
@@ -189,6 +273,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "&&"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Logical or
@@ -209,6 +303,16 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "||"
left:
name_expr
identifier: identifier "a"
right:
name_expr
identifier: identifier "b"
===
Logical not
@@ -228,6 +332,13 @@ source_file
top_level
body:
block
stmt:
unary_expr
operand:
name_expr
identifier: identifier "a"
operator: prefix_operator "!"
===
Range operator
@@ -248,3 +359,9 @@ source_file
top_level
body:
block
stmt:
binary_expr
operator: infix_operator "..."
left: int_literal "1"
right: int_literal "10"

@@ -34,6 +34,22 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "x"
type:
generic_type_expr
base:
named_type_expr
name: identifier "Optional"
type_argument:
named_type_expr
name: identifier "Int"
value: builtin_expr "nil"
===
Optional chaining
@@ -74,6 +90,22 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "n"
value:
member_access_expr
base:
member_access_expr
base:
name_expr
identifier: identifier "obj"
member: identifier "foo"
member: identifier "bar"
===
Force unwrap
@@ -103,6 +135,19 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "n"
value:
unary_expr
operand:
name_expr
identifier: identifier "opt"
operator: postfix_operator "!"
===
Nil-coalescing
@@ -132,6 +177,20 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "n"
value:
binary_expr
operator: infix_operator "??"
left:
name_expr
identifier: identifier "opt"
right: int_literal "0"
===
Throwing function
@@ -167,6 +226,18 @@ source_file
top_level
body:
block
stmt:
function_declaration
body:
block
stmt:
return_expr
value: string_literal "\"\""
name: identifier "read"
return_type:
named_type_expr
name: identifier "String"
===
Do-catch
@@ -216,6 +287,33 @@ source_file
top_level
body:
block
stmt:
try_expr
body:
block
stmt:
unary_expr
operand:
call_expr
callee:
name_expr
identifier: identifier "foo"
operator: prefix_operator "try"
catch_clause:
catch_clause
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "error"
callee:
name_expr
identifier: identifier "print"
===
Try? expression
@@ -252,6 +350,21 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "result"
value:
unary_expr
operand:
call_expr
callee:
name_expr
identifier: identifier "foo"
operator: prefix_operator "try?"
===
Try! expression
@@ -288,3 +401,18 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "result"
value:
unary_expr
operand:
call_expr
callee:
name_expr
identifier: identifier "foo"
operator: prefix_operator "try!"

@@ -18,6 +18,11 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
modifier: modifier "class"
name: identifier "Foo"
===
Class with stored properties
@@ -79,6 +84,28 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "y"
type:
named_type_expr
name: identifier "Int"
modifier: modifier "class"
name: identifier "Point"
===
Class with initializer
@@ -152,6 +179,34 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
constructor_declaration
body:
block
stmt:
assign_expr
target:
member_access_expr
base:
name_expr
identifier: identifier "self"
member: identifier "x"
value:
name_expr
identifier: identifier "x"
modifier: modifier "class"
name: identifier "Point"
===
Class with method
@@ -200,6 +255,29 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "n"
value: int_literal "0"
function_declaration
body:
block
stmt:
compound_assign_expr
operator: infix_operator "+="
target:
name_expr
identifier: identifier "n"
value: int_literal "1"
name: identifier "bump"
modifier: modifier "class"
name: identifier "Counter"
===
Class inheritance
@@ -228,6 +306,11 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
modifier: modifier "class"
name: identifier "Dog"
===
Struct
@@ -289,6 +372,28 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "y"
type:
named_type_expr
name: identifier "Int"
modifier: modifier "struct"
name: identifier "Point"
===
Enum with cases
@@ -332,6 +437,32 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "enum_case"
pattern:
name_pattern
identifier: identifier "north"
variable_declaration
modifier: modifier "enum_case"
pattern:
name_pattern
identifier: identifier "south"
variable_declaration
modifier: modifier "enum_case"
pattern:
name_pattern
identifier: identifier "east"
variable_declaration
modifier: modifier "enum_case"
pattern:
name_pattern
identifier: identifier "west"
modifier: modifier "enum"
name: identifier "Direction"
===
Enum with associated values
@@ -389,6 +520,40 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
class_like_declaration
member:
constructor_declaration
body: block "circle(radius: Double)"
parameter:
parameter
pattern:
name_pattern
identifier: identifier "radius"
type:
named_type_expr
name: identifier "Double"
modifier: modifier "enum_case"
name: identifier "circle"
class_like_declaration
member:
constructor_declaration
body: block "square(side: Double)"
parameter:
parameter
pattern:
name_pattern
identifier: identifier "side"
type:
named_type_expr
name: identifier "Double"
modifier: modifier "enum_case"
name: identifier "square"
modifier: modifier "enum"
name: identifier "Shape"
===
Protocol declaration
@@ -414,6 +579,15 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
function_declaration
body: block "func draw()"
name: identifier "draw"
modifier: modifier "protocol"
name: identifier "Drawable"
===
Extension
@@ -463,6 +637,30 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
function_declaration
body:
block
stmt:
return_expr
value:
binary_expr
operator: infix_operator "*"
left:
name_expr
identifier: identifier "self"
right:
name_expr
identifier: identifier "self"
name: identifier "squared"
return_type:
named_type_expr
name: identifier "Int"
modifier: modifier "extension"
name: identifier "Int"
===
Computed property
@@ -555,6 +753,48 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "w"
type:
named_type_expr
name: identifier "Double"
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "h"
type:
named_type_expr
name: identifier "Double"
accessor_declaration
body:
block
stmt:
return_expr
value:
binary_expr
operator: infix_operator "*"
left:
name_expr
identifier: identifier "w"
right:
name_expr
identifier: identifier "h"
modifier: modifier "var"
name: identifier "area"
type:
named_type_expr
name: identifier "Double"
accessor_kind: accessor_kind "get"
modifier: modifier "class"
name: identifier "Rect"
===
Property with getter and setter
@@ -639,3 +879,204 @@ source_file
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "_v"
value: int_literal "0"
accessor_declaration
body:
block
stmt:
return_expr
value:
name_expr
identifier: identifier "_v"
modifier: modifier "var"
name: identifier "v"
type:
named_type_expr
name: identifier "Int"
accessor_kind: accessor_kind "get"
accessor_declaration
body:
block
stmt:
assign_expr
target:
name_expr
identifier: identifier "_v"
value:
name_expr
identifier: identifier "newValue"
modifier:
modifier "var"
modifier "chained_declaration"
name: identifier "v"
type:
named_type_expr
name: identifier "Int"
accessor_kind: accessor_kind "set"
modifier: modifier "class"
name: identifier "Box"
===
Protocol with read-only and read-write property requirements
===
protocol P {
var foo: Int { get }
var bar: String { get set }
}
---
source_file
statement:
protocol_declaration
body:
protocol_body
member:
protocol_property_declaration
name:
pattern
binding:
value_binding_pattern
mutability: var
bound_identifier: simple_identifier "foo"
requirements:
protocol_property_requirements
accessor:
getter_specifier
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
protocol_property_declaration
name:
pattern
binding:
value_binding_pattern
mutability: var
bound_identifier: simple_identifier "bar"
requirements:
protocol_property_requirements
accessor:
getter_specifier
setter_specifier
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "String"
name: type_identifier "P"
---
top_level
body:
block
stmt:
class_like_declaration
member:
accessor_declaration
name: identifier "foo"
type:
named_type_expr
name: identifier "Int"
accessor_kind: accessor_kind "get"
accessor_declaration
name: identifier "bar"
type:
named_type_expr
name: identifier "String"
accessor_kind: accessor_kind "get"
accessor_declaration
modifier: modifier "chained_declaration"
name: identifier "bar"
type:
named_type_expr
name: identifier "String"
accessor_kind: accessor_kind "set"
modifier: modifier "protocol"
name: identifier "P"
===
Enum with comma-separated cases (chained_declaration)
===
enum Suit {
case clubs, diamonds, hearts, spades
}
---
source_file
statement:
class_declaration
body:
enum_class_body
member:
enum_entry
case:
enum_case_entry
name: simple_identifier "clubs"
enum_case_entry
name: simple_identifier "diamonds"
enum_case_entry
name: simple_identifier "hearts"
enum_case_entry
name: simple_identifier "spades"
declaration_kind: enum
name: type_identifier "Suit"
---
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "enum_case"
pattern:
name_pattern
identifier: identifier "clubs"
variable_declaration
modifier:
modifier "chained_declaration"
modifier "enum_case"
pattern:
name_pattern
identifier: identifier "diamonds"
variable_declaration
modifier:
modifier "chained_declaration"
modifier "enum_case"
pattern:
name_pattern
identifier: identifier "hearts"
variable_declaration
modifier:
modifier "chained_declaration"
modifier "enum_case"
pattern:
name_pattern
identifier: identifier "spades"
modifier: modifier "enum"
name: identifier "Suit"

@@ -23,6 +23,14 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "x"
value: int_literal "1"
===
Var binding
@@ -49,6 +57,14 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "x"
value: int_literal "1"
===
Let with type annotation
@@ -84,6 +100,17 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
value: int_literal "1"
===
Var without initialiser
@@ -118,6 +145,16 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
===
Tuple destructuring binding
@@ -154,6 +191,28 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
tuple_pattern
element:
pattern_element
pattern:
expr_equality_pattern
expr:
name_expr
identifier: identifier "a"
pattern_element
pattern:
expr_equality_pattern
expr:
name_expr
identifier: identifier "b"
value:
name_expr
identifier: identifier "pair"
===
Multiple bindings on one line
@@ -185,6 +244,22 @@ source_file
top_level
body:
block
stmt:
variable_declaration
modifier: modifier "let"
pattern:
name_pattern
identifier: identifier "x"
value: int_literal "1"
variable_declaration
modifier:
modifier "let"
modifier "chained_declaration"
pattern:
name_pattern
identifier: identifier "y"
value: int_literal "2"
===
Assignment
@@ -207,6 +282,13 @@ source_file
top_level
body:
block
stmt:
assign_expr
target:
name_expr
identifier: identifier "x"
value: int_literal "1"
===
Compound assignment
@@ -229,3 +311,138 @@ source_file
top_level
body:
block
stmt:
compound_assign_expr
operator: infix_operator "+="
target:
name_expr
identifier: identifier "x"
value: int_literal "1"
===
Property with willSet and didSet observers
===
class C {
var x: Int = 0 {
willSet { print(newValue) }
didSet { print(oldValue) }
}
}
---
source_file
statement:
class_declaration
body:
class_body
member:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
observers:
willset_didset_block
didset:
didset_clause
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "oldValue"
willset:
willset_clause
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "newValue"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
value: integer_literal "0"
declaration_kind: class
name: type_identifier "C"
---
top_level
body:
block
stmt:
class_like_declaration
member:
variable_declaration
modifier: modifier "var"
pattern:
name_pattern
identifier: identifier "x"
type:
named_type_expr
name: identifier "Int"
value: int_literal "0"
accessor_declaration
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "newValue"
callee:
name_expr
identifier: identifier "print"
modifier:
modifier "var"
modifier "chained_declaration"
name: identifier "x"
accessor_kind: accessor_kind "willSet"
accessor_declaration
body:
block
stmt:
call_expr
argument:
argument
value:
name_expr
identifier: identifier "oldValue"
callee:
name_expr
identifier: identifier "print"
modifier:
modifier "var"
modifier "chained_declaration"
name: identifier "x"
accessor_kind: accessor_kind "didSet"
modifier: modifier "class"
name: identifier "C"

@@ -2,7 +2,7 @@ use std::fs;
use std::path::Path;
use codeql_extractor::extractor::simple;
-use yeast::{dump::dump_ast, dump::dump_ast_with_type_errors, Runner};
+use yeast::{Runner, dump::dump_ast, dump::dump_ast_with_type_errors};
#[path = "../src/languages/mod.rs"]
mod languages;
@@ -146,29 +146,36 @@ fn render_corpus(cases: &[CorpusCase]) -> String {
out
}
-fn run_desugaring(
-    lang: &simple::LanguageSpec,
-    input: &str,
-) -> Result<yeast::Ast, String> {
-    let runner = match lang.desugar.as_ref() {
-        Some(config) => Runner::from_config(lang.ts_language.clone(), config)
-            .map_err(|e| format!("Failed to create yeast runner: {e}"))?,
-        None => Runner::new(lang.ts_language.clone(), &[]),
-    };
-    runner
-        .run(input)
-        .map_err(|e| format!("Failed to parse input: {e}"))
+fn run_desugaring(lang: &simple::LanguageSpec, input: &str) -> Result<yeast::Ast, String> {
+    match lang.desugar.as_deref() {
+        Some(desugarer) => {
+            // Parse the input ourselves so we don't depend on the desugarer
+            // knowing about the language.
+            let mut parser = tree_sitter::Parser::new();
+            parser
+                .set_language(&lang.ts_language)
+                .map_err(|e| format!("Failed to set language: {e}"))?;
+            let tree = parser
+                .parse(input, None)
+                .ok_or_else(|| "Failed to parse input".to_string())?;
+            desugarer
+                .run_from_tree(&tree, input.as_bytes())
+                .map_err(|e| format!("Desugaring failed: {e}"))
+        }
+        None => {
+            let runner: Runner = Runner::new(lang.ts_language.clone(), &[]);
+            runner
+                .run(input)
+                .map_err(|e| format!("Failed to parse input: {e}"))
+        }
+    }
}
/// Produce the raw tree-sitter parse tree dump for `input`, with no
/// desugaring rules applied. Uses a `Runner` with an empty phase list and
/// the input grammar's own schema.
-fn dump_raw_parse(
-    lang: &simple::LanguageSpec,
-    input: &str,
-) -> Result<String, String> {
-    let runner = Runner::new(lang.ts_language.clone(), &[]);
+fn dump_raw_parse(lang: &simple::LanguageSpec, input: &str) -> Result<String, String> {
+    let runner: Runner = Runner::new(lang.ts_language.clone(), &[]);
let ast = runner
.run(input)
.map_err(|e| format!("Failed to parse input: {e}"))?;
@@ -272,11 +279,7 @@ fn test_corpus() {
}
}
-assert!(
-    failures.is_empty(),
-    "{}",
-    failures.join("\n\n") + "\n\n"
-);
+assert!(failures.is_empty(), "{}", failures.join("\n\n") + "\n\n");
if update_mode {
let updated = render_corpus(&cases);
@@ -285,7 +288,9 @@ fn test_corpus() {
write_result.is_ok(),
"Failed to update corpus file {}: {}",
corpus_path.display(),
-write_result.err().map_or_else(String::new, |e| e.to_string())
+write_result
+    .err()
+    .map_or_else(String::new, |e| e.to_string())
);
}
}

@@ -1368,7 +1368,7 @@ module.exports = grammar({
seq(
field("modifiers", optional($.modifiers)),
"import",
-optional($._import_kind),
+optional(field("scoped_import_kind", $._import_kind)),
field("name", $.identifier)
),
_import_kind: ($) =>
@@ -1930,7 +1930,7 @@ module.exports = grammar({
seq(
optional("case"),
optional(field("type", $.user_type)), // XXX this should just be _type but that creates ambiguity
-$._dot,
+field("dot", $._dot),
field("name", $.simple_identifier),
optional(field("arguments", $.tuple_pattern))
),

@@ -173,6 +173,7 @@ named:
value?: expression
case_pattern:
arguments?: tuple_pattern
dot: "."
name: simple_identifier
type?: user_type
catch_block:
@@ -351,6 +352,7 @@ named:
import_declaration:
modifiers?: modifiers
name: identifier
scoped_import_kind?: ["class", "enum", "func", "let", "protocol", "struct", "typealias", "var"]
infix_expression:
lhs: expression
op: custom_operator

File diff suppressed because it is too large

@@ -0,0 +1,18 @@
/** Provides classes for working with comments. */
private import unified
/**
* A comment appearing in the source code.
*/
class Comment extends TriviaToken {
// At the moment, comments are the only type of trivia token we extract
/**
* Gets the text inside this comment, not counting the delimiters.
*/
string getCommentText() {
result = this.getValue().regexpCapture("//(.*)", 1)
or
result = this.getValue().regexpCapture("(?s)/\\*(.*)\\*/", 1)
}
}
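
The two `regexpCapture` patterns in `getCommentText` can be approximated outside QL. The sketch below is a rough Rust analogue, assuming that each pattern must match the whole token and that `.` only crosses line breaks under `(?s)`. The helper name `comment_text` is hypothetical and is not part of this change.

```rust
/// Hypothetical helper mirroring the two patterns used by `getCommentText`.
/// Returns the text between the comment delimiters, or `None` if `raw`
/// is not shaped like a line or block comment.
fn comment_text(raw: &str) -> Option<&str> {
    // Line comment, "//(.*)": everything after the leading "//".
    // Without (?s), `.` does not match line breaks, so reject multi-line input.
    if let Some(rest) = raw.strip_prefix("//") {
        if !rest.contains('\n') && !rest.contains('\r') {
            return Some(rest);
        }
    }
    // Block comment, "(?s)/\*(.*)\*/": everything between the outer "/*" and "*/",
    // line breaks included.
    raw.strip_prefix("/*")?.strip_suffix("*/")
}

fn main() {
    // Inputs modelled on the comments.swift test file from this change.
    println!("{:?}", comment_text("// Hello this is swift"));
    println!("{:?}", comment_text("/*\n * This is a multi-line comment\n */"));
    println!("{:?}", comment_text("print(\"Hello, world!\")"));
}
```

Note that the captured text keeps the whitespace that follows the opening delimiter, which matches the leading space and `\n` visible in the expected query output for the comments test.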

File diff suppressed because it is too large

@@ -0,0 +1,8 @@
/**
* Provides classes for working with the AST, as well as files and locations.
*/
import codeql.Locations
import codeql.files.FileSystem
import codeql.unified.Ast::Unified
import codeql.unified.Comments

@@ -1,9 +1,37 @@
nameExpr
| name_expr.swift:1:9:1:9 | NameExpr | y |
| test.swift:1:8:1:17 | NameExpr | Foundation |
| test.swift:8:9:8:13 | NameExpr | items |
| test.swift:8:22:8:25 | NameExpr | item |
| test.swift:12:16:12:20 | NameExpr | items |
| test.swift:12:31:12:34 | NameExpr | item |
| test.swift:25:18:25:22 | NameExpr | Array |
| test.swift:25:24:25:28 | NameExpr | first |
| test.swift:26:17:26:22 | NameExpr | second |
| test.swift:27:13:27:18 | NameExpr | result |
| test.swift:27:29:27:32 | NameExpr | item |
| test.swift:28:13:28:18 | NameExpr | result |
| test.swift:28:27:28:30 | NameExpr | item |
| test.swift:31:12:31:17 | NameExpr | result |
| test.swift:40:16:40:19 | NameExpr | data |
| test.swift:44:9:44:12 | NameExpr | data |
| test.swift:48:15:48:19 | NameExpr | index |
| test.swift:48:29:48:33 | NameExpr | index |
| test.swift:48:37:48:40 | NameExpr | data |
| test.swift:49:16:49:19 | NameExpr | data |
| test.swift:49:21:49:25 | NameExpr | index |
| test.swift:53:9:53:12 | NameExpr | data |
| test.swift:53:21:53:24 | NameExpr | item |
| test.swift:63:16:63:19 | NameExpr | self |
| test.swift:65:29:65:37 | NameExpr | transform |
| test.swift:65:39:65:43 | NameExpr | value |
| test.swift:67:29:67:33 | NameExpr | error |
| test.swift:76:16:76:19 | NameExpr | self |
| test.swift:76:21:76:21 | NameExpr | i |
| test.swift:76:26:76:29 | NameExpr | self |
| test.swift:76:31:76:31 | NameExpr | i |
| test.swift:86:12:86:17 | NameExpr | values |
| test.swift:87:12:87:17 | NameExpr | values |
| test.swift:87:38:87:43 | NameExpr | values |
| test.swift:87:49:87:57 | NameExpr | transform |
unsupported
| test.swift:3:1:3:38 | | |
| test.swift:16:1:16:32 | | |
| test.swift:23:1:23:37 | | |
| test.swift:34:1:34:49 | | |
| test.swift:57:1:57:30 | | |
| test.swift:72:1:72:37 | | |
| test.swift:84:1:84:24 | | |

@@ -0,0 +1,3 @@
| comments.swift:1:1:1:22 | // Hello this is swift | Hello this is swift |
| comments.swift:3:1:6:3 | /*\n * This is a multi-line comment\n * It should be ignored by the parser\n */ | \n * This is a multi-line comment\n * It should be ignored by the parser\n |
| comments.swift:9:5:9:36 | // This is a single-line comment | This is a single-line comment |

@@ -0,0 +1,3 @@
import unified
query predicate comments(Comment c, string text) { text = c.getCommentText() }

@@ -0,0 +1,11 @@
// Hello this is swift
/*
* This is a multi-line comment
* It should be ignored by the parser
*/
func hello() {
// This is a single-line comment
print("Hello, world!")
}