The internal predicates that identify `@staticmethod`, `@classmethod` and
`@property` decorators previously required the decorator's `NameNode` to
satisfy `isGlobal()` (i.e. no SSA def reaches the decorator's name use).
That filter was correct but unnecessarily indirect: these three names
are builtins, and even when a class body redefines one, the class body
has not started executing at the decorator position, so Python uses the
builtin.
Match the decorator's AST `Name` directly instead, dropping the CFG/SSA
detour. The slight semantic change — `isGlobal()` would have rejected
module-level shadowing of these builtins — is negligible in practice
and explicitly documented in the change note.
`hasContextmanagerDecorator` and `hasOverloadDecorator` keep the
`NameNode.isGlobal()` check because their target names (`contextmanager`,
`overload`) are imported, not builtin, and local shadowing is a real
concern.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add FieldCardinality to Schema to track required/multiple per field,
populated from the ast_types.yml suffixes (bare = required single,
? = optional single, + = required multiple, * = optional multiple).
dump_ast_with_type_errors now emits:
<-- ERROR: missing required field 'name'
for any node in the output AST whose declared schema requires a field
that is absent from the actual node.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a {..expr} splice in an output template is empty (e.g. from an
optional capture that did not match), drop the field entirely rather
than emitting an empty named field. This lets a single rule with
optional captures replace what used to be two near-identical rules.
Also re-renders the corpus to drop the now-suppressed empty fields.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
We now find an alert on this line as we hope to
It is not an alert for _full_ SSRF, though, since that configuration cannot handle multiple substitutions.
- remove `tupleStoreStep` and `dictStoreStep` from `containerStep`
These are imprecise compared to the content being precise.
- add implicit reads to recover taint at sinks
- add implicit read steps for decoders
to supplement the `AdditionalTaintStep`
that now only covers when the full container is tainted.
Same as in the preceding commit, these items do not make sense as
separate fields on the parent node, so we materialise (or create new)
intermediate nodes to group them together.
The field representation would have made it difficult to figure out
which parameters correspond to which default values and attributes, so
instead we now encapsulate these in a new `function_parameter` node.
Introduces (by making it named) a `block` node, and conversely makes
`statements` anonymous. This enables us to sensibly distinguish between
the "then" and "else" branch of an `if_statement`, which we were not
able to previously.
The output is not so interesting as the mapping removes most nodes from the current test file.
I added a name_expr.swift test so at least one NameExpr makes it through.
ast_types.yml additions:
- tuple_pattern { element*: pattern } in the pattern supertype.
- sequence_condition { stmt*: stmt, condition: condition } in the
condition supertype.
swift.rs:
- Map Swift tuple destructuring (e.g. `let (a, b) = pair`) to the new
tuple_pattern instead of synthesizing an apply_pattern.
- if-let / guard-let: explicitly match the value_binding_pattern
(the `let` keyword) and bind the source expression as the next
condition child, so `let` no longer leaks into the output.
Two changes to parse_query_fields:
- Allow `field: (kind)* @cap` (repetition + optional capture) in field
position, mirroring how it works for bare children.
- When the same field name is declared multiple times in a query (e.g.
`condition: (foo) condition: (bar)`), merge them into a single
ordered list of children rather than emitting duplicate field
entries (which at runtime restart the iterator for the field and
cause the second declaration to re-match from the first child).
Introduce NodeRef as a typed wrapper around node arena IDs. Captures in
desugaring rules are now bound as NodeRef instead of raw usize, which
prevents accidental misuse and enables source-text-aware rendering.
Add the YeastDisplay trait as an alternative to Display: its
yeast_to_string method receives the Ast, allowing NodeRef to resolve to
the captured node's source text instead of printing a numeric ID.
Store the original source bytes in the Ast so that NodeContent::Range
values (from synthesized literal nodes) can be resolved back to text.
Update yeast-macros to emit NodeRef-typed capture bindings and use
Into::<usize>::into where raw IDs are needed. The #{expr} template
syntax now uses YeastDisplay instead of Display.
The effect is visible in the corpus tests: operator nodes now correctly
render as e.g. operator "+" instead of operator "3".
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add corpus test cases for Swift covering closures, collections, control
flow, functions, literals, loops, operators, optionals/errors, types,
and variables. Update existing desugar.txt with raw parse sections.
Note: operator nodes currently render their node ID instead of the actual
operator text (e.g. operator "3" instead of operator "+"). This will be
fixed in the next commit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ast_types.yml defining the unified output AST schema with supertypes
(expr, stmt, condition, pattern) and named nodes (top_level, binary_expr,
name_expr, etc.).
Rewrite swift translation rules to map from tree-sitter Swift grammar to
the unified AST, using one-shot phase rules.
Update the generator to use the output AST schema for dbscheme/QL
generation, and normalize the extraction table prefix to 'unified'.
Improve the corpus test framework to include raw tree-sitter parse output,
type-error checking against the output schema, and better failure
reporting.
Regenerate Ast.qll, unified.dbscheme, and update BasicTest accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
One-shot desugaring rules now skip unnamed nodes (punctuation, keywords,
etc.) since rules are intended to target named nodes only.
Also prevent infinite recursion when a capture refers to the root node of
the matched tree (e.g. an @_ capture on the pattern root).
Additionally fix the swift.rs add_phase call to match the updated 3-arg
signature introduced by the one-shot phase kind commit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This named node (which is in fact emitted by the scanner as an
`external`) was appearing as a child of `class_body` because of inlining
via `_class_member_separator`. This, in itself, appears to be somewhat
of a hack, to handle cases where a multiline comment signals the end of
a class member.
To fix this, we make the external node _unnamed_, but keep the `extras`
node _named_ (so we can still extract it from the parse tree), and we
add a new rule `multiline_comment` that mediates between the two. That
way, the use inside `_class_member_separator` can use the unnamed
variant, and no node is pushed into $children.
Doing this involved materialising a lot of previously anonymous nodes,
and I'm not entirely sure it's the best solution, but the node types
look decent enough.
Not entirely happy about the mixed nature of the `kind` filed (having
both tokens and the named node `throw_keyword` in there), but that's a
problem for a different time.
Some nodes with a single child (arguably redundant to do, but I think
it's nice to have the types be consistent), and also an instance of
ensuring that all branches of a `choice` expose consistent field names.
A lot of changes, but for the most part these are just adding named
fields in places where they make sense.
After this, there are still ~20 instances of unnamed children appearing.
I ended up also aliasing `_async_keyword` to a named node to make it
more consistent with the other node kinds that can be in this field (as
it would be awkward to have two named types and a token here).
Elsewhere in the node types, we'll still have `async?: "async"`, and I
think that's okay.
Part 1 of N of "getting rid of $children" in node-types.yml
Note: in one of the cases the affected node still has the $children
field present. This is because there's some weirdness about recording
multiline comments as class member separators that I did not want to
figure out how to address right now.
This one is potentially a bit iffy -- it checks for a very powerful
property (that implies many of the other queries), but as the test
results show, it can produce false positives when there is in fact no
problem. We may want to get rid of it entirely, if it becomes too noisy.
This looks for nodes annotated with `t[never]` in the test that are
reachable in the CFG. This should not happen (it messes with various
queries, e.g. the "mixed returns" query), but the test shows that in a
few particular cases (involving the `match` statement where all cases
contain `return`s), we _do_ have reachable nodes that shouldn't be.
This one demonstrates a bug in the current CFG. In a dictionary
comprehension `{k: v for k, v in d.items()}`, we evaluate the value
before the key, which is incorrect. (A fix for this bug has been
implemented in a separate PR.)
These use the annotated, self-verifying test files to check various
consistency requirements.
Some of these may be expressing the same thing in different ways, but
it's fairly cheap to keep them around, so I have not attempted to
produce a minimal set of queries for this.
These tests consist of various Python constructions (hopefully a
somewhat comprehensive set) with specific timestamp annotations
scattered throughout. When the tests are run using the Python 3
interpreter, these annotations are checked and compared to the "current
timestamp" to see that they are in agreement. This is what makes the
tests "self-validating".
There are a few different kinds of annotations: the basic `t[4]` style
(meaning this is executed at timestamp 4), the `t[dead(4)]` variant
(meaning this _would_ happen at timestamp 4, but it is in a dead
branch), and `t[never]` (meaning this is never executed at all).
In addition to this, there is a query, MissingAnnotations, which checks
whether we have applied these annotations maximally. Many expression
nodes are not actually annotatable, so there is a sizeable list of
excluded nodes for that query.
if ! sed -i "s/var maxGoVersion = util\.NewSemVer(\"$CURRENT_MAJOR_MINOR_ESCAPED\")/var maxGoVersion = util.NewSemVer(\"$LATEST_MAJOR_MINOR\")/" go/extractor/autobuilder/build-environment.go; then
echo "Warning: Failed to update build-environment.go"
fi
# Update go/actions/test/action.yml
if ! sed -i "s/default: \"~$CURRENT_VERSION_ESCAPED\"/default: \"~$LATEST_VERSION_NUM\"/" go/actions/test/action.yml; then
This PR was automatically created by the [Go version update workflow](https://github.com/${{ github.repository }}/blob/main/.github/workflows/go-version-update.yml).
EOF
)
if [ "${{ steps.check-pr.outputs.pr_exists }}" = "true" ]; then
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.