Compare commits

67 Commits

Author SHA1 Message Date
BazookaMusic
e40a7124d4 oops 2026-05-28 10:31:12 +02:00
BazookaMusic
7b5cceadf5 models for axis2 2026-05-28 10:15:41 +02:00
Asger F
17fe3e4e31 Merge pull request #21901 from asgerf/unified-fix-test
Unified: fix test output
2026-05-27 22:19:17 +02:00
Asger F
313500e581 Unified: update test outputs 2026-05-27 21:27:09 +02:00
Asger F
ad56ebd361 Unified: update test output 2026-05-27 21:25:32 +02:00
Asger F
6be9e2315d Merge pull request #21841 from github/tausbn/unified-swift-named-body-fields
Unified: Get rid of all `$children` fields
2026-05-27 21:25:11 +02:00
Taus
41fd59c1c1 Unified: regenerate Ast.qll and dbscheme 2026-05-27 15:02:28 +00:00
Taus
d6e7e38e1c Unified: merge in main
Keeps our version of the conflicting files. They will be regenerated in
the next commit.
2026-05-27 15:01:03 +00:00
Jeroen Ketema
7723324687 Merge pull request #21896 from jketema/jketema/deprecated
C++: Remove deprecated code
2026-05-27 14:11:10 +02:00
Jeroen Ketema
42c4d8a98b Merge pull request #21897 from jketema/jketema/missing-friend
C++: Update expected test results after extractor changes
2026-05-27 12:54:00 +02:00
Jeroen Ketema
e66b1e4beb Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-27 10:58:05 +02:00
Jeroen Ketema
362c48cc6d C++: Add change note 2026-05-27 10:44:44 +02:00
Jeroen Ketema
35364a087a C++: Update expected test results after extractor changes 2026-05-27 10:23:16 +02:00
Asger F
f18cdcfec6 Merge pull request #21848 from asgerf/asgerf/swift-yeast
Unified: Add schema checking and corpus-style tests
2026-05-26 22:00:21 +02:00
Jeroen Ketema
7862922e5c C++: Remove deprecated code 2026-05-26 17:54:51 +02:00
Taus
fbc861e7a4 unified: Clarify grammar comment
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-26 16:19:02 +02:00
Taus
dd9c066c61 unified: Regenerate files 2026-05-13 14:24:12 +00:00
Taus
f4f85b58ca unified: Remove some pointless fields
All of these fields have contents that are uniquely determined by the
node they appear on, so they convey no information.
2026-05-13 14:22:06 +00:00
Taus
caef72b047 unified: Introduced named property_binding node
This groups together a bunch of related values that would otherwise be
impossible to match up correctly.
2026-05-13 13:54:21 +00:00
Taus
9787a8b072 unified: Group enum entries
Same as in the preceding commit.
2026-05-13 13:51:25 +00:00
Taus
c8f7c3d7f2 unified: Group more paired items
Same as in the preceding commit, these items do not make sense as
separate fields on the parent node, so we materialise (or create new)
intermediate nodes to group them together.
2026-05-13 13:49:30 +00:00
Taus
ea6f3a9568 unified: Encapsulate function parameters
The field representation would have made it difficult to figure out
which parameters correspond to which default values and attributes, so
instead we now encapsulate these in a new `function_parameter` node.
2026-05-13 13:20:58 +00:00
Taus
5d6dc5c3c3 unified: Clean up statements/block mess
Introduces (by making it named) a `block` node, and conversely makes
`statements` anonymous. This enables us to sensibly distinguish between
the "then" and "else" branch of an `if_statement`, which we were not
able to previously.
2026-05-13 13:06:34 +00:00
Asger F
554bdf14b2 Yeast: fix warning about unnecessary mutability 2026-05-13 11:19:51 +02:00
Asger F
b031e5b1f8 Unified: regenerate QL and make tests not crash
The output is not so interesting as the mapping removes most nodes from the current test file.

I added a name_expr.swift test so at least one NameExpr makes it through.
2026-05-13 10:48:43 +02:00
Asger F
7fa6c4e4a3 Unified: Update test output after rebasing on grammar changes
The branch was rebased on the grammar changes, but rewriting the history was too difficult, so I'm just updating the test output here.
2026-05-13 10:35:34 +02:00
Asger F
600a4969c9 Unified: Simplify concatenation of arguments 2026-05-13 10:35:33 +02:00
Asger F
55194dd757 Unified: Support for calls and member access 2026-05-13 10:35:31 +02:00
Asger F
cbe4c81ca6 Unified: add tuple_pattern and sequence_condition; refine if-let/guard mapping
ast_types.yml additions:
- tuple_pattern { element*: pattern } in the pattern supertype.
- sequence_condition { stmt*: stmt, condition: condition } in the
  condition supertype.

swift.rs:
- Map Swift tuple destructuring (e.g. `let (a, b) = pair`) to the new
  tuple_pattern instead of synthesizing an apply_pattern.
- if-let / guard-let: explicitly match the value_binding_pattern
  (the `let` keyword) and bind the source expression as the next
  condition child, so `let` no longer leaks into the output.
2026-05-13 10:35:29 +02:00
Asger F
3b7a53f678 yeast-macros: merge repeated field declarations and support repetition in field patterns
Two changes to parse_query_fields:

- Allow `field: (kind)* @cap` (repetition + optional capture) in field
  position, mirroring how it works for bare children.
- When the same field name is declared multiple times in a query (e.g.
  `condition: (foo) condition: (bar)`), merge them into a single
  ordered list of children rather than emitting duplicate field
  entries (which at runtime restart the iterator for the field and
  cause the second declaration to re-match from the first child).
2026-05-13 10:35:27 +02:00
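The merging behaviour described in the commit above can be sketched in a few lines of Rust. This is only an illustration under assumptions: `merge_fields` and its `(field, pattern)` string pairs are hypothetical stand-ins, not the actual `parse_query_fields` code in yeast-macros, which operates on macro input rather than plain strings.

```rust
/// Hypothetical helper (not the real yeast-macros API): collapse repeated
/// field declarations into one entry per field name. Fields keep the order
/// of their first occurrence, and patterns keep their declaration order.
pub fn merge_fields(decls: Vec<(String, String)>) -> Vec<(String, Vec<String>)> {
    let mut merged: Vec<(String, Vec<String>)> = Vec::new();
    for (field, pattern) in decls {
        match merged.iter_mut().find(|entry| entry.0 == field) {
            // Field already declared: append to its ordered child list
            // instead of emitting a second entry for the same field.
            Some(entry) => entry.1.push(pattern),
            // First declaration of this field.
            None => merged.push((field, vec![pattern])),
        }
    }
    merged
}
```

With a single entry per field, a matcher would walk the children of `condition` once, in order, which is the behaviour the commit message says duplicate entries failed to provide.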
Asger F
ccc1dd5d3e Unified: Add tuple_pattern 2026-05-13 10:35:26 +02:00
Asger F
a966dff76e Unified: Add more patterns and some fixes to the AST 2026-05-13 10:35:24 +02:00
Asger F
6b58482dfb Yeast: Fix text associated with synthesized nodes 2026-05-13 10:35:22 +02:00
Asger F
2307839050 Yeast: Change how patterns with repetition are parsed 2026-05-13 10:35:21 +02:00
Asger F
92838011dd Unified: Add some more AST nodes and rules 2026-05-13 10:35:19 +02:00
Asger F
5772ee4d9b YEAST: add NodeRef type, YeastDisplay trait, and source text storage
Introduce NodeRef as a typed wrapper around node arena IDs. Captures in
desugaring rules are now bound as NodeRef instead of raw usize, which
prevents accidental misuse and enables source-text-aware rendering.

Add the YeastDisplay trait as an alternative to Display: its
yeast_to_string method receives the Ast, allowing NodeRef to resolve to
the captured node's source text instead of printing a numeric ID.

Store the original source bytes in the Ast so that NodeContent::Range
values (from synthesized literal nodes) can be resolved back to text.

Update yeast-macros to emit NodeRef-typed capture bindings and use
Into::<usize>::into where raw IDs are needed. The #{expr} template
syntax now uses YeastDisplay instead of Display.

The effect is visible in the corpus tests: operator nodes now correctly
render as e.g. operator "+" instead of operator "3".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:17 +02:00
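A minimal sketch of the idea in the commit above, assuming a much simpler `Ast` than the real one. Only the names `NodeRef`, `YeastDisplay`, `yeast_to_string` and `Ast` come from the commit message; the `Node` struct, `add_node`, `text`, and all field layouts are hypothetical and exist only to make the example self-contained.

```rust
/// Typed wrapper around an index into the node arena (layout is assumed).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeRef(usize);

impl From<NodeRef> for usize {
    fn from(node: NodeRef) -> usize {
        node.0
    }
}

/// Hypothetical node: a kind name plus the byte range it covers.
pub struct Node {
    pub kind: &'static str,
    pub start: usize,
    pub end: usize,
}

/// Hypothetical AST that stores the source bytes next to the arena.
pub struct Ast {
    source: Vec<u8>,
    nodes: Vec<Node>,
}

impl Ast {
    pub fn new(source: &str) -> Self {
        Ast { source: source.as_bytes().to_vec(), nodes: Vec::new() }
    }

    /// Adds a node and returns a typed reference to it.
    pub fn add_node(&mut self, kind: &'static str, start: usize, end: usize) -> NodeRef {
        self.nodes.push(Node { kind, start, end });
        NodeRef(self.nodes.len() - 1)
    }

    /// Resolves a node to its source text (panics if the range is out of bounds).
    pub fn text(&self, node: NodeRef) -> &str {
        let n = &self.nodes[node.0];
        std::str::from_utf8(&self.source[n.start..n.end]).unwrap_or("")
    }
}

/// Like `Display`, but rendering gets access to the `Ast`.
pub trait YeastDisplay {
    fn yeast_to_string(&self, ast: &Ast) -> String;
}

impl YeastDisplay for NodeRef {
    fn yeast_to_string(&self, ast: &Ast) -> String {
        // Render the captured node's source text, not its numeric ID.
        ast.text(*self).to_string()
    }
}

impl YeastDisplay for usize {
    fn yeast_to_string(&self, _ast: &Ast) -> String {
        // A raw ID can only print itself.
        self.to_string()
    }
}
```

In this sketch, rendering a `NodeRef` for the `+` in `1 + 2` yields `+`, while rendering the same ID as a raw `usize` yields the number, which mirrors the `operator "+"` versus `operator "3"` difference the commit describes.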
Asger F
72b683d63c Unified: Add Swift corpus tests
Add corpus test cases for Swift covering closures, collections, control
flow, functions, literals, loops, operators, optionals/errors, types,
and variables. Update existing desugar.txt with raw parse sections.

Note: operator nodes currently render their node ID instead of the actual
operator text (e.g. operator "3" instead of operator "+"). This will be
fixed in the next commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:16 +02:00
Asger F
8a2a48d2dd Unified extractor: add AST schema, swift translation rules, and corpus framework
Add ast_types.yml defining the unified output AST schema with supertypes
(expr, stmt, condition, pattern) and named nodes (top_level, binary_expr,
name_expr, etc.).

Rewrite swift translation rules to map from tree-sitter Swift grammar to
the unified AST, using one-shot phase rules.

Update the generator to use the output AST schema for dbscheme/QL
generation, and normalize the extraction table prefix to 'unified'.

Improve the corpus test framework to include raw tree-sitter parse output,
type-error checking against the output schema, and better failure
reporting.

Regenerate Ast.qll, unified.dbscheme, and update BasicTest accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:14 +02:00
Asger F
5d0cb9e805 YEAST: fix one-shot rules for unnamed nodes and self-captures
One-shot desugaring rules now skip unnamed nodes (punctuation, keywords,
etc.) since rules are intended to target named nodes only.

Also prevent infinite recursion when a capture refers to the root node of
the matched tree (e.g. an @_ capture on the pattern root).

Additionally fix the swift.rs add_phase call to match the updated 3-arg
signature introduced by the one-shot phase kind commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 10:35:12 +02:00
Asger F
bb9e996cb6 Shared: Do not emit ReservedWord class when there are no unnamed tokens 2026-05-13 10:35:11 +02:00
Asger F
c3a9218dcf Yeast: Add one-shot phase kind 2026-05-13 10:35:09 +02:00
Asger F
a049850c51 Yeast: add type-checking errors in AST dump 2026-05-13 10:35:07 +02:00
Asger F
49f19092fb Yeast: add reachable_node_ids() 2026-05-13 10:35:05 +02:00
Asger F
f668b99d6d Unified: Add support for tree-sitter-style corpus tests
This adds tests consisting of source code and a printout of its rewritten AST.
2026-05-13 10:35:02 +02:00
Taus
bfe5aa8d42 unified: Regenerate files 2026-05-12 16:01:32 +00:00
Taus
52d72836f9 unified: Fix multiline_comment issue
This named node (which is in fact emitted by the scanner as an
`external`) was appearing as a child of `class_body` because of inlining
via `_class_member_separator`. This, in itself, appears to be somewhat
of a hack, to handle cases where a multiline comment signals the end of
a class member.

To fix this, we make the external node _unnamed_, but keep the `extras`
node _named_ (so we can still extract it from the parse tree), and we
add a new rule `multiline_comment` that mediates between the two. That
way, the use inside `_class_member_separator` can use the unnamed
variant, and no node is pushed into $children.
2026-05-12 15:59:18 +00:00
Taus
eb480d1de4 unified: Make parenthesized_type named
I'm not entirely happy about this solution, but it seemed to be the most
straightforward way of avoiding various kinds of token bleeding.
2026-05-12 15:38:29 +00:00
Taus
2eee2e50dc unified: clean up patterns
Mostly by materialising a bunch of (useful) intermediate nodes.
2026-05-12 15:23:26 +00:00
Taus
2010844b1e unified: Add fields to property_declaration
Not entirely sure about the `binding?` field on `pattern`, but it looks
like that might actually be useful.
2026-05-12 15:14:35 +00:00
Taus
406a02fa49 unified: Add fields to switch_entry
Of note: this involved un-inlining where_clause.
2026-05-12 15:09:02 +00:00
Taus
6e5e650b42 unified: Add fields for macro_declaration 2026-05-12 15:03:29 +00:00
Taus
eba9f35673 unified: Get rid of $children* on key_path_expression
Doing this involved materialising a lot of previously anonymous nodes,
and I'm not entirely sure it's the best solution, but the node types
look decent enough.
2026-05-12 15:01:10 +00:00
Taus
e1a0e204b1 unified: Promote enum_type_parameter to named and add fields 2026-05-12 14:55:43 +00:00
Taus
5e14a7574e unified: make compilation_condition named and add fields 2026-05-12 14:55:42 +00:00
Taus
6ff404a6d0 unified: More miscellaneous field additions 2026-05-12 14:50:01 +00:00
Taus
9902beddec unified: add proper fields for availability_condition 2026-05-12 14:47:58 +00:00
Taus
e6eac3784a unified: Consolidate fields in if_let_binding 2026-05-12 14:43:13 +00:00
Taus
5784ef22f6 unified: Unify more fields
Not entirely happy about the mixed nature of the `kind` field (having
both tokens and the named node `throw_keyword` in there), but that's a
problem for a different time.
2026-05-12 14:40:17 +00:00
Taus
bc96ae6e47 unified: Add lambda and arguments fields 2026-05-12 14:29:23 +00:00
Taus
15d84b3e53 unified: More $children fixes
Some nodes with a single child (arguably redundant to do, but I think
it's nice to have the types be consistent), and also an instance of
ensuring that all branches of a `choice` expose consistent field names.
2026-05-12 14:15:36 +00:00
Taus
0499932ba0 unified: Fix fields in await_expression
This required a change in a different place, due to aliasing.
2026-05-12 14:10:38 +00:00
Taus
732cc7bee0 unified: Add fields to inheritance specifiers and calls 2026-05-12 14:07:58 +00:00
Taus
853a98842d unified: Regenerate files 2026-05-12 14:00:14 +00:00
Taus
d6ef467fba unified: Add more fields
A lot of changes, but for the most part these are just adding named
fields in places where they make sense.

After this, there are still ~20 instances of unnamed children appearing.
2026-05-12 13:59:56 +00:00
Taus
c75d819a92 unified: Add effect field
I ended up also aliasing `_async_keyword` to a named node to make it
more consistent with the other node kinds that can be in this field (as
it would be awkward to have two named types and a token here).

Elsewhere in the node types, we'll still have `async?: "async"`, and I
think that's okay.
2026-05-12 13:46:25 +00:00
Taus
75c07996f3 unified: regenerate files 2026-05-12 12:57:26 +00:00
Taus
9dddd93460 unified: add field declarations for statements and members
Part 1 of N of "getting rid of $children" in node-types.yml

Note: in one of the cases the affected node still has the $children
field present. This is because there's some weirdness about recording
multiline comments as class member separators that I did not want to
figure out how to address right now.
2026-05-12 12:57:26 +00:00
84 changed files with 7101 additions and 5365 deletions

View File

@@ -30,8 +30,6 @@ class Options extends string {
predicate overrideReturnsNull(Call call) {
// Used in CVS:
call.(FunctionCall).getTarget().hasGlobalName("Xstrdup")
- or
- CustomOptions::overrideReturnsNull(call) // old Options.qll
}
/**
@@ -45,8 +43,6 @@ class Options extends string {
// Used in CVS:
call.(FunctionCall).getTarget().hasGlobalName("Xstrdup") and
nullValue(call.getArgument(0))
- or
- CustomOptions::returnsNull(call) // old Options.qll
}
/**
@@ -65,8 +61,6 @@ class Options extends string {
f.hasGlobalOrStdName([
"exit", "_exit", "_Exit", "abort", "__assert_fail", "longjmp", "__builtin_unreachable"
])
- or
- CustomOptions::exits(f) // old Options.qll
}
/**
@@ -79,8 +73,7 @@ class Options extends string {
* runtime, the program's behavior is undefined)
*/
predicate exprExits(Expr e) {
- e.(AssumeExpr).getChild(0).(CompileTimeConstantInt).getIntValue() = 0 or
- CustomOptions::exprExits(e) // old Options.qll
+ e.(AssumeExpr).getChild(0).(CompileTimeConstantInt).getIntValue() = 0
}
/**
@@ -88,10 +81,7 @@ class Options extends string {
*
* By default holds only for `fgets`.
*/
- predicate alwaysCheckReturnValue(Function f) {
- f.hasGlobalOrStdName("fgets") or
- CustomOptions::alwaysCheckReturnValue(f) // old Options.qll
- }
+ predicate alwaysCheckReturnValue(Function f) { f.hasGlobalOrStdName("fgets") }
/**
* Holds if it is reasonable to ignore the return value of function
@@ -107,8 +97,6 @@ class Options extends string {
// common way of sleeping using select:
fc.getTarget().hasGlobalName("select") and
fc.getArgument(0).getValue() = "0"
- or
- CustomOptions::okToIgnoreReturnValue(fc) // old Options.qll
}
}

View File

@@ -98,57 +98,3 @@ class CustomMutexType extends MutexType {
*/
override predicate unlockAccess(FunctionCall fc, Expr arg) { none() }
}
- /**
- * DEPRECATED: customize `CustomOptions.overrideReturnsNull` instead.
- *
- * This predicate is required to support backwards compatibility for
- * older `Options.qll` files. It should not be removed or modified by
- * end users.
- */
- predicate overrideReturnsNull(Call call) { none() }
- /**
- * DEPRECATED: customize `CustomOptions.returnsNull` instead.
- *
- * This predicate is required to support backwards compatibility for
- * older `Options.qll` files. It should not be removed or modified by
- * end users.
- */
- predicate returnsNull(Call call) { none() }
- /**
- * DEPRECATED: customize `CustomOptions.exits` instead.
- *
- * This predicate is required to support backwards compatibility for
- * older `Options.qll` files. It should not be removed or modified by
- * end users.
- */
- predicate exits(Function f) { none() }
- /**
- * DEPRECATED: customize `CustomOptions.exprExits` instead.
- *
- * This predicate is required to support backwards compatibility for
- * older `Options.qll` files. It should not be removed or modified by
- * end users.
- */
- predicate exprExits(Expr e) { none() }
- /**
- * DEPRECATED: customize `CustomOptions.alwaysCheckReturnValue` instead.
- *
- * This predicate is required to support backwards compatibility for
- * older `Options.qll` files. It should not be removed or modified by
- * end users.
- */
- predicate alwaysCheckReturnValue(Function f) { none() }
- /**
- * DEPRECATED: customize `CustomOptions.okToIgnoreReturnValue` instead.
- *
- * This predicate is required to support backwards compatibility for
- * older `Options.qll` files. It should not be removed or modified by
- * end users.
- */
- predicate okToIgnoreReturnValue(FunctionCall fc) { none() }

View File

@@ -0,0 +1,15 @@
---
category: breaking
---
* Removed the deprecated `overrideReturnsNull` predicate from `Options.qll`. Use `CustomOptions.overrideReturnsNull` instead.
* Removed the deprecated `returnsNull` predicate from `Options.qll`. Use `CustomOptions.returnsNull` instead.
* Removed the deprecated `exits` predicate from `Options.qll`. Use `CustomOptions.exits` instead.
* Removed the deprecated `exprExits` predicate from `Options.qll`. Use `CustomOptions.exprExits` instead.
* Removed the deprecated `alwaysCheckReturnValue` predicate from `Options.qll`. Use `CustomOptions.alwaysCheckReturnValue` instead.
* Removed the deprecated `okToIgnoreReturnValue` predicate from `Options.qll`. Use `CustomOptions.okToIgnoreReturnValue` instead.
* Removed the deprecated `semmle.code.cpp.Member`. Import `semmle.code.cpp.Element` and/or `semmle.code.cpp.Type` directly.
* Removed the deprecated `UnknownDefaultLocation` class. Use `UnknownLocation` instead.
* Removed the deprecated `UnknownExprLocation` class. Use `UnknownLocation` instead.
* Removed the deprecated `UnknownStmtLocation` class. Use `UnknownLocation` instead.
* Removed the deprecated `TemplateParameter` class. Use `TypeTemplateParameter` instead.
* Support for class resolution across link targets has been removed for databases which were created with CodeQL versions before 1.23.0.

View File

@@ -32,7 +32,6 @@ import semmle.code.cpp.Class
import semmle.code.cpp.Struct
import semmle.code.cpp.Union
import semmle.code.cpp.Enum
- import semmle.code.cpp.Member
import semmle.code.cpp.Field
import semmle.code.cpp.Function
import semmle.code.cpp.MemberFunction

View File

@@ -148,28 +148,3 @@ class UnknownLocation extends Location {
this.getFile().getAbsolutePath() = "" and locations_default(this, _, 0, 0, 0, 0)
}
}
- /**
- * A dummy location which is used when something doesn't have a location in
- * the source code but needs to have a `Location` associated with it.
- *
- * DEPRECATED: use `UnknownLocation`
- */
- deprecated class UnknownDefaultLocation extends UnknownLocation { }
- /**
- * A dummy location which is used when an expression doesn't have a
- * location in the source code but needs to have a `Location` associated
- * with it.
- *
- * DEPRECATED: use `UnknownLocation`
- */
- deprecated class UnknownExprLocation extends UnknownLocation { }
- /**
- * A dummy location which is used when a statement doesn't have a location
- * in the source code but needs to have a `Location` associated with it.
- *
- * DEPRECATED: use `UnknownLocation`
- */
- deprecated class UnknownStmtLocation extends UnknownLocation { }

View File

@@ -1,6 +0,0 @@
/**
* DEPRECATED: import `semmle.code.cpp.Element` and/or `semmle.code.cpp.Type` directly as required.
*/
import semmle.code.cpp.Element
import semmle.code.cpp.Type

View File

@@ -35,13 +35,6 @@ class NonTypeTemplateParameter extends Literal, TemplateParameterImpl {
override string getAPrimaryQlClass() { result = "NonTypeTemplateParameter" }
}
- /**
- * A C++ `typename` (or `class`) template parameter.
- *
- * DEPRECATED: Use `TypeTemplateParameter` instead.
- */
- deprecated class TemplateParameter = TypeTemplateParameter;
/**
* A C++ `typename` (or `class`) template parameter.
*

View File

@@ -1,59 +1,5 @@
import semmle.code.cpp.Type
- /** For upgraded databases without mangled name info. */
- pragma[noinline]
- private string getTopLevelClassName(@usertype c) {
- not mangled_name(_, _, _) and
- isClass(c) and
- usertypes(c, result, _) and
- not namespacembrs(_, c) and // not in a namespace
- not member(_, _, c) and // not in some structure
- not class_instantiation(c, _) // not a template instantiation
- }
- /**
- * For upgraded databases without mangled name info.
- * Holds if `d` is a unique complete class named `name`.
- */
- pragma[noinline]
- private predicate existsCompleteWithName(string name, @usertype d) {
- not mangled_name(_, _, _) and
- is_complete(d) and
- name = getTopLevelClassName(d) and
- onlyOneCompleteClassExistsWithName(name)
- }
- /** For upgraded databases without mangled name info. */
- pragma[noinline]
- private predicate onlyOneCompleteClassExistsWithName(string name) {
- not mangled_name(_, _, _) and
- strictcount(@usertype c | is_complete(c) and getTopLevelClassName(c) = name) = 1
- }
- /**
- * For upgraded databases without mangled name info.
- * Holds if `c` is an incomplete class named `name`.
- */
- pragma[noinline]
- private predicate existsIncompleteWithName(string name, @usertype c) {
- not mangled_name(_, _, _) and
- not is_complete(c) and
- name = getTopLevelClassName(c)
- }
- /**
- * For upgraded databases without mangled name info.
- * Holds if `c` is an incomplete class, and there exists a unique complete class `d`
- * with the same name.
- */
- private predicate oldHasCompleteTwin(@usertype c, @usertype d) {
- not mangled_name(_, _, _) and
- exists(string name |
- existsIncompleteWithName(name, c) and
- existsCompleteWithName(name, d)
- )
- }
pragma[noinline]
private @mangledname getClassMangledName(@usertype c) {
isClass(c) and
@@ -103,10 +49,7 @@ private module Cached {
@usertype resolveClass(@usertype c) {
hasCompleteTwin(c, result)
or
- oldHasCompleteTwin(c, result)
- or
not hasCompleteTwin(c, _) and
- not oldHasCompleteTwin(c, _) and
result = c
}

View File

@@ -1,14 +1,14 @@
- | file://:0:0:0:0 | E<C>'s friend | loop.cpp:5:26:5:26 | E<D> |
| file://:0:0:0:0 | E<C>'s friend | loop.cpp:5:26:5:26 | E<T> |
- | file://:0:0:0:0 | E<C>'s friend | loop.cpp:10:26:10:26 | F<D> |
+ | file://:0:0:0:0 | E<C>'s friend | loop.cpp:5:26:5:29 | E<D> |
| file://:0:0:0:0 | E<C>'s friend | loop.cpp:10:26:10:26 | F<T> |
- | file://:0:0:0:0 | E<D>'s friend | loop.cpp:5:26:5:26 | E<C> |
+ | file://:0:0:0:0 | E<C>'s friend | loop.cpp:10:26:10:29 | F<D> |
| file://:0:0:0:0 | E<D>'s friend | loop.cpp:5:26:5:26 | E<T> |
- | file://:0:0:0:0 | E<D>'s friend | loop.cpp:10:26:10:26 | F<D> |
+ | file://:0:0:0:0 | E<D>'s friend | loop.cpp:5:26:5:29 | E<C> |
| file://:0:0:0:0 | E<D>'s friend | loop.cpp:10:26:10:26 | F<T> |
- | file://:0:0:0:0 | F<D>'s friend | loop.cpp:5:26:5:26 | E<C> |
- | file://:0:0:0:0 | F<D>'s friend | loop.cpp:5:26:5:26 | E<D> |
+ | file://:0:0:0:0 | E<D>'s friend | loop.cpp:10:26:10:29 | F<D> |
| file://:0:0:0:0 | F<D>'s friend | loop.cpp:5:26:5:26 | E<T> |
+ | file://:0:0:0:0 | F<D>'s friend | loop.cpp:5:26:5:29 | E<C> |
+ | file://:0:0:0:0 | F<D>'s friend | loop.cpp:5:26:5:29 | E<D> |
| loop.cpp:6:5:6:5 | E<T>'s friend | loop.cpp:5:26:5:26 | E<T> |
| loop.cpp:7:5:7:5 | E<T>'s friend | loop.cpp:7:36:7:36 | F<U> |
| loop.cpp:11:5:11:5 | F<T>'s friend | loop.cpp:11:36:11:36 | E<U> |

View File

@@ -0,0 +1,9 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.axis2.addressing", "EndpointReference", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.axis2.addressing", "RelatesTo", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]

View File

@@ -0,0 +1,12 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
#- ["org.apache.axis2.builder", "DiskFileDataSource", True, "getContentType", "()", "", "ReturnValue", "remote", "ai-generated"] # INVALID: Not a remote source; returns local file-item metadata
#- ["org.apache.axis2.builder", "DiskFileDataSource", True, "getInputStream", "()", "", "ReturnValue", "remote", "ai-generated"] # INVALID: Not a remote source; returns local uploaded-file stream
#- ["org.apache.axis2.builder", "DiskFileDataSource", True, "getName", "()", "", "ReturnValue", "remote", "ai-generated"] # INVALID: Not a remote source; returns file-item name metadata
- ["org.apache.axis2.builder", "MultipartFormDataBuilder", True, "processDocument", "(InputStream,String,MessageContext)", "", "ReturnValue", "remote", "ai-generated"]
- ["org.apache.axis2.builder", "XFormURLEncodedBuilder", True, "processDocument", "(InputStream,String,MessageContext)", "", "ReturnValue", "remote", "ai-generated"]

View File

@@ -0,0 +1,9 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
- ["org.apache.axis2.client.async", "AxisCallback", True, "onFault", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]
- ["org.apache.axis2.client.async", "AxisCallback", True, "onMessage", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]

View File

@@ -0,0 +1,26 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.axis2.client", "Options", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "ServiceClient", "(ConfigurationContext,URL,QName,String)", "", "Argument[1]", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "fireAndForget", "(OMElement)", "", "this", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "fireAndForget", "(QName,OMElement)", "", "this", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendReceive", "(OMElement)", "", "this", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendReceive", "(QName,OMElement)", "", "this", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendReceiveNonBlocking", "(OMElement,AxisCallback)", "", "this", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendReceiveNonBlocking", "(QName,OMElement,AxisCallback)", "", "this", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendRobust", "(OMElement)", "", "this", "request-forgery", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendRobust", "(QName,OMElement)", "", "this", "request-forgery", "ai-generated"]
#- ["org.apache.axis2.client", "Stub", True, "addHttpHeader", "(MessageContext,String,String)", "", "Argument[1..2]", "response-splitting", "ai-generated"] # INVALID: Only stores header in memory; not response-splitting
#- ["org.apache.axis2.client", "Stub", True, "setServiceClientEPR", "(String)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Just a setter; no request is made
- addsTo:
pack: codeql/java-all
extensible: sourceModel
data:
- ["org.apache.axis2.client", "OperationClient", True, "getMessageContext", "(String)", "", "ReturnValue", "remote", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendReceive", "(OMElement)", "", "ReturnValue", "remote", "ai-generated"]
- ["org.apache.axis2.client", "ServiceClient", True, "sendReceive", "(QName,OMElement)", "", "ReturnValue", "remote", "ai-generated"]

View File

@@ -0,0 +1,19 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.axis2.context.externalize", "DebugObjectInput", True, "readObject", "()", "", "this", "unsafe-deserialization", "ai-generated"]
#- ["org.apache.axis2.context.externalize", "DebugObjectInput", True, "trace", "(String)", "", "Argument[0]", "log-injection", "ai-generated"] # INVALID: Helper logging API; not a meaningful log-injection sink
- ["org.apache.axis2.context.externalize", "DebugObjectOutputStream", True, "writeBytes", "(String)", "", "Argument[0]", "log-injection", "ai-generated"]
- ["org.apache.axis2.context.externalize", "DebugObjectOutputStream", True, "writeChars", "(String)", "", "Argument[0]", "log-injection", "ai-generated"]
- ["org.apache.axis2.context.externalize", "DebugObjectOutputStream", True, "writeObject", "(Object)", "", "Argument[0]", "log-injection", "ai-generated"]
- ["org.apache.axis2.context.externalize", "DebugObjectOutputStream", True, "writeUTF", "(String)", "", "Argument[0]", "log-injection", "ai-generated"]
- ["org.apache.axis2.context.externalize", "SafeObjectInputStream", True, "readArrayList", "()", "", "this", "unsafe-deserialization", "ai-generated"]
- ["org.apache.axis2.context.externalize", "SafeObjectInputStream", True, "readHashMap", "()", "", "this", "unsafe-deserialization", "ai-generated"]
- ["org.apache.axis2.context.externalize", "SafeObjectInputStream", True, "readLinkedList", "()", "", "this", "unsafe-deserialization", "ai-generated"]
- ["org.apache.axis2.context.externalize", "SafeObjectInputStream", True, "readList", "(List)", "", "this", "unsafe-deserialization", "ai-generated"]
- ["org.apache.axis2.context.externalize", "SafeObjectInputStream", True, "readMap", "(Map)", "", "this", "unsafe-deserialization", "ai-generated"]
- ["org.apache.axis2.context.externalize", "SafeObjectInputStream", True, "readObject", "()", "", "this", "unsafe-deserialization", "ai-generated"]

View File

@@ -0,0 +1,35 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      #- ["org.apache.axis2.context", "ConfigurationContext", True, "getRealPath", "(String)", "", "Argument[0]", "path-injection", "ai-generated"] # INVALID: Just resolves new File(repo,path); no file access
      - ["org.apache.axis2.context", "ConfigurationContextFactory", True, "createConfigurationContextFromFileSystem", "(String)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.context", "ConfigurationContextFactory", True, "createConfigurationContextFromFileSystem", "(String,String)", "", "Argument[0..1]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.context", "ConfigurationContextFactory", True, "createConfigurationContextFromURIs", "(URL,URL)", "", "Argument[0..1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.context", "OperationContext", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      #- ["org.apache.axis2.context", "SelfManagedDataManager", True, "deserializeSelfManagedData", "(ByteArrayInputStream,MessageContext)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"] # INVALID: Interface declaration only; no implementation
      - ["org.apache.axis2.context", "ServiceContext", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.context", "ServiceGroupContext", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.context", "SessionContext", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.context", "MessageContext", True, "getAttachment", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getAttachmentMap", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getAttachmentMap", "(boolean)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getEnvelope", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getFaultTo", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getFrom", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getMessageID", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getRelatesTo", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getRelatesTo", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getRelationships", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getReplyTo", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getSoapAction", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getTo", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.context", "MessageContext", True, "getWSAAction", "()", "", "ReturnValue", "remote", "ai-generated"]

@@ -0,0 +1,17 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      #- ["org.apache.axis2.dataretrieval", "AxisDataLocatorImpl", True, "getData", "(DataRetrievalRequest,MessageContext)", "", "Argument[0]", "log-injection", "ai-generated"] # INVALID: No log sink on arg[0]
      - ["org.apache.axis2.dataretrieval", "DataRetrievalUtil", True, "buildOM", "(ClassLoader,String)", "", "Argument[1]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.dataretrieval", "ServiceData", True, "getFileContent", "(ClassLoader)", "", "this", "path-injection", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      #- ["org.apache.axis2.dataretrieval", "BaseAxisDataLocator", True, "getData", "(DataRetrievalRequest,MessageContext)", "", "ReturnValue", "file", "ai-generated"] # INVALID: Returns in-memory data, not a file source
      #- ["org.apache.axis2.dataretrieval", "BaseAxisDataLocator", True, "outputInlineForm", "(MessageContext,ServiceData[])", "", "ReturnValue", "file", "ai-generated"] # INVALID: Builds OM from in-memory metadata; not a file source
      #- ["org.apache.axis2.dataretrieval", "ServiceData", True, "getFileContent", "(ClassLoader)", "", "ReturnValue", "file", "ai-generated"] # INVALID: This is a sink, not a source

@@ -0,0 +1,33 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      #- ["org.apache.axis2.deployment", "Deployer", True, "deploy", "(DeploymentFileData)", "", "Argument[0]", "path-injection", "ai-generated"] # INVALID: No implementation found in repo
      #- ["org.apache.axis2.deployment", "Deployer", True, "undeploy", "(String)", "", "Argument[0]", "path-injection", "ai-generated"] # INVALID: No method found
      #- ["org.apache.axis2.deployment", "DeploymentClassLoader", True, "getResourceAsStream", "(String)", "", "Argument[0]", "path-injection", "ai-generated"] # INVALID: Method not found in repo
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "buildModule", "(File,AxisConfiguration)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "getFileList", "(URL)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "loadRepository", "(String)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "loadRepositoryFromURL", "(URL)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "loadServiceGroup", "(File,ConfigurationContext)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "loadServicesFromUrl", "(URL)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "prepareRepository", "(String)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "DeploymentEngine", True, "setClassLoaders", "(String)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "FileSystemConfigurator", True, "getAxisConfiguration", "()", "", "this", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "ModuleDeployer", True, "deoloyFromUrl", "(DeploymentFileData)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment", "ModuleDeployer", True, "deploy", "(DeploymentFileData)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "RepositoryListener", True, "findServicesInDirectory", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "ServiceDeployer", True, "deploy", "(DeploymentFileData)", "", "Argument[0]", "path-injection", "ai-generated"]
      #- ["org.apache.axis2.deployment", "ServiceDeployer", True, "deploy", "(DeploymentFileData)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Not a request-forgery sink; URL fetch is in deployFromUrl
      - ["org.apache.axis2.deployment", "ServiceDeployer", True, "deployFromUrl", "(Deployer,URL)", "", "Argument[1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment", "ServiceDeployer", True, "deployFromUrl", "(DeploymentFileData)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment", "TransportDeployer", True, "deploy", "(DeploymentFileData)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment", "URLBasedAxisConfigurator", True, "getAxisConfiguration", "()", "", "this", "request-forgery", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      #- ["org.apache.axis2.deployment", "DeploymentEngine", True, "getFileList", "(URL)", "", "ReturnValue", "remote", "ai-generated"] # INVALID: Not a source; it is a URL fetch sink

@@ -0,0 +1,13 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.deployment.repository.util", "ArchiveReader", True, "buildServiceDescription", "(String,ConfigurationContext,boolean)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.repository.util", "ArchiveReader", True, "processFilesInFolder", "(File,HashMap)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.repository.util", "ArchiveReader", True, "processServiceGroup", "(String,DeploymentFileData,AxisServiceGroup,boolean,HashMap,ConfigurationContext)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.repository.util", "DeploymentFileData", True, "setClassLoader", "(boolean,ClassLoader,File,boolean)", "", "Argument[2]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.repository.util", "WSInfoList", True, "addWSInfoItem", "(File,Deployer,int)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.repository.util", "WSInfoList", True, "addWSInfoItem", "(URL,Deployer,int)", "", "Argument[0]", "path-injection", "ai-generated"]

@@ -0,0 +1,20 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.deployment.resolver", "AARBasedWSDLLocator", True, "getImportInputSource", "(String,String)", "", "Argument[0..1]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.resolver", "AARBasedWSDLLocator", True, "getImportInputSource", "(String,String)", "", "Argument[0..1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment.resolver", "AARFileBasedURIResolver", True, "resolveEntity", "(String,String,String)", "", "Argument[1..2]", "path-injection", "ai-generated"]
      #- ["org.apache.axis2.deployment.resolver", "AARFileBasedURIResolver", True, "resolveEntity", "(String,String,String)", "", "Argument[1..2]", "request-forgery", "ai-generated"] # INVALID: Blocks remote URLs; not a request-forgery sink
      - ["org.apache.axis2.deployment.resolver", "WarBasedWSDLLocator", True, "getImportInputSource", "(String,String)", "", "Argument[0..1]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.resolver", "WarBasedWSDLLocator", True, "getImportInputSource", "(String,String)", "", "Argument[0..1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment.resolver", "WarFileBasedURIResolver", True, "resolveEntity", "(String,String,String)", "", "Argument[1..2]", "path-injection", "ai-generated"]
      #- ["org.apache.axis2.deployment.resolver", "WarFileBasedURIResolver", True, "resolveEntity", "(String,String,String)", "", "Argument[1..2]", "request-forgery", "ai-generated"] # INVALID: Remote URLs blocked; not a request-forgery sink
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      #- ["org.apache.axis2.deployment.resolver", "AARFileBasedURIResolver", True, "resolveEntity", "(String,String,String)", "", "ReturnValue", "file", "ai-generated"] # INVALID: This is a sink/resolver, not a source

@@ -0,0 +1,14 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.deployment.util", "TempFileManager", True, "createTempFile", "(String,String)", "", "Argument[0..1]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.util", "Utils", True, "createClassLoader", "(File,boolean)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.util", "Utils", True, "createClassLoader", "(URL,URL[],ClassLoader,File,boolean)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.deployment.util", "Utils", True, "createTempFile", "(String,InputStream,File)", "", "Argument[0]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.util", "Utils", True, "getClassLoader", "(ClassLoader,File,boolean)", "", "Argument[1]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.util", "Utils", True, "getClassLoader", "(ClassLoader,String,boolean)", "", "Argument[1]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.deployment.util", "Utils", True, "getURLsForAllJars", "(URL,File)", "", "Argument[0]", "request-forgery", "ai-generated"]

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      #- ["org.apache.axis2.description.java2wsdl", "DefaultSchemaGenerator", True, "generateSchema", "()", "", "this", "path-injection", "ai-generated"] # INVALID: Generates schemas from classes, not paths

@@ -0,0 +1,12 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.description", "AxisService", True, "createClientSideAxisService", "(URL,QName,String,Options)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.description", "Parameter", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.description", "ParameterIncludeImpl", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      #- ["org.apache.axis2.description", "WSDL11ToAxisServiceBuilder", True, "populateService", "()", "", "this", "request-forgery", "ai-generated"] # INVALID: Operates on already-loaded WSDL DOM; no URL fetch
      #- ["org.apache.axis2.description", "WSDLToAxisServiceBuilder", True, "getXMLSchema", "(Element,String)", "", "Argument[1]", "request-forgery", "ai-generated"] # INVALID: Resolves schemas from in-memory DOM element

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      #- ["org.apache.axis2.dispatchers", "RequestURIBasedServiceDispatcher", True, "findService", "(MessageContext)", "", "Argument[0]", "log-injection", "ai-generated"] # INVALID: Routes by URI; no log-injection sink

@@ -0,0 +1,20 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.engine", "AxisConfiguration", True, "deployModule", "(String)", "", "Argument[0]", "path-injection", "ai-generated"]
      #- ["org.apache.axis2.engine", "AxisEngine", True, "resume", "(MessageContext)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Just resumes flow; no direct network op
      #- ["org.apache.axis2.engine", "AxisEngine", True, "resumeSend", "(MessageContext)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Only invokes handlers; not a direct RF sink
      #- ["org.apache.axis2.engine", "AxisEngine", True, "resumeSendFault", "(MessageContext)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Only resumes fault flow; not a direct RF sink
      #- ["org.apache.axis2.engine", "AxisEngine", True, "send", "(MessageContext)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Engine dispatch; not a direct request sink
      #- ["org.apache.axis2.engine", "AxisEngine", True, "sendFault", "(MessageContext)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Fault dispatch; not a direct RF sink
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.engine", "Handler", True, "flowComplete", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.engine", "Handler", True, "invoke", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.engine", "MessageReceiver", True, "receive", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.handlers", "AbstractHandler", True, "flowComplete", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]

@@ -0,0 +1,14 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      #- ["org.apache.axis2.kernel", "OutTransportInfo", True, "setContentType", "(String)", "", "Argument[0]", "response-splitting", "ai-generated"] # INVALID: Just an interface setter; not a sink
      - ["org.apache.axis2.kernel", "SimpleAxis2Server", True, "SimpleAxis2Server", "(String,String)", "", "Argument[0..1]", "path-injection", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.kernel", "SimpleAxis2Server", True, "main", "(String[])", "", "Argument[0]", "commandargs", "ai-generated"]

@@ -0,0 +1,10 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.receivers", "AbstractInOutMessageReceiver", True, "invokeBusinessLogic", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.receivers", "AbstractInOutMessageReceiver", True, "invokeBusinessLogic", "(MessageContext,MessageContext)", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.receivers", "ServerCallback", True, "handleResult", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]

@@ -0,0 +1,44 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.util", "FileWriter", True, "createClassFile", "(File,String,String,String)", "", "Argument[0..3]", "path-injection", "ai-generated"]
      - ["org.apache.axis2.util", "LogWriter", True, "write", "(char[],int,int)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "MetaDataEntry", True, "readExternal", "(ObjectInput)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.util", "ObjectStateUtils", True, "readArrayList", "(ObjectInput,String)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.util", "ObjectStateUtils", True, "readHashMap", "(ObjectInput,String)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.util", "ObjectStateUtils", True, "readLinkedList", "(ObjectInput,String)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.util", "ObjectStateUtils", True, "readObject", "(ObjectInput,String)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.util", "ObjectStateUtils", True, "readString", "(ObjectInput,String)", "", "Argument[0]", "unsafe-deserialization", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "debug", "(Object)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "debug", "(Object,Throwable)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "error", "(Object)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "error", "(Object,Throwable)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "fatal", "(Object)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "fatal", "(Object,Throwable)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "info", "(Object)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "info", "(Object,Throwable)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "trace", "(Object)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "trace", "(Object,Throwable)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "warn", "(Object)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "OnDemandLogger", True, "warn", "(Object,Throwable)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.util", "SecureWSDLLocator", True, "getBaseInputSource", "()", "", "this", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.util", "SecureWSDLLocator", True, "getImportInputSource", "(String,String)", "", "Argument[0..1]", "request-forgery", "ai-generated"]
      #- ["org.apache.axis2.util", "Utils", True, "getNewConfigurationContext", "(String)", "", "Argument[0]", "path-injection", "ai-generated"] # INVALID: Method not found in repo
      - ["org.apache.axis2.util", "XMLPrettyPrinter", True, "prettify", "(File)", "", "Argument[0]", "path-injection", "ai-generated"]
      #- ["org.apache.axis2.util", "XMLUtils", True, "getInputSourceFromURI", "(String)", "", "Argument[0]", "request-forgery", "ai-generated"] # INVALID: Only returns new InputSource(uri); no fetch
      - ["org.apache.axis2.util", "XMLUtils", True, "newDocument", "(String)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.util", "XMLUtils", True, "newDocument", "(String,String,String)", "", "Argument[0]", "request-forgery", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.util", "CallbackReceiver", True, "receive", "(MessageContext)", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.util", "OptionsParser", True, "getPassword", "()", "", "ReturnValue", "commandargs", "ai-generated"]
      - ["org.apache.axis2.util", "OptionsParser", True, "getRemainingArgs", "()", "", "ReturnValue", "commandargs", "ai-generated"]
      - ["org.apache.axis2.util", "OptionsParser", True, "getRemainingFlags", "()", "", "ReturnValue", "commandargs", "ai-generated"]
      - ["org.apache.axis2.util", "OptionsParser", True, "getUser", "()", "", "ReturnValue", "commandargs", "ai-generated"]
      - ["org.apache.axis2.util", "OptionsParser", True, "isValueSet", "(char)", "", "ReturnValue", "commandargs", "ai-generated"]

@@ -0,0 +1,11 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.wsdl.util", "WSDLDefinitionWrapper", True, "WSDLDefinitionWrapper", "(Definition,URL,AxisConfiguration)", "", "Argument[1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.wsdl.util", "WSDLDefinitionWrapper", True, "WSDLDefinitionWrapper", "(Definition,URL,boolean)", "", "Argument[1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.wsdl.util", "WSDLDefinitionWrapper", True, "WSDLDefinitionWrapper", "(Definition,URL,boolean,int)", "", "Argument[1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.wsdl.util", "WSDLDefinitionWrapper", True, "WSDLDefinitionWrapper", "(Definition,URL,int)", "", "Argument[1]", "request-forgery", "ai-generated"]

@@ -0,0 +1,278 @@
# MaD Generation Report
## Included (264)
| Package | Class | Method | Type | Kind | Certainty | Reason |
|---------|-------|--------|------|------|-----------|--------|
| org.apache.axis2.context | ConfigurationContext | getRealPath | sink | CWE-22 | 5 | Argument 0 is a relative path that is resolved against the repository root directory (via AxisConfiguration.getRepository()) to produce a File. A user-controlled path with '..' sequences could traverse outside the intended directory, leading to path traversal. |
| org.apache.axis2.context | ConfigurationContextFactory | createConfigurationContextFromFileSystem | sink | CWE-22 | 4 | Arguments 0 (path) and 1 (axis2xml) are file system paths used to locate the repository directory and configuration file. An attacker who controls these values can traverse the file system to access arbitrary files/directories. |
| org.apache.axis2.context | ConfigurationContextFactory | createConfigurationContextFromFileSystem | sink | CWE-22 | 4 | Argument 0 (path) is a file system path used to locate the repository directory. An attacker who controls this value can traverse the file system to access arbitrary directories. |
| org.apache.axis2.context | ConfigurationContextFactory | createConfigurationContextFromURIs | sink | CWE-918 | 4 | Arguments 0 (axis2xml) and 1 (repository) are URLs used to load configuration and repository data from potentially remote locations. An attacker controlling these URLs can perform server-side request forgery. |
| org.apache.axis2.context | MessageContext | readExternal | sink | CWE-502 | 5 | readExternal deserializes data from the ObjectInput stream (arg 0), calling SafeObjectInputStream.readObject(), readUTF(), readMap(), etc. This is unsafe deserialization of potentially untrusted data. |
| org.apache.axis2.context | MessageContext | getEnvelope | source | remote | 4 | Returns the SOAPEnvelope of the incoming SOAP message, which contains the full message body and headers from the remote client. |
| org.apache.axis2.context | MessageContext | getSoapAction | source | remote | 4 | Returns the SOAP action string from the incoming message (typically from the HTTP SOAPAction header), which is remote-controlled. |
| org.apache.axis2.context | MessageContext | getWSAAction | source | remote | 4 | Returns the WS-Addressing Action from the incoming message, which is controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getMessageID | source | remote | 4 | Returns the WS-Addressing MessageID from the incoming message, which is controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getFrom | source | remote | 4 | Returns the WS-Addressing From EndpointReference from the incoming message, which is controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getTo | source | remote | 4 | Returns the WS-Addressing To EndpointReference from the incoming message, which is controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getReplyTo | source | remote | 4 | Returns the WS-Addressing ReplyTo EndpointReference from the incoming message, which is controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getFaultTo | source | remote | 4 | Returns the WS-Addressing FaultTo EndpointReference from the incoming message, which is controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getRelatesTo | source | remote | 4 | Returns the WS-Addressing RelatesTo from the incoming message, which is controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getRelatesTo | source | remote | 4 | Returns the WS-Addressing RelatesTo of a specified type from the incoming message, controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getRelationships | source | remote | 4 | Returns all WS-Addressing RelatesTo headers from the incoming message, which are controlled by the remote client. |
| org.apache.axis2.context | MessageContext | getAttachment | source | remote | 4 | Returns the DataHandler for a MIME attachment from the incoming message, which is remote-controlled data. |
| org.apache.axis2.context | MessageContext | getAttachmentMap | source | remote | 4 | Returns the Attachments map from the incoming message. MIME attachments are remote-controlled data. |
| org.apache.axis2.context | MessageContext | getAttachmentMap | source | remote | 4 | Returns the Attachments map from the incoming message. MIME attachments are remote-controlled data. |
| org.apache.axis2.context | OperationContext | readExternal | sink | CWE-502 | 5 | The readExternal method deserializes objects from the ObjectInput stream (argument 0). Callees confirm it delegates to SafeObjectInputStream.readObject(), readHashMap(), and readMap(), which perform deserialization of arbitrary objects. This can lead to arbitrary code execution if the input stream contains untrusted data. |
| org.apache.axis2.context | SelfManagedDataManager | deserializeSelfManagedData | sink | CWE-502 | 4 | Argument 0 is a ByteArrayInputStream containing serialized data that is deserialized (reconstituted) by the implementor. The method's explicit purpose is deserialization of previously serialized handler-specific data, making the data parameter a deserialization sink. |
| org.apache.axis2.context | ServiceContext | readExternal | sink | CWE-502 | 5 | Argument 0 is an ObjectInput stream from which the method deserializes objects by calling SafeObjectInputStream.readObject() and readMap(). Deserializing untrusted data from this stream can lead to arbitrary code execution via gadget chains (CWE-502). |
| org.apache.axis2.context | ServiceGroupContext | readExternal | sink | CWE-502 | 5 | Argument 0 (ObjectInput inObject) is deserialized via SafeObjectInputStream.readObject() and readMap(), which can lead to unsafe deserialization of untrusted data (CWE-502). |
| org.apache.axis2.context | SessionContext | readExternal | sink | CWE-502 | 5 | Argument 0 (ObjectInput inObject) is deserialized via readObject() and readMap() calls on SafeObjectInputStream, which wraps the provided ObjectInput. This can lead to arbitrary code execution if the stream contains untrusted data. |
| org.apache.axis2.description.java2wsdl.bytecode | ClassReader | resolveClass | sink | CWE-470 | 4 | The method calls Class.forName(String) with a class name resolved from the object's internal byte buffer (constant pool). If untrusted bytecode was loaded into this ClassReader, an attacker could control which class is dynamically loaded. |
| org.apache.axis2.description.java2wsdl.bytecode | ClassReader | resolveMethod | sink | CWE-470 | 4 | The method calls resolveClass which calls Class.forName(String) with a class name from the internal byte buffer. If untrusted bytecode was loaded into this ClassReader, an attacker could control which class is dynamically loaded and which method is resolved. |
| org.apache.axis2.description.java2wsdl.bytecode | ClassReader | resolveField | sink | CWE-470 | 4 | The method calls resolveClass which calls Class.forName(String) with a class name from the internal byte buffer. If untrusted bytecode was loaded into this ClassReader, an attacker could control which class is loaded and which field is accessed. |
| org.apache.axis2.receivers | AbstractInOutMessageReceiver | invokeBusinessLogic | source | remote | 4 | This is a framework callback in Apache Axis2 invoked when a SOAP/REST request arrives. Parameter 0 (msgContext) carries the incoming MessageContext with remote request data from the client. |
| org.apache.axis2.receivers | AbstractInOutMessageReceiver | invokeBusinessLogic | source | remote | 4 | This is the abstract framework callback that developers override to handle incoming Axis2 web service requests. Parameter 0 (inMessage) carries the incoming MessageContext with remote request data from the client. |
| org.apache.axis2.receivers | ServerCallback | handleResult | source | remote | 3 | handleResult is a server-side framework callback in Axis2's receiver pipeline. The MessageContext parameter (arg 0) carries the full SOAP message context (envelope, headers, body) originating from a remote client, making it a source of remote data. |
| org.apache.axis2.context.externalize | DebugObjectInput | readObject | sink | CWE-502 | 5 | readObject() delegates to ObjectInput.readObject(), performing Java deserialization on data from the underlying ObjectInput stream held in `this`. Deserializing untrusted data can lead to arbitrary code execution (CWE-502). |
| org.apache.axis2.context.externalize | DebugObjectInput | trace | sink | CWE-117 | 4 | Argument 0 (str) is passed directly to Log.debug(), writing it to the log without sanitization. If the string contains attacker-controlled data, it can lead to log injection (CWE-117). |
| org.apache.axis2.context.externalize | DebugObjectOutputStream | writeUTF | sink | CWE-117 | 4 | Argument 0 (String str) is logged via Log.debug(), which can lead to log injection if the string contains newline or other formatting characters. |
| org.apache.axis2.context.externalize | DebugObjectOutputStream | writeBytes | sink | CWE-117 | 4 | Argument 0 (String s) is logged via Log.debug(), which can lead to log injection if the string contains newline or other formatting characters. |
| org.apache.axis2.context.externalize | DebugObjectOutputStream | writeChars | sink | CWE-117 | 4 | Argument 0 (String s) is logged via Log.debug(), which can lead to log injection if the string contains newline or other formatting characters. |
| org.apache.axis2.context.externalize | DebugObjectOutputStream | writeObject | sink | CWE-117 | 4 | Argument 0 (Object obj) is passed to valueName() and the result is logged via Log.debug(), which can lead to log injection if the object's string representation contains newline or formatting characters. |
| org.apache.axis2.context.externalize | MessageExternalizeUtils | readExternal | sink | CWE-611 | 4 | Argument 0 (ObjectInput in) is read and its content is parsed as XML via OMXMLBuilderFactory.createSOAPModelBuilder(InputStream, String). If the input contains untrusted XML, it could lead to XML External Entity (XXE) attacks if the underlying parser is not configured to disable external entities. |
| org.apache.axis2.context.externalize | ObjectInputStreamWithCL$ClassResolver | resolveClass | sink | CWE-470 | 4 | Argument 0 (className) is used to dynamically load a class via reflection. If an attacker can control this value, they could load arbitrary classes, leading to unsafe reflection / class instantiation vulnerabilities. |
| org.apache.axis2.context.externalize | SafeObjectInputStream | readObject | sink | CWE-502 | 5 | readObject() delegates to readObjectOverride(), performing Java object deserialization on the data in the underlying stream (this). Deserializing untrusted data can lead to arbitrary code execution. |
| org.apache.axis2.context.externalize | SafeObjectInputStream | readList | sink | CWE-502 | 5 | readList() calls ObjectInputStream.readObject() to deserialize objects from the underlying stream (this) into a List. Deserializing untrusted data can lead to arbitrary code execution. |
| org.apache.axis2.context.externalize | SafeObjectInputStream | readLinkedList | sink | CWE-502 | 5 | readLinkedList() delegates to readList(), which calls ObjectInputStream.readObject() to deserialize objects from the underlying stream (this). Deserializing untrusted data can lead to arbitrary code execution. |
| org.apache.axis2.context.externalize | SafeObjectInputStream | readArrayList | sink | CWE-502 | 5 | readArrayList() delegates to readList(), which calls ObjectInputStream.readObject() to deserialize objects from the underlying stream (this). Deserializing untrusted data can lead to arbitrary code execution. |
| org.apache.axis2.context.externalize | SafeObjectInputStream | readMap | sink | CWE-502 | 5 | readMap() calls ObjectInputStream.readObject() to deserialize key/value pairs from the underlying stream (this) into a Map. Deserializing untrusted data can lead to arbitrary code execution. |
| org.apache.axis2.context.externalize | SafeObjectInputStream | readHashMap | sink | CWE-502 | 5 | readHashMap() delegates to readMap(), which calls ObjectInputStream.readObject() to deserialize key/value pairs from the underlying stream (this). Deserializing untrusted data can lead to arbitrary code execution. |
| org.apache.axis2.classloader | MultiParentClassLoader | loadClass | sink | CWE-470 | 5 | Argument 0 (name) specifies the fully-qualified class name to load dynamically. The method delegates to ClassLoader.loadClass and URLClassLoader.findClass. If the class name is externally controlled, this enables unsafe reflection / dynamic class loading attacks. |
| org.apache.axis2.deployment.resolver | AARBasedWSDLLocator | getImportInputSource | sink | CWE-22 | 4 | The importLocation (arg 1) is used to construct a path to look up entries within a zip archive via ZipInputStream iteration and ZipEntry.getName() comparison. The parentLocation (arg 0) is used as a base URI via URI.create(). Both are used to resolve a path within the zip file without validation, enabling path traversal within the archive (CWE-22). |
| org.apache.axis2.deployment.resolver | AARBasedWSDLLocator | getImportInputSource | sink | CWE-918 | 4 | When importLocation (arg 1) is detected as an absolute URI by isAbsolute(), the method delegates to DefaultURIResolver.resolveEntity() which can fetch arbitrary external resources. The parentLocation (arg 0) is used as the base URI. This enables server-side request forgery if an attacker controls the import location. |
| org.apache.axis2.deployment.resolver | AARFileBasedURIResolver | resolveEntity | source | file | 4 | The method reads file content from a ZIP/AAR archive via ZipInputStream and returns it as an InputSource, or delegates to DefaultURIResolver.resolveEntity which may also read from files or network. The return value contains data freshly read from the filesystem. |
| org.apache.axis2.deployment.resolver | AARFileBasedURIResolver | resolveEntity | sink | CWE-22 | 4 | Arguments 1 (schemaLocation) and 2 (baseUri) are used to construct URIs via URI.create/URI.resolve and to look up entries in the ZIP archive. They are also passed to DefaultURIResolver.resolveEntity, which can resolve to arbitrary file paths, enabling path traversal. |
| org.apache.axis2.deployment.resolver | AARFileBasedURIResolver | resolveEntity | sink | CWE-918 | 4 | Arguments 1 (schemaLocation) and 2 (baseUri) are used to construct URIs that are passed to DefaultURIResolver.resolveEntity as a fallback, which can make network requests to attacker-controlled servers, enabling SSRF. |
| org.apache.axis2.deployment.resolver | WarBasedWSDLLocator | getImportInputSource | sink | CWE-22 | 4 | Arguments 0 (parentLocation) and 1 (importLocation) are used to construct a resource path via URI.create/URI.resolve and then load a resource via ClassLoader.getResourceAsStream. An attacker-controlled importLocation could traverse outside the expected resource directory. |
| org.apache.axis2.deployment.resolver | WarBasedWSDLLocator | getImportInputSource | sink | CWE-918 | 4 | Arguments 0 (parentLocation) and 1 (importLocation) are used to construct a URI which may be resolved via DefaultURIResolver.resolveEntity. If the import location is an absolute URL pointing to an attacker-controlled server, this enables SSRF. |
| org.apache.axis2.deployment.resolver | WarFileBasedURIResolver | resolveEntity | sink | CWE-22 | 4 | Arguments 1 (schemaLocation) and 2 (baseUri) are used to construct resource paths via URI.create/resolve and then passed to ClassLoader.getResourceAsStream(), enabling path traversal to read arbitrary classpath resources. |
| org.apache.axis2.deployment.resolver | WarFileBasedURIResolver | resolveEntity | sink | CWE-918 | 4 | Arguments 1 (schemaLocation) and 2 (baseUri) are used to construct URIs. For absolute URIs, the method delegates to DefaultURIResolver.resolveEntity() which can make network requests, enabling SSRF attacks. |
| org.apache.axis2.wsdl.util | WSDLDefinitionWrapper | WSDLDefinitionWrapper | sink | CWE-918 | 3 | Argument 1 (wURL) specifies the URL from which the WSDL definition may be re-read to reduce memory footprint. The class documentation states it manages WSDL definitions and may re-read them, and the constructor delegates to prepare(Definition, URL). An attacker-controlled URL could lead to SSRF. |
| org.apache.axis2.wsdl.util | WSDLDefinitionWrapper | WSDLDefinitionWrapper | sink | CWE-918 | 3 | Argument 1 (wURL) specifies the URL from which the WSDL definition may be re-read. An attacker-controlled URL could lead to SSRF. |
| org.apache.axis2.wsdl.util | WSDLDefinitionWrapper | WSDLDefinitionWrapper | sink | CWE-918 | 3 | Argument 1 (wURL) specifies the URL from which the WSDL definition may be re-read. An attacker-controlled URL could lead to SSRF. |
| org.apache.axis2.wsdl.util | WSDLDefinitionWrapper | WSDLDefinitionWrapper | sink | CWE-918 | 3 | Argument 1 (wURL) specifies the URL from which the WSDL definition may be re-read. An attacker-controlled URL could lead to SSRF. |
| org.apache.axis2.wsdl.util | WSDLWrapperReloadImpl | getTypes | sink | CWE-918 | 4 | This method calls loadDefinition() which uses the URL stored in the object state (this) to make a network request to reload the WSDL definition. If the stored URL is attacker-controlled, this enables SSRF. |
| org.apache.axis2.wsdl.util | WSDLWrapperReloadImpl | getDocumentationElement | sink | CWE-918 | 4 | This method calls loadDefinition() which uses the URL stored in the object state (this) to make a network request to reload the WSDL definition. If the stored URL is attacker-controlled, this enables SSRF. |
| org.apache.axis2.dispatchers | RequestURIBasedServiceDispatcher | findService | sink | CWE-117 | 4 | The method extracts the request URI from messageContext (arg 0) via getTo().getAddress(), and logs derived data via Log.debug(). Attacker-controlled data from the request URI can be injected into log entries without sanitization. |
| org.apache.axis2.client.async | AxisCallback | onMessage | source | remote | 4 | Callback method invoked when a response message is received from a remote service. The msgContext parameter (arg 0) carries the remote response data, making it a source of remote input. |
| org.apache.axis2.client.async | AxisCallback | onFault | source | remote | 4 | Callback method invoked when a fault message is received from a remote service. The msgContext parameter (arg 0) carries the remote fault response data, making it a source of remote input. |
| org.apache.axis2.handlers | AbstractHandler | flowComplete | source | remote | 3 | flowComplete is a handler lifecycle callback in the Axis2 SOAP web services framework. The MessageContext parameter (arg 0) carries data from remote SOAP requests, including the SOAP envelope, headers, and body. When subclasses override this method, the parameter is an entry point for remote data. |
| org.apache.axis2.client | OperationClient | getMessageContext | source | remote | 3 | After the operation client executes a web service call, getMessageContext returns a MessageContext containing the response from the remote service. This response data originates from outside the program boundary (a remote service endpoint). |
| org.apache.axis2.client | Options | readExternal | sink | CWE-502 | 5 | The readExternal method deserializes object state from an ObjectInput stream. Callees confirm it calls SafeObjectInputStream.readObject(), readArrayList(), and readHashMap(), which reconstruct objects from the stream. Argument 0 (the ObjectInput stream) is the source of potentially untrusted serialized data, making this a deserialization sink. |
| org.apache.axis2.client | ServiceClient | ServiceClient | sink | CWE-918 | 5 | Argument 1 (wsdlURL) is used to fetch a WSDL document from a remote URL via AxisService.createClientSideAxisService, which can lead to SSRF if the URL is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | sendReceive | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | sendReceive | source | remote | 4 | The return value is the response from a remote web service, which constitutes externally-provided data entering the program. |
| org.apache.axis2.client | ServiceClient | sendReceive | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | sendReceive | source | remote | 4 | The return value is the response from a remote web service, constituting externally-provided data entering the program. |
| org.apache.axis2.client | ServiceClient | fireAndForget | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | fireAndForget | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | sendRobust | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | sendRobust | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | sendReceiveNonBlocking | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | ServiceClient | sendReceiveNonBlocking | sink | CWE-918 | 4 | The method sends a request to the endpoint configured in the object's state (this), which can lead to SSRF if the endpoint is attacker-controlled. |
| org.apache.axis2.client | Stub | addHttpHeader | sink | CWE-113 | 4 | Arguments 1 (name) and 2 (value) are used to set an HTTP header on the message context. If attacker-controlled data containing CR/LF characters flows into these, it can lead to HTTP header injection or request/response splitting (CWE-113). |
| org.apache.axis2.client | Stub | setServiceClientEPR | sink | CWE-918 | 4 | Argument 0 (address) is used to set the endpoint reference address for the service client, determining where outbound requests are sent. If attacker-controlled data flows in, it enables server-side request forgery. |
| org.apache.axis2.dataretrieval | AxisDataLocatorImpl | getData | sink | CWE-117 | 3 | The method reads the dialect and identifier strings from argument 0 (DataRetrievalRequest) via getDialect() and getIdentifier(), and it directly calls Log.info(Object) and Log.info(Object, Throwable). Data from the request parameter likely flows into these log calls, making this a log injection sink. |
| org.apache.axis2.dataretrieval | BaseAxisDataLocator | outputInlineForm | source | file | 4 | The method reads file content via ServiceData.getFileContent(ClassLoader) and returns it as Data[]. The returned data contains content read from files on the filesystem/classpath, which is new data brought into the program. |
| org.apache.axis2.dataretrieval | BaseAxisDataLocator | getData | source | file | 4 | The method delegates to outputInlineForm (among others) which reads file content via ServiceData.getFileContent(ClassLoader). The return value Data[] contains data read from files, representing new data brought into the program from the filesystem. |
| org.apache.axis2.dataretrieval | DataRetrievalUtil | convertToOMElement | sink | CWE-611 | 4 | Argument 0 (servicexmlStream) is an InputStream parsed as XML via OMXMLBuilderFactory.createOMBuilder(InputStream). If the XML contains external entity declarations and the parser is not properly configured, this leads to XXE. |
| org.apache.axis2.dataretrieval | DataRetrievalUtil | buildOM | sink | CWE-22 | 4 | Argument 1 (file) is a file path relative to the Service Repository used to load a file via getInputStream(ClassLoader, String). If attacker-controlled, it can traverse to unexpected files. |
| org.apache.axis2.dataretrieval | DataRetrievalUtil | buildOM | sink | CWE-611 | 4 | Argument 1 (file) specifies a file whose content is loaded and then parsed as XML via convertToOMElement, which uses OMXMLBuilderFactory.createOMBuilder. If the file contains malicious XML with external entities, this can lead to XXE. |
| org.apache.axis2.dataretrieval | ServiceData | getFileContent | source | file | 4 | The return value is file content loaded via DataRetrievalUtil.buildOM(ClassLoader, String), which reads a file from the classloader. This brings new data from the filesystem into the program. |
| org.apache.axis2.dataretrieval | ServiceData | getFileContent | sink | CWE-22 | 4 | The file path used to load content via DataRetrievalUtil.buildOM comes from the object's state (this), which was populated from XML data in the constructor. If an attacker controls the ServiceData XML, they can specify arbitrary file paths, leading to path traversal. |
| org.apache.axis2.deployment.repository.util | ArchiveReader | buildServiceDescription | sink | CWE-22 | 4 | Argument 0 (filename) is used to access the filesystem: the method calls File.exists() and opens a ZipInputStream to read from the file. If the filename is attacker-controlled, this allows path traversal. |
| org.apache.axis2.deployment.repository.util | ArchiveReader | buildServiceDescription | sink | CWE-611 | 4 | Argument 0 (in) is an InputStream that is parsed as XML via DescriptionBuilder.buildOM(). If the XML parser is not configured to disable external entities, this can lead to XXE attacks. |
| org.apache.axis2.deployment.repository.util | ArchiveReader | processServiceGroup | sink | CWE-22 | 4 | Argument 0 (filename) is used to access the filesystem: the method calls File.exists() and opens a FileInputStream and a ZipInputStream to read from the file. If the filename is attacker-controlled, this allows path traversal. |
| org.apache.axis2.deployment.repository.util | ArchiveReader | processFilesInFolder | sink | CWE-22 | 4 | Argument 0 (folder) is used to list and read files: the method calls File.listFiles(), FileInputStream, and File.toURI(), allowing path traversal if the folder path is attacker-controlled. |
| org.apache.axis2.deployment.repository.util | ArchiveReader | getAxisServiceFromWsdl | sink | CWE-611 | 4 | Argument 0 (in) is an InputStream containing WSDL XML that is parsed via XMLUtils.toOM(InputStream). If the XML parser is not configured to disable external entities, this can lead to XXE attacks. |
| org.apache.axis2.deployment.repository.util | ArchiveReader | buildServiceGroup | sink | CWE-611 | 4 | Argument 0 (zin) is an InputStream that is delegated to buildServiceDescription(InputStream, ConfigurationContext) which parses it as XML via DescriptionBuilder.buildOM(). This can lead to XXE if the parser is not hardened. |
| org.apache.axis2.deployment.repository.util | DeploymentFileData | setClassLoader | sink | CWE-22 | 4 | Argument 2 (File file) is used to create a ClassLoader via Utils.createClassLoader/Utils.getClassLoader (confirmed by callees). If the file path is attacker-controlled, path traversal sequences could cause code to be loaded from unintended filesystem locations. |
| org.apache.axis2.deployment.repository.util | WSInfoList | addWSInfoItem | sink | CWE-22 | 4 | Argument 0 (File) is used to determine the file path (via getAbsolutePath()) for deployment. An attacker-controlled file path could lead to path traversal, allowing deployment of services/modules from unexpected file system locations. |
| org.apache.axis2.deployment.repository.util | WSInfoList | addWSInfoItem | sink | CWE-22 | 4 | Argument 0 (URL) has its path extracted via getPath() and used for deployment. An attacker-controlled URL path could lead to path traversal, allowing deployment of services/modules from unexpected locations. |
| org.apache.axis2.deployment | AxisConfigBuilder | processTransportSenders | sink | CWE-470 | 4 | Argument 0 (Iterator of XML transport sender elements) provides class names that are loaded via Loader.loadClass() and instantiated via Class.newInstance(), enabling unsafe reflection / dynamic class instantiation. |
| org.apache.axis2.deployment | AxisConfigBuilder | processTransportReceivers | sink | CWE-470 | 4 | Argument 0 (Iterator of XML transport receiver elements) provides class names that are loaded via Loader.loadClass() and instantiated via Class.newInstance(), enabling unsafe reflection / dynamic class instantiation. |
| org.apache.axis2.deployment | AxisConfigBuilder | processMessageBuilders | sink | CWE-470 | 4 | Argument 0 (OMElement containing message builder XML config) provides class names that are dynamically loaded and instantiated via Class.newInstance() in the delegate method, enabling unsafe reflection. |
| org.apache.axis2.deployment | AxisConfigBuilder | processMessageFormatters | sink | CWE-470 | 4 | Argument 0 (OMElement containing message formatter XML config) provides class names that are dynamically loaded and instantiated via Class.newInstance() in the delegate method, enabling unsafe reflection. |
| org.apache.axis2.deployment | Deployer | undeploy | sink | CWE-22 | 3 | Argument 0 (fileName) specifies the file name/path to remove from the configuration. An attacker-controlled fileName could lead to path traversal, allowing removal of unintended files. |
| org.apache.axis2.deployment | Deployer | deploy | sink | CWE-22 | 3 | Argument 0 (deploymentFileData) contains file path and data information used to process and deploy a file into the configuration. Attacker-controlled deployment file data could lead to path traversal during file processing. |
| org.apache.axis2.deployment | DeploymentClassLoader | loadClass | sink | CWE-470 | 5 | Argument 0 (name) specifies the fully qualified class name to load dynamically. Callees confirm delegation to ClassLoader.loadClass and URLClassLoader.findClass. If attacker-controlled, this allows arbitrary class loading (unsafe reflection). |
| org.apache.axis2.deployment | DeploymentClassLoader | getResourceAsStream | sink | CWE-22 | 4 | Argument 0 (name) specifies the resource path to access. Callees show it resolves via getResource/findResource and opens with URL.openStream(). An attacker-controlled name could traverse paths to access unintended resources. |
| org.apache.axis2.deployment | DeploymentEngine | getFileList | source | remote | 5 | The method opens a URL stream (URL.openStream()), reads lines from it via BufferedReader.readLine(), and returns them as an ArrayList. The return value contains data read from an external URL. |
| org.apache.axis2.deployment | DeploymentEngine | getFileList | sink | CWE-918 | 5 | Argument 0 (fileListUrl) is a URL used to open a stream (URL.openStream()), allowing an attacker to control the server-side request destination, leading to SSRF. |
| org.apache.axis2.deployment | DeploymentEngine | loadRepositoryFromURL | sink | CWE-918 | 4 | Argument 0 (repoURL) is used to load modules and services from a remote URL. The method delegates to getFileList which calls URL.openStream(), and to addURLToDeploy which fetches deployable artifacts from the URL. |
| org.apache.axis2.deployment | DeploymentEngine | loadServicesFromUrl | sink | CWE-918 | 4 | Argument 0 (repoURL) is used to load services from a remote URL. The method delegates to getFileList which calls URL.openStream(), and to addURLToDeploy which fetches deployable artifacts from the URL. |
| org.apache.axis2.deployment | DeploymentEngine | buildService | sink | CWE-611 | 4 | Argument 0 (serviceInputStream) is an InputStream that is parsed as XML via DescriptionBuilder.buildOM(). If the XML parser is not configured to disable external entities, this can lead to XXE attacks. |
| org.apache.axis2.deployment | DeploymentEngine | buildServiceGroup | sink | CWE-611 | 4 | Argument 0 (servicesxml) is an InputStream that is parsed as XML via ArchiveReader.buildServiceGroup(). If the XML parser is not configured to disable external entities, this can lead to XXE attacks. |
| org.apache.axis2.deployment | DeploymentEngine | populateAxisConfiguration | sink | CWE-611 | 4 | Argument 0 (in) is an InputStream parsed as axis2.xml via AxisConfigBuilder.populateConfig(). If the XML parser is not configured to disable external entities, this can lead to XXE attacks. |
| org.apache.axis2.deployment | DeploymentEngine | loadRepository | sink | CWE-22 | 4 | Argument 0 (repoDir) is used as a file system directory path. The method accesses the filesystem, creates File objects, sets up classloaders, and loads modules/services from the directory without path validation. |
| org.apache.axis2.deployment | DeploymentEngine | loadServiceGroup | sink | CWE-22 | 4 | Argument 0 (serviceFile) is a File used to load service groups, including creating class loaders and reading service configuration from the archive file. |
| org.apache.axis2.deployment | DeploymentEngine | buildModule | sink | CWE-22 | 4 | Argument 0 (modulearchive) is a File used to read module archives and set up class loaders, accessing the filesystem based on the File path. |
| org.apache.axis2.deployment | DeploymentEngine | prepareRepository | sink | CWE-22 | 4 | Argument 0 (repositoryName) is used as a directory path to create filesystem directories for modules and services. |
| org.apache.axis2.deployment | DeploymentEngine | setClassLoaders | sink | CWE-22 | 4 | Argument 0 (axis2repoURI) is used as a filesystem path to set up classloader hierarchy, accessing the filesystem to load classes from the specified repository location. |
| org.apache.axis2.deployment | DescriptionBuilder | findAndValidateSelectorClass | sink | CWE-470 | 5 | Argument 0 (className) is used to dynamically load a class via AccessController.doPrivileged, enabling unsafe reflection/class instantiation. |
| org.apache.axis2.deployment | DescriptionBuilder | processMessageBuilders | sink | CWE-470 | 5 | Argument 0 (messageBuildersElement) contains XML elements from which class names are extracted and dynamically loaded via findAndValidateSelectorClass and Class.newInstance(). |
| org.apache.axis2.deployment | DescriptionBuilder | processMessageFormatters | sink | CWE-470 | 5 | Argument 0 (messageFormattersElement) contains XML elements from which class names are extracted and dynamically loaded via findAndValidateSelectorClass and Class.newInstance(). |
| org.apache.axis2.deployment | DescriptionBuilder | loadMessageReceiver | sink | CWE-470 | 5 | Argument 1 (element) is an OMElement from which a class name is extracted via getAttribute and passed to Loader.loadClass followed by Class.newInstance(), enabling dynamic class loading. |
| org.apache.axis2.deployment | DescriptionBuilder | processMessageReceivers | sink | CWE-470 | 5 | Argument 0 (messageReceivers) is an OMElement from which class names are extracted and dynamically loaded via AccessController.doPrivileged. |
| org.apache.axis2.deployment | DescriptionBuilder | processMessageReceivers | sink | CWE-470 | 5 | Argument 1 (element) is an OMElement from which class names are extracted and dynamically loaded via delegation to loadMessageReceiver. |
| org.apache.axis2.deployment | DescriptionBuilder | buildOM | sink | CWE-611 | 4 | The object state (this) contains an InputStream set during construction. buildOM() parses this InputStream as XML via XMLUtils.toOM(InputStream), which may be vulnerable to XXE if the parser is not properly configured. |
| org.apache.axis2.deployment | FileSystemConfigurator | getAxisConfiguration | sink | CWE-22 | 4 | The method uses stored file paths (repoLocation, axis2xml) from object state to load files from the filesystem via DeploymentEngine.loadRepository(String) and Loader.getResourceAsStream(String). An attacker-controlled path could lead to path traversal. |
| org.apache.axis2.deployment | FileSystemConfigurator | getAxisConfiguration | sink | CWE-611 | 4 | The method parses an XML configuration file (axis2xml path stored in this) via DeploymentEngine.populateAxisConfiguration(InputStream). If the XML file is attacker-controlled, this could lead to XML External Entity (XXE) attacks. |
| org.apache.axis2.deployment | ModuleBuilder | ModuleBuilder | sink | CWE-611 | 3 | Argument 0 (serviceInputStream) is an InputStream containing XML data that is parsed into an AXIOM object model (OM) to build a module description. If the XML parser is not securely configured, this can lead to XML External Entity (XXE) attacks. |
| org.apache.axis2.deployment | ModuleDeployer | deploy | sink | CWE-22 | 4 | Argument 0 (DeploymentFileData) contains file paths used to access the filesystem: the method calls File.isDirectory(), DeploymentFileData.getFile(), readModuleArchive(), and setClassLoader() with a File argument, all derived from the deployment file data's path, enabling path traversal attacks. |
| org.apache.axis2.deployment | ModuleDeployer | deoloyFromUrl | sink | CWE-918 | 4 | Argument 0 (DeploymentFileData) contains a URL (obtained via getUrl()) that is used to fetch remote resources and create a class loader via Utils.createClassLoader. An attacker controlling this URL could cause the server to make requests to arbitrary destinations (SSRF). |
| org.apache.axis2.deployment | POJODeployer | deploy | sink | CWE-94 | 4 | Argument 0 (DeploymentFileData) provides the file from which classes are dynamically loaded and executed. The method creates ClassLoaders from the file's URL (Utils.createClassLoader), loads classes (Utils.getListOfClasses), and deploys them as services. If the deployment file data is attacker-controlled, arbitrary code can be loaded and executed. |
| org.apache.axis2.deployment | RepositoryListener | findServicesInDirectory | sink | CWE-22 | 4 | Argument 0 (root) specifies a directory that the method directly accesses via File.exists(), File.listFiles(), File.isDirectory(), and recursively traverses. If attacker-controlled, the root path could point to arbitrary directories on the filesystem, enabling path traversal. |
| org.apache.axis2.deployment | ServiceBuilder | populateService | sink | CWE-470 | 4 | Argument 0 (service_element) is an OMElement containing XML service configuration. The method extracts class names from this element and dynamically loads/instantiates them via loadServiceLifeCycleClass, loadObjectSupplierClass, and processMessageReceivers. If the XML content is attacker-controlled, arbitrary classes can be instantiated. |
| org.apache.axis2.deployment | ServiceDeployer | deploy | sink | CWE-22 | 4 | Argument 0 (DeploymentFileData) contains file path information used to access the filesystem: the method calls File.isDirectory(), File.toURI(), buildServiceDescription(), and setClassLoader(), all based on paths from this argument. User-controlled paths could lead to path traversal. |
| org.apache.axis2.deployment | ServiceDeployer | deploy | sink | CWE-918 | 4 | Argument 0 (DeploymentFileData) can contain a URL used to fetch remote service content. The method delegates to deployFromUrl which makes outbound requests to the URL, potentially enabling SSRF. |
| org.apache.axis2.deployment | ServiceDeployer | deployFromUrl | sink | CWE-918 | 4 | Argument 0 (DeploymentFileData) contains a URL extracted via getUrl() and used to fetch remote service definitions via populateService, enabling SSRF if the URL is user-controlled. |
| org.apache.axis2.deployment | ServiceDeployer | deployFromUrl | sink | CWE-918 | 3 | Argument 1 (servicesURL) specifies a remote URL from which to deploy a service. If user-controlled, this enables SSRF attacks. |
| org.apache.axis2.deployment | ServiceDeployer | populateService | sink | CWE-918 | 3 | Argument 1 (servicesURL) specifies the URL from which service content is fetched and populated. If user-controlled, this enables SSRF attacks. |
| org.apache.axis2.deployment | TransportDeployer | deploy | sink | CWE-22 | 4 | Argument 0 (DeploymentFileData) provides a file path that is used to access the filesystem. The method calls getFile(), File.isDirectory(), getResourceAsStream(), and setClassLoader() based on the file path contained in the DeploymentFileData object. An attacker who controls this file path could traverse the filesystem to deploy arbitrary files. |
| org.apache.axis2.deployment | URLBasedAxisConfigurator | getAxisConfiguration | sink | CWE-918 | 4 | The method uses URLs stored in the object state (set via constructor) to make network requests via URL.openStream() and loadRepositoryFromURL(). If the URLs are attacker-controlled, this enables Server-Side Request Forgery. |
| org.apache.axis2.deployment | WSDLServiceBuilderExtension | buildAxisServices | sink | CWE-611 | 4 | Argument 0 (DeploymentFileData) is passed to ArchiveReader.processWSDLs() which parses WSDL (XML) documents. If the WSDL content is untrusted, this can lead to XML External Entity (XXE) attacks during XML parsing. |
| org.apache.axis2.deployment.util | ExcludeInfo | getBeanExcludeInfoForClass | sink | CWE-1333 | 4 | The method iterates over stored regex patterns (from the object's map keys, set via putBeanInfo) and uses them in String.matches(), which compiles and executes the regex. If an attacker can influence the regex patterns stored in this object, they can craft a malicious regex causing ReDoS. The object state (`this`) contains the regex patterns used in the matching operation. |
| org.apache.axis2.deployment.util | PhasesInfo | makePhase | sink | CWE-470 | 4 | Argument 0 (phaseElement) is an XML element from which attribute values (e.g., handler class names) are extracted via getAttributeValue and used to create handlers via makeHandler. This follows the Axis2 pattern of instantiating handler classes from deployment XML, which constitutes unsafe reflection if the XML content is attacker-controlled. |
| org.apache.axis2.deployment.util | TempFileManager | createTempFile | sink | CWE-22 | 4 | Arguments 0 (prefix) and 1 (suffix) are passed directly to File.createTempFile(String, String, File) to construct the filename of a newly created temporary file. If an attacker controls these values and includes path separator characters or '..' sequences, files could be created outside the intended temporary directory, leading to path traversal. |
| org.apache.axis2.deployment.util | Utils | loadHandler | sink | CWE-470 | 4 | Argument 1 (desc) provides the class name via getClassName(), which is used with Loader.loadClass() to dynamically load and instantiate an arbitrary class. If the class name is attacker-controlled, this allows unsafe reflection / arbitrary class instantiation. |
| org.apache.axis2.deployment.util | Utils | addFlowHandlers | sink | CWE-470 | 4 | Argument 0 (flow) provides handler descriptions whose class names are used to dynamically load and instantiate handler classes via getHandlerClass(). If the flow configuration contains attacker-controlled class names, arbitrary classes could be instantiated. |
| org.apache.axis2.deployment.util | Utils | getClassLoader | sink | CWE-22 | 4 | Argument 1 (file) specifies a filesystem directory from which classes and JARs are loaded. If this path is attacker-controlled, it allows path traversal to load code from arbitrary directories. |
| org.apache.axis2.deployment.util | Utils | getClassLoader | sink | CWE-22 | 4 | Argument 1 (path) is a string path that is used to access a filesystem directory to load classes and JARs. Delegates to the File-based overload. If attacker-controlled, allows path traversal. |
| org.apache.axis2.deployment.util | Utils | createClassLoader | sink | CWE-22 | 4 | Argument 0 (serviceFile) specifies a filesystem location from which classes are loaded. If attacker-controlled, allows path traversal to load from arbitrary directories. |
| org.apache.axis2.deployment.util | Utils | createClassLoader | sink | CWE-918 | 4 | Argument 0 (archiveUrl) is used to open a stream via getURLsForAllJars, which calls url.openStream(). If attacker-controlled, this enables SSRF by making requests to arbitrary URLs. |
| org.apache.axis2.deployment.util | Utils | getURLsForAllJars | sink | CWE-918 | 4 | Argument 0 (url) is opened with url.openStream() to read its content as a ZIP stream. If the URL is attacker-controlled, this enables SSRF attacks. |
| org.apache.axis2.deployment.util | Utils | createTempFile | sink | CWE-22 | 4 | Argument 0 (suffix) is used in temp file name creation via TempFileManager.createTempFile(). If the suffix contains path separators or traversal characters, it could cause file creation outside the intended temp directory. |
| org.apache.axis2.addressing | EndpointReference | readExternal | sink | CWE-502 | 5 | Argument 0 (ObjectInput inObject) is deserialized using SafeObjectInputStream.readObject(), readUTF(), readInt(), and OMXMLBuilderFactory.createOMBuilder(InputStream). Deserializing untrusted data from this stream can lead to arbitrary code execution. |
| org.apache.axis2.addressing | EndpointReferenceHelper | fromString | sink | CWE-611 | 4 | Argument 0 (eprString) is parsed as XML via AXIOMUtil.stringToOM(String), which performs XML parsing. If the underlying parser is not configured to disable external entities/DTDs, this can lead to XXE attacks. |
| org.apache.axis2.addressing | RelatesTo | readExternal | sink | CWE-502 | 5 | Argument 0 is an ObjectInput stream from which the method deserializes objects via SafeObjectInputStream.readObject(). Deserializing untrusted data can lead to arbitrary code execution. |
| org.apache.axis2.description.java2wsdl | DefaultSchemaGenerator | DefaultSchemaGenerator | sink | CWE-470 | 5 | Argument 1 (className) is passed to Class.forName(String, boolean, ClassLoader) to dynamically load a class. If className is externally controlled, this allows arbitrary class loading (CWE-470). |
| org.apache.axis2.description.java2wsdl | DefaultSchemaGenerator | generateSchema | sink | CWE-470 | 4 | The method loads classes via Class.forName using extra class names stored in the object state (set via setExtraClasses). If those class names are externally controlled, this enables arbitrary class loading (CWE-470). |
| org.apache.axis2.description.java2wsdl | DefaultSchemaGenerator | generateSchema | sink | CWE-22 | 3 | The method calls loadCustomSchemaFile() and loadMappingFile(), which likely read files from paths stored in object state (customSchemaLocation and mappingFileLocation set via setters). If those paths are externally controlled, this enables path traversal (CWE-22). |
| org.apache.axis2.description.java2wsdl | DocLitBareSchemaGenerator | DocLitBareSchemaGenerator | sink | CWE-470 | 5 | The constructor passes the `className` argument (arg 1) to the parent class DefaultSchemaGenerator, which calls `Class.forName(className, ..., loader)` to dynamically load the class. If `className` is attacker-controlled, arbitrary classes can be loaded, constituting an unsafe reflection vulnerability. |
| org.apache.axis2.description.java2wsdl | Java2WSDLUtils | getPackageName | sink | CWE-470 | 5 | Argument 0 (className) is passed to Class.forName(String, boolean, ClassLoader) for dynamic class loading, which can allow an attacker to load arbitrary classes. |
| org.apache.axis2.description.java2wsdl | Java2WSDLUtils | namespaceFromClassName | sink | CWE-470 | 5 | Argument 0 (className) is passed to Class.forName(String, boolean, ClassLoader) for dynamic class loading, which can allow an attacker to load arbitrary classes. |
| org.apache.axis2.description.java2wsdl | Java2WSDLUtils | namespaceFromClassName | sink | CWE-470 | 5 | Argument 0 (className) is delegated to the 3-arg namespaceFromClassName which calls Class.forName for dynamic class loading. |
| org.apache.axis2.description.java2wsdl | Java2WSDLUtils | targetNamespaceFromClassName | sink | CWE-470 | 4 | Argument 0 (packageName) is passed to namespaceFromClassName which calls Class.forName for dynamic class loading. |
| org.apache.axis2.description.java2wsdl | Java2WSDLUtils | schemaNamespaceFromClassName | sink | CWE-470 | 4 | Argument 0 (packageName) is delegated through to namespaceFromClassName which calls Class.forName for dynamic class loading. |
| org.apache.axis2.description.java2wsdl | Java2WSDLUtils | schemaNamespaceFromClassName | sink | CWE-470 | 4 | Argument 0 (packageName) is delegated to namespaceFromClassName which calls Class.forName for dynamic class loading. |
| org.apache.axis2.description | AxisService | loadDataLocator | sink | CWE-470 | 5 | Argument 0 (className) is used in Class.forName() followed by newInstance(), allowing arbitrary class instantiation via unsafe reflection. |
| org.apache.axis2.description | AxisService | createService | sink | CWE-470 | 5 | Argument 0 (implClass) is used in Loader.loadClass() to dynamically load and instantiate a class, allowing unsafe reflection. |
| org.apache.axis2.description | AxisService | createService | sink | CWE-470 | 4 | Argument 0 (implClass) is used to create a SchemaGenerator which loads the class, enabling unsafe reflection via dynamic class instantiation. |
| org.apache.axis2.description | AxisService | createClientSideAxisService | sink | CWE-918 | 5 | Argument 0 (wsdlURL) is used to open a network connection via URL.openConnection() and URLConnection.getInputStream(), enabling server-side request forgery. |
| org.apache.axis2.description | AxisService | createClientSideAxisService | sink | CWE-611 | 4 | Argument 0 (wsdlURL) provides XML content that is fetched and parsed via XMLUtils.newDocument and WSDLReader.readWSDL. Attacker-controlled XML parsed with a potentially misconfigured parser could lead to XXE. |
| org.apache.axis2.description | Flow | getHandler | sink | CWE-129 | 4 | Argument 0 (index) is used directly in a List.get(int) call without validation, which can lead to an IndexOutOfBoundsException if the index is attacker-controlled. |
| org.apache.axis2.description | Parameter | readExternal | sink | CWE-502 | 5 | The method readExternal deserializes objects from the provided ObjectInput stream. It delegates to SafeObjectInputStream.readObject(), which performs Java deserialization. If the input stream contains untrusted data, this can lead to arbitrary code execution via deserialization gadget chains (CWE-502). Argument 0 (inObject) is the sink. |
| org.apache.axis2.description | ParameterIncludeImpl | readExternal | sink | CWE-502 | 4 | readExternal deserializes data from the ObjectInput parameter (arg 0), using SafeObjectInputStream to call readMap, readInt, readLong. This is a deserialization sink where untrusted data from the stream is used to reconstruct object state. |
| org.apache.axis2.description | WSDL11ToAllAxisServicesBuilder | WSDL11ToAllAxisServicesBuilder | sink | CWE-611 | 4 | Argument 0 is an InputStream containing WSDL XML data. This data is stored and later parsed as XML when populateAllServices() or populateService() is called, which can lead to XXE if the XML parser is not securely configured. |
| org.apache.axis2.description | WSDL11ToAllAxisServicesBuilder | populateAllServices | sink | CWE-611 | 4 | The object state (this) contains WSDL XML data (stored via the constructor). This method triggers XML parsing via setup() and populateService(), which can lead to XXE vulnerabilities if the underlying XML parser is not securely configured. |
| org.apache.axis2.description | WSDL11ToAxisServiceBuilder | populateService | sink | CWE-611 | 4 | populateService() parses WSDL XML content from the InputStream stored in `this` (set by constructors). It calls setup(), processTypes(), getXMLSchema(), and generateWrapperSchema(), all of which are XML processing operations. Parsing untrusted WSDL/XML without secure parser configuration can lead to XXE attacks. |
| org.apache.axis2.description | WSDL11ToAxisServiceBuilder | populateService | sink | CWE-918 | 3 | populateService() processes WSDL which may contain import/include statements with external URLs. The WSDL content and document base URI are stored in `this`. During WSDL parsing and import resolution (via setup(), getParentDefinition(), Definition.getImports()), the server may fetch attacker-controlled URLs, leading to SSRF. |
| org.apache.axis2.description | WSDLToAxisServiceBuilder | getXMLSchema | sink | CWE-918 | 4 | Argument 1 (baseUri) is passed to XmlSchemaCollection.setBaseUri() which controls where XML schemas are resolved from when XmlSchemaCollection.read() is subsequently called. An attacker-controlled baseUri could lead to server-side request forgery by causing the server to make requests to arbitrary URLs during schema resolution. |
| org.apache.axis2.kernel | OutTransportInfo | setContentType | sink | CWE-113 | 4 | Argument 0 (contentType) is used to set the Content-Type HTTP response header. If user-controlled data containing CRLF characters is passed, it can lead to HTTP response splitting. |
| org.apache.axis2.kernel | SimpleAxis2Server | main | source | commandargs | 5 | The args parameter (index 0) receives command-line arguments, which are an external source of data entering the program. Callees confirm these are parsed via CommandLineOptionParser. |
| org.apache.axis2.kernel | SimpleAxis2Server | SimpleAxis2Server | sink | CWE-22 | 4 | Arguments 0 (repoLocation) and 1 (confLocation) are file system paths passed directly to ConfigurationContextFactory.createConfigurationContextFromFileSystem, which accesses files/directories at the specified paths. Untrusted input could lead to path traversal. |
| org.apache.axis2.kernel | TransportUtils | createDocumentElement | sink | CWE-611 | 5 | Argument 2 (inStream) is parsed as XML via Builder.processDocument, which can lead to XXE if the parser is not configured to disable external entity processing. |
| org.apache.axis2.kernel | TransportUtils | createDocumentElement | sink | CWE-611 | 5 | Argument 3 (inStream) is parsed as XML via Builder.processDocument, which can lead to XXE if the parser is not configured to disable external entity processing. |
| org.apache.axis2.kernel | TransportUtils | createSOAPMessage | sink | CWE-611 | 5 | Argument 1 (inStream) flows into createDocumentElement which parses XML via Builder.processDocument, potentially leading to XXE. |
| org.apache.axis2.kernel | TransportUtils | createSOAPMessage | sink | CWE-611 | 5 | Argument 1 (inStream) flows into createDocumentElement which parses XML via Builder.processDocument, potentially leading to XXE. |
| org.apache.axis2.kernel | TransportUtils | createSOAPMessage | sink | CWE-611 | 4 | Argument 0 (msgContext) carries an InputStream that is extracted and parsed as XML, which may lead to XXE. |
| org.apache.axis2.kernel | TransportUtils | createSOAPMessage | sink | CWE-611 | 4 | Argument 0 (msgContext) carries an InputStream that is extracted and parsed as XML, potentially leading to XXE. |
| org.apache.axis2.util | CallbackReceiver | receive | source | remote | 4 | The receive() method is a framework callback (MessageReceiver) invoked by the Axis2 framework when a SOAP response message arrives from a remote service. The msgContext parameter (arg 0) carries data received from the network, including remote message contents, headers, etc. Callees confirm it dispatches to onMessage/onFault with this context. |
| org.apache.axis2.util | FileWriter | createClassFile | sink | CWE-22 | 4 | Arguments 0-3 are used to construct a filesystem path: rootLocation is the base, packageName is split to create subdirectories, fileName and extension form the file name. No path traversal validation is performed, allowing an attacker to escape the intended directory via '..' sequences in packageName, fileName, or extension. |
| org.apache.axis2.util | Loader | loadClass | sink | CWE-470 | 5 | Argument 0 specifies the class name to load dynamically via Class.forName() and ClassLoader.loadClass(). If attacker-controlled, this allows instantiation of arbitrary classes, leading to unsafe reflection. |
| org.apache.axis2.util | Loader | loadClass | sink | CWE-470 | 5 | Argument 1 (clazz) specifies the class name to load dynamically via ClassLoader.loadClass() and falls back to the other loadClass(String) which uses Class.forName(). If attacker-controlled, this allows instantiation of arbitrary classes, leading to unsafe reflection. |
| org.apache.axis2.util | LogWriter | write | sink | CWE-117 | 4 | Argument 0 (cbuf) is character data that gets appended to a buffer and then written to a Log instance via flushLineBuffer(). Unsanitized user input passed here can result in log injection (forged log entries via newline characters, etc.). |
| org.apache.axis2.util | MessageContextBuilder | createFaultMessageContext | sink | CWE-209 | 4 | Argument 1 (Throwable e) is used to create a SOAP fault envelope via createFaultEnvelope(), which packages the exception's error message and stack trace information into a response message context that will be sent back to the client. This exposes potentially sensitive error details (server paths, SQL queries, internal class names) to remote users. |
| org.apache.axis2.util | MetaDataEntry | readExternal | sink | CWE-502 | 5 | The readExternal method deserializes objects from the ObjectInput stream (argument 0). Callees show it calls SafeObjectInputStream.readObject() and readArrayList(), which perform object deserialization. If the ObjectInput contains untrusted data, this can lead to arbitrary code execution via deserialization attacks. |
| org.apache.axis2.util | ObjectStateUtils | readObject | sink | CWE-502 | 5 | Argument 0 (ObjectInput) is deserialized via SafeObjectInputStream.readObject(), which performs Java object deserialization. Untrusted data in the stream could lead to arbitrary code execution. |
| org.apache.axis2.util | ObjectStateUtils | readLinkedList | sink | CWE-502 | 5 | Argument 0 (ObjectInput) is deserialized via SafeObjectInputStream.readLinkedList(). Untrusted data in the stream could lead to deserialization attacks. |
| org.apache.axis2.util | ObjectStateUtils | readArrayList | sink | CWE-502 | 5 | Argument 0 (ObjectInput) is deserialized via SafeObjectInputStream.readArrayList(). Untrusted data in the stream could lead to deserialization attacks. |
| org.apache.axis2.util | ObjectStateUtils | readHashMap | sink | CWE-502 | 5 | Argument 0 (ObjectInput) is deserialized via SafeObjectInputStream.readHashMap(). Untrusted data in the stream could lead to deserialization attacks. |
| org.apache.axis2.util | ObjectStateUtils | readString | sink | CWE-502 | 5 | Argument 0 (ObjectInput) is deserialized via SafeObjectInputStream.readObject(). Even though the method returns a String, the underlying deserialization uses readObject() which can trigger gadget chains on untrusted data. |
| org.apache.axis2.util | OnDemandLogger | info | sink | CWE-117 | 5 | Argument 0 is the log message passed to org.apache.commons.logging.Log.info(). Unsanitized user input in this argument can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | info | sink | CWE-117 | 5 | Argument 0 is the log message passed to org.apache.commons.logging.Log.info(). Unsanitized user input in this argument can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | debug | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | debug | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | error | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | error | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | trace | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | trace | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | warn | sink | CWE-117 | 5 | Argument 0 is the log message passed to org.apache.commons.logging.Log.warn(). Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | warn | sink | CWE-117 | 5 | Argument 0 is the log message passed to org.apache.commons.logging.Log.warn(). Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | fatal | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OnDemandLogger | fatal | sink | CWE-117 | 4 | Argument 0 is the log message passed to the underlying logging framework. Unsanitized user input can lead to log injection. |
| org.apache.axis2.util | OptionsParser | getUser | source | commandargs | 4 | Returns the user value parsed from command-line arguments. Delegates to isValueSet() which extracts the value from the stored args array. |
| org.apache.axis2.util | OptionsParser | getPassword | source | commandargs | 4 | Returns the password value parsed from command-line arguments. Delegates to isValueSet() which extracts the value from the stored args array. |
| org.apache.axis2.util | OptionsParser | isValueSet | source | commandargs | 4 | Returns the value of a command-line option flag, parsed from the stored args array. This is direct access to command-line argument data. |
| org.apache.axis2.util | OptionsParser | getRemainingArgs | source | commandargs | 4 | Returns an array of non-option arguments from the command line. These are unused args that were not parsed as flags. |
| org.apache.axis2.util | OptionsParser | getRemainingFlags | source | commandargs | 3 | Returns a string of unprocessed command-line flags. These are derived from command-line arguments passed to the constructor. |
| org.apache.axis2.util | SecureWSDLLocator | getImportInputSource | sink | CWE-918 | 4 | Arguments 0 (parentLocation) and 1 (importLocation) are used together to resolve a URI via resolveURI(), then the resolved URI is fetched via createSecureInputSource(). An attacker controlling importLocation (e.g., from a malicious WSDL import statement) can cause the server to make requests to arbitrary URLs, enabling SSRF. |
| org.apache.axis2.util | SecureWSDLLocator | getBaseInputSource | sink | CWE-918 | 4 | The method uses the base URI stored in the object state (this) to fetch content via createSecureInputSource(). If the base URI was set from user-controlled input in the constructor, this enables SSRF by making the server fetch content from an attacker-controlled URL. |
| org.apache.axis2.util | Utils | getNewConfigurationContext | sink | CWE-22 | 4 | Argument 0 (repositry) is used as a filesystem path. Callees show it creates a File from this string and passes it to ConfigurationContextFactory.createConfigurationContextFromFileSystem, which reads from the filesystem based on this path. An attacker-controlled path could lead to path traversal. |
| org.apache.axis2.util | Utils | createServiceObject | sink | CWE-470 | 4 | Argument 0 (service) carries configuration parameters (SERVICE_CLASS, SERVICE_OBJECT_SUPPLIER) that are used to reflectively load and instantiate classes via Loader.loadClass. If the service configuration contains attacker-controlled class names, this allows instantiation of arbitrary classes. |
| org.apache.axis2.util | Utils | getServiceClass | sink | CWE-470 | 3 | Argument 0 (service) carries configuration parameters (SERVICE_CLASS, SERVICE_OBJECT_SUPPLIER) used to reflectively load a class. Documentation states the method loads a class based on these parameters, which could allow loading of arbitrary classes if configuration is attacker-controlled. |
| org.apache.axis2.util | WrappedDataHandler | getBean | sink | CWE-470 | 4 | Argument 0 (cmdinfo) determines which class is instantiated by DataHandler.getBean(). If the CommandInfo is attacker-controlled, arbitrary classes could be instantiated, leading to unsafe reflection. |
| org.apache.axis2.util | XMLPrettyPrinter | prettify | sink | CWE-22 | 4 | Argument 0 (File) is used to construct FileInputStream and FileOutputStream, reading from and writing to the specified file path. If the file path is attacker-controlled, this enables path traversal to read or modify arbitrary files. |
| org.apache.axis2.util | XMLPrettyPrinter | prettify | sink | CWE-611 | 4 | Argument 0 (File) provides XML content that is read and processed through javax.xml.transform.TransformerFactory without apparent DTD disabling configuration. This can lead to XXE attacks if the file contains malicious XML with external entity references. |
| org.apache.axis2.util | XMLUtils | newDocument | sink | CWE-918 | 4 | Argument 0 (uri) is used to fetch a remote or local resource and the fetched content is then parsed as XML via DocumentBuilder.parse(). This makes arg 0 a sink for SSRF. |
| org.apache.axis2.util | XMLUtils | newDocument | sink | CWE-611 | 4 | Argument 0 (uri) determines the XML content parsed by DocumentBuilder. If the URI is attacker-controlled, malicious XML with external entities could be parsed, leading to XXE. |
| org.apache.axis2.util | XMLUtils | newDocument | sink | CWE-611 | 5 | Argument 0 (inp) is an InputStream whose content is parsed as XML via DocumentBuilder.parse(). If the InputStream contains untrusted XML, this is an XXE sink. |
| org.apache.axis2.util | XMLUtils | newDocument | sink | CWE-918 | 4 | Argument 0 (uri) is used to fetch a remote or local resource. If the URI is attacker-controlled, this enables SSRF attacks. |
| org.apache.axis2.util | XMLUtils | newDocument | sink | CWE-611 | 4 | Argument 0 (uri) determines the XML content that gets parsed by DocumentBuilder. If the URI is attacker-controlled, malicious XML could exploit XXE. |
| org.apache.axis2.util | XMLUtils | newDocument | sink | CWE-611 | 5 | Argument 0 (inp) is the InputSource parsed by DocumentBuilder.parse(InputSource). If it contains untrusted XML, this is an XXE sink. |
| org.apache.axis2.util | XMLUtils | toOM | sink | CWE-611 | 4 | Argument 0 (inputStream) is parsed as XML via OMXMLBuilderFactory.createOMBuilder(). If the InputStream contains untrusted XML, XXE attacks are possible. |
| org.apache.axis2.util | XMLUtils | toOM | sink | CWE-611 | 4 | Argument 0 (reader) is parsed as XML via OMXMLBuilderFactory.createOMBuilder(). If the Reader provides untrusted XML, XXE attacks are possible. |
| org.apache.axis2.util | XMLUtils | toOM | sink | CWE-611 | 5 | Argument 0 (reader) is parsed as XML via OMXMLBuilderFactory.createOMBuilder(Reader). If the Reader provides untrusted XML, XXE attacks are possible. |
| org.apache.axis2.util | XMLUtils | toOM | sink | CWE-611 | 5 | Argument 0 (inputStream) is parsed as XML via OMXMLBuilderFactory.createOMBuilder(InputStream). If the InputStream contains untrusted XML, XXE attacks are possible. |
| org.apache.axis2.util | XMLUtils | getInputSourceFromURI | sink | CWE-918 | 4 | Argument 0 (uri) is used to fetch a resource. If the URI is attacker-controlled, this enables SSRF attacks. |
| org.apache.axis2.util | XMLUtils | initSAXFactory | sink | CWE-470 | 5 | Argument 0 (factoryClassName) is used to dynamically load and instantiate a class via Loader.loadClass() and Class.newInstance(). If attacker-controlled, this enables arbitrary class instantiation. |
| org.apache.axis2.builder | ApplicationXMLBuilder | processDocument | sink | CWE-611 | 4 | Argument 0 (inputStream) is parsed as XML via BuilderUtil.createPOXBuilder. If the XML parser is not configured to disable DTD/external entity processing, this leads to XXE vulnerabilities. |
| org.apache.axis2.builder | ApplicationXMLBuilder | processDocument | sink | CWE-776 | 4 | Argument 0 (inputStream) is parsed as XML via BuilderUtil.createPOXBuilder. Without proper entity expansion limits, this is vulnerable to XML entity expansion (billion laughs) denial-of-service attacks. |
| org.apache.axis2.builder | Builder | processDocument | sink | CWE-611 | 4 | Argument 0 is an InputStream containing raw XML/SOAP message data that will be parsed by an XML parser. If the parser is not safely configured, this can lead to XML External Entity (XXE) attacks. |
| org.apache.axis2.builder | Builder | processDocument | sink | CWE-776 | 4 | Argument 0 is an InputStream containing raw XML/SOAP message data that will be parsed. If the parser does not limit entity expansion, this can lead to XML Entity Expansion (billion laughs) denial-of-service attacks. |
| org.apache.axis2.builder | BuilderUtil | createSOAPModelBuilder | sink | CWE-611 | 4 | Argument 0 (InputStream) contains SOAP XML data that is parsed by delegating to OMXMLBuilderFactory.createSOAPModelBuilder. Unlike createPOXBuilder, the documentation does not mention any DTD/XXE protections, making the input stream a sink for XXE attacks. |
| org.apache.axis2.builder | DataSourceBuilder$ByteArrayDataSourceEx | getReader | sink | CWE-611 | 3 | The getReader() method creates an XMLStreamReader from the internal byte array stored in the object. XMLStreamReader is explicitly identified as a vulnerable XML parser type for XXE attacks. The byte array data (from `this`) is the untrusted input being parsed as XML. |
| org.apache.axis2.builder | DiskFileDataSource | getInputStream | source | remote | 3 | Returns an InputStream reading the content of an uploaded file (DiskFileItem). The file content originates from a remote HTTP multipart upload and is user-controlled. |
| org.apache.axis2.builder | DiskFileDataSource | getContentType | source | remote | 3 | Returns the content type of an uploaded file (DiskFileItem). The content type is user-controlled, supplied via the HTTP multipart Content-Type header. |
| org.apache.axis2.builder | DiskFileDataSource | getName | source | remote | 3 | Returns the name/filename of an uploaded file (DiskFileItem). The filename is user-controlled, originating from the HTTP multipart Content-Disposition header, and can lead to path traversal vulnerabilities if used unsanitized. |
| org.apache.axis2.builder | MIMEBuilder | processDocument | sink | CWE-611 | 4 | Argument 0 (inputStream) provides raw message content that is parsed as MIME and then delegated to XML/SOAP processing via Builder.processDocument or MIMEAwareBuilder.processMIMEMessage. This XML parsing can be vulnerable to XXE if the underlying parser is not securely configured. |
| org.apache.axis2.builder | MTOMBuilder | processDocument | sink | CWE-611 | 4 | Argument 0 (inputStream) contains raw XML/SOAP message content that is parsed by the method. Parsing untrusted XML without proper configuration can lead to XXE attacks. |
| org.apache.axis2.builder | MTOMBuilder | processMIMEMessage | sink | CWE-611 | 5 | Argument 0 (attachments) is a MIME message whose content is parsed as XML/SOAP via OMXMLBuilderFactory.createSOAPModelBuilder. Parsing untrusted XML without proper configuration can lead to XXE attacks. |
| org.apache.axis2.builder | MultipartFormDataBuilder | processDocument | source | remote | 4 | The method processes multipart form data from an HTTP request (via HttpServletRequest obtained from MessageContext) and returns an OMElement containing the parsed request data. The return value originates from remote user input. |
| org.apache.axis2.builder | SOAPBuilder | processDocument | sink | CWE-611 | 4 | Argument 0 (inputStream) is parsed as SOAP/XML via OMXMLBuilderFactory.createSOAPModelBuilder(InputStream, String). Parsing untrusted XML without disabling external entities can lead to XXE attacks. |
| org.apache.axis2.builder | SOAPBuilder | processMIMEMessage | sink | CWE-611 | 4 | Argument 0 (attachments) contains MIME data whose root part is extracted and parsed as SOAP/XML via delegation to processDocument. Parsing untrusted XML without disabling external entities can lead to XXE attacks. |
| org.apache.axis2.builder | XFormURLEncodedBuilder | processDocument | source | remote | 4 | This is an Apache Axis2 Builder interface method that processes HTTP request body data (x-www-form-urlencoded). It reads from the InputStream (HTTP request body), extracts form parameters, and returns an OMElement containing the parsed remote user input. |
| org.apache.axis2.engine | AxisConfiguration | deployModule | sink | CWE-22 | 4 | Argument 0 (moduleFileName) is used as a file path — the method calls File.exists() with this value and then DeploymentFileData.deploy() to deploy a module from the specified file. An attacker controlling this filename could traverse the filesystem to deploy modules from arbitrary locations. |
| org.apache.axis2.engine | AxisEngine | send | sink | CWE-918 | 4 | The send() method sends an outbound SOAP message over the network. It calls TransportOutDescription.getSender() and Handler.invoke(MessageContext) to transmit the message. The MessageContext (arg 0) determines the remote endpoint URL, making this a potential SSRF sink. |
| org.apache.axis2.engine | AxisEngine | sendFault | sink | CWE-918 | 4 | The sendFault() method sends a SOAP fault message to another SOAP node. It calls TransportOutDescription.getSender() and Handler.invoke(MessageContext) to transmit the message. The MessageContext (arg 0) determines the remote endpoint URL, making this a potential SSRF sink. |
| org.apache.axis2.engine | AxisEngine | resumeSend | sink | CWE-918 | 4 | The resumeSend() method resumes the send path and calls the TransportSender. It calls TransportOutDescription.getSender() and Handler.invoke(MessageContext). The MessageContext (arg 0) determines the remote endpoint URL, making this a potential SSRF sink. |
| org.apache.axis2.engine | AxisEngine | resumeSendFault | sink | CWE-918 | 4 | The resumeSendFault() method resumes the outbound fault flow. It calls TransportOutDescription.getSender() and Handler.invoke(MessageContext). The MessageContext (arg 0) determines the remote endpoint URL, making this a potential SSRF sink. |
| org.apache.axis2.engine | AxisEngine | resume | sink | CWE-918 | 4 | The resume() method delegates to resumeSend(MessageContext) when the flow is outbound, which sends a network request. The MessageContext (arg 0) determines the remote endpoint URL, making this a potential SSRF sink. |
| org.apache.axis2.engine | AxisServer | deployService | sink | CWE-470 | 5 | Argument 0 (serviceClassName) is a class name string passed to AxisService.createService which loads/instantiates the class reflectively. If attacker-controlled, this allows arbitrary class loading and instantiation (CWE-470: Use of Externally-Controlled Input to Select Classes or Code). |
| org.apache.axis2.engine | DefaultObjectSupplier | getObject | sink | CWE-470 | 5 | Argument 0 (clazz) is used to instantiate an object via Class.newInstance() and Constructor.newInstance(). If the Class parameter is derived from externally-controlled input, an attacker can instantiate arbitrary classes, leading to unsafe reflection / class instantiation vulnerabilities. |
| org.apache.axis2.engine | DependencyManager | makeNewServiceObject | sink | CWE-470 | 4 | Argument 0 (AxisService) contains configuration (e.g., class name) that is used by Utils.createServiceObject to instantiate a class. If the AxisService is influenced by attacker-controlled data, arbitrary classes could be instantiated. |
| org.apache.axis2.engine | DependencyManager | initService | sink | CWE-470 | 4 | Argument 0 (ServiceGroupContext) provides services whose class names are loaded via Loader.loadClass and instantiated via makeNewServiceObject. If the ServiceGroupContext is influenced by attacker-controlled data, arbitrary classes could be loaded and instantiated. |
| org.apache.axis2.engine | Handler | invoke | source | remote | 4 | The invoke() method is a framework callback in Apache Axis2's handler chain. It is called by the Axis2 engine when processing incoming SOAP/HTTP messages. The MessageContext parameter (argument 0) contains data from a remote client request, making it a source of remote data, analogous to HttpServlet.doGet/doPost. |
| org.apache.axis2.engine | Handler | flowComplete | source | remote | 4 | The flowComplete() method is a post-processing framework callback in Axis2's handler chain, called after message processing completes. The MessageContext parameter (argument 0) contains data from a remote client request, making it a source of remote data. |
| org.apache.axis2.engine | MessageReceiver | receive | source | remote | 4 | MessageReceiver.receive() is a framework callback invoked by the Apache Axis2 engine when a SOAP/HTTP message arrives over the network. The MessageContext parameter (argument 0) carries the incoming remote request data into the application, making it a source of remote data. |
| org.apache.axis2.engine | ObjectSupplier | getObject | sink | CWE-470 | 4 | Argument 0 (clazz) determines which class gets instantiated and returned. If attacker-controlled, this allows arbitrary class instantiation. The documentation confirms this method is used during deserialization to provide implementation classes for interfaces. |
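
The CWE-611 rows above flag methods that pass untrusted bytes to an XML parser. As a minimal sketch of the mitigation those rows allude to, and not Axis2's actual parser configuration, the following uses only the JDK's StAX API to build a factory with DTD processing and external entities disabled. The class and method names (`HardenedStax`, `newHardenedFactory`, `rootElementName`) are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Axis2: shows one way to harden a StAX parser.
public class HardenedStax {

    /** Builds a StAX factory with DTDs and external entities switched off. */
    public static XMLInputFactory newHardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }

    /** Returns the local name of the first element, or null if there is none. */
    public static String rootElementName(String xml) {
        try {
            XMLStreamReader reader =
                newHardenedFactory().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Malformed input or a construct the hardened parser refuses.
            throw new IllegalStateException("XML rejected or malformed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName("<Envelope><Body/></Envelope>")); // prints "Envelope"
    }
}
```

Whether a given Axis2 deployment needs this depends on how its underlying parser is already configured; the sketch only illustrates the kind of setting the "Reason" column refers to.
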
## Ignored (low certainty) (2)
| Package | Class | Method | Type | Kind | Certainty | Reason | Why Ignored |
|---------|-------|--------|------|------|-----------|--------|-------------|
| org.apache.axis2.dataretrieval.client | MexClient | MexClient | sink | CWE-918 | 2 | Argument 1 (wsdlURL) specifies a URL from which the MexClient (Metadata Exchange client) fetches a WSDL, potentially initiating a server-side HTTP request. This can lead to SSRF if the URL is attacker-controlled. | No callees were found so I cannot confirm the constructor fetches the WSDL at construction time, but this matches the well-known Axis2 ServiceClient(configContext, wsdlURL, serviceName, portName) pattern which fetches the WSDL during initialization. Certainty is 2 because there's no direct tool evidence confirming the network call. |
| org.apache.axis2.description.java2wsdl | SchemaGenerator | generateSchema | sink | CWE-22 | 2 | generateSchema() uses stored file paths (mappingFileLocation, customSchemaLocation) from object state to read files during schema generation, which could lead to path traversal if the paths are user-controlled. | Based on naming conventions (mappingFileLocation, customSchemaLocation stored in object state) and domain knowledge of Apache Axis2's java2wsdl schema generation. No callee evidence available to confirm file I/O operations. |
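
The CWE-22 entries in this report (for example `AxisConfiguration.deployModule` and the ignored `SchemaGenerator.generateSchema`) concern file names that may come from untrusted input. A minimal sketch of a containment check a caller could apply before passing such a name on is shown here. It uses only `java.nio.file`; the class `PathContainment`, the method `isInside`, and the `/srv/axis2/modules` directory are hypothetical and do not come from Axis2.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Axis2: lexical path-containment check.
public class PathContainment {

    /**
     * Returns true if untrustedName, resolved against baseDir, stays inside baseDir.
     * The check is purely lexical: it does not follow symbolic links.
     */
    public static boolean isInside(Path baseDir, String untrustedName) {
        Path base = baseDir.toAbsolutePath().normalize();
        // resolve() returns untrustedName unchanged if it is absolute,
        // so absolute paths outside base are rejected by startsWith().
        Path candidate = base.resolve(untrustedName).normalize();
        return candidate.startsWith(base);
    }

    public static void main(String[] args) {
        Path modules = Paths.get("/srv/axis2/modules"); // assumed directory
        System.out.println(isInside(modules, "addressing.mar"));   // prints "true"
        System.out.println(isInside(modules, "../../etc/passwd")); // prints "false"
    }
}
```

Because the check is lexical, a caller that must also defend against symbolic links would need an additional step such as `Path.toRealPath()` on an existing file.
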

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.transport.http.impl.httpclient5", "HTTPSenderImpl", True, "createRequest", "(MessageContext,String,URL,AxisRequestEntity)", "", "Argument[2]", "request-forgery", "ai-generated"]

@@ -0,0 +1,42 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.transport.http", "AbstractAgent", True, "renderView", "(String,HttpServletRequest,HttpServletResponse)", "", "Argument[0]", "path-injection", "ai-generated"]
      #- ["org.apache.axis2.transport.http", "AbstractAgent", True, "renderView", "(String,HttpServletRequest,HttpServletResponse)", "", "Argument[0]", "path-injection", "ai-generated"] # INVALID: Duplicate entry
      - ["org.apache.axis2.transport.http", "AbstractHTTPTransportSender", True, "invoke", "(MessageContext)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.transport.http", "HTTPSender", True, "send", "(MessageContext,URL,String)", "", "Argument[1]", "request-forgery", "ai-generated"]
      #- ["org.apache.axis2.transport.http", "Request", True, "addHeader", "(String,String)", "", "Argument[0..1]", "response-splitting", "ai-generated"] # INVALID: Outbound request header setter, not response splitting
      #- ["org.apache.axis2.transport.http", "Request", True, "setHeader", "(String,String)", "", "Argument[0..1]", "response-splitting", "ai-generated"] # INVALID: Outbound request header setter, not response splitting
      - ["org.apache.axis2.transport.http", "ServletBasedOutTransportInfo", True, "addHeader", "(String,String)", "", "Argument[0..1]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http", "ServletBasedOutTransportInfo", True, "setContentType", "(String)", "", "Argument[0]", "response-splitting", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.transport.http", "AxisServlet", True, "createMessageContext", "(HttpServletRequest,HttpServletResponse)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "AxisServlet", True, "createMessageContext", "(HttpServletRequest,HttpServletResponse,boolean)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "AxisServlet", True, "getTransportHeaders", "(HttpServletRequest)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "CommonsTransportHeaders", True, "entrySet", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "CommonsTransportHeaders", True, "get", "(Object)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "CommonsTransportHeaders", True, "keySet", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "CommonsTransportHeaders", True, "remove", "(Object)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "CommonsTransportHeaders", True, "values", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "HTTPWorker", True, "getHost", "(AxisHttpRequest)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "HTTPWorker", True, "service", "(AxisHttpRequest,AxisHttpResponse,MessageContext)", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "ListingAgent", True, "getParamtereIgnoreCase", "(HttpServletRequest,String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "Request", True, "getCookies", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "Request", True, "getResponseContent", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "Request", True, "getResponseContentEncoding", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "Request", True, "getResponseHeader", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "Request", True, "getResponseHeaders", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "Request", True, "getStatusText", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "SimpleHTTPServer", True, "main", "(String[])", "", "Argument[0]", "commandargs", "ai-generated"]
      - ["org.apache.axis2.transport.http", "TransportHeaders", True, "entrySet", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "TransportHeaders", True, "get", "(Object)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "TransportHeaders", True, "keySet", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "TransportHeaders", True, "remove", "(Object)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http", "TransportHeaders", True, "values", "()", "", "ReturnValue", "remote", "ai-generated"]

@@ -0,0 +1,43 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponse", True, "sendError", "(int,String)", "", "Argument[1]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponse", True, "setContentType", "(String)", "", "Argument[0]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponseImpl", True, "addHeader", "(Header)", "", "Argument[0]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponseImpl", True, "addHeader", "(String,Object)", "", "Argument[0..1]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponseImpl", True, "sendError", "(int,String)", "", "Argument[1]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponseImpl", True, "setHeader", "(Header)", "", "Argument[0]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponseImpl", True, "setHeader", "(String,Object)", "", "Argument[0..1]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpResponseImpl", True, "setHeaders", "(Header[])", "", "Argument[0]", "response-splitting", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "DefaultConnectionListenerFailureHandler", True, "failed", "(IOProcessor,Throwable)", "", "Argument[1]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "DefaultConnectionListenerFailureHandler", True, "notifyAbnormalTermination", "(IOProcessor,String,Throwable)", "", "Argument[1..2]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "ResponseSessionCookie", True, "process", "(HttpResponse,EntityDetails,HttpContext)", "", "Argument[2]", "response-splitting", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.transport.http.server", "AxisHttpConnection", True, "getInputStream", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpConnection", True, "receiveRequest", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpConnectionImpl", True, "receiveRequest", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequest", True, "getContentType", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequest", True, "getInputStream", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequest", True, "getMethod", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequest", True, "getRequestURI", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getAllHeaders", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getContentType", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getFirstHeader", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getHeader", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getHeaders", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getHeaders", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getInputStream", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getLastHeader", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getMethod", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "getRequestURI", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "headerIterator", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "AxisHttpRequestImpl", True, "headerIterator", "(String)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "HttpUtils", True, "getSoapAction", "(AxisHttpRequest)", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.http.server", "Worker", True, "service", "(AxisHttpRequest,AxisHttpResponse,MessageContext)", "", "Argument[0]", "remote", "ai-generated"]

@@ -0,0 +1,84 @@
# MaD Generation Report
## Included (71)
| Package | Class | Method | Type | Kind | Certainty | Reason |
|---------|-------|--------|------|------|-----------|--------|
| org.apache.axis2.transport.http.util | RESTUtil | processXMLRequest | sink | CWE-611 | 4 | Argument 1 (InputStream in) is parsed as XML via TransportUtils.createSOAPMessage, which can lead to XXE if the underlying XML parser is not safely configured (e.g., DTD parsing enabled). |
| org.apache.axis2.transport.http.util | RESTUtil | processXMLRequest | sink | CWE-611 | 4 | Argument 1 (InputStream in) is parsed as XML via TransportUtils.createSOAPMessage, which can lead to XXE if the underlying XML parser is not safely configured (e.g., DTD parsing enabled). |
| org.apache.axis2.transport.http.impl.httpclient5 | HTTPSenderImpl | createRequest | sink | CWE-918 | 4 | Argument 2 (url) is used to form the target of an HTTP request (confirmed by callee URL.toURI()). If user-controlled, this can lead to Server-Side Request Forgery (SSRF). |
| org.apache.axis2.transport.http | AbstractAgent | renderView | sink | CWE-22 | 5 | Argument 0 (jspName) is passed to getRequestDispatcher(String) and then included via RequestDispatcher.include(), allowing an attacker to control which server-side resource is included, leading to path traversal. |
| org.apache.axis2.transport.http | AbstractAgent | renderView | sink | CWE-73 | 5 | Argument 0 (jspName) externally controls which file/resource is dispatched via getRequestDispatcher(String).include(), allowing external control of file name or path. |
| org.apache.axis2.transport.http | AbstractHTTPTransportSender | invoke | sink | CWE-918 | 4 | The invoke method extracts the target endpoint from msgContext via getTo() and sends an HTTP request to it via writeMessageWithCommons. If the endpoint URL in the MessageContext is attacker-controlled, this can lead to SSRF. |
| org.apache.axis2.transport.http | AxisServlet | getTransportHeaders | source | remote | 4 | The method reads HTTP transport headers from an HttpServletRequest (via methods like getHeader/getHeaderNames which are outside this library) and returns them as a Map<String,String>. The return value contains user-controlled remote data (HTTP headers). |
| org.apache.axis2.transport.http | AxisServlet | createMessageContext | source | remote | 4 | The method creates and returns a MessageContext populated with remote data extracted from HttpServletRequest (query strings, request URIs, URLs, remote address, transport headers). The return value contains user-controlled remote data. |
| org.apache.axis2.transport.http | AxisServlet | createMessageContext | source | remote | 5 | The method directly reads remote data from HttpServletRequest (query strings, request URIs, URLs, remote addresses, transport headers) and populates the returned MessageContext with this data. The return value contains user-controlled remote data. |
| org.apache.axis2.transport.http | CommonsTransportHeaders | get | source | remote | 4 | Returns an HTTP header value by key. HTTP headers are user-controlled remote data from incoming HTTP requests in Apache Axis2's transport layer. |
| org.apache.axis2.transport.http | CommonsTransportHeaders | values | source | remote | 4 | Returns all HTTP header values. HTTP headers are user-controlled remote data from incoming HTTP requests. |
| org.apache.axis2.transport.http | CommonsTransportHeaders | entrySet | source | remote | 4 | Returns all HTTP header entries (key-value pairs). HTTP headers are user-controlled remote data from incoming HTTP requests. |
| org.apache.axis2.transport.http | CommonsTransportHeaders | keySet | source | remote | 4 | Returns all HTTP header keys. HTTP header names are user-controlled remote data from incoming HTTP requests. |
| org.apache.axis2.transport.http | CommonsTransportHeaders | remove | source | remote | 3 | Returns the removed HTTP header value. HTTP headers are user-controlled remote data from incoming HTTP requests. |
| org.apache.axis2.transport.http | HTTPSender | send | sink | CWE-918 | 5 | The URL argument (arg 1) is used to create and execute an HTTP request (via createRequest followed by Request.execute()). If an attacker controls this URL, it can lead to Server-Side Request Forgery. |
| org.apache.axis2.transport.http | HTTPTransportReceiver | printServiceHTML | sink | CWE-79 | 4 | Argument 0 (serviceName) is used to generate HTML output (the method returns an HTML string). The service name is likely embedded in the returned HTML without proper escaping, which can lead to reflected XSS if the serviceName originates from user input (e.g., an HTTP request parameter). |
| org.apache.axis2.transport.http | HTTPTransportUtils | processHTTPPostRequest | sink | CWE-611 | 4 | Argument 1 (InputStream `in`) is parsed as XML/SOAP via TransportUtils.createSOAPMessage. If the underlying XML parser is not configured to disable external entities, this can lead to XXE attacks. |
| org.apache.axis2.transport.http | HTTPTransportUtils | processHTTPPostRequest | sink | CWE-611 | 4 | Argument 1 (InputStream `in`) is parsed as XML/SOAP via TransportUtils.createSOAPMessage. If the underlying XML parser is not configured to disable external entities, this can lead to XXE attacks. |
| org.apache.axis2.transport.http | HTTPWorker | service | source | remote | 4 | The `request` parameter (arg 0) is an HTTP request object provided by the framework, carrying remote client data into the application. This is a framework entry point analogous to HttpServlet.doGet(HttpServletRequest, ...). |
| org.apache.axis2.transport.http | HTTPWorker | getHost | source | remote | 4 | The return value is derived from the HTTP Host header of a remote request, extracted via getFirstHeader/getValue. This is attacker-controllable remote data. |
| org.apache.axis2.transport.http | ListingAgent | getParamtereIgnoreCase | source | remote | 5 | This method reads HTTP request parameters by calling ServletRequest.getParameter() and iterating through getParameterNames() to perform case-insensitive parameter lookup. The return value is user-controlled input from an HTTP request. |
| org.apache.axis2.transport.http | Request | getResponseContent | source | remote | 4 | Returns the HTTP response body as an InputStream. This data comes from a remote HTTP server and could contain attacker-controlled content. |
| org.apache.axis2.transport.http | Request | getResponseHeaders | source | remote | 4 | Returns all HTTP response headers. These come from a remote HTTP server and could contain attacker-controlled content. |
| org.apache.axis2.transport.http | Request | getResponseHeader | source | remote | 4 | Returns the value of a specific HTTP response header. This data comes from a remote HTTP server and could contain attacker-controlled content. |
| org.apache.axis2.transport.http | Request | getStatusText | source | remote | 4 | Returns the HTTP response status text. This data comes from a remote HTTP server and could contain attacker-controlled content. |
| org.apache.axis2.transport.http | Request | getResponseContentEncoding | source | remote | 4 | Returns the content encoding of the HTTP response. This data comes from a remote HTTP server and could contain attacker-controlled content. |
| org.apache.axis2.transport.http | Request | getCookies | source | remote | 3 | Returns cookies which may come from a remote HTTP server's Set-Cookie response headers, making them attacker-controllable. |
| org.apache.axis2.transport.http | Request | addHeader | sink | CWE-113 | 4 | Arguments 0 (name) and 1 (value) are written directly to an HTTP request header. If attacker-controlled input is used, it can lead to HTTP request splitting. |
| org.apache.axis2.transport.http | Request | setHeader | sink | CWE-113 | 4 | Arguments 0 (name) and 1 (value) are written directly to an HTTP request header. If attacker-controlled input is used, it can lead to HTTP request splitting. |
| org.apache.axis2.transport.http | ServletBasedOutTransportInfo | setContentType | sink | CWE-113 | 5 | Argument 0 (contentType) is passed directly to jakarta.servlet.ServletResponse.setContentType, writing it into the HTTP response Content-Type header. Attacker-controlled data here can cause HTTP response splitting / header injection. |
| org.apache.axis2.transport.http | ServletBasedOutTransportInfo | addHeader | sink | CWE-113 | 5 | Arguments 0 (headerName) and 1 (headerValue) are passed directly to jakarta.servlet.http.HttpServletResponse.addHeader, writing them into the HTTP response headers. Attacker-controlled data in either argument can cause HTTP response splitting / header injection. |
| org.apache.axis2.transport.http | SimpleHTTPServer | main | source | commandargs | 5 | The args parameter of main() receives command-line arguments from outside the program, making it a source of user-controlled input. Callees show these args are used to construct file system paths (createConfigurationContextFromFileSystem), parse integers, etc. |
| org.apache.axis2.transport.http | TransportHeaders | get | source | remote | 5 | The return value is an HTTP header value read from HttpServletRequest.getHeader(), which is remote user-controlled input. |
| org.apache.axis2.transport.http | TransportHeaders | values | source | remote | 5 | Returns all HTTP header values from the wrapped HttpServletRequest, which are remote user-controlled input. |
| org.apache.axis2.transport.http | TransportHeaders | entrySet | source | remote | 5 | Returns all HTTP header key-value entries from the wrapped HttpServletRequest, which are remote user-controlled input. |
| org.apache.axis2.transport.http | TransportHeaders | keySet | source | remote | 5 | Returns all HTTP header names from the wrapped HttpServletRequest, which are remote user-controlled input. |
| org.apache.axis2.transport.http | TransportHeaders | remove | source | remote | 5 | Returns the removed HTTP header value from the wrapped HttpServletRequest, which is remote user-controlled input. |
| org.apache.axis2.transport.http.server | AxisHttpConnection | getInputStream | source | remote | 4 | Returns an InputStream that reads data from a remote HTTP client connection. The data originates from outside the program boundary (a remote network client). |
| org.apache.axis2.transport.http.server | AxisHttpConnection | receiveRequest | source | remote | 4 | Returns a ClassicHttpRequest parsed from the remote HTTP client connection. The entire request object (URL, headers, body) originates from a remote client and is attacker-controllable. |
| org.apache.axis2.transport.http.server | AxisHttpConnectionImpl | receiveRequest | source | remote | 5 | receiveRequest() reads an HTTP request from the underlying socket connection by parsing the request line (method, URI), headers, and body from the socket's input stream. The returned ClassicHttpRequest contains attacker-controlled remote data. |
| org.apache.axis2.transport.http.server | AxisHttpRequest | getInputStream | source | remote | 4 | Returns the input stream of an incoming HTTP request body, which contains data sent by a remote client. |
| org.apache.axis2.transport.http.server | AxisHttpRequest | getContentType | source | remote | 4 | Returns the Content-Type header from an incoming HTTP request, which is controlled by the remote client. |
| org.apache.axis2.transport.http.server | AxisHttpRequest | getRequestURI | source | remote | 5 | Returns the request URI from an incoming HTTP request, which is fully controlled by the remote client and is a classic source of user-controlled input. |
| org.apache.axis2.transport.http.server | AxisHttpRequest | getMethod | source | remote | 3 | Returns the HTTP method from an incoming HTTP request, which is specified by the remote client. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getInputStream | source | remote | 5 | Returns the input stream of the HTTP request body from the remote client, delegating to AxisHttpConnection.getInputStream(). |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getRequestURI | source | remote | 5 | Returns the request URI from a remote HTTP request, delegating to HttpRequest.getRequestUri(). |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getContentType | source | remote | 5 | Returns the Content-Type header value from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getMethod | source | remote | 5 | Returns the HTTP method from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getHeaders | source | remote | 5 | Returns all HTTP headers from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getHeaders | source | remote | 5 | Returns HTTP headers matching a given name from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getHeader | source | remote | 5 | Returns an HTTP header by name from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getFirstHeader | source | remote | 5 | Returns the first HTTP header with the given name from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getLastHeader | source | remote | 5 | Returns the last HTTP header with the given name from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | getAllHeaders | source | remote | 5 | Returns all HTTP headers from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | headerIterator | source | remote | 5 | Returns an iterator over all HTTP headers from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpRequestImpl | headerIterator | source | remote | 5 | Returns an iterator over HTTP headers matching a given name from the remote HTTP request. |
| org.apache.axis2.transport.http.server | AxisHttpResponse | setContentType | sink | CWE-113 | 4 | Argument 0 (contentType) is written directly as the Content-Type HTTP response header value. If user-controlled, CRLF characters could cause HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpResponse | sendError | sink | CWE-79 | 4 | Argument 1 (msg) is included in the HTTP error response body. If user-controlled, unsanitized content could lead to cross-site scripting. |
| org.apache.axis2.transport.http.server | AxisHttpResponse | sendError | sink | CWE-113 | 3 | Argument 1 (msg) may be included in HTTP response headers (e.g., as a status reason phrase). If user-controlled, CRLF characters could cause HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpResponseImpl | addHeader | sink | CWE-113 | 5 | Arguments 0 (name) and 1 (value) are directly passed to HttpMessage.addHeader(String, Object), setting an HTTP response header. Tainted data in either could lead to HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpResponseImpl | addHeader | sink | CWE-113 | 5 | Argument 0 (Header object) is directly passed to HttpMessage.addHeader(Header), setting an HTTP response header. Tainted data in the Header object could lead to HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpResponseImpl | setHeader | sink | CWE-113 | 5 | Arguments 0 (name) and 1 (value) are directly passed to HttpMessage.setHeader(String, Object), setting an HTTP response header. Tainted data could lead to HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpResponseImpl | setHeader | sink | CWE-113 | 5 | Argument 0 (Header object) is directly passed to HttpMessage.setHeader(Header), setting an HTTP response header. Tainted data could lead to HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpResponseImpl | setHeaders | sink | CWE-113 | 5 | Argument 0 (Header[] array) is directly passed to HttpMessage.setHeaders(Header[]), setting HTTP response headers. Tainted data in any header could lead to HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpResponseImpl | sendError | sink | CWE-113 | 4 | Argument 1 (msg) is passed to HttpResponse.setReasonPhrase(String), which sets the reason phrase in the HTTP response status line. Tainted data with CRLF characters could lead to HTTP response splitting. |
| org.apache.axis2.transport.http.server | AxisHttpService | handleException | sink | CWE-209 | 4 | Argument 0 (HttpException ex) is converted to an error message via ServerSupport.toErrorMessage(ex) and sent to the remote client by setting it as the HTTP response entity. This exposes potentially sensitive exception details (stack traces, internal class names, error messages) to the end user. |
| org.apache.axis2.transport.http.server | DefaultConnectionListenerFailureHandler | notifyAbnormalTermination | sink | CWE-117 | 4 | The 'message' (arg 1) String and 'cause' (arg 2) Throwable are passed directly to Log.error() without sanitization, which could lead to log injection if they contain attacker-controlled data. |
| org.apache.axis2.transport.http.server | DefaultConnectionListenerFailureHandler | failed | sink | CWE-117 | 4 | The 'cause' (arg 1) Throwable is passed directly to Log.warn() without sanitization. If the Throwable's message contains attacker-controlled data, this could lead to log injection. |
| org.apache.axis2.transport.http.server | HttpUtils | getSoapAction | source | remote | 5 | The method extracts the SOAPAction HTTP header from an incoming AxisHttpRequest via getFirstHeader/getValue. This is user-controlled remote input entering the program. |
| org.apache.axis2.transport.http.server | ResponseSessionCookie | process | sink | CWE-113 | 4 | Data from the HttpContext (arg 2) is retrieved via getAttribute, used to build a session cookie value (via CharArrayBuffer.append), and added as an HTTP response header via HttpMessage.addHeader. If the context contains attacker-controlled data (e.g., a session ID), this could lead to HTTP response header injection/splitting. |
| org.apache.axis2.transport.http.server | Worker | service | source | remote | 4 | The `request` parameter (arg 0) represents an incoming HTTP request in this framework callback method, analogous to HttpServlet.service(). It brings externally-controlled data (HTTP headers, parameters, body) into the program. |
## Ignored (low certainty) (1)
| Package | Class | Method | Type | Kind | Certainty | Reason | Why Ignored |
|---------|-------|--------|------|------|-----------|--------|-------------|
| org.apache.axis2.transport.http.server | AxisHttpConnectionImpl | getInputStream | source | remote | 2 | getInputStream() returns the input stream from the underlying HTTP socket connection, which carries attacker-controlled remote data. | No callees were found, but the class is an HTTP connection implementation wrapping a Socket, and getInputStream() follows the standard pattern of exposing the socket's input stream which carries remote data. However, without callee confirmation, certainty is lower. |

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.transport.jms.ctype", "PropertyRule", True, "getContentType", "(Message)", "", "ReturnValue", "remote", "ai-generated"]

@@ -0,0 +1,10 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.transport.jms.iowrappers", "BytesMessageDataSource", True, "getInputStream", "()", "", "ReturnValue", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.jms.iowrappers", "BytesMessageInputStream", True, "read", "(byte[])", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.jms.iowrappers", "BytesMessageInputStream", True, "read", "(byte[],int,int)", "", "Argument[0]", "remote", "ai-generated"]

@@ -0,0 +1,17 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.transport.jms", "JMSConnectionFactoryManager", True, "handleException", "(String,Exception)", "", "Argument[0]", "log-injection", "ai-generated"]
      - ["org.apache.axis2.transport.jms", "JMSMessageSender", True, "JMSMessageSender", "(JMSConnectionFactory,String)", "", "Argument[1]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.transport.jms", "JMSSender", True, "sendMessage", "(MessageContext,String,OutTransportInfo)", "", "Argument[1]", "request-forgery", "ai-generated"]
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.transport.jms", "JMSMessageReceiver", True, "onMessage", "(Message,UserTransaction)", "", "Argument[0]", "remote", "ai-generated"]
      - ["org.apache.axis2.transport.jms", "JMSUtils", True, "getProperty", "(Message,String)", "", "ReturnValue", "remote", "ai-generated"]
      #- ["org.apache.axis2.transport.jms", "JMSUtils", True, "getTransportHeaders", "(Message)", "", "ReturnValue", "remote", "ai-generated"] # INVALID: Method does not exist in repo

@@ -0,0 +1,25 @@
# MaD Generation Report
## Included (14)
| Package | Class | Method | Type | Kind | Certainty | Reason |
|---------|-------|--------|------|------|-----------|--------|
| org.apache.axis2.transport.jms.ctype | PropertyRule | getContentType | source | remote | 4 | The method reads a string property from a JMS Message (via Message.getStringProperty) received from an external message broker. The return value encapsulates content type information extracted from the external message, making it a remote data source. |
| org.apache.axis2.transport.jms.iowrappers | BytesMessageDataSource | getInputStream | source | remote | 4 | Returns an InputStream that reads data from a JMS BytesMessage. JMS messages originate from external message queues (remote systems), so the returned stream carries externally-sourced data that did not exist in the process before the message was received. |
| org.apache.axis2.transport.jms.iowrappers | BytesMessageInputStream | read | source | remote | 4 | Overload read(byte[]): reads bytes from a JMS BytesMessage (via BytesMessage.readBytes) into the byte array parameter (arg 0). JMS messages originate from external message brokers, making this a remote data source. |
| org.apache.axis2.transport.jms.iowrappers | BytesMessageInputStream | read | source | remote | 4 | Overload read(byte[],int,int): reads bytes from a JMS BytesMessage (via BytesMessage.readBytes) into the byte array parameter (arg 0). JMS messages originate from external message brokers, making this a remote data source. |
| org.apache.axis2.transport.jms | JMSConnectionFactory | getDestination | sink | CWE-074 | 4 | Argument 0 (destinationName) is used as a JNDI lookup name via JMSUtils.lookupDestination(Context, String, String). If this name is attacker-controlled, it can point to a malicious JNDI server, potentially leading to remote code execution (JNDI injection). |
| org.apache.axis2.transport.jms | JMSConnectionFactoryManager | handleException | sink | CWE-117 | 4 | Argument 0 (msg) is passed directly to Log.error(), which writes it to the log. If msg contains unsanitized user input, this enables log injection (e.g., forged log entries via newline characters). |
| org.apache.axis2.transport.jms | JMSMessageReceiver | onMessage | source | remote | 4 | The Message parameter (arg 0) is data received from an external JMS message queue. This method is a well-known JMS framework callback invoked when a message arrives. The callees confirm it reads message content (getText, getJMSMessageID, getJMSCorrelationID, etc.) and processes it through the engine, making this the entry point where external data enters the application. |
| org.apache.axis2.transport.jms | JMSMessageSender | JMSMessageSender | sink | CWE-918 | 4 | Argument 1 (targetAddress) is a target EPR (endpoint reference) used to resolve the JMS destination. Callees show it flows into JMSUtils.getDestination(String) and JMSConnectionFactory.getDestination(String,String), setting up a server-side connection to a potentially attacker-controlled destination, enabling request forgery. |
| org.apache.axis2.transport.jms | JMSOutTransportInfo | getReplyDestination | sink | CWE-074 | 5 | Argument 0 (replyDest) is used as a JNDI name in a lookup via JMSUtils.lookupDestination(). If this name is attacker-controlled, it can point to a malicious LDAP/RMI server and lead to remote code execution through JNDI injection. |
| org.apache.axis2.transport.jms | JMSSender | sendMessage | sink | CWE-918 | 4 | Argument 1 (targetAddress) specifies the JMS endpoint/broker address where the message is sent. If attacker-controlled, the server could be made to connect to an arbitrary JMS broker, enabling SSRF. Callees confirm the address is used to create JMSOutTransportInfo and send the message via sendOverJMS. |
| org.apache.axis2.transport.jms | JMSUtils | getProperty | source | remote | 5 | The return value is a string property read from a JMS Message, which originates from an external/remote messaging system. Callees confirm delegation to jakarta.jms.Message.getStringProperty(). |
| org.apache.axis2.transport.jms | JMSUtils | getTransportHeaders | source | remote | 5 | The return value is a Map containing transport headers extracted from a JMS Message. Callees confirm it reads multiple properties (getStringProperty, getJMSCorrelationID, getJMSMessageID, getJMSType, etc.) from the external message. |
| org.apache.axis2.transport.jms | JMSUtils | lookup | sink | CWE-074 | 5 | Argument 2 (name) is used in a JNDI lookup via javax.naming.Context.lookup(String). If attacker-controlled, this can lead to JNDI injection allowing remote code execution. |
| org.apache.axis2.transport.jms | JMSUtils | lookupDestination | sink | CWE-074 | 5 | Argument 1 (destinationName) is passed to JMSUtils.lookup() which performs a JNDI lookup via javax.naming.Context.lookup(String). If attacker-controlled, this can lead to JNDI injection. |
## Ignored (low certainty) (0)
None.

@@ -0,0 +1,9 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.transport.mail", "MailTransportListener", True, "poll", "(PollTableEntry)", "", "Argument[0]", "request-forgery", "ai-generated"]
      - ["org.apache.axis2.transport.mail", "MailTransportSender", True, "sendMessage", "(MessageContext,String,OutTransportInfo)", "", "Argument[1]", "request-forgery", "ai-generated"]

@@ -0,0 +1,13 @@
# MaD Generation Report
## Included (2)
| Package | Class | Method | Type | Kind | Certainty | Reason |
|---------|-------|--------|------|------|-----------|--------|
| org.apache.axis2.transport.mail | MailTransportListener | poll | sink | CWE-918 | 3 | Argument 0 (PollTableEntry) specifies the mail server configuration (host, protocol, credentials) used to establish a network connection to a mail server. If attacker-controlled, this enables SSRF by directing the server to connect to an arbitrary mail host. |
| org.apache.axis2.transport.mail | MailTransportSender | sendMessage | sink | CWE-918 | 3 | Argument 1 (targetAddress) is parsed via InternetAddress.parse() and used to set the target addresses for sending mail via SMTP. This allows the server to be directed to send email (a server-side request) to a user-controlled destination, which is a form of server-side request forgery. |
## Ignored (low certainty) (0)
None.

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.transport.tcp", "TCPTransportSender", True, "sendMessage", "(MessageContext,String,OutTransportInfo)", "", "Argument[1]", "request-forgery", "ai-generated"]

@@ -0,0 +1,12 @@
# MaD Generation Report
## Included (1)
| Package | Class | Method | Type | Kind | Certainty | Reason |
|---------|-------|--------|------|------|-----------|--------|
| org.apache.axis2.transport.tcp | TCPTransportSender | sendMessage | sink | CWE-918 | 5 | Argument 1 (targetEPR) is parsed and used to open a TCP connection via openTCPConnection(String, int). An attacker-controlled endpoint reference could lead to server-side request forgery. |
## Ignored (low certainty) (0)
None.

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["org.apache.axis2.transport.udp", "UDPSender", True, "sendMessage", "(MessageContext,String,OutTransportInfo)", "", "Argument[1]", "request-forgery", "ai-generated"]

@@ -0,0 +1,12 @@
# MaD Generation Report
## Included (1)
| Package | Class | Method | Type | Kind | Certainty | Reason |
|---------|-------|--------|------|------|-----------|--------|
| org.apache.axis2.transport.udp | UDPSender | sendMessage | sink | CWE-918 | 4 | Argument 1 (`targetEPR`) specifies the target endpoint for a UDP network request. The method directly calls `DatagramSocket.send()`, meaning user-controlled input in `targetEPR` can lead to server-side request forgery. |
## Ignored (low certainty) (0)
None.

@@ -0,0 +1,8 @@
# THIS FILE IS AN AUTO-GENERATED MODELS AS DATA FILE. DO NOT EDIT.
# Generated from https://github.com/apache/axis-axis2-java-core.git#b7e6711279d38b7f0db3e648888de5154729e9a8 by codeql-mads-via-llm
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["org.apache.axis2.webapp", "AxisAdminServlet", True, "service", "(HttpServletRequest,HttpServletResponse)", "", "Argument[0]", "remote", "ai-generated"]

@@ -0,0 +1,12 @@
# MaD Generation Report
## Included (1)
| Package | Class | Method | Type | Kind | Certainty | Reason |
|---------|-------|--------|------|------|-----------|--------|
| org.apache.axis2.webapp | AxisAdminServlet | service | source | remote | 5 | The `request` parameter (arg 0) is an HttpServletRequest provided by the servlet container, carrying user-supplied HTTP data (query parameters, headers, path info, etc.). The `service` method is a well-known servlet entry point where remote data enters the application. Callees confirm usage of getParameter, getPathInfo, getMethod on the request. |
## Ignored (low certainty) (0)
None.

@@ -330,7 +330,7 @@ pub fn extract(
if let Some(yeast_runner) = yeast_runner {
let ast = yeast_runner
.run_from_tree(&tree)
.run_from_tree(&tree, source)
.unwrap_or_else(|e| panic!("Desugaring failed for {path_str}: {e}"));
traverse_yeast(&ast, &mut visitor);
} else {

@@ -115,8 +115,19 @@ pub fn generate(
&node_parent_table_name,
)),
ql::TopLevel::Class(ql_gen::create_token_class(&token_name, &tokeninfo_name)),
ql::TopLevel::Class(ql_gen::create_reserved_word_class(&reserved_word_name)),
];
// Only emit the ReservedWord class when there are actually unnamed token
// types in the schema (i.e., @{prefix}_reserved_word exists in the dbscheme).
// When converting from a YEAST YAML schema that has no unnamed tokens, this
// type is absent and referencing it would cause a QL compilation error.
let has_reserved_words = nodes
.values()
.any(|n| n.dbscheme_name == reserved_word_name);
if has_reserved_words {
body.push(ql::TopLevel::Class(ql_gen::create_reserved_word_class(
&reserved_word_name,
)));
}
// Overlay discard predicates
body.push(ql::TopLevel::Predicate(

@@ -113,8 +113,24 @@ fn parse_query_node_inner(tokens: &mut Tokens) -> Result<TokenStream> {
/// appear in any order; bare patterns are accumulated and emitted as a
/// single `("child", ...)` entry.
fn parse_query_fields(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
let mut fields = Vec::new();
// Accumulate per-field elems in declaration order; multiple uses of the
// same field name extend the same list (so e.g. `cond: (foo) cond: (bar)`
// matches a `cond` field whose first child is `foo` and second is `bar`).
let mut field_order: Vec<String> = Vec::new();
let mut field_elems: std::collections::HashMap<String, Vec<TokenStream>> =
std::collections::HashMap::new();
let mut bare_children: Vec<TokenStream> = Vec::new();
let push_field_elem = |order: &mut Vec<String>,
map: &mut std::collections::HashMap<String, Vec<TokenStream>>,
name: String,
elem: TokenStream| {
if !map.contains_key(&name) {
order.push(name.clone());
map.insert(name, vec![elem]);
} else {
map.get_mut(&name).unwrap().push(elem);
}
};
while tokens.peek().is_some() {
if peek_is_field(tokens) {
let field_name = expect_ident(tokens, "expected field name")?;
@@ -122,10 +138,40 @@ fn parse_query_fields(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
expect_punct(tokens, ':', "expected `:` after field name")?;
let child = parse_query_node(tokens)?;
fields.push(quote! {
(#field_str, vec![yeast::query::QueryListElem::SingleNode(#child)])
});
// Parse the field's pattern. To support repetition like
// `field: (kind)* @cap`, parse the atom first, then check for
// a quantifier, and lastly handle a trailing `@capture`.
let atom = parse_query_atom(tokens)?;
if peek_is_repetition(tokens) {
let rep = expect_repetition(tokens)?;
let elem = quote! {
yeast::query::QueryListElem::Repeated {
children: vec![yeast::query::QueryListElem::SingleNode(#atom)],
rep: #rep,
}
};
let elem = maybe_wrap_list_capture(tokens, elem)?;
push_field_elem(&mut field_order, &mut field_elems, field_str, elem);
} else {
let child = if peek_is_at(tokens) {
tokens.next();
let capture_name =
expect_ident(tokens, "expected capture name after @")?;
let name_str = capture_name.to_string();
quote! {
yeast::query::QueryNode::Capture {
capture: #name_str,
node: Box::new(#atom),
}
}
} else {
atom
};
let elem = quote! {
yeast::query::QueryListElem::SingleNode(#child)
};
push_field_elem(&mut field_order, &mut field_elems, field_str, elem);
}
} else {
// Bare patterns — accumulate into the implicit `child` field.
// We don't break here, so we can interleave with named fields.
@@ -137,6 +183,13 @@ fn parse_query_fields(tokens: &mut Tokens) -> Result<Vec<TokenStream>> {
bare_children.extend(elems);
}
}
let mut fields: Vec<TokenStream> = Vec::new();
for name in field_order {
let elems = field_elems.remove(&name).unwrap();
fields.push(quote! {
(#name, vec![#(#elems),*])
});
}
if !bare_children.is_empty() {
fields.push(quote! {
("child", vec![#(#bare_children),*])
@@ -299,7 +352,7 @@ fn parse_direct_node(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStream> {
Some(TokenTree::Group(g)) if g.delimiter() == Delimiter::Brace => {
let group = expect_group(tokens, Delimiter::Brace)?;
let expr = group.stream();
Ok(quote! { #expr })
Ok(quote! { ::std::convert::Into::<usize>::into(#expr) })
}
Some(TokenTree::Group(g)) if g.delimiter() == Delimiter::Parenthesis => {
let group = expect_group(tokens, Delimiter::Parenthesis)?;
@@ -329,12 +382,17 @@ fn parse_direct_node_inner(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStre
return Ok(quote! { #ctx.literal(#kind_str, #lit) });
}
// Check for (kind #{expr}) — computed literal, expr converted via .to_string()
// Check for (kind #{expr}) — computed literal, expr converted via YeastDisplay
if peek_is_hash(tokens) {
tokens.next(); // consume #
let group = expect_group(tokens, Delimiter::Brace)?;
let expr = group.stream();
return Ok(quote! { #ctx.literal(#kind_str, &(#expr).to_string()) });
return Ok(quote! {
{
let __value = yeast::YeastDisplay::yeast_to_string(&(#expr), &*#ctx.ast);
#ctx.literal(#kind_str, &__value)
}
});
}
// Check for (kind $fresh)
@@ -374,7 +432,11 @@ fn parse_direct_node_inner(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStre
inner.next(); // consume first .
inner.next(); // consume second .
let expr: proc_macro2::TokenStream = inner.collect();
stmts.push(quote! { let #temp: Vec<usize> = #expr; });
stmts.push(quote! {
let #temp: Vec<usize> = (#expr).into_iter()
.map(::std::convert::Into::<usize>::into)
.collect();
});
field_args.push(quote! { (#field_str, #temp) });
continue;
}
@@ -382,7 +444,7 @@ fn parse_direct_node_inner(tokens: &mut Tokens, ctx: &Ident) -> Result<TokenStre
}
let value = parse_direct_node(tokens, ctx)?;
stmts.push(quote! { let #temp = #value; });
stmts.push(quote! { let #temp: usize = #value; });
field_args.push(quote! { (#field_str, vec![#temp]) });
}
@@ -427,10 +489,16 @@ fn parse_direct_list(tokens: &mut Tokens, ctx: &Ident) -> Result<Vec<TokenStream
inner.next(); // consume first .
inner.next(); // consume second .
let expr: TokenStream = inner.collect();
items.push(quote! { __nodes.extend(#expr); });
items.push(quote! {
__nodes.extend(
(#expr).into_iter().map(::std::convert::Into::<usize>::into)
);
});
} else {
let expr = group.stream();
items.push(quote! { __nodes.push(#expr); });
items.push(quote! {
__nodes.push(::std::convert::Into::<usize>::into(#expr));
});
}
continue;
}
@@ -580,13 +648,24 @@ pub fn parse_rule_top(input: TokenStream) -> Result<TokenStream> {
let name_str = &cap.name;
match cap.multiplicity {
CaptureMultiplicity::Repeated => {
quote! { let #name: Vec<usize> = __captures.get_all(#name_str); }
quote! {
let #name: Vec<yeast::NodeRef> = __captures.get_all(#name_str)
.into_iter()
.map(yeast::NodeRef)
.collect();
}
}
CaptureMultiplicity::Optional => {
quote! { let #name: Option<usize> = __captures.get_opt(#name_str); }
quote! {
let #name: Option<yeast::NodeRef> =
__captures.get_opt(#name_str).map(yeast::NodeRef);
}
}
CaptureMultiplicity::Single => {
quote! { let #name: usize = __captures.get_var(#name_str).unwrap(); }
quote! {
let #name: yeast::NodeRef =
yeast::NodeRef(__captures.get_var(#name_str).unwrap());
}
}
}
})
@@ -613,19 +692,26 @@ pub fn parse_rule_top(input: TokenStream) -> Result<TokenStream> {
CaptureMultiplicity::Repeated => quote! {
let __field_id = #ctx_ident.ast.field_id_for_name(#name_str)
.unwrap_or_else(|| panic!("field '{}' not found", #name_str));
__fields.insert(__field_id, #name);
__fields.insert(
__field_id,
#name.into_iter()
.map(::std::convert::Into::<usize>::into)
.collect(),
);
},
CaptureMultiplicity::Optional => quote! {
let __field_id = #ctx_ident.ast.field_id_for_name(#name_str)
.unwrap_or_else(|| panic!("field '{}' not found", #name_str));
if let Some(__id) = #name {
__fields.entry(__field_id).or_insert_with(Vec::new).push(__id);
__fields.entry(__field_id).or_insert_with(Vec::new)
.push(::std::convert::Into::<usize>::into(__id));
}
},
CaptureMultiplicity::Single => quote! {
let __field_id = #ctx_ident.ast.field_id_for_name(#name_str)
.unwrap_or_else(|| panic!("field '{}' not found", #name_str));
__fields.entry(__field_id).or_insert_with(Vec::new).push(#name);
__fields.entry(__field_id).or_insert_with(Vec::new)
.push(::std::convert::Into::<usize>::into(#name));
},
}
})

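Review note: the field accumulation in `parse_query_fields` pairs a `Vec<String>` (first-seen order) with a `HashMap` (per-field lists), so that repeated uses of one field name extend the same list while the output order stays deterministic. A minimal standalone sketch of that pattern follows. The function name `group_in_first_seen_order` and the use of plain strings in place of `TokenStream` are assumptions made for illustration, not part of the macro crate.

```rust
use std::collections::HashMap;

/// Group `(field, elem)` pairs by field name.
/// Fields come out in the order they were first seen; elems keep push order.
/// Hypothetical helper: strings stand in for the macro's `TokenStream` values.
fn group_in_first_seen_order(pairs: &[(&str, &str)]) -> Vec<(String, Vec<String>)> {
    let mut order: Vec<String> = Vec::new();
    let mut elems: HashMap<String, Vec<String>> = HashMap::new();

    for (name, elem) in pairs {
        // Record the name only the first time it appears.
        if !elems.contains_key(*name) {
            order.push(name.to_string());
        }
        // Always append to that field's list.
        elems
            .entry(name.to_string())
            .or_default()
            .push(elem.to_string());
    }

    // Drain the map in first-seen order so the result is deterministic.
    order
        .into_iter()
        .map(|name| {
            let list = elems.remove(&name).unwrap_or_default();
            (name, list)
        })
        .collect()
}
```

With the input `cond: foo`, `body: x`, `cond: bar`, the sketch yields `cond` first with both elems, then `body`, which matches the behaviour the comment in the hunk describes for `cond: (foo) cond: (bar)`.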
@@ -349,8 +349,8 @@ to enable rewriting:
```rust
let desugar = yeast::DesugaringConfig::new()
.add_phase("cleanup", cleanup_rules())
.add_phase("desugar", desugar_rules())
.add_phase("cleanup", yeast::PhaseKind::Repeating, cleanup_rules())
.add_phase("translate", yeast::PhaseKind::OneShot, translate_rules())
.with_output_node_types_yaml(include_str!("output-node-types.yml"));
let lang = simple::LanguageSpec {
@@ -365,6 +365,15 @@ let lang = simple::LanguageSpec {
A single-phase config is just `.add_phase(...)` called once. Phase names
appear in error messages so you can tell which phase failed.
There are two kinds of phases:
- **Repeating**:
Each node is re-processed until none of the rules in the phase matches.
Once a node no longer matches any rule, its children are processed recursively. In practice this is used to desugar or simplify an AST while staying mostly within the same schema.
- **One-shot**:
Each node is processed by the first matching rule, and the engine panics if no rule matches.
Rules are then recursively applied to every captured node.
In practice this is used when translating from one AST schema to another, where an exhaustive match is required.
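The difference between the two phase kinds can be illustrated with a toy model. The sketch below does not use the yeast API: the `Node` type, the `Rule` and `Translate` aliases, and every function name are assumptions invented for the example. It only mirrors the semantics described above, with a repeating phase that rewrites a node until no rule matches and then descends, and a one-shot phase that applies the first matching rule and panics when none matches.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Num(i64),
    Neg(Box<Node>),
    Add(Box<Node>, Box<Node>),
    Sub(Box<Node>, Box<Node>),
}

/// Repeating-phase rule: `Some(replacement)` on a match, `None` otherwise.
type Rule = fn(&Node) -> Option<Node>;

/// One-shot rule: translates a node to the output form (a string here),
/// calling `rec` on the sub-nodes it wants translated.
type Translate = fn(&Node, &dyn Fn(&Node) -> String) -> Option<String>;

/// Repeating: rewrite the node until no rule matches, then process children.
fn run_repeating(node: Node, rules: &[Rule]) -> Node {
    let mut current = node;
    while let Some(next) = rules.iter().find_map(|r| r(&current)) {
        current = next;
    }
    match current {
        Node::Num(n) => Node::Num(n),
        Node::Neg(a) => Node::Neg(Box::new(run_repeating(*a, rules))),
        Node::Add(a, b) => Node::Add(
            Box::new(run_repeating(*a, rules)),
            Box::new(run_repeating(*b, rules)),
        ),
        Node::Sub(a, b) => Node::Sub(
            Box::new(run_repeating(*a, rules)),
            Box::new(run_repeating(*b, rules)),
        ),
    }
}

/// One-shot: the first matching rule wins; no match is a hard error.
fn run_one_shot(node: &Node, rules: &[Translate]) -> String {
    let recurse = |child: &Node| run_one_shot(child, rules);
    rules
        .iter()
        .find_map(|r| r(node, &recurse))
        .unwrap_or_else(|| panic!("no rule matches {:?}", node))
}

// Cleanup rules: `a - b` becomes `a + (-b)`, and `--x` becomes `x`.
fn desugar_sub(n: &Node) -> Option<Node> {
    match n {
        Node::Sub(a, b) => Some(Node::Add(a.clone(), Box::new(Node::Neg(b.clone())))),
        _ => None,
    }
}

fn double_neg(n: &Node) -> Option<Node> {
    match n {
        Node::Neg(inner) => match inner.as_ref() {
            Node::Neg(x) => Some((**x).clone()),
            _ => None,
        },
        _ => None,
    }
}

// Translate rules: there is deliberately no rule for `Sub`, so the
// translation is only exhaustive after the cleanup phase has run.
fn tr_num(n: &Node, _rec: &dyn Fn(&Node) -> String) -> Option<String> {
    if let Node::Num(v) = n { Some(v.to_string()) } else { None }
}

fn tr_neg(n: &Node, rec: &dyn Fn(&Node) -> String) -> Option<String> {
    if let Node::Neg(a) = n { Some(format!("(neg {})", rec(&**a))) } else { None }
}

fn tr_add(n: &Node, rec: &dyn Fn(&Node) -> String) -> Option<String> {
    if let Node::Add(a, b) = n {
        Some(format!("(add {} {})", rec(&**a), rec(&**b)))
    } else {
        None
    }
}
```

Running `run_repeating` with `[desugar_sub, double_neg]` and then `run_one_shot` with `[tr_num, tr_neg, tr_add]` corresponds to a "cleanup" phase followed by a "translate" phase. Passing a tree that still contains `Sub` straight to `run_one_shot` would panic, which is the exhaustiveness check the one-shot kind is described as providing.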
The same YAML node-types file is used for both the runtime yeast `Schema` (so
rules can refer to output-only kinds and fields) and TRAP validation (it
is converted to JSON internally).

@@ -61,6 +61,21 @@ impl Captures {
}
}
}
/// Apply a fallible function to every captured id (across all keys),
/// replacing each id with the result. Stops and returns the error on
/// the first failure.
pub fn try_map_all_captures<E>(
&mut self,
mut f: impl FnMut(Id) -> Result<Id, E>,
) -> Result<(), E> {
for ids in self.captures.values_mut() {
for id in ids {
*id = f(*id)?;
}
}
Ok(())
}
pub fn map_captures_to(&mut self, from: &str, to: &'static str, f: &mut impl FnMut(Id) -> Id) {
if let Some(from_ids) = self.captures.get(from) {
let new_values = from_ids.iter().copied().map(f).collect();

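Review note: `try_map_all_captures` rewrites ids in place and returns on the first error, so ids visited before the failure have already been replaced when the error is returned. The standalone sketch below makes that visible. The `Captures` struct here is a stand-in written for the example: the `BTreeMap` (chosen for a deterministic visiting order) and the use of `usize` for `Id` are assumptions, not the crate's actual definitions.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real `Captures` type (assumed layout, for illustration).
struct Captures {
    captures: BTreeMap<&'static str, Vec<usize>>,
}

impl Captures {
    /// Apply a fallible function to every captured id, replacing each id
    /// with the result. Stops at the first failure; ids already visited
    /// keep their new values.
    fn try_map_all_captures<E>(
        &mut self,
        mut f: impl FnMut(usize) -> Result<usize, E>,
    ) -> Result<(), E> {
        for ids in self.captures.values_mut() {
            for id in ids.iter_mut() {
                *id = f(*id)?;
            }
        }
        Ok(())
    }
}
```

A caller that needs all-or-nothing behaviour would have to map into a fresh collection and swap it in on success. Whether that matters for yeast depends on how callers treat the captures after an error, which this hunk does not show.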
@@ -1,6 +1,6 @@
use std::fmt::Write;
use crate::{Ast, Node, NodeContent, CHILD_FIELD};
use crate::{schema::Schema, Ast, Node, NodeContent, CHILD_FIELD};
/// Options for controlling AST dump output.
pub struct DumpOptions {
@@ -45,16 +45,143 @@ pub fn dump_ast_with_options(
options: &DumpOptions,
) -> String {
let mut out = String::new();
dump_node(ast, root, source, options, 0, &mut out);
dump_node(ast, root, source, options, 0, None, &mut out);
out
}
/// Dump an AST and annotate type mismatches against a schema inline.
///
/// Any node that does not match the expected type set for its parent field is
/// rendered with a trailing `" <-- ERROR: ..."` annotation on the same line.
pub fn dump_ast_with_type_errors(
ast: &Ast,
root: usize,
source: &str,
schema: &Schema,
) -> String {
dump_ast_with_type_errors_and_options(ast, root, source, schema, &DumpOptions::default())
}
/// Like [`dump_ast_with_type_errors`], but with caller-supplied [`DumpOptions`].
///
/// Any node that does not match the expected type set for its parent field is
/// rendered with a trailing `" <-- ERROR: ..."` annotation on the same line.
pub fn dump_ast_with_type_errors_and_options(
ast: &Ast,
root: usize,
source: &str,
schema: &Schema,
options: &DumpOptions,
) -> String {
let mut out = String::new();
dump_node(ast, root, source, options, 0, Some((schema, None, None)), &mut out);
out
}
fn format_node_types(node_types: &[crate::schema::NodeType]) -> String {
node_types
.iter()
.map(|t| {
if t.named {
t.kind.clone()
} else {
format!("\"{}\"", t.kind)
}
})
.collect::<Vec<_>>()
.join(" | ")
}
const EMPTY_NODE_TYPES: &[crate::schema::NodeType] = &[];
/// Generate a type-checking error message for a node if it doesn't match expected types.
///
/// # Arguments
/// - `schema`: The AST schema to validate against.
/// - `node`: The node being checked.
/// - `expected`: The set of allowed types for this node, or `None` if type-checking is disabled.
/// - `parent_field`: Optional tuple of (parent_kind, field_name) for context in error messages.
///
/// # Returns
/// `Some(error_message)` if the node violates the schema (e.g., wrong kind, missing field declaration).
/// `None` if the node matches the expected types or if type-checking is disabled.
fn type_error_for_node(
schema: &Schema,
node: &Node,
expected: Option<&[crate::schema::NodeType]>,
parent_field: Option<(&str, &str)>,
) -> Option<String> {
if schema.id_for_node_kind(node.kind_name()).is_none()
&& schema.id_for_unnamed_node_kind(node.kind_name()).is_none()
{
return Some(format!("node kind '{}' not in schema", node.kind_name()));
}
let expected = expected?;
if expected.is_empty() {
if let Some((kind, field)) = parent_field {
return Some(format!("the node '{kind}' has no field '{field}'"));
}
return Some("field not declared in schema for this parent node".to_string());
}
if schema.node_matches_types(node.kind_name(), node.is_named(), expected) {
None
} else {
let actual = if node.is_named() {
node.kind_name().to_string()
} else {
format!("\"{}\"", node.kind_name())
};
if let Some((kind, field)) = parent_field {
Some(format!(
"The field {}.{} should contain {}, but got {}",
kind,
field,
format_node_types(expected),
actual
))
} else {
Some(format!(
"expected {}, got {}",
format_node_types(expected),
actual
))
}
}
}
/// Look up the allowed types for a field in the schema.
///
/// # Arguments
/// - `schema`: The AST schema to query.
/// - `parent_kind`: The node kind of the parent that contains this field.
/// - `field_id`: The field ID within that parent node.
///
/// # Returns
/// `Some(&[NodeType])` if the field is declared in the schema and has type constraints.
/// `None` if the field is not declared or has no constraints (undeclared field).
fn expected_for_field<'a>(
schema: &'a Schema,
parent_kind: &str,
field_id: u16,
) -> Option<&'a [crate::schema::NodeType]> {
schema
.field_types(parent_kind, field_id)
.map(|v| v.as_slice())
}
fn dump_node(
ast: &Ast,
id: usize,
source: &str,
options: &DumpOptions,
indent: usize,
type_check: Option<(
&Schema,
Option<&[crate::schema::NodeType]>,
Option<(&str, &str)>,
)>,
out: &mut String,
) {
let node = match ast.get_node(id) {
@@ -90,6 +217,12 @@ fn dump_node(
}
}
if let Some((schema, expected, parent_field)) = type_check {
if let Some(err) = type_error_for_node(schema, node, expected, parent_field) {
write!(out, " <-- ERROR: {err}").unwrap();
}
}
writeln!(out).unwrap();
// Named fields first
@@ -98,31 +231,68 @@ fn dump_node(
continue; // Handle unnamed children last
}
let field_name = ast.field_name_for_id(field_id).unwrap_or("?");
let child_type_check = type_check.map(|(schema, _, _)| {
let expected = expected_for_field(schema, node.kind_name(), field_id)
.or(Some(EMPTY_NODE_TYPES));
let parent_field = Some((node.kind_name(), field_name));
(schema, expected, parent_field)
});
if children.len() == 1 {
write!(out, "{prefix} {field_name}:").unwrap();
// Inline single child
let child = ast.get_node(children[0]);
if child.is_some_and(is_leaf) {
write!(out, " ").unwrap();
dump_node_inline(ast, children[0], source, options, out);
dump_node_inline(ast, children[0], source, options, child_type_check, out);
} else {
writeln!(out).unwrap();
dump_node(ast, children[0], source, options, indent + 2, out);
dump_node(
ast,
children[0],
source,
options,
indent + 2,
child_type_check,
out,
);
}
} else {
writeln!(out, "{prefix} {field_name}:").unwrap();
for &child_id in children {
dump_node(ast, child_id, source, options, indent + 2, out);
dump_node(
ast,
child_id,
source,
options,
indent + 2,
child_type_check,
out,
);
}
}
}
// Unnamed children — skip unnamed tokens (keywords, punctuation)
if let Some(children) = node.fields.get(&CHILD_FIELD) {
let child_type_check = type_check.map(|(schema, _, _)| {
let expected = expected_for_field(schema, node.kind_name(), CHILD_FIELD)
.or(Some(EMPTY_NODE_TYPES));
let parent_field = Some((node.kind_name(), "children"));
(schema, expected, parent_field)
});
for &child_id in children {
if let Some(child) = ast.get_node(child_id) {
if child.is_named() {
dump_node(ast, child_id, source, options, indent + 1, out);
dump_node(
ast,
child_id,
source,
options,
indent + 1,
child_type_check,
out,
);
}
}
}
@@ -130,7 +300,18 @@ fn dump_node(
}
/// Dump a leaf node inline (no newline prefix, caller provides context).
fn dump_node_inline(ast: &Ast, id: usize, source: &str, options: &DumpOptions, out: &mut String) {
fn dump_node_inline(
ast: &Ast,
id: usize,
source: &str,
options: &DumpOptions,
type_check: Option<(
&Schema,
Option<&[crate::schema::NodeType]>,
Option<(&str, &str)>,
)>,
out: &mut String,
) {
let node = match ast.get_node(id) {
Some(n) => n,
None => return,
@@ -159,6 +340,12 @@ fn dump_node_inline(ast: &Ast, id: usize, source: &str, options: &DumpOptions, o
}
}
if let Some((schema, expected, parent_field)) = type_check {
if let Some(err) = type_error_for_node(schema, node, expected, parent_field) {
write!(out, " <-- ERROR: {err}").unwrap();
}
}
writeln!(out).unwrap();
}
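
Note on the `expected` argument threaded through `dump_node`: it is effectively tri-state. `None` disables the field check, an empty slice marks a field the schema does not declare, and a non-empty slice lists the allowed kinds. Below is a minimal sketch of that convention, assuming plain string kinds. `check`, `declared`, `EMPTY`, and `LEFT` are hypothetical stand-ins, not part of the yeast API.

```rust
/// Hypothetical sentinel playing the role of `EMPTY_NODE_TYPES`.
const EMPTY: &[&str] = &[];
/// Hypothetical allowed kinds for a declared `left` field.
const LEFT: &[&str] = &["identifier"];

/// Returns an error message for a node of kind `kind`, given the tri-state
/// `expected`: `None` = checking disabled, `Some(&[])` = field not declared,
/// `Some(kinds)` = field declared with these allowed kinds.
fn check(kind: &str, expected: Option<&[&str]>) -> Option<String> {
    let expected = expected?; // checking disabled: never an error
    if expected.is_empty() {
        return Some("field not declared".to_string());
    }
    if expected.contains(&kind) {
        None
    } else {
        Some(format!("expected {}, got {kind}", expected.join(" | ")))
    }
}

/// Looks up the allowed kinds of a field in a fixed, hypothetical table.
fn declared(field: &str) -> Option<&'static [&'static str]> {
    match field {
        "left" => Some(LEFT),
        _ => None,
    }
}
```

A caller would map a missing declaration to the sentinel before checking, as in `check("identifier", declared("right").or(Some(EMPTY)))`, so that an undeclared field is reported rather than silently skipped.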


@@ -23,12 +23,73 @@ pub use cursor::Cursor;
use query::QueryNode;
/// Node ids are indexes into the arena
type Id = usize;
pub type Id = usize;
/// Field and Kind ids are provided by tree-sitter
type FieldId = u16;
type KindId = u16;
/// A typed reference to a node in an [`Ast`] arena. Wraps an [`Id`] but
/// deliberately does not implement [`std::fmt::Display`]: rendering a node
/// requires the [`Ast`] it lives in (to resolve [`NodeContent::Range`] back
/// to source text). Use [`YeastDisplay::yeast_to_string`] to format it.
#[derive(Copy, Clone, Eq, PartialEq, Debug, Hash)]
pub struct NodeRef(pub Id);
impl NodeRef {
pub fn id(self) -> Id {
self.0
}
}
impl From<NodeRef> for Id {
fn from(value: NodeRef) -> Self {
value.0
}
}
/// Like [`std::fmt::Display`], but the formatting routine is given access to
/// the [`Ast`] so that node references can resolve to their source text.
///
/// All standard primitive and string types implement [`YeastDisplay`] via
/// the [`impl_yeast_display_via_display`] macro below. Coherence prevents a
/// blanket `impl<T: Display>`, so additional types must be added explicitly.
pub trait YeastDisplay {
fn yeast_to_string(&self, ast: &Ast) -> String;
}
impl YeastDisplay for NodeRef {
fn yeast_to_string(&self, ast: &Ast) -> String {
ast.source_text(self.0)
}
}
macro_rules! impl_yeast_display_via_display {
($($t:ty),* $(,)?) => {
$(
impl YeastDisplay for $t {
fn yeast_to_string(&self, _ast: &Ast) -> String {
::std::string::ToString::to_string(self)
}
}
)*
};
}
impl_yeast_display_via_display! {
i8, i16, i32, i64, i128, isize,
u8, u16, u32, u64, u128, usize,
f32, f64,
bool, char,
str, String,
}
impl<T: YeastDisplay + ?Sized> YeastDisplay for &T {
fn yeast_to_string(&self, ast: &Ast) -> String {
(**self).yeast_to_string(ast)
}
}
pub const CHILD_FIELD: u16 = u16::MAX;
#[derive(Debug)]
@@ -160,6 +221,9 @@ pub struct Ast {
root: Id,
nodes: Vec<Node>,
schema: schema::Schema,
/// Original source bytes the tree was parsed from. Used to resolve
/// `NodeContent::Range` to text for synthesized literal nodes.
source: Vec<u8>,
}
impl std::fmt::Debug for Ast {
@@ -182,21 +246,93 @@ impl Ast {
schema: schema::Schema,
tree: &tree_sitter::Tree,
language: &tree_sitter::Language,
) -> Self {
Self::from_tree_with_schema_and_source(schema, tree, language, Vec::new())
}
pub fn from_tree_with_schema_and_source(
schema: schema::Schema,
tree: &tree_sitter::Tree,
language: &tree_sitter::Language,
source: Vec<u8>,
) -> Self {
let mut visitor = visitor::Visitor::new(language.clone());
visitor.visit(tree);
visitor.build_with_schema(schema)
let mut ast = visitor.build_with_schema(schema);
ast.source = source;
ast
}
/// Returns the source text for `id`, resolving `NodeContent::Range`
/// against the stored source bytes when available.
pub fn source_text(&self, id: Id) -> String {
let Some(node) = self.get_node(id) else { return String::new(); };
let read_range = |range: &tree_sitter::Range| {
let start = range.start_byte;
let end = range.end_byte;
if end <= self.source.len() && start <= end {
String::from_utf8_lossy(&self.source[start..end]).into_owned()
} else {
String::new()
}
};
match &node.content {
NodeContent::Range(range) => read_range(range),
NodeContent::String(s) => s.to_string(),
NodeContent::DynamicString(s) if !s.is_empty() => s.clone(),
// Synthesized nodes (from rule transforms) carry an empty
// `DynamicString`; resolve them against the inherited source
// range so `#{capture}` after a translation still yields the
// original source text.
NodeContent::DynamicString(_) => match node.source_range {
Some(range) => read_range(&range),
None => String::new(),
},
}
}
pub fn walk(&self) -> AstCursor {
AstCursor::new(self)
}
/// Return all nodes currently allocated in the AST arena.
///
/// This includes nodes that are no longer reachable from `get_root()`
/// after desugaring rewrites. Use `reachable_node_ids()` for output-level
/// validation/traversal semantics.
pub fn nodes(&self) -> &[Node] {
&self.nodes
}
/// Return node ids reachable from `get_root()` by following child edges.
///
/// This reflects the effective AST after desugaring and excludes orphaned
/// arena nodes left behind by rewrite operations.
pub fn reachable_node_ids(&self) -> Vec<usize> {
let mut reachable = Vec::new();
let mut stack = vec![self.root];
let mut seen = vec![false; self.nodes.len()];
while let Some(id) = stack.pop() {
if id >= self.nodes.len() || seen[id] {
continue;
}
seen[id] = true;
reachable.push(id);
if let Some(node) = self.get_node(id) {
for children in node.fields.values() {
for &child in children {
stack.push(child);
}
}
}
}
reachable
}
pub fn get_root(&self) -> Id {
self.root
}
@@ -493,18 +629,39 @@ impl Rule {
node: Id,
fresh: &tree_builder::FreshScope,
) -> Result<Option<Vec<Id>>, String> {
match self.try_match(ast, node)? {
Some(captures) => Ok(Some(self.run_transform(ast, captures, node, fresh))),
None => Ok(None),
}
}
/// Attempt to match this rule's query against `node`, returning the
/// resulting captures on success. Does not invoke the transform.
fn try_match(&self, ast: &Ast, node: Id) -> Result<Option<Captures>, String> {
let mut captures = Captures::new();
if self.query.do_match(ast, node, &mut captures)? {
fresh.next_scope();
let source_range = ast.get_node(node).and_then(|n| match n.content {
NodeContent::Range(r) => Some(r),
_ => n.source_range,
});
Ok(Some((self.transform)(ast, captures, fresh, source_range)))
Ok(Some(captures))
} else {
Ok(None)
}
}
/// Run this rule's transform with the given captures, using `node`'s
/// source range as the source range of the produced nodes.
fn run_transform(
&self,
ast: &mut Ast,
captures: Captures,
node: Id,
fresh: &tree_builder::FreshScope,
) -> Vec<Id> {
fresh.next_scope();
let source_range = ast.get_node(node).and_then(|n| match n.content {
NodeContent::Range(r) => Some(r),
_ => n.source_range,
});
(self.transform)(ast, captures, fresh, source_range)
}
}
const MAX_REWRITE_DEPTH: usize = 100;
@@ -539,17 +696,17 @@ impl<'a> RuleIndex<'a> {
}
}
fn apply_rules(
fn apply_repeating_rules(
rules: &[Rule],
ast: &mut Ast,
id: Id,
fresh: &tree_builder::FreshScope,
) -> Result<Vec<Id>, String> {
let index = RuleIndex::new(rules);
apply_rules_inner(&index, ast, id, fresh, 0, None)
apply_repeating_rules_inner(&index, ast, id, fresh, 0, None)
}
fn apply_rules_inner(
fn apply_repeating_rules_inner(
index: &RuleIndex,
ast: &mut Ast,
id: Id,
@@ -578,7 +735,7 @@ fn apply_rules_inner(
let next_skip = if rule.repeated { None } else { Some(rule_ptr) };
let mut results = Vec::new();
for node in result_node {
results.extend(apply_rules_inner(
results.extend(apply_repeating_rules_inner(
index,
ast,
node,
@@ -603,7 +760,7 @@ fn apply_rules_inner(
for children in fields.values_mut() {
let mut new_children: Option<Vec<Id>> = None;
for (i, &child_id) in children.iter().enumerate() {
let result = apply_rules_inner(index, ast, child_id, fresh, rewrite_depth, None)?;
let result = apply_repeating_rules_inner(index, ast, child_id, fresh, rewrite_depth, None)?;
let unchanged = result.len() == 1 && result[0] == child_id;
match (&mut new_children, unchanged) {
(None, true) => {} // unchanged so far, no allocation needed
@@ -628,6 +785,92 @@ fn apply_rules_inner(
Ok(vec![id])
}
/// Apply rules using `OneShot` semantics: the first matching rule fires on
/// each visited node, recursion proceeds only through captured nodes (not
/// through the input node's children directly), and an error is returned if
/// no rule matches a visited node.
fn apply_one_shot_rules(
rules: &[Rule],
ast: &mut Ast,
id: Id,
fresh: &tree_builder::FreshScope,
) -> Result<Vec<Id>, String> {
let index = RuleIndex::new(rules);
apply_one_shot_rules_inner(&index, ast, id, fresh, 0)
}
fn apply_one_shot_rules_inner(
index: &RuleIndex,
ast: &mut Ast,
id: Id,
fresh: &tree_builder::FreshScope,
rewrite_depth: usize,
) -> Result<Vec<Id>, String> {
if rewrite_depth > MAX_REWRITE_DEPTH {
return Err(format!(
"Desugaring exceeded maximum rewrite depth ({MAX_REWRITE_DEPTH}). \
This likely indicates a non-terminating rule cycle."
));
}
let node_kind = ast.get_node(id).map(|n| n.kind()).unwrap_or("");
// Don't rewrite unnamed nodes (punctuation, keywords, etc.); leave them
// as-is. Rules target named nodes only.
if let Some(node) = ast.get_node(id) {
if !node.is_named() {
return Ok(vec![id]);
}
}
for rule in index.rules_for_kind(node_kind) {
if let Some(mut captures) = rule.try_match(ast, id)? {
// Recursively translate every captured node before invoking the
// transform. The transform's output uses output-schema kinds, so
// we must translate captured input-schema nodes to their
// output-schema equivalents first.
captures.try_map_all_captures(|captured_id| {
// Avoid infinite recursion when a capture refers to the root
// node of the matched tree (e.g. an `@_` capture on the
// pattern root): re-analyzing it would match the same rule
// again indefinitely.
if captured_id == id {
return Ok(captured_id);
}
let result =
apply_one_shot_rules_inner(index, ast, captured_id, fresh, rewrite_depth + 1)?;
if result.len() != 1 {
return Err(format!(
"OneShot: recursion on captured node produced {} results, expected exactly 1",
result.len()
));
}
Ok(result[0])
})?;
return Ok(rule.run_transform(ast, captures, id, fresh));
}
}
Err(format!(
"OneShot: no rule matched node of kind '{node_kind}'"
))
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PhaseKind {
/// A node is re-processed until no rule in the phase matches it.
/// A rule is not applied twice in a row unless it is also marked as repeated.
/// Once a node no longer matches any rule, its children are processed recursively (top down).
Repeating,
/// A node is processed by the first matching rule; an error is returned if no rule matches.
/// Rules are applied recursively to every captured node before the matching rule's transform runs.
/// In practice this is used when translating from one AST schema to another, where every node must be rewritten,
/// and it would be a type error to match the rule patterns (based on the input schema) against the output nodes (which conform to the output schema).
OneShot,
}
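
Note on `OneShot`: a rough sketch of these semantics on a toy tree, under heavy simplifying assumptions. Every child counts as a capture, a rule only renames the node kind, and the depth limit is omitted. `Node`, `Rule`, `one_shot`, `leaf`, and `example_rules` are hypothetical and unrelated to the real yeast types.

```rust
/// Hypothetical toy tree; not the yeast `Ast`.
#[derive(Debug, PartialEq)]
struct Node {
    kind: String,
    children: Vec<Node>,
}

/// A toy rule: the input kind it matches and the output kind it produces.
struct Rule {
    input_kind: &'static str,
    output_kind: &'static str,
}

/// Translate `node` with the first rule matching its kind. Children (the
/// "captures" in this toy model) are translated first; the freshly built
/// output node is never matched against the rules again.
fn one_shot(rules: &[Rule], node: &Node) -> Result<Node, String> {
    let rule = rules
        .iter()
        .find(|r| r.input_kind == node.kind)
        .ok_or_else(|| format!("no rule matched node of kind '{}'", node.kind))?;
    let children = node
        .children
        .iter()
        .map(|child| one_shot(rules, child))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Node { kind: rule.output_kind.to_string(), children })
}

/// Build a childless node of the given kind.
fn leaf(kind: &str) -> Node {
    Node { kind: kind.to_string(), children: Vec::new() }
}

/// Rules covering an `assignment` with `identifier` and `integer` operands.
fn example_rules() -> Vec<Rule> {
    vec![
        Rule { input_kind: "assignment", output_kind: "first_node" },
        Rule { input_kind: "identifier", output_kind: "identifier" },
        Rule { input_kind: "integer", output_kind: "integer" },
    ]
}
```

In this sketch `first_node` has no rule of its own, yet translating an `assignment` succeeds because the output is not re-matched. Dropping the `integer` rule makes the same input fail with an error naming `integer`.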
/// One phase of a desugaring pass: a named bundle of rules that runs to
/// completion (a full traversal applying its rules) before the next phase
/// starts. Rules within a phase compete for matches as usual; rules in
@@ -637,13 +880,15 @@ pub struct Phase {
/// Name used in error messages.
pub name: String,
pub rules: Vec<Rule>,
pub kind: PhaseKind,
}
impl Phase {
pub fn new(name: impl Into<String>, rules: Vec<Rule>) -> Self {
pub fn new(name: impl Into<String>, kind: PhaseKind, rules: Vec<Rule>) -> Self {
Self {
name: name.into(),
rules,
kind,
}
}
}
@@ -661,8 +906,8 @@ impl Phase {
///
/// ```ignore
/// let config = yeast::DesugaringConfig::new()
/// .add_phase("cleanup", cleanup_rules)
/// .add_phase("desugar", desugar_rules)
/// .add_phase("cleanup", PhaseKind::Repeating, cleanup_rules)
/// .add_phase("desugar", PhaseKind::Repeating, desugar_rules)
/// .with_output_node_types_yaml(yaml);
/// ```
#[derive(Default)]
@@ -682,9 +927,14 @@ impl DesugaringConfig {
Self::default()
}
/// Append a new phase with the given name and rules.
pub fn add_phase(mut self, name: impl Into<String>, rules: Vec<Rule>) -> Self {
self.phases.push(Phase::new(name, rules));
/// Append a new phase with the given name, kind, and rules.
pub fn add_phase(
mut self,
name: impl Into<String>,
kind: PhaseKind,
rules: Vec<Rule>,
) -> Self {
self.phases.push(Phase::new(name, kind, rules));
self
}
@@ -747,8 +997,17 @@ impl<'a> Runner<'a> {
})
}
pub fn run_from_tree(&self, tree: &tree_sitter::Tree) -> Result<Ast, String> {
let mut ast = Ast::from_tree_with_schema(self.schema.clone(), tree, &self.language);
pub fn run_from_tree(
&self,
tree: &tree_sitter::Tree,
source: &[u8],
) -> Result<Ast, String> {
let mut ast = Ast::from_tree_with_schema_and_source(
self.schema.clone(),
tree,
&self.language,
source.to_vec(),
);
self.run_phases(&mut ast)?;
Ok(ast)
}
@@ -761,7 +1020,12 @@ impl<'a> Runner<'a> {
let tree = parser
.parse(input, None)
.ok_or_else(|| "Failed to parse input".to_string())?;
let mut ast = Ast::from_tree_with_schema(self.schema.clone(), &tree, &self.language);
let mut ast = Ast::from_tree_with_schema_and_source(
self.schema.clone(),
&tree,
&self.language,
input.as_bytes().to_vec(),
);
self.run_phases(&mut ast)?;
Ok(ast)
}
@@ -773,8 +1037,11 @@ impl<'a> Runner<'a> {
let fresh = tree_builder::FreshScope::new();
let mut root = ast.get_root();
for phase in self.phases {
let res = apply_rules(&phase.rules, ast, root, &fresh)
.map_err(|e| format!("Phase `{}`: {e}", phase.name))?;
let res = match phase.kind {
PhaseKind::Repeating => apply_repeating_rules(&phase.rules, ast, root, &fresh),
PhaseKind::OneShot => apply_one_shot_rules(&phase.rules, ast, root, &fresh),
}
.map_err(|e| format!("Phase `{}`: {e}", phase.name))?;
if res.len() != 1 {
return Err(format!(
"Phase `{}`: expected exactly one result node, got {}",


@@ -23,6 +23,7 @@
use std::collections::{BTreeMap, BTreeSet};
use std::fmt::Write;
use crate::CHILD_FIELD;
use serde::Deserialize;
use serde_json::json;
@@ -100,30 +101,36 @@ fn parse_field_name(raw: &str) -> FieldSpec {
/// Resolve a TypeRef to a (type, named) pair, given the sets of known named
/// and unnamed types.
fn resolve_type_ref_pair(
type_ref: &TypeRef,
named_types: &BTreeSet<String>,
unnamed_types: &BTreeSet<String>,
) -> (String, bool) {
match type_ref {
TypeRef::Explicit { unnamed } => (unnamed.clone(), false),
TypeRef::Name(name) => {
let is_named = named_types.contains(name);
let is_unnamed = unnamed_types.contains(name);
if is_named && is_unnamed {
// Ambiguous: default to named
(name.clone(), true)
} else if is_unnamed {
(name.clone(), false)
} else {
// Named, or unknown (assume named)
(name.clone(), true)
}
}
}
}
/// Resolve a TypeRef to a {type, named} JSON record, given the sets of known named
/// and unnamed types.
fn resolve_type_ref(
type_ref: &TypeRef,
named_types: &BTreeSet<String>,
unnamed_types: &BTreeSet<String>,
) -> serde_json::Value {
match type_ref {
TypeRef::Explicit { unnamed } => {
json!({"type": unnamed, "named": false})
}
TypeRef::Name(name) => {
let is_named = named_types.contains(name);
let is_unnamed = unnamed_types.contains(name);
if is_named && is_unnamed {
// Ambiguous: default to named
json!({"type": name, "named": true})
} else if is_unnamed {
json!({"type": name, "named": false})
} else {
// Named, or unknown (assume named)
json!({"type": name, "named": true})
}
}
}
let (kind, named) = resolve_type_ref_pair(type_ref, named_types, unnamed_types);
json!({"type": kind, "named": named})
}
/// Convert YAML string to node-types JSON string.
@@ -233,14 +240,12 @@ pub fn convert(yaml_input: &str) -> Result<String, String> {
serde_json::to_string_pretty(&output).map_err(|e| format!("Failed to serialize JSON: {e}"))
}
/// Build a Schema from a YAML node-types string.
/// Registers all node kinds and field names found in the YAML.
pub fn schema_from_yaml(yaml_input: &str) -> Result<crate::schema::Schema, String> {
let yaml: YamlNodeTypes =
serde_yaml::from_str(yaml_input).map_err(|e| format!("Failed to parse YAML: {e}"))?;
let mut schema = crate::schema::Schema::new();
/// Apply YAML node-type definitions to a mutable Schema.
/// Registers all types, fields, and allowed types from the YAML into the schema.
fn apply_yaml_to_schema(
yaml: &YamlNodeTypes,
schema: &mut crate::schema::Schema,
) {
// Register all supertypes as node kinds
for name in yaml.supertypes.keys() {
schema.register_kind(name);
@@ -264,6 +269,62 @@ pub fn schema_from_yaml(yaml_input: &str) -> Result<crate::schema::Schema, Strin
schema.register_unnamed_kind(name);
}
let mut named_types = BTreeSet::new();
for name in yaml.supertypes.keys() {
named_types.insert(name.clone());
}
for name in yaml.named.keys() {
named_types.insert(name.clone());
}
let unnamed_types: BTreeSet<String> = yaml.unnamed.iter().cloned().collect();
for (supertype, members) in &yaml.supertypes {
let node_types = members
.iter()
.map(|m| {
let (kind, named) = resolve_type_ref_pair(m, &named_types, &unnamed_types);
crate::schema::NodeType { kind, named }
})
.collect();
schema.set_supertype_members(supertype, node_types);
}
// Register allowed field child types for type checking.
for (parent_kind, fields_opt) in &yaml.named {
let Some(fields) = fields_opt else {
continue;
};
for (raw_field_name, type_refs) in fields {
let spec = parse_field_name(raw_field_name);
let field_id = match &spec.name {
Some(name) => schema.register_field(name),
None => CHILD_FIELD,
};
let mut node_types = type_refs
.clone()
.into_vec()
.into_iter()
.map(|type_ref| {
let (kind, named) = resolve_type_ref_pair(&type_ref, &named_types, &unnamed_types);
crate::schema::NodeType { kind, named }
})
.collect::<Vec<_>>();
node_types.sort_by(|a, b| a.kind.cmp(&b.kind).then(a.named.cmp(&b.named)));
node_types.dedup_by(|a, b| a.kind == b.kind && a.named == b.named);
schema.set_field_types(parent_kind, field_id, node_types);
}
}
}
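/// Build a Schema from a YAML node-types string.
/// Registers all node kinds, field names, and allowed field types found in the YAML.
pub fn schema_from_yaml(yaml_input: &str) -> Result<crate::schema::Schema, String> {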
let yaml: YamlNodeTypes =
serde_yaml::from_str(yaml_input).map_err(|e| format!("Failed to parse YAML: {e}"))?;
let mut schema = crate::schema::Schema::new();
apply_yaml_to_schema(&yaml, &mut schema);
Ok(schema)
}
@@ -278,29 +339,7 @@ pub fn schema_from_yaml_with_language(
serde_yaml::from_str(yaml_input).map_err(|e| format!("Failed to parse YAML: {e}"))?;
let mut schema = crate::schema::Schema::from_language(language);
// Register supertypes
for name in yaml.supertypes.keys() {
schema.register_kind(name);
}
// Register named node kinds and their fields
for (name, fields_opt) in &yaml.named {
schema.register_kind(name);
if let Some(fields) = fields_opt {
for raw_field_name in fields.keys() {
let spec = parse_field_name(raw_field_name);
if let Some(field_name) = &spec.name {
schema.register_field(field_name);
}
}
}
}
// Register unnamed tokens
for name in &yaml.unnamed {
schema.register_unnamed_kind(name);
}
apply_yaml_to_schema(&yaml, &mut schema);
Ok(schema)
}
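
Note on type-reference resolution: the three branches in `resolve_type_ref_pair` reduce to one rule. A bare name is unnamed only when it appears solely in the unnamed set. The sketch below restates that rule and the sort-then-dedup normalization of field types, using `(String, bool)` tuples in place of `NodeType`. `resolve` and `normalize` are hypothetical helper names.

```rust
use std::collections::BTreeSet;

/// Resolve a bare type name to `(name, named)`. The result is unnamed only
/// when the name appears solely in the unnamed set; ambiguous or unknown
/// names are treated as named.
fn resolve(name: &str, named: &BTreeSet<String>, unnamed: &BTreeSet<String>) -> (String, bool) {
    let is_named = !unnamed.contains(name) || named.contains(name);
    (name.to_string(), is_named)
}

/// Sort by kind, then by the named flag, and drop duplicates. `dedup` only
/// removes adjacent duplicates, which is why the sort comes first.
fn normalize(mut types: Vec<(String, bool)>) -> Vec<(String, bool)> {
    types.sort();
    types.dedup();
    types
}
```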


@@ -1,7 +1,13 @@
use std::collections::BTreeMap;
use std::collections::{BTreeMap, BTreeSet};
use crate::{FieldId, KindId, CHILD_FIELD};
#[derive(Clone, Debug)]
pub struct NodeType {
pub kind: String,
pub named: bool,
}
/// A schema defining node kinds and field names for the output AST.
/// Built from a node-types.yml file, independent of any tree-sitter grammar.
///
@@ -25,6 +31,8 @@ pub struct Schema {
unnamed_kind_ids: BTreeMap<String, KindId>,
kind_names: BTreeMap<KindId, &'static str>,
next_kind_id: KindId,
field_types: BTreeMap<(String, FieldId), Vec<NodeType>>,
supertypes: BTreeMap<String, Vec<NodeType>>,
}
impl Default for Schema {
@@ -43,6 +51,8 @@ impl Schema {
unnamed_kind_ids: BTreeMap::new(),
kind_names: BTreeMap::new(),
next_kind_id: 1, // 0 is reserved
field_types: BTreeMap::new(),
supertypes: BTreeMap::new(),
}
}
@@ -166,4 +176,68 @@ impl Schema {
pub fn node_kind_for_id(&self, id: KindId) -> Option<&'static str> {
self.kind_names.get(&id).copied()
}
pub fn set_field_types(
&mut self,
parent_kind: &str,
field_id: FieldId,
node_types: Vec<NodeType>,
) {
self.field_types
.insert((parent_kind.to_string(), field_id), node_types);
}
pub fn field_types(
&self,
parent_kind: &str,
field_id: FieldId,
) -> Option<&Vec<NodeType>> {
self.field_types
.get(&(parent_kind.to_string(), field_id))
}
pub fn set_supertype_members(&mut self, supertype: &str, node_types: Vec<NodeType>) {
self.supertypes.insert(supertype.to_string(), node_types);
}
fn allows_node(
&self,
node_type: &NodeType,
node_kind: &str,
node_named: bool,
active: &mut BTreeSet<String>,
) -> bool {
if node_type.kind == node_kind && node_type.named == node_named {
return true;
}
if !node_type.named {
return false;
}
let Some(members) = self.supertypes.get(&node_type.kind) else {
return false;
};
if !active.insert(node_type.kind.clone()) {
return false;
}
let matched = members
.iter()
.any(|member| self.allows_node(member, node_kind, node_named, active));
active.remove(&node_type.kind);
matched
}
pub fn node_matches_types(
&self,
node_kind: &str,
node_named: bool,
node_types: &[NodeType],
) -> bool {
node_types.iter().any(|node_type| {
self.allows_node(node_type, node_kind, node_named, &mut BTreeSet::new())
})
}
}
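
Note on `allows_node`: the `active` set guards against supertypes that include each other, directly or transitively. Below is a minimal sketch of the same idea with the named/unnamed distinction dropped. `Supertypes`, `allows`, and `example` are hypothetical names, and the cyclic definition exists only to exercise the guard.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical simplified schema: supertype name -> member names.
struct Supertypes(BTreeMap<String, Vec<String>>);

impl Supertypes {
    /// True if `kind` is `allowed` itself or a transitive member of it.
    /// `active` holds the supertypes currently being expanded, so a cyclic
    /// definition terminates instead of recursing forever.
    fn allows(&self, allowed: &str, kind: &str, active: &mut BTreeSet<String>) -> bool {
        if allowed == kind {
            return true;
        }
        let Some(members) = self.0.get(allowed) else {
            return false; // not a supertype, so nothing to expand
        };
        if !active.insert(allowed.to_string()) {
            return false; // already expanding this supertype: cycle
        }
        let matched = members.iter().any(|m| self.allows(m, kind, active));
        active.remove(allowed);
        matched
    }
}

/// A deliberately cyclic definition: `expr` contains `literal`, which in
/// turn lists `expr` as a member.
fn example() -> Supertypes {
    let mut map = BTreeMap::new();
    map.insert("expr".to_string(), vec!["literal".to_string(), "identifier".to_string()]);
    map.insert("literal".to_string(), vec!["integer".to_string(), "expr".to_string()]);
    Supertypes(map)
}
```

With this definition, `example().allows("expr", "integer", &mut BTreeSet::new())` holds, while a lookup for a kind that is not a member of `expr` terminates with `false` despite the cycle.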


@@ -52,6 +52,7 @@ impl Visitor {
root: 0,
schema,
nodes: self.nodes.into_iter().map(|n| n.inner).collect(),
source: Vec::new(),
}
}


@@ -1,6 +1,6 @@
#![cfg(test)]
use yeast::dump::dump_ast;
use yeast::dump::{dump_ast, dump_ast_with_type_errors};
use yeast::*;
const OUTPUT_SCHEMA_YAML: &str = include_str!("node-types.yml");
@@ -15,7 +15,7 @@ fn parse_and_dump(input: &str) -> String {
/// Helper: parse Ruby source with a custom output schema and a single
/// phase of rules, return dump.
fn run_and_dump(input: &str, rules: Vec<Rule>) -> String {
run_phased_and_dump(input, vec![Phase::new("test", rules)])
run_phased_and_dump(input, vec![Phase::new("test", PhaseKind::Repeating, rules)])
}
/// Helper: parse Ruby source with a custom output schema and multiple
@@ -35,13 +35,42 @@ fn run_and_get_error(input: &str, rules: Vec<Rule>) -> String {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let phases = vec![Phase::new("test", rules)];
let phases = vec![Phase::new("test", PhaseKind::Repeating, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
runner
.run(input)
.expect_err("expected runner to return an error")
}
/// Helper: parse Ruby source with no rules and dump with schema type errors.
fn parse_and_dump_typed(input: &str, schema_yaml: &str) -> String {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
let ast = runner.run(input).unwrap();
let schema = yeast::node_types_yaml::schema_from_yaml(schema_yaml).unwrap();
dump_ast_with_type_errors(&ast, ast.get_root(), input, &schema)
}
/// Helper: parse Ruby source with no rules and dump with schema type errors,
/// building schema with language IDs so field checks align with parser fields.
fn parse_and_dump_typed_with_language(input: &str, schema_yaml: &str) -> String {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let runner = Runner::new(lang.clone(), &[]);
let ast = runner.run(input).unwrap();
let schema = yeast::node_types_yaml::schema_from_yaml_with_language(schema_yaml, &lang)
.unwrap();
dump_ast_with_type_errors(&ast, ast.get_root(), input, &schema)
}
/// Helper: parse Ruby source with custom rules and dump with schema type errors.
fn run_and_dump_typed(input: &str, rules: Vec<Rule>, schema_yaml: &str) -> String {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema = yeast::node_types_yaml::schema_from_yaml(schema_yaml).unwrap();
let phases = vec![Phase::new("test", PhaseKind::Repeating, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let ast = runner.run(input).unwrap();
dump_ast_with_type_errors(&ast, ast.get_root(), input, &schema)
}
/// Assert that a dump equals the expected string, treating the expected
/// string as an indented multiline literal: leading/trailing blank lines
/// are stripped, and the common leading indentation is removed from every
@@ -125,6 +154,85 @@ fn test_parse_for_loop() {
);
}
#[test]
fn test_dump_highlights_type_errors_inline() {
let schema_yaml = r#"
named:
program:
$children*: assignment
assignment:
left: identifier
right: identifier
identifier:
"#;
let dump = parse_and_dump_typed("x = 1", schema_yaml);
assert!(dump.contains("integer \"1\" <-- ERROR:"));
}
#[test]
fn test_dump_reports_preserved_unknown_kind_after_transformation() {
let schema_yaml = r#"
named:
program:
$children*: assignment
assignment:
left: identifier
right: identifier
identifier:
"#;
// This rewrite runs and preserves the RHS node kind via capture.
// With the schema above, the preserved `integer` node should be reported inline.
let rules = vec![yeast::rule!(
(assignment left: (_) @left right: (_) @right)
=>
(assignment
left: {left}
right: {right}
)
)];
let dump = run_and_dump_typed("x = 1", rules, schema_yaml);
assert!(dump.contains("integer \"1\" <-- ERROR:"));
assert!(dump.contains("node kind 'integer' not in schema"));
}
#[test]
fn test_dump_reports_undeclared_field_on_node() {
let schema_yaml = r#"
named:
program:
$children*: assignment
assignment:
left: identifier
identifier:
"#;
let dump = parse_and_dump_typed_with_language("x = y", schema_yaml);
assert!(dump.contains("right: identifier \"y\" <-- ERROR:"));
assert!(dump.contains("the node 'assignment' has no field 'right'"));
}
#[test]
fn test_dump_reports_disallowed_kind_in_field_type() {
let schema_yaml = r#"
named:
program:
$children*: assignment
assignment:
left: identifier
right: identifier
identifier:
integer:
"#;
let dump = parse_and_dump_typed_with_language("x = 1", schema_yaml);
assert!(dump.contains("right: integer \"1\" <-- ERROR:"));
assert!(dump.contains("should contain"));
assert!(dump.contains("but got integer"));
}
// ---- Query tests ----
#[test]
@@ -166,6 +274,32 @@ fn test_query_no_match() {
assert!(!matched);
}
#[test]
fn test_reachable_nodes_excludes_orphaned_rewrite_nodes() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema = yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang)
.unwrap();
let phases = vec![Phase::new(
"test",
PhaseKind::Repeating,
vec![yeast::rule!((integer) => (identifier "replaced"))],
)];
let runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
let reachable_ids = ast.reachable_node_ids();
assert!(
ast.nodes().len() > reachable_ids.len(),
"expected rewrite to leave orphaned arena nodes"
);
let dump = dump_ast(&ast, ast.get_root(), input);
assert!(dump.contains("identifier \"replaced\""));
assert!(!dump.contains("integer \"1\""));
}
#[test]
fn test_query_repeated_capture() {
let runner = Runner::new(tree_sitter_ruby::LANGUAGE.into(), &[]);
@@ -653,8 +787,8 @@ fn test_phased_desugaring() {
let dump = run_phased_and_dump(
"x = 1",
vec![
Phase::new("cleanup", cleanup),
Phase::new("desugar", desugar),
Phase::new("cleanup", PhaseKind::Repeating, cleanup),
Phase::new("desugar", PhaseKind::Repeating, desugar),
],
);
assert_dump_eq(
@@ -675,7 +809,11 @@ fn test_phase_error_includes_phase_name() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let phases = vec![Phase::new("buggy", vec![swap_assignment_rule().repeated()])];
let phases = vec![Phase::new(
"buggy",
PhaseKind::Repeating,
vec![swap_assignment_rule().repeated()],
)];
let runner = Runner::with_schema(lang, &schema, &phases);
let err = runner
.run("x = 1")
@@ -690,6 +828,168 @@ fn test_phase_error_includes_phase_name() {
);
}
/// Helper: an exhaustive set of OneShot rules covering every node reachable
/// (via captures) when translating `"x = 1"`.
fn one_shot_xeq1_rules() -> Vec<Rule> {
vec![
yeast::rule!(
(program (_)* @stmts)
=>
(program stmt: {..stmts})
),
yeast::rule!(
(assignment left: (_) @left right: (_) @right)
=>
(first_node left: {left} right: {right})
),
yeast::rule!((identifier) => (identifier "ID")),
yeast::rule!((integer) => (integer "INT")),
]
}
#[test]
fn test_one_shot_phase() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let phases = vec![Phase::new(
"translate",
PhaseKind::OneShot,
one_shot_xeq1_rules(),
)];
let runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
let dump = dump_ast(&ast, ast.get_root(), input);
assert_dump_eq(
&dump,
r#"
program
stmt:
first_node
left: identifier "ID"
right: integer "INT"
"#,
);
}
#[test]
fn test_one_shot_phase_errors_when_no_rule_matches() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
// Drop the `integer` rule so the recursion has no rule for `integer`.
let mut rules = one_shot_xeq1_rules();
rules.pop();
let phases = vec![Phase::new("translate", PhaseKind::OneShot, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let err = runner
.run("x = 1")
.expect_err("expected OneShot to error on unmatched node");
assert!(
err.contains("Phase `translate`"),
"error should name the phase, got: {err}"
);
assert!(
err.contains("no rule matched") && err.contains("integer"),
"error should describe the unmatched node kind, got: {err}"
);
}
/// OneShot recursion must apply rules to *captured* nodes, even if the rule
/// returns a captured child verbatim. A buggy implementation that only
/// recurses into the children of the rule's output (rather than into the
/// captures) would leave the returned capture untransformed.
#[test]
fn test_one_shot_recurses_into_returned_capture() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let rules = vec![
yeast::rule!(
(program (_)* @stmts)
=>
(program stmt: {..stmts})
),
// Returns the captured `left` verbatim, discarding `right`.
yeast::rule!(
(assignment left: (_) @left right: (_) @right)
=>
{left}
),
yeast::rule!((identifier) => (identifier "ID")),
yeast::rule!((integer) => (integer "INT")),
];
let phases = vec![Phase::new("translate", PhaseKind::OneShot, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
let dump = dump_ast(&ast, ast.get_root(), input);
// `left` is an `identifier`; OneShot must apply the identifier rule to
// it before the assignment transform returns it verbatim.
assert_dump_eq(
&dump,
r#"
program
stmt: identifier "ID"
"#,
);
}
/// OneShot recursion must NOT descend into the children of the rule's output.
/// A rule may legitimately wrap a captured node in fresh output-schema nodes
/// that have no matching rule of their own (since rule patterns target the
/// input schema). Recursing into the output would erroneously try to find
/// rules for those wrapper kinds and fail.
#[test]
fn test_one_shot_does_not_recurse_into_wrapper_output() {
let lang: tree_sitter::Language = tree_sitter_ruby::LANGUAGE.into();
let schema =
yeast::node_types_yaml::schema_from_yaml_with_language(OUTPUT_SCHEMA_YAML, &lang).unwrap();
let rules = vec![
yeast::rule!(
(program (_)* @stmts)
=>
(program stmt: {..stmts})
),
// Wraps `left` in nested `first_node`/`second_node` output kinds.
// Neither wrapper kind has a matching rule, so a buggy implementation
// that recurses into the wrapper's children would error.
yeast::rule!(
(assignment left: (_) @left right: (_) @right)
=>
(first_node
left: (second_node left: {left} right: {right})
right: {left}
)
),
yeast::rule!((identifier) => (identifier "ID")),
yeast::rule!((integer) => (integer "INT")),
];
let phases = vec![Phase::new("translate", PhaseKind::OneShot, rules)];
let runner = Runner::with_schema(lang, &schema, &phases);
let input = "x = 1";
let ast = runner.run(input).unwrap();
let dump = dump_ast(&ast, ast.get_root(), input);
assert_dump_eq(
&dump,
r#"
program
stmt:
first_node
left:
second_node
left: identifier "ID"
right: integer "INT"
right: identifier "ID"
"#,
);
}
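The two OneShot tests above pin down one recursion order: translate the captured input nodes first, then build the output and never visit it again. The following is a minimal, self-contained sketch of that order using a toy tree. It is not yeast's implementation; `Node`, `Rule`, `translate`, and `demo_rules` are hypothetical names introduced only for illustration, and every child of a matched node is treated as a capture.

```rust
// Toy model of "OneShot" translation. All names here are hypothetical and
// exist only for this sketch; they are not part of the yeast crate.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    kind: String,
    children: Vec<Node>,
}

fn node(kind: &str, children: Vec<Node>) -> Node {
    Node { kind: kind.to_string(), children }
}

/// A toy rule: matches one input kind and builds output from the
/// already-translated children (standing in for the rule's captures).
struct Rule {
    input_kind: &'static str,
    build: fn(Vec<Node>) -> Node,
}

fn translate(input: &Node, rules: &[Rule]) -> Result<Node, String> {
    // Pick the first rule whose pattern matches; error out if none does.
    let rule = rules
        .iter()
        .find(|r| r.input_kind == input.kind)
        .ok_or_else(|| format!("no rule matched node kind `{}`", input.kind))?;
    // Recurse into the captures (input nodes) before building the output.
    let captures = input
        .children
        .iter()
        .map(|c| translate(c, rules))
        .collect::<Result<Vec<_>, _>>()?;
    // The built output is returned as-is and never re-visited, so wrapper
    // kinds such as `first_node` need no rule of their own.
    Ok((rule.build)(captures))
}

fn demo_rules() -> Vec<Rule> {
    vec![
        Rule {
            input_kind: "assignment",
            build: |caps: Vec<Node>| {
                let left = caps[0].clone();
                node("first_node", vec![node("second_node", caps), left])
            },
        },
        Rule { input_kind: "identifier", build: |_: Vec<Node>| node("identifier_out", vec![]) },
        Rule { input_kind: "integer", build: |_: Vec<Node>| node("integer_out", vec![]) },
    ]
}

fn main() {
    let input = node(
        "assignment",
        vec![node("identifier", vec![]), node("integer", vec![])],
    );
    println!("{:?}", translate(&input, &demo_rules()));

    // Dropping the `integer` rule makes the recursion fail on that capture.
    let mut rules = demo_rules();
    rules.pop();
    println!("{:?}", translate(&input, &rules));
}
```

In this sketch a missing rule for an input kind is an error, while a missing rule for an output-only kind is harmless, which mirrors what the tests assert.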
// ---- Cursor tests ----
#[test]
@@ -760,3 +1060,54 @@ fn test_desugar_for_with_multiple_assignment() {
"#,
);
}
/// Regression test: `#{capture}` in a template must render the *source text*
/// of the captured node, not its arena `Id`. Previously, captures were bound
/// as `usize`, so `#{cap}` printed the integer id (e.g. `"3"`) via `Display`.
/// Captures are now bound as `NodeRef`, which has no `Display` impl and
/// resolves to the captured node's source text via `YeastDisplay`.
#[test]
fn test_hash_brace_renders_capture_source_text() {
let rule = rule!(
(call
method: (identifier) @name
receiver: (identifier) @recv
)
=>
(call
method: (identifier #{name})
receiver: (identifier #{recv})
arguments: (argument_list)
)
);
let dump = run_and_dump("foo.bar()", vec![rule]);
assert_dump_eq(
&dump,
r#"
program
call
arguments: argument_list "foo.bar()"
method: identifier "bar"
receiver: identifier "foo"
"#,
);
}
/// Regression test: non-`NodeRef` values in `#{expr}` still render via their
/// `Display` impl (covered by `YeastDisplay`'s blanket impls for primitives).
#[test]
fn test_hash_brace_renders_integer_expression() {
let rule = rule!(
(identifier) @_
=>
(identifier #{1 + 2})
);
let dump = run_and_dump("foo", vec![rule]);
assert_dump_eq(
&dump,
r#"
program
identifier "3"
"#,
);
}
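The two `#{...}` regression tests above rely on captures and ordinary values being rendered through different code paths. A rough sketch of that idea, assuming a rendering trait with one impl for node references and one for a primitive, might look as follows. The trait name `RenderInTemplate` and the byte-range `NodeRef` are hypothetical stand-ins for this illustration, not the actual `YeastDisplay` or `NodeRef` definitions.

```rust
// Hypothetical stand-in for a captured node: it only remembers which bytes
// of the source text it covers. The real `NodeRef` is not shown in the diff.
#[derive(Clone, Copy)]
struct NodeRef {
    start: usize,
    end: usize,
}

// Hypothetical rendering trait, playing the role the doc comments assign to
// `YeastDisplay`: it decides what `#{value}` turns into.
trait RenderInTemplate {
    fn render(&self, source: &str) -> String;
}

// A node reference renders as the source text it covers, never as an id.
impl RenderInTemplate for NodeRef {
    fn render(&self, source: &str) -> String {
        source[self.start..self.end].to_string()
    }
}

// A primitive renders through its ordinary `Display` output.
impl RenderInTemplate for i32 {
    fn render(&self, _source: &str) -> String {
        self.to_string()
    }
}

fn main() {
    let source = "foo.bar()";
    let recv = NodeRef { start: 0, end: 3 };
    let name = NodeRef { start: 4, end: 7 };
    let three: i32 = 1 + 2;
    println!("{}", recv.render(source));
    println!("{}", name.render(source));
    println!("{}", three.render(source));
}
```

Because `NodeRef` has no `Display` impl in this sketch either, there is no way to print its internal position by accident; the only rendering path goes through the source text.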


@@ -20,10 +20,15 @@ grammar source), run `scripts/regenerate-grammar.sh` to:
it shows the impact of a grammar tweak on the named node kinds, fields,
and child types in a form much easier to read than the raw JSON.
## Testing
- If you changed the extractor code, always rebuild it before running tests.
## Extractor Testing
- To run extractor tests, run `cargo test` in the `extractor` directory.
- To run all tests, run `codeql test run --search-path extractor-pack ql/test`
- Do not edit the printed ASTs in `extractor/test/corpus` directly. To regenerate the ASTs, run `scripts/update-corpus.sh`.
## CodeQL Testing
- If you changed the extractor code, always rebuild it before running CodeQL tests.
- To run all CodeQL tests, run `codeql test run --search-path extractor-pack ql/test`
- Do not edit `.expected` files manually. To update the expected output, pass `--learn` to the `codeql test run` command.


@@ -0,0 +1,144 @@
supertypes:
expr:
- name_expr
- int_literal
- string_literal
- binary_expr
- unary_expr
- call_expr
- member_access_expr
- lambda_expr
- unsupported_node
stmt:
- empty_stmt
- block_stmt
- expr_stmt
- if_stmt
- variable_declaration_stmt
- guard_if_stmt
- unsupported_node
condition:
- expr_condition
- let_pattern_condition
- sequence_condition
- unsupported_node
pattern:
- var_pattern
- apply_pattern
- tuple_pattern
- ignore_pattern
- unsupported_node
named:
# Top-level is the root node, currently containing a list of expressions and statements
top_level:
body*: [expr, stmt]
# An identifier used in the context of an expression
name_expr:
identifier: identifier
# An integer literal
int_literal:
# A string literal
string_literal:
# Application of a binary operator, such as `a + b`
binary_expr:
left: expr
operator: operator
right: expr
# Application of a unary operator, such as `!x`
unary_expr:
operand: expr
operator: operator
# A function or method call, such as `f(x)` or `obj.m(x)`. Method calls
# are represented as a call whose `function` is a `member_access_expr`.
call_expr:
function: expr
argument*: expr
# Member access, such as `obj.member`.
member_access_expr:
target: expr
member: identifier
lambda_expr:
parameter*: parameter
body: [expr, stmt]
# A parameter
parameter:
pattern: pattern
empty_stmt:
block_stmt:
body*: stmt
expr_stmt:
expr: expr
if_stmt:
condition: condition
then?: stmt
else?: stmt
variable_declaration_stmt:
variable_declarator+: variable_declarator
# A variable declaration, or assignment to a pattern.
# The initializer is optional (but typically only possible in combination with a simple variable pattern).
variable_declarator:
pattern: pattern
value?: expr
# Evaluate 'condition', and if false, execute 'else' which must break from the enclosing block scope (return, break, etc).
# Any variables bound by 'condition' will be in scope for the remainder of the enclosing block scope
# (which differs from how if_stmt works).
guard_if_stmt:
condition: condition
else: stmt
# Evaluates the given expression and interprets the result as a boolean (by language conventions)
expr_condition:
expr: expr
# A series of statements that are executed before evaluating the trailing condition.
# Useful for languages where a conditional clause may be preceded by side-effecting
# syntactic elements (e.g. binding clauses) that don't themselves form a condition.
sequence_condition:
stmt*: stmt
condition: condition
# Evaluate 'value' and match its result against 'pattern', and return true if it matches.
# Variables bound by the pattern will be in scope within the 'true' branch controlled by this condition.
let_pattern_condition:
pattern: pattern
value: expr
# A pattern matching anything, binding its value to the given variable
var_pattern:
identifier: identifier
# A pattern matching anything, binding no variables, usually using the syntax "_"
ignore_pattern:
# A pattern such as `Some(x)` where `Some` is the constructor and `x` is an argument
apply_pattern:
constructor: expr
argument*: pattern
# A tuple pattern such as `(a, b)` in `let (a, b) = pair`.
tuple_pattern:
element*: pattern
# A simple unqualified identifier token
identifier:
# A node that we don't yet translate
unsupported_node:
operator:
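The schema comment on `call_expr` says that a method call such as `obj.m(x)` is a call whose `function` is a `member_access_expr`. As a hedged illustration of that shape, here is a small Rust model of three of the expression kinds. The `Expr` enum and the helpers `method_call` and `is_method_call` are hypothetical and are not generated from `ast_types.yml`.

```rust
// Hypothetical in-memory model of three `expr` kinds from the schema above.
#[derive(Debug, PartialEq)]
enum Expr {
    // name_expr: identifier
    Name(String),
    // member_access_expr: target + member
    MemberAccess { target: Box<Expr>, member: String },
    // call_expr: function + argument*
    Call { function: Box<Expr>, arguments: Vec<Expr> },
}

/// Builds `target.member(arguments...)` as a call on a member access.
fn method_call(target: Expr, member: &str, arguments: Vec<Expr>) -> Expr {
    Expr::Call {
        function: Box::new(Expr::MemberAccess {
            target: Box::new(target),
            member: member.to_string(),
        }),
        arguments,
    }
}

/// A call counts as a method call when its callee is a member access.
fn is_method_call(e: &Expr) -> bool {
    matches!(
        e,
        Expr::Call { function, .. } if matches!(**function, Expr::MemberAccess { .. })
    )
}

fn main() {
    // obj.m(x)
    let call = method_call(
        Expr::Name("obj".to_string()),
        "m",
        vec![Expr::Name("x".to_string())],
    );
    println!("{:?} is_method_call={}", call, is_method_call(&call));
}
```

Under this representation no separate method-call node is needed; consumers distinguish `f(x)` from `obj.m(x)` by inspecting the callee.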


@@ -3,9 +3,7 @@ use std::path::PathBuf;
use codeql_extractor::extractor::simple;
use codeql_extractor::trap;
#[path = "languages/swift/swift.rs"]
mod swift;
use crate::languages;
#[derive(Args)]
pub struct Options {
@@ -25,11 +23,17 @@ pub struct Options {
pub fn run(options: Options) -> std::io::Result<()> {
codeql_extractor::extractor::set_tracing_level("unified");
// The generated dbscheme/QL library uses the unified_* relation namespace.
// Keep per-language specs for parser/rules/file globs, but normalize the
// extraction table prefix so emitted TRAP relations match the dbscheme.
let mut languages = languages::all_language_specs();
for lang in &mut languages {
lang.prefix = "unified";
}
let extractor = simple::Extractor {
prefix: "unified".to_string(),
languages: vec![
swift::language_spec(),
],
languages,
trap_dir: options.output_dir,
trap_compression: trap::Compression::from_env("CODEQL_EXTRACTOR_UNIFIED_OPTION_TRAP_COMPRESSION"),
source_archive_dir: options.source_archive_dir,


@@ -3,6 +3,8 @@ use std::path::PathBuf;
use codeql_extractor::generator::{generate, language::Language};
use crate::languages;
#[derive(Args)]
pub struct Options {
/// Path of the generated dbscheme file
@@ -17,10 +19,16 @@ pub struct Options {
pub fn run(options: Options) -> std::io::Result<()> {
codeql_extractor::extractor::set_tracing_level("unified");
// The QL-visible schema is the unified output AST, not the per-language
// input grammars. Pass it via `desugar.output_node_types_yaml` so the
// generator converts the YAML to JSON node-types.
let desugar = yeast::DesugaringConfig::new()
.with_output_node_types_yaml(languages::OUTPUT_AST_SCHEMA);
let languages = vec![Language {
name: "Swift".to_owned(),
node_types: tree_sitter_swift::NODE_TYPES,
desugar: None,
name: "Unified".to_owned(),
node_types: "", // unused: generator picks up output_node_types_yaml above
desugar: Some(desugar),
}];
generate(languages, options.dbscheme, options.library, "run unified/scripts/create-extractor-pack.sh")


@@ -0,0 +1,11 @@
use codeql_extractor::extractor::simple;
#[path = "swift/swift.rs"]
mod swift;
/// Shared YEAST output AST schema for all languages.
pub(crate) const OUTPUT_AST_SCHEMA: &str = include_str!("../../ast_types.yml");
pub fn all_language_specs() -> Vec<simple::LanguageSpec> {
vec![swift::language_spec(OUTPUT_AST_SCHEMA)]
}


@@ -1,18 +1,358 @@
use codeql_extractor::extractor::simple;
use yeast::{rule, DesugaringConfig};
use yeast::{build::BuildCtx, rule, DesugaringConfig, PhaseKind};
fn desugaring_rules() -> Vec<yeast::Rule> {
/// Names of output AST kinds that belong to the `expr` supertype. Kept in
/// sync with `ast_types.yml`. `unsupported_node` is intentionally omitted
/// because it is also a member of the `stmt` supertype.
const EXPR_KINDS: &[&str] = &[
"name_expr",
"int_literal",
"string_literal",
"binary_expr",
"unary_expr",
"call_expr",
"member_access_expr",
"lambda_expr",
];
/// If `id` is an `expr`, wrap it in `expr_stmt` so it can sit in a `stmt`
/// position; otherwise return it unchanged.
fn wrap_expr_in_stmt(ctx: &mut BuildCtx, id: usize) -> usize {
let kind = ctx.ast.get_node(id).map(|n| n.kind()).unwrap_or("");
if EXPR_KINDS.contains(&kind) {
yeast::tree!(ctx, (expr_stmt expr: {id}))
} else {
id
}
}
fn translation_rules() -> Vec<yeast::Rule> {
vec![
rule!(
(additive_expression)
(source_file (_)* @children)
=>
(simple_identifier "blah")
(top_level
body: {..children}
)
),
// ---- Binary expressions ----
// Swift's parser produces a different node kind for each operator
// family, but the field shape (`lhs` / `op` / `rhs`) is uniform, so
// each maps onto `binary_expr`.
rule!(
(additive_expression
lhs: (_) @left
op: _ @operator
rhs: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
rule!(
(multiplicative_expression
lhs: (_) @left
op: _ @operator
rhs: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
rule!(
(comparison_expression
lhs: (_) @left
op: _ @operator
rhs: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
rule!(
(equality_expression
lhs: (_) @left
op: _ @operator
rhs: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
rule!(
(conjunction_expression
lhs: (_) @left
op: _ @operator
rhs: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
rule!(
(disjunction_expression
lhs: (_) @left
op: _ @operator
rhs: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
rule!(
(nil_coalescing_expression
lhs: (_) @left
op: _ @operator
rhs: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
rule!(
(range_expression
start: (_) @left
op: _ @operator
end: (_) @right)
=>
(binary_expr
left: {left}
operator: (operator #{operator})
right: {right})
),
// ---- Unary expressions ----
rule!(
(prefix_expression
operation: _ @operator
target: (_) @operand)
=>
(unary_expr
operand: {operand}
operator: (operator #{operator}))
),
// ---- Identifiers / name expressions ----
rule!(
(simple_identifier) @name
=>
(name_expr
identifier: (identifier #{name}))
),
// ---- Literals ----
rule!(
(integer_literal) @lit
=>
(int_literal #{lit})
),
// String literals: render the *raw* source text, including the
// surrounding quotes. Interpolations (e.g. `"hi \(x)"`) are not
// yet broken out into structured pieces; they show up as part
// of the literal's source text.
rule!(
(line_string_literal) @lit
=>
(string_literal #{lit})
),
// ---- Lambdas / closures ----
// Map a `lambda_literal` whose body is a single statement to
// `lambda_expr`. Multi-statement bodies fall through to
// `unsupported_node` because `lambda_expr.body` is single-valued
// in the current `ast_types.yml`. Parameters from explicit-typed
// closures (`{ (x: Int) -> Int in ... }`) are not yet captured.
rule!(
(lambda_literal
(statements (_) @body))
=>
(lambda_expr
body: {body})
),
// ---- Block / statement wrapping ----
// A `(statements ...)` node corresponds to a brace-delimited block.
// Each child is mapped through translation; bare expression results
// get wrapped in `expr_stmt` so they fit the `body*: stmt` field.
rule!(
(statements (_)* @stmts)
=>
(block_stmt body: {..stmts.iter().copied().map(|n|
wrap_expr_in_stmt(&mut __yeast_ctx, n.into())
).collect::<Vec<usize>>()})
),
// ---- Calls and member access ----
// Member access, e.g. `obj.member`. The Swift parser wraps the
// member name as `(navigation_suffix suffix: (simple_identifier))`.
rule!(
(navigation_expression
target: (_) @target
suffix: (navigation_suffix
suffix: (simple_identifier) @member))
=>
(member_access_expr
target: {target}
member: (identifier #{member}))
),
// Function / method call. The callee is the first child of
// `call_expression`; the second is a `call_suffix` whose
// `value_arguments` (if present) hold the parenthesized args. A
// trailing closure (`call_suffix` with a `lambda_literal` child)
// is appended as a final argument.
rule!(
(call_expression
(_) @callee
(call_suffix
(value_arguments
(value_argument value: (_) @args)*)?
(lambda_literal)? @trailing))
=>
(call_expr
function: {callee}
argument: {..args}
argument: {..trailing}
)
),
// ---- Guard statement ----
// `guard let x = e else { ... }` — currently only handles the
// let-binding form. The Swift parser models the `let` keyword as a
// `value_binding_pattern` child of `condition`, followed by an
// unnamed `=` and the source expression.
rule!(
(guard_statement
bound_identifier: (simple_identifier) @id
condition: (value_binding_pattern)
condition: (_) @value
(else)
(statements) @else_branch)
=>
(guard_if_stmt
condition: (let_pattern_condition
pattern: (var_pattern identifier: (identifier #{id}))
value: {value})
else: {else_branch})
),
// ---- If statement ----
// if-let binding (with optional else branch). The Swift parser puts
// the bound name in `bound_identifier`, the `let` keyword as a
// `value_binding_pattern` child of `condition`, and the source
// expression as a separate child of `condition`.
rule!(
(if_statement
bound_identifier: (simple_identifier) @id
condition: (value_binding_pattern)
condition: (_) @value
(statements) @then
(else)
(_) @else_branch)
=>
(if_stmt
condition: (let_pattern_condition
pattern: (var_pattern identifier: (identifier #{id}))
value: {value})
then: {then}
else: {else_branch})
),
rule!(
(if_statement
bound_identifier: (simple_identifier) @id
condition: (value_binding_pattern)
condition: (_) @value
(statements) @then)
=>
(if_stmt
condition: (let_pattern_condition
pattern: (var_pattern identifier: (identifier #{id}))
value: {value})
then: {then})
),
// With explicit else branch (block or chained if).
rule!(
(if_statement
condition: (_) @cond
(statements) @then
(else)
(_) @else_branch)
=>
(if_stmt
condition: (expr_condition expr: {cond})
then: {then}
else: {else_branch})
),
// Without else branch.
rule!(
(if_statement
condition: (_) @cond
(statements) @then)
=>
(if_stmt
condition: (expr_condition expr: {cond})
then: {then})
),
// ---- Patterns ----
// The Swift parser uses a `pattern` node with a `bound_identifier`
// field for simple bindings such as `let x = ...`.
rule!(
(pattern bound_identifier: (simple_identifier) @id)
=>
(var_pattern
identifier: (identifier #{id}))
),
// Inside tuple patterns, the inner `pattern` node holds a bare
// `simple_identifier` (with no `bound_identifier` field).
rule!(
(pattern (simple_identifier) @id)
=>
(var_pattern
identifier: (identifier #{id}))
),
// Tuple destructuring pattern, e.g. `let (a, b) = pair`. The parser
// emits a `pattern` node whose unnamed children are themselves
// `pattern` nodes.
rule!(
(pattern (pattern)+ @parts)
=>
(tuple_pattern element: {..parts})
),
// ---- Variable declarations ----
// Handles single (`let x = e`), multiple (`let x = 1, y = 2`),
// and uninitialized (`var x: T`) bindings.
rule!(
(property_declaration
name: (_)* @pats
value: (_)* @vals)
=>
(variable_declaration_stmt
variable_declarator: {..pats.iter().enumerate().map(|(i, &pat)| {
match vals.get(i).copied() {
Some(val) => yeast::tree!(
(variable_declarator
pattern: {pat}
value: {val})),
None => yeast::tree!(
(variable_declarator
pattern: {pat})),
}
})})
),
// ---- Fallbacks ----
rule!(
(_)
=>
(unsupported_node)
),
rule!(
_ @node
=>
{node}
),
]
}
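The variable-declaration rule above pairs each captured pattern with the value at the same index and omits the value when there is none. The pairing logic on its own can be sketched as follows, with plain strings standing in for node ids. `Declarator` and `pair_declarators` are hypothetical names used only in this example.

```rust
// Hypothetical stand-in for a `variable_declarator` output node; strings
// replace the node ids used by the real rule.
#[derive(Debug, PartialEq)]
struct Declarator {
    pattern: String,
    value: Option<String>,
}

/// Pairs patterns with values by position. A pattern with no value at the
/// same index becomes a declarator without an initializer.
fn pair_declarators(pats: &[&str], vals: &[&str]) -> Vec<Declarator> {
    pats.iter()
        .enumerate()
        .map(|(i, pat)| Declarator {
            pattern: pat.to_string(),
            value: vals.get(i).map(|v| v.to_string()),
        })
        .collect()
}

fn main() {
    // Models `let x = 1, y = 2`.
    println!("{:?}", pair_declarators(&["x", "y"], &["1", "2"]));
    // Models `var x: T` (no initializer).
    println!("{:?}", pair_declarators(&["x"], &[]));
}
```

One consequence of positional pairing is that it is only correct when every binding before an initialized one is itself initialized; a list in which an earlier binding has no value would shift later values onto the wrong pattern. Whether the parser can produce that shape is not shown in this diff.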
pub fn language_spec() -> simple::LanguageSpec {
let desugar = DesugaringConfig::new().add_phase("desugar", desugaring_rules());
pub fn language_spec(desugared_ast_schema: &'static str) -> simple::LanguageSpec {
let desugar = DesugaringConfig::new()
.add_phase("translate", PhaseKind::OneShot, translation_rules())
.with_output_node_types_yaml(desugared_ast_schema);
simple::LanguageSpec {
prefix: "swift",
ts_language: tree_sitter_swift::LANGUAGE.into(),


@@ -3,6 +3,7 @@ use clap::Parser;
mod autobuilder;
mod extractor;
mod generator;
mod languages;
#[derive(Parser)]
#[command(author, version, about)]


@@ -0,0 +1,238 @@
===
Closure with explicit parameters
===
let f = { (x: Int) -> Int in x * 2 }
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "f"
value:
lambda_literal
statement:
multiplicative_expression
lhs: simple_identifier "x"
op: *
rhs: integer_literal "2"
type:
lambda_function_type
params:
lambda_function_type_parameters
parameter:
lambda_parameter
name: simple_identifier "x"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
return_type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
---
top_level
body:
===
Closure with shorthand parameters
===
let f = { $0 + $1 }
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "f"
value:
lambda_literal
statement:
additive_expression
lhs: simple_identifier "$0"
op: +
rhs: simple_identifier "$1"
---
top_level
body:
===
Trailing closure
===
xs.map { $0 * 2 }
---
source_file
statement:
call_expression
function:
navigation_expression
suffix:
navigation_suffix
suffix: simple_identifier "map"
target: simple_identifier "xs"
suffix:
call_suffix
lambda:
lambda_literal
statement:
multiplicative_expression
lhs: simple_identifier "$0"
op: *
rhs: integer_literal "2"
---
top_level
body:
===
Closure with capture list
===
let f = { [weak self] in self?.doThing() }
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "f"
value:
lambda_literal
captures:
capture_list
item:
capture_list_item
name: simple_identifier "self"
ownership:
ownership_modifier
statement:
call_expression
function:
navigation_expression
suffix:
navigation_suffix
suffix: simple_identifier "doThing"
target:
optional_chain_marker
expr:
self_expression
suffix:
call_suffix
arguments:
value_arguments
---
top_level
body:
===
Multi-statement closure
===
let f = { (x: Int) -> Int in
let y = x + 1
return y * 2
}
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "f"
value:
lambda_literal
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "y"
value:
additive_expression
lhs: simple_identifier "x"
op: +
rhs: integer_literal "1"
control_transfer_statement
kind: return
result:
multiplicative_expression
lhs: simple_identifier "y"
op: *
rhs: integer_literal "2"
type:
lambda_function_type
params:
lambda_function_type_parameters
parameter:
lambda_parameter
name: simple_identifier "x"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
return_type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
---
top_level
body:


@@ -0,0 +1,311 @@
===
Array literal
===
let xs = [1, 2, 3]
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "xs"
value:
array_literal
element:
integer_literal "1"
integer_literal "2"
integer_literal "3"
---
top_level
body:
===
Empty array literal with type
===
let xs: [Int] = []
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "xs"
type:
type_annotation
type:
type
name:
array_type
element:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
value:
array_literal
---
top_level
body:
===
Dictionary literal
===
let d = ["a": 1, "b": 2]
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "d"
value:
dictionary_literal
element:
dictionary_literal_item
key:
line_string_literal
text: line_str_text "a"
value: integer_literal "1"
dictionary_literal_item
key:
line_string_literal
text: line_str_text "b"
value: integer_literal "2"
---
top_level
body:
===
Set literal
===
let s: Set<Int> = [1, 2, 3]
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "s"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
arguments:
type_arguments
argument:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
name: type_identifier "Set"
value:
array_literal
element:
integer_literal "1"
integer_literal "2"
integer_literal "3"
---
top_level
body:
===
Tuple literal
===
let t = (1, "two", 3.0)
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "t"
value:
tuple_expression
element:
tuple_expression_item
value: integer_literal "1"
tuple_expression_item
value:
line_string_literal
text: line_str_text "two"
tuple_expression_item
value: real_literal "3.0"
---
top_level
body:
===
Subscript access
===
// TODO: tree-sitter-swift parses `xs[0]` as a call_expression (same shape
// as `xs(0)`), so the mapping currently produces a call_expr. Update the
// parser / add a separate subscript_expr node and remap when fixed.
let first = xs[0]
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "first"
value:
call_expression
function: simple_identifier "xs"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "0"
comment "// TODO: tree-sitter-swift parses `xs[0]` as a call_expression (same shape"
comment "// as `xs(0)`), so the mapping currently produces a call_expr. Update the"
comment "// parser / add a separate subscript_expr node and remap when fixed."
---
top_level
body:
unsupported_node "// TODO: tree-sitter-swift parses `xs[0]` as a call_expression (same shape"
unsupported_node "// as `xs(0)`), so the mapping currently produces a call_expr. Update the"
unsupported_node "// parser / add a separate subscript_expr node and remap when fixed."
===
Dictionary subscript
===
// TODO: same parser issue as the array subscript case above —
// `d["key"]` is parsed as `call_expression(d, ("key"))`.
let v = d["key"]
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "v"
value:
call_expression
function: simple_identifier "d"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value:
line_string_literal
text: line_str_text "key"
comment "// TODO: same parser issue as the array subscript case above —"
comment "// `d[\"key\"]` is parsed as `call_expression(d, (\"key\"))`."
---
top_level
body:
unsupported_node "// TODO: same parser issue as the array subscript case above —"
unsupported_node "// `d[\"key\"]` is parsed as `call_expression(d, (\"key\"))`."
===
Tuple member access
===
let n = t.0
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "n"
value:
navigation_expression
suffix:
navigation_suffix
suffix: integer_literal "0"
target: simple_identifier "t"
---
top_level
body:


@@ -0,0 +1,447 @@
===
If statement
===
if x > 0 {
print(x)
}
---
source_file
statement:
if_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "x"
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "0"
---
top_level
body:
===
If-else
===
if x > 0 {
print(x)
} else {
print(-x)
}
---
source_file
statement:
if_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "x"
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "0"
else_branch:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value:
prefix_expression
operation: -
target: simple_identifier "x"
---
top_level
body:
===
If-else-if chain
===
if x > 0 {
print(1)
} else if x < 0 {
print(2)
} else {
print(3)
}
---
source_file
statement:
if_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "1"
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "0"
else_branch:
if_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "2"
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: <
rhs: integer_literal "0"
else_branch:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "3"
---
top_level
body:
===
If-let optional binding
===
if let value = optional {
print(value)
}
---
source_file
statement:
if_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "value"
condition:
if_condition
kind:
if_let_binding
pattern:
pattern
binding:
value_binding_pattern
mutability: let
bound_identifier: simple_identifier "value"
value: simple_identifier "optional"
---
top_level
body:
===
Guard let
===
guard let value = optional else { return }
---
source_file
statement:
guard_statement
body:
block
statement:
control_transfer_statement
kind: return
condition:
if_condition
kind:
if_let_binding
pattern:
pattern
binding:
value_binding_pattern
mutability: let
bound_identifier: simple_identifier "value"
value: simple_identifier "optional"
---
top_level
body:
===
Ternary expression
===
let y = x > 0 ? 1 : -1
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "y"
value:
ternary_expression
condition:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "0"
if_false:
prefix_expression
operation: -
target: integer_literal "1"
if_true: integer_literal "1"
---
top_level
body:
===
Switch statement
===
switch x {
case 1:
print("one")
case 2, 3:
print("two or three")
default:
print("other")
}
---
source_file
statement:
switch_statement
entry:
switch_entry
pattern:
switch_pattern
pattern:
pattern
kind: integer_literal "1"
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value:
line_string_literal
text: line_str_text "one"
switch_entry
pattern:
switch_pattern
pattern:
pattern
kind: integer_literal "2"
switch_pattern
pattern:
pattern
kind: integer_literal "3"
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value:
line_string_literal
text: line_str_text "two or three"
switch_entry
default: default_keyword "default"
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value:
line_string_literal
text: line_str_text "other"
expr: simple_identifier "x"
---
top_level
body:
===
Switch with binding pattern
===
switch shape {
case .circle(let r):
print(r)
case .square(let s):
print(s)
}
---
source_file
statement:
switch_statement
entry:
switch_entry
pattern:
switch_pattern
pattern:
pattern
kind:
case_pattern
arguments:
tuple_pattern
item:
tuple_pattern_item
pattern:
pattern
kind:
binding_pattern
binding:
value_binding_pattern
mutability: let
pattern:
pattern
bound_identifier: simple_identifier "r"
name: simple_identifier "circle"
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "r"
switch_entry
pattern:
switch_pattern
pattern:
pattern
kind:
case_pattern
arguments:
tuple_pattern
item:
tuple_pattern_item
pattern:
pattern
kind:
binding_pattern
binding:
value_binding_pattern
mutability: let
pattern:
pattern
bound_identifier: simple_identifier "s"
name: simple_identifier "square"
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "s"
expr: simple_identifier "shape"
---
top_level
body:


@@ -0,0 +1,39 @@
===
Additive expression is desugared
===
1 + 2
---
source_file
statement:
additive_expression
lhs: integer_literal "1"
op: +
rhs: integer_literal "2"
---
top_level
body:
===
Another additive expression is desugared
===
foo + bar
---
source_file
statement:
additive_expression
lhs: simple_identifier "foo"
op: +
rhs: simple_identifier "bar"
---
top_level
body:

@@ -0,0 +1,389 @@
===
Function with no parameters
===
func greet() {
print("hello")
}
---
source_file
statement:
function_declaration
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value:
line_string_literal
text: line_str_text "hello"
name: simple_identifier "greet"
---
top_level
body:
===
Function with parameters and return type
===
func add(_ a: Int, _ b: Int) -> Int {
return a + b
}
---
source_file
statement:
function_declaration
body:
block
statement:
control_transfer_statement
kind: return
result:
additive_expression
lhs: simple_identifier "a"
op: +
rhs: simple_identifier "b"
name: simple_identifier "add"
parameter:
function_parameter
parameter:
parameter
external_name: simple_identifier "_"
name: simple_identifier "a"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
function_parameter
parameter:
parameter
external_name: simple_identifier "_"
name: simple_identifier "b"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
return_type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
---
top_level
body:
===
Function with named parameters
===
func greet(person name: String) {
print(name)
}
---
source_file
statement:
function_declaration
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "name"
name: simple_identifier "greet"
parameter:
function_parameter
parameter:
parameter
external_name: simple_identifier "person"
name: simple_identifier "name"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "String"
---
top_level
body:
===
Function with default parameter value
===
func greet(name: String = "world") {
print(name)
}
---
source_file
statement:
function_declaration
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "name"
name: simple_identifier "greet"
parameter:
function_parameter
default_value:
line_string_literal
text: line_str_text "world"
parameter:
parameter
name: simple_identifier "name"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "String"
---
top_level
body:
===
Variadic function
===
func sum(_ values: Int...) -> Int {
return values.reduce(0, +)
}
---
source_file
statement:
function_declaration
body:
block
statement:
control_transfer_statement
kind: return
result:
call_expression
function:
navigation_expression
suffix:
navigation_suffix
suffix: simple_identifier "reduce"
target: simple_identifier "values"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "0"
value_argument
value:
referenceable_operator
operator: +
name: simple_identifier "sum"
parameter:
function_parameter
parameter:
parameter
external_name: simple_identifier "_"
name: simple_identifier "values"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
return_type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
---
top_level
body:
===
Function call
===
foo(1, 2)
---
source_file
statement:
call_expression
function: simple_identifier "foo"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "1"
value_argument
value: integer_literal "2"
---
top_level
body:
===
Function call with labelled arguments
===
greet(person: "Bob")
---
source_file
statement:
call_expression
function: simple_identifier "greet"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
name:
value_argument_label
name: simple_identifier "person"
value:
line_string_literal
text: line_str_text "Bob"
---
top_level
body:
===
Method call
===
list.append(1)
---
source_file
statement:
call_expression
function:
navigation_expression
suffix:
navigation_suffix
suffix: simple_identifier "append"
target: simple_identifier "list"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: integer_literal "1"
---
top_level
body:
===
Generic function
===
func identity<T>(_ x: T) -> T {
return x
}
---
source_file
statement:
function_declaration
body:
block
statement:
control_transfer_statement
kind: return
result: simple_identifier "x"
name: simple_identifier "identity"
parameter:
function_parameter
parameter:
parameter
external_name: simple_identifier "_"
name: simple_identifier "x"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "T"
return_type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "T"
type_parameters:
type_parameters
parameter:
type_parameter
name: type_identifier "T"
---
top_level
body:

@@ -0,0 +1,124 @@
===
Integer literal
===
42
---
source_file
statement: integer_literal "42"
---
top_level
body:
===
Negative integer literal
===
-7
---
source_file
statement:
prefix_expression
operation: -
target: integer_literal "7"
---
top_level
body:
===
Floating-point literal
===
3.14
---
source_file
statement: real_literal "3.14"
---
top_level
body:
===
Boolean literals
===
true
false
---
source_file
statement:
boolean_literal
boolean_literal
---
top_level
body:
===
Nil literal
===
nil
---
source_file
statement: nil
---
top_level
body:
===
String literal
===
"hello"
---
source_file
statement:
line_string_literal
text: line_str_text "hello"
---
top_level
body:
===
String with interpolation
===
"hello \(name)"
---
source_file
statement:
line_string_literal
interpolation:
interpolated_expression
value: simple_identifier "name"
text: line_str_text "hello "
---
top_level
body:

@@ -0,0 +1,254 @@
===
For-in over array literal
===
for x in [1, 2, 3] {
print(x)
}
---
source_file
statement:
for_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "x"
collection:
array_literal
element:
integer_literal "1"
integer_literal "2"
integer_literal "3"
item:
pattern
bound_identifier: simple_identifier "x"
---
top_level
body:
===
For-in over range
===
for i in 0..<10 {
print(i)
}
---
source_file
statement:
for_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "i"
collection:
range_expression
end: integer_literal "10"
op: ..<
start: integer_literal "0"
item:
pattern
bound_identifier: simple_identifier "i"
---
top_level
body:
===
For-in with where clause
===
for x in xs where x > 0 {
print(x)
}
---
source_file
statement:
for_statement
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "x"
collection: simple_identifier "xs"
item:
pattern
bound_identifier: simple_identifier "x"
where:
where_clause
expr:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "0"
keyword: where_keyword "where"
---
top_level
body:
===
While loop
===
while x > 0 {
x -= 1
}
---
source_file
statement:
while_statement
body:
block
statement:
assignment
operator: -=
result: integer_literal "1"
target:
directly_assignable_expression
expr: simple_identifier "x"
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "0"
---
top_level
body:
===
Repeat-while loop
===
repeat {
x -= 1
} while x > 0
---
source_file
statement:
repeat_while_statement
body:
block
statement:
assignment
operator: -=
result: integer_literal "1"
target:
directly_assignable_expression
expr: simple_identifier "x"
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "0"
---
top_level
body:
===
Break and continue
===
for x in xs {
if x < 0 { continue }
if x > 100 { break }
print(x)
}
---
source_file
statement:
for_statement
body:
block
statement:
if_statement
body:
block
statement:
control_transfer_statement
kind: continue
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: <
rhs: integer_literal "0"
if_statement
body:
block
statement:
control_transfer_statement
kind: break
condition:
if_condition
kind:
comparison_expression
lhs: simple_identifier "x"
op: >
rhs: integer_literal "100"
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "x"
collection: simple_identifier "xs"
item:
pattern
bound_identifier: simple_identifier "x"
---
top_level
body:

@@ -0,0 +1,250 @@
===
Addition
===
a + b
---
source_file
statement:
additive_expression
lhs: simple_identifier "a"
op: +
rhs: simple_identifier "b"
---
top_level
body:
===
Subtraction
===
a - b
---
source_file
statement:
additive_expression
lhs: simple_identifier "a"
op: -
rhs: simple_identifier "b"
---
top_level
body:
===
Multiplication
===
a * b
---
source_file
statement:
multiplicative_expression
lhs: simple_identifier "a"
op: *
rhs: simple_identifier "b"
---
top_level
body:
===
Division
===
a / b
---
source_file
statement:
multiplicative_expression
lhs: simple_identifier "a"
op: /
rhs: simple_identifier "b"
---
top_level
body:
===
Operator precedence: addition and multiplication
===
a + b * c
---
source_file
statement:
additive_expression
lhs: simple_identifier "a"
op: +
rhs:
multiplicative_expression
lhs: simple_identifier "b"
op: *
rhs: simple_identifier "c"
---
top_level
body:
===
Parenthesised expression
===
(a + b) * c
---
source_file
statement:
multiplicative_expression
lhs:
tuple_expression
element:
tuple_expression_item
value:
additive_expression
lhs: simple_identifier "a"
op: +
rhs: simple_identifier "b"
op: *
rhs: simple_identifier "c"
---
top_level
body:
===
Comparison
===
a < b
---
source_file
statement:
comparison_expression
lhs: simple_identifier "a"
op: <
rhs: simple_identifier "b"
---
top_level
body:
===
Equality
===
a == b
---
source_file
statement:
equality_expression
lhs: simple_identifier "a"
op: ==
rhs: simple_identifier "b"
---
top_level
body:
===
Logical and
===
a && b
---
source_file
statement:
conjunction_expression
lhs: simple_identifier "a"
op: &&
rhs: simple_identifier "b"
---
top_level
body:
===
Logical or
===
a || b
---
source_file
statement:
disjunction_expression
lhs: simple_identifier "a"
op: ||
rhs: simple_identifier "b"
---
top_level
body:
===
Logical not
===
!a
---
source_file
statement:
prefix_expression
operation: bang "!"
target: simple_identifier "a"
---
top_level
body:
===
Range operator
===
1...10
---
source_file
statement:
range_expression
end: integer_literal "10"
op: ...
start: integer_literal "1"
---
top_level
body:

@@ -0,0 +1,290 @@
===
Optional type annotation
===
let x: Int? = nil
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
type:
type_annotation
type:
type
name:
optional_type
wrapped:
user_type
part:
simple_user_type
name: type_identifier "Int"
value: nil
---
top_level
body:
===
Optional chaining
===
let n = obj?.foo?.bar
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "n"
value:
navigation_expression
suffix:
navigation_suffix
suffix: simple_identifier "bar"
target:
optional_chain_marker
expr:
navigation_expression
suffix:
navigation_suffix
suffix: simple_identifier "foo"
target:
optional_chain_marker
expr: simple_identifier "obj"
---
top_level
body:
===
Force unwrap
===
let n = opt!
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "n"
value:
postfix_expression
operation: bang "!"
target: simple_identifier "opt"
---
top_level
body:
===
Nil-coalescing
===
let n = opt ?? 0
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "n"
value:
nil_coalescing_expression
if_nil: integer_literal "0"
value: simple_identifier "opt"
---
top_level
body:
===
Throwing function
===
func read() throws -> String {
return ""
}
---
source_file
statement:
function_declaration
body:
block
statement:
control_transfer_statement
kind: return
result:
line_string_literal
name: simple_identifier "read"
return_type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "String"
throws: throws "throws"
---
top_level
body:
===
Do-catch
===
do {
try foo()
} catch {
print(error)
}
---
source_file
statement:
do_statement
body:
block
statement:
try_expression
expr:
call_expression
function: simple_identifier "foo"
suffix:
call_suffix
arguments:
value_arguments
operator:
try_operator
catch:
catch_block
body:
block
statement:
call_expression
function: simple_identifier "print"
suffix:
call_suffix
arguments:
value_arguments
argument:
value_argument
value: simple_identifier "error"
keyword: catch_keyword "catch"
---
top_level
body:
===
Try? expression
===
let result = try? foo()
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "result"
value:
try_expression
expr:
call_expression
function: simple_identifier "foo"
suffix:
call_suffix
arguments:
value_arguments
operator:
try_operator
---
top_level
body:
===
Try! expression
===
let result = try! foo()
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "result"
value:
try_expression
expr:
call_expression
function: simple_identifier "foo"
suffix:
call_suffix
arguments:
value_arguments
operator:
try_operator
---
top_level
body:

@@ -0,0 +1,641 @@
===
Empty class
===
class Foo {}
---
source_file
statement:
class_declaration
body:
class_body
declaration_kind: class
name: type_identifier "Foo"
---
top_level
body:
===
Class with stored properties
===
class Point {
var x: Int
var y: Int
}
---
source_file
statement:
class_declaration
body:
class_body
member:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "y"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
declaration_kind: class
name: type_identifier "Point"
---
top_level
body:
===
Class with initializer
===
class Point {
var x: Int
init(x: Int) {
self.x = x
}
}
---
source_file
statement:
class_declaration
body:
class_body
member:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
init_declaration
body:
block
statement:
assignment
operator: =
result: simple_identifier "x"
target:
directly_assignable_expression
expr:
navigation_expression
suffix:
navigation_suffix
suffix: simple_identifier "x"
target:
self_expression
parameter:
function_parameter
parameter:
parameter
name: simple_identifier "x"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
declaration_kind: class
name: type_identifier "Point"
---
top_level
body:
===
Class with method
===
class Counter {
var n = 0
func bump() {
n += 1
}
}
---
source_file
statement:
class_declaration
body:
class_body
member:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "n"
value: integer_literal "0"
function_declaration
body:
block
statement:
assignment
operator: +=
result: integer_literal "1"
target:
directly_assignable_expression
expr: simple_identifier "n"
name: simple_identifier "bump"
declaration_kind: class
name: type_identifier "Counter"
---
top_level
body:
===
Class inheritance
===
class Dog: Animal {}
---
source_file
statement:
class_declaration
body:
class_body
declaration_kind: class
inherits:
inheritance_specifier
inherits_from:
user_type
part:
simple_user_type
name: type_identifier "Animal"
name: type_identifier "Dog"
---
top_level
body:
===
Struct
===
struct Point {
let x: Int
let y: Int
}
---
source_file
statement:
class_declaration
body:
class_body
member:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "y"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
declaration_kind: struct
name: type_identifier "Point"
---
top_level
body:
===
Enum with cases
===
enum Direction {
case north
case south
case east
case west
}
---
source_file
statement:
class_declaration
body:
enum_class_body
member:
enum_entry
case:
enum_case_entry
name: simple_identifier "north"
enum_entry
case:
enum_case_entry
name: simple_identifier "south"
enum_entry
case:
enum_case_entry
name: simple_identifier "east"
enum_entry
case:
enum_case_entry
name: simple_identifier "west"
declaration_kind: enum
name: type_identifier "Direction"
---
top_level
body:
===
Enum with associated values
===
enum Shape {
case circle(radius: Double)
case square(side: Double)
}
---
source_file
statement:
class_declaration
body:
enum_class_body
member:
enum_entry
case:
enum_case_entry
data_contents:
enum_type_parameters
parameter:
enum_type_parameter
name: simple_identifier "radius"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Double"
name: simple_identifier "circle"
enum_entry
case:
enum_case_entry
data_contents:
enum_type_parameters
parameter:
enum_type_parameter
name: simple_identifier "side"
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Double"
name: simple_identifier "square"
declaration_kind: enum
name: type_identifier "Shape"
---
top_level
body:
===
Protocol declaration
===
protocol Drawable {
func draw()
}
---
source_file
statement:
protocol_declaration
body:
protocol_body
member:
protocol_function_declaration
name: simple_identifier "draw"
name: type_identifier "Drawable"
---
top_level
body:
===
Extension
===
extension Int {
func squared() -> Int { return self * self }
}
---
source_file
statement:
class_declaration
body:
class_body
member:
function_declaration
body:
block
statement:
control_transfer_statement
kind: return
result:
multiplicative_expression
lhs:
self_expression
op: *
rhs:
self_expression
name: simple_identifier "squared"
return_type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
declaration_kind: extension
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
---
top_level
body:
===
Computed property
===
class Rect {
var w: Double
var h: Double
var area: Double {
return w * h
}
}
---
source_file
statement:
class_declaration
body:
class_body
member:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "w"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Double"
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "h"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Double"
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
computed_value:
computed_property
statement:
control_transfer_statement
kind: return
result:
multiplicative_expression
lhs: simple_identifier "w"
op: *
rhs: simple_identifier "h"
name:
pattern
bound_identifier: simple_identifier "area"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Double"
declaration_kind: class
name: type_identifier "Rect"
---
top_level
body:
===
Property with getter and setter
===
class Box {
private var _v = 0
var v: Int {
get { return _v }
set { _v = newValue }
}
}
---
source_file
statement:
class_declaration
body:
class_body
member:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "_v"
value: integer_literal "0"
modifiers:
modifiers
modifier:
visibility_modifier
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
computed_value:
computed_property
accessor:
computed_getter
body:
block
statement:
control_transfer_statement
kind: return
result: simple_identifier "_v"
specifier:
getter_specifier
computed_setter
body:
block
statement:
assignment
operator: =
result: simple_identifier "newValue"
target:
directly_assignable_expression
expr: simple_identifier "_v"
specifier:
setter_specifier
name:
pattern
bound_identifier: simple_identifier "v"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
declaration_kind: class
name: type_identifier "Box"
---
top_level
body:

@@ -0,0 +1,231 @@
===
Let binding
===
let x = 1
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
value: integer_literal "1"
---
top_level
body:
===
Var binding
===
var x = 1
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
value: integer_literal "1"
---
top_level
body:
===
Let with type annotation
===
let x: Int = 1
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
value: integer_literal "1"
---
top_level
body:
===
Var without initialiser
===
var x: Int
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: var
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
type:
type_annotation
type:
type
name:
user_type
part:
simple_user_type
name: type_identifier "Int"
---
top_level
body:
===
Tuple destructuring binding
===
let (a, b) = pair
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
kind:
tuple_pattern
item:
tuple_pattern_item
pattern:
pattern
kind: simple_identifier "a"
tuple_pattern_item
pattern:
pattern
kind: simple_identifier "b"
value: simple_identifier "pair"
---
top_level
body:
===
Multiple bindings on one line
===
let x = 1, y = 2
---
source_file
statement:
property_declaration
binding:
value_binding_pattern
mutability: let
declarator:
property_binding
name:
pattern
bound_identifier: simple_identifier "x"
value: integer_literal "1"
property_binding
name:
pattern
bound_identifier: simple_identifier "y"
value: integer_literal "2"
---
top_level
body:
===
Assignment
===
x = 1
---
source_file
statement:
assignment
operator: =
result: integer_literal "1"
target:
directly_assignable_expression
expr: simple_identifier "x"
---
top_level
body:
===
Compound assignment
===
x += 1
---
source_file
statement:
assignment
operator: +=
result: integer_literal "1"
target:
directly_assignable_expression
expr: simple_identifier "x"
---
top_level
body:

@@ -0,0 +1,283 @@
use std::fs;
use std::path::Path;
use codeql_extractor::extractor::simple;
use yeast::{dump::dump_ast, dump::dump_ast_with_type_errors, Runner};
#[path = "../src/languages/mod.rs"]
mod languages;
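/// One test case from a corpus file: its name, the source input, the raw
/// tree-sitter dump, and the expected dump of the desugared AST.
#[derive(Debug)]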
struct CorpusCase {
name: String,
input: String,
raw: String,
expected: String,
}
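/// Returns true when `UNIFIED_UPDATE_CORPUS` is set to `1`, `true`, `yes`,
/// or `on` (case-insensitive). In that mode the test rewrites corpus files
/// with the actual output instead of comparing against them.
fn update_mode_enabled() -> bool {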
std::env::var("UNIFIED_UPDATE_CORPUS")
.map(|v| matches!(v.to_ascii_lowercase().as_str(), "1" | "true" | "yes" | "on"))
.unwrap_or(false)
}
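/// Returns true if `line`, once trimmed, consists of three or more `=`
/// characters and nothing else.
fn is_header_rule(line: &str) -> bool {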
let trimmed = line.trim();
trimmed.len() >= 3 && trimmed.chars().all(|c| c == '=')
}
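/// Splits a corpus file into cases. Each case is a name wrapped in two `=`
/// rules, followed by the input, a `---` separator, the raw parse tree, a
/// second `---` separator, and the expected AST dump. Panics on a missing
/// header rule, test name, or first separator.
fn parse_corpus(content: &str) -> Vec<CorpusCase> {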
let lines: Vec<&str> = content.lines().collect();
let mut i = 0;
let mut cases = Vec::new();
while i < lines.len() {
while i < lines.len() && lines[i].trim().is_empty() {
i += 1;
}
if i >= lines.len() {
break;
}
assert!(
is_header_rule(lines[i]),
"Expected header delimiter at line {}",
i + 1
);
i += 1;
assert!(i < lines.len(), "Missing test name at line {}", i + 1);
let name = lines[i].trim().to_string();
i += 1;
assert!(
i < lines.len() && is_header_rule(lines[i]),
"Missing closing header delimiter for case {name}"
);
i += 1;
let input_start = i;
while i < lines.len() && lines[i].trim() != "---" {
i += 1;
}
assert!(i < lines.len(), "Missing --- separator for case {name}");
let input = lines[input_start..i].join("\n").trim_end().to_string();
i += 1;
// Raw tree-sitter parse section. New-format files have a second
// `---` separator between the raw tree and the mapped AST. Legacy
// files (with only one separator) have no raw section — in that
// case `raw` stays empty and update mode will populate it.
let raw_start = i;
let mut next_sep = i;
while next_sep < lines.len() && lines[next_sep].trim() != "---" {
if is_header_rule(lines[next_sep])
&& next_sep + 2 < lines.len()
&& !lines[next_sep + 1].trim().is_empty()
&& is_header_rule(lines[next_sep + 2])
{
break;
}
next_sep += 1;
}
let raw = if next_sep < lines.len() && lines[next_sep].trim() == "---" {
let raw_text = lines[raw_start..next_sep].join("\n").trim().to_string();
i = next_sep + 1;
raw_text
} else {
String::new()
};
let expected_start = i;
while i < lines.len() {
if is_header_rule(lines[i])
&& i + 2 < lines.len()
&& !lines[i + 1].trim().is_empty()
&& is_header_rule(lines[i + 2])
{
break;
}
i += 1;
}
let expected = lines[expected_start..i].join("\n").trim().to_string();
cases.push(CorpusCase {
name,
input,
raw,
expected,
});
}
cases
}
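/// Serializes cases back into the corpus file format, always emitting both
/// `---` separators so that legacy files are upgraded on update.
fn render_corpus(cases: &[CorpusCase]) -> String {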
let mut out = String::new();
for (idx, case) in cases.iter().enumerate() {
if idx > 0 {
// Blank line between cases.
out.push('\n');
}
out.push_str("===\n");
out.push_str(case.name.trim());
out.push_str("\n===\n\n");
out.push_str(case.input.trim());
out.push_str("\n\n---\n\n");
out.push_str(case.raw.trim());
out.push_str("\n\n---\n\n");
out.push_str(case.expected.trim());
// Single trailing newline per case; the inter-case blank line is
// added by the prefix above, and the file ends with exactly one `\n`.
out.push('\n');
}
out
}
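/// Parses `input` and applies the language's desugaring config if it has
/// one; otherwise runs with an empty phase list.
fn run_desugaring(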
lang: &simple::LanguageSpec,
input: &str,
) -> Result<yeast::Ast, String> {
let runner = match lang.desugar.as_ref() {
Some(config) => Runner::from_config(lang.ts_language.clone(), config)
.map_err(|e| format!("Failed to create yeast runner: {e}"))?,
None => Runner::new(lang.ts_language.clone(), &[]),
};
runner
.run(input)
.map_err(|e| format!("Failed to parse input: {e}"))
}
/// Produce the raw tree-sitter parse tree dump for `input`, with no
/// desugaring rules applied. Uses a `Runner` with an empty phase list and
/// the input grammar's own schema.
fn dump_raw_parse(
lang: &simple::LanguageSpec,
input: &str,
) -> Result<String, String> {
let runner = Runner::new(lang.ts_language.clone(), &[]);
let ast = runner
.run(input)
.map_err(|e| format!("Failed to parse input: {e}"))?;
Ok(dump_ast(&ast, ast.get_root(), input))
}
#[test]
fn test_corpus() {
let update_mode = update_mode_enabled();
let all_languages = languages::all_language_specs();
let corpus_dir = Path::new("tests/corpus");
for lang in all_languages {
let output_schema = yeast::node_types_yaml::schema_from_yaml_with_language(
languages::OUTPUT_AST_SCHEMA,
&lang.ts_language,
)
.expect("Failed to parse OUTPUT_AST_SCHEMA YAML");
let lang_corpus_dir = corpus_dir.join(&lang.prefix);
if !lang_corpus_dir.exists() {
continue;
}
let mut corpus_files: Vec<_> = fs::read_dir(&lang_corpus_dir)
.unwrap_or_else(|e| {
panic!(
"Failed to read corpus directory {}: {e}",
lang_corpus_dir.display()
)
})
.map(|entry| entry.expect("Failed to read corpus entry").path())
.filter(|path| path.extension().is_some_and(|ext| ext == "txt"))
.collect();
corpus_files.sort();
for corpus_path in corpus_files {
let content = fs::read_to_string(&corpus_path)
.unwrap_or_else(|e| panic!("Failed to read {}: {e}", corpus_path.display()));
let mut cases = parse_corpus(&content);
let mut failures = Vec::new();
assert!(
!cases.is_empty(),
"No corpus cases found in {}",
corpus_path.display()
);
for case in &mut cases {
match dump_raw_parse(&lang, &case.input) {
Err(e) => {
failures.push(format!(
"Raw parse failed for {} in {}: {}",
case.name,
corpus_path.display(),
e
));
}
Ok(actual_raw) => {
if update_mode {
case.raw = actual_raw.trim().to_string();
} else if case.raw.trim() != actual_raw.trim() {
failures.push(format!(
"Raw parse mismatch in {}: \"{}\"\nEXPECTED:\n\n{}\n\nACTUAL:\n\n{}",
corpus_path.display(),
case.name,
case.raw.trim(),
actual_raw.trim()
));
}
}
}
match run_desugaring(&lang, &case.input) {
Err(e) => {
failures.push(format!(
"Desugaring failed for {} in {}: {}",
case.name,
corpus_path.display(),
e
));
}
Ok(actual) => {
let actual_dump = dump_ast_with_type_errors(
&actual,
actual.get_root(),
&case.input,
&output_schema,
);
if update_mode {
case.expected = actual_dump.trim().to_string();
} else if case.expected.trim() != actual_dump.trim() {
failures.push(format!(
"Test failed in {}: \"{}\"\nEXPECTED:\n\n{}\n\nACTUAL:\n\n{}",
corpus_path.display(),
case.name,
case.expected.trim(),
actual_dump.trim()
));
}
}
}
}
assert!(
failures.is_empty(),
"{}",
failures.join("\n\n") + "\n\n"
);
if update_mode {
let updated = render_corpus(&cases);
fs::write(&corpus_path, updated).unwrap_or_else(|e| {
panic!(
"Failed to update corpus file {}: {e}",
corpus_path.display()
)
});
}
}
}
}
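
The three-section corpus layout that `parse_corpus` handles can be illustrated in isolation. The following is a minimal sketch, not the code from this change: `Case`, `is_rule`, and `parse_case` are hypothetical names, it parses exactly one case rather than a whole file, and it does not do the lookahead for the next `===` header that the real parser needs. It assumes the layout described in the comments above: a name between two `=` rules, then the input, a `---` separator, the raw tree, an optional second `---`, and the mapped AST.

```rust
/// One parsed corpus case (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Case {
    name: String,
    input: String,
    raw: String,
    expected: String,
}

/// True for a line that, once trimmed, is three or more `=` characters.
fn is_rule(line: &str) -> bool {
    let t = line.trim();
    t.len() >= 3 && t.chars().all(|c| c == '=')
}

/// Parse a single case. Returns `None` if the header rules or the first
/// `---` separator are missing.
fn parse_case(text: &str) -> Option<Case> {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() < 3 || !is_rule(lines[0]) || !is_rule(lines[2]) {
        return None;
    }
    let name = lines[1].trim().to_string();
    let body = &lines[3..];

    // Positions of the `---` separator lines within the body.
    let mut seps = body
        .iter()
        .enumerate()
        .filter(|(_, l)| l.trim() == "---")
        .map(|(i, _)| i);
    let first = seps.next()?;
    let second = seps.next();

    let join = |s: &[&str]| s.join("\n").trim().to_string();
    let input = body[..first].join("\n").trim_end().to_string();

    // With two separators there is a raw section; with one (legacy layout)
    // the raw section is empty and everything after it is the expected dump.
    let (raw, expected) = match second {
        Some(second) => (join(&body[first + 1..second]), join(&body[second + 1..])),
        None => (String::new(), join(&body[first + 1..])),
    };

    Some(Case { name, input, raw, expected })
}
```

Calling `parse_case` on a case with only one `---` separator yields an empty `raw` field, which mirrors how the legacy layout is treated before update mode fills it in.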

File diff suppressed because it is too large

@@ -119,8 +119,8 @@ named:
array_type:
element: type
as_expression:
$children: as_operator
expr: expression
operator: as_operator
type: type
as_operator:
assignment:
@@ -128,114 +128,166 @@ named:
result: expression
target: directly_assignable_expression
associatedtype_declaration:
$children*: [modifiers, type_constraints]
default_value?: type
modifiers?: modifiers
must_inherit?: type
name: type_identifier
type_constraints?: type_constraints
async_keyword:
attribute:
$children+: [expression, user_type]
argument*: expression
argument_name*: simple_identifier
name: user_type
param_ref*: simple_identifier
platform*: simple_identifier
version*: integer_literal
availability_condition:
$children*: [identifier, integer_literal]
platform*: identifier
version*: integer_literal
await_expression:
$children?: expression
expr?: expression
expr: expression
bang:
bin_literal:
binding_pattern:
binding: value_binding_pattern
pattern: pattern
bitwise_operation:
lhs: expression
op: ["&", "<<", ">>", "^", "|"]
rhs: expression
block:
statement*: [control_transfer_statement, do_statement, expression, for_statement, guard_statement, local_declaration, repeat_while_statement, statement_label, while_statement]
boolean_literal:
call_expression:
$children+: [call_suffix, expression]
function: expression
suffix: call_suffix
call_suffix:
$children+: [lambda_literal, value_arguments]
arguments?: value_arguments
lambda*: lambda_literal
name*: simple_identifier
capture_list:
$children+: capture_list_item
item+: capture_list_item
capture_list_item:
$children?: ownership_modifier
name: [self_expression, simple_identifier]
ownership?: ownership_modifier
value?: expression
case_pattern:
arguments?: tuple_pattern
name: simple_identifier
type?: user_type
catch_block:
$children+: [catch_keyword, statements, where_clause]
body: block
error?: pattern
keyword: catch_keyword
where?: where_clause
catch_keyword:
check_expression:
op: "is"
target: expression
type: type
class_body:
$children*: [multiline_comment, type_level_declaration]
member*: type_level_declaration
class_declaration:
$children*: [attribute, inheritance_modifier, inheritance_specifier, modifiers, ownership_modifier, property_behavior_modifier, type_constraints, type_parameters]
attribute*: attribute
body: [class_body, enum_class_body]
declaration_kind: ["actor", "class", "enum", "extension", "struct"]
inherits*: inheritance_specifier
modifiers*: [attribute, inheritance_modifier, modifiers, ownership_modifier, property_behavior_modifier]
name: [type_identifier, unannotated_type]
type_constraints?: type_constraints
type_parameters?: type_parameters
comment:
comparison_expression:
lhs: expression
op: ["<", "<=", ">", ">="]
rhs: expression
compilation_condition:
inner?: compilation_condition
lhs?: compilation_condition
name*: simple_identifier
operand?: compilation_condition
rhs?: compilation_condition
value?: boolean_literal
version*: integer_literal
computed_getter:
$children+: [attribute, getter_specifier, statements]
attribute*: attribute
body?: block
specifier: getter_specifier
computed_modify:
$children+: [attribute, modify_specifier, statements]
attribute*: attribute
body?: block
specifier: modify_specifier
computed_property:
$children*: [computed_getter, computed_modify, computed_setter, statements]
accessor*: [computed_getter, computed_modify, computed_setter]
statement*: [control_transfer_statement, do_statement, expression, for_statement, guard_statement, local_declaration, repeat_while_statement, statement_label, while_statement]
computed_setter:
$children+: [attribute, setter_specifier, simple_identifier, statements]
attribute*: attribute
body?: block
parameter?: simple_identifier
specifier: setter_specifier
conjunction_expression:
lhs: expression
op: "&&"
rhs: expression
constructor_expression:
$children: constructor_suffix
constructed_type: [array_type, dictionary_type, user_type]
suffix: constructor_suffix
constructor_suffix:
$children+: [lambda_literal, value_arguments]
arguments?: value_arguments
lambda*: lambda_literal
name*: simple_identifier
control_transfer_statement:
$children*: [expression, throw_keyword]
kind: ["break", "continue", "return", throw_keyword, "yield"]
result?: expression
custom_operator:
default_keyword:
deinit_declaration:
$children?: modifiers
body: function_body
body: block
modifiers?: modifiers
deprecated_operator_declaration_body:
$children*: [bin_literal, boolean_literal, hex_literal, integer_literal, line_string_literal, multi_line_string_literal, oct_literal, raw_string_literal, real_literal, regex_literal, simple_identifier]
entry*: [bin_literal, boolean_literal, hex_literal, integer_literal, line_string_literal, multi_line_string_literal, "nil", oct_literal, raw_string_literal, real_literal, regex_literal, simple_identifier]
diagnostic:
dictionary_literal:
key*: expression
value*: expression
element*: dictionary_literal_item
dictionary_literal_item:
key: expression
value: expression
dictionary_type:
key: type
value: type
didset_clause:
$children*: [modifiers, simple_identifier, statements]
body: block
modifiers?: modifiers
parameter?: simple_identifier
directive:
$children*: [boolean_literal, integer_literal, simple_identifier]
condition?: compilation_condition
directly_assignable_expression:
$children: expression
expr: expression
disjunction_expression:
lhs: expression
op: "||"
rhs: expression
do_statement:
$children*: [catch_block, statements]
else:
body: block
catch*: catch_block
enum_case_entry:
data_contents?: enum_type_parameters
name: simple_identifier
raw_value?: expression
enum_class_body:
$children*: [enum_entry, type_level_declaration]
member*: [enum_entry, type_level_declaration]
enum_entry:
$children?: modifiers
data_contents*: enum_type_parameters
name+: simple_identifier
raw_value*: expression
case+: enum_case_entry
modifiers?: modifiers
enum_type_parameter:
default_value?: expression
external_name?: wildcard_pattern
name?: simple_identifier
type: type
enum_type_parameters:
$children*: [expression, type, wildcard_pattern]
parameter*: enum_type_parameter
equality_constraint:
$children*: attribute
attribute*: attribute
constrained_type: [identifier, nested_type_identifier]
must_equal: type
equality_expression:
@@ -243,106 +295,141 @@ named:
op: ["!=", "!==", "==", "==="]
rhs: expression
existential_type:
$children: unannotated_type
name: unannotated_type
external_macro_definition:
$children: value_arguments
arguments: value_arguments
for_statement:
$children*: [statements, try_operator, type_annotation, where_clause]
body: block
collection: expression
item: pattern
try?: try_operator
type?: type_annotation
where?: where_clause
fully_open_range:
function_body:
$children?: statements
function_declaration:
$children*: [attribute, inheritance_modifier, modifiers, ownership_modifier, parameter, property_behavior_modifier, throws, throws_clause, type_constraints, type_parameters]
body: function_body
default_value*: expression
async?: "async"
body: block
modifiers*: [attribute, inheritance_modifier, modifiers, ownership_modifier, property_behavior_modifier]
name: [referenceable_operator, simple_identifier]
parameter*: function_parameter
return_type?: [implicitly_unwrapped_type, type]
throws?: [throws, throws_clause]
type_constraints?: type_constraints
type_parameters?: type_parameters
function_modifier:
function_parameter:
attribute?: attribute
default_value?: expression
parameter: parameter
function_type:
$children?: [throws, throws_clause]
async?: "async"
params: unannotated_type
return_type: type
throws?: [throws, throws_clause]
getter_specifier:
$children*: [mutation_modifier, throws, throws_clause]
effect*: [async_keyword, throws, throws_clause]
mutation?: mutation_modifier
guard_statement:
$children+: [else, statements]
body: block
condition+: if_condition
hex_literal:
identifier:
$children+: simple_identifier
part+: simple_identifier
if_condition:
$children: [availability_condition, expression, if_let_binding]
kind: [availability_condition, expression, if_let_binding]
if_let_binding:
$children*: [expression, pattern, type, type_annotation, user_type, value_binding_pattern, where_clause, wildcard_pattern]
bound_identifier?: simple_identifier
pattern: pattern
type?: type_annotation
value?: expression
where?: where_clause
if_statement:
$children*: [else, if_statement, statements]
body: block
condition+: if_condition
else_branch?: [block, if_statement]
implicitly_unwrapped_type:
$children: type
name: type
import_declaration:
$children+: [identifier, modifiers]
modifiers?: modifiers
name: identifier
infix_expression:
lhs: expression
op: custom_operator
rhs: expression
inheritance_constraint:
$children*: attribute
attribute*: attribute
constrained_type: [identifier, nested_type_identifier]
inherits_from: [implicitly_unwrapped_type, type]
inheritance_modifier:
inheritance_specifier:
inherits_from: [function_type, suppressed_constraint, user_type]
init_declaration:
$children*: [attribute, bang, modifiers, parameter, throws, throws_clause, type_constraints, type_parameters]
body?: function_body
default_value*: expression
name: "init"
async?: "async"
bang?: bang
body?: block
modifiers?: modifiers
parameter*: function_parameter
throws?: [throws, throws_clause]
type_constraints?: type_constraints
type_parameters?: type_parameters
integer_literal:
interpolated_expression:
$children?: type_modifiers
name?: value_argument_label
reference_specifier*: value_argument_label
type_modifiers?: type_modifiers
value?: expression
key_path_component:
name?: simple_identifier
postfix*: key_path_postfix
key_path_expression:
$children*: [array_type, bang, dictionary_type, simple_identifier, type_arguments, type_identifier, value_argument]
component*: key_path_component
type?: [array_type, dictionary_type, simple_user_type]
key_path_postfix:
argument*: value_argument
force_unwrap?: bang
key_path_string_expression:
$children: expression
expr: expression
lambda_function_type:
$children*: [lambda_function_type_parameters, throws, throws_clause]
async?: "async"
params?: lambda_function_type_parameters
return_type?: [implicitly_unwrapped_type, type]
throws?: [throws, throws_clause]
lambda_function_type_parameters:
$children+: lambda_parameter
parameter+: lambda_parameter
lambda_literal:
$children*: [attribute, statements]
attribute*: attribute
captures?: capture_list
statement*: [control_transfer_statement, do_statement, expression, for_statement, guard_statement, local_declaration, repeat_while_statement, statement_label, while_statement]
type?: lambda_function_type
lambda_parameter:
$children?: [parameter_modifiers, self_expression]
external_name?: simple_identifier
name?: simple_identifier
modifiers?: parameter_modifiers
name: [self_expression, simple_identifier]
type?: [implicitly_unwrapped_type, type]
line_str_text:
line_string_literal:
interpolation*: interpolated_expression
text*: [line_str_text, str_escaped_char]
macro_declaration:
$children+: [attribute, modifiers, parameter, simple_identifier, type_constraints, type_parameters, unannotated_type]
default_value*: expression
definition?: macro_definition
modifiers?: modifiers
name: simple_identifier
parameter*: function_parameter
return_type?: unannotated_type
type_constraints?: type_constraints
type_parameters?: type_parameters
macro_definition:
body: [expression, external_macro_definition]
macro_invocation:
$children+: [call_suffix, simple_identifier, type_parameters]
name: simple_identifier
suffix: call_suffix
type_parameters?: type_parameters
member_modifier:
metatype:
$children: unannotated_type
name: unannotated_type
modifiers:
$children+: [attribute, function_modifier, inheritance_modifier, member_modifier, mutation_modifier, ownership_modifier, parameter_modifier, property_behavior_modifier, property_modifier, visibility_modifier]
modifier+: [attribute, function_modifier, inheritance_modifier, member_modifier, mutation_modifier, ownership_modifier, parameter_modifier, property_behavior_modifier, property_modifier, visibility_modifier]
modify_specifier:
$children?: mutation_modifier
mutation?: mutation_modifier
multi_line_str_text:
multi_line_string_literal:
interpolation*: interpolated_expression
@@ -355,80 +442,109 @@ named:
mutation_modifier:
navigation_expression:
suffix: navigation_suffix
target+: ["(", ")", array_type, dictionary_type, existential_type, expression, opaque_type, user_type]
target: [array_type, dictionary_type, expression, parenthesized_type, user_type]
navigation_suffix:
suffix: [integer_literal, simple_identifier]
nested_type_identifier:
$children+: [simple_identifier, unannotated_type]
base: unannotated_type
member*: simple_identifier
nil_coalescing_expression:
if_nil: expression
value: expression
oct_literal:
opaque_type:
$children: unannotated_type
name: unannotated_type
open_end_range_expression:
start: expression
open_start_range_expression:
end: expression
operator_declaration:
$children+: [deprecated_operator_declaration_body, referenceable_operator, simple_identifier]
body?: deprecated_operator_declaration_body
kind: ["infix", "postfix", "prefix"]
name: referenceable_operator
precedence_group?: simple_identifier
optional_chain_marker:
$children: expression
expr: expression
optional_type:
wrapped: [array_type, dictionary_type, tuple_type, user_type]
ownership_modifier:
parameter:
$children?: parameter_modifiers
external_name?: simple_identifier
modifiers?: parameter_modifiers
name: simple_identifier
type: [implicitly_unwrapped_type, type]
parameter_modifier:
parameter_modifiers:
$children+: parameter_modifier
modifier+: parameter_modifier
parenthesized_type:
type: [dictionary_type, existential_type, opaque_type]
pattern:
$children*: [expression, pattern, type, user_type, value_binding_pattern, wildcard_pattern]
binding?: value_binding_pattern
bound_identifier?: simple_identifier
kind: [binding_pattern, case_pattern, expression, tuple_pattern, type_casting_pattern, wildcard_pattern]
playground_literal:
$children+: expression
argument+: playground_literal_argument
kind: ["colorLiteral", "fileLiteral", "imageLiteral"]
playground_literal_argument:
name: simple_identifier
value: expression
postfix_expression:
operation: ["++", "--", bang]
target: expression
precedence_group_attribute:
$children+: [boolean_literal, simple_identifier]
name: simple_identifier
value: [boolean_literal, simple_identifier]
precedence_group_attributes:
$children+: precedence_group_attribute
attribute+: precedence_group_attribute
precedence_group_declaration:
$children+: [precedence_group_attributes, simple_identifier]
attributes?: precedence_group_attributes
name: simple_identifier
prefix_expression:
operation: ["&", "+", "++", "-", "--", ".", bang, custom_operator, "~"]
target: expression
property_behavior_modifier:
property_binding:
computed_value?: computed_property
name: pattern
observers?: willset_didset_block
type?: type_annotation
type_constraints?: type_constraints
value?: expression
property_declaration:
$children*: [attribute, inheritance_modifier, modifiers, ownership_modifier, property_behavior_modifier, type_annotation, type_constraints, value_binding_pattern, willset_didset_block]
computed_value*: computed_property
name+: pattern
value*: expression
binding: value_binding_pattern
declarator+: property_binding
modifiers*: [attribute, inheritance_modifier, modifiers, ownership_modifier, property_behavior_modifier]
property_modifier:
protocol_body:
$children*: protocol_member_declaration
member*: protocol_member_declaration
protocol_composition_type:
$children+: unannotated_type
type+: unannotated_type
protocol_declaration:
$children*: [attribute, inheritance_specifier, modifiers, type_constraints, type_parameters]
attribute*: attribute
body: protocol_body
declaration_kind: "protocol"
inherits*: inheritance_specifier
modifiers?: modifiers
name: type_identifier
type_constraints?: type_constraints
type_parameters?: type_parameters
protocol_function_declaration:
$children*: [attribute, modifiers, parameter, throws, throws_clause, type_constraints, type_parameters]
body?: function_body
default_value*: expression
async?: "async"
body?: block
modifiers?: modifiers
name: [referenceable_operator, simple_identifier]
parameter*: function_parameter
return_type?: [implicitly_unwrapped_type, type]
throws?: [throws, throws_clause]
type_constraints?: type_constraints
type_parameters?: type_parameters
protocol_property_declaration:
$children+: [modifiers, protocol_property_requirements, type_annotation, type_constraints]
modifiers?: modifiers
name: pattern
requirements: protocol_property_requirements
type?: type_annotation
type_constraints?: type_constraints
protocol_property_requirements:
$children*: [getter_specifier, setter_specifier]
accessor*: [getter_specifier, setter_specifier]
range_expression:
end: expression
op: ["...", "..<"]
@@ -436,48 +552,57 @@ named:
raw_str_continuing_indicator:
raw_str_end_part:
raw_str_interpolation:
$children: raw_str_interpolation_start
interpolation+: interpolated_expression
start: raw_str_interpolation_start
raw_str_interpolation_start:
raw_str_part:
raw_string_literal:
$children*: raw_str_continuing_indicator
continuing*: raw_str_continuing_indicator
interpolation*: raw_str_interpolation
text+: [raw_str_end_part, raw_str_part]
real_literal:
referenceable_operator:
$children?: [bang, custom_operator]
operator: ["!=", "!==", "%", "%=", "&", "*", "*=", "+", "++", "+=", "-", "--", "-=", "/", "/=", "<", "<<", "<=", "=", "==", "===", ">", ">=", ">>", "^", bang, custom_operator, "|", "~"]
regex_literal:
repeat_while_statement:
$children?: statements
body: block
condition+: if_condition
selector_expression:
$children: expression
expr: expression
self_expression:
setter_specifier:
$children?: mutation_modifier
mutation?: mutation_modifier
shebang_line:
simple_identifier:
simple_user_type:
arguments?: type_arguments
name: type_identifier
source_file:
$children*: [do_statement, expression, for_statement, global_declaration, guard_statement, repeat_while_statement, shebang_line, statement_label, throw_keyword, while_statement]
shebang?: shebang_line
statement*: [do_statement, expression, for_statement, global_declaration, guard_statement, repeat_while_statement, statement_label, throw_keyword, while_statement]
special_literal:
statement_label:
statements:
$children+: [control_transfer_statement, do_statement, expression, for_statement, guard_statement, local_declaration, repeat_while_statement, statement_label, while_statement]
str_escaped_char:
subscript_declaration:
$children+: [attribute, computed_property, modifiers, parameter, type_constraints, type_parameters]
default_value*: expression
body: computed_property
modifiers?: modifiers
parameter*: function_parameter
return_type?: [implicitly_unwrapped_type, type]
type_constraints?: type_constraints
type_parameters?: type_parameters
super_expression:
suppressed_constraint:
suppressed: type_identifier
switch_entry:
$children+: [default_keyword, expression, modifiers, statements, switch_pattern, where_keyword]
default?: default_keyword
modifiers?: modifiers
pattern*: switch_pattern
statement+: [control_transfer_statement, do_statement, expression, for_statement, guard_statement, local_declaration, repeat_while_statement, statement_label, while_statement]
where?: where_clause
switch_pattern:
$children: pattern
pattern: pattern
switch_statement:
$children*: switch_entry
entry*: switch_entry
expr: expression
ternary_expression:
condition: expression
@@ -488,76 +613,95 @@ named:
throws_clause:
type: unannotated_type
try_expression:
$children: try_operator
expr: expression
operator: try_operator
try_operator:
tuple_expression:
name*: simple_identifier
value+: expression
element+: tuple_expression_item
tuple_expression_item:
name?: simple_identifier
value: expression
tuple_pattern:
item+: tuple_pattern_item
tuple_pattern_item:
name?: simple_identifier
pattern: pattern
tuple_type:
$children?: tuple_type_item
element*: tuple_type_item
tuple_type_item:
$children*: [dictionary_type, existential_type, opaque_type, parameter_modifiers, wildcard_pattern]
external_name?: wildcard_pattern
modifiers?: parameter_modifiers
name?: simple_identifier
type?: type
type: [dictionary_type, existential_type, opaque_type, type]
type:
modifiers?: type_modifiers
name: unannotated_type
type_annotation:
type: [implicitly_unwrapped_type, type]
type_arguments:
$children+: type
argument+: type
type_casting_pattern:
pattern?: pattern
type: type
type_constraint:
$children: [equality_constraint, inheritance_constraint]
constraint: [equality_constraint, inheritance_constraint]
type_constraints:
$children+: [type_constraint, where_keyword]
constraint+: type_constraint
keyword: where_keyword
type_identifier:
type_modifiers:
$children+: attribute
attribute+: attribute
type_pack_expansion:
$children: unannotated_type
name: unannotated_type
type_parameter:
$children+: [type, type_identifier, type_parameter_modifiers, type_parameter_pack]
modifiers?: type_parameter_modifiers
name: [type_identifier, type_parameter_pack]
type?: type
type_parameter_modifiers:
$children+: attribute
attribute+: attribute
type_parameter_pack:
$children: unannotated_type
name: unannotated_type
type_parameters:
$children+: [type_constraints, type_parameter]
constraints?: type_constraints
parameter+: type_parameter
typealias_declaration:
$children*: [attribute, inheritance_modifier, modifiers, ownership_modifier, property_behavior_modifier, type_parameters]
modifiers*: [attribute, inheritance_modifier, modifiers, ownership_modifier, property_behavior_modifier]
name: type_identifier
type_parameters?: type_parameters
value: type
user_type:
$children+: [type_arguments, type_identifier]
part+: simple_user_type
value_argument:
$children?: type_modifiers
name?: value_argument_label
reference_specifier*: value_argument_label
type_modifiers?: type_modifiers
value?: expression
value_argument_label:
$children: simple_identifier
name: simple_identifier
value_arguments:
$children*: value_argument
argument*: value_argument
value_binding_pattern:
mutability: ["let", "var"]
value_pack_expansion:
$children: expression
expr: expression
value_parameter_pack:
$children: expression
expr: expression
visibility_modifier:
where_clause:
$children+: [expression, where_keyword]
expr: expression
keyword: where_keyword
where_keyword:
while_statement:
$children?: statements
body: block
condition+: if_condition
wildcard_pattern:
willset_clause:
$children*: [modifiers, simple_identifier, statements]
body: block
modifiers?: modifiers
parameter?: simple_identifier
willset_didset_block:
$children+: [didset_clause, willset_clause]
didset?: didset_clause
willset?: willset_clause
unnamed:
- "?"
@@ -645,6 +789,7 @@ unnamed:
- "dsohandle"
- "dynamic"
- "each"
- "else"
- "enum"
- "extension"
- "externalMacro"

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1 @@
var x = y + 2;


@@ -1,101 +1,9 @@
identifier
| test.swift:1:8:1:17 | Foundation | Foundation |
| test.swift:5:9:5:13 | items | items |
| test.swift:7:19:7:21 | add | add |
| test.swift:7:23:7:23 | _ | _ |
| test.swift:7:25:7:28 | item | item |
| test.swift:8:9:8:13 | items | items |
| test.swift:8:15:8:20 | append | append |
| test.swift:8:22:8:25 | item | item |
| test.swift:11:10:11:17 | contains | contains |
| test.swift:11:19:11:19 | _ | _ |
| test.swift:11:21:11:24 | item | item |
| test.swift:12:16:12:20 | items | items |
| test.swift:12:22:12:29 | contains | contains |
| test.swift:12:31:12:34 | item | item |
| test.swift:19:9:19:13 | count | count |
| test.swift:20:10:20:13 | item | item |
| test.swift:20:15:20:16 | at | at |
| test.swift:20:18:20:22 | index | index |
| test.swift:24:6:24:10 | merge | merge |
| test.swift:24:27:24:27 | _ | _ |
| test.swift:24:29:24:33 | first | first |
| test.swift:24:39:24:39 | _ | _ |
| test.swift:24:41:24:46 | second | second |
| test.swift:24:73:24:73 | T | T |
| test.swift:24:75:24:81 | Element | Element |
| test.swift:25:9:25:14 | result | result |
| test.swift:25:18:25:22 | Array | Array |
| test.swift:25:24:25:28 | first | first |
| test.swift:26:9:26:12 | item | item |
| test.swift:26:17:26:22 | second | second |
| test.swift:27:13:27:18 | result | result |
| test.swift:27:20:27:27 | contains | contains |
| test.swift:27:29:27:32 | item | item |
| test.swift:28:13:28:18 | result | result |
| test.swift:28:20:28:25 | append | append |
| test.swift:28:27:28:30 | item | item |
| test.swift:31:12:31:17 | result | result |
| test.swift:37:17:37:20 | data | data |
| test.swift:39:9:39:13 | count | count |
| test.swift:40:16:40:19 | data | data |
| test.swift:40:21:40:25 | count | count |
| test.swift:43:9:43:15 | isEmpty | isEmpty |
| test.swift:44:9:44:12 | data | data |
| test.swift:44:14:44:20 | isEmpty | isEmpty |
| test.swift:47:10:47:13 | item | item |
| test.swift:47:15:47:16 | at | at |
| test.swift:47:18:47:22 | index | index |
| test.swift:48:15:48:19 | index | index |
| test.swift:48:29:48:33 | index | index |
| test.swift:48:37:48:40 | data | data |
| test.swift:48:42:48:46 | count | count |
| test.swift:49:16:49:19 | data | data |
| test.swift:49:21:49:25 | index | index |
| test.swift:52:10:52:12 | add | add |
| test.swift:52:14:52:14 | _ | _ |
| test.swift:52:16:52:19 | item | item |
| test.swift:53:9:53:12 | data | data |
| test.swift:53:14:53:19 | append | append |
| test.swift:53:21:53:24 | item | item |
| test.swift:59:10:59:16 | success | success |
| test.swift:60:10:60:16 | failure | failure |
| test.swift:62:10:62:12 | map | map |
| test.swift:62:17:62:17 | _ | _ |
| test.swift:62:19:62:27 | transform | transform |
| test.swift:64:15:64:21 | success | success |
| test.swift:64:27:64:31 | value | value |
| test.swift:65:21:65:27 | success | success |
| test.swift:65:29:65:37 | transform | transform |
| test.swift:65:39:65:43 | value | value |
| test.swift:66:15:66:21 | failure | failure |
| test.swift:66:27:66:31 | error | error |
| test.swift:67:21:67:27 | failure | failure |
| test.swift:67:29:67:33 | error | error |
| test.swift:73:23:73:29 | Element | Element |
| test.swift:74:10:74:17 | isSorted | isSorted |
| test.swift:75:13:75:13 | i | i |
| test.swift:75:23:75:31 | blah | blah |
| test.swift:76:21:76:21 | i | i |
| test.swift:76:31:76:35 | blah | blah |
| test.swift:85:6:85:12 | combine | combine |
| test.swift:85:17:85:17 | _ | _ |
| test.swift:85:19:85:24 | values | values |
| test.swift:85:32:85:40 | transform | transform |
| test.swift:86:12:86:17 | values | values |
| test.swift:86:19:86:25 | isEmpty | isEmpty |
| test.swift:87:12:87:17 | values | values |
| test.swift:87:19:87:27 | dropFirst | dropFirst |
| test.swift:87:31:87:36 | reduce | reduce |
| test.swift:87:38:87:43 | values | values |
| test.swift:87:49:87:57 | transform | transform |
func
| test.swift:7:5:9:5 | FunctionDeclaration |
| test.swift:11:5:13:5 | FunctionDeclaration |
| test.swift:24:1:32:1 | FunctionDeclaration |
| test.swift:47:5:50:5 | FunctionDeclaration |
| test.swift:52:5:54:5 | FunctionDeclaration |
| test.swift:62:5:69:5 | FunctionDeclaration |
| test.swift:74:5:81:5 | FunctionDeclaration |
| test.swift:85:1:88:1 | FunctionDeclaration |
add
nameExpr
unsupported
| test.swift:3:1:3:38 | | |
| test.swift:16:1:16:32 | | |
| test.swift:23:1:23:37 | | |
| test.swift:34:1:34:49 | | |
| test.swift:57:1:57:30 | | |
| test.swift:72:1:72:37 | | |
| test.swift:84:1:84:24 | | |


@@ -1,9 +1,5 @@
import codeql.unified.Ast
import codeql.unified.Ast::Unified
query predicate identifier(Swift::SimpleIdentifier node, string name) { name = node.getValue() }
query predicate nameExpr(NameExpr node, string value) { value = node.getIdentifier().getValue() }
query predicate func(Swift::FunctionDeclaration node) { any() }
query predicate add(Swift::AdditiveExpression node, Swift::AstNode lhs, Swift::AstNode rhs) {
lhs = node.getLhs(0) and rhs = node.getRhs(0)
}
query predicate unsupported(UnsupportedNode node, string value) { value = node.getValue() }


@@ -0,0 +1,8 @@
#!/bin/bash
set -euo pipefail
IFS=$'\n\t'
cd "$(dirname "$0")/.."
cd extractor
UNIFIED_UPDATE_CORPUS=1 cargo test
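
Note on the update script: it sets UNIFIED_UPDATE_CORPUS=1 before running cargo test, and the corpus test earlier in this diff branches on an `update_mode` flag. How that flag is derived from the environment is not visible in these hunks, so the following is only a minimal sketch of one plausible wiring. The helper names `update_mode_from`, `update_mode`, and `check_case` are hypothetical, and the rule "any non-empty value other than 0 enables update mode" is an assumption, not something the diff confirms.

```rust
use std::env;

/// Hypothetical helper: decides whether update mode is on from the raw
/// variable value. ASSUMPTION: any non-empty value other than "0" enables it.
fn update_mode_from(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

/// Hypothetical helper: reads the variable that the update script sets.
fn update_mode() -> bool {
    update_mode_from(env::var("UNIFIED_UPDATE_CORPUS").ok().as_deref())
}

/// Compare-or-update step with the same shape as the test loop in the diff:
/// in update mode the expectation is overwritten with the actual output,
/// otherwise a mismatch is reported as an error message.
fn check_case(expected: &mut String, actual: &str, update: bool) -> Result<(), String> {
    if update {
        *expected = actual.trim().to_string();
        Ok(())
    } else if expected.trim() != actual.trim() {
        Err(format!(
            "EXPECTED:\n\n{}\n\nACTUAL:\n\n{}",
            expected.trim(),
            actual.trim()
        ))
    } else {
        Ok(())
    }
}
```

With a wiring like this, a plain cargo test run compares each case against the stored expectation, while the script's invocation rewrites the expectations in place.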