Same pattern we've seen many times before: a field on an anonymous node
gets attached to the parent node instead.
I'm not 100% sure this is the right solution, but it seemed wrong to
just make `_parenthesized_type` named instead (we don't usually name
parentheticals). At the very least, this cleans up the spurious
navigation_expression.element and tuple_type_item.element fields.
Because `_type` was anonymous, its body was inlined in all of the places
it appeared. Because this body contained a `name` field, this field was
_also_ inlined. This caused a bunch of nodes to have spurious `name`
fields, and for some of them (that already had such a field) it caused
that field to have a multiplicity greater than one.
To fix this, we make the `_type` node named, which prevents the errant
field from escaping.
Adds a new type `nested_type_identifier`, which contains the
choice-branch that previously allowed those tokens to bleed through into
the closest parent field.
You know the drill. We just make an anonymous node named instead. In
this case, however, we have to be a bit more clever about how to rewrite
it. We turn the sequence of a type followed by an optional `!` into a
_choice_ between a bare type and a type followed by a bang (the latter
being our new named node).
Supertypes are a honking great idea. We should use more of them.
This massively cleans up the node types, without polluting the AST with
`expression` nodes.
Before, the `condition` field of an if statement supposedly could
contain things like parentheses and commas, due to bleeding from
referenced anonymous nodes. Making the node named makes this issue go
away.
We make _referenceable_operator a named node. This prevents it from
bleeding through to the _expression definition. It likely also makes the
output easier to deal with, as bare operators used as arguments now have
a named node wrapping them in the AST.
Also removes a duplicated inclusion of _comparison_operator that served
no purpose.
This caused any field containing an _expression to appear as if it could
contain any number of such nodes. It also threw away the information
that there was a `?` marker there.
To fix it, we simply move the definition into its own named node.
The astute reader will note that we seem to _lose_ some node types in
the process. Apparently, these were unreachable in the grammar, and the
newer version of tree-sitter removes such "dead code".
Changes based on code review:
1. Remove redundant strings.Contains check in isExactTestPackage
The equality check on the next line handles both cases, making
the early return unnecessary.
2. Extract package selection logic into selectBestPackages function
This reduces code duplication and allows the test to call the
actual implementation rather than copying the logic.
3. Add TestSelectBestPackages to test the new function
Comprehensive test covering single packages, test vs production,
exact vs nested tests, and multiple packages.
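For item 1, a minimal sketch of the check, assuming `isExactTestPackage` takes a package ID and a package path and matches the `"pkg [pkg.test]"` ID form that `go/packages` uses for a package's own test variant. The signature is an assumption. Any ID equal to that string necessarily contains `pkgPath + ".test"`, which is why a preceding `strings.Contains` test adds nothing.

```go
package main

import "fmt"

// isExactTestPackage (hypothetical signature) reports whether id is the
// variant of pkgPath compiled for pkgPath's own test binary, i.e. the ID has
// the form "pkgPath [pkgPath.test]". A separate
// strings.Contains(id, pkgPath+".test") guard would be redundant: every id
// that passes this equality also contains that substring.
func isExactTestPackage(id, pkgPath string) bool {
	return id == pkgPath+" ["+pkgPath+".test]"
}

func main() {
	const p = "github.com/go-git/go-git/v6"
	fmt.Println(isExactTestPackage(p+" ["+p+".test]", p))                          // true
	fmt.Println(isExactTestPackage(p+" ["+p+"/plumbing/format/packfile.test]", p)) // false
	fmt.Println(isExactTestPackage(p, p))                                          // false
}
```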
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Generated by manually applying the output from CI's Gazelle check.
This adds the go_test target for the new extractor_test.go file.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This test verifies that root internal test files (package foo, not
foo_test) are correctly extracted when the repository has both:
1. Root-level internal tests (main_test.go with package main)
2. Nested packages with tests (nested/nested_test.go)
This scenario reproduces the bug that was fixed: the old extractor
would select the wrong package variant and miss root internal test
files.
The test ensures:
- main_test.go (root internal test) is extracted
- nested/nested_test.go (nested test) is extracted
- All test functions from both files are present in the database
This prevents regression of the bug fix.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go
extractor was incorrectly skipping internal test files (package foo)
at repository roots when the project contains nested test packages.
Root Cause:
The extractor selected package variants by longest ID string, but this
heuristic fails when nested packages have tests. For a package like
"github.com/go-git/go-git/v6", packages.Load returns multiple variants:
1. "github.com/go-git/go-git/v6" (19 files, production only)
2. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]"
(39 files, production + 20 root tests) ← Should select this
3. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]"
(19 files, test dependency) ← Was incorrectly selected (longest string)
The old logic selected variant #3 (87 chars) over #2 (62 chars),
causing 20 root test files to be missing from the database.
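The old heuristic can be reproduced in a few lines. This sketch works on plain ID strings (the real extractor reads them from `go/packages`), and `longestID` is a hypothetical name rather than the extractor's own function. It prints each variant's length and the variant that wins.

```go
package main

import "fmt"

// longestID mimics the old heuristic: among the variants returned for one
// package path, keep the one whose ID string is longest.
func longestID(ids []string) string {
	best := ""
	for _, id := range ids {
		if len(id) > len(best) {
			best = id
		}
	}
	return best
}

// The three go-git variants listed above.
var variants = []string{
	"github.com/go-git/go-git/v6",
	"github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]",
	"github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]",
}

func main() {
	for _, id := range variants {
		fmt.Println(len(id), id)
	}
	fmt.Println("selected:", longestID(variants))
}
```

It reports lengths of 27, 62 and 87 characters and selects the `packfile.test` variant. String length tracks how deep the importing test's package path is, not how many files the variant contains, so the heuristic has no reason to favour the root test variant.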
Fix:
Replace string length comparison with a better heuristic that prefers:
1. Exact test packages (e.g., "pkg [pkg.test]") over nested dependencies
2. Packages with more Syntax nodes (more files to extract)
3. String length as a tiebreaker
This ensures the extractor selects the variant with the most complete
test coverage, particularly for root-level internal tests.
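A minimal sketch of the three-step preference, assuming a simplified `variant` struct in place of `*packages.Package` from `golang.org/x/tools/go/packages` (its `ID`, `PkgPath` and `len(Syntax)` are the inputs the heuristic describes). The names `variant`, `better` and `selectBest` are hypothetical; the extractor's actual `selectBestPackages` may be structured differently.

```go
package main

import "fmt"

// variant is a hypothetical stand-in for *packages.Package, keeping only what
// the heuristic looks at. NumSyntax stands for len(pkg.Syntax).
type variant struct {
	ID        string
	PkgPath   string
	NumSyntax int
}

func isExactTestPackage(v variant) bool {
	return v.ID == v.PkgPath+" ["+v.PkgPath+".test]"
}

// better reports whether a should be preferred over b, applying in order:
//  1. an exact test package ("pkg [pkg.test]") over any other variant,
//  2. more Syntax nodes (more files to extract),
//  3. a longer ID string as a tiebreaker.
func better(a, b variant) bool {
	if ea, eb := isExactTestPackage(a), isExactTestPackage(b); ea != eb {
		return ea
	}
	if a.NumSyntax != b.NumSyntax {
		return a.NumSyntax > b.NumSyntax
	}
	return len(a.ID) > len(b.ID)
}

// selectBest keeps the preferred variant for each package path.
func selectBest(vs []variant) map[string]variant {
	best := map[string]variant{}
	for _, v := range vs {
		if cur, ok := best[v.PkgPath]; !ok || better(v, cur) {
			best[v.PkgPath] = v
		}
	}
	return best
}

const goGitPath = "github.com/go-git/go-git/v6"

// The go-git scenario: production, exact test, and nested test dependency.
var goGit = []variant{
	{ID: goGitPath, PkgPath: goGitPath, NumSyntax: 19},
	{ID: goGitPath + " [" + goGitPath + ".test]", PkgPath: goGitPath, NumSyntax: 39},
	{ID: goGitPath + " [" + goGitPath + "/plumbing/format/packfile.test]", PkgPath: goGitPath, NumSyntax: 19},
}

func main() {
	best := selectBest(goGit)[goGitPath]
	fmt.Println(best.ID, best.NumSyntax)
}
```

Applied to the three go-git variants, the comparator keeps the `[github.com/go-git/go-git/v6.test]` variant with 39 files. Because the exact-test check comes first, a nested dependency variant cannot win on file count alone; file count and ID length only decide between variants that are both, or neither, exact test packages.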
Testing:
- Added comprehensive unit tests covering the selection logic
- Tests simulate the real-world go-git scenario
- All tests pass
Impact:
Root-level external tests (package foo_test) were already extracted
correctly. This fix ensures internal tests (package foo) at the root
are now also extracted when they exist alongside nested test packages.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>