Fetch syntactic diagnostics from the tsgo API after parsing each file.
Only genuine parse errors (diagnostic codes 1000-1999) are included;
higher codes like 2880 (import assertion deprecation) are filtered out
since they don't indicate actual parse failures.
The Java extractor uses parseDiagnostics to report syntax errors and
skip full AST extraction for broken files, matching TS5 behavior.
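The code-range filter described above can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: the `Diagnostic` type and the function names are hypothetical, and only the 1000-1999 range comes from this commit message.

```go
package main

import "fmt"

// Diagnostic is a hypothetical, simplified diagnostic record.
type Diagnostic struct {
	Code    int
	Message string
}

// isParseError reports whether a diagnostic code lies in the
// 1000-1999 range that this commit treats as genuine parse errors.
func isParseError(code int) bool {
	return code >= 1000 && code <= 1999
}

// filterParseErrors keeps only diagnostics whose code is a parse error.
func filterParseErrors(diags []Diagnostic) []Diagnostic {
	var out []Diagnostic
	for _, d := range diags {
		if isParseError(d.Code) {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	diags := []Diagnostic{
		{Code: 1005, Message: "';' expected."},
		{Code: 2880, Message: "import assertion deprecation"},
	}
	for _, d := range filterParseErrors(diags) {
		fmt.Printf("TS%d: %s\n", d.Code, d.Message)
	}
}
```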
TRAP test results: 495/495 passing (100%)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TS7 binary AST uses bit 6 for ExportContext (set on all nodes inside
`declare module` contexts), not GlobalAugmentation as previously assumed.
GlobalAugmentation is not a flag in the TS7 binary format at all.
Fix by using a synthetic flag bit (1<<30) for GlobalAugmentation that the
converter sets on `declare global {}` nodes based on the name identifier
being "global". This lets the Java extractor correctly distinguish
`declare global {}` from regular namespace declarations.
Also corrects the flag shift: ExportContext=64 (bit 6), ContainsThis=128
(bit 7), etc., matching the actual TS7 binary layout.
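A rough sketch of the flag layout and the synthetic bit, under the assumption that flags are handled as a 32-bit mask. The constant and function names are illustrative only; the bit positions (6, 7, 30) are the ones stated above.

```go
package main

import "fmt"

const (
	flagExportContext uint32 = 1 << 6 // 64, per the TS7 binary layout
	flagContainsThis  uint32 = 1 << 7 // 128
	// Synthetic bit: not part of the TS7 binary format, set by the converter.
	flagGlobalAugmentation uint32 = 1 << 30
)

// withGlobalAugmentation returns flags with the synthetic bit set when the
// module declaration's name identifier is "global" (hypothetical helper).
func withGlobalAugmentation(flags uint32, name string) uint32 {
	if name == "global" {
		return flags | flagGlobalAugmentation
	}
	return flags
}

func main() {
	f := withGlobalAugmentation(flagExportContext, "global")
	fmt.Println("global augmentation:", f&flagGlobalAugmentation != 0)
	fmt.Println("export context:", f&flagExportContext != 0)
	fmt.Println("contains this:", f&flagContainsThis != 0)
}
```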
TRAP test results: 494/495 passing (99.8%)
Remaining: badimport.ts (TS7 binary API doesn't report parse diagnostics)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wire the Go-based TypeScript parser wrapper as an alternative to the
Node.js wrapper. Enabled via SEMMLE_TYPESCRIPT_USE_GO_PARSER=true.
When enabled:
- Skips Node.js installation verification
- Launches the Go binary directly (no Node.js required)
- Uses the same newline-delimited JSON protocol over stdin/stdout
- Go binary path configurable via SEMMLE_TYPESCRIPT_GO_PARSER_WRAPPER
- tsgo binary path passed through via SEMMLE_TYPESCRIPT_TSGO_BINARY
The Go wrapper implements all protocol commands: get-metadata, parse,
prepare-files, reset, and quit.
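For illustration, the environment-variable switch might be read like this. The real logic lives in the Java extractor; this Go sketch only mirrors the three variable names above, and the struct, the function, and the exact-match `"true"` comparison are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// wrapperConfig is a hypothetical holder for the three settings.
type wrapperConfig struct {
	UseGo       bool
	WrapperPath string
	TsgoBinary  string
}

// loadConfig reads the settings through getenv so it can be tested
// without touching the real process environment.
func loadConfig(getenv func(string) string) wrapperConfig {
	return wrapperConfig{
		// Assumption: only the exact string "true" enables the Go parser.
		UseGo:       getenv("SEMMLE_TYPESCRIPT_USE_GO_PARSER") == "true",
		WrapperPath: getenv("SEMMLE_TYPESCRIPT_GO_PARSER_WRAPPER"),
		TsgoBinary:  getenv("SEMMLE_TYPESCRIPT_TSGO_BINARY"),
	}
}

func main() {
	cfg := loadConfig(os.Getenv)
	if cfg.UseGo {
		fmt.Println("would launch Go wrapper:", cfg.WrapperPath)
	} else {
		fmt.Println("would launch Node.js wrapper")
	}
}
```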
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The shell validation script now uses a structural comparison that
ignores expected numeric differences in kind/flags/token/operator
values between TS5 and TS7. Only truly structural diffs cause failure.
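The idea behind the structural comparison can be sketched as below. The actual check is a shell script; this Go version is only an illustration, and it assumes the ignored properties are literally named `kind`, `flags`, `token`, and `operator` and are ignored only when both sides hold numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numericKeys lists properties whose numeric values may differ between TS5 and TS7.
var numericKeys = map[string]bool{"kind": true, "flags": true, "token": true, "operator": true}

// sameStructure compares two decoded JSON values, ignoring numeric
// differences under the keys in numericKeys.
func sameStructure(a, b any) bool {
	switch av := a.(type) {
	case map[string]any:
		bv, ok := b.(map[string]any)
		if !ok || len(av) != len(bv) {
			return false
		}
		for k, x := range av {
			y, present := bv[k]
			if !present {
				return false
			}
			_, xNum := x.(float64)
			_, yNum := y.(float64)
			if numericKeys[k] && xNum && yNum {
				continue // expected numeric difference
			}
			if !sameStructure(x, y) {
				return false
			}
		}
		return true
	case []any:
		bv, ok := b.([]any)
		if !ok || len(av) != len(bv) {
			return false
		}
		for i := range av {
			if !sameStructure(av[i], bv[i]) {
				return false
			}
		}
		return true
	default:
		return a == b // scalars: string, float64, bool, nil
	}
}

// mustParse decodes a JSON document or panics; used for the demo only.
func mustParse(s string) any {
	var v any
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		panic(err)
	}
	return v
}

func main() {
	ts5 := mustParse(`{"kind":79,"text":"x"}`)
	ts7 := mustParse(`{"kind":80,"text":"x"}`)
	fmt.Println("structurally equal:", sameStructure(ts5, ts7))
}
```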
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement the core components for translating tsgo's binary AST format
into the JSON format expected by the Java extractor:
- decoder.go: Binary AST format parser with random-access node accessors
(kind, pos, end, flags, children, strings, extended data)
- converter.go: Walks decoded AST and produces JSON matching Node.js
wrapper output (augmented properties,
isTypeOnly, HeritageClause token, TypeOperator operator)
- childprops.go: Maps ~100 SyntaxKind names to ordered child property
name lists for correct bitmask-to-property assignment
- scanner.go: TypeScript tokenizer producing a token array, with rescan
support for regex, template, and greater-than disambiguation
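The bitmask-to-property assignment that childprops.go supports could look roughly like this. It assumes that bit i of the mask means "the i-th property in the ordered list is present" and that children arrive in that same order; the function name and the example property list are hypothetical, not taken from the tsgo encoder.

```go
package main

import (
	"fmt"
	"math/bits"
)

// assignChildren maps child node indices to property names. names is the
// ordered property list for a SyntaxKind, mask marks which properties are
// present, and children holds the present children in order.
func assignChildren(names []string, mask uint32, children []int) (map[string]int, error) {
	if bits.Len32(mask) > len(names) {
		return nil, fmt.Errorf("mask %b refers to more than %d properties", mask, len(names))
	}
	if bits.OnesCount32(mask) != len(children) {
		return nil, fmt.Errorf("mask has %d bits set but %d children given",
			bits.OnesCount32(mask), len(children))
	}
	out := make(map[string]int, len(children))
	next := 0
	for i, name := range names {
		if mask&(1<<uint(i)) != 0 {
			out[name] = children[next]
			next++
		}
	}
	return out, nil
}

func main() {
	// Illustrative property order only.
	names := []string{"modifiers", "name", "body"}
	props, err := assignChildren(names, 0b110, []int{7, 9})
	if err != nil {
		panic(err)
	}
	fmt.Println(props)
}
```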
Update metadata.go with correct TS7 SyntaxKind iota values and export
metadata functions. Wire decoder+converter through TsgoParser.Parse().
Validation test passes: all 421 diffs are expected TS5-vs-TS7 numeric
kind/flags/token/operator value differences. Zero structural diffs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The script was calling wrappers in single-file CLI mode, but neither
wrapper supports that (they read commands from stdin). Now sends
parse + quit commands via stdin and uses `timeout` to avoid hangs.
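The stdin-driven invocation can be sketched in Go as follows (the script itself is shell and uses `timeout`). The JSON field names `command` and `filename` are assumptions about the protocol, and `runWrapper` is a hypothetical helper that is defined here but not executed.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// command is an assumed shape for one protocol line.
type command struct {
	Command  string `json:"command"`
	Filename string `json:"filename,omitempty"`
}

// buildSession returns the newline-delimited parse + quit commands.
func buildSession(filename string) string {
	var sb strings.Builder
	for _, c := range []command{{Command: "parse", Filename: filename}, {Command: "quit"}} {
		line, _ := json.Marshal(c) // cannot fail for this plain struct
		sb.Write(line)
		sb.WriteByte('\n')
	}
	return sb.String()
}

// runWrapper feeds the session to the wrapper's stdin and kills the
// process if it exceeds limit, mirroring the script's use of timeout.
func runWrapper(binary, filename string, limit time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	cmd := exec.CommandContext(ctx, binary)
	cmd.Stdin = strings.NewReader(buildSession(filename))
	return cmd.Output()
}

func main() {
	fmt.Print(buildSession("example.ts"))
}
```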
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add initial scaffolding for a Go process that will replace the Node.js
TypeScript parser wrapper, preparing for TypeScript 7's Go-based compiler.
The Go wrapper implements the same stdin/stdout line-delimited JSON
protocol as the existing Node.js wrapper (lib/typescript/src/main.ts),
making it a drop-in replacement from the Java extractor's perspective.
Key components:
- Protocol handler matching the Node.js wrapper's command set
(get-metadata, prepare-files, parse, reset, quit)
- Parser backend interface with tsgo subprocess implementation
using the tsgo --api --async JSON-RPC mode (LSP Content-Length framing)
- AST property whitelist matching the ~90 properties from the Node.js wrapper
- Static TS7 SyntaxKind and NodeFlags metadata mappings
- Validation framework for comparing JSON output between wrappers
- Integration tests demonstrating successful tsgo API communication:
initialize, updateSnapshot (project opening), getSourceFile
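A minimal sketch of a line-delimited JSON command loop for the five commands listed above. The request and response shapes (`command`, `type`, `message`) are assumptions for illustration; the real wrapper's responses carry metadata and ASTs rather than the bare acknowledgements shown here.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

type request struct {
	Command string `json:"command"`
}

type response struct {
	Type    string `json:"type"`
	Command string `json:"command,omitempty"`
	Message string `json:"message,omitempty"`
}

// serve reads one JSON command per line and writes one JSON response per line.
func serve(in io.Reader, out io.Writer) error {
	sc := bufio.NewScanner(in)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow long lines
	enc := json.NewEncoder(out)                       // Encode appends '\n'
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var req request
		resp := response{Type: "ack"}
		if err := json.Unmarshal([]byte(line), &req); err != nil {
			resp = response{Type: "error", Message: err.Error()}
		} else {
			switch req.Command {
			case "get-metadata", "prepare-files", "parse", "reset":
				resp.Command = req.Command // placeholder for real work
			case "quit":
				return nil
			default:
				resp = response{Type: "error", Message: "unknown command: " + req.Command}
			}
		}
		if err := enc.Encode(resp); err != nil {
			return err
		}
	}
	return sc.Err()
}

// runSession is a test helper that runs serve over in-memory buffers.
func runSession(input string) (string, error) {
	var out bytes.Buffer
	err := serve(strings.NewReader(input), &out)
	return out.String(), err
}

func main() {
	out, err := runSession("{\"command\":\"reset\"}\n{\"command\":\"quit\"}\n")
	fmt.Print(out)
	if err != nil {
		fmt.Println("error:", err)
	}
}
```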
Key finding: the tsgo API returns binary-encoded ASTs (not JSON),
requiring a decoder for the custom flat-node-array format. See
microsoft/typescript-go/internal/api/encoder/ for the format spec.
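For reference, LSP-style Content-Length framing (as used by the JSON-RPC mode mentioned above) can be written and read as in this sketch. The helper names are hypothetical and the payload is a placeholder; the framing follows the LSP base protocol, with a header block terminated by an empty line and followed by exactly Content-Length bytes of body.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// writeFrame writes one message with a Content-Length header.
func writeFrame(w io.Writer, body []byte) error {
	_, err := fmt.Fprintf(w, "Content-Length: %d\r\n\r\n%s", len(body), body)
	return err
}

// readFrame reads headers up to the blank line, then the body.
func readFrame(r *bufio.Reader) ([]byte, error) {
	length := -1
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return nil, err
		}
		line = strings.TrimRight(line, "\r\n")
		if line == "" {
			break // end of headers
		}
		name, value, ok := strings.Cut(line, ":")
		if ok && strings.EqualFold(strings.TrimSpace(name), "Content-Length") {
			n, err := strconv.Atoi(strings.TrimSpace(value))
			if err != nil {
				return nil, err
			}
			length = n
		}
	}
	if length < 0 {
		return nil, errors.New("missing Content-Length header")
	}
	body := make([]byte, length)
	_, err := io.ReadFull(r, body)
	return body, err
}

// roundTrip frames a body and reads it back; used for the demo and tests.
func roundTrip(body string) (string, error) {
	var buf bytes.Buffer
	if err := writeFrame(&buf, []byte(body)); err != nil {
		return "", err
	}
	got, err := readFrame(bufio.NewReader(&buf))
	return string(got), err
}

func main() {
	got, err := roundTrip(`{"jsonrpc":"2.0","id":1,"method":"initialize"}`)
	fmt.Println(got, err)
}
```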
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix misplaced semicolons in test files (was inside comment, moved before it)
- Update QLdoc comments to reference new browser source kind names
- Update docs to list browser source kinds and fix outdated 'only remote' note
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add test files with #!/usr/bin/env bun, #!/usr/bin/env tsx, and
#!/usr/bin/env node shebangs. The query lists extracted .ts files,
verifying that all three shebangs are recognized and the files are
not skipped by the extractor.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the shebang regexp (renamed NODE_INVOCATION -> JS_INVOCATION) to
also match 'bun' and 'tsx' so that scripts using these runtimes are
correctly identified as JavaScript files.
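A sketch of the kind of shebang match this change describes. The pattern below is hypothetical and is not the extractor's actual JS_INVOCATION regexp (which is a Java pattern); it only illustrates accepting `node`, `bun`, and `tsx` as whole words on a `#!` line.

```go
package main

import (
	"fmt"
	"regexp"
)

// jsInvocation is an illustrative pattern: a shebang line that mentions
// node, bun, or tsx as a whole word.
var jsInvocation = regexp.MustCompile(`^#!.*\b(?:node|bun|tsx)\b`)

// isJSShebang reports whether the first line of a file looks like a
// JavaScript/TypeScript runtime invocation.
func isJSShebang(firstLine string) bool {
	return jsInvocation.MatchString(firstLine)
}

func main() {
	for _, line := range []string{
		"#!/usr/bin/env bun",
		"#!/usr/bin/env tsx",
		"#!/usr/bin/env node",
		"#!/bin/bash",
	} {
		fmt.Printf("%-22s %v\n", line, isJSShebang(line))
	}
}
```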
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add explicit load statements for java_library and java_test from
@rules_java//java:defs.bzl in:
- javascript/extractor/BUILD.bazel
- javascript/extractor/test/com/semmle/js/extractor/test/BUILD.bazel