TS7 binary AST uses bit 6 for ExportContext (set on all nodes inside
`declare module` contexts), not GlobalAugmentation as previously assumed.
GlobalAugmentation is not a flag in the TS7 binary format at all.
Fix by using a synthetic flag bit (1<<30) for GlobalAugmentation that the
converter sets on `declare global {}` nodes based on the name identifier
being "global". This lets the Java extractor correctly distinguish
`declare global {}` from regular namespace declarations.
Also corrects the flag shift: ExportContext=64 (bit 6), ContainsThis=128
(bit 7), etc., matching the actual TS7 binary layout.
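The flag handling described above could be sketched as follows. The three flag values come from this message; the constant names, the `moduleDecl` type, and `markGlobalAugmentation` are hypothetical stand-ins and not the converter's actual code.

```go
package main

import "fmt"

// Flag values quoted in the commit message; the names are illustrative.
const (
	flagExportContext      uint32 = 1 << 6  // 64, present in the binary format
	flagContainsThis       uint32 = 1 << 7  // 128, present in the binary format
	flagGlobalAugmentation uint32 = 1 << 30 // synthetic, set only by the converter
)

// moduleDecl is a hypothetical stand-in for a decoded module declaration.
type moduleDecl struct {
	Name  string
	Flags uint32
}

// markGlobalAugmentation sets the synthetic bit when the name identifier
// is "global" and leaves every other declaration unchanged.
func markGlobalAugmentation(n moduleDecl) moduleDecl {
	if n.Name == "global" {
		n.Flags |= flagGlobalAugmentation
	}
	return n
}

func main() {
	g := markGlobalAugmentation(moduleDecl{Name: "global", Flags: flagExportContext})
	ns := markGlobalAugmentation(moduleDecl{Name: "MyNamespace", Flags: flagExportContext})
	fmt.Println(g.Flags&flagGlobalAugmentation != 0)  // is `declare global {}`
	fmt.Println(ns.Flags&flagGlobalAugmentation != 0) // is a regular namespace
}
```

Bit 30 sits well above the low bits quoted here, which presumably keeps the synthetic flag from colliding with real ones; that rationale is an assumption.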
TRAP test results: 494/495 passing (99.8%)
Remaining: badimport.ts (TS7 binary API doesn't report parse diagnostics)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wire the Go-based TypeScript parser wrapper as an alternative to the
Node.js wrapper. Enabled via SEMMLE_TYPESCRIPT_USE_GO_PARSER=true.
When enabled:
- Skips Node.js installation verification
- Launches the Go binary directly (no Node.js required)
- Uses the same newline-delimited JSON protocol over stdin/stdout
- Go binary path configurable via SEMMLE_TYPESCRIPT_GO_PARSER_WRAPPER
- tsgo binary path passed through via SEMMLE_TYPESCRIPT_TSGO_BINARY
The Go wrapper implements all protocol commands: get-metadata, parse,
prepare-files, reset, and quit.
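A minimal sketch of a newline-delimited JSON command loop of this kind, for illustration only. The command names come from this message; the `command` field name, the `handled` reply shape, and the `serve`/`serveString` helpers are assumptions, not the wrapper's real message schema.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// request models one input line. The field name "command" is an assumption.
type request struct {
	Command string `json:"command"`
}

// serve reads one JSON object per line and writes one JSON reply per line
// until it sees "quit" or the input ends.
func serve(in io.Reader, out io.Writer) error {
	sc := bufio.NewScanner(in)
	sc.Buffer(make([]byte, 64*1024), 16*1024*1024) // allow long request lines
	enc := json.NewEncoder(out)                    // Encode appends a newline
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue // skip blank lines
		}
		var req request
		if err := json.Unmarshal(sc.Bytes(), &req); err != nil {
			return err
		}
		switch req.Command {
		case "get-metadata", "parse", "prepare-files", "reset":
			// Placeholder reply; the real wrapper returns command-specific data.
			if err := enc.Encode(map[string]string{"handled": req.Command}); err != nil {
				return err
			}
		case "quit":
			return nil
		default:
			return fmt.Errorf("unknown command %q", req.Command)
		}
	}
	return sc.Err()
}

// serveString runs serve over an in-memory input, which is handy for tests.
func serveString(input string) (string, error) {
	var out strings.Builder
	err := serve(strings.NewReader(input), &out)
	return out.String(), err
}

func main() {
	out, err := serveString("{\"command\":\"get-metadata\"}\n{\"command\":\"quit\"}\n")
	fmt.Print(out)
	if err != nil {
		fmt.Println("error:", err)
	}
}
```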
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The shell validation script now uses a structural comparison that
ignores expected numeric differences in kind/flags/token/operator
values between TS5 and TS7. Only truly structural diffs cause failure.
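The validation script itself is a shell script; the idea behind the structural comparison can be sketched in Go as below. The set of ignored keys comes from this message; the function names and sample documents are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Keys whose numeric values are expected to differ between TS5 and TS7.
var numericKeys = map[string]bool{"kind": true, "flags": true, "token": true, "operator": true}

func isNumber(v any) bool {
	_, ok := v.(float64) // encoding/json decodes every JSON number as float64
	return ok
}

// sameStructure reports whether two decoded JSON values match, treating
// numeric values under the keys above as equal regardless of their value.
func sameStructure(a, b any) bool {
	switch av := a.(type) {
	case map[string]any:
		bv, ok := b.(map[string]any)
		if !ok || len(av) != len(bv) {
			return false
		}
		for k, x := range av {
			y, present := bv[k]
			if !present {
				return false
			}
			if numericKeys[k] && isNumber(x) && isNumber(y) {
				continue
			}
			if !sameStructure(x, y) {
				return false
			}
		}
		return true
	case []any:
		bv, ok := b.([]any)
		if !ok || len(av) != len(bv) {
			return false
		}
		for i := range av {
			if !sameStructure(av[i], bv[i]) {
				return false
			}
		}
		return true
	default:
		return a == b // scalars: string, float64, bool, or nil
	}
}

// compareJSON decodes both documents and compares them structurally.
func compareJSON(a, b string) (bool, error) {
	var x, y any
	if err := json.Unmarshal([]byte(a), &x); err != nil {
		return false, err
	}
	if err := json.Unmarshal([]byte(b), &y); err != nil {
		return false, err
	}
	return sameStructure(x, y), nil
}

func main() {
	// The kind numbers here are made up for the example.
	ok, _ := compareJSON(`{"kind":300,"text":"x"}`, `{"kind":307,"text":"x"}`)
	fmt.Println(ok)
}
```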
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement the core components for translating tsgo's binary AST format
into the JSON format expected by the Java extractor:
- decoder.go: Binary AST format parser with random-access node accessors
(kind, pos, end, flags, children, strings, extended data)
- converter.go: Walks decoded AST and produces JSON matching Node.js
wrapper output (augmented , , , ,
isTypeOnly, HeritageClause token, TypeOperator operator)
- childprops.go: Maps ~100 SyntaxKind names to ordered child property
name lists for correct bitmask-to-property assignment
- scanner.go: TypeScript tokenizer producing a token array with rescan
support for regex, template, and greater-than disambiguation
Update metadata.go with correct TS7 SyntaxKind iota values and export
metadata functions. Wire decoder+converter through TsgoParser.Parse().
Validation test passes: all 421 diffs are expected TS5-vs-TS7 numeric
kind/flags/token/operator value differences. Zero structural diffs.
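The bitmask-to-property assignment mentioned for childprops.go could look roughly like this. It assumes that bit i of a node's child mask means "the i-th property in the ordered list is present" and that children are stored in that same order; that encoding, the property list, and `assignChildren` are assumptions made for illustration, so consult the encoder format spec for the real layout.

```go
package main

import (
	"fmt"
	"math/bits"
)

// assignChildren maps child node indices to property names. Assumption:
// bit i of mask marks props[i] as present, and children appear in the
// same order as the set bits.
func assignChildren(props []string, mask uint32, children []int) (map[string]int, error) {
	if bits.OnesCount32(mask) != len(children) {
		return nil, fmt.Errorf("mask has %d bits set but node has %d children",
			bits.OnesCount32(mask), len(children))
	}
	out := make(map[string]int, len(children))
	next := 0
	for i, name := range props {
		if mask&(1<<uint(i)) != 0 {
			out[name] = children[next]
			next++
		}
	}
	if next != len(children) {
		return nil, fmt.Errorf("mask sets a bit beyond the %d known properties", len(props))
	}
	return out, nil
}

func main() {
	// Hypothetical ordered property list for one SyntaxKind.
	props := []string{"modifiers", "name", "type", "initializer"}
	// Bits 1 and 3 set: only "name" and "initializer" are present.
	assigned, err := assignChildren(props, 0b1010, []int{17, 23})
	fmt.Println(assigned, err)
}
```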
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The script was calling the wrappers in single-file CLI mode, but neither
wrapper supports that (both read commands from stdin). It now sends
parse + quit commands via stdin and uses `timeout` to avoid hangs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add initial scaffolding for a Go process that will replace the Node.js
TypeScript parser wrapper, preparing for TypeScript 7's Go-based compiler.
The Go wrapper implements the same stdin/stdout line-delimited JSON
protocol as the existing Node.js wrapper (lib/typescript/src/main.ts),
making it a drop-in replacement from the Java extractor's perspective.
Key components:
- Protocol handler matching the Node.js wrapper's command set
(get-metadata, prepare-files, parse, reset, quit)
- Parser backend interface with tsgo subprocess implementation
using the tsgo --api --async JSON-RPC mode (LSP Content-Length framing)
- AST property whitelist matching the ~90 properties from the Node.js wrapper
- Static TS7 SyntaxKind and NodeFlags metadata mappings
- Validation framework for comparing JSON output between wrappers
- Integration tests demonstrating successful tsgo API communication:
initialize, updateSnapshot (project opening), getSourceFile
Key finding: the tsgo API returns binary-encoded ASTs (not JSON),
requiring a decoder for the custom flat-node-array format. See
microsoft/typescript-go/internal/api/encoder/ for the format spec.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the shebang regexp (renamed NODE_INVOCATION -> JS_INVOCATION) to
also match 'bun' and 'tsx' so that scripts using these runtimes are
correctly identified as JavaScript files.
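The extractor's real pattern lives in the Java code and is not reproduced here. As a rough illustration of the idea, a Go regexp that accepts `node`, `bun`, and `tsx` shebangs might look like this; the pattern and the `isJSShebang` helper are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// jsInvocation is a hypothetical pattern: "#!", an optional directory
// prefix, an optional "env ", then one of the known JavaScript runtimes.
var jsInvocation = regexp.MustCompile(`^#!\s*(?:\S*/)?(?:env\s+)?(?:node|bun|tsx)\b`)

// isJSShebang reports whether the first line of a script names one of
// the runtimes above.
func isJSShebang(firstLine string) bool {
	return jsInvocation.MatchString(firstLine)
}

func main() {
	for _, line := range []string{
		"#!/usr/bin/env node",
		"#!/usr/bin/env bun",
		"#!/usr/local/bin/tsx",
		"#!/usr/bin/env python3",
	} {
		fmt.Printf("%-24s %v\n", line, isJSShebang(line))
	}
}
```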
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add explicit load statements for java_library and java_test from
@rules_java//java:defs.bzl in:
- javascript/extractor/BUILD.bazel
- javascript/extractor/test/com/semmle/js/extractor/test/BUILD.bazel
The code was written on the assumption that 'seenCode' implies 'seenFiles', but the unit tests override 'hasSeenCode()' to always return true, which meant we would start taking this branch in the unit tests.
In #19680 we added support for automatically ignoring files in the
`outDir` directory as specified in the tsconfig compiler options (as
these files were likely duplicates of `.ts` files we were already
scanning).
However, in some cases people put `outDir: "."` or even `outDir: ".."`
in their configuration, which had the side effect of excluding _all_
files, leading to a failed extraction.
With the changes in this PR, we now ignore any `outDir`s that are not
properly contained within the source root of the code being scanned.
This should prevent all files from being excluded, while still allowing
us to not double-scan files in, say, a `.github` directory, as seen in
some Actions workflows.
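The containment rule can be illustrated with a small Go sketch; the extractor itself implements this in Java. It assumes `outDir` is resolved relative to the directory holding the tsconfig file and that "properly contained" means strictly inside the source root. The function name and the example paths are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// shouldExcludeOutDir reports whether outDir is strictly inside sourceRoot.
// An outDir equal to the source root, or outside it, is ignored so that it
// cannot exclude every file.
func shouldExcludeOutDir(sourceRoot, tsconfigDir, outDir string) bool {
	resolved := outDir
	if !filepath.IsAbs(resolved) {
		resolved = filepath.Join(tsconfigDir, outDir)
	}
	rel, err := filepath.Rel(filepath.Clean(sourceRoot), filepath.Clean(resolved))
	if err != nil {
		return false
	}
	if rel == "." {
		return false // outDir is the source root itself
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false // outDir lies outside the source root
	}
	return true
}

func main() {
	root := "/src/repo"
	for _, out := range []string{"dist", ".", "..", "../elsewhere"} {
		fmt.Printf("%-14q exclude=%v\n", out, shouldExcludeOutDir(root, root, out))
	}
}
```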
The 'extractTypeScriptFiles' override did not incorporate the file type, which one of our unit tests was expecting. The test was previously passing for the wrong reasons.
Fixes two things:
- The basic test should no longer extract `tst.js` (as `tst.ts` is
present)
- The `AutoBuild` mock did not populate `extractedFiles` correctly,
which broke the logic that looks for TypeScript files with the same
basename.
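The same-basename rule can be sketched as follows. The real check is part of the Java extractor and may differ in detail; `shouldSkipJS` and the set-based representation of `extractedFiles` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// shouldSkipJS reports whether a .js file should be skipped because a .ts
// file with the same basename in the same directory is already recorded
// in the extracted set.
func shouldSkipJS(path string, extracted map[string]bool) bool {
	ext := filepath.Ext(path)
	if ext != ".js" {
		return false
	}
	tsPath := strings.TrimSuffix(path, ext) + ".ts"
	return extracted[tsPath]
}

func main() {
	// If this set is left empty, as in the broken mock, nothing is skipped.
	extracted := map[string]bool{"/src/tst.ts": true}
	fmt.Println(shouldSkipJS("/src/tst.js", extracted))
	fmt.Println(shouldSkipJS("/src/other.js", extracted))
}
```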