Commit Graph

966 Commits

Author SHA1 Message Date
Asger F
fbaf648e4f Fix nodeFlags: bit 6 is ExportContext, not GlobalAugmentation
TS7 binary AST uses bit 6 for ExportContext (set on all nodes inside
`declare module` contexts), not GlobalAugmentation as previously assumed.
GlobalAugmentation is not a flag in the TS7 binary format at all.

Fix by using a synthetic flag bit (1<<30) for GlobalAugmentation that the
converter sets on `declare global {}` nodes based on the name identifier
being "global". This lets the Java extractor correctly distinguish
`declare global {}` from regular namespace declarations.

Also corrects the flag shift: ExportContext=64 (bit 6), ContainsThis=128
(bit 7), etc., matching the actual TS7 binary layout.

TRAP test results: 494/495 passing (99.8%)
Remaining: badimport.ts (TS7 binary API doesn't report parse diagnostics)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-13 15:22:58 +02:00
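The flag layout in fbaf648e4f can be illustrated with a short Go sketch. The bit values for ExportContext (bit 6), ContainsThis (bit 7) and the synthetic GlobalAugmentation bit (1<<30) come from the commit message. The helper names `withGlobalAugmentation` and `isGlobalAugmentation` are hypothetical and do not come from the converter. The `nameIsIdentifier` parameter reflects the commit's statement that the check is on the name *identifier*.

```go
package main

// Flag bits per the commit message: ExportContext is bit 6 and ContainsThis
// is bit 7 in the TS7 binary layout; GlobalAugmentation is a synthetic bit
// (1<<30) that only the converter sets.
const (
	flagExportContext      uint32 = 1 << 6  // 64: nodes inside `declare module` contexts
	flagContainsThis       uint32 = 1 << 7  // 128
	flagGlobalAugmentation uint32 = 1 << 30 // synthetic: absent from the TS7 binary format
)

// withGlobalAugmentation is a hypothetical converter helper: it adds the
// synthetic bit to a module declaration's flags when the declaration's name
// is the identifier `global`, i.e. `declare global { ... }`.
func withGlobalAugmentation(flags uint32, nameIsIdentifier bool, nameText string) uint32 {
	if nameIsIdentifier && nameText == "global" {
		return flags | flagGlobalAugmentation
	}
	return flags
}

// isGlobalAugmentation lets a consumer tell `declare global {}` apart from an
// ordinary namespace declaration.
func isGlobalAugmentation(flags uint32) bool {
	return flags&flagGlobalAugmentation != 0
}
```

Using a bit well above the real TS7 flags keeps the synthetic marker from colliding with any flag the binary format actually sets.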
Asger F
637ce99e44 TypeScript Go extractor: metadata fixes, NestedNamespace inference, and scanner improvements
- Fix TS7 nodeFlags: remove Synthesized (shifted in TS7), add GlobalAugmentation=64,
  correct OptionalChain=32, Namespace=16, shift subsequent flags
- Add 33 missing operator/punctuation token kinds to syntaxKinds metadata
- Infer NestedNamespace flag for dotted namespace declarations (TS7 binary
  doesn't set it, but Java extractor needs it)
- Fix shebang handling: emit ShebangTrivia (kind 6) instead of SingleLineCommentTrivia
- Fix token kinds for regex/template rescans to match TS5 pre-rescan behavior
  (SlashToken for regexes, CloseBraceToken for template continuations)
- Fix augmentPos to correctly skip comments (matching TS5's trivia-skipping regex)
- Resolve native tsgo binary from npm wrapper to avoid Node.js dependency
- Update project-layout glob for worktree support

TRAP test results: 493/495 passing (99.6%)
Remaining: badimport.ts (missing diagnostics), externalmodule.ts (structural diff)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-13 15:11:48 +02:00
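In 637ce99e44, augmentPos is described as skipping comments the way TS5's trivia-skipping regex does. The following is a minimal Go sketch of that kind of trivia skipping. It assumes that only ASCII whitespace, `//` line comments and `/* */` block comments count as trivia. The name `skipTrivia` is hypothetical, and the sketch is not TS5's actual regex.

```go
package main

import "strings"

// skipTrivia returns the index of the first byte at or after pos that is not
// ASCII whitespace or part of a // or /* */ comment. Unicode whitespace is
// deliberately not handled in this sketch.
func skipTrivia(src string, pos int) int {
	for pos < len(src) {
		switch c := src[pos]; {
		case c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v':
			pos++
		case c == '/' && pos+1 < len(src) && src[pos+1] == '/':
			// Line comment: advance to the line break (handled as whitespace).
			for pos < len(src) && src[pos] != '\n' && src[pos] != '\r' {
				pos++
			}
		case c == '/' && pos+1 < len(src) && src[pos+1] == '*':
			end := strings.Index(src[pos+2:], "*/")
			if end < 0 {
				return len(src) // an unterminated block comment runs to EOF
			}
			pos += 2 + end + 2
		default:
			return pos
		}
	}
	return pos
}
```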
Asger F
bd9d6b1962 Add Go TypeScript parser wrapper integration to Java extractor
Wire the Go-based TypeScript parser wrapper as an alternative to the
Node.js wrapper. Enabled via SEMMLE_TYPESCRIPT_USE_GO_PARSER=true.

When enabled:
- Skips Node.js installation verification
- Launches the Go binary directly (no Node.js required)
- Uses the same newline-delimited JSON protocol over stdin/stdout
- Go binary path configurable via SEMMLE_TYPESCRIPT_GO_PARSER_WRAPPER
- tsgo binary path passed through via SEMMLE_TYPESCRIPT_TSGO_BINARY

The Go wrapper implements all protocol commands: get-metadata, parse,
prepare-files, reset, and quit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 15:44:55 +02:00
Asger F
bd5e4761bd Fix broader validation: 52/57 tests pass
Key fixes:
- UTF-16 offset conversion for positions (buildOffsetTables, byteToUTF16, utf16ToByte)
- Unicode identifier scanning (support ID_Start/ID_Continue categories)
- Filter zero-width synthetic modifiers from nested namespaces
- Add ImportAttributes to childprops (elements property)
- Emit isTypeOf:false for ImportType nodes
- Always emit empty statements array for SourceFile
- Emit empty arrays for remaining array properties when no children
- Non-greedy > scanning (always single GreaterThanToken)
- Ignore parseDiagnostics in structural comparison

Remaining 5 failures are binary/UTF-16-BOM encoded files (not real TypeScript).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 15:19:49 +02:00
Asger F
93deb33a2a Fix validation script to tolerate expected TS7 kind/flags diffs vs TS5
The shell validation script now uses a structural comparison that
ignores expected numeric differences in kind/flags/token/operator
values between TS5 and TS7. Only truly structural diffs cause failure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 14:54:19 +02:00
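The structural comparison in 93deb33a2a can be sketched in Go, although the commit's actual script is a shell script. The idea is to decode both JSON ASTs, neutralize the numeric values of kind/flags/token/operator, and compare what remains. This sketch assumes those keys appear verbatim in the JSON output. It keeps each key so that a missing or extra property still counts as a structural diff.

```go
package main

import (
	"encoding/json"
	"reflect"
)

// Keys whose numeric values are expected to differ between TS5 and TS7.
var renumberedKeys = map[string]bool{"kind": true, "flags": true, "token": true, "operator": true}

// normalize returns a copy of a decoded JSON value in which every numeric
// value under a renumbered key is replaced by 0, so only structure remains.
func normalize(v any) any {
	switch x := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(x))
		for k, val := range x {
			if _, isNum := val.(float64); isNum && renumberedKeys[k] {
				out[k] = 0.0 // keep the key, ignore its value
				continue
			}
			out[k] = normalize(val)
		}
		return out
	case []any:
		out := make([]any, len(x))
		for i, val := range x {
			out[i] = normalize(val)
		}
		return out
	default:
		return v
	}
}

// structurallyEqual decodes two JSON ASTs and compares them after normalization.
func structurallyEqual(a, b []byte) (bool, error) {
	var va, vb any
	if err := json.Unmarshal(a, &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(normalize(va), normalize(vb)), nil
}
```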
Asger F
f3b27a56b1 TypeScript-Go wrapper: binary AST decoder, JSON converter, and tokenizer
Implement the core components for translating tsgo's binary AST format
into the JSON format expected by the Java extractor:

- decoder.go: Binary AST format parser with random-access node accessors
  (kind, pos, end, flags, children, strings, extended data)
- converter.go: Walks decoded AST and produces JSON matching Node.js
  wrapper output (augmented , , , ,
  isTypeOnly, HeritageClause token, TypeOperator operator)
- childprops.go: Maps ~100 SyntaxKind names to ordered child property
  name lists for correct bitmask-to-property assignment
- scanner.go: TypeScript tokenizer producing  array with rescan
  support for regex, template, and greater-than disambiguation

Update metadata.go with correct TS7 SyntaxKind iota values and export
metadata functions. Wire decoder+converter through TsgoParser.Parse().

Validation test passes: all 421 diffs are expected TS5-vs-TS7 numeric
kind/flags/token/operator value differences. Zero structural diffs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 14:36:00 +02:00
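The commit does not spell out how childprops.go uses its ordered property lists for "bitmask-to-property assignment". The following Go sketch therefore rests on an assumption: each node carries a mask in which bit i marks that the i-th property in its kind's list is present, and the node's children appear in the same order as the set bits. The function `assignChildren` and the example property list are hypothetical.

```go
package main

// assignChildren pairs a node's children with property names. props is the
// ordered property list for the node's kind; bit i of mask is assumed to mean
// props[i] is present. It returns nil if the mask and child count disagree.
func assignChildren(props []string, mask uint32, children []int) map[string]int {
	result := make(map[string]int)
	next := 0
	for i, name := range props {
		if mask&(1<<uint(i)) == 0 {
			continue // property absent on this node
		}
		if next >= len(children) {
			return nil // more bits set than children supplied
		}
		result[name] = children[next]
		next++
	}
	if next != len(children) {
		return nil // leftover children: mask and list are inconsistent
	}
	return result
}
```

With the list `modifiers, name, typeParameters, body` and mask `0b1010`, two children map to `name` and `body`. This is why the property order per kind matters.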
Asger F
37852aa1d3 JS: Fix validation script to use stdin protocol with timeouts
The script was calling wrappers in single-file CLI mode, but neither
wrapper supports that (they read commands from stdin). Now sends
parse + quit commands via stdin and uses `timeout` to avoid hangs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 13:46:39 +02:00
Asger F
78b1651596 JS: Add Go-based TypeScript parser wrapper scaffolding
Add initial scaffolding for a Go process that will replace the Node.js
TypeScript parser wrapper, preparing for TypeScript 7's Go-based compiler.

The Go wrapper implements the same stdin/stdout line-delimited JSON
protocol as the existing Node.js wrapper (lib/typescript/src/main.ts),
making it a drop-in replacement from the Java extractor's perspective.

Key components:
- Protocol handler matching the Node.js wrapper's command set
  (get-metadata, prepare-files, parse, reset, quit)
- Parser backend interface with tsgo subprocess implementation
  using the tsgo --api --async JSON-RPC mode (LSP Content-Length framing)
- AST property whitelist matching the ~90 properties from the Node.js wrapper
- Static TS7 SyntaxKind and NodeFlags metadata mappings
- Validation framework for comparing JSON output between wrappers
- Integration tests demonstrating successful tsgo API communication:
  initialize, updateSnapshot (project opening), getSourceFile

Key finding: the tsgo API returns binary-encoded ASTs (not JSON),
requiring a decoder for the custom flat-node-array format. See
microsoft/typescript-go/internal/api/encoder/ for the format spec.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-10 13:11:17 +02:00
Asger F
84d1828a9c JavaScript extractor: recognise bun and tsx in shebang lines
Update the shebang regexp (renamed NODE_INVOCATION -> JS_INVOCATION) to
also match 'bun' and 'tsx' so that scripts using these runtimes are
correctly identified as JavaScript files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-12 09:35:36 +01:00
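The JS_INVOCATION change in 84d1828a9c extends shebang detection to bun and tsx. The following Go pattern is only an illustrative approximation, not the extractor's actual regexp. It accepts `#!`, an optional directory, an optional `env` (with optional `-S`), and then `node`, `bun` or `tsx` as a whole word. The trailing guard rejects names such as `nodemon`.

```go
package main

import "regexp"

// jsInvocation is an illustrative pattern, not the extractor's JS_INVOCATION.
var jsInvocation = regexp.MustCompile(`^#!\s*(?:\S*/)?(?:env\s+(?:-S\s+)?)?(?:\S*/)?(?:node|bun|tsx)(?:\s|$)`)

// isJSShebang reports whether a script's first line names a JS runtime.
func isJSShebang(firstLine string) bool {
	return jsInvocation.MatchString(firstLine)
}
```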
Asger F
d440b5fa85 JS: Update TRAP files 2026-02-27 14:15:34 +01:00
Asger F
0f2de46648 JS: Emit variable bindings for 'this' expressions 2026-02-27 11:44:54 +01:00
Asger F
f0f58dacb3 JS: Also emit 'this' variable for class scopes 2026-02-27 11:44:31 +01:00
Asger F
e0ab5ce49b JS: Emit variables for 'this'
The extractor does not emit bindings for 'this'; we just ensure that a variable exists for it.
2026-02-25 10:17:02 +01:00
Paolo Tranquilli
10a2824b82 refactor: migrate BUILD files to explicit rules_java imports
Add explicit load statements for java_library and java_test from
@rules_java//java:defs.bzl in:
- javascript/extractor/BUILD.bazel
- javascript/extractor/test/com/semmle/js/extractor/test/BUILD.bazel
2026-02-10 13:44:06 +01:00
Asger F
0eadebcabd Update javascript/extractor/src/com/semmle/js/extractor/FileExtractor.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-14 11:40:01 +01:00
Asger F
7ab52a81a7 JS: Add environment variable to opt out of the behaviour if needed 2026-01-14 11:40:01 +01:00
Asger F
98c8b4c080 JS: Skip minified file if avg line length > 200 2026-01-14 11:40:01 +01:00
Paolo Tranquilli
82435218dc Javascript: fix compilation error after scripted replacement 2025-11-11 12:44:33 +01:00
Paolo Tranquilli
ff62c65cdf Javascript: avoid null pointer exception on boolean values 2025-11-11 12:11:49 +01:00
Paolo Tranquilli
6ef314ed03 Javascript: fix errors from upcoming rules_java update 2025-11-11 11:53:07 +01:00
Asger F
81bb07a7ba JS: Fix check to account for override in tests
The code was written on the assumption that 'seenCode' implies 'seenFiles', but the unit tests override 'hasSeenCode()' to always return true, which meant we would start taking this branch in the unit tests.
2025-11-04 11:46:02 +01:00
Asger F
105213df03 Update javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-04 09:51:56 +01:00
Asger F
c4d23d16ed Actions: don't fail if no JS/TS code was found 2025-11-04 09:37:33 +01:00
Asger F
0acfacefbf JS: Recursively delete source archive so emptiness detection works 2025-10-30 15:31:51 +01:00
Asger F
bab2a79055 JS: Add parsing support in JS parser 2025-09-05 11:57:34 +02:00
Asger F
215602c963 JS: Preserve information about 'defer' keyword 2025-09-05 11:57:33 +02:00
Asger F
0d03c813d0 JS: Also update @types/node version 2025-09-05 11:57:30 +02:00
Asger F
b2b5199055 JS: Bump TypeScript dependency to 5.9 2025-09-05 11:57:29 +02:00
Asger F
67a1c2ffef Update javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-01 10:20:17 +02:00
Asger F
c1df8a95cb JS: Overlay extraction support 2025-08-19 09:19:55 +02:00
Taus
30f705822d JavaScript: Add test where outDir resolves to an unwanted path 2025-07-11 14:58:03 +00:00
Taus
43accc50cd JavaScript: Ignore outDirs that would exclude everything
In #19680 we added support for automatically ignoring files in the
`outDir` directory as specified in the TSconfig compiler options (as
these files were likely duplicates of `.ts` files we were already
scanning).

However, in some cases people put `outDir: "."` or even `outDir: ".."`
in their configuration, which had the side effect of excluding _all_
files, leading to a failed extraction.

With the changes in this PR, we now ignore any `outDir`s that are not
properly contained within the source root of the code being scanned.
This should prevent all files from being excluded, while still allowing
us to avoid double-scanning files in, say, a `.github` directory, as seen in
some Actions workflows.
2025-07-11 13:28:59 +00:00
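The rule in 43accc50cd only honours an `outDir` that is properly contained in the source root. A minimal Go sketch of that containment test is below. It assumes the caller has already resolved `outDir` relative to the tsconfig.json directory. Paths are compared lexically, so symlinks are not followed. The name `properlyContained` is hypothetical.

```go
package main

import (
	"path/filepath"
	"strings"
)

// properlyContained reports whether dir lies strictly inside root: an outDir
// equal to the root (e.g. ".") or outside it (e.g. "..") returns false.
func properlyContained(root, dir string) bool {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return false
	}
	absDir, err := filepath.Abs(dir)
	if err != nil {
		return false
	}
	rel, err := filepath.Rel(absRoot, absDir)
	if err != nil {
		return false
	}
	return rel != "." && rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}
```

Checking the relative path rather than a string prefix avoids treating a sibling such as `/src2` as inside `/src`.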
Asger F
4b2025d2c4 JS: Remove obsolete unit tests 2025-07-02 09:54:18 +02:00
Asger F
2aad14771c JS: Remove TypeScriptMode 2025-07-02 08:39:17 +02:00
Asger F
5289e4f424 JS: Fix a bug in a unit test
The 'extractTypeScriptFiles' override did not incorporate the file type and one of our unit tests was expecting this. The test was previously passing for the wrong reasons.
2025-06-25 14:31:31 +02:00
Asger F
02cdde1447 JS: Fix imprecise condition 2025-06-25 14:31:28 +02:00
Asger F
488da145e8 JS: Don't try to augment invalid files
This check existed on the code path for full type extraction, but not for plain single-file extraction.
2025-06-25 14:31:11 +02:00
Asger F
74b817b642 JS: Remove code path for TypeScript full extraction 2025-06-25 14:31:05 +02:00
Asger F
8efa38be79 JS: Change default TypeScript extraction mode to basic 2025-06-23 12:55:20 +02:00
Taus
ac8b41a5da Merge pull request #19680 from github/tausbn/javascript-exclude-obviously-generated-files
JavaScript: Don't extract obviously generated files
2025-06-20 15:52:39 +02:00
Taus
e3d9d92f25 JavaScript: Fix duplicate comment 2025-06-10 12:59:03 +00:00
Taus
f08c2fa387 JavaScript: Move tsconfig files into extractor.tsconfig package
Also make the indentation in `CompilerOptions.java` more consistent.
2025-06-10 12:58:48 +00:00
Taus
281ccf7c11 JavaScript: Extract tsconfig.json also in basic mode
This is needed for the logic that skips files inside the directory
specified in the `tsconfig.json` `outDir` compiler option.
2025-06-05 15:01:05 +00:00
Taus
619256e037 JavaScript: Fix existing tests and test runner
Fixes two things:
- The basic test should no longer extract `tst.js` (as `tst.ts` is
  present)
- The `AutoBuild` mock did not populate `extractedFiles` correctly,
  which broke the logic that looks for TypeScript files with the same
  basename.
2025-06-05 14:59:40 +00:00
Taus
8829f7820a JavaScript: Don't extract files with TypeScript progenitors 2025-06-05 14:57:00 +00:00
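The "TypeScript progenitor" rule skips a `.js` file when a TypeScript file with the same basename is also being extracted. In the test described in 619256e037, for example, `tst.js` is skipped because `tst.ts` is present. A minimal Go sketch of that check is below. The extracted-files set is modelled as a map of paths. Counting `.tsx` as a progenitor is an assumption, and `hasTypeScriptProgenitor` is a hypothetical name.

```go
package main

import (
	"path/filepath"
	"strings"
)

// hasTypeScriptProgenitor reports whether a .js file has a sibling .ts (or,
// by assumption, .tsx) file with the same basename in the extracted set.
func hasTypeScriptProgenitor(path string, extracted map[string]bool) bool {
	ext := filepath.Ext(path)
	if ext != ".js" {
		return false // only plain .js outputs are candidates for skipping
	}
	base := strings.TrimSuffix(path, ext)
	return extracted[base+".ts"] || extracted[base+".tsx"]
}
```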
Taus
14f50880e9 JavaScript: Don't extract files in tsconfig.json outDir 2025-06-05 14:56:59 +00:00
Asger F
11607e5f62 JS: Update TRAP after extractor change 2025-05-20 13:20:36 +02:00
Asger F
50e4ac8298 JS: Do not ignore variables from ambient declarations 2025-05-20 13:19:51 +02:00
Asger F
359525b65a JS: Extract more tsconfig.json patterns 2025-04-29 12:46:49 +02:00
Asger F
8c0b0c4800 JS: Ensure json files are extracted properly in tests 2025-04-29 12:46:20 +02:00