SnakeYAML 2.3 has [a bug](https://bitbucket.org/snakeyaml/snakeyaml/issues/1098) where it crashes with an `IndexOutOfBoundsException` when a Unicode surrogate pair (e.g. an emoji) straddles the 1024-character internal buffer boundary. This happens because the high surrogate can end up as the last character in the data window, and the reader then tries to read the low surrogate past the end of the buffer.
This caused languages that extract YAML, most notably JavaScript and Actions, to fail when the codebase contained a YAML file with an emoji at an unlucky position in the file.
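The failure mode can be illustrated with a minimal sketch. The class and method names (`SurrogateSafeWindow`, `safeWindowEnd`) are hypothetical and not part of SnakeYAML; the sketch only shows one possible way to keep a window from ending on a dangling high surrogate, not the library's actual fix.

```java
public class SurrogateSafeWindow {
    /**
     * Returns the exclusive end index of the window starting at {@code start}
     * so that the window never ends with a high surrogate whose low
     * surrogate lies just past the boundary.
     * Hypothetical helper for illustration only.
     */
    static int safeWindowEnd(char[] data, int start, int windowSize) {
        int end = Math.min(start + windowSize, data.length);
        // If the last char in the window is a high surrogate and the next
        // char is its low surrogate, extend the window by one char so the
        // pair stays together.
        if (end > start && end < data.length
                && Character.isHighSurrogate(data[end - 1])
                && Character.isLowSurrogate(data[end])) {
            end++;
        }
        return end;
    }

    public static void main(String[] args) {
        // 1023 ASCII chars followed by an emoji: the high surrogate lands
        // at index 1023, the last slot of a 1024-char window.
        String text = "a".repeat(1023) + "\uD83D\uDE00" + "tail";
        System.out.println(safeWindowEnd(text.toCharArray(), 0, 1024));
    }
}
```

A naive reader that fixes the window at exactly 1024 characters would see the high surrogate at index 1023 and then index past the window when looking for its partner.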
Update the shebang regexp (renamed NODE_INVOCATION -> JS_INVOCATION) to
also match 'bun' and 'tsx' so that scripts using these runtimes are
correctly identified as JavaScript files.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
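As a rough illustration of the shebang matching described above, here is a sketch in Java. The pattern below is an assumption made for this example; the extractor's real `JS_INVOCATION` regexp may be written differently. `ShebangSketch` and `looksLikeJavaScript` are hypothetical names.

```java
import java.util.regex.Pattern;

public class ShebangSketch {
    // Assumed pattern for illustration: a shebang line that mentions
    // 'node', 'bun', or 'tsx' as a whole word.
    static final Pattern JS_INVOCATION =
        Pattern.compile("^#!.*\\b(?:node|bun|tsx)\\b.*");

    /** Returns true if the first line of a script looks like a JS runtime shebang. */
    static boolean looksLikeJavaScript(String firstLine) {
        return JS_INVOCATION.matcher(firstLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJavaScript("#!/usr/bin/env bun"));
        System.out.println(looksLikeJavaScript("#!/bin/bash"));
    }
}
```

The word boundaries keep the sketch from matching unrelated interpreters such as `bundle`.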
Add explicit load statements for java_library and java_test from
@rules_java//java:defs.bzl in:
- javascript/extractor/BUILD.bazel
- javascript/extractor/test/com/semmle/js/extractor/test/BUILD.bazel
The code was written on the assumption that 'seenCode' implies 'seenFiles', but the unit tests override 'hasSeenCode()' to always return true, which meant we would start taking this branch in the unit tests.
In #19680 we added support for automatically ignoring files in the
`outDir` directory as specified in the TSconfig compiler options (as
these files were likely duplicates of `.ts` files we were already
scanning).
However, in some cases people put `outDir: "."` or even `outDir: ".."`
in their configuration, which had the side effect of excluding _all_
files, leading to a failed extraction.
With the changes in this PR, we now ignore any `outDir`s that are not
properly contained within the source root of the code being scanned.
This should prevent all files from being excluded, while still allowing
us to not double-scan files in, say, a `.github` directory, as seen in
some Actions workflows.
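The containment rule can be sketched as follows. This is a minimal illustration, assuming "properly contained" means a strict descendant of the source root; `OutDirSketch` and `isProperlyContained` are hypothetical names, not the extractor's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutDirSketch {
    /**
     * Returns true if {@code outDir}, resolved against the directory that
     * holds the tsconfig file, is a strict descendant of {@code sourceRoot}.
     * Only such directories would be honored as exclusions in this sketch.
     */
    static boolean isProperlyContained(Path sourceRoot, Path tsconfigDir, String outDir) {
        Path root = sourceRoot.toAbsolutePath().normalize();
        Path resolved = tsconfigDir.toAbsolutePath().resolve(outDir).normalize();
        // Equal to the root (outDir: ".") or outside it (outDir: "..")
        // does not count as properly contained.
        return resolved.startsWith(root) && !resolved.equals(root);
    }

    public static void main(String[] args) {
        Path root = Paths.get("repo");
        System.out.println(isProperlyContained(root, root, "."));
        System.out.println(isProperlyContained(root, root, ".."));
        System.out.println(isProperlyContained(root, root.resolve(".github"), "."));
    }
}
```

With a tsconfig at the source root, `"."` and `".."` are rejected, while a tsconfig inside `.github` with `outDir: "."` still resolves to a proper subdirectory and can be excluded.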
The 'extractTypeScriptFiles' override did not incorporate the file type, and one of our unit tests expected it to. The test was previously passing for the wrong reasons.
Fixes two things:
- The basic test should no longer extract `tst.js` (as `tst.ts` is
present)
- The `AutoBuild` mock did not populate `extractedFiles` correctly,
which broke the logic that looks for TypeScript files with the same
basename.
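A simplified sketch of the same-basename check, assuming extracted files are tracked as a set of path strings. `SameBasenameSketch` and `hasTypeScriptSibling` are hypothetical names, and the real extractor logic may differ in how it tracks files and extensions.

```java
import java.util.Set;

public class SameBasenameSketch {
    /**
     * Returns true if {@code jsPath} ends in ".js" and a ".ts" file with
     * the same basename is already in {@code extractedFiles}.
     */
    static boolean hasTypeScriptSibling(String jsPath, Set<String> extractedFiles) {
        if (!jsPath.endsWith(".js")) {
            return false;
        }
        String stem = jsPath.substring(0, jsPath.length() - ".js".length());
        return extractedFiles.contains(stem + ".ts");
    }

    public static void main(String[] args) {
        Set<String> extracted = Set.of("tst.ts");
        System.out.println(hasTypeScriptSibling("tst.js", extracted));
    }
}
```

If the mock never records `tst.ts` in the set, the check cannot find the sibling, so `tst.js` is extracted anyway.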