JavaScript extractor
This directory contains the source code of the JavaScript extractor. The extractor depends on various libraries that are not currently bundled with the source code, so at present it cannot be built in isolation.
The extractor consists of a parser for the latest version of ECMAScript, including a few proposed and historical extensions (see src/com/semmle/jcorn), classes for representing JavaScript and TypeScript ASTs (src/com/semmle/js/ast and src/com/semmle/ts/ast), and various other bits of functionality. Historically, the main entry point of the JavaScript extractor has been com.semmle.js.extractor.Main. However, this class is slowly being phased out in favour of com.semmle.js.extractor.AutoBuild, which is the entry point used by CodeQL.
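To make "classes for representing ASTs" concrete, the following is a minimal Java sketch of how a parser's output for the expression `x + 1 * y` might be modelled as a small class hierarchy. The node names loosely follow ESTree node types (the AST format produced by parsers such as Acorn). `AstSketch`, `toSource`, and all field names are hypothetical and do not reflect the real `com.semmle.js.ast` or `com.semmle.ts.ast` APIs.

```java
// Hypothetical AST classes for illustration only. Node names loosely follow
// ESTree (Identifier, Literal, BinaryExpression); they are NOT the actual
// classes in src/com/semmle/js/ast.
public class AstSketch {
    /** Common supertype of every node in the tree. */
    abstract static class Node {
        /** Renders the subtree back to fully parenthesised source text. */
        abstract String toSource();
    }

    /** A variable reference such as `x`. */
    static final class Identifier extends Node {
        final String name;
        Identifier(String name) { this.name = name; }
        String toSource() { return name; }
    }

    /** A literal, stored as its raw source text (e.g. "1"). */
    static final class Literal extends Node {
        final String raw;
        Literal(String raw) { this.raw = raw; }
        String toSource() { return raw; }
    }

    /** A binary operation such as `a + b`; operands are child nodes. */
    static final class BinaryExpression extends Node {
        final String operator;
        final Node left;
        final Node right;
        BinaryExpression(String operator, Node left, Node right) {
            this.operator = operator;
            this.left = left;
            this.right = right;
        }
        String toSource() {
            return "(" + left.toSource() + " " + operator + " " + right.toSource() + ")";
        }
    }

    public static void main(String[] args) {
        // The tree a parser might build for `x + 1 * y`: `*` binds tighter,
        // so the multiplication becomes the right operand of the addition.
        Node tree = new BinaryExpression("+",
                new Identifier("x"),
                new BinaryExpression("*", new Literal("1"), new Identifier("y")));
        System.out.println(tree.toSource()); // prints (x + (1 * y))
    }
}
```

Nesting the multiplication inside the right operand is how the tree, rather than the source text, records operator precedence; code that walks the tree never needs to re-derive it.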
License
Like the CodeQL queries, the JavaScript extractor is licensed under the MIT License by GitHub. Some code is derived from other projects, whose licenses are noted in other LICENSE-*.md files in this folder.