This fetches the resource directory directly from the released
toolchains, allowing us to stop prebuilding and assembling them.
Moreover insertion of our resource directory is moved to the lua
tracing configuration (solving a `TODO`) and enhanced. Now all options
that start with the original resource directory (either explicit or
implied) are redirected to our resource directory.
This solves a problem where `-I <original resource dir>/some/path` was
passed to the extractor and did not work.
This works around the 5.9 linux compatibility problem by including the
`PackageDescription` swift modules in the in-dist toolchain. Copying the
toolchain and fixing the `-I` flag was not enough as for some reason
compilation of `PackageDescription.swiftinterface` was causing a crash
in the SIL pass. We work around that by pre-compiling those modules
during the build and including `.swiftmodule` files in the resource
directory.
TODO (apart from testing):
* the libraries included in the macOS toolchain are now fat (they were
intel only before), occupying more space. We should see if we need to
trim them down.
* there might be other swiftinterface files causing problems on linux
lurking around...
* if we go with this, we can simplify and trim down the prebuilding we
do leaving out the resource directory.
This reverts commit 0d9dcb161f.
This turns out to introduce a subtle bug related to destruction order
between `Log::instance()` and the `Logger` instances.
New `DIAGNOSE_ERROR` and `DIAGNOSE_CRITICAL` macros are added.
These accept an ID which should indicate a diagnostic source via
a function definition in `codeql::diagnostics`, together with the usual
format + arguments accepted by other `LOG_*` macros.
When the log is flushed, these special logs will result in an error JSON
diagnostic entry in the database.
This also removes the TODO about using `absl::StrJoin` to dump the environment because we can't easily get a range from a null-terminated `envp`. It also doesn't suffer from the usual awkwardness around inserting a separator *between* elements but not after the last one, so a for loop is clear enough.
In case the extractor is run in isolation for debugging/testing, this
will avoid littering the current working directory with artifacts, and
instead having a single `extractor-out` directory to inspect or clean
up.
Also extractor logs have been nested into a `swift` directory, as the
log directory provided by the `codeql` cli is actually shared between
languages.
This is done by adding a `isSuccessfullyExtracted` predicate that is
filled for primary files at the very end of the extractor invocation if
the frontend was performed successfully. If for example the extractor
crashes this will therefore not be filled.
The upgrade script is written so that `SuccessfullyExtractedFiles.ql`
on an upgraded script will give exactly the same results as before it.
File hashing is now done internally in `SwiftFileInterception` (and
exported as a `getHashOfRealFile` function for future use in linkage
awareness), and using a per-process in-memory cache. The persistent
caching of paths is removed, so the solution is now robust against input
file changes during the build.
For the same reason, the hash to artifact mapping have the symlinks
reversed now. The artifacts themselves are stored using the hash as
filenames, and the original paths of the artifacts are reacreated in the
scratch dir with symlinks mostly for debugging purposes (to understand
what artifact each hash corresponds to, and to follow what was built by
the extractor).
The hash map mechanism that was already in use for reading swiftmodule
files on macOS is now in use also on Linux. The output replacing
mechanism has been also reworked so that:
* frontend module emission modes have the remapping done directly in
the internal frontend options instead of painstakingly modifying input
flags (this requires a patch on the swift headers though)
* object emission mode is silenced to be just a type checking pass,
thus producing no output files
* all other passes but some debugging and version related ones become
noops
The open file read redirection uses a global weak pointer instance to
maximize robustness in the face of possibly multi-threaded calls to open
happening while `main` is exiting. Possibly overkill, but better safe
than sorry.