File hashing is now done internally in `SwiftFileInterception` (and
exported as a `getHashOfRealFile` function for future use in linkage
awareness), and using a per-process in-memory cache. The persistent
caching of paths is removed, so the solution is now robust against input
file changes during the build.
For the same reason, the hash to artifact mapping have the symlinks
reversed now. The artifacts themselves are stored using the hash as
filenames, and the original paths of the artifacts are reacreated in the
scratch dir with symlinks mostly for debugging purposes (to understand
what artifact each hash corresponds to, and to follow what was built by
the extractor).
The hash map mechanism that was already in use for reading swiftmodule
files on macOS is now in use also on Linux. The output replacing
mechanism has been also reworked so that:
* frontend module emission modes have the remapping done directly in
the internal frontend options instead of painstakingly modifying input
flags (this requires a patch on the swift headers though)
* object emission mode is silenced to be just a type checking pass,
thus producing no output files
* all other passes but some debugging and version related ones become
noops
The open file read redirection uses a global weak pointer instance to
maximize robustness in the face of possibly multi-threaded calls to open
happening while `main` is exiting. Possibly overkill, but better safe
than sorry.
There were missing extractions from the Builtin (and other) modules.
This was actually caused by two issues:
* we did not visit all required modules, as for example the `Builtin`
module does not appear as being imported by anybody (together with
another mysterious `__Objc` module)
* moreover the `Builtin` module works internally by only creating
declarations on demand, and does not provide a list of its top level
declarations.
The first problem was solved by moving module collection to the actual
visiting. This may mean we extract less modules, as we only extract the
modules we actually use something from (recursively). This change can
be reverted if we feel we need it.
The second one was solved by explicitly listing the builtin symbols
encountered during a normal extraction. This does mean this list needs
to be kept up to date.
Previously we were not extracting any `swiftmodule` file that was not
a system or a built-in one. This was done to avoid re-extracting
`swiftmodule` files that were built previously in the same build, but it
turned out to be too eager, as there are legitimate cases where a
non-system, non-built-in precompiled swift module can be used. An
example of that is the `PackageDescription` module used in Swift
Package Manager manifest files (`Package.swift`).
We now relax the test and trigger module extraction on all loaded
modules that do not have source files (we trigger source file extraction
for those). The catch, is that we also create empty trap files for
current output `swiftmodule` files (including possible alias locations
set up by XCode).
This means that if a following extractor run loads a previously built
`swiftmodule` file, although it will trigger module extraction, this
will however be skipped as it will find its target file already present
(this is done via the `TargetFile` semantics).
Minimal recreations of internal `integration-tests-runner.py` and
`create_database_utils.py` are provided to be able to run the
integration tests on the codeql repository with a released codeql CLI.
For the moment we skip the database checks by default, as we are still
producing inconsistent results.