Also:
* remove new warnings raised by the rust toolchain
* run new formatting and linting
* update the rust toolchain used by `cargo`
While we keep `bazel` builds using the same toolchain as internally
(now a nightly one), I opted for using a stable toolchain for `cargo`.
The nightly toolchain is only required internally for build reasons, we
should keep not using any unstable rust features in our sources.
Supports only a minimal subset of the project layout specification;
enough to work with the transformers produced by the CLI when building
an overlay database.
This generalizes the location cache to allow multiple sources to be
extracted in the same trap file, by adding `file_label` to `Location`,
and therefore to location cache keys. This will be used by the Rust
extractor.
Since locations for any given source file are never referenced in any
TRAP files besides the one for that particular source file, it's not
necessary to use global IDs. Using fresh IDs will reduce the size of the
ID pool (both on disk and in memory) and the speed of multi-threaded
TRAP import.
The one exception is the empty location, which still uses a global ID.
If someone had used `LGTM_INDEX_FILTERS=exclude:**/*\ninclude:*.rb`
before, we would have mistakenly excluded all files :|
(LGTM_INDEX_FILTERS is a prioritized list where later matches take
priority over earlier ones)
This change is needed to support adding `exclude:**/*` as the first
filter if `paths` include a glob, which currently causes bad behavior in
the Python extractor. However, we can first introduce that change once
this PR has been merged.
I realize this change can cause more folders and files to be traversed
(since they are not just skipped with --exclude). We plan to make a
better long term fix which should bring back the previous performance.
I was perusing the shared extractor the other day, when I came across
the `NodeInfo` struct. I noticed that the `fields` and `subtypes` fields
on this struct had two seemingly identical ways of expressing the same
thing: `None` and `Some(empty)` (where `empty` is respectively the empty
map and the empty vector). As far as I can tell, there's no semantic
difference in either case, so we can just elide the option type entirely
and use the empty value directly. This has the nice side-effect of
cleaning up some of the other code.
Replace the `file_extensions` field with `file_globs`, which supports
UNIX style glob patterns powered by the `globset` crate.
This allows files with no extension (e.g. Dockerfiles) to be extracted,
by specifying a glob such as `*Dockerfile`.
One surprising aspect of this change is that the globs match against the
whole path, rather than just the file name.
This is a breaking change.