Supports only a minimal subset of the project layout specification;
enough to work with the transformers produced by the CLI when building
an overlay database.
The rust-analyzer update will need more work as it seems to break rust
analysis on windows.
This was carried out using `cargo upgrade` from `cargo-edit`:
* getting exclusions options for rust-analyzer with
```bash
cargo upgrade -i --dry-run | grep -o 'ra_ap_\S\+' | sort -u | sed 's/^/--exclude=/' > /tmp/exclude
```
* running
```bash
cargo upgrade -i $(cat /tmp/exclude)
misc/bazel/3rdparty/update_cargo_deps.sh
```
We've been observing some performance issues using crate_universe on CI.
Therefore, we're moving to vendor the auto-generated BUILD files
in our repository. This should provide a nice speed boost, while
getting rid of the complexity of the "rust cache" job we've been using
when we had a lot of git dependencies.
This PR includes a vendor script, and I'll put up a CI job internally
that runs that vendor script on Cargo.toml and Cargo.lock changes, to check
that the vendored files are in sync.
There was a recent round of tree-sitter-* package releases,
so the latest code is now a) released and b) available on crates.io.
Therefore, move away from the (super slow on CI) git dependencies to released crates instead.
This also includes a run of `cargo update`, so there's a bunch of more changes to the lockfile.
The `.cargo/config.toml` override based workaround wasn't really
working, as while `cargo build|check` was reading that, `cargo metadata`
wasn't, ending up in a completely broken IDE experience.
For the moment, we just use a unified workspace `Cargo.toml` for all
extractors using the shared tree-sitter code, which has the downside of
making bazel pull in dependencies for all of them, and not being able to
do sparse checkouts for them. We should investigate and rivist this in
the future.
This generalizes the location cache to allow multiple sources to be
extracted in the same trap file, by adding `file_label` to `Location`,
and therefore to location cache keys. This will be used by the Rust
extractor.
Previously, we pulled in the shared tree-sitter extractor via a `git`
dependency in `Cargo.toml` to address a `rules_rust` limitation (no `path`
dependencies outside of the cargo workspace)). This was a problem,
as that means we're cloning `github/codeql` _again_ for the build, which is
quite slow.
I found another way that is faster, and still produces correct builds
for both `cargo`` and `rules_rust`:
* Cargo depends on a fake crate that has the same dependencies as the real crate (thanks to `sync-files.py`). Therefore, cargo pulls in the right dependencies into the lockfile, which bazel targets
* For local builds, we override the path to that dependency in a cargo config, so we're pulling in the correct code
* rules_rust only uses `path` dependencies for collecting transitive dependencies, it never pulls in the code from there. So far that, we manually provide a `BUILD.bazel` file for the shared extractor, and depend on that.
Since locations for any given source file are never referenced in any
TRAP files besides the one for that particular source file, it's not
necessary to use global IDs. Using fresh IDs will reduce the size of the
ID pool (both on disk and in memory) and the speed of multi-threaded
TRAP import.
The one exception is the empty location, which still uses a global ID.
If someone had used `LGTM_INDEX_FILTERS=exclude:**/*\ninclude:*.rb`
before, we would have mistakenly excluded all files :|
(LGTM_INDEX_FILTERS is a prioritized list where later matches take
priority over earlier ones)
This change is needed to support adding `exclude:**/*` as the first
filter if `paths` include a glob, which currently causes bad behavior in
the Python extractor. However, we can first introduce that change once
this PR has been merged.
I realize this change can cause more folders and files to be traversed
(since they are not just skipped with --exclude). We plan to make a
better long term fix which should bring back the previous performance.
I was perusing the shared extractor the other day, when I came across
the `NodeInfo` struct. I noticed that the `fields` and `subtypes` fields
on this struct had two seemingly identical ways of expressing the same
thing: `None` and `Some(empty)` (where `empty` is respectively the empty
map and the empty vector). As far as I can tell, there's no semantic
difference in either case, so we can just elide the option type entirely
and use the empty value directly. This has the nice side-effect of
cleaning up some of the other code.