Since locations for any given source file are never referenced in any
TRAP files besides the one for that particular source file, it's not
necessary to use global IDs. Using fresh IDs will reduce the size of the
ID pool (both on disk and in memory) and the speed of multi-threaded
TRAP import.
The one exception is the empty location, which still uses a global ID.
If someone had used `LGTM_INDEX_FILTERS=exclude:**/*\ninclude:*.rb`
before, we would have mistakenly excluded all files :|
(LGTM_INDEX_FILTERS is a prioritized list where later matches take
priority over earlier ones)
This change is needed to support adding `exclude:**/*` as the first
filter if `paths` include a glob, which currently causes bad behavior in
the Python extractor. However, we can first introduce that change once
this PR has been merged.
I realize this change can cause more folders and files to be traversed
(since they are not just skipped with --exclude). We plan to make a
better long term fix which should bring back the previous performance.
I was perusing the shared extractor the other day, when I came across
the `NodeInfo` struct. I noticed that the `fields` and `subtypes` fields
on this struct had two seemingly identical ways of expressing the same
thing: `None` and `Some(empty)` (where `empty` is respectively the empty
map and the empty vector). As far as I can tell, there's no semantic
difference in either case, so we can just elide the option type entirely
and use the empty value directly. This has the nice side-effect of
cleaning up some of the other code.
Replace the `file_extensions` field with `file_globs`, which supports
UNIX style glob patterns powered by the `globset` crate.
This allows files with no extension (e.g. Dockerfiles) to be extracted,
by specifying a glob such as `*Dockerfile`.
One surprising aspect of this change is that the globs match against the
whole path, rather than just the file name.
This is a breaking change.