Taus cc39ae57dc Python: Fix dataset check error for string encoding
Here's an example of one of these errors:
```
INVALID_KEY predicate py_cobjectnames(@py_cobject obj, string name)

The key set {obj} does not functionally determine all fields. Here is a
pair of tuples that agree on the key set but differ at index 1: Tuple 1
in row 63874: (72088,"u'<X>'") Tuple 2 in row 63875: (72088,"u'<?>'")
```
(Here, the substring `X` should really be the Unicode character U+FFFD,
but for some reason I'm not allowed to put that in this commit message.)

Inside the extractor, we assign IDs based on the string type (bytestring
or Unicode) and a hash of the UTF-8 encoded content of the string. In
this case, however, certain _different_ strings were receiving the same
hash, due to replacement characters in the encoding process.

In particular, we were converting unencodable characters to question
marks in one place, and to U+FFFD in another place. This caused a
discrepancy that lead to the dataset check error.

To fix this, we put in a custom error handler that always puts the
U+FFFD character in place of unencodable characters. With this, the
strings now agree, and hence there is no clash.
2024-10-21 15:31:16 +00:00
2022-10-20 08:21:02 -04:00
2018-09-23 16:24:31 -07:00
2024-06-05 14:46:59 +02:00
2024-05-21 09:14:13 +02:00
2024-05-15 10:18:31 +02:00
2022-04-12 12:40:59 +02:00
2020-04-07 12:03:26 +01:00
2024-05-07 13:09:08 +01:00
2024-02-22 11:58:02 +01:00

CodeQL

This open source repository contains the standard CodeQL libraries and queries that power GitHub Advanced Security and the other application security products that GitHub makes available to its customers worldwide.

How do I learn CodeQL and run queries?

There is extensive documentation about the CodeQL language, writing CodeQL using the CodeQL extension for Visual Studio Code and using the CodeQL CLI.

Contributing

We welcome contributions to our standard library and standard checks. Do you have an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Before you do, though, please take the time to read our contributing guidelines. You can also consult our style guides to learn how to format your code for consistency and clarity, how to write query metadata, and how to write query help documentation for your query.

For information on contributing to CodeQL documentation, see the "contributing guide" for docs.

License

The code in this repository is licensed under the MIT License by GitHub.

The CodeQL CLI (including the CodeQL engine) is hosted in a different repository and is licensed separately. If you'd like to use the CodeQL CLI to analyze closed-source code, you will need a separate commercial license; please contact us for further help.

Visual Studio Code integration

If you use Visual Studio Code to work in this repository, there are a few integration features to make development easier.

CodeQL for Visual Studio Code

You can install the CodeQL for Visual Studio Code extension to get syntax highlighting, IntelliSense, and code navigation for the QL language, as well as unit test support for testing CodeQL libraries and queries.

Tasks

The .vscode/tasks.json file defines custom tasks specific to working in this repository. To invoke one of these tasks, select the Terminal | Run Task... menu option, and then select the desired task from the dropdown. You can also invoke the Tasks: Run Task command from the command palette.

Description
CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
Readme MIT 19 GiB
Languages
CodeQL 32.3%
Kotlin 27.4%
C# 17.1%
Java 7.7%
Python 4.6%
Other 10.7%