1 Overview

There may be metrics and other meta-information of interest that the default queries do not provide. Additional project-related information is available through the GitHub API, and almost any meta-information can be collected by the build process at build time.

Beyond these two sources, several CodeQL queries and classes provide further meta-information; they are summarized in the rest of this document.

Short GitHub API samples are in ../notes/gathering-api-information.html; they are used in the section "New tables to be exported" of ../notes/tables.html.

2 Code scanning @metric and @diagnostic queries

The CodeQL library contains queries of many @kinds in addition to problem and path-problem; for the C/C++ queries in CodeQL 2.8.4 they are:

hohn@gh-hohn ~/local/codeql-v2.8.4/ql/cpp/ql/src
0:$ ag '@kind' |sed 's/^.*@//g;' | sort -u
kind alert-suppression
kind chart
kind definitions
kind diagnostic
kind display-string
kind extent
kind file-classifier
kind graph
kind metric
kind path-problem
kind problem
kind source-link
kind table
kind tree
kind treemap

Queries of @kind diagnostic and @kind metric provide this kind of meta-information; further statistics come from queries of @kind table and @kind treemap.
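The session above relies on ag (The Silver Searcher). A minimal grep-based sketch of the same survey, which also counts how many .ql files declare each @kind, could look like this. The helper name count_kinds is hypothetical, and unlike the ag command it only looks at .ql files (not .qll libraries or documentation):

```shell
# count_kinds DIR: print "<count> <kind>" for each @kind declared in the
# .ql files below DIR, most frequent first (hypothetical helper).
count_kinds() {
    # -r recurse, -h omit file names, -o print only the match, -E ERE
    grep -rhoE --include='*.ql' '@kind[[:space:]]+[a-z-]+' "$1" |
        awk '{ print $2 }' |     # keep just the kind name
        sort | uniq -c | sort -rn
}

# Example:
# count_kinds ~/local/codeql-v2.8.4/ql/cpp/ql/src
```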

3 Project & codeql db build

For testing, we build pure-ftpd, a mid-size C project that builds on multiple architectures and for which alerts are found. A .zip file of the resulting database is in ./pure-ftpd-4f26ce6.db.zip.

# Get
cd ~/local/sarif-cli/non-sarif-metadata 
git clone https://github.com/jedisct1/pure-ftpd.git

# Configure
cd ~/local/sarif-cli/non-sarif-metadata/pure-ftpd
./autogen.sh 
./configure

# Build and create the db; `codeql database create` runs the build
# command (make -j8) itself so that the extractor can trace it
cd ~/local/sarif-cli/non-sarif-metadata/pure-ftpd
export PATH=$HOME/local/codeql-v2.8.4/codeql:"$PATH"
codeql --version
codeql resolve qlpacks

GITREV=$(git rev-parse --short HEAD)
codeql database create --language=cpp -s . -vvvv pure-ftpd-$GITREV.db \
       --command='make -j8' 

# Logs
ls pure-ftpd-$GITREV.db/log
: build-tracer.log  database-create-20220422.121448.872.log
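Almost any meta-information can be collected by the build process itself. A minimal sketch of that idea, meant to run in the pure-ftpd checkout alongside the codeql database create step, writes a few build facts to a KEY=value file. The helper name record_build_meta, the output file name and the keys are hypothetical choices for illustration, not a CodeQL or sarif-cli convention:

```shell
# record_build_meta FILE: write build-time facts as KEY=value lines.
# Hypothetical helper; keys and file layout are illustrative only.
record_build_meta() {
    meta_file=$1
    {
        # Short commit id, or "unknown" outside a git checkout
        echo "GITREV=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"
        # UTC timestamp in ISO 8601 form
        echo "BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # Kernel name and machine architecture of the build host
        echo "BUILD_HOST=$(uname -sm)"
        # First line of the C compiler's version banner, empty if no cc
        echo "CC_VERSION=$(cc --version 2>/dev/null | head -n 1)"
    } > "$meta_file"
}

# Example, from ~/local/sarif-cli/non-sarif-metadata/pure-ftpd:
# record_build_meta pure-ftpd-$GITREV.meta
```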

4 Existing queries producing diagnostic info

Some existing queries from the standard library, with their @kinds, are:

  • @id cpp/diagnostics/successfully-extracted-files (@kind diagnostic)
  • @id cpp/diagnostics/extraction-warnings (@kind diagnostic)
  • @id cpp/architecture/general-statistics (@kind table)
  • @id cpp/external-dependencies (@kind treemap)
  • @id cpp/summary/lines-of-code (@kind metric)
  • @id cpp/summary/lines-of-user-code (@kind metric)

The next sections run them and show samples of their output.

4.1 Metric and Diagnostic queries

Not all @kinds support all output formats: for queries of @kind metric and @kind diagnostic, only the SARIF format writes their results to the named output file.

To run all of these queries, use the query suite diagnostic-and-metric.qls:

# Common variables
export PATH=$HOME/local/codeql-v2.8.4/codeql:"$PATH"
GITREV=$(cd ~/local/sarif-cli/non-sarif-metadata/pure-ftpd && git rev-parse --short HEAD)

# Working directory
cd ~/local/sarif-cli/non-sarif-metadata/

# List the queries run
codeql resolve queries diagnostic-and-metric.qls |sed 's|.*codeql-|codeql-|g;'

# Run queries and collect output
codeql database analyze --format=sarif-latest   \
       --output diagnostic-and-metric.sarif     \
       -j8                                      \
       --                                       \
       pure-ftpd/pure-ftpd-$GITREV.db           \
       diagnostic-and-metric.qls
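The contents of diagnostic-and-metric.qls are not shown in these notes. A plausible equivalent, assuming the suite simply selects every query of @kind diagnostic or @kind metric from the codeql/cpp-queries pack, can be written out with a heredoc. This is a reconstruction to check with codeql resolve queries, not the author's file, and it uses a different file name so it does not overwrite the original:

```shell
# Write a query suite that keeps only @kind diagnostic and @kind metric
# queries from the standard C/C++ pack (assumed name: codeql/cpp-queries).
cat > diagnostic-and-metric-sketch.qls <<'EOF'
- description: C/C++ diagnostic and metric queries (sketch)
- queries: .
  from: codeql/cpp-queries
- include:
    kind:
      - diagnostic
      - metric
EOF

# Check which queries the sketch selects:
# codeql resolve queries diagnostic-and-metric-sketch.qls
```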

The queries in the suite, as listed by codeql resolve queries:

codeql-v2.8.4/ql/cpp/ql/src/Diagnostics/ExtractionWarnings.ql
codeql-v2.8.4/ql/cpp/ql/src/Diagnostics/FailedExtractorInvocations.ql
codeql-v2.8.4/ql/cpp/ql/src/Diagnostics/SuccessfullyExtractedFiles.ql
codeql-v2.8.4/ql/cpp/ql/src/Summary/LinesOfCode.ql
codeql-v2.8.4/ql/cpp/ql/src/Summary/LinesOfUserCode.ql

Summaries of the results of running diagnostic and metric queries are part of the log output:

Analysis produced the following diagnostic data:

Diagnostic                     Summary
Extraction warnings            0 results
Failed extractor invocations   0 results
Successfully extracted files   85 results

Analysis produced the following metric data:

Metric                                                   Value
Total lines of C/C++ code in the database                45606
Total lines of user written C/C++ code in the database   23932

diagnostic-and-metric.sarif contains detailed entries only for non-zero summaries, so the following queries have no entries:

codeql-v2.8.4/ql/cpp/ql/src/Diagnostics/ExtractionWarnings.ql
codeql-v2.8.4/ql/cpp/ql/src/Diagnostics/FailedExtractorInvocations.ql

Typical SARIF entries for codeql-v2.8.4/ql/cpp/ql/src/Diagnostics/SuccessfullyExtractedFiles.ql look as follows; they are located in a different subtree from the results:

$schema: https://json.schemastore.org/sarif-2.1.0.json
runs:
- artifacts:
  invocations:
  - executionSuccessful: true
    toolExecutionNotifications:
    - descriptor:
        id: cpp/diagnostics/successfully-extracted-files
        index: 2
      level: none
      locations:
      - physicalLocation:
          artifactLocation:
            index: 0
            uri: config.h
            uriBaseId: '%SRCROOT%'
      message:
        text: File successfully extracted
      properties:
        formattedMessage:
          text: File successfully extracted
        relatedLocations: []
    - ...

The metric entry for codeql-v2.8.4/ql/cpp/ql/src/Summary/LinesOfCode.ql is stored under runs[].properties.metricResults:

$schema: https://json.schemastore.org/sarif-2.1.0.json
runs:
- artifacts:
  properties:
    metricResults:
    - rule:
        id: cpp/summary/lines-of-code
        index: 0
      ruleId: cpp/summary/lines-of-code
      ruleIndex: 0
      value: 45606

and the entry for codeql-v2.8.4/ql/cpp/ql/src/Summary/LinesOfUserCode.ql, which also carries a baseline value:

$schema: https://json.schemastore.org/sarif-2.1.0.json
runs:
- artifacts:
  properties:
    metricResults:
    - baseline: 29497
      rule:
        id: cpp/summary/lines-of-user-code
        index: 1
      ruleId: cpp/summary/lines-of-user-code
      ruleIndex: 1
      value: 23932
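The counts and values in the log summary can be recovered from diagnostic-and-metric.sarif by walking these two subtrees. A minimal sketch, assuming jq is installed and that the diagnostic entries sit under runs[].invocations[].toolExecutionNotifications as in CodeQL's SARIF output; the helper names count_notifications and metric_values are hypothetical:

```shell
# count_notifications SARIF ID: count the tool execution notifications
# whose descriptor id equals ID (the per-file diagnostic entries).
count_notifications() {
    jq --arg id "$2" '
        [ .runs[].invocations[]?.toolExecutionNotifications[]?
          | select(.descriptor.id == $id) ]
        | length' "$1"
}

# metric_values SARIF: print "ruleId<TAB>value" for every entry under
# runs[].properties.metricResults; runs without metrics are skipped.
metric_values() {
    jq -r '.runs[].properties.metricResults[]?
           | "\(.ruleId)\t\(.value)"' "$1"
}

# Examples:
# count_notifications diagnostic-and-metric.sarif \
#     cpp/diagnostics/successfully-extracted-files
# metric_values diagnostic-and-metric.sarif
```

For the run above, the first call would be expected to agree with the 85 results in the diagnostic summary, and the second to list the two metric values.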

Besides File.getMetrics(), these libraries support custom diagnostic and metric queries:

  1. codeql-v2.8.4/ql/cpp/ql/src/Diagnostics/ExtractionProblems.qll provides a common hierarchy of all types of problems that can occur during extraction.
  2. codeql-v2.8.4/ql/cpp/ql/lib/semmle/code/cpp/Compilation.qll provides class Compilation, an invocation of the compiler.
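To illustrate how such a class can feed a custom query, here is a hypothetical @kind metric query that counts the Compilation entities (compiler invocations) in the database, written out with a heredoc. The file name, @name and @id are invented for this sketch and it has not been run against CodeQL 2.8.4; it assumes that, as with LinesOfCode.ql, a metric query selects a single number:

```shell
# Write a hypothetical @kind metric query that counts the compiler
# invocations (Compilation entities) recorded in the database.
cat > compilation-count.ql <<'EOF'
/**
 * @name Number of compiler invocations (hypothetical example)
 * @description Counts the compiler invocations recorded in the database.
 * @kind metric
 * @id cpp/example/compilation-count
 */

import cpp
import semmle.code.cpp.Compilation

select count(Compilation c)
EOF

# Run it like metrics01.ql in section 4.4, for example:
# codeql database analyze --format=sarif-latest \
#        --output compilation-count.sarif -- \
#        pure-ftpd/pure-ftpd-$GITREV.db compilation-count.ql
```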

4.2 Table queries

Generating table output is more involved: the following script runs the queries in tables.qls, then decodes each resulting .bqrs file to CSV for processing and to text for reading.

# Common variables
export PATH=$HOME/local/codeql-v2.8.4/codeql:"$PATH"
GITREV=$(cd ~/local/sarif-cli/non-sarif-metadata/pure-ftpd && git rev-parse --short HEAD)

# Working directory
cd ~/local/sarif-cli/non-sarif-metadata/

# Remove prior files
find pure-ftpd -name "*.bqrs" -exec rm {} \; 

# Run the queries in tables.qls against the database, saving the results
# to the results/ subdirectory of the database directory for further
# processing.
codeql database run-queries -j8 --ram=20000 -- \
       pure-ftpd/pure-ftpd-$GITREV.db  tables.qls

find pure-ftpd -name "*.bqrs" > bqrs-files

codeql resolve queries tables.qls  | \
    while read path ; do basename "$path" ; done > table-filenames

# Get general info about available results
cat bqrs-files | while read file 
do
    codeql bqrs info --format=text -- "$file"
done 

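# Format results as csv for processing, one .csv next to each .bqrs
# (the loop is needed: $file is not set outside the loops above)
cat bqrs-files | while read file
do
    codeql bqrs decode  --result-set="#select" \
           --format=csv \
           --entities=all -- "$file" > "${file%.bqrs}.csv"
done

# Format results as text for reading; a plain + is literal in sed BRE,
# while \+ is a repetition operator in GNU sed
cat bqrs-files | while read file
do
    echo "==> $file <=="
    codeql bqrs decode  --result-set="#select" \
           --format=text \
           --entities=all -- "$file" |\
        sed 's/+--/|--/g; s/--+/--|/g'
done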

Repository-level results:

=> /cpp-queries/Metrics/Internal/DiagnosticsSumElapsedTimes.bqrs <=

sum_frontend_elapsed_seconds   sum_extractor_elapsed_seconds
6.0                            4.0

=> /cpp-queries/Architecture/General Top-Level Information/GeneralStatistics.bqrs <=

Title                     Value
Number of Files           363
Number of Unions          8
Number of C Files         53
Number of Structs         235
Number of Namespaces      1
Number of Functions       1851
Number of Header Files    310
Number of Classes         0
Number of C++ Files       0
Number of Lines Of Code   45606
Self-Containedness        100%

Untrusted data passed to external APIs (truncated to fit):

=> /cpp-queries/Security/CWE/CWE-020/CountUntrustedDataToExternalAPI.bqrs <=

ID of externalApi   externalApi                         numberOfUses   numberOfUntrustedSources
1                   read [param 1]                      4              4
2                   read [param 2]                      4              4
4                   __builtin___memmove_chk [param 2]   1              1
0                   fwrite [param 2]                    1              1
3                   poll [param 2]                      1              1

=> /cpp-queries/Security/CWE/CWE-020/IRCountUntrustedDataToExternalAPI.bqrs <=

ID of externalApi   externalApi                         numberOfUses   numberOfUntrustedSources
9                   read [param 1]                      12             6
7                   free [param 0]                      27             5
16                  poll [param 2]                      3              3
12                  __builtin_object_size [param 0]     2              2

Hub classes (truncated to fit):

=> /cpp-queries/Architecture/General Class-Level Information/HubClasses.bqrs <=

ID of Class Class URL for Class AfferentCoupling EfferentCoupling
39174 in_addr file:///Applications/Xcode-11.4.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/netinet/in.h:301:8:301:14 8 0
15020 __darwin_fp_status file:///Applications/Xcode-11.4.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/mach/i386/_structs.h:150:1:150:17 6 0
15007 __darwin_xmm_reg file:///Applications/Xcode-11.4.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/mach/i386/_structs.h:213:1:213:15 6 0
15013 __darwin_mmst_reg file:///Applications/Xcode-11.4.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/mach/i386/_structs.h:194:1:194:16 6 0
15042 __darwin_fp_control file:///Applications/Xcode-11.4.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/mach/i386/_structs.h:92:1:92:18 6 0
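Single values can be pulled out of the decoded CSV files with standard tools. The sketch below assumes each row of the GeneralStatistics CSV has two fields, Title and Value, possibly double-quoted and without embedded commas; that holds for the GeneralStatistics rows above but not for CSV in general. The helper name stat_value and the example path are hypothetical:

```shell
# stat_value CSV TITLE: print the Value field of the row whose Title
# field equals TITLE. Assumes simple two-field rows without embedded
# commas; double quotes are stripped, so quoted or bare values both work.
stat_value() {
    awk -F',' -v key="$2" '
        { gsub(/"/, "") }        # drop CSV quoting; fields are re-split
        $1 == key { print $2 }' "$1"
}

# Example:
# stat_value path/to/GeneralStatistics.csv "Number of Lines Of Code"
```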

4.3 Treemap queries

The treemap queries are a large collection of code metrics intended for display as a treemap; the queries themselves produce table output. These metrics are not further explored here, but listed for completeness:

hohn@gh-hohn ~/local/codeql-v2.8.4/ql/cpp/ql/src
0:$ ag -l 'kind treemap'
Metrics/Classes/CLackOfCohesionHS.ql
Metrics/Classes/CHalsteadVocabulary.ql
Metrics/Classes/CNumberOfFunctions.ql
Metrics/Classes/CHalsteadLength.ql
Metrics/Classes/CPercentageOfComplexCode.ql
Metrics/Classes/CSizeOfAPI.ql
Metrics/Classes/CLinesOfCode.ql
Metrics/Classes/CAfferentCoupling.ql
Metrics/Classes/CEfferentCoupling.ql
Metrics/Classes/CHalsteadVolume.ql
Metrics/Classes/CHalsteadEffort.ql
Metrics/Classes/CResponse.ql
Metrics/Classes/CHalsteadDifficulty.ql
Metrics/Classes/CHalsteadBugs.ql
Metrics/Classes/CInheritanceDepth.ql
Metrics/Classes/CNumberOfStatements.ql
Metrics/Classes/CSpecialisation.ql
Metrics/Classes/CLackOfCohesionCK.ql
Metrics/Classes/CNumberOfFields.ql
Metrics/Dependencies/ExternalDependencies.ql
Metrics/Files/FLinesOfCommentedOutCode.ql
Metrics/Files/NumberOfParameters.ql
Metrics/Files/FHalsteadLength.ql
Metrics/Files/FLines.ql
Metrics/Files/FHalsteadVocabulary.ql
Metrics/Files/FCommentRatio.ql
Metrics/Files/FTransitiveIncludes.ql
Metrics/Files/AutogeneratedLOC.ql
Metrics/Files/FLinesOfCode.ql
Metrics/Files/FNumberOfClasses.ql
Metrics/Files/NumberOfGlobals.ql
Metrics/Files/NumberOfPublicGlobals.ql
Metrics/Files/FNumberOfTests.ql
Metrics/Files/FTimeInFrontend.ql
Metrics/Files/FTodoComments.ql
Metrics/Files/FCyclomaticComplexity.ql
Metrics/Files/NumberOfFunctions.ql
Metrics/Files/FTransitiveSourceIncludes.ql
Metrics/Files/FHalsteadDifficulty.ql
Metrics/Files/FHalsteadBugs.ql
Metrics/Files/FLinesOfComments.ql
Metrics/Files/ConditionalSegmentLines.ql
Metrics/Files/FMacroRatio.ql
Metrics/Files/ConditionalSegmentConditions.ql
Metrics/Files/FHalsteadEffort.ql
Metrics/Files/FAfferentCoupling.ql
Metrics/Files/FHalsteadVolume.ql
Metrics/Files/FDirectIncludes.ql
Metrics/Files/NumberOfPublicFunctions.ql
Metrics/Files/FEfferentCoupling.ql
Metrics/Files/FunctionLength.ql
Metrics/Functions/FunCyclomaticComplexity.ql
Metrics/Functions/StatementNestingDepth.ql
Metrics/Functions/FunLinesOfCode.ql
Metrics/Functions/FunNumberOfCalls.ql
Metrics/Functions/FunPercentageOfComments.ql
Metrics/Functions/FunNumberOfStatements.ql
Metrics/Functions/FunIterationNestingDepth.ql
Metrics/Functions/FunNumberOfParameters.ql
Metrics/Functions/FunLinesOfComments.ql
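Without ag, the same list can be produced with grep. In this sketch, queries_of_kind is a hypothetical helper; it matches the @kind value as a whole word, so asking for problem does not also return path-problem queries, and it prints paths relative to the given directory, sorted:

```shell
# queries_of_kind DIR KIND: list the .ql files below DIR whose metadata
# declares "@kind KIND", as sorted paths relative to DIR.
queries_of_kind() {
    (cd "$1" &&
        grep -rlE --include='*.ql' "@kind[[:space:]]+$2([[:space:]]|\$)" . |
        sed 's|^\./||' |        # strip the leading ./ from grep output
        sort)
}

# Example:
# queries_of_kind ~/local/codeql-v2.8.4/ql/cpp/ql/src treemap
```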

4.4 Custom queries

The following script and the file metrics01.ql serve as a starting point for custom metric and diagnostic queries that use the CodeQL File, Compilation, or Diagnostic classes.

# Common variables
export PATH=$HOME/local/codeql-v2.8.4/codeql:"$PATH"
GITREV=$(cd ~/local/sarif-cli/non-sarif-metadata/pure-ftpd && git rev-parse --short HEAD)

# Working directory
cd ~/local/sarif-cli/non-sarif-metadata/

# Run the custom query
codeql database analyze --format=sarif-latest \
       --output metrics01.sarif                         \
       -j8                                              \
       --                                               \
       pure-ftpd/pure-ftpd-$GITREV.db                   \
       metrics01.ql

with log output:

Analysis produced the following diagnostic data:

Diagnostic   Summary
metrics01    1 result

Author: Michael Hohn

Created: 2022-04-28 Thu 16:09
