The key insight here is that `HC_FieldCons` and `HC_Array` are
functionally determined by the things that arise in another
recursive call. Lifting them to their own predicate, therefore,
reduces nonlinearity and constrains the join order in a way that
cannot be asymptotically bad -- and, indeed, makes quite a big
difference in practice.
Switching `security.TaintTracking` to use `DefaultTaintTracking` causes
us to lose a result from `UnboundedWrite.ql`, while this commit restores
it:
diff --git a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
index 1eba0e52f0e..d947b33b9d9 100644
--- a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
+++ b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
@@ -1,2 +1,3 @@
+| main.c:54:7:54:12 | call to strcat | This 'call to strcat' with input from $@ may overflow the destination. | main.c:93:15:93:18 | argv | argv |
| main.c:99:9:99:12 | call to gets | This 'call to gets' with input from $@ may overflow the destination. | main.c:99:9:99:12 | call to gets | call to gets |
| main.c:213:17:213:19 | buf | This 'scanf string argument' with input from $@ may overflow the destination. | main.c:213:17:213:19 | buf | buf |
This is one step toward implementing the taint-tracking wrapper in terms
of `Instruction` rather than `Expr`.
This leads to a few duplicate results in `TaintedAllocationSize.ql`
because the library now considers `sizeof(int)` to be just as
predictable as `4`, whereas the `security.TaintTracking` library does
not consider `sizeof` to be predictable. I think it's simpler to accept
the duplicate results since they are ultimately a quirk of the query,
not the library.
The following is the diff between (a) replacing `TaintTracking.qll` with
a link to `DefaultTaintTracking.qll` and (b) additionally applying this
commit.
diff --git a b
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected
@@ -1,5 +1,8 @@
| test.cpp:42:31:42:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:43:31:43:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:43:38:43:63 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:45:31:45:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:48:25:48:30 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:49:17:49:30 | new[] | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:52:21:52:27 | call to realloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:52:35:52:60 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
--- a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-190/CERT/INT04-C/int04.expected
+++ b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-190/CERT/INT04-C/int04.expected
@@ -1 +1,2 @@
| int04c.c:21:29:21:51 | ... * ... | This allocation size is derived from $@ and might overflow | int04c.c:14:30:14:35 | call to getenv | user input (getenv) |
+| int04c.c:22:33:22:38 | call to malloc | This allocation size is derived from $@ and might overflow | int04c.c:14:30:14:35 | call to getenv | user input (getenv) |