Merge pull request #5934 from github/hmakholm/pr/monotonic-agg

QL language reference: add monotonic aggregate example
Henning Makholm
2021-06-01 13:10:50 +02:00
committed by GitHub

@@ -488,6 +488,101 @@ value for each value generated by the ``<formula>``:
value generated by the ``<formula>``. Here, the aggregation function is applied to each of the
resulting combinations.
Example of monotonic aggregates
-------------------------------
Consider this query:

.. code-block:: ql

   string getPerson() { result = "Alice" or
                        result = "Bob" or
                        result = "Charles" or
                        result = "Diane"
                      }

   string getFruit(string p) { p = "Alice"   and result = "Orange" or
                               p = "Alice"   and result = "Apple" or
                               p = "Bob"     and result = "Apple" or
                               p = "Charles" and result = "Apple" or
                               p = "Charles" and result = "Banana"
                             }

   int getPrice(string f) { f = "Apple"  and result = 100 or
                            f = "Orange" and result = 100 or
                            f = "Orange" and result = 1
                          }

   predicate nonmono(string p, int cost) {
     p = getPerson() and cost = sum(string f | f = getFruit(p) | getPrice(f))
   }

   language[monotonicAggregates]
   predicate mono(string p, int cost) {
     p = getPerson() and cost = sum(string f | f = getFruit(p) | getPrice(f))
   }

   from string variant, string person, int cost
   where variant = "default"   and nonmono(person, cost) or
         variant = "monotonic" and mono(person, cost)
   select variant, person, cost
   order by variant, person

The query produces these results:

+-----------+---------+------+
| variant   | person  | cost |
+-----------+---------+------+
| default   | Alice   | 201  |
| default   | Bob     | 100  |
| default   | Charles | 100  |
| default   | Diane   | 0    |
| monotonic | Alice   | 101  |
| monotonic | Alice   | 200  |
| monotonic | Bob     | 100  |
| monotonic | Diane   | 0    |
+-----------+---------+------+

The two variants of the aggregate semantics differ in what happens
when ``getPrice(f)`` has either multiple results or no results
for a given ``f``.
In this query, oranges are available at two different prices, and the
default ``sum`` aggregate returns a single line where Alice buys an
orange at a price of 100, another orange at a price of 1, and an apple
at a price of 100, totalling 201. On the other hand, in the
*monotonic* semantics for ``sum``, Alice always buys one orange and
one apple, and a line of output is produced for each *way* she can
complete her shopping list.
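
To see where the figure of 201 comes from, it may help to list the
(fruit, price) pairs that the default ``sum`` adds up for Alice. The
following is a small sketch that reuses the ``getFruit`` and
``getPrice`` predicates from the query above; it is an illustration
only, not part of that query:

.. code-block:: ql

   // Sketch: every (fruit, price) pair on Alice's shopping list.
   from string f, int price
   where f = getFruit("Alice") and price = getPrice(f)
   select f, price

This should yield three rows (an apple at 100, an orange at 100, and
an orange at 1), whose prices add up to 201. Under the monotonic
semantics, each total instead uses exactly one price per fruit:
100 + 100 = 200, or 100 + 1 = 101.
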
If there had been two different prices for apples too, the monotonic
``sum`` would have produced *four* output lines for Alice.
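
As a sketch of that hypothetical situation, the predicates below add a
second apple price. The names ``getPriceTwoApples`` and
``monoTwoApples`` and the price of 110 are assumptions made up for
this illustration; ``getFruit`` is the predicate from the query above:

.. code-block:: ql

   // Hypothetical price list: apples now also come at a second price (110).
   int getPriceTwoApples(string f) { f = "Apple"  and result = 100 or
                                     f = "Apple"  and result = 110 or
                                     f = "Orange" and result = 100 or
                                     f = "Orange" and result = 1
                                   }

   // Monotonic sum: one total per way of picking a price for each fruit.
   language[monotonicAggregates]
   predicate monoTwoApples(string p, int cost) {
     p = "Alice" and cost = sum(string f | f = getFruit(p) | getPriceTwoApples(f))
   }

With these prices, ``monoTwoApples`` would be expected to hold for
Alice with the costs 101, 111, 200, and 210, one for each combination
of an apple price and an orange price. Because results are sets of
tuples, two combinations that happened to add up to the same total
would appear as a single line.
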
Charles wants to buy a banana, which is not for sale at all. In the
default case, the sum produced for Charles includes the cost of the
apple he *can* buy, but there's no line for Charles in the monotonic
``sum`` output, because there *is no way* for Charles to buy one apple
plus one banana.
(Diane buys no fruit at all, and in both variants her total cost
is 0. The ``strictsum`` aggregate would have excluded her from the
results in both cases.)
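
A minimal sketch of that ``strictsum`` variant under the default
semantics, assuming the predicates from the query above
(``strictNonmono`` is a name chosen for this illustration):

.. code-block:: ql

   // Same as nonmono, but with strictsum in place of sum.
   predicate strictNonmono(string p, int cost) {
     p = getPerson() and cost = strictsum(string f | f = getFruit(p) | getPrice(f))
   }

Because ``getFruit("Diane")`` has no results, there is nothing to
aggregate, so ``strictsum`` has no value and this predicate should
produce no tuple for Diane. The lines for Alice, Bob, and Charles
would be the same as with ``sum``.
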
In actual QL practice, it is quite rare to use monotonic aggregates
with the *goal* of having multiple output lines, as in the "Alice"
case of this example. The more significant point is the "Charles"
case: as long as there's no price for bananas, no output is produced
for him. This means that if we later do learn of a banana price, we
don't need to *remove* any output tuple already produced. The
importance of this is that the monotonic aggregate behavior works well
with a fixpoint-based semantics for recursion, so it will be meaningful
to let the ``getPrice`` predicate be mutually recursive with the ``sum``
aggregate itself. (On the other hand, ``getFruit`` still cannot be
allowed to be recursive, because adding another fruit to someone's
shopping list would invalidate the total costs we already knew for
them.)
This opportunity to use recursion is the main practical reason for
requesting monotonic semantics of aggregates.
Recursive monotonic aggregates
------------------------------