[ https://issues.apache.org/jira/browse/CALCITE-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283432#comment-17283432 ]
Julian Hyde commented on CALCITE-4483: -------------------------------------- I implemented {{UNIQUE_VALUE}}, but it was not sufficient to solve the problems. When evaluated in grouping setsĀ \(x) and \(x, y), we needed the result from the second grouping set (where the value was indeed unique) but were not interested in the result from the first grouping set (where the value is not unique). But it was still calculated in the first grouping set, and therefore threw an exception that aborted the query. CALCITE-4484 remains open (and has an implementation that is almost complete) but we will not use it to implement {{WITHIN DISTINCT}}. > WITHIN DISTINCT clause for aggregate functions (experimental) > ------------------------------------------------------------- > > Key: CALCITE-4483 > URL: https://issues.apache.org/jira/browse/CALCITE-4483 > Project: Calcite > Issue Type: Bug > Reporter: Julian Hyde > Assignee: Julian Hyde > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Add a {{WITHIN DISTINCT}} clause to aggregate functions, allowing duplicate > rows to be eliminated before entering the function. > This feature is non-standard, and in fact does not exist in any database I am > aware of. Therefore this feature is experimental, and may evolve or be > removed without notice. > It is related to {{DISTINCT}}, and is in fact a generalization of it. > {{DISTINCT}} can always be rewritten in terms of {{WITHIN DISTINCT}}. For > example, {{SUM(DISTINCT sal)}} is equivalent to {{SUM(sal) WITHIN DISTINCT > (sal)}}. > Consider the query > {noformat} > SELECT SUM(age), > SUM(DISTINCT age), > SUM(age) WITHIN DISTINCT (name) > FROM Friends{noformat} > where {{Friends}} has the rows > {noformat} > name age job > ====== === ========== > Julian 16 Programmer > Dick 15 > Anne 13 Car wash > George 15 Lifeguard > George 15 Dog walker > Timmy 4 > {noformat} > Note that there are two rows for George, because she has two jobs. > The values of the columns are as follows: > * {{SUM(age)}} has the value (16 + 15 + 13 + 15 + 15 + 4) = 78; > * {{SUM(DISTINCT age)}} has the value (16 + 15 + 13 + 4) = 48; > * {{SUM(age) WITHIN DISTINCT (name)}} has the value (16 + 15 + 13 + 15 + 4) > = 63. > {{WITHIN DISTINCT}} has treated the two 15 values for George as one value, > but has still counted the 15 for Dick separately. > The {{WITHIN DISTINCT}} clause can be useful to prevent double-counting when > duplicates have been added via a many-to-one join. -- This message was sent by Atlassian Jira (v8.3.4#803005)