Re: Hash aggregation

Maryann Xue Tue, 22 May 2018 11:12:47 -0700

Running a group-by aggregation on the client side is likely to perform
poorly, so it is usually undesirable. The original implementation was there
for functional completeness only, so it took the easiest route and reused
some existing classes. In some cases, though, a client-side group-by can
still be tolerable if there aren’t many distinct keys. So yes, please open a
JIRA for implementing hash aggregation on the client side. Thank you!
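
To illustrate why the number of distinct keys matters, here is a minimal
sketch of the two strategies. It is not Phoenix code: the function names and
the sample rows are hypothetical, and Python is used only for brevity. Hash
aggregation keeps one counter per distinct key and never sorts, while the
sort-based variant has to order every row before it can group.

```python
from collections import defaultdict
from itertools import groupby


def hash_group_count(rows, key):
    """Hash aggregation: one pass over the rows, one counter per distinct key.

    Memory grows with the number of distinct keys, not with the row count.
    """
    counts = defaultdict(int)
    for row in rows:
        counts[key(row)] += 1
    return dict(counts)


def sort_group_count(rows, key):
    """Sort-based aggregation: sort all rows, then count each run of equal keys."""
    ordered = sorted(rows, key=key)
    return {k: sum(1 for _ in run) for k, run in groupby(ordered, key=key)}


# Hypothetical join output, one (t1.val, t2.val) pair per joined row.
joined = [(1, 2), (3, 4), (1, 2), (1, 5), (3, 4), (1, 2)]


def group_key(row):
    # Group by both columns, as in GROUP BY t1.val, t2.val.
    return row


hash_result = hash_group_count(joined, group_key)
sort_result = sort_group_count(joined, group_key)
```

Both functions return the same counts; the difference is that the hash
version skips the sort and holds only as many entries as there are distinct
keys, which is why it stays tolerable when that number is small.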



Thanks,
Maryann
On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <[email protected]>
wrote:

> Hello,
>
> Any guidance or thoughts on the thread below?
>
> Thanks,
> Gerald
>
>
> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <[email protected]>
> wrote:
>
>> Maryann,
>>
>> Can Phoenix provide hash aggregation on the client side? Are there design
>> / implementation reasons not to, or should I file a ticket for this?
>>
>> Thanks,
>> Gerald
>>
>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <[email protected]>
>> wrote:
>>
>>> Hi Gerald,
>>>
>>> Phoenix does have hash aggregation. The reason sort-based aggregation
>>> is used in your query plan is that the aggregation happens on the client
>>> side. That is because a sort-merge join is used (as hinted), which is a
>>> client-driven join, and after that join stage all operations can only
>>> run on the client side.
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
>>>> should I file a ticket? We have aggregation queries that do not require
>>>> sorted results.
>>>>
>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>
>>>> CREATE TABLE unsalted (
>>>>     keyA BIGINT NOT NULL,
>>>>     keyB BIGINT NOT NULL,
>>>>     val SMALLINT,
>>>>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>>>> );
>>>>
>>>>
>>>> EXPLAIN
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
>>>> FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
>>>> GROUP BY t1.val, t2.val;
>>>>
>>>> +------------------------------------------------------------+----------------+---------------+
>>>> | PLAN                                                       | EST_BYTES_READ | EST_ROWS_READ |
>>>> +------------------------------------------------------------+----------------+---------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES                             | null           | null          |
>>>> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null           | null          |
>>>> | AND                                                        | null           | null          |
>>>> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null           | null          |
>>>> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              | null           | null          |
>>>> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]    | null           | null          |
>>>> +------------------------------------------------------------+----------------+---------------+
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>
>>>
>>
>
