Re: Hash aggregation

Gerald Sangudi Thu, 14 Jun 2018 08:33:39 -0700

Thanks James. Looking into that.

Gerald



On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <[email protected]>
wrote:

> Hi Gerald,
> No further suggestions than my comments on the JIRA. Maybe a good next
> step would be a patch?
> Thanks,
> James
>
> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <[email protected]>
> wrote:
>
>> Hi Maryann and James,
>>
>> Any further guidance on PHOENIX-4751
>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>
>> Thanks,
>> Gerald
>>
>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <[email protected]>
>> wrote:
>>
>>> Hi Maryann,
>>>
>>> I filed PHOENIX-4751
>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>
>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>> look at the Phoenix source to estimate the scope / impact?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <[email protected]>
>>> wrote:
>>>
>>>> Since the performance running a group-by aggregation on client side is
>>>> most likely bad, it’s usually not desired. The original implementation was
>>>> for functionality completeness only so it chose the easiest way, which
>>>> reused some existing classes. In some cases, though, the client group-by
>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Any guidance or thoughts on the thread below?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>>
>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> Maryann,
>>>>>>
>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>> design / implementation reasons not to, or should I file a ticket for 
>>>>>> this?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Gerald,
>>>>>>>
>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>> aggregation is used in your query plan is that the aggregation happens 
>>>>>>> on
>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>> which is a client driven join, and after that join stage all operations 
>>>>>>> can
>>>>>>> only be on the client-side.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marynn
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that 
>>>>>>>> do not
>>>>>>>> require sorted results.
>>>>>>>>
>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>
>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY 
>>>>>>>> (keyA,
>>>>>>>> keyB));*
>>>>>>>>
>>>>>>>>
>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) 
>>>>>>>> GROUP
>>>>>>>> BY t1.val,
>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | 
>>>>>>>> null |
>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | 
>>>>>>>> null
>>>>>>>> | || AND                                                        | null 
>>>>>>>> |
>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | 
>>>>>>>> null
>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              
>>>>>>>> |
>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, 
>>>>>>>> T2.VAL]
>>>>>>>>    | null | null |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

Re: Hash aggregation

Reply via email to