We are planning to address this issue in the future. At a high level, we'll have to add a delta mode so that updates can be communicated from one operator to the next.
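[Editor's note: the "delta mode" below is described only at a high level in this thread. The following is a purely illustrative toy sketch of the general idea — an upstream aggregation emitting retract/assert updates (deltas) that a downstream aggregation applies incrementally — and is not Spark's actual design or API.]

```python
# Toy illustration of a "delta mode" between two aggregation operators.
# NOT Spark code: a hypothetical sketch of upstream operators emitting
# updates (deltas) instead of complete results, with the downstream
# operator maintaining its own state incrementally.
from collections import defaultdict

class CountByKey:
    """First-level aggregation: counts records per key, emitting deltas."""
    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, key):
        old = self.counts[key]
        self.counts[key] += 1
        deltas = []
        if old > 0:
            deltas.append((key, old, -1))   # retract the old count
        deltas.append((key, old + 1, +1))   # assert the new count
        return deltas

class CountOfCounts:
    """Second-level aggregation: how many keys have each count value,
    kept up to date by applying upstream deltas."""
    def __init__(self):
        self.hist = defaultdict(int)

    def apply(self, deltas):
        for _key, count_value, sign in deltas:
            self.hist[count_value] += sign
            if self.hist[count_value] == 0:
                del self.hist[count_value]

first = CountByKey()
second = CountOfCounts()
for record in ["a", "b", "a", "c", "a"]:
    second.apply(first.process(record))

# a occurs 3 times, b and c once each: two keys with count 1, one with count 3.
print(dict(second.hist))  # {1: 2, 3: 1}
```

The point of the sketch is that the second aggregation never sees the raw stream; it only consumes the first operator's updates, which is what chaining streaming aggregations requires.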
On Thu, Jul 7, 2016 at 8:59 AM, Arnaud Bailly <arnaud.oq...@gmail.com> wrote:

> Indeed. But nested aggregation does not work with Structured Streaming,
> that's the point. I would like to know if there is a workaround, or what
> the plan is regarding this feature, which seems quite useful to me. If the
> implementation is not overly complex and it is just a matter of manpower,
> I am fine with devoting some time to it.
>
> --
> Arnaud Bailly
>
> twitter: abailly
> skype: arnaud-bailly
> linkedin: http://fr.linkedin.com/in/arnaudbailly/
>
> On Thu, Jul 7, 2016 at 2:17 PM, Sivakumaran S <siva.kuma...@me.com> wrote:
>
>> Arnaud,
>>
>> You could aggregate the first table and then merge it with the second
>> table (assuming that they are similarly structured) and then carry out
>> the second aggregation. Unless the data is very large, I don't see why
>> you should persist it to disk. IMO, nested aggregation is more elegant
>> and readable than a complex single stage.
>>
>> Regards,
>>
>> Sivakumaran
>>
>> On 07-Jul-2016, at 1:06 PM, Arnaud Bailly <arnaud.oq...@gmail.com> wrote:
>>
>> It's aggregation at multiple levels in a query: first do some aggregation
>> on one table, then join with another table and do a second aggregation. I
>> could probably rewrite the query in such a way that it does the
>> aggregation in one pass, but that would obfuscate the purpose of the
>> various stages.
>>
>> On Jul 7, 2016 12:55, "Sivakumaran S" <siva.kuma...@me.com> wrote:
>>
>>> Hi Arnaud,
>>>
>>> Sorry for the doubt, but what exactly is multiple aggregation? What is
>>> the use case?
>>>
>>> Regards,
>>>
>>> Sivakumaran
>>>
>>> On 07-Jul-2016, at 11:18 AM, Arnaud Bailly <arnaud.oq...@gmail.com>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I understand that multiple aggregations over streaming dataframes are
>>> not currently supported in Spark 2.0. Is there a workaround? Off the top
>>> of my head, I could think of a two-stage approach:
>>> - the first query writes its output to disk/memory using "complete" mode
>>> - the second query reads from this output
>>>
>>> Does this make sense?
>>>
>>> Furthermore, I would like to understand the technical hurdles that are
>>> preventing Spark SQL from implementing multiple aggregation right now.
>>>
>>> Thanks,
>>> --
>>> Arnaud Bailly
>>>
>>> twitter: abailly
>>> skype: arnaud-bailly
>>> linkedin: http://fr.linkedin.com/in/arnaudbailly/
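[Editor's note: the two-stage workaround proposed in the thread — materialize the first streaming aggregation in "complete" mode, then run the second aggregation over the materialized result — can be sketched in PySpark roughly as below. This is a hedged sketch, not a vetted implementation: the socket source, host/port, column names, and the `num_keys` alias are placeholder assumptions; the pattern of interest is the memory sink with complete output mode plus a separate query over the registered table.]

```python
# Sketch of the two-stage workaround (assumes a running Spark 2.0+ session
# and an illustrative socket source; substitute your real source and schema).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("two-stage-agg").getOrCreate()

lines = (spark.readStream
         .format("socket")            # placeholder source for illustration
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Stage 1: the only streaming aggregation, written to an in-memory table
# in "complete" mode so every trigger rewrites the full result.
first_agg = lines.groupBy("value").count()

query = (first_agg.writeStream
         .outputMode("complete")
         .format("memory")
         .queryName("first_agg")      # registers a queryable in-memory table
         .start())

# Stage 2: a separate batch query over the materialized table performs the
# second aggregation; re-run it on whatever schedule the application needs.
second_agg = (spark.table("first_agg")
              .groupBy("count")
              .agg(F.count("*").alias("num_keys")))
second_agg.show()
```

Note the limitations implied by the thread: stage 2 is a batch query over a snapshot, not a true chained streaming aggregation, and "complete" mode requires the full stage-1 result to fit in memory — which is exactly why a delta mode between operators would be the proper fix.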