We are planning to address this issue in the future. At a high level, we'll have to add a delta mode so that updates can be communicated from one operator to the next.
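[Editor's note: the "delta mode" below is described only at a high level in this thread. The following is a purely illustrative toy sketch of the general idea — an upstream aggregation emitting retract/assert updates (deltas) that a downstream aggregation applies incrementally — and is not Spark's actual design or API.]

```python
# Toy illustration of a "delta mode" between two aggregation operators.
# NOT Spark code: a hypothetical sketch of upstream operators emitting
# updates (deltas) instead of complete results, with the downstream
# operator maintaining its own state incrementally.
from collections import defaultdict

class CountByKey:
    """First-level aggregation: counts records per key, emitting deltas."""
    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, key):
        old = self.counts[key]
        self.counts[key] += 1
        deltas = []
        if old > 0:
            deltas.append((key, old, -1))   # retract the old count
        deltas.append((key, old + 1, +1))   # assert the new count
        return deltas

class CountOfCounts:
    """Second-level aggregation: how many keys have each count value,
    kept up to date by applying upstream deltas."""
    def __init__(self):
        self.hist = defaultdict(int)

    def apply(self, deltas):
        for _key, count_value, sign in deltas:
            self.hist[count_value] += sign
            if self.hist[count_value] == 0:
                del self.hist[count_value]

first = CountByKey()
second = CountOfCounts()
for record in ["a", "b", "a", "c", "a"]:
    second.apply(first.process(record))

# a occurs 3 times, b and c once each: two keys with count 1, one with count 3.
print(dict(second.hist))  # {1: 2, 3: 1}
```

The point of the sketch is that the second aggregation never sees the raw stream; it only consumes the first operator's updates, which is what chaining streaming aggregations requires.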
On Thu, Jul 7, 2016 at 8:59 AM, Arnaud Bailly <arnaud.oq...@gmail.com> wrote:

> Indeed. But nested aggregation does not work with Structured Streaming,
> that's the point. I would like to know if there is a workaround, or what
> the plan is regarding this feature, which seems quite useful to me. If the
> implementation is not overly complex and it is just a matter of manpower,
> I am fine with devoting some time to it.
>
> --
> Arnaud Bailly
>
> twitter: abailly
> skype: arnaud-bailly
> linkedin: http://fr.linkedin.com/in/arnaudbailly/
>
> On Thu, Jul 7, 2016 at 2:17 PM, Sivakumaran S <siva.kuma...@me.com> wrote:
>
>> Arnaud,
>>
>> You could aggregate the first table and then merge it with the second
>> table (assuming that they are similarly structured) and then carry out
>> the second aggregation. Unless the data is very large, I don't see why
>> you should persist it to disk. IMO, nested aggregation is more elegant
>> and readable than a complex single stage.
>>
>> Regards,
>>
>> Sivakumaran
>>
>> On 07-Jul-2016, at 1:06 PM, Arnaud Bailly <arnaud.oq...@gmail.com> wrote:
>>
>> It's aggregation at multiple levels in a query: first do some aggregation
>> on one table, then join with another table and do a second aggregation. I
>> could probably rewrite the query in such a way that it does the
>> aggregation in one pass, but that would obfuscate the purpose of the
>> various stages.
>>
>> On Jul 7, 2016 12:55, "Sivakumaran S" <siva.kuma...@me.com> wrote:
>>
>>> Hi Arnaud,
>>>
>>> Sorry for the doubt, but what exactly is multiple aggregation? What is
>>> the use case?
>>>
>>> Regards,
>>>
>>> Sivakumaran
>>>
>>> On 07-Jul-2016, at 11:18 AM, Arnaud Bailly <arnaud.oq...@gmail.com>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I understand that multiple aggregations over streaming dataframes are
>>> not currently supported in Spark 2.0. Is there a workaround? Off the top
>>> of my head, I could think of a two-stage approach:
>>> - the first query writes its output to disk/memory using "complete" mode
>>> - the second query reads from this output
>>>
>>> Does this make sense?
>>>
>>> Furthermore, I would like to understand the technical hurdles that are
>>> preventing Spark SQL from implementing multiple aggregation right now.
>>>
>>> Thanks,
>>> --
>>> Arnaud Bailly
>>>
>>> twitter: abailly
>>> skype: arnaud-bailly
>>> linkedin: http://fr.linkedin.com/in/arnaudbailly/
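[Editor's note: the two-stage workaround proposed in the thread — materialize the first streaming aggregation in "complete" mode, then run the second aggregation over the materialized result — can be sketched in PySpark roughly as below. This is a hedged sketch, not a vetted implementation: the socket source, host/port, column names, and the `num_keys` alias are placeholder assumptions; the pattern of interest is the memory sink with complete output mode plus a separate query over the registered table.]

```python
# Sketch of the two-stage workaround (assumes a running Spark 2.0+ session
# and an illustrative socket source; substitute your real source and schema).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("two-stage-agg").getOrCreate()

lines = (spark.readStream
         .format("socket")            # placeholder source for illustration
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Stage 1: the only streaming aggregation, written to an in-memory table
# in "complete" mode so every trigger rewrites the full result.
first_agg = lines.groupBy("value").count()

query = (first_agg.writeStream
         .outputMode("complete")
         .format("memory")
         .queryName("first_agg")      # registers a queryable in-memory table
         .start())

# Stage 2: a separate batch query over the materialized table performs the
# second aggregation; re-run it on whatever schedule the application needs.
second_agg = (spark.table("first_agg")
              .groupBy("count")
              .agg(F.count("*").alias("num_keys")))
second_agg.show()
```

Note the limitations implied by the thread: stage 2 is a batch query over a snapshot, not a true chained streaming aggregation, and "complete" mode requires the full stage-1 result to fit in memory — which is exactly why a delta mode between operators would be the proper fix.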