Late to the thread, but why is counting distinct elements over a 24-hour window not possible? You can certainly do it now, and I'd presume it's possible with Structured Streaming with a window.
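As a rough illustration of the Structured Streaming route mentioned above, here is a sketch of a windowed approximate distinct count. All names (`events`, `timestamp`, `value`, the socket source) are placeholders, and it assumes a later Spark version where `approx_count_distinct` is available; exact `count(distinct)` is generally not supported on streaming aggregations, so an approximation over a 24-hour window is used instead:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("WindowedDistinct").getOrCreate()
import spark.implicits._

// Placeholder source: a streaming DataFrame with a `value` column,
// stamped with an event-time column on arrival.
val events = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .select(current_timestamp().as("timestamp"), $"value")

// Group into 24-hour event-time windows and approximate the number of
// distinct values in each window. The watermark bounds the state kept.
val distinctPerDay = events
  .withWatermark("timestamp", "1 hour")
  .groupBy(window($"timestamp", "24 hours"))
  .agg(approx_count_distinct($"value").as("distinct_values"))

val query = distinctPerDay.writeStream
  .outputMode("update")
  .format("console")
  .start()

query.awaitTermination()
```

This is a sketch, not a drop-in solution: it needs a Spark cluster (or local mode) and a real source to run, and the watermark and window sizes would have to match the actual latency requirements.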
countByValueAndWindow should do it, right? The keys (with non-zero counts, I suppose) in a window are the distinct values from the stream in that window. Your example looks right.

On Wed, May 18, 2016 at 12:17 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Ok, what can be used here below?
>
> //val countDistinctByValueAndWindow = price.filter(_ > 0.0)
> //  .reduceByKey((t1, t2) => t1)
> //  .countByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval))
> //countDistinctByValueAndWindow.print()
>
>>> On 17 May 2016 at 20:02, Michael Armbrust <mich...@databricks.com> wrote:
>>>> In 2.0 you won't be able to do this. The long-term vision would be to
>>>> make this possible, but a window will be required (like the 24 hours you
>>>> suggest).
>>>>
>>>> On Tue, May 17, 2016 at 1:36 AM, Todd <bit1...@163.com> wrote:
>>>>> Hi,
>>>>> We have a requirement to do count(distinct) in a processing batch against
>>>>> all the streaming data (e.g., the last 24 hours' data). That is, when we
>>>>> do count(distinct), we actually want to compute the distinct count
>>>>> against the last 24 hours' data.
>>>>> Does Structured Streaming support this scenario? Thanks!
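Putting the pieces from the snippet above together, a minimal DStream-based sketch might look like the following. It assumes a local StreamingContext and a socket source producing one price per line; `windowLength` and `slidingInterval` are placeholder values, and the non-zero filter matters only when the inverse-reduce variant of the windowed operation leaves zero-count keys in state:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("DistinctOverWindow").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))
ssc.checkpoint("/tmp/checkpoint") // required by windowed stateful operations

// Placeholder source: one numeric price per line over a socket.
val prices = ssc.socketTextStream("localhost", 9999).map(_.toDouble)

// Placeholder window parameters: 24-hour window, sliding every minute.
val windowLength = 24 * 60 * 60L
val slidingInterval = 60L

// countByValueAndWindow yields a (value, count) pair per distinct value
// seen in the window; the number of keys with a non-zero count is the
// distinct count for that window.
val countsByValue = prices
  .filter(_ > 0.0)
  .countByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval))

val distinctCount = countsByValue.filter(_._2 > 0).count()
distinctCount.print()

ssc.start()
ssc.awaitTermination()
```

Note that a 24-hour window over a DStream keeps a day's worth of data in window state, so the checkpoint directory and executor memory need to be sized accordingly.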