Late to the thread, but why is counting distinct elements over a 24-hour window not possible? You can certainly do it now, and I'd presume it's possible with Structured Streaming with a window.
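As a rough illustration of the Structured Streaming route mentioned above, here is a sketch of a windowed approximate distinct count. All names (`events`, `timestamp`, `value`, the socket source) are placeholders, and it assumes a later Spark version where `approx_count_distinct` is available; exact `count(distinct)` is generally not supported on streaming aggregations, so an approximation over a 24-hour window is used instead:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("WindowedDistinct").getOrCreate()
import spark.implicits._

// Placeholder source: a streaming DataFrame with a `value` column,
// stamped with an event-time column on arrival.
val events = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .select(current_timestamp().as("timestamp"), $"value")

// Group into 24-hour event-time windows and approximate the number of
// distinct values in each window. The watermark bounds the state kept.
val distinctPerDay = events
  .withWatermark("timestamp", "1 hour")
  .groupBy(window($"timestamp", "24 hours"))
  .agg(approx_count_distinct($"value").as("distinct_values"))

val query = distinctPerDay.writeStream
  .outputMode("update")
  .format("console")
  .start()

query.awaitTermination()
```

This is a sketch, not a drop-in solution: it needs a Spark cluster (or local mode) and a real source to run, and the watermark and window sizes would have to match the actual latency requirements.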
countByValueAndWindow should do it, right? The keys (with non-zero counts, I suppose) in a window are the distinct values from the stream in that window. Your example looks right.

On Wed, May 18, 2016 at 12:17 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Ok, what can be used here below?
>
> //val countDistinctByValueAndWindow = price.filter(_ > 0.0)
> //  .reduceByKey((t1, t2) => t1)
> //  .countByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval))
> //countDistinctByValueAndWindow.print()
>
>>> On 17 May 2016 at 20:02, Michael Armbrust <mich...@databricks.com> wrote:
>>>> In 2.0 you won't be able to do this. The long-term vision would be to
>>>> make this possible, but a window will be required (like the 24 hours you
>>>> suggest).
>>>>
>>>> On Tue, May 17, 2016 at 1:36 AM, Todd <bit1...@163.com> wrote:
>>>>> Hi,
>>>>> We have a requirement to do count(distinct) in a processing batch against
>>>>> all the streaming data (e.g., the last 24 hours' data). That is, when we
>>>>> do count(distinct), we actually want to compute the distinct count
>>>>> against the last 24 hours' data.
>>>>> Does Structured Streaming support this scenario? Thanks!
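Putting the pieces from the snippet above together, a minimal DStream-based sketch might look like the following. It assumes a local StreamingContext and a socket source producing one price per line; `windowLength` and `slidingInterval` are placeholder values, and the non-zero filter matters only when the inverse-reduce variant of the windowed operation leaves zero-count keys in state:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("DistinctOverWindow").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))
ssc.checkpoint("/tmp/checkpoint") // required by windowed stateful operations

// Placeholder source: one numeric price per line over a socket.
val prices = ssc.socketTextStream("localhost", 9999).map(_.toDouble)

// Placeholder window parameters: 24-hour window, sliding every minute.
val windowLength = 24 * 60 * 60L
val slidingInterval = 60L

// countByValueAndWindow yields a (value, count) pair per distinct value
// seen in the window; the number of keys with a non-zero count is the
// distinct count for that window.
val countsByValue = prices
  .filter(_ > 0.0)
  .countByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval))

val distinctCount = countsByValue.filter(_._2 > 0).count()
distinctCount.print()

ssc.start()
ssc.awaitTermination()
```

Note that a 24-hour window over a DStream keeps a day's worth of data in window state, so the checkpoint directory and executor memory need to be sized accordingly.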