Hi TD,

I have been able to filter the first WindowedRDD, but I am not sure how to make 
a generic filter. The larger window is 8 seconds and I want to fetch 4 seconds 
of data based on the application timestamp. I have seen an earlier post which 
suggests a timestamp-based window, but I am not sure how to define one in the 
following example. 


 val transformed = keyAndValues.window(Seconds(8), Seconds(4)).transform(windowedRDD => {
   // val timeStampBasedWindow = ???              // define the window over the timestamp that you want to process
   val filteredRDD = windowedRDD.filter(_._1 < 4) // filter and retain only the records that fall in the timestamp-based window
   filteredRDD                                    // the closure's last expression is its result; "return" here does not compile
 })
Consider the input tuples as (1,23), (1.2,34), . . . , (3.8,54), (4,413), . . . 
where the key is the timestamp.
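
Would something like the following work? It is only a sketch of what I am 
after: the two-argument transform variant also passes the batch time, and I am 
assuming the application timestamps (the keys) are in seconds aligned with the 
batch time.

 val transformed = keyAndValues.window(Seconds(8), Seconds(4)).transform(
   (windowedRDD, batchTime) => {
     // End of the application-time window = current batch time, in seconds
     val upper = batchTime.milliseconds / 1000.0
     // Keep only the most recent 4 seconds of the 8-second window
     val lower = upper - 4
     windowedRDD.filter { case (ts, _) => ts >= lower && ts < upper }
   })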

Regards,
Laeeq
 



On Saturday, July 12, 2014 8:29 PM, Laeeq Ahmed <laeeqsp...@yahoo.com> wrote:
 


Hi,
Thanks, I will try to implement it.

Regards,
Laeeq



 On Saturday, July 12, 2014 4:37 AM, Tathagata Das 
<tathagata.das1...@gmail.com> wrote:
 


This is not in the current streaming API.

Queue stream is useful for testing with generated RDDs, but not for actual 
data. For an actual data stream, the slack time can be implemented by doing 
DStream.window with a larger window that takes the slack time into account, 
and then filtering out just the records that fall in the required 
application-time-based window. For example, if you want a slack time of 1 
minute and batches of 10 seconds, do a window operation of 70 seconds, then in 
each RDD filter out the records with the desired application time and process 
them. 
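
In code, that would look roughly like the following. This is only a sketch: it 
assumes the stream is a DStream of (applicationTimeInMillis, value) pairs, and 
that you want the 10-second application-time window that ends one slack period 
before the batch time.

 // 10-second batches with a 70-second window = 60 seconds of slack
 val windowed = stream.window(Seconds(70), Seconds(10))
 val processed = windowed.transform { (rdd, batchTime) =>
   // Target the application-time window ending one slack period (1 min) ago
   val end = batchTime.milliseconds - 60 * 1000L
   val start = end - 10 * 1000L
   rdd.filter { case (appTime, _) => appTime >= start && appTime < end }
 }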

TD



On Fri, Jul 11, 2014 at 7:44 AM, Laeeq Ahmed <laeeqsp...@yahoo.com> wrote:

Hi,
>
>
>In the Spark Streaming paper, "slack time" has been suggested for delaying 
>batch creation in the case of external timestamps. I don't see any such 
>option in StreamingContext. Is it available in the API?
>
>
>
>Also, going through the previous posts, queueStream has been suggested for 
>this. I looked into the queueStream example.
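>
>
>As I understand it, the setup around that example is roughly the following (a 
>sketch; the names sparkConf, ssc and rddQueue follow the Spark example):
>
>     import scala.collection.mutable
>     import org.apache.spark.SparkConf
>     import org.apache.spark.rdd.RDD
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>     val sparkConf = new SparkConf().setAppName("QueueStream")
>     val ssc = new StreamingContext(sparkConf, Seconds(1))
>
>     // The queue of RDDs that backs the input stream
>     val rddQueue = new mutable.Queue[RDD[Int]]()
>     val inputStream = ssc.queueStream(rddQueue)
>     inputStream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()
>     ssc.start()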
>
>
>
>     // Create and push some RDDs into the queue
>     for (i <- 1 to 30) {
>       rddQueue += ssc.sparkContext.makeRDD(1 to 10)
>       Thread.sleep(1000)
>     }
>
>The only thing I am unsure about is how to make batches (basic RDDs) out of a 
>stream coming in on a port.
>
>
>Regards,
>Laeeq
>
> 
