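You can use reduceByKeyAndWindow with a 2-hour window and a 5-minute slide interval. It has a variant that takes an inverse reduce function, which is applied to the data leaving the window, so the windowed state is updated incrementally instead of being recomputed from scratch. Note that this variant requires checkpointing to be enabled.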
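To make the idea concrete, here is a minimal sketch in plain Python of what the reduce and inverse reduce functions could look like for your per-page hashmap. The names add_counts, subtract_counts and top_k are made up for illustration, and I am assuming each event has already been mapped to a (page, {user: 1}) pair.

```python
from collections import Counter


def add_counts(a, b):
    """Reduce function: merge per-user visit counts entering the window."""
    merged = Counter(a)
    merged.update(b)
    return dict(merged)


def subtract_counts(total, leaving):
    """Inverse function: remove the counts of batches leaving the window."""
    remaining = Counter(total)
    remaining.subtract(leaving)
    # Drop users whose count fell to zero so the state does not grow forever.
    return {user: n for user, n in remaining.items() if n > 0}


def top_k(counts, k):
    """Return the k most frequent users as (user, count) pairs."""
    return Counter(counts).most_common(k)


# Simulate one slide of the window for a single page (hypothetical data).
old_batch = {"user_01": 2, "user_02": 1}   # batch that will leave the window
new_batch = {"user_01": 1, "user_03": 4}   # batch that enters the window

window = add_counts(old_batch, new_batch)
window = subtract_counts(window, old_batch)
```

In PySpark these would be passed as something like pairs.reduceByKeyAndWindow(add_counts, subtract_counts, 7200, 300), with durations in seconds, and top_k applied to each value afterwards. I have not run this against a real stream, so treat it as a starting point.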
On Tue, Feb 24, 2015 at 1:36 PM, Ashish Sharma <ashishonl...@gmail.com> wrote:

> So say I want to calculate top K users visiting a page in the past 2 hours
> updated every 5 mins.
>
> so here I want to maintain something like this
>
> Page_01 => {user_01:32, user_02:3, user_03:7...}
> ...
>
> Basically a count of number of times a user visited a page. Here my key is
> page name/id and state is the hashmap.
>
> Now in updateStateByKey I get the previous state and new events coming
> *in* the window. Is there a way to also get the events going *out* of the
> window? This way I can incrementally update the state over a rolling window.
>
> What is the efficient way to do it in spark streaming?
>
> Thanks
> Ashish

--
*Arush Kharbanda* || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com