Except I want it to be a sliding window. So the same record could be in multiple buckets.
On Tue, Jan 6, 2015 at 3:43 PM, Sean Owen <so...@cloudera.com> wrote: > So you want windows covering the same length of time, some of which will > be fuller than others? You could, for example, simply bucket the data by > minute to get this kind of effect. If you an RDD[Ticker], where Ticker has > a timestamp in ms, you could: > > tickerRDD.groupBy(ticker => (ticker.timestamp / 60000) * 60000)) > > ... to get an RDD[(Long,Iterable[Ticker])], where the keys are the moment > at the start of each minute, and the values are the Tickers within the > following minute. You can try variations on this to bucket in different > ways. > > Just be careful because a minute with a huge number of values might cause > you to run out of memory. If you're just doing aggregations of some kind > there are more efficient methods than this most generic method, like the > aggregate methods. > > On Tue, Jan 6, 2015 at 8:34 PM, Asim Jalis <asimja...@gmail.com> wrote: > >> Thanks. Another question. I have event data with timestamps. I want to >> create a sliding window using timestamps. Some windows will have a lot of >> events in them others won’t. Is there a way to get an RDD made of this kind >> of a variable length window? >> >> >> On Tue, Jan 6, 2015 at 1:03 PM, Sean Owen <so...@cloudera.com> wrote: >> >>> First you'd need to sort the RDD to give it a meaningful order, but I >>> assume you have some kind of timestamp in your data you can sort on. >>> >>> I think you might be after the sliding() function, a developer API in >>> MLlib: >>> >>> >>> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala#L43 >>> >>> On Tue, Jan 6, 2015 at 5:25 PM, Asim Jalis <asimja...@gmail.com> wrote: >>> >>>> Is there an easy way to do a moving average across a single RDD (in a >>>> non-streaming app). Here is the use case. I have an RDD made up of stock >>>> prices. I want to calculate a moving average using a window size of N. >>>> >>>> Thanks. >>>> >>>> Asim >>>> >>> >>> >> >