Re: Calculating Timeseries Aggregation

2015-11-19 Thread Sandip Mehta
Thank you Sanket for the feedback. Regards SM > On 19-Nov-2015, at 1:57 PM, Sanket Patil wrote: > > Hey Sandip: > > TD has already outlined the right approach, but let me add a couple of > thoughts as I recently worked on a similar project. I had to compute some > real-time metrics on streami

Re: Calculating Timeseries Aggregation

2015-11-19 Thread Sanket Patil
Hey Sandip: TD has already outlined the right approach, but let me add a couple of thoughts as I recently worked on a similar project. I had to compute some real-time metrics on streaming data. Also, these metrics had to be aggregated for hour/day/week/month. My data pipeline was Kafka --> Spark S

Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
Thank you TD for your time and help. SM > On 19-Nov-2015, at 6:58 AM, Tathagata Das wrote: > > There are different ways to do the rollups. Either update rollups from the > streaming application, or you can generate roll ups in a later process - say > periodic Spark job every hour. Or you could

Re: Calculating Timeseries Aggregation

2015-11-18 Thread Tathagata Das
There are different ways to do the rollups. Either update rollups from the streaming application, or generate rollups in a later process, say a periodic Spark job every hour. Or you could just generate rollups on demand, when they are queried. The whole thing depends on your downstream require
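A minimal sketch of the "later process" option, assuming minute-level counts have already been written somewhere and are keyed by strings of the form `YYYY-MM-DDTHH:MM`. The `roll_up` helper and the sample data are hypothetical, not from the thread, and plain Python stands in for the periodic Spark job. Truncating the key like this only works for additive metrics such as counts and sums.

```python
from collections import defaultdict


def roll_up(minute_counts, prefix_len):
    """Sum minute-level counts into coarser buckets by truncating the key.

    Hypothetical helper: assumes keys look like "YYYY-MM-DDTHH:MM".
    """
    rolled = defaultdict(int)
    for minute_key, count in minute_counts.items():
        rolled[minute_key[:prefix_len]] += count
    return dict(rolled)


# Made-up minute-level aggregates, as a streaming job might have stored them.
minute_counts = {
    "2015-11-18T09:15": 5,
    "2015-11-18T09:16": 7,
    "2015-11-18T10:01": 4,
}

hourly = roll_up(minute_counts, 13)  # keep "YYYY-MM-DDTHH"
daily = roll_up(minute_counts, 10)   # keep "YYYY-MM-DD"
```

The same function could be called at query time instead, which would correspond to the on-demand option.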

Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
TD thank you for your reply. I agree on the data store requirement. I am using HBase as the underlying store. So for every batch interval of, say, 10 seconds - Calculate the time dimension (minutes, hours, day, week, month and quarter) along with other dimensions and metrics - Update relevant base t
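The per-batch steps described here could be sketched roughly as follows. This is an illustration under assumptions, not the poster's code: `time_buckets` and `update_store` are hypothetical names, a `Counter` stands in for the HBase tables, weeks are taken to be ISO weeks, and the events are invented.

```python
from collections import Counter
from datetime import datetime, timezone


def time_buckets(ts):
    """Return one bucket key per time grain for a datetime (hypothetical helper)."""
    iso_year, iso_week, _ = ts.isocalendar()
    quarter = (ts.month - 1) // 3 + 1
    return {
        "minute": ts.strftime("%Y-%m-%dT%H:%M"),
        "hour": ts.strftime("%Y-%m-%dT%H"),
        "day": ts.strftime("%Y-%m-%d"),
        "week": f"{iso_year}-W{iso_week:02d}",  # assumption: ISO week numbering
        "month": ts.strftime("%Y-%m"),
        "quarter": f"{ts.year}-Q{quarter}",
    }


def update_store(store, events):
    """Add each (timestamp, dimension, value) event to every time grain."""
    for ts, dimension, value in events:
        for grain, bucket in time_buckets(ts).items():
            store[(grain, bucket, dimension)] += value
    return store


store = Counter()  # in-memory stand-in for the HBase tables
batch = [  # made-up events from one 10-second batch
    (datetime(2015, 11, 18, 9, 15, 2, tzinfo=timezone.utc), "page_view", 3),
    (datetime(2015, 11, 18, 9, 15, 7, tzinfo=timezone.utc), "page_view", 2),
]
update_store(store, batch)
```

Each event touches one row per grain, so a batch costs six increments per event in this sketch. Pre-aggregating the batch by minute before writing would reduce that.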

Re: Calculating Timeseries Aggregation

2015-11-17 Thread Tathagata Das
For this sort of long-term aggregation you should use a dedicated data storage system, like a database or a key-value store. Spark Streaming would just aggregate and push the necessary data to the data store. TD On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta wrote: > Hi, > > I am working on r

Calculating Timeseries Aggregation

2015-11-14 Thread Sandip Mehta
Hi, I am working on a requirement to calculate real-time metrics and am building a prototype on Spark Streaming. I need to build aggregates at the second, minute, hour and day levels. I am not sure whether I should calculate all these aggregates as different windowed functions on the input DStream or sh