Thank you Sanket for the feedback.
Regards
SM
> On 19-Nov-2015, at 1:57 PM, Sanket Patil wrote:
Hey Sandip:
TD has already outlined the right approach, but let me add a couple of
thoughts as I recently worked on a similar project. I had to compute some
real-time metrics on streaming data. Also, these metrics had to be
aggregated for hour/day/week/month. My data pipeline was Kafka --> Spark
Streaming.
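The multi-granularity aggregation Sanket describes can be sketched roughly as follows (plain Python for illustration; the function and key names are my own, not from his pipeline): derive every rollup bucket an event falls into from its timestamp, so the streaming job can emit one (bucket, value) pair per granularity and reduce them in a single pass.

```python
from datetime import datetime, timedelta, timezone

def rollup_buckets(ts_epoch_seconds):
    """Return the bucket each granularity's counter lives in for one event.

    A streaming job can emit ((granularity, bucket), value) pairs from each
    event and reduceByKey them, so one pass updates every rollup level.
    """
    dt = datetime.fromtimestamp(ts_epoch_seconds, tz=timezone.utc)
    minute = dt.replace(second=0, microsecond=0)
    hour = minute.replace(minute=0)
    day = hour.replace(hour=0)
    week = day - timedelta(days=day.weekday())  # Monday of that week
    month = day.replace(day=1)
    return {
        "hour": hour.isoformat(),
        "day": day.isoformat(),
        "week": week.isoformat(),
        "month": month.isoformat(),
    }
```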
Thank you TD for your time and help.
SM
> On 19-Nov-2015, at 6:58 AM, Tathagata Das wrote:
There are different ways to do the rollups. Either update the rollups from
the streaming application, or generate them in a later process, say a
periodic Spark job every hour. Or you could just generate rollups on demand,
when they are queried.
The whole thing depends on your downstream requirements.
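A rough illustration of the "rollups on demand" option (plain Python; the dict stands in for whatever store holds the fine-grained counters, and the key scheme is hypothetical): the streaming job only writes counters at the smallest granularity, and a coarser figure is summed when someone asks for it.

```python
def rollup_on_demand(store, prefix):
    """Sum all fine-grained counters whose key starts with `prefix`.

    `store` stands in for the real data store (e.g. a scan over a key
    range); here it is just a dict of minute-level counts.
    """
    return sum(v for k, v in store.items() if k.startswith(prefix))

# Minute-level counters written by the streaming job (illustrative keys).
minute_counts = {
    "pageviews|2015-11-19T13:00": 40,
    "pageviews|2015-11-19T13:01": 35,
    "pageviews|2015-11-19T14:00": 12,
}

# Hourly rollup computed only when queried:
hour_total = rollup_on_demand(minute_counts, "pageviews|2015-11-19T13:")
```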
TD, thank you for your reply.
I agree on the data store requirement. I am using HBase as the underlying store.
So for every batch interval of, say, 10 seconds:
- Calculate the time dimensions (minute, hour, day, week, month and quarter)
along with other dimensions and metrics
- Update the relevant base tables
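The first step above, computing the time dimensions for an event, might look roughly like this (plain Python sketch; the label formats are my own assumption). Each batch would then bump the matching counters in HBase, e.g. via the client's atomic increment calls.

```python
from datetime import datetime

def time_dimensions(dt):
    """Derive a label per time dimension (minute .. quarter) for one event."""
    quarter = (dt.month - 1) // 3 + 1
    iso_year, iso_week = dt.isocalendar()[0], dt.isocalendar()[1]
    return {
        "minute": dt.strftime("%Y-%m-%dT%H:%M"),
        "hour": dt.strftime("%Y-%m-%dT%H"),
        "day": dt.strftime("%Y-%m-%d"),
        "week": f"{iso_year}-W{iso_week:02d}",
        "month": dt.strftime("%Y-%m"),
        "quarter": f"{dt.year}-Q{quarter}",
    }
```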
For this sort of long-term aggregation you should use a dedicated data
storage system, like a database or a key-value store. Spark Streaming
would just aggregate and push the necessary data to the data store.
TD
On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta
wrote:
Hi,
I am working on a requirement to calculate real-time metrics and am building
a prototype on Spark Streaming. I need to build aggregates at the second,
minute, hour and day level.
I am not sure whether I should calculate all these aggregates as different
windowed functions on the input DStream or should I use updateStateByKey.
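One alternative to running a separate windowed computation per granularity, sketched in plain Python (in the actual job the finest level would come from a per-batch reduceByKey): compute only the finest granularity in the stream and derive each coarser level by rolling up the one below it.

```python
def roll_up(finer_counts, coarser_key):
    """Aggregate finer-granularity buckets into coarser ones."""
    out = {}
    for bucket, count in finer_counts.items():
        k = coarser_key(bucket)
        out[k] = out.get(k, 0) + count
    return out

# Second-level counts produced by the streaming job (illustrative buckets):
seconds = {"12:00:01": 3, "12:00:59": 2, "12:01:10": 5}

# Minute and hour aggregates derived from them, not recomputed from the stream:
minutes = roll_up(seconds, lambda b: b[:5])   # "HH:MM"
hours = roll_up(minutes, lambda b: b[:2])     # "HH"
```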