Re: Time series aggregation with storm Trident

Ajay Chander Mon, 14 Sep 2015 11:29:20 -0700

Hi Andrew,

Thank you for your response. If possible, could you point me to any
implementation examples that use Trident?


I have a couple of questions:

If we persist the state to an external store like MemSQL, I assume that
the computation happens on the MemSQL side. In that case, are we
limiting ourselves with regard to the scalability provided by Apache
Storm? In this particular scenario, are we using Storm just as a router?

Thank you for your time,
Ajay

On Monday, September 14, 2015, Andrew Xor <[email protected]>
wrote:

> I have. What you could do is use an external persistent store (something
> fast, for example Memcached or Haystack) that holds your aggregation
> batches for a specific time slice. For example, have a 1 hour window with
> 10-minute slices that are cleared and rotated as needed. Another problem
> that you have to deal with is that, should a spout source fail,
> everything is delayed unless you have an opaque spout, which of course has
> some downsides as indicated here
> <https://storm.apache.org/documentation/Trident-state>.
>
> Hope this helps.
>
> Kindly yours,
>
> Andrew Grammenos
>
> -- PGP PKey --
>  <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
>
> On Mon, Sep 14, 2015 at 7:40 PM, Ajay Chander <[email protected]> wrote:
>
>> Hi Guys,
>>
>> Right now I am trying to implement the same thing Elango describes in the
>> email below. I want to perform aggregations based on a time window using
>> Trident. Has anyone done this before using Trident? Any help is highly
>> appreciated.
>>
>> Thank you,
>> Ajay
>>
>>
>> On Thursday, August 27, 2015, Rajasekar Elango <[email protected]> wrote:
>>
>>> We have time series data in Kafka and we want to aggregate it in Storm
>>> using Trident. I was able to get the data aggregated using
>>> persistentAggregate based on the FAQ
>>> <https://storm.apache.org/documentation/FAQ.html>. But aggregation is
>>> always done within small batches, and I could not figure out a way to
>>> detect when all events for a one-minute time window have been processed.
>>> Calling each after persistentAggregate(...).newValuesStream() returns
>>> results as soon as a batch is processed, but I want to aggregate values
>>> across multiple batches for a time window. I could not find a good answer
>>> or example online. I also see mixed opinions: some people say it's not
>>> possible to do time window aggregation in Trident, and some people say
>>> it's possible (the FAQ <https://storm.apache.org/documentation/FAQ.html>
>>> looks especially promising). The alternative seems to be using tick
>>> tuples with basic Storm, but I would prefer to do it in Trident as it has
>>> better guaranteed processing semantics and abstraction for persistence.
>>>
>>> Can someone provide more details or examples on how to do this?
>>>
>>> --
>>> Thanks,
>>> Raja.
>>>
>>
>
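
The sliced window Andrew describes above (a 1 hour window made of
10-minute slices that are cleared and rotated) could look roughly like
the sketch below. It is only an illustration and not Trident code:
SlicedWindowCounter and its methods are hypothetical names, and an
in-memory map stands in for the external store. The same slice key
(timestamp floored to the slice start) is the kind of field one might
group by so that counts accumulate across batches.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: rolling count over a window built from fixed-size time slices. */
final class SlicedWindowCounter {
    private final long sliceMillis; // length of one slice, e.g. 10 minutes
    private final int sliceCount;   // slices per window, e.g. 6 for 1 hour

    // Slice start (epoch millis) -> count for that slice.
    // Assumption: this map stands in for the external store.
    private final Map<Long, Long> slices = new HashMap<>();

    SlicedWindowCounter(long sliceMillis, int sliceCount) {
        this.sliceMillis = sliceMillis;
        this.sliceCount = sliceCount;
    }

    /** Floors an event timestamp to the start of the slice it belongs to. */
    long sliceStart(long eventMillis) {
        return Math.floorDiv(eventMillis, sliceMillis) * sliceMillis;
    }

    /** Adds a count to the slice that contains the event timestamp. */
    void add(long eventMillis, long count) {
        slices.merge(sliceStart(eventMillis), count, Long::sum);
    }

    /** Clears slices that have rotated out of the window ending at nowMillis. */
    void expire(long nowMillis) {
        long oldestKept = sliceStart(nowMillis) - (long) (sliceCount - 1) * sliceMillis;
        slices.keySet().removeIf(start -> start < oldestKept);
    }

    /** Expires old slices, then sums the counts of the slices that remain. */
    long windowTotal(long nowMillis) {
        expire(nowMillis);
        long total = 0;
        for (long value : slices.values()) {
            total += value;
        }
        return total;
    }
}
```

With sliceMillis = 600000 and sliceCount = 6 this keeps at most one
hour of data. Because each slice is updated independently, a batch that
arrives late only changes its own slice rather than the whole window.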
