Re: Flush aggregated data every X seconds

Corey Nolet Thu, 24 Apr 2014 20:37:32 -0700

Raphael, in your case it sounds like a "TickSpout" could be useful where
you emit a tuple every n time slices and then sleep until needing to emit
another. I'm not sure how that'd work in a Trident aggregator, however.

I'm not sure if this is something Nathan or the community would approve of,
but I've been writing my own framework for doing sliding/tumbling windows
in Storm that allow aggregations and triggering/eviction by count, time,
and other policies like "when the time difference between the first item
and the last item in a window is less than x". The bolts could easily be
ripped out for doing your own aggregations.

It's located here: https://github.com/calrissian/flowbox

It's very much so in the proof of concept stage. My other requirement (and
the reason I cared so much to implement this) was that the rules need to be
dynamic and the topology needs to be static as to make the best use of
resources while users are defining that they need.

On Thu, Apr 24, 2014 at 11:27 PM, Raphael Hsieh <raffihs...@gmail.com>wrote:

> Is there a way in Storm Trident to aggregate data over a certain time
> period and have it flush the data out to an external data store after that
> time period is up ?
>
> Trident does not have the functionality of Tick Tuples yet, so I cannot
> use that. Everything I've been researching leads to believe that this is
> not possible in Storm/Trident, however this seems to me to be a fairly
> standard use case of any streaming map reduce library.
>
> For example,
> If I am receiving a stream of integers
> I want to aggregate all those integers over a period of 1 second, then
> persist it into an external datastore.
>
> This is not in order to count how much it will add up to over X amount of
> time, rather I would like to minimize the read/write/updates I do to said
> datastore.
>
> There are many ways in order to reduce these variables, however all of
> them force me to modify my schema in ways that are unpleasant. Also, I
> would rather not have my final external datastore be my scratch space,
> where my program is reading/updating/writing and checking to make sure that
> the transaction id's line up.
> Instead I want that scratch work to be done separately, then the final
> result stored into a final database that no longer needs to do constant
> updating.
>
> Thanks
> --
> Raphael Hsieh
>
>
>
>

Re: Flush aggregated data every X seconds

Reply via email to