Hi Adrian,

Zookeeper does not keep any state we are developing as part of our
topologies, that's up to us to do that.

Also, keep in mind that holding the counter in memory is not very scalable
since it assume that all counting all happen within one Bolt in one
machine: what if you get plenty of traffic and decide your topology to
scale horizontally (which is one of the main point of using Storm, IMHO)?
It would become tricky in that case to coordinate all those in-memory
counters happening in parallel in different machines.


If you use Trident, Storm lets you process tuples per batch and re-group
your counting operation for each word sequencialy always in the same place.
Before counting words from a new batch of words, you can first read the
counter state from DB, then do the in-memory counting, then go write to DB
the new counter value after the batch. Storm actually even adds one
supplementary layer of association to optimize even further: it lets us
read plenty of previous counters as one shot in from DB, then lets you do
your counting thing for each one individually, then lets you write one
single write operation that updates plenty of counter values in DB. And
yes, all that without any race condition :D

I wrote a blog post about that a few months ago, and I love to see traffic
to it, so feel free to click like crazy ^__^:

http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/



Svend





On Thu, Dec 12, 2013 at 6:20 PM, Adrian Mocanu <[email protected]>wrote:

>  Hi
>
> I’ve seen several simple aggregation examples on the web. Can’t find one
> that answers my question though.
>
> I’m wondering if Zookeeper saves the states of bolts so if 1 aggregation
> bolt crashes, then when it restarts the worker it will start from previous
> state. I use acks (and might do batch processing as well.)
>
>
>
> For example, let’s say I have to count every minute how many words of the
> same type I find and store them in a db.
>
> My bolt would keep counters for each work and at the end of every minute
> dump the counters it holds in memory to db.
>
>
>
> eg:
>
>    input: The peanut is great. The ocean is great.
>
>    Bolt state after input is processed:
>
>      the:2
>
>     peanut:1
>
>     is:2
>
>     great:2
>
>     ocean:1
>
>
>
> *So if the bolt crashes before it commits to db the counters, does
> Zookeeper save that state? *
>
> If not, then do you have suggestions/links on what the best way to do this
> is?
>
>
>
> Thanks
>
> -Adrian
>
>
>

Reply via email to