The way I understand the documentation, the tuple will not be fully purged
from the system until every bolt in its tuple tree has reported success
back to the spout. That may not hold true for unanchored tuples, but it
should definitely be true for ones you've anchored (i.e. emit(tuple,
someMessageId)).
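
For reference, the anchoring pattern in a bolt looks roughly like this
(untested sketch; package names assume a 0.9.x-era backtype.storm layout):

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    import java.util.Map;

    public class AnchoredPassThroughBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // Passing the input tuple as the first argument anchors the new
            // tuple to it, so the spout only sees an ack once the whole tuple
            // tree (this emit and everything downstream) has been acked.
            collector.emit(input, new Values(input.getValue(0)));
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("value"));
        }
    }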

You probably can do this with Storm; I just wouldn't mess with the current
topology.

On Tue, Jun 2, 2015 at 8:58 AM, Subrat Basnet <sub...@myktm.com> wrote:

>  Thanks Mike.
>
> What if I increase the parallelism of the long-running bolt, so that other
> threads can still go through it when the export is not required?
> Will ONE bolt thread taking a long time still require me to increase the
> message timeout to a very high number?
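>
> What I have in mind is roughly this (untested sketch; the "events" and
> "exporter" component names and the EventSpout/ExportBolt classes are
> placeholders for my own ones):
>
>     import backtype.storm.topology.TopologyBuilder;
>
>     TopologyBuilder builder = new TopologyBuilder();
>     builder.setSpout("events", new EventSpout(), 1);
>     // Extra executors for the slow bolt, so other tuples keep flowing
>     // while one thread is busy with a long-running export.
>     builder.setBolt("exporter", new ExportBolt(), 4)
>            .setNumTasks(4)
>            .shuffleGrouping("events");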
>
> I’ve been contemplating sending the long-running bolt’s task to
> a Gearman job queue asynchronously and having it processed there instead of
> in Storm.
>
>
> --
> Subrat Basnet
> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
>
> On Tuesday, June 2, 2015 at 6:29 PM, Mike Thomsen wrote:
>
> Not that I know of. I don't think Storm has an effective mechanism for
> allowing a particular tuple to take such an unusually long time while not
> affecting the amount of time allotted to the other bolts and tuples. You
> could jack those numbers up very, very high, but that could conceivably
> encourage other bolts to take longer than you want instead of
> failing/replaying a tuple.
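>
> By "those numbers" I mainly mean the topology message timeout. Bumping it
> would look roughly like the untested sketch below (the six-hour value is
> just a placeholder):
>
>     import backtype.storm.Config;
>
>     Config conf = new Config();
>     // Give a tuple tree up to six hours to be fully acked before Storm
>     // considers it failed and replays it from the spout.
>     conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 6 * 60 * 60);
>     // Cap in-flight tuples so one slow tuple cannot back up an unbounded
>     // number of pending ones.
>     conf.setMaxSpoutPending(100);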
>
> If you REALLY want to use Storm for this long processing of a single piece
> of data, you could do something like the following, supposing it's just one
> bolt that does the long processing:
>
> 1. Write the data to disk, hdfs, hbase, RDBMS, etc.
> 2. Write a new topology based on a signal spout (ZooKeeper signals)
> 3. Give your new topology a ridiculously high amount of time for
> processing a single tuple
> 4. Have your current topology use SignalClient to post a ZooKeeper message
> to the new one when the last tuple is ready to be processed (see the sketch
> below)
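>
> Step 4 would look roughly like this untested sketch (the ZooKeeper connect
> string and the "export-signal-spout" name are placeholders, and the exact
> package of the storm-signals SignalClient may differ between versions):
>
>     // Package path assumed from the storm-signals contrib project.
>     import backtype.storm.contrib.signals.client.SignalClient;
>
>     SignalClient client = new SignalClient("zk-host:2181", "export-signal-spout");
>     client.start();
>     try {
>         // The payload is just bytes; the signal spout in the long-running
>         // topology decides what to do with it (e.g. kick off the export).
>         client.send("last-tuple-ready".getBytes());
>     } finally {
>         client.close();
>     }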
>
> On Tue, Jun 2, 2015 at 8:04 AM, Subrat Basnet <sub...@myktm.com> wrote:
>
>  Hi there,
>
> Is it normal to have long-running bolts once in a while? When I say
> long-running, I’m talking about a bolt that takes a few hours to process a tuple.
>
> I need to export data, push notifications, and upload files in this bolt
> when I reach the LAST tuple of a sequence of tuples. This does not happen on
> every tuple.
>
> I am worried that this will make my whole topology hang. My instinct
> is to give this particular bolt a higher parallelism, so that other threads
> are available to process tuples while one thread is tied up.
>
> Please advise on the best way to achieve this.
>
> Thanks!
> Subrat
>
> --
> Subrat Basnet
> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
>
