Thanks for the response, but having spent a couple of days on it I still
have no clue what is wrong. I tried tweaking the max spout pending (rough
config snippet below), but that didn't seem to help. I think it is related
to Storm's state management feature. I mostly see the issue when one of the
workers of the stateful topology gets killed and restarts itself. As soon
as that happens, both the checkpoint spout and the Kafka spout stop
emitting, and only the following gets printed in the logs repeatedly -

o.a.s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper:
GlobalPartitionInformation
o.a.s.k.KafkaUtils [INFO] Task [id] assigned
o.a.s.k.ZkCoordinator [INFO] Task [id] Deleted partition managers: []
o.a.s.k.ZkCoordinator [INFO] Task [id] New partition managers: []
o.a.s.k.ZkCoordinator [INFO] Task [id] Finished refreshing

Once the topology hangs, the bolts don't even receive tick tuples, let
alone any data from Kafka.
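
For reference, this is roughly how I was setting it (the value itself is
arbitrary; I tried a few different numbers):

    import org.apache.storm.Config;

    Config conf = new Config();
    // Cap the number of un-acked tuples in flight per spout task
    // (topology.max.spout.pending).
    conf.setMaxSpoutPending(500);

None of the values I tried seemed to make a difference once a worker
restarted.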

I would greatly appreciate any help on this.


On Thu, Feb 23, 2017 at 3:43 AM, Erik Weathers <eweath...@groupon.com>
wrote:

> I've seen issues like this with storm-kafka v0.9.  A root cause in one
> case was messages in Kafka being larger than the max size assumed by the
> consumers.  You could try increasing the appropriate setting in the kafka
> spout configuration:
>
> https://github.com/apache/storm/blob/v0.9.6/external/storm-kafka/src/jvm/storm/kafka/KafkaConfig.java#L34
>
> By default it is 1MiB; you could try increasing it.  Of course, this
> should only be possible if your Kafka brokers have an increased maximum
> message size, since the default for that is 1MiB too (notably, increasing
> this limit is *not* recommended, but was unfortunately done in our Kafka
> clusters to appease some unconventional use case).
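>
> Concretely, something along these lines (a rough, untested sketch; the
> class names below are for storm-kafka 1.x since you're on 1.0.2, and the
> topic / ZooKeeper details are placeholders):
>
>     import org.apache.storm.kafka.BrokerHosts;
>     import org.apache.storm.kafka.SpoutConfig;
>     import org.apache.storm.kafka.ZkHosts;
>
>     BrokerHosts hosts = new ZkHosts("zkhost:2181");
>     SpoutConfig spoutConfig =
>         new SpoutConfig(hosts, "your-topic", "/kafka-spout", "your-consumer-id");
>     // Raise the fetch size (and the socket buffer) above the 1 MiB default
>     // so a single oversized message can't stall the consumer.
>     spoutConfig.fetchSizeBytes = 4 * 1024 * 1024;
>     spoutConfig.bufferSizeBytes = 4 * 1024 * 1024;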
>
> The other cases where I've seen that kind of problem are when you have a
> custom Scheme that doesn't handle some unexpected message format
> correctly, so the spout ends up repeatedly fetching the same offset.
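>
> If you do have a custom Scheme, a defensive deserialize along these lines
> is one way to rule that out (illustrative only; in storm-kafka 1.x the
> method takes a ByteBuffer, and if I remember the PartitionManager behavior
> right, returning null makes the spout skip the message instead of looping
> on it):
>
>     import org.apache.storm.spout.Scheme;
>     import org.apache.storm.tuple.Fields;
>     import org.apache.storm.tuple.Values;
>     import java.nio.ByteBuffer;
>     import java.nio.charset.StandardCharsets;
>     import java.util.List;
>
>     public class SafeStringScheme implements Scheme {
>         @Override
>         public List<Object> deserialize(ByteBuffer buf) {
>             try {
>                 // ... whatever parsing your Scheme does, e.g. JSON decoding ...
>                 return new Values(StandardCharsets.UTF_8.decode(buf).toString());
>             } catch (Exception e) {
>                 // Drop the malformed message rather than throwing from here.
>                 return null;
>             }
>         }
>
>         @Override
>         public Fields getOutputFields() {
>             return new Fields("str");
>         }
>     }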
>
> Do you know if the spout is continuously fetching an offset, or if it is
> *literally* stuck?  If you have monitoring of the topic's consumption on
> Kafka then it should be obvious.  If you do not, you could use Wireshark's
> command-line tool, tshark, to sniff Kafka requests and see if the same
> offset is being requested.
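>
> For example (hypothetical command; it assumes the brokers listen on port
> 9092 and that your Wireshark build includes the Kafka dissector):
>
>     # Decode broker traffic as Kafka and dump the request details;
>     # if the same offset shows up over and over, the spout is looping.
>     tshark -i eth0 -f "tcp port 9092" -d tcp.port==9092,kafka -Y kafka -V | grep -i offset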
>
> - Erik
>
> On Wed, Feb 22, 2017 at 1:41 PM Abhishek Raj <abhishek....@saavn.com>
> wrote:
>
>> Thanks for the quick response. I am using Storm 1.0.2, and the storm-kafka
>> version is 1.0.2 as well. The Kafka version being used is 0.8.1.1 (the
>> Scala 2.9.2 build).
>>
>>
>> On Thu, Feb 23, 2017 at 3:04 AM, P. Taylor Goetz <ptgo...@gmail.com>
>> wrote:
>>
>> What version of Storm are you using? And which Kafka spout (i.e.
>> storm-kafka or storm-kafka-client)?
>>
>> -Taylor
>>
>> On Feb 22, 2017, at 4:32 PM, Abhishek Raj <abhishek....@saavn.com> wrote:
>>
>> Hello, I am using Storm's state management feature in a topology.
>> The topology has a KafkaSpout and a StatefulBolt that uses
>> RedisKeyValueState. What I observe is that after some time of running
>> smoothly, the spout just stops consuming from the Kafka topic and
>> the $checkpointspout stops emitting any checkpoint tuples. The topology
>> just hangs, and there are no error messages in the logs. The spout acts as
>> if there are no more messages to consume even though there are. This
>> happens at random, and if I restart the topology the problem may or may
>> not recur.
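>>
>> For context, the wiring is roughly like this (simplified, with names and
>> hosts changed; MyStatefulBolt is my bolt extending BaseStatefulBolt over a
>> KeyValueState):
>>
>>     import org.apache.storm.Config;
>>     import org.apache.storm.StormSubmitter;
>>     import org.apache.storm.kafka.KafkaSpout;
>>     import org.apache.storm.kafka.SpoutConfig;
>>     import org.apache.storm.kafka.ZkHosts;
>>     import org.apache.storm.topology.TopologyBuilder;
>>
>>     SpoutConfig spoutConfig = new SpoutConfig(
>>         new ZkHosts("zkhost:2181"), "events", "/kafka-spout", "events-spout");
>>
>>     TopologyBuilder builder = new TopologyBuilder();
>>     builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 2);
>>     builder.setBolt("stateful-bolt", new MyStatefulBolt(), 2)
>>            .shuffleGrouping("kafka-spout");
>>
>>     Config conf = new Config();
>>     // Back the bolt's KeyValueState with Redis.
>>     conf.put(Config.TOPOLOGY_STATE_PROVIDER,
>>              "org.apache.storm.redis.state.RedisKeyValueStateProvider");
>>     StormSubmitter.submitTopology("stateful-topology", conf, builder.createTopology());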
>>
>> Can anyone please help in debugging this? I searched JIRA but couldn't
>> find a bug related to this issue.
>>
>> Thanks,
>>
>> --
>> Abhishek
>>
>>
>>
>>
>>
>> --
>> Abhishek
>>
>


-- 
Abhishek
