Hi,

Which version of Kafka are you using? This should be fixed in 0.10.2.1; any
chance you could try that release?
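
In the meantime, if upgrading takes a little while, the CommitFailedException further down in your log suggests the poll loop occasionally exceeds max.poll.interval.ms, so giving it more headroom usually helps. Here is a rough sketch of the relevant settings; if I remember correctly, consumer configs placed in the Streams properties are passed through to the internal consumers. The application id and broker list below are just placeholders guessed from your log:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

// The properties you already pass to your KafkaStreams instance.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "LIC2-5");            // placeholder, use your real app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder, use your real brokers
// Forwarded to the consumers embedded in Kafka Streams: fewer records per
// poll() and a larger allowed gap between poll() calls before the consumer
// is considered to have left the group and a rebalance is triggered.
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600000);       // 10 minutes

Whether those numbers are sensible depends on how long one batch can take to process, so treat them as a starting point rather than a recommendation.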

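On the "Failed to lock the state directory" warning: as far as I remember, on 0.10.2.0 this can show up transiently during a rebalance when the previous owner of a task has not yet released the state lock, and the thread retries (your log does say "Will retry"); 0.10.2.1 should behave better here. For reference, the path in the warning looks like <state.dir>/<application.id>/<task id>, and the state directory is just another Streams setting. The breakdown below is only my reading of your log path:

import org.apache.kafka.streams.StreamsConfig;

// /data/streampoc/LIC2-5/1_38 would break down as
//   state.dir      = /data/streampoc
//   application.id = LIC2-5
//   task id        = 1_38
// (the default state.dir is /tmp/kafka-streams, so it looks like you set this already)
props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/streampoc");   // added to the same props as above
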
Thanks
Eno
> On 3 May 2017, at 14:04, Sameer Kumar <sam.kum.w...@gmail.com> wrote:
> 
> Hi,
> 
> I ran two nodes in my Streams compute cluster; they were running fine for a few hours before failing with rebalance errors.
> 
> I couldn't understand why this happened, but I saw one strange behaviour: at 16:53 on node1 there was a "Failed to lock the state directory" error, which might have caused the partitions to be reassigned and hence the rebalance failure.
> 
> I am attaching detailed logs for both nodes; please see if you can help.
> 
> Some of the logs are included below for quick reference:
> 
> 2017-05-03 16:53:53 ERROR Kafka10Base:44 - Exception caught in thread StreamThread-2
> org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] Failed to rebalance
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:612)
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] failed to suspend stream tasks
>     at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:488)
>     at org.apache.kafka.streams.processor.internals.StreamThread.access$1200(StreamThread.java:69)
>     at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsRevoked(StreamThread.java:259)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:396)
>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:329)
>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
>     ... 1 more
> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:698)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:577)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1125)
>     at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:296)
>     at org.apache.kafka.streams.processor.internals.StreamThread$3.apply(StreamThread.java:535)
>     at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:503)
>     at org.apache.kafka.streams.processor.internals.StreamThread.commitOffsets(StreamThread.java:531)
>     at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:480)
>     ... 10 more
> 
> 2017-05-03 16:53:57 WARN  StreamThread:1184 - Could not create task 1_38. Will retry.
> org.apache.kafka.streams.errors.LockException: task [1_38] Failed to lock the state directory: /data/streampoc/LIC2-5/1_38
>     at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102)
>     at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
>     at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
>     at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
>     at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
>     at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
>     at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
>     at org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
>     at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
> 
> Regards,
> 
> -Sameer.
> 
> <node2.zip><node1.zip>
