Hi,

Which version of Kafka are you using? This should be fixed in 0.10.2.1, any chance you could try that release?
Thanks
Eno

> On 3 May 2017, at 14:04, Sameer Kumar <sam.kum.w...@gmail.com> wrote:
>
> Hi,
>
> I ran two nodes in my streams compute cluster; they were running fine for a few hours before failing with "Failed to rebalance" errors.
>
> I couldn't understand why this happened, but I saw one strange behaviour: at 16:53 on node1, I saw a "Failed to lock the state directory" error. This might have caused the partitions to relocate, and hence the error.
>
> I am attaching detailed logs for both nodes, please see if you can help.
>
> Some of the logs for quick reference are these.
>
> 2017-05-03 16:53:53 ERROR Kafka10Base:44 - Exception caught in thread StreamThread-2
> org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] Failed to rebalance
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:612)
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] failed to suspend stream tasks
>     at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:488)
>     at org.apache.kafka.streams.processor.internals.StreamThread.access$1200(StreamThread.java:69)
>     at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsRevoked(StreamThread.java:259)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:396)
>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:329)
>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
>     ... 1 more
> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:698)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:577)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1125)
>     at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:296)
>     at org.apache.kafka.streams.processor.internals.StreamThread$3.apply(StreamThread.java:535)
>     at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:503)
>     at org.apache.kafka.streams.processor.internals.StreamThread.commitOffsets(StreamThread.java:531)
>     at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:480)
>     ... 10 more
>
> 2017-05-03 16:53:57 WARN StreamThread:1184 - Could not create task 1_38. Will retry.
> org.apache.kafka.streams.errors.LockException: task [1_38] Failed to lock the state directory: /data/streampoc/LIC2-5/1_38
>     at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102)
>     at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
>     at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
>     at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
>     at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
>     at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
>     at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
>     at org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
>     at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
>
> Regards,
> -Sameer.
>
> <node2.zip><node1.zip>
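In case it helps while trying 0.10.2.1: the CommitFailedException message above names the two knobs involved. As a minimal sketch (the class name, application id, broker address, and the specific values are all illustrative, not recommendations), you could give the poll loop more headroom by raising max.poll.interval.ms and shrinking per-poll batches with max.poll.records:

```java
import java.util.Properties;

public class StreamsTuning {
    // Hypothetical helper: builds a config with the two consumer-level
    // settings mentioned in the CommitFailedException message.
    public static Properties tunedConfig() {
        Properties props = new Properties();
        // Placeholders -- substitute your real app id and brokers.
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Allow more time between successive poll() calls
        // (default in 0.10.x consumers is 300000 ms = 5 minutes).
        props.put("max.poll.interval.ms", "600000");
        // Fetch fewer records per poll() so each batch finishes faster
        // (default is 500).
        props.put("max.poll.records", "100");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(tunedConfig());
    }
}
```

These only paper over slow processing, of course; if the state-directory lock issue is the bug fixed in 0.10.2.1, upgrading is the real fix.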