Hi, I have a large, stateful Kafka Streams application that is stuck in a never-ending rebalance loop. Task restoration takes a long time (roughly 30-45 minutes), and once it finishes I see the error below. The tasks are then suspended, the instance rejoins the group, and a new rebalance is triggered. Any ideas on how to fix this?
WARN org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [inventory-streams-green-0-StreamThread-1] Detected that the thread is being fenced. This implies that this thread missed a rebalance and dropped out of the consumer group. Will close out all assigned tasks and rejoin the consumer group.
org.apache.kafka.streams.errors.TaskMigratedException: Consumer committing offsets failed, indicating the corresponding thread is no longer part of the group; it means all tasks belonging to this thread should be migrated.
    at org.apache.kafka.streams.processor.internals.TaskManager.commitOffsetsOrTransaction(TaskManager.java:1141) ~[app.jar:?]
    at org.apache.kafka.streams.processor.internals.TaskManager.handleRevocation(TaskManager.java:541) ~[app.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsRevoked(StreamsRebalanceListener.java:95) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsRevoked(ConsumerCoordinator.java:312) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:408) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:449) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:365) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1261) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) ~[app.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:925) ~[app.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.pollPhase(StreamThread.java:885) ~[app.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:720) [app.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:583) [app.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:556) [app.jar:?]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1139) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1004) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1490) ~[app.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1438) ~[app.jar:?]
    at org.apache.kafka.streams.processor.internals.TaskManager.commitOffsetsOrTransaction(TaskManager.java:1139) ~[app.jar:?]
    ... 15 more