Yes, I have upgraded both my cluster and clients to version 0.10.2.1 and am currently monitoring the situation. I will report back if I find any errors. Thanks for the help.
-Sameer.

On Fri, May 5, 2017 at 3:37 AM, Matthias J. Sax <matth...@confluent.io> wrote:
> Did you see Eno's reply?
>
> Please try out Streams 0.10.2.1 -- this should be fixed there. If not,
> please report back.
>
> I would also recommend subscribing to the list. It's self-service:
> http://kafka.apache.org/contact
>
> -Matthias
>
> On 5/3/17 10:49 PM, Sameer Kumar wrote:
> > My brokers are on version 10.1.0 and my clients are on version 10.2.0.
> > Also, please reply to all; I am currently not subscribed to the list.
> >
> > -Sameer.
> >
> > On Wed, May 3, 2017 at 6:34 PM, Sameer Kumar <sam.kum.w...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I ran two nodes in my Streams compute cluster; they were running fine for a
> >> few hours before failing with "Failed to rebalance" errors.
> >>
> >> I couldn't understand why this happened, but I saw one strange behaviour:
> >> at 16:53 on node1 I saw a "Failed to lock the state directory" error, which
> >> might have caused the partitions to relocate and hence the rebalance error.
> >>
> >> I am attaching detailed logs for both nodes; please see if you can help.
> >>
> >> Some of the logs for quick reference:
> >>
> >> 2017-05-03 16:53:53 ERROR Kafka10Base:44 - Exception caught in thread StreamThread-2
> >> org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] Failed to rebalance
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:612)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> >> Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-2] failed to suspend stream tasks
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:488)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.access$1200(StreamThread.java:69)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsRevoked(StreamThread.java:259)
> >>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:396)
> >>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:329)
> >>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
> >>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
> >>     at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
> >>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
> >>     ... 1 more
> >> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and
> >> assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured
> >> max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by
> >> increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> >>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:698)
> >>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:577)
> >>     at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1125)
> >>     at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:296)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread$3.apply(StreamThread.java:535)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:503)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.commitOffsets(StreamThread.java:531)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:480)
> >>     ... 10 more
> >>
> >> 2017-05-03 16:53:57 WARN StreamThread:1184 - Could not create task 1_38. Will retry.
> >> org.apache.kafka.streams.errors.LockException: task [1_38] Failed to lock the state directory: /data/streampoc/LIC2-5/1_38
> >>     at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102)
> >>     at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
> >>     at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
> >>     at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
> >>
> >> Regards,
> >>
> >> -Sameer.
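
P.S. In case it helps anyone who finds this thread before upgrading: the CommitFailedException above already names the two knobs, max.poll.interval.ms and max.poll.records. Below is a minimal sketch of how those consumer settings can be passed through the Streams configuration; the application id, bootstrap server and the concrete values are illustrative placeholders only, not settings from our deployment.

    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class PollTuningExample {

        public static Properties streamsConfig() {
            Properties props = new Properties();
            // Placeholder identifiers, not taken from this thread.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

            // Allow more time between successive poll() calls before the
            // consumer is considered failed and the group rebalances.
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600000);

            // And/or hand back fewer records per poll() so each batch is
            // processed well within that interval.
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);

            return props;
        }
    }

As far as I understand, Streams forwards consumer-level properties like these to its internal consumers, so nothing more should be needed; in our case, though, the recommendation above was to upgrade to 0.10.2.1, which I am trying first.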