Hi,

I'm facing a strange issue in my Kafka cluster. Could anybody please help me
with it? The issue is as follows:

We have a 3-node Kafka cluster. We installed ZooKeeper separately and pointed
the brokers to it. The ZooKeeper ensemble is also 3 nodes, but for our POC
setup the ZooKeeper nodes run on the same machines as the Kafka brokers.
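
For reference, each broker's server.properties looks roughly like this (the
hostnames below are placeholders for our actual machines; log.dirs is the
default):

    # server.properties on broker 1 (brokers 2 and 3 differ only in broker.id)
    broker.id=1
    log.dirs=/tmp/kafka-logs
    # 3-node ZooKeeper ensemble, colocated with the brokers
    zookeeper.connect=node1:2181,node2:2181,node3:2181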

While consuming messages from an existing topic using a new groupId, two of
the brokers crashed with the same FATAL error (a sketch of the consumer
invocation follows the logs below):

--------------------------------------------------------
<<<<<<<<<<<<<---- [server 2 logs] ---->>>>>>>>>>>>>>>

[2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment received from leader for group pocTestNew11 for generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting due to unrecoverable I/O error while handling produce request:  (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at scala.Option.foreach(Option.scala:257)
        at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
        at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)

----------------------------------------------
<<<<<<<<<<<<<---- [server 3 logs] ---->>>>>>>>>>>>>>>

[2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data. (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:138)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:122)
        at scala.Option.foreach(Option.scala:257)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:122)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)
        ... 19 more
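
For context, the consumer side uses the new consumer with a fresh group id. A
consumer like ours can be started against an existing topic roughly as in the
sketch below (topic name and broker address are placeholders; the group id is
passed via a config file since that works on both 0.9 and 0.10):

    # consumer.properties
    group.id=pocTestNew11

    bin/kafka-console-consumer.sh --new-consumer \
        --bootstrap-server node1:9092 \
        --consumer.config consumer.properties \
        --topic pocTopic --from-beginning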



For the "__consumer_offsets" topic, which is used to commit consumer offsets,
the default number of partitions is 50 and the replication factor is 3.
So ideally all three brokers should have logs for every partition of
"__consumer_offsets".
I checked the "/tmp/kafka-logs" directory on each server, and except for
broker 1, the other two brokers (servers 2 and 3) do not contain replicas for
all the "__consumer_offsets" partitions. Log directories are missing for many
"__consumer_offsets" partitions on brokers 2 and 3, including partition 4,
which resulted in the crash above.
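
(I verified this with something like the following on each broker, using the
default log.dirs path:)

    # count the __consumer_offsets partition directories present locally
    ls -d /tmp/kafka-logs/__consumer_offsets-* | wc -l
    # broker 1 has all 50; brokers 2 and 3 are missing many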

What could be the cause of this crash? Is there any broker misconfiguration
that could lead to this?

Regards,
Rahul Misra

Technical Lead
Altisource(tm)
Mobile: 9886141541 | Ext: 298269
rahul.mi...@altisource.com | www.Altisource.com
