Hi,
I'm facing a strange issue in my Kafka cluster. Could anybody please help me
with it? The issue is as follows:
We have a 3-node Kafka cluster. We installed ZooKeeper separately and have
pointed the brokers to it. The ZooKeeper ensemble also has 3 nodes, but for our
POC setup the ZooKeeper nodes are on the same machines as the Kafka brokers.
While receiving messages from an existing topic using a new groupId, 2 of the
brokers crashed with the same FATAL errors:
--------------------------------------------------------
<<<<<<<<<<<<<---- [server 2 logs] ---->>>>>>>>>>>>>>>
[2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment received from leader for group pocTestNew11 for generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
    at kafka.log.Log.append(Log.scala:318)
    at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
    at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
    at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
    at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
    at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
    at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
    at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
    at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
    at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
    at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
    at scala.Option.foreach(Option.scala:257)
    at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
    at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
    at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
    at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
    at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
    at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
    at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
    at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
    at kafka.log.Log.roll(Log.scala:627)
    at kafka.log.Log.maybeRoll(Log.scala:602)
    at kafka.log.Log.append(Log.scala:357)
----------------------------------------------
<<<<<<<<<<<<<---- [server 3 logs] ---->>>>>>>>>>>>>>>
[2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data. (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
    at kafka.log.Log.append(Log.scala:318)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:138)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:122)
    at scala.Option.foreach(Option.scala:257)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:122)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
    at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
    at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
    at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
    at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
    at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
    at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
    at kafka.log.Log.roll(Log.scala:627)
    at kafka.log.Log.maybeRoll(Log.scala:602)
    at kafka.log.Log.append(Log.scala:357)
    ... 19 more
For the topic "__consumer_offsets", which is used to commit consumer offsets,
the default number of partitions is 50 and the replication factor is 3.
So ideally all 3 brokers should have logs for every partition of
"__consumer_offsets".
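For context, here is a rough sketch of how a group id is, to my understanding, mapped to one of the 50 "__consumer_offsets" partitions: the absolute value of the Java String hash code of the group id, modulo the partition count. That formula is my assumption about the broker internals and is not taken from the logs above; the function names are hypothetical and chosen only for illustration.

```python
def java_string_hash(s):
    """Emulate Java's String.hashCode(): h = 31*h + unit over UTF-16 code units,
    wrapped to a signed 32-bit integer."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to Java's signed int range.
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id, partitions=50):
    """Assumed mapping: abs(hash(group_id)) % partitions, with
    Integer.MIN_VALUE treated as 0 because it has no positive counterpart."""
    h = java_string_hash(group_id)
    positive = 0 if h == -2**31 else abs(h)
    return positive % partitions
```

If the assumption holds, calling offsets_partition_for("pocTestNew11") would show which partition the offset commits of the new group are written to, which can then be compared with the partition named in the exception.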
I checked the "/tmp/kafka-logs" directory on each server. Except for broker 1,
the other 2 brokers (servers 2 and 3) do not contain replicas for all the
partitions of "__consumer_offsets". Log directories are missing for many
"__consumer_offsets" partitions on brokers 2 and 3 (including partition 4,
which resulted in the above crash).
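The manual check described above could be scripted roughly as follows. This is only a sketch: the function name is hypothetical, and it assumes each partition lives in a directory named "<topic>-<partition>" directly under the broker's log directory, as the paths in the stack traces suggest.

```python
import os


def missing_offset_partitions(log_dir, partitions=50, topic="__consumer_offsets"):
    """Return the partition numbers that have no '<topic>-<n>' directory in log_dir."""
    return [
        p
        for p in range(partitions)
        if not os.path.isdir(os.path.join(log_dir, "%s-%d" % (topic, p)))
    ]
```

Running missing_offset_partitions("/tmp/kafka-logs") on each broker gives a list that can be compared across the three machines; with a replication factor of 3 on 3 brokers, an empty list would be expected everywhere.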
What could be the cause of this crash? Is there any misconfiguration of the
broker that can cause this?
Regards,
Rahul Misra
Technical Lead
Altisource(tm)
Mobile: 9886141541 | Ext: 298269