Hello,
We are doing a Kafka POC on our CDH cluster. We are running 3 brokers with 24TB 
(48TB raw) of available RAID10 storage (XFS filesystem mounted with 
nobarrier/largeio, on an HP Smart Array P420i controller with the latest 
firmware) and 48GB of RAM. The broker is running with "-Xmx4G -Xms4G -server 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled 
-XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC". This is on RHEL 6.6 with 
the 2.6.32-504.8.1.el6.x86_64 kernel. JDK is jdk1.7.0_67 64-bit. We were using 
the 1.2.0 version of the Cloudera Kafka 0.8.2.0 build. We are upgrading to 
1.3.0 after the RAID testing, but none of the fixes they included in 1.3.0 seem 
to be related to what we're seeing.

We are using a custom producer to push copies of real messages from our 
existing messaging system onto Kafka in order to test ingestion rates and 
compression ratios. After a couple of hours (during which we push about 4.3 
billion messages, ~2.2 terabytes before replication), one of our brokers fails 
with an I/O error (two slightly different ones so far) followed by a memory 
error (java.lang.OutOfMemoryError: Map failed). We're currently stress testing 
the arrays (write/verify with IOzone set for 24 threads), but assuming the 
tests don't find anything wrong with I/O, what could cause this? Errors are 
included below.

Thanks,
-Jeff

Occurrence 1:
2015-05-12 03:55:08,291 FATAL kafka.server.KafkaApis: [KafkaApi-834] Halting due to unrecoverable I/O error while handling produce request:
kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-1'
        at kafka.log.Log.append(Log.scala:266)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at kafka.utils.Utils$.inReadLock(Utils.scala:567)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:282)
        at kafka.server.KafkaApis.handleProducerOrOffsetCommitRequest(KafkaApis.scala:204)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
        at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
        at kafka.log.LogSegment.<init>(LogSegment.scala:57)
        at kafka.log.Log.roll(Log.scala:565)
        at kafka.log.Log.maybeRoll(Log.scala:539)
        at kafka.log.Log.append(Log.scala:306)
        ... 21 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
        ... 26 more

Occurrence 2:
2015-05-12 20:08:15,052 FATAL kafka.server.KafkaApis: [KafkaApi-835] Halting due to unrecoverable I/O error while handling produce request:
kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-23'
        at kafka.log.Log.append(Log.scala:266)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at kafka.utils.Utils$.inReadLock(Utils.scala:567)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:282)
        at kafka.server.KafkaApis.handleProducerOrOffsetCommitRequest(KafkaApis.scala:204)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
        at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
        at kafka.log.LogSegment.<init>(LogSegment.scala:57)
        at kafka.log.Log.roll(Log.scala:565)
        at kafka.log.Log.maybeRoll(Log.scala:539)
        at kafka.log.Log.append(Log.scala:306)
        ... 21 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
        ... 26 more

Occurrence 3:
2015-05-13 01:11:14,626 FATAL kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-835], Disk error while replicating data.
kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-17'
        at kafka.log.Log.append(Log.scala:266)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:54)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
        at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
        at kafka.log.LogSegment.<init>(LogSegment.scala:57)
        at kafka.log.Log.roll(Log.scala:565)
        at kafka.log.Log.maybeRoll(Log.scala:539)
        at kafka.log.Log.append(Log.scala:306)
        ... 13 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
        ... 18 more
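
All three traces bottom out in FileChannelImpl.map0 while Log.roll creates the OffsetIndex for a new segment, so the failing call is a memory map rather than a heap allocation. One thing that may be worth ruling out alongside the disk tests is the kernel's per-process mapping limit (vm.max_map_count, 65530 by default on Linux). The sketch below is only a diagnostic aid under that assumption, not a confirmed cause: it counts the lines in /proc/<pid>/maps for a process and compares the count with the limit. The function names are hypothetical and chosen for illustration.

```python
from pathlib import Path


def count_mappings(maps_text: str) -> int:
    """Count memory mappings: /proc/<pid>/maps has one non-empty line per mapping."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(mapping_count: int, max_map_count: int) -> float:
    """Return the fraction of the per-process mapping limit that is still unused."""
    if max_map_count <= 0:
        raise ValueError("max_map_count must be positive")
    return 1.0 - mapping_count / max_map_count


def check_pid(pid: int) -> str:
    """Read the Linux /proc files for one process and summarise its mmap usage.

    Assumes a Linux host and permission to read /proc/<pid>/maps.
    """
    maps_text = Path(f"/proc/{pid}/maps").read_text()
    limit = int(Path("/proc/sys/vm/max_map_count").read_text())
    used = count_mappings(maps_text)
    headroom = mapping_headroom(used, limit)
    return f"pid {pid}: {used} of {limit} mappings used ({headroom:.1%} headroom)"
```

Calling check_pid with the broker's PID periodically during the load test would show whether the mapping count climbs toward the limit as segments roll.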

