Found it. SELinux was running on the box but not showing any events in the 
audit log; the kernel log, however, is another story entirely:

INFO: task java:4693 blocked for more than 120 seconds.
      Not tainted 2.6.32-504.8.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D 0000000000000007     0  4693   7803 0x00000080
 ffff880c3094bd18 0000000000000082 0000000000000000 000000009378a289
 ffff880c3094bc78 ffffffff81236b72 0001920ae9298ce6 ffffffff8122e6af
 ffff880c3094bd68 000000011a4e9ba0 ffff880c330b9098 ffff880c3094bfd8
Call Trace:
 [<ffffffff81236b72>] ? selinux_inode_permission+0x72/0xb0
 [<ffffffff8122e6af>] ? security_inode_permission+0x1f/0x30
 [<ffffffff8152b366>] __mutex_lock_slowpath+0x96/0x210
 [<ffffffff8152ae8b>] mutex_lock+0x2b/0x50
 [<ffffffff811a0cd6>] do_filp_open+0x2d6/0xd20
 [<ffffffff81298fba>] ? strncpy_from_user+0x4a/0x90
 [<ffffffff811adfb2>] ? alloc_fd+0x92/0x160
 [<ffffffff8118ae17>] do_sys_open+0x67/0x130
 [<ffffffff8118af20>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

So it wasn't the array or Kafka, but rather an interaction between SELinux and 
XFS (at least, that's my initial thought). SELinux is now off, and we'll likely 
file a bug against RHEL/CentOS after more investigation.
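
For anyone hitting the same wall, this is roughly how we confirmed SELinux was 
the variable (a minimal sketch; note that setenforce only lasts until reboot):

# Check whether SELinux is currently enforcing
getenforce

# Look for hung-task reports like the one above in the kernel log
dmesg | grep -A 5 "blocked for more than"

# Temporarily switch SELinux to permissive mode (does not survive reboot);
# to disable it permanently, set SELINUX=disabled in /etc/selinux/config
# and reboot.
setenforce 0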

Thanks!
-Jeff

-----Original Message-----
From: Jeff Field 
Sent: Thursday, May 14, 2015 11:17 AM
To: 'users@kafka.apache.org'
Subject: RE: OutOfMemory error on broker when rolling logs

Replying to Jay's message, though a mailing list snafu meant I could only see 
it through the archive. Apologies if this breaks threading.

I've confirmed that the Kafka process itself is running 64-bit; the details are 
included below. At this point I'm thinking it could be ulimit, so we are 
running the tests again and keeping a close eye on that.
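
This is roughly how we're watching the limits (a sketch; the pgrep pattern 
against the kafka.Kafka main class is an assumption, and may differ per setup):

# Limits actually applied to the running broker (more reliable than
# `ulimit -a`, which only reflects the current shell)
cat /proc/$(pgrep -f kafka.Kafka)/limits

# Current number of open file descriptors held by the broker
ls /proc/$(pgrep -f kafka.Kafka)/fd | wc -l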

Thanks,
-Jeff

64-bit confirmation:
[root@kafka03 bin]# ./jinfo -sysprops 3837 | grep sun.arch.data.model
Attaching to process ID 3837, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04
sun.arch.data.model = 64

kafka02:
Attaching to process ID 8632, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04
sun.arch.data.model = 64

kafka01:
Attaching to process ID 4611, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04
sun.arch.data.model = 64

java -version
Kafka 03:
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Kafka 02:
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Kafka 01:
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Facter (OS architecture/kernel):
Kafka 03:
architecture => x86_64
hardwareisa => x86_64
hardwaremodel => x86_64
kernelrelease => 2.6.32-504.8.1.el6.x86_64

Kafka 02:
architecture => x86_64
hardwareisa => x86_64
hardwaremodel => x86_64
kernelrelease => 2.6.32-504.8.1.el6.x86_64

Kafka 01:
architecture => x86_64
hardwareisa => x86_64
hardwaremodel => x86_64
kernelrelease => 2.6.32-504.8.1.el6.x86_64

On Wed, May 13, 2015 at 5:00 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

I think "java.lang.OutOfMemoryError: Map failed" has usually been "out of 
address space for mmap" if memory serves.

If you sum the lengths of all .index files while the service is running (not 
after it's stopped), do they come to something really close to 2GB? If so, it 
is likely either that the OS/arch is 32-bit (which on Slack you said it
wasn't) or possibly that the JVM is in 32-bit mode.
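
Something like this would do the sum (a sketch; /data/kafka is a placeholder 
for wherever your broker's log.dirs points):

# Total size, in bytes, of all live .index files under the Kafka data dir
find /data/kafka -name '*.index' -printf '%s\n' \
  | awk '{ total += $1 } END { print total }'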

If you want to debug, the easiest test would be a simple program that does 
something like:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Wrapped in a class (named MmapTest here) so it compiles.
public class MmapTest {
  public static void main(String[] args) throws Exception {
    // A single map() call is capped at Integer.MAX_VALUE bytes (~2GB),
    // and 2*1024*1024*1024 overflows int anyway, so use this as the size.
    long size = Integer.MAX_VALUE;
    RandomAccessFile raf1 = new RandomAccessFile("test-file-1", "rw");
    RandomAccessFile raf2 = new RandomAccessFile("test-file-2", "rw");
    raf1.setLength(size);
    raf2.setLength(size);
    // Two ~2GB mappings: enough to exhaust a 32-bit address space.
    MappedByteBuffer b1 = raf1.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
    MappedByteBuffer b2 = raf2.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
  }
}

If you compile this and run it with the same options you're running Kafka 
with, it should succeed; if it fails with the same error, that is the 32-bit 
address-space limit kicking in.
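
For example, with the broker flags from the original message (assuming the 
class above was saved as MmapTest.java):

javac MmapTest.java
java -Xmx4G -Xms4G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark \
  -XX:+DisableExplicitGC MmapTest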

-Jay


From: Jeff Field
Sent: Wednesday, May 13, 2015 4:25 PM
To: users@kafka.apache.org
Subject: OutOfMemory error on broker when rolling logs

Hello,
We are doing a Kafka POC on our CDH cluster. We are running 3 brokers, each 
with 24TB (48TB raw) of available RAID10 storage on an HP Smart Array P420i 
controller (latest firmware), an XFS filesystem mounted with nobarrier/largeio, 
and 48GB of RAM. The broker is running with "-Xmx4G -Xms4G -server 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled 
-XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC". This is on RHEL 6.6 with 
the 2.6.32-504.8.1.el6.x86_64 kernel. JDK is jdk1.7.0_67 64-bit. We were using 
the 1.2.0 version of the Cloudera Kafka 0.8.2.0 build. We are upgrading to 
1.3.0 after the RAID testing, but none of the fixes they included in 1.3.0 
seem to be related to what we're seeing.
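
For reference, the data volume mount looks roughly like this (the device and 
mountpoint are placeholders for our actual layout):

# /etc/fstab entry for the Kafka data volume
/dev/sdb1  /data/kafka  xfs  nobarrier,largeio  0  0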

We are using a custom producer to push copies of real messages from our 
existing messaging system onto Kafka in order to test ingestion rates and 
compression ratios. After a couple of hours (during which we push about 4.3 
billion messages, ~2.2 terabytes before replication), one of our brokers will 
fail with an I/O error (2 slightly different ones so far) followed by a memory 
error. We're currently doing stress testing on the arrays (write/verify with 
IOzone set for 24 threads), but assuming those tests don't turn up any I/O 
problems, what could cause this? Errors are included below.

Thanks,
-Jeff

Occurrence 1:
2015-05-12 03:55:08,291 FATAL kafka.server.KafkaApis: [KafkaApi-834] Halting due to unrecoverable I/O error while handling produce request:
kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-1'
        at kafka.log.Log.append(Log.scala:266)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at kafka.utils.Utils$.inReadLock(Utils.scala:567)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:282)
        at kafka.server.KafkaApis.handleProducerOrOffsetCommitRequest(KafkaApis.scala:204)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
        at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
        at kafka.log.LogSegment.<init>(LogSegment.scala:57)
        at kafka.log.Log.roll(Log.scala:565)
        at kafka.log.Log.maybeRoll(Log.scala:539)
        at kafka.log.Log.append(Log.scala:306)
        ... 21 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
        ... 26 more

Occurrence 2:
2015-05-12 20:08:15,052 FATAL kafka.server.KafkaApis: [KafkaApi-835] Halting due to unrecoverable I/O error while handling produce request:
kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-23'
        at kafka.log.Log.append(Log.scala:266)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
        at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at kafka.utils.Utils$.inReadLock(Utils.scala:567)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
        at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:282)
        at kafka.server.KafkaApis.handleProducerOrOffsetCommitRequest(KafkaApis.scala:204)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
        at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
        at kafka.log.LogSegment.<init>(LogSegment.scala:57)
        at kafka.log.Log.roll(Log.scala:565)
        at kafka.log.Log.maybeRoll(Log.scala:539)
        at kafka.log.Log.append(Log.scala:306)
        ... 21 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
        ... 26 more

Occurrence 3:
2015-05-13 01:11:14,626 FATAL kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-835], Disk error while replicating data.
kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-17'
        at kafka.log.Log.append(Log.scala:266)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:54)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
        at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
        at kafka.log.LogSegment.<init>(LogSegment.scala:57)
        at kafka.log.Log.roll(Log.scala:565)
        at kafka.log.Log.maybeRoll(Log.scala:539)
        at kafka.log.Log.append(Log.scala:306)
        ... 13 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
        ... 18 more

