Thanks. I managed to get a CPU dump from staging.

The output:
THREAD START (obj=50000427, id = 200004, name="RMI TCP Accept-0", group="system")
THREAD START (obj=50000427, id = 200001, name="main", group="main")
THREAD START (obj=50000427, id = 200005, name="SensorExpiryThread", group="main")
THREAD START (obj=500008e6, id = 200006, name="ThrottledRequestReaper-Fetch", group="main")
THREAD START (obj=500008e6, id = 200007, name="ThrottledRequestReaper-Produce", group="main")
THREAD START (obj=50000914, id = 200008, name="ZkClient-EventThread-18-zookeeper:2181", group="main")
THREAD START (obj=500008e6, id = 200009, name="main-SendThread()", group="main")
THREAD START (obj=50000950, id = 200010, name="main-EventThread", group="main")
THREAD START (obj=50000427, id = 200011, name="pool-3-thread-1", group="main")
THREAD END (id = 200011)
THREAD START (obj=50000427, id = 200012, name="metrics-meter-tick-thread-1", group="main")
THREAD START (obj=50000427, id = 200014, name="kafka-scheduler-0", group="main")
THREAD START (obj=50000427, id = 200013, name="kafka-scheduler-1", group="main")
THREAD START (obj=50000427, id = 200015, name="kafka-scheduler-2", group="main")
THREAD START (obj=50000c33, id = 200016, name="kafka-log-cleaner-thread-0", group="main")
THREAD START (obj=50000427, id = 200017, name="kafka-network-thread-2-PLAINTEXT-0", group="main")
THREAD START (obj=50000427, id = 200018, name="kafka-network-thread-2-PLAINTEXT-1", group="main")
THREAD START (obj=50000427, id = 200019, name="kafka-network-thread-2-PLAINTEXT-2", group="main")
THREAD START (obj=50000427, id = 200020, name="kafka-socket-acceptor-PLAINTEXT-9092", group="main")
THREAD START (obj=500008e6, id = 200021, name="ExpirationReaper-2", group="main")
THREAD START (obj=500008e6, id = 200022, name="ExpirationReaper-2", group="main")
THREAD START (obj=50000427, id = 200023, name="metrics-meter-tick-thread-2", group="main")
THREAD START (obj=50000427, id = 200024, name="kafka-scheduler-3", group="main")
THREAD START (obj=50000427, id = 200025, name="kafka-scheduler-4", group="main")
THREAD START (obj=50000427, id = 200026, name="kafka-scheduler-5", group="main")
THREAD START (obj=50000427, id = 200027, name="kafka-scheduler-6", group="main")
THREAD START (obj=500008e6, id = 200028, name="ExpirationReaper-2", group="main")
THREAD START (obj=500008e6, id = 200029, name="ExpirationReaper-2", group="main")
THREAD START (obj=500008e6, id = 200030, name="ExpirationReaper-2", group="main")
THREAD START (obj=50000427, id = 200031, name="group-metadata-manager-0", group="main")
THREAD START (obj=50000427, id = 200032, name="kafka-request-handler-0", group="main")
THREAD START (obj=50000427, id = 200037, name="kafka-request-handler-5", group="main")
THREAD START (obj=50000427, id = 200036, name="kafka-request-handler-4", group="main")
THREAD START (obj=50000427, id = 200035, name="kafka-request-handler-3", group="main")
THREAD START (obj=50000427, id = 200034, name="kafka-request-handler-2", group="main")
THREAD START (obj=50000427, id = 200033, name="kafka-request-handler-1", group="main")
THREAD START (obj=50000427, id = 200038, name="kafka-request-handler-6", group="main")
THREAD START (obj=50000427, id = 200039, name="kafka-request-handler-7", group="main")
THREAD START (obj=50000427, id = 200040, name="kafka-scheduler-7", group="main")
THREAD START (obj=50000427, id = 200041, name="kafka-scheduler-8", group="main")
THREAD START (obj=50000ee2, id = 200042, name="ReplicaFetcherThread-0-0", group="main")
THREAD START (obj=50000ee2, id = 200043, name="ReplicaFetcherThread-0-1", group="main")
THREAD START (obj=50000427, id = 200044, name="kafka-scheduler-9", group="main")
THREAD START (obj=50000427, id = 200045, name="executor-Fetch", group="main")
TRACE 300920:
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
TRACE 300518:
java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
java.net.ServerSocket.implAccept(ServerSocket.java:545)
java.net.ServerSocket.accept(ServerSocket.java:513)
TRACE 300940:
sun.nio.ch.FileDispatcherImpl.write0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
sun.nio.ch.IOUtil.write(IOUtil.java:65)
TRACE 301003:
org.xerial.snappy.SnappyNative.rawUncompress(SnappyNative.java:Unknown line)
org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474)
org.xerial.snappy.Snappy.uncompress(Snappy.java:513)
org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147)
TRACE 300979:
sun.nio.ch.FileDispatcherImpl.pread0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
TRACE 301630:
sun.nio.ch.EPollArrayWrapper.epollCtl(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
TRACE 301259:
sun.misc.Unsafe.unpark(Unsafe.java:Unknown line)
java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
TRACE 301559:
sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
TRACE 300590:
java.lang.ClassLoader.defineClass1(ClassLoader.java:Unknown line)
java.lang.ClassLoader.defineClass(ClassLoader.java:763)
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
TRACE 301643:
scala.Tuple2.equals(Tuple2.scala:20)
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
kafka.utils.Pool.get(Pool.scala:69)
kafka.server.ReplicaManager.getPartition(ReplicaManager.scala:280)
TRACE 300592:
java.util.zip.ZipFile.read(ZipFile.java:Unknown line)
java.util.zip.ZipFile.access$1400(ZipFile.java:60)
java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
TRACE 301018:
kafka.utils.CoreUtils$.crc32(CoreUtils.scala:148)
kafka.message.Message.computeChecksum(Message.scala:216)
kafka.message.Message.isValid(Message.scala:226)
kafka.message.Message.ensureValid(Message.scala:232)
TRACE 301561:
java.io.FileDescriptor.sync(FileDescriptor.java:Unknown line)
kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:62)
kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:49)
kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:945)
TRACE 301422:
java.util.Arrays.copyOf(Arrays.java:3332)
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017
rank   self  accum   count trace method
   1 86.46% 86.46%  604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
   2 12.62% 99.08%   88254 300518 java.net.PlainSocketImpl.socketAccept
   3  0.11% 99.19%     759 300940 sun.nio.ch.FileDispatcherImpl.write0
   4  0.04% 99.23%     253 301003 org.xerial.snappy.SnappyNative.rawUncompress
   5  0.03% 99.26%     231 300979 sun.nio.ch.FileDispatcherImpl.pread0
   6  0.03% 99.29%     220 301630 sun.nio.ch.EPollArrayWrapper.epollCtl
   7  0.03% 99.32%     219 301259 sun.misc.Unsafe.unpark
   8  0.02% 99.34%     145 301559 sun.nio.ch.FileDispatcherImpl.read0
   9  0.01% 99.36%      89 300590 java.lang.ClassLoader.defineClass1
  10  0.01% 99.37%      87 301643 scala.Tuple2.equals
  11  0.01% 99.38%      79 300592 java.util.zip.ZipFile.read
  12  0.01% 99.39%      79 301018 kafka.utils.CoreUtils$.crc32
  13  0.01% 99.40%      78 301561 java.io.FileDescriptor.sync
  14  0.01% 99.41%      72 301422 java.util.Arrays.copyOf
CPU SAMPLES END
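
A note on reading this table: as far as I know, hprof's cpu=samples mode
counts any thread the JVM reports as runnable, and that includes threads
blocked inside native calls such as epollWait and socketAccept. The top two
rows (99.08% of the samples between them) are therefore most likely idle
waiting rather than CPU burn. A minimal Python sketch that drops those rows
and re-ranks the rest is below. The BLOCKING set and the function names are
my own assumptions for illustration, not part of hprof.

```python
import re

# Assumption: these methods are blocking native waits in this dump.
# The list is hand-picked for illustration, not something hprof provides.
BLOCKING = {
    "sun.nio.ch.EPollArrayWrapper.epollWait",
    "java.net.PlainSocketImpl.socketAccept",
}

# One table row: rank, self %, accum %, count, trace id, method name.
ROW = re.compile(
    r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(\d+)\s+(\S+)\s*$"
)


def parse_samples(text):
    """Return (method, count) pairs from an hprof CPU SAMPLES table."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match.group(6), int(match.group(4))))
    return rows


def rerank(rows, blocking=BLOCKING):
    """Drop blocking waits, then return (method, count, share %) by count."""
    busy = [(method, count) for method, count in rows if method not in blocking]
    total = sum(count for _, count in busy)
    if total == 0:
        return []
    busy.sort(key=lambda row: -row[1])
    return [(method, count, 100.0 * count / total) for method, count in busy]


# A few rows copied from the table above.
SAMPLE = """rank   self  accum   count trace method
   1 86.46% 86.46%  604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
   2 12.62% 99.08%   88254 300518 java.net.PlainSocketImpl.socketAccept
   3  0.11% 99.19%     759 300940 sun.nio.ch.FileDispatcherImpl.write0
   5  0.03% 99.26%     231 300979 sun.nio.ch.FileDispatcherImpl.pread0
"""

if __name__ == "__main__":
    for method, count, share in rerank(parse_samples(SAMPLE)):
        print(f"{share:6.2f}%  {count:7d}  {method}")
```

With the waits removed, what is left is under 1% of all samples, so this
dump on its own does not show where the 20% CPU goes.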

It seems like the constant disconnects are far bigger than the 10-minute
default. I suspect this has something to do with the double connects, which
I'm not sure how to get around.
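
On the suggestion (quoted below) to correlate per-thread CPU usage with a
thread dump: on Linux with a HotSpot JVM, the nid=0x... field in jstack
output is the native thread id in hex, and it matches the decimal thread id
shown by `top -H -p <pid>`. A sketch of that lookup in Python follows. The
thread ids and CPU percentages in the example are made up, and the helper
names are hypothetical.

```python
import re

# A jstack thread header looks like:
#   "name" #12 daemon prio=5 os_prio=0 tid=0x... nid=0x1a2b runnable [...]
NID = re.compile(
    r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)', re.MULTILINE
)


def threads_by_tid(jstack_text):
    """Map decimal native thread id -> thread name from jstack output."""
    return {
        int(match.group("nid"), 16): match.group("name")
        for match in NID.finditer(jstack_text)
    }


def name_hot_threads(top_rows, jstack_text):
    """top_rows is a list of (decimal thread id, cpu %) taken from `top -H`.

    Returns (thread name, cpu %) pairs, busiest thread first.
    """
    names = threads_by_tid(jstack_text)
    ordered = sorted(top_rows, key=lambda row: -row[1])
    return [(names.get(tid, "<unknown>"), cpu) for tid, cpu in ordered]


# Hypothetical data: the thread names exist in the dump above, the ids and
# CPU numbers are invented for the example.
JSTACK = '''
"kafka-request-handler-0" #45 daemon prio=5 os_prio=0 tid=0x00007f0001 nid=0x1a2b waiting on condition [0x00007f01]
"kafka-network-thread-2-PLAINTEXT-0" #30 prio=5 os_prio=0 tid=0x00007f0002 nid=0x1a2c runnable [0x00007f02]
'''
TOP = [(0x1A2B, 1.3), (0x1A2C, 17.9)]

if __name__ == "__main__":
    for name, cpu in name_hot_threads(TOP, JSTACK):
        print(f"{cpu:5.1f}%  {name}")
```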

On Thu, Mar 23, 2017 at 11:46 AM, Manikumar <manikumar.re...@gmail.com>
wrote:

> 1. Maybe you can monitor thread-wise CPU usage and correlate it with a
>     thread dump to identify the bottleneck.
> 2. The broker config property connections.max.idle.ms is used to close
>     idle connections. The default is 10 minutes.
>
> On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden <p...@sportr.co.uk>
> wrote:
>
> > Hi,
> >
> > I deployed Kafka about a week ago, but there are a few problems with how
> > Kafka behaves.
> > The first is the surprisingly high resource usage. One part is the memory
> > (1.5-2 GB for each broker, 3 brokers), although this might be normal. The
> > other is the CPU usage, which starts at a minimum of 20% on each broker,
> > which I find strange given the current throughput (< 1 msg/s).
> >
> > This might have something to do with something else I find strange:
> > Kafka disconnects clients about every 10-20 minutes per broker. It might
> > be related to the configuration: deployed in Kubernetes,
> > bootstrapping with a single DNS name (which is backed by all live Kafka
> > brokers), and then every broker has a separate DNS address which is used
> > after the bootstrap. This means that a client is connected twice to one of
> > the brokers. The reason for the bootstrap DNS name is to make sure I don't
> > have to update all clients to include other brokers.
> >
> > Any advice on how to solve these 2 problems?
> >
> > Thanks,
> > Paul
> >
>
