Thanks. I managed to get a cpu dump from staging. The output: THREAD START (obj=50000427, id = 200004, name="RMI TCP Accept-0", group="system") THREAD START (obj=50000427, id = 200001, name="main", group="main") THREAD START (obj=50000427, id = 200005, name="SensorExpiryThread", group="main") THREAD START (obj=500008e6, id = 200006, name="ThrottledRequestReaper-Fetch", group="main") THREAD START (obj=500008e6, id = 200007, name="ThrottledRequestReaper-Produce", group="main") THREAD START (obj=50000914, id = 200008, name="ZkClient-EventThread-18-zookeeper:2181", group="main") THREAD START (obj=500008e6, id = 200009, name="main-SendThread()", group="main") THREAD START (obj=50000950, id = 200010, name="main-EventThread", group="main") THREAD START (obj=50000427, id = 200011, name="pool-3-thread-1", group="main") THREAD END (id = 200011) THREAD START (obj=50000427, id = 200012, name="metrics-meter-tick-thread-1", group="main") THREAD START (obj=50000427, id = 200014, name="kafka-scheduler-0", group="main") THREAD START (obj=50000427, id = 200013, name="kafka-scheduler-1", group="main") THREAD START (obj=50000427, id = 200015, name="kafka-scheduler-2", group="main") THREAD START (obj=50000c33, id = 200016, name="kafka-log-cleaner-thread-0", group="main") THREAD START (obj=50000427, id = 200017, name="kafka-network-thread-2-PLAINTEXT-0", group="main") THREAD START (obj=50000427, id = 200018, name="kafka-network-thread-2-PLAINTEXT-1", group="main") THREAD START (obj=50000427, id = 200019, name="kafka-network-thread-2-PLAINTEXT-2", group="main") THREAD START (obj=50000427, id = 200020, name="kafka-socket-acceptor-PLAINTEXT-9092", group="main") THREAD START (obj=500008e6, id = 200021, name="ExpirationReaper-2", group="main") THREAD START (obj=500008e6, id = 200022, name="ExpirationReaper-2", group="main") THREAD START (obj=50000427, id = 200023, name="metrics-meter-tick-thread-2", group="main") THREAD START (obj=50000427, id = 200024, name="kafka-scheduler-3", group="main") THREAD START (obj=50000427, id = 200025, name="kafka-scheduler-4", group="main") THREAD START (obj=50000427, id = 200026, name="kafka-scheduler-5", group="main") THREAD START (obj=50000427, id = 200027, name="kafka-scheduler-6", group="main") THREAD START (obj=500008e6, id = 200028, name="ExpirationReaper-2", group="main") THREAD START (obj=500008e6, id = 200029, name="ExpirationReaper-2", group="main") THREAD START (obj=500008e6, id = 200030, name="ExpirationReaper-2", group="main") THREAD START (obj=50000427, id = 200031, name="group-metadata-manager-0", group="main") THREAD START (obj=50000427, id = 200032, name="kafka-request-handler-0", group="main") THREAD START (obj=50000427, id = 200037, name="kafka-request-handler-5", group="main") THREAD START (obj=50000427, id = 200036, name="kafka-request-handler-4", group="main") THREAD START (obj=50000427, id = 200035, name="kafka-request-handler-3", group="main") THREAD START (obj=50000427, id = 200034, name="kafka-request-handler-2", group="main") THREAD START (obj=50000427, id = 200033, name="kafka-request-handler-1", group="main") THREAD START (obj=50000427, id = 200038, name="kafka-request-handler-6", group="main") THREAD START (obj=50000427, id = 200039, name="kafka-request-handler-7", group="main") THREAD START (obj=50000427, id = 200040, name="kafka-scheduler-7", group="main") THREAD START (obj=50000427, id = 200041, name="kafka-scheduler-8", group="main") THREAD START (obj=50000ee2, id = 200042, name="ReplicaFetcherThread-0-0", group="main") THREAD START (obj=50000ee2, id = 200043, name="ReplicaFetcherThread-0-1", group="main") THREAD START (obj=50000427, id = 200044, name="kafka-scheduler-9", group="main") THREAD START (obj=50000427, id = 200045, name="executor-Fetch", group="main") TRACE 300920: sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) TRACE 300518: java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line) java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) java.net.ServerSocket.implAccept(ServerSocket.java:545) java.net.ServerSocket.accept(ServerSocket.java:513) TRACE 300940: sun.nio.ch.FileDispatcherImpl.write0(FileDispatcherImpl.java:Unknown line) sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) sun.nio.ch.IOUtil.write(IOUtil.java:65) TRACE 301003: org.xerial.snappy.SnappyNative.rawUncompress(SnappyNative.java:Unknown line) org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474) org.xerial.snappy.Snappy.uncompress(Snappy.java:513) org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147) TRACE 300979: sun.nio.ch.FileDispatcherImpl.pread0(FileDispatcherImpl.java:Unknown line) sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52) sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220) sun.nio.ch.IOUtil.read(IOUtil.java:197) TRACE 301630: sun.nio.ch.EPollArrayWrapper.epollCtl(EPollArrayWrapper.java:Unknown line) sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) TRACE 301259: sun.misc.Unsafe.unpark(Unsafe.java:Unknown line) java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141) java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662) java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264) TRACE 301559: sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java:Unknown line) sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) sun.nio.ch.IOUtil.read(IOUtil.java:197) TRACE 300590: java.lang.ClassLoader.defineClass1(ClassLoader.java:Unknown line) java.lang.ClassLoader.defineClass(ClassLoader.java:763) java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) java.net.URLClassLoader.defineClass(URLClassLoader.java:467) TRACE 301643: scala.Tuple2.equals(Tuple2.scala:20) java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940) kafka.utils.Pool.get(Pool.scala:69) kafka.server.ReplicaManager.getPartition(ReplicaManager.scala:280) TRACE 300592: java.util.zip.ZipFile.read(ZipFile.java:Unknown line) java.util.zip.ZipFile.access$1400(ZipFile.java:60) java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717) java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419) TRACE 301018: kafka.utils.CoreUtils$.crc32(CoreUtils.scala:148) kafka.message.Message.computeChecksum(Message.scala:216) kafka.message.Message.isValid(Message.scala:226) kafka.message.Message.ensureValid(Message.scala:232) TRACE 301561: java.io.FileDescriptor.sync(FileDescriptor.java:Unknown line) kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:62) kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:49) kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:945) TRACE 301422: java.util.Arrays.copyOf(Arrays.java:3332) java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017 rank self accum count trace method 1 86.46% 86.46% 604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait 2 12.62% 99.08% 88254 300518 java.net.PlainSocketImpl.socketAccept 3 0.11% 99.19% 759 300940 sun.nio.ch.FileDispatcherImpl.write0 4 0.04% 99.23% 253 301003 org.xerial.snappy.SnappyNative.rawUncompress 5 0.03% 99.26% 231 300979 sun.nio.ch.FileDispatcherImpl.pread0 6 0.03% 99.29% 220 301630 sun.nio.ch.EPollArrayWrapper.epollCtl 7 0.03% 99.32% 219 301259 sun.misc.Unsafe.unpark 8 0.02% 99.34% 145 301559 sun.nio.ch.FileDispatcherImpl.read0 9 0.01% 99.36% 89 300590 java.lang.ClassLoader.defineClass1 10 0.01% 99.37% 87 301643 scala.Tuple2.equals 11 0.01% 99.38% 79 300592 java.util.zip.ZipFile.read 12 0.01% 99.39% 79 301018 kafka.utils.CoreUtils$.crc32 13 0.01% 99.40% 78 301561 java.io.FileDescriptor.sync 14 0.01% 99.41% 72 301422 java.util.Arrays.copyOf CPU SAMPLES END
It seems like the constant disconnects is far bigger then the 10 minutes default. I suspect this has something to do with double connects, which I'm not sure to get around. On Thu, Mar 23, 2017 at 11:46 AM, Manikumar <manikumar.re...@gmail.com> wrote: > 1. may be you can monitor thread wise cpu usage and correlate with thread > dump > to identify the bottleneck > 2. Broker config property connections.max.idle.ms is used to close > idle connections. > default is 10min. > > On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden <p...@sportr.co.uk> > wrote: > > > Hi, > > > > I deployed Kafka about a week ago, but there are a few problems with how > > Kafka behaves. > > The first is the surprisingly high resource usage, one this the memory > > (1.5-2 GB for each broker, 3 brokers) although this might be normal. The > > other one is the cpu usage, which starts with 20% minimum on each broker, > > which I find strange with the current throughput (which is < 1 msg/s). > > > > This might has something to do with something else which I find strange, > > Kafka disconnects clients about every 10-20 minutes per broker. It might > > have something to do with the configuration: Deployed in kubernetes, > > bootstrapping with a single dns name (which is backed by all alive kafka > > brokers), and then every broker has a separate dns address which is used > > after the bootstrap. This means that a client is connected twice to one > of > > the brokers. The reason for the bootstrap dns name is to make sure I > don't > > have to update all clients to include other brokers. > > > > Any advice on how to solve these 2 problems? > > > > Thanks, > > Paul > > > > On Tue, Mar 21, 2017 at 10:30 AM, Paul van der Linden <p...@sportr.co.uk > > > > wrote: > > > > > Hi, > > > > > > I deployed Kafka about a week ago, but there are a few problems with > how > > > Kafka behaves. > > > The first is the surprisingly high resource usage, one this the memory > > > (1.5-2 GB for each broker, 3 brokers) although this might be normal. > The > > > other one is the cpu usage, which starts with 20% minimum on each > broker, > > > which I find strange with the current throughput (which is < 1 msg/s). > > > > > > This might has something to do with something else which I find > strange, > > > Kafka disconnects clients about every 10-20 minutes per broker. It > might > > > have something to do with the configuration: Deployed in kubernetes, > > > bootstrapping with a single dns name (which is backed by all alive > kafka > > > brokers), and then every broker has a separate dns address which is > used > > > after the bootstrap. This means that a client is connected twice to one > > of > > > the brokers. The reason for the bootstrap dns name is to make sure I > > don't > > > have to update all clients to include other brokers. > > > > > > Any advice on how to solve these 2 problems? > > > > > > Thanks, > > > Paul > > > > > >