Doesn't seem to be the clients, indeed. Maybe it already uses 13% of CPU just on maintaining the cluster, with no connections at all except ZooKeeper and the other two brokers. This is the CPU usage:
CPU SAMPLES BEGIN (total = 86359) Thu Mar 23 16:47:26 2017
rank   self  accum  count trace  method
   1 87.23% 87.23%  75327 300920 sun.nio.ch.EPollArrayWrapper.epollWait
   2 12.48% 99.71%  10780 300518 java.net.PlainSocketImpl.socketAccept
   3  0.06% 99.77%     51 300940 sun.nio.ch.FileDispatcherImpl.write0
   4  0.02% 99.79%     20 301559 sun.nio.ch.FileDispatcherImpl.read0
   5  0.01% 99.80%     10 301567 org.apache.log4j.Category.getEffectiveLevel
CPU SAMPLES END

On Thu, Mar 23, 2017 at 1:02 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:

> One thing that you might want to check is the number of consumers that are
> connected/consuming against this Kafka setup. We have consistently noticed
> that the CPU usage of the broker is very high even with very few consumers
> (around 10 Java consumers). There's even a JIRA for it. From what I
> remember, it had to do with the constant heartbeat and other such network
> activity that happens between these consumers and the broker. We had this
> issue from the 0.8.x days through 0.10.0.1. We just migrated to 0.10.2.0
> and will have to see if it is still reproducible there.
>
> I don't mean to say you are running into the same issue, but you can check
> that aspect as well (maybe shut down all consumers and see how the broker
> CPU behaves).
>
> -Jaikiran
>
> On Thursday 23 March 2017 06:15 PM, Paul van der Linden wrote:
>
>> Thanks. I managed to get a cpu dump from staging.
>> The output:
>>
>> THREAD START (obj=50000427, id = 200004, name="RMI TCP Accept-0", group="system")
>> THREAD START (obj=50000427, id = 200001, name="main", group="main")
>> THREAD START (obj=50000427, id = 200005, name="SensorExpiryThread", group="main")
>> THREAD START (obj=500008e6, id = 200006, name="ThrottledRequestReaper-Fetch", group="main")
>> THREAD START (obj=500008e6, id = 200007, name="ThrottledRequestReaper-Produce", group="main")
>> THREAD START (obj=50000914, id = 200008, name="ZkClient-EventThread-18-zookeeper:2181", group="main")
>> THREAD START (obj=500008e6, id = 200009, name="main-SendThread()", group="main")
>> THREAD START (obj=50000950, id = 200010, name="main-EventThread", group="main")
>> THREAD START (obj=50000427, id = 200011, name="pool-3-thread-1", group="main")
>> THREAD END (id = 200011)
>> THREAD START (obj=50000427, id = 200012, name="metrics-meter-tick-thread-1", group="main")
>> THREAD START (obj=50000427, id = 200014, name="kafka-scheduler-0", group="main")
>> THREAD START (obj=50000427, id = 200013, name="kafka-scheduler-1", group="main")
>> THREAD START (obj=50000427, id = 200015, name="kafka-scheduler-2", group="main")
>> THREAD START (obj=50000c33, id = 200016, name="kafka-log-cleaner-thread-0", group="main")
>> THREAD START (obj=50000427, id = 200017, name="kafka-network-thread-2-PLAINTEXT-0", group="main")
>> THREAD START (obj=50000427, id = 200018, name="kafka-network-thread-2-PLAINTEXT-1", group="main")
>> THREAD START (obj=50000427, id = 200019, name="kafka-network-thread-2-PLAINTEXT-2", group="main")
>> THREAD START (obj=50000427, id = 200020, name="kafka-socket-acceptor-PLAINTEXT-9092", group="main")
>> THREAD START (obj=500008e6, id = 200021, name="ExpirationReaper-2", group="main")
>> THREAD START (obj=500008e6, id = 200022, name="ExpirationReaper-2", group="main")
>> THREAD START (obj=50000427, id = 200023, name="metrics-meter-tick-thread-2", group="main")
>> THREAD START (obj=50000427, id = 200024, name="kafka-scheduler-3", group="main")
>> THREAD START (obj=50000427, id = 200025, name="kafka-scheduler-4", group="main")
>> THREAD START (obj=50000427, id = 200026, name="kafka-scheduler-5", group="main")
>> THREAD START (obj=50000427, id = 200027, name="kafka-scheduler-6", group="main")
>> THREAD START (obj=500008e6, id = 200028, name="ExpirationReaper-2", group="main")
>> THREAD START (obj=500008e6, id = 200029, name="ExpirationReaper-2", group="main")
>> THREAD START (obj=500008e6, id = 200030, name="ExpirationReaper-2", group="main")
>> THREAD START (obj=50000427, id = 200031, name="group-metadata-manager-0", group="main")
>> THREAD START (obj=50000427, id = 200032, name="kafka-request-handler-0", group="main")
>> THREAD START (obj=50000427, id = 200037, name="kafka-request-handler-5", group="main")
>> THREAD START (obj=50000427, id = 200036, name="kafka-request-handler-4", group="main")
>> THREAD START (obj=50000427, id = 200035, name="kafka-request-handler-3", group="main")
>> THREAD START (obj=50000427, id = 200034, name="kafka-request-handler-2", group="main")
>> THREAD START (obj=50000427, id = 200033, name="kafka-request-handler-1", group="main")
>> THREAD START (obj=50000427, id = 200038, name="kafka-request-handler-6", group="main")
>> THREAD START (obj=50000427, id = 200039, name="kafka-request-handler-7", group="main")
>> THREAD START (obj=50000427, id = 200040, name="kafka-scheduler-7", group="main")
>> THREAD START (obj=50000427, id = 200041, name="kafka-scheduler-8", group="main")
>> THREAD START (obj=50000ee2, id = 200042, name="ReplicaFetcherThread-0-0", group="main")
>> THREAD START (obj=50000ee2, id = 200043, name="ReplicaFetcherThread-0-1", group="main")
>> THREAD START (obj=50000427, id = 200044, name="kafka-scheduler-9", group="main")
>> THREAD START (obj=50000427, id = 200045, name="executor-Fetch", group="main")
>> TRACE 300920:
>>   sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>>   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>> TRACE 300518:
>>   java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>>   java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
>>   java.net.ServerSocket.implAccept(ServerSocket.java:545)
>>   java.net.ServerSocket.accept(ServerSocket.java:513)
>> TRACE 300940:
>>   sun.nio.ch.FileDispatcherImpl.write0(FileDispatcherImpl.java:Unknown line)
>>   sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>   sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>   sun.nio.ch.IOUtil.write(IOUtil.java:65)
>> TRACE 301003:
>>   org.xerial.snappy.SnappyNative.rawUncompress(SnappyNative.java:Unknown line)
>>   org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474)
>>   org.xerial.snappy.Snappy.uncompress(Snappy.java:513)
>>   org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147)
>> TRACE 300979:
>>   sun.nio.ch.FileDispatcherImpl.pread0(FileDispatcherImpl.java:Unknown line)
>>   sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
>>   sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
>>   sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> TRACE 301630:
>>   sun.nio.ch.EPollArrayWrapper.epollCtl(EPollArrayWrapper.java:Unknown line)
>>   sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>> TRACE 301259:
>>   sun.misc.Unsafe.unpark(Unsafe.java:Unknown line)
>>   java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
>>   java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
>>   java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
>> TRACE 301559:
>>   sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java:Unknown line)
>>   sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>   sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>   sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> TRACE 300590:
>>   java.lang.ClassLoader.defineClass1(ClassLoader.java:Unknown line)
>>   java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>>   java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>   java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>> TRACE 301643:
>>   scala.Tuple2.equals(Tuple2.scala:20)
>>   java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
>>   kafka.utils.Pool.get(Pool.scala:69)
>>   kafka.server.ReplicaManager.getPartition(ReplicaManager.scala:280)
>> TRACE 300592:
>>   java.util.zip.ZipFile.read(ZipFile.java:Unknown line)
>>   java.util.zip.ZipFile.access$1400(ZipFile.java:60)
>>   java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
>>   java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
>> TRACE 301018:
>>   kafka.utils.CoreUtils$.crc32(CoreUtils.scala:148)
>>   kafka.message.Message.computeChecksum(Message.scala:216)
>>   kafka.message.Message.isValid(Message.scala:226)
>>   kafka.message.Message.ensureValid(Message.scala:232)
>> TRACE 301561:
>>   java.io.FileDescriptor.sync(FileDescriptor.java:Unknown line)
>>   kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:62)
>>   kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:49)
>>   kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:945)
>> TRACE 301422:
>>   java.util.Arrays.copyOf(Arrays.java:3332)
>>   java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>>   java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>>   java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
>>
>> CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017
>> rank   self  accum  count trace  method
>>    1 86.46% 86.46% 604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
>>    2 12.62% 99.08%  88254 300518 java.net.PlainSocketImpl.socketAccept
>>    3  0.11% 99.19%    759 300940 sun.nio.ch.FileDispatcherImpl.write0
>>    4  0.04% 99.23%    253 301003 org.xerial.snappy.SnappyNative.rawUncompress
>>    5  0.03% 99.26%    231 300979 sun.nio.ch.FileDispatcherImpl.pread0
>>    6  0.03% 99.29%    220 301630 sun.nio.ch.EPollArrayWrapper.epollCtl
>>    7  0.03% 99.32%    219 301259 sun.misc.Unsafe.unpark
>>    8  0.02% 99.34%    145 301559 sun.nio.ch.FileDispatcherImpl.read0
>>    9  0.01% 99.36%     89 300590 java.lang.ClassLoader.defineClass1
>>   10  0.01% 99.37%     87 301643 scala.Tuple2.equals
>>   11  0.01% 99.38%     79 300592 java.util.zip.ZipFile.read
>>   12  0.01% 99.39%     79 301018 kafka.utils.CoreUtils$.crc32
>>   13  0.01% 99.40%     78 301561 java.io.FileDescriptor.sync
>>   14  0.01% 99.41%     72 301422 java.util.Arrays.copyOf
>> CPU SAMPLES END
>>
>> It seems like the interval of the constant disconnects is far bigger than
>> the 10-minute default. I suspect this has something to do with the double
>> connects, which I'm not sure how to get around.
>>
>> On Thu, Mar 23, 2017 at 11:46 AM, Manikumar <manikumar.re...@gmail.com> wrote:
>>
>>> 1. Maybe you can monitor per-thread CPU usage and correlate it with a
>>> thread dump to identify the bottleneck.
>>> 2. The broker config property connections.max.idle.ms is used to close
>>> idle connections. The default is 10 min.
>>>
>>> On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden <p...@sportr.co.uk> wrote:
>>>
>>>> Hi,
>>>>
>>>> I deployed Kafka about a week ago, but there are a few problems with how
>>>> Kafka behaves.
>>>> The first is the surprisingly high resource usage; one is the memory
>>>> (1.5-2 GB for each broker, 3 brokers), although this might be normal. The
>>>> other one is the CPU usage, which starts at a minimum of 20% on each
>>>> broker, which I find strange at the current throughput (< 1 msg/s).
>>>>
>>>> This might have something to do with something else I find strange:
>>>> Kafka disconnects clients about every 10-20 minutes per broker. It might
>>>> have something to do with the configuration: deployed in Kubernetes,
>>>> bootstrapping with a single DNS name (which is backed by all alive Kafka
>>>> brokers), and then every broker has a separate DNS address which is used
>>>> after the bootstrap. This means that a client is connected twice to one of
>>>> the brokers. The reason for the bootstrap DNS name is to make sure I don't
>>>> have to update all clients to include other brokers.
>>>>
>>>> Any advice on how to solve these 2 problems?
>>>>
>>>> Thanks,
>>>> Paul
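The bootstrap/per-broker DNS setup described above can be sketched as follows. This is a hedged illustration, not a recommendation: all hostnames are made-up placeholders, and only `bootstrap.servers` and `connections.max.idle.ms` are real Kafka settings (the latter is the broker-side knob Manikumar mentions, whose 10-minute default lines up with the 10-20 minute disconnects reported in the thread).

```python
# Hypothetical client-side settings for the setup described in the thread:
# clients bootstrap via one shared DNS record, then connect to each broker's
# own advertised address learned from cluster metadata. The bootstrap
# connection is separate from the metadata-derived one, which is presumably
# how a client ends up connected twice to the same broker.
client_config = {
    # Round-robin DNS name backed by all live brokers (placeholder hostname);
    # only used to fetch initial metadata.
    "bootstrap.servers": "kafka-bootstrap.example.internal:9092",
}

# Broker-side knob from the thread: connections idle longer than this are
# closed by the broker. Default is 600000 ms (10 min), so a bootstrap
# connection that goes quiet after the initial metadata fetch would be
# reaped on roughly that schedule.
broker_config = {
    "connections.max.idle.ms": 600000,
}
```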