We have been using Kafka for a while now in one of our dev projects. Currently we have just 1 broker and 1 ZooKeeper instance. Almost every day, Kafka "stalls" and we end up cleaning out the data/log folders of Kafka and ZooKeeper and bringing everything up afresh. We haven't been able to narrow down the issue yet.

However, setting that aside for now, we have noticed that even when the system/application is completely idle, the Kafka process uses unreasonably high CPU (a constant 10-15% as shown by the top command). We have taken multiple thread dumps, and each of them shows this:

"kafka-socket-acceptor" #24 prio=5 os_prio=0 tid=0x00007f62685d9000 nid=0x2d47 runnable [0x00007f6231464000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000ca77a458> (a sun.nio.ch.Util$2)
    - locked <0x00000000ca77a440> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000ca774550> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at kafka.network.Acceptor.run(SocketServer.scala:215)
    at java.lang.Thread.run(Thread.java:745)

"kafka-network-thread-9092-2" #23 prio=5 os_prio=0 tid=0x00007f62685d6800 nid=0x2d46 runnable [0x00007f6231565000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000ca77d050> (a sun.nio.ch.Util$2)
    - locked <0x00000000ca77d038> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000ca7745e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at kafka.network.Processor.run(SocketServer.scala:320)
    at java.lang.Thread.run(Thread.java:745)

"kafka-network-thread-9092-1" #22 prio=5 os_prio=0 tid=0x00007f62685c7800 nid=0x2d45 runnable [0x00007f6231666000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000ca77e590> (a sun.nio.ch.Util$2)
    - locked <0x00000000ca77e578> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000ca7746b8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at kafka.network.Processor.run(SocketServer.scala:320)
    at java.lang.Thread.run(Thread.java:745)

"kafka-network-thread-9092-0" #21 prio=5 os_prio=0 tid=0x00007f62685b9000 nid=0x2d44 runnable [0x00007f6231767000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000ca77fbd0> (a sun.nio.ch.Util$2)
    - locked <0x00000000ca77fbb8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000ca774790> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at kafka.network.Processor.run(SocketServer.scala:320)
    at java.lang.Thread.run(Thread.java:745)




Looking at the 0.8.2.1 code at https://github.com/apache/kafka/blob/0.8.2.1/core/src/main/scala/kafka/network/SocketServer.scala#L314, the relevant loop looks like this:

while(isRunning) {
...
    val ready = selector.select(300)
    ...
    if(ready > 0) {
        ...
    }
...
}

This looks like an (always) "busy" while loop when selector.select returns 0. Could a sleep of a few milliseconds help in this case? Similar code is present in the Acceptor in that same file. Would adding a small sleep there help reduce CPU usage when things are idle?
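
One way to check whether such a loop really spins would be to time idle select calls outside Kafka. The sketch below is not Kafka code. It assumes only plain java.nio, and the class and method names (IdleSelectTiming, timeIdleSelects) are made up for illustration. Selector.select(long) is documented to block until a channel is ready, the selector is woken up, the thread is interrupted, or the timeout expires. If that holds on the JVM in question, the elapsed time should be roughly timeout times iterations rather than close to zero.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical helper, not part of Kafka: times a loop of idle select calls.
class IdleSelectTiming {

    // Opens a selector with no registered channels, calls select(timeoutMs)
    // `iterations` times, and returns the total elapsed time in milliseconds.
    static long timeIdleSelects(long timeoutMs, int iterations) {
        try (Selector selector = Selector.open()) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                // Same call shape as the loop in question: returns the number
                // of ready keys, which should be 0 here since nothing is registered.
                int ready = selector.select(timeoutMs);
                if (ready > 0) {
                    throw new IllegalStateException("unexpected ready keys: " + ready);
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long elapsed = timeIdleSelects(300, 5);
        System.out.println("5 idle select(300) calls took " + elapsed + " ms");
    }
}
```

If the reported time is near 1500 ms, the select call itself is blocking and an extra sleep would probably not change much. A value near zero would instead point at the selector returning early on that JVM or OS.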

-Jaikiran

