Hey! I've run into something concerning in our production cluster. I believe I posted this question to the mailing list previously (http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/browser), but the problem has become considerably more serious.
We've been fighting issues where Kafka 0.10.0.1 hits its max file descriptor limit. Our limit is set to ~16k, and under normal operation the broker holds steady at around 4k open files. Occasionally Kafka rolls a new log segment, which typically takes a few milliseconds. Sometimes, however, a roll takes a considerable amount of time, anywhere from 40 seconds to over a minute. When this happens, Kafka does not seem to release connections, and we end up with thousands of client connections stuck in CLOSE_WAIT, which pile up and blow past our max file descriptor limit, all within the span of about a minute. Our logs look like this:

[2017-01-08 01:10:17,117] INFO Rolled new log segment for 'MyTopic-8' in 41122 ms. (kafka.log.Log)
[2017-01-08 01:10:32,550] INFO Rolled new log segment for 'MyTopic-4' in 1 ms. (kafka.log.Log)
[2017-01-08 01:11:10,039] INFO [Group Metadata Manager on Broker 4]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-01-08 01:19:02,877] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at kafka.network.Acceptor.accept(SocketServer.scala:323)
        at kafka.network.Acceptor.run(SocketServer.scala:268)
        at java.lang.Thread.run(Thread.java:745)
[2017-01-08 01:19:02,877] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        ... (the same error and stack trace repeat from here)

And then Kafka crashes. Has anyone seen slow log segment rolls like this? Any ideas on how to track down what could be causing it? (A rough sketch of how we watch the descriptor counts is at the end of this mail.)

Thanks!
Stephen
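P.S. For anyone who wants to watch for this on their own brokers: below is a minimal sketch of how the open-FD count and CLOSE_WAIT socket count can be polled so the leak is visible before the "Too many open files" errors start. This is illustrative, not our production tooling; the broker-detection heuristic (a java process whose command line mentions kafka.Kafka) and the 10-second polling interval are assumptions you may need to adjust. It uses the psutil library and needs enough privileges to inspect the broker process.

#!/usr/bin/env python
import time
import psutil

def find_broker_pid():
    # Heuristic (assumption): the broker is the java process whose
    # command line mentions kafka.Kafka. Adjust for your deployment.
    for proc in psutil.process_iter(['name', 'cmdline']):
        cmdline = ' '.join(proc.info['cmdline'] or [])
        if proc.info['name'] == 'java' and 'kafka.Kafka' in cmdline:
            return proc.pid
    raise RuntimeError('no Kafka broker process found')

def main():
    broker = psutil.Process(find_broker_pid())
    while True:
        # Count TCP sockets owned by the broker that are stuck in CLOSE_WAIT.
        close_wait = sum(
            1 for conn in psutil.net_connections(kind='tcp')
            if conn.pid == broker.pid and conn.status == psutil.CONN_CLOSE_WAIT
        )
        # num_fds() is the broker's total open file descriptors (Unix only).
        print('open fds: %5d   CLOSE_WAIT: %5d' % (broker.num_fds(), close_wait))
        time.sleep(10)

if __name__ == '__main__':
    main()

Under normal operation the first number should sit near our steady-state ~4k; when a slow segment roll happens, you should see the CLOSE_WAIT count climb sharply in the minute before the accept errors appear.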