Ankur C created KAFKA-4573:
------------------------------

             Summary: Producer sporadic timeout
                 Key: KAFKA-4573
                 URL: https://issues.apache.org/jira/browse/KAFKA-4573
             Project: Kafka
          Issue Type: Bug
            Reporter: Ankur C

We had a production outage caused by sporadic Kafka producer timeouts. About 1 to 2% of messages timed out continuously.

Kafka version: 0.9.0.1
Number of Kafka brokers: 5
Replication factor for each topic: 3
Number of topics: ~30
Number of partitions: ~300

We had Kafka 0.9.0.1 running in our 5-broker cluster for 1 month without any issues. However, on Dec 23rd we saw sporadic Kafka producer timeouts. The issue began around 6:51am and continued until we bounced the Kafka brokers.

6:51am - Under-replication started on a small number of topics.
6:53am - All under-replication recovered.
11:00am - We restarted all Kafka producer writer apps, but this didn't resolve the sporadic producer timeouts.
12:01pm - We restarted all Kafka brokers; after this the issue was resolved.

Kafka metrics and Kafka logs don't show any major issue. There were no offline partitions during the outage, and the number of controllers was exactly 1. The only exception we saw was the following one in controller.log on the Kafka brokers. It was logged on all brokers, 0 to 4.

java.io.IOException: Connection to 2 was disconnected before the response was read
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
        at scala.Option.foreach(Option.scala:236)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
        at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
        at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
        at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)