Dave Powell created KAFKA-3916:
----------------------------------

             Summary: Connection from controller to broker disconnects
                 Key: KAFKA-3916
                 URL: https://issues.apache.org/jira/browse/KAFKA-3916
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 0.9.0.1
            Reporter: Dave Powell

We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per day, the controllers in our clusters have their connection to all brokers disconnected, and then successfully reconnected a few hundred ms later. Each time this occurs we see a brief spike in our 99th percentile produce and consume times, reaching several hundred ms.

Here is an example of what we're seeing in the controller.log:

{{
[2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], Controller 151 epoch 106 fails to send request {…} to broker Node(160, broker.160.hostname, 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 160 was disconnected before the response was read
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
        at scala.Option.foreach(Option.scala:236)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
        at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
        at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
        at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

... one each for all brokers (including the controller) ...

[2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending state change requests (kafka.controller.RequestSendThread)

... one each for all brokers (including the controller) ...
}}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)