Dave Powell created KAFKA-3916:
----------------------------------

             Summary: Connection from controller to broker disconnects
                 Key: KAFKA-3916
                 URL: https://issues.apache.org/jira/browse/KAFKA-3916
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 0.9.0.1
            Reporter: Dave Powell


We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per 
day, the controller in each of our clusters loses its connections to all 
brokers and then successfully reconnects a few hundred ms later. Each time 
this occurs, we see a brief spike in our 99th percentile produce and consume 
times, which reach several hundred ms.

Here is an example of what we're seeing in the controller.log:
{{
[2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], Controller 151 epoch 106 fails to send request {…} to broker Node(160, broker.160.hostname, 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.IOException: Connection to 160 was disconnected before the response was read
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
        at scala.Option.foreach(Option.scala:236)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
        at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
        at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
        at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

... one each for all brokers (including the controller) ...

[2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending state change requests (kafka.controller.RequestSendThread)

... one each for all brokers (including the controller) ...
}}
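
To quantify how long each disconnect lasts, the WARN and INFO entries above can be paired per broker. The following is only a rough sketch, not part of Kafka: the function name `reconnect_gaps` is hypothetical, it assumes each log entry sits on a single line in controller.log (i.e. not wrapped as in this email), and it matches on the message text exactly as it appears in the excerpt.

```python
import re
from datetime import datetime, timedelta

# Timestamp, level, and broker id of a controller send-thread log line.
# The layout is taken from the excerpt above; other log4j patterns would
# need a different expression.
LINE_RE = re.compile(
    r"^\s*\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"(?P<level>\w+) "
    r"\[Controller-\d+-to-broker-(?P<broker>\d+)-send-thread\], "
    r"(?P<msg>.*)$"
)


def reconnect_gaps(lines):
    """Return a list of (broker_id, gap_ms) pairs, one per disconnect that
    is followed by a reconnect to the same broker (hypothetical helper)."""
    pending = {}  # broker id -> time of the unmatched disconnect
    gaps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # stack-trace frames and unrelated lines
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        broker = int(m.group("broker"))
        msg = m.group("msg")
        if m.group("level") == "WARN" and "fails to send request" in msg:
            # Keep the first disconnect time if several WARNs repeat.
            pending.setdefault(broker, ts)
        elif m.group("level") == "INFO" and " connected to " in msg:
            start = pending.pop(broker, None)
            if start is not None:
                gaps.append((broker, (ts - start) // timedelta(milliseconds=1)))
    return gaps


sample = [
    "[2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], "
    "Controller 151 epoch 106 fails to send request {…} to broker Node(160, "
    "broker.160.hostname, 9092). Reconnecting to broker. "
    "(kafka.controller.RequestSendThread)",
    "java.io.IOException: Connection to 160 was disconnected before the "
    "response was read",
    "[2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], "
    "Controller 151 connected to Node(160, broker.160.hostname, 9092) for "
    "sending state change requests (kafka.controller.RequestSendThread)",
]
print(reconnect_gaps(sample))  # [(160, 305)]
```

For the pair of entries shown above, this gives a gap of 305 ms for broker 160, consistent with the "few hundred ms" described.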



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
