[ https://issues.apache.org/jira/browse/KAFKA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783530#comment-16783530 ]
Michael Kemmerer commented on KAFKA-8008:
-----------------------------------------

Same here. I have 3 clusters on the same V-Net in Azure. All of them were deployed at the same time using similar configurations, but only one is experiencing this issue. I've even torn down the brokers of the affected cluster and rebuilt them without using SASL, and they're still experiencing the same error state.

> Clients unable to connect and replicas are not able to connect to each other
> ----------------------------------------------------------------------------
>
> Key: KAFKA-8008
> URL: https://issues.apache.org/jira/browse/KAFKA-8008
> Project: Kafka
> Issue Type: Bug
> Components: controller, core
> Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11
> Reporter: Abhi
> Priority: Critical
>
> Hi,
> I upgraded to Kafka v2.1.1 recently and am seeing the exceptions below on all the servers. The kafka-network-thread-1-ListenerName threads are all consuming full CPU cycles. Lots of TCP connections are in CLOSE_WAIT state.
> My broker setup uses Kerberos authentication with -Dsun.security.jgss.native=true.
> I am not sure how to handle this. Would increasing the kafka-network thread count help, if that is possible?
> Does this seem like a bug? I am happy to help in any way I can, as this issue is blocking our production usage and I would like to get it resolved as early as possible.
> *server.log snippet from one of the servers:*
> [2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 2 with 3 partition(s). 
(org.apache.kafka.clients.FetchSessionHandler) > [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, > fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 > (id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 > (org.apache.kafka.clients.NetworkClient) > [2019-02-27 00:00:02,949] DEBUG Set SASL client state to > SEND_APIVERSIONS_REQUEST > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:02,949] DEBUG Creating SaslClient: > client=kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI] > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, > fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, > SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector) > [2019-02-27 00:00:02,949] DEBUG Set SASL client state to > RECEIVE_APIVERSIONS_RESPONSE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, > fetcherId=3] Completed connection to node 2. Ready. > (org.apache.kafka.clients.NetworkClient) > [2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node > 5 with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler) > [2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) > to node 5: java.net.SocketTimeoutException: Failed to connect within 30000 > ms. 
(org.apache.kafka.clients.FetchSessionHandler) > [2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=1] Error in response for fetch request (type=FetchRequest, > replicaId=1, maxWait=10000, minBytes=1, maxBytes=10485760, > fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-newtp-5-64-0=(fetchOffset=8936, logStartOffset=8936, > maxBytes=1048576, currentLeaderEpoch=Optional[18]), > reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, > maxBytes=1048576, currentLeaderEpoch=Optional[19]), > reddyvel-tp3-58-0=(fetchOffset=264495, logStartOffset=264495, > maxBytes=1048576, currentLeaderEpoch=Optional[19]), > fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, > currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, > logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, logStartOffset=2, > maxBytes=1048576, currentLeaderEpoch=Optional[6]), > reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-988-0=(fetchOffset=3173185, logStartOffset=3173185, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-409-0=(fetchOffset=3173194, logStartOffset=3173194, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-104-0=(fetchOffset=3173198, logStartOffset=3173198, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > fps.priveq.reins-0=(fetchOffset=12, logStartOffset=6, maxBytes=1048576, > 
currentLeaderEpoch=Optional[5]), reddyvel-353-0=(fetchOffset=3173197, > logStartOffset=3173197, maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-tp10-63-0=(fetchOffset=220652, logStartOffset=220652, > maxBytes=1048576, currentLeaderEpoch=Optional[19]), > reddyvel-newtp-5-86-0=(fetchOffset=8935, logStartOffset=8935, > maxBytes=1048576, currentLeaderEpoch=Optional[18]), > reddyvel-878-0=(fetchOffset=3173187, logStartOffset=3173187, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-621-0=(fetchOffset=3173190, logStartOffset=3173190, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > fps.agg.uopt.opt-0=(fetchOffset=28297, logStartOffset=28297, > maxBytes=1048576, currentLeaderEpoch=Optional[8]), > reddyvel-661-0=(fetchOffset=3173190, logStartOffset=3173190, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > fps.guas.jpeq.fe_marb-0=(fetchOffset=532, logStartOffset=10, > maxBytes=1048576, currentLeaderEpoch=Optional[3]), > reddyvel-607-0=(fetchOffset=3173191, logStartOffset=3173191, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > fps.seed.ornt.desim_ornt-0=(fetchOffset=4060, logStartOffset=2433, > maxBytes=1048576, currentLeaderEpoch=Optional[5]), > reddyvel-962-0=(fetchOffset=3173185, logStartOffset=3173185, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > fps.agg.ornt.desim_ornt-0=(fetchOffset=1177, logStartOffset=1177, > maxBytes=1048576, currentLeaderEpoch=Optional[8]), > reddyvel-tp6-71-0=(fetchOffset=256309, logStartOffset=256309, > maxBytes=1048576, currentLeaderEpoch=Optional[19]), > fps.trsy.macro-0=(fetchOffset=324, logStartOffset=106, maxBytes=1048576, > currentLeaderEpoch=Optional[3]), fps.agg.dist.treas-0=(fetchOffset=0, > logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[8]), > reddyvel-newtp-8-111-0=(fetchOffset=1, logStartOffset=1, maxBytes=1048576, > currentLeaderEpoch=Optional[18]), reddyvel-94-0=(fetchOffset=3173198, > logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), > reddyvel-544-0=(fetchOffset=3173192, logStartOffset=3173192, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-newtp-2-72-0=(fetchOffset=8679, logStartOffset=8679, > maxBytes=1048576, currentLeaderEpoch=Optional[18]), > fps.guas.useq.eq_stat_arb_useq-0=(fetchOffset=6071629, > logStartOffset=4470145, maxBytes=1048576, currentLeaderEpoch=Optional[4]), > reddyvel-newtp-10-75-0=(fetchOffset=9768, logStartOffset=9768, > maxBytes=1048576, currentLeaderEpoch=Optional[18]), > reddyvel-tp8-51-0=(fetchOffset=249873, logStartOffset=249873, > maxBytes=1048576, currentLeaderEpoch=Optional[19]), > reddyvel-newtp-8-122-0=(fetchOffset=1, logStartOffset=1, maxBytes=1048576, > currentLeaderEpoch=Optional[18]), fps.seed.trsy.pe_china-0=(fetchOffset=24, > logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[3]), > fps.seed.trsy.jcas-0=(fetchOffset=93, logStartOffset=31, maxBytes=1048576, > currentLeaderEpoch=Optional[3]), reddyvel-tp8-99-0=(fetchOffset=249871, > logStartOffset=249871, maxBytes=1048576, currentLeaderEpoch=Optional[19]), > reddyvel-643-0=(fetchOffset=3173190, logStartOffset=3173190, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-577-0=(fetchOffset=3173191, logStartOffset=3173191, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-tp6-97-0=(fetchOffset=256307, logStartOffset=256307, > maxBytes=1048576, currentLeaderEpoch=Optional[19]), > reddyvel-newtp-6-72-0=(fetchOffset=7652, logStartOffset=7652, > maxBytes=1048576, currentLeaderEpoch=Optional[18]), > reddyvel-959-0=(fetchOffset=3173185, logStartOffset=3173185, > maxBytes=1048576, currentLeaderEpoch=Optional[23]), > reddyvel-newtp-4-70-0=(fetchOffset=8828, logStartOffset=8828, > maxBytes=1048576, currentLeaderEpoch=Optional[18]), > fps.seed.trsy.opt-0=(fetchOffset=114, logStartOffset=38, maxBytes=1048576, > currentLeaderEpoch=Optional[4])}, isolationLevel=READ_UNCOMMITTED, toForget=, > metadata=(sessionId=397037945, 
epoch=INITIAL)) > (kafka.server.ReplicaFetcherThread) > java.net.SocketTimeoutException: Failed to connect within 30000 ms > at > kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:95) > at > kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:192) > at > kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:274) > at > kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132) > at > kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131) > at scala.Option.foreach(Option.scala:257) > at > kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131) > at > kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > [2019-02-27 00:00:04,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node > 5 with 45 partition(s). 
(org.apache.kafka.clients.FetchSessionHandler) > [2019-02-27 00:00:04,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=0] Initiating connection to node mwkafka-prod-01.tbd.foo.com:9092 > (id: 5 rack: null) using address mwkafka-prod-01.tbd.foo.com/10.236.30.30 > (org.apache.kafka.clients.NetworkClient) > [2019-02-27 00:00:04,008] DEBUG Set SASL client state to > SEND_APIVERSIONS_REQUEST > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:04,008] DEBUG Creating SaslClient: > client=kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-01.tbd.foo.com;mechs=[GSSAPI] > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:04,008] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=0] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, > SO_TIMEOUT = 0 to node 5 (org.apache.kafka.common.network.Selector) > [2019-02-27 00:00:04,008] DEBUG Set SASL client state to > RECEIVE_APIVERSIONS_RESPONSE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:04,008] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=0] Completed connection to node 5. Ready. > (org.apache.kafka.clients.NetworkClient) > [2019-02-27 00:00:04,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=1] Built full fetch (sessionId=397037945, epoch=INITIAL) for node 5 > with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler) > [2019-02-27 00:00:05,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=1] Built full fetch (sessionId=397037945, epoch=INITIAL) for node 5 > with 48 partition(s). 
(org.apache.kafka.clients.FetchSessionHandler) > [2019-02-27 00:00:05,318] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=1] Initiating connection to node mwkafka-prod-01.tbd.foo.com:9092 > (id: 5 rack: null) using address mwkafka-prod-01.tbd.foo.com/10.236.30.30 > (org.apache.kafka.clients.NetworkClient) > [2019-02-27 00:00:05,318] DEBUG Set SASL client state to > SEND_APIVERSIONS_REQUEST > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:05,318] DEBUG Creating SaslClient: > client=kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-01.tbd.foo.com;mechs=[GSSAPI] > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:05,319] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=1] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, > SO_TIMEOUT = 0 to node 5 (org.apache.kafka.common.network.Selector) > [2019-02-27 00:00:05,319] DEBUG Set SASL client state to > RECEIVE_APIVERSIONS_RESPONSE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 00:00:05,319] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, > fetcherId=1] Completed connection to node 5. Ready. 
> (org.apache.kafka.clients.NetworkClient) > [2019-02-27 00:00:05,477] DEBUG Set SASL server state to AUTHENTICATE > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2019-02-27 00:00:05,477] DEBUG [SocketServer brokerId=1] Connection with > /10.236.30.31 disconnected (org.apache.kafka.common.network.Selector) > java.io.EOFException > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96) > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:237) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:132) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:532) > at org.apache.kafka.common.network.Selector.poll(Selector.java:467) > at kafka.network.Processor.poll(SocketServer.scala:689) > at kafka.network.Processor.run(SocketServer.scala:594) > at java.base/java.lang.Thread.run(Thread.java:834) > [2019-02-27 00:00:05,477] DEBUG Handling Kafka request SASL_HANDSHAKE > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2019-02-27 00:00:05,477] DEBUG Using SASL mechanism 'GSSAPI' provided by > client > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2019-02-27 00:00:05,477] DEBUG Creating SaslServer for > kafka/mwkafka-prod-01.nyc.foo....@unix.foo.com with mechanism GSSAPI > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2019-02-27 00:00:06,056] INFO [ReplicaFetcher replicaId=1, leaderId=4, > fetcherId=2] Error sending fetch request (sessionId=373847113, epoch=INITIAL) > to node 4: java.net.SocketTimeoutException: Failed to connect within 30000 > ms. 
(org.apache.kafka.clients.FetchSessionHandler)
> *Lsof output:*
> kafka...@mwkafka-prod-01.nyc[toa]:/local/kafka/logs> lsof -P -p 103485 | grep TCP | grep CLOSE
> java 103485 kafkagod 635u IPv4 86522305 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:46014 (CLOSE_WAIT)
> java 103485 kafkagod 639u IPv4 86519040 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45926 (CLOSE_WAIT)
> java 103485 kafkagod 642u IPv4 86519057 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.dr.foo.com:47424 (CLOSE_WAIT)
> java 103485 kafkagod 643u IPv4 86519058 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.dr.foo.com:47428 (CLOSE_WAIT)
> java 103485 kafkagod 683u IPv4 86509505 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57856 (CLOSE_WAIT)
> java 103485 kafkagod 684u IPv4 86522910 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57894 (CLOSE_WAIT)
> java 103485 kafkagod 688u IPv4 86522176 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45966 (CLOSE_WAIT)
> java 103485 kafkagod 690u IPv4 86522306 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35326 (CLOSE_WAIT)
> java 103485 kafkagod 695u IPv4 86522192 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45968 (CLOSE_WAIT)
> java 103485 kafkagod 696u IPv4 86509516 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.dr.foo.com:39676 (CLOSE_WAIT)
> java 103485 kafkagod 697u IPv4 86522307 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35328 (CLOSE_WAIT)
> java 103485 kafkagod 705u IPv4 86519026 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.tbd.foo.com:57838 (CLOSE_WAIT)
> java 103485 kafkagod 726u IPv4 86509517 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.tbd.foo.com:35258 (CLOSE_WAIT)
> *Controller logs below show that the brokers are unable to connect to each other:*
[2019-02-27 03:39:16,135] WARN [RequestSendThread controllerId=1] Controller > 1's connection to broker mwkafka-prod-02.dr.foo.com:9092 (id: 4 rack: > dr.foo.com) was unsuccessful (kafka.controller.RequestSendThread) > java.net.SocketTimeoutException: Failed to connect within 30000 ms > at > kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > [2019-02-27 03:39:20,915] WARN [RequestSendThread controllerId=1] Controller > 1's connection to broker mwkafka-prod-02.tbd.foo.com:9092 (id: 6 rack: > tbd.foo.com) was unsuccessful (kafka.controller.RequestSendThread) > java.net.SocketTimeoutException: Failed to connect within 30000 ms > at > kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > [2019-02-27 03:39:25,839] WARN [RequestSendThread controllerId=1] Controller > 1's connection to broker mwkafka-prod-01.dr.foo.com:9092 (id: 3 rack: > dr.foo.com) was unsuccessful (kafka.controller.RequestSendThread) > java.net.SocketTimeoutException: Failed to connect within 30000 ms > at > kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > [2019-02-27 03:39:31,371] WARN [RequestSendThread controllerId=1] Controller > 1's connection to broker mwkafka-prod-01.tbd.foo.com:9092 (id: 5 rack: > tbd.foo.com) was unsuccessful (kafka.controller.RequestSendThread) > java.net.SocketTimeoutException: Failed to connect within 30000 ms > at > kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280) > at > 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > [2019-02-27 03:39:40,440] WARN [RequestSendThread controllerId=1] Controller > 1's connection to broker mwkafka-prod-01.nyc.foo.com:9092 (id: 1 rack: > nyc.foo.com) was unsuccessful (kafka.controller.RequestSendThread) > java.net.SocketTimeoutException: Failed to connect within 30000 ms > at > kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > [2019-02-27 03:39:42,149] WARN [RequestSendThread controllerId=1] Controller > 1's connection to broker mwkafka-prod-02.nyc.foo.com:9092 (id: 2 rack: > nyc.foo.com) was unsuccessful (kafka.controller.RequestSendThread) > java.net.SocketTimeoutException: Failed to connect within 30000 ms > at > kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:280) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:233) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > *Client consumer logs: consumer is not able to connect to the brokers* > [2019-02-27 03:37:37,587] DEBUG [Consumer clientId=test_con, groupId=chow] > Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to > node -6 (org.apache.kafka.common.network.Selector) > [2019-02-27 03:37:37,587] DEBUG Set SASL client state to > RECEIVE_APIVERSIONS_RESPONSE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:37:37,587] DEBUG [Consumer clientId=test_con, groupId=chow] > Completed connection to node -6. Fetching API versions. 
> (org.apache.kafka.clients.NetworkClient) > [2019-02-27 03:38:02,488] DEBUG [Consumer clientId=test_con, groupId=chow] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:38:02,488] DEBUG [Consumer clientId=test_con, groupId=chow] > Sending FindCoordinator request to broker mwkafka-prod-02.tbd.deshaw.com:9092 > (id: -6 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:38:08,618] DEBUG Set SASL client state to INITIAL > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:38:08,619] DEBUG Set SASL client state to INTERMEDIATE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:38:29,288] DEBUG Set SASL client state to > SEND_HANDSHAKE_REQUEST > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:38:29,288] DEBUG Set SASL client state to > RECEIVE_HANDSHAKE_RESPONSE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:38:32,493] DEBUG [Consumer clientId=test_con, groupId=chow] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:38:32,493] DEBUG [Consumer clientId=test_con, groupId=chow] > Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 > (id: -4 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:39:02,498] DEBUG [Consumer clientId=test_con, groupId=chow] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:39:02,498] DEBUG [Consumer clientId=test_con, groupId=chow] > Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 > (id: -3 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 
03:39:32,501] DEBUG [Consumer clientId=test_con, groupId=chow] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:39:32,501] DEBUG [Consumer clientId=test_con, groupId=chow] > Sending FindCoordinator request to broker mwkafka-prod-01.nyc.deshaw.com:9092 > (id: -1 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:39:33,050] DEBUG Set SASL client state to > SEND_HANDSHAKE_REQUEST > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:39:33,051] DEBUG Set SASL client state to > RECEIVE_HANDSHAKE_RESPONSE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:39:34,683] DEBUG Set SASL client state to CLIENT_COMPLETE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:39:41,887] DEBUG Set SASL client state to CLIENT_COMPLETE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:40:02,503] DEBUG [Consumer clientId=test_con, groupId=chow] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:40:02,503] DEBUG [Consumer clientId=test_con, groupId=chow] > Sending FindCoordinator request to broker mwkafka-prod-02.tbd.deshaw.com:9092 > (id: -6 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:40:31,269] DEBUG Set SASL client state to COMPLETE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) > [2019-02-27 03:40:31,269] DEBUG [Consumer clientId=test_con, groupId=chow] > Initiating API versions fetch from node -1. 
> (org.apache.kafka.clients.NetworkClient) > [2019-02-27 03:40:32,507] DEBUG [Consumer clientId=test_con, groupId=chow] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:40:32,507] DEBUG [Consumer clientId=test_con, groupId=chow] > Sending FindCoordinator request to broker mwkafka-prod-02.dr.deshaw.com:9092 > (id: -4 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2019-02-27 03:40:42,438] DEBUG Set SASL client state to COMPLETE > (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
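
For anyone comparing brokers, it may help to tally the CLOSE_WAIT sockets from the lsof output above by remote host, to see whether the stuck connections come mostly from one peer or from all of them. The following is only a rough sketch: the function name close_wait_by_peer and the regular expression are my own assumptions, not anything from Kafka, and the sample lines are three entries copied from the lsof output in the report.

```python
import re
from collections import Counter

# Matches the "local:port->remote:port (STATE)" tail of an lsof TCP line.
# This pattern is an assumption based on the lsof lines quoted in the report.
CONN = re.compile(r"(\S+):(\d+)->(\S+):(\d+)\s+\((\w+)\)")


def close_wait_by_peer(lsof_text):
    """Count CLOSE_WAIT sockets per remote host in `lsof -P` output (hypothetical helper)."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = CONN.search(line)
        if match and match.group(5) == "CLOSE_WAIT":
            counts[match.group(3)] += 1  # group 3 is the remote host name
    return counts


# Three lines taken from the lsof output in the report, with wrapping undone.
SAMPLE = """\
java 103485 kafkagod 635u IPv4 86522305 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:46014 (CLOSE_WAIT)
java 103485 kafkagod 642u IPv4 86519057 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-01.dr.foo.com:47424 (CLOSE_WAIT)
java 103485 kafkagod 688u IPv4 86522176 0t0 TCP mwkafka-prod-01.nyc.foo.com:9092->mwkafka-prod-02.nyc.foo.com:45966 (CLOSE_WAIT)
"""

if __name__ == "__main__":
    for host, n in close_wait_by_peer(SAMPLE).most_common():
        print(n, host)
```

To run it against a live broker, the text could come from `lsof -P -p <pid>` saved to a file and read in place of SAMPLE. A roughly even spread across peers would point at the local broker rather than at one misbehaving remote host.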