I dug into this further. It seems the blockingSendAndReceive API hangs for a long time while sending a request to, and receiving the response from, a broker that is not affected.
I just checked the send and receive time: it is taking about 30 seconds.

On Tue, Jun 14, 2016 at 11:03 AM, safique ahemad <saf.jnu...@gmail.com> wrote:

> Guys, any response would be appreciated.
>
> ---------- Forwarded message ----------
> From: safique ahemad <saf.jnu...@gmail.com>
> Date: Thu, Jun 9, 2016 at 11:18 AM
> Subject: Re: Kafka take too long to update the client with metadata when a broker is gone
> To: users@kafka.apache.org
>
> Hello guys,
>
> Below is the link where the Kafka logs can be seen with TRACE enabled
> (attachment: tracesKafka3FailTruncated.log).
>
> https://drive.google.com/file/d/0B-nANlrsm5ogQkh1NUR2UHYtbkU/view?usp=sharing
>
> I have truncated the log as it was very big, but it covers the whole time of the problem.
>
> Scenario:
> 1) There were three Kafka brokers running, i.e. kafka1 (perhaps it is the
> controller), kafka2 and kafka3. A Go Sarama producer was producing.
>
> 2) kafka3 is killed. The log timestamp is 00:47:26.
>
> 3) At timestamp 00:47:34, new leaders are chosen for the down partitions.
>
> 4) But, as you can see, before 00:48:58, when the client sends a metadata fetch
> request, kafka1 gives it stale metadata in response, although internally it
> uses the correct metadata.
> At 00:48:58, Kafka receives some trigger, after which it starts giving correct
> metadata to the client.
>
> Kindly go through the log and revert if I am missing anything.
>
> On Fri, Jun 3, 2016 at 6:36 AM, Christian <engr...@gmail.com> wrote:
>
>> Hi Gerard,
>>
>> When trying to reproduce this, did you use the go sarama client Safique
>> mentioned?
>>
>> On Fri, Jun 3, 2016 at 5:10 AM, Gerard Klijs <gerard.kl...@dizzit.com> wrote:
>>
>> > I assume you use a replication factor of 3 for the topics? When I ran some
>> > tests with producers/consumers in a dockerized setup, there were only a few
>> > failures before the producer switched to the correct new broker again. I
>> > don't know the exact time, but it seemed like a few seconds at most; this was
>> > with 0.9.0.0.
>> >
>> > On Fri, Jun 3, 2016 at 9:00 AM safique ahemad <saf.jnu...@gmail.com> wrote:
>> >
>> > > Hi Steve,
>> > >
>> > > There is no way to access that from the public side, so I won't be able to
>> > > do that. Sorry for that.
>> > > But the steps are quite simple. The only difference is that we have
>> > > deployed the Kafka cluster using a mesos url.
>> > >
>> > > 1) Launch a 3-broker Kafka cluster and create a topic with multiple
>> > > partitions, at least 3, so that at least one partition lands on each broker.
>> > > 2) Launch a consumer/producer client.
>> > > 3) Kill a broker.
>> > > 4) Just observe the behavior of the producer client.
>> > >
>> > > On Thu, Jun 2, 2016 at 8:15 PM, Steve Tian <steve.cs.t...@gmail.com> wrote:
>> > >
>> > > > I see. I'm not sure if this is a known issue. Do you mind sharing the
>> > > > brokers/topics setup and the steps to reproduce this issue?
>> > > >
>> > > > Cheers, Steve
>> > > >
>> > > > On Fri, Jun 3, 2016, 9:45 AM safique ahemad <saf.jnu...@gmail.com> wrote:
>> > > >
>> > > > > You got it right...
>> > > > >
>> > > > > But DialTimeout is not a concern here. The client tries fetching metadata
>> > > > > from the Kafka brokers, but Kafka gives it stale metadata for nearly 30-40 sec.
>> > > > > It tries to fetch 3-4 times in between until it gets updated metadata.
>> > > > > This is a completely different problem from
>> > > > > https://github.com/Shopify/sarama/issues/661
>> > > > >
>> > > > > On Thu, Jun 2, 2016 at 6:05 PM, Steve Tian <steve.cs.t...@gmail.com> wrote:
>> > > > >
>> > > > > > So you are coming from https://github.com/Shopify/sarama/issues/661,
>> > > > > > right? I'm not sure if anything from the broker side can help, but it
>> > > > > > looks like you already found that DialTimeout on the client side can help?
>> > > > > >
>> > > > > > Cheers, Steve
>> > > > > >
>> > > > > > On Fri, Jun 3, 2016, 8:33 AM safique ahemad <saf.jnu...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Kafka version: 0.9.0.0
>> > > > > > > Go sarama client version: 1.8
>> > > > > > >
>> > > > > > > On Thu, Jun 2, 2016 at 5:14 PM, Steve Tian <steve.cs.t...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Client version?
>> > > > > > > >
>> > > > > > > > On Fri, Jun 3, 2016, 4:44 AM safique ahemad <saf.jnu...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Hi All,
>> > > > > > > > >
>> > > > > > > > > We are using a Kafka broker cluster in our data center.
>> > > > > > > > > Recently, we realized that when a Kafka broker goes down, the client
>> > > > > > > > > tries to refresh the metadata but gets stale metadata for up to
>> > > > > > > > > nearly 30 seconds.
>> > > > > > > > >
>> > > > > > > > > After about 30-35 seconds, the client obtains updated metadata. This
>> > > > > > > > > is a really long time, as the client continuously gets send failures
>> > > > > > > > > for so long.
>> > > > > > > > >
>> > > > > > > > > Kindly reply if any configuration may help here, or if something
>> > > > > > > > > else is required.
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > >
>> > > > > > > > > Regards,
>> > > > > > > > > Safique Ahemad

--

Regards,
Safique Ahemad
GlobalLogic | Leaders in software R&D services
P :+91 120 4342000-2990 | M:+91 9953533367
www.globallogic.com