Hello guys, below is the link where the Kafka logs can be seen with TRACE enabled.
https://drive.google.com/file/d/0B-nANlrsm5ogQkh1NUR2UHYtbkU/view?usp=sharing

I have truncated the log because it was very big, but it covers the whole period of the problem.

Scenario:
1) Three Kafka brokers were running, i.e. kafka1 (perhaps it is the controller), kafka2 and kafka3. A Go Sarama producer was producing.
2) kafka3 is killed. The log timestamp is 00:47:26.
3) At timestamp 00:47:34, new leaders are chosen for the partitions that went down.
4) But as you can see, before 00:48:58, when a client sends a metadata fetch request, kafka1 returns stale metadata in the response, although internally it uses the correct metadata. At 00:48:58, Kafka receives some trigger, after which it starts returning correct metadata to the client.

Kindly go through the log and reply if I am missing anything.

On Fri, Jun 3, 2016 at 6:36 AM, Christian <engr...@gmail.com> wrote:

> Hi Gerard,
>
> When trying to reproduce this, did you use the go sarama client Safique
> mentioned?
>
>
> On Fri, Jun 3, 2016 at 5:10 AM, Gerard Klijs <gerard.kl...@dizzit.com>
> wrote:
>
> > I assume you use a replication factor of 3 for the topics? When I ran some
> > tests with producers/consumers in a dockerized setup, there were only a few
> > failures before the producer switched to the correct new broker again. I
> > don't know the exact time, but it seemed like a few seconds at most; this
> > was with 0.9.0.0.
> >
> > On Fri, Jun 3, 2016 at 9:00 AM safique ahemad <saf.jnu...@gmail.com>
> > wrote:
> >
> > > Hi Steve,
> > >
> > > There is no way to access that from the public side, so I won't be able
> > > to do that. Sorry for that.
> > > But the steps are quite simple. The only difference is that we have
> > > deployed the Kafka cluster using a Mesos URL.
> > >
> > > 1) Launch a 3-broker Kafka cluster and create a topic with multiple
> > > partitions, at least 3, so that at least one partition lands on each broker.
> > > 2) Launch the consumer/producer client.
> > > 3) Kill a broker.
> > > 4) Just observe the behavior of the producer client.
> > >
> > >
> > > On Thu, Jun 2, 2016 at 8:15 PM, Steve Tian <steve.cs.t...@gmail.com>
> > > wrote:
> > >
> > > > I see. I'm not sure if this is a known issue. Do you mind sharing the
> > > > brokers/topics setup and the steps to reproduce this issue?
> > > >
> > > > Cheers, Steve
> > > >
> > > > On Fri, Jun 3, 2016, 9:45 AM safique ahemad <saf.jnu...@gmail.com>
> > > > wrote:
> > > >
> > > > > You got it right...
> > > > >
> > > > > But DialTimeout is not a concern here. The client tries fetching
> > > > > metadata from the Kafka brokers, but Kafka gives it stale metadata
> > > > > for nearly 30-40 seconds.
> > > > > It tries to fetch 3-4 times in between until it gets updated metadata.
> > > > > This is a completely different problem from
> > > > > https://github.com/Shopify/sarama/issues/661
> > > > >
> > > > >
> > > > > On Thu, Jun 2, 2016 at 6:05 PM, Steve Tian <steve.cs.t...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > So you are coming from https://github.com/Shopify/sarama/issues/661,
> > > > > > right? I'm not sure if anything on the broker side can help, but it
> > > > > > looks like you already found that DialTimeout on the client side can
> > > > > > help?
> > > > > >
> > > > > > Cheers, Steve
> > > > > >
> > > > > > On Fri, Jun 3, 2016, 8:33 AM safique ahemad <saf.jnu...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > kafka version: 0.9.0.0
> > > > > > > go sarama client version: 1.8
> > > > > > >
> > > > > > > On Thu, Jun 2, 2016 at 5:14 PM, Steve Tian <steve.cs.t...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Client version?
> > > > > > > >
> > > > > > > > On Fri, Jun 3, 2016, 4:44 AM safique ahemad <saf.jnu...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > We are using a Kafka broker cluster in our data center.
> > > > > > > > > Recently, we realized that when a Kafka broker goes down, the
> > > > > > > > > client tries to refresh the metadata but gets stale metadata for
> > > > > > > > > up to nearly 30 seconds.
> > > > > > > > >
> > > > > > > > > After about 30-35 seconds, the client obtains updated metadata.
> > > > > > > > > This is a really long time, because the client continuously gets
> > > > > > > > > send failures for that long.
> > > > > > > > >
> > > > > > > > > Kindly reply if any configuration may help here, or if something
> > > > > > > > > else is required.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Safique Ahemad
> > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Regards,
> > > > > > > Safique Ahemad
> > > > > > > GlobalLogic | Leaders in software R&D services
> > > > > > > P :+91 120 4342000-2990 | M:+91 9953533367
> > > > > > > www.globallogic.com
> > > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Regards,
> > > > > Safique Ahemad
> > > > > GlobalLogic | Leaders in software R&D services
> > > > > P :+91 120 4342000-2990 | M:+91 9953533367
> > > > > www.globallogic.com
> > > > >
> > >
> > > --
> > >
> > > Regards,
> > > Safique Ahemad
> > > GlobalLogic | Leaders in software R&D services
> > > P :+91 120 4342000-2990 | M:+91 9953533367
> > > www.globallogic.com
> > >

--

Regards,
Safique Ahemad
GlobalLogic | Leaders in software R&D services
P :+91 120 4342000-2990 | M:+91 9953533367
www.globallogic.com