Re: Networking errors and durability settings

2016-08-26 Thread Bryan Baugher
Yes, it's quite likely we saw many ZK session losses for the brokers around the same time. I'll keep an eye on that JIRA and let you know if we come up with anything else. On Fri, Aug 26, 2016 at 11:44 AM Jun Rao wrote: > Bryan, > > Were there multiple brokers losing ZK session

Re: Networking errors and durability settings

2016-08-26 Thread Jun Rao
Bryan, Were there multiple brokers losing ZK session around the same time? There is one known issue: https://issues.apache.org/jira/browse/KAFKA-1211. Basically, if the leader changes too quickly, it's possible for a follower to truncate some previously committed messages and then immediately

Re: Networking errors and durability settings

2016-08-26 Thread Bryan Baugher
We didn't suffer any data loss nor was there any power outage that I know of. On Fri, Aug 26, 2016 at 5:14 AM Khurrum Nasim wrote: > On Tue, Aug 23, 2016 at 9:00 AM, Bryan Baugher wrote: > > > > > Hi everyone, > > > > > > Yesterday we had lots of

Re: Networking errors and durability settings

2016-08-26 Thread Khurrum Nasim
On Tue, Aug 23, 2016 at 9:00 AM, Bryan Baugher wrote: > > > Hi everyone, > > > > Yesterday we had lots of network failures running our Kafka cluster > > (0.9.0.1 ~40 nodes). We run everything using the higher durability > settings > > in order to avoid any data loss, producers

Re: Networking errors and durability settings

2016-08-25 Thread Jun Rao
Bryan, https://issues.apache.org/jira/browse/KAFKA-3410 reported a similar issue, but it only happened when the leader broker's log was manually deleted. In your case, was there any data loss in the broker due to things like a power outage? Thanks, Jun On Tue, Aug 23, 2016 at 9:00 AM, Bryan Baugher

Re: Networking errors and durability settings

2016-08-25 Thread Guozhang Wang
Hello Bryan, I think you were encountering https://issues.apache.org/jira/browse/KAFKA-3410. Maybe you can take a look at this ticket and see if it matches your scenario. Guozhang On Tue, Aug 23, 2016 at 9:00 AM, Bryan Baugher wrote: > Hi everyone, > > Yesterday we had lots

Networking errors and durability settings

2016-08-23 Thread Bryan Baugher
Hi everyone, Yesterday we had lots of network failures running our Kafka cluster (0.9.0.1 ~40 nodes). We run everything using the higher durability settings in order to avoid any data loss: producers use acks=all (-1), topics/brokers have min insync replicas = 2, unclean leader election = false, and
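
For readers following the thread, here is a rough Python sketch of the durability settings named in the original message. The keys `acks`, `min.insync.replicas`, and `unclean.leader.election.enable` are the standard Kafka configuration names. The dictionaries and the `write_accepted` helper are hypothetical, added only for illustration: they are a simplified model of how the broker treats a produce request, not the poster's actual configuration or Kafka's implementation.

```python
# Settings as described in the message. The dict names are illustrative only.
PRODUCER_CONFIG = {"acks": "all"}  # "all" and "-1" mean the same thing

TOPIC_OR_BROKER_CONFIG = {
    "min.insync.replicas": 2,
    "unclean.leader.election.enable": False,
}


def write_accepted(isr_size, producer=PRODUCER_CONFIG, topic=TOPIC_OR_BROKER_CONFIG):
    """Hypothetical helper: simplified check of whether a produce request
    would be accepted, given the current number of in-sync replicas (ISR).

    With acks=all, the leader rejects the write when the ISR is smaller
    than min.insync.replicas. With other acks values that minimum is not
    enforced, so only a live leader (ISR of at least 1) is needed.
    """
    if str(producer["acks"]) in ("all", "-1"):
        return isr_size >= topic["min.insync.replicas"]
    return isr_size >= 1


if __name__ == "__main__":
    for isr in (3, 2, 1):
        print(f"ISR size {isr}: accepted={write_accepted(isr)}")
```

Under this model, a partition whose ISR has shrunk to a single replica refuses acks=all writes rather than accepting data that exists on only one broker.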