We have a 6-broker cluster running in AWS across 3 availability zones. A few times while under light load (roughly 40k messages/second) we have seen a replica request a message from the leader at an offset that is slightly in the future, usually 3-6 messages past the end of the leader's log. When this happens the replica throws an error, deletes all of its data for that partition, and resyncs from the beginning of the leader's log. Given that the offset difference is so small I suspect a latency/timing issue, but I am uncertain what to tweak. Thank you in advance for any assistance!

Leader logs:

[2015-04-15 02:07:21,328] ERROR [Replica Manager on Broker 2]: Error when processing fetch request for partition [xxx.prod,1] offset 127413332 from follower with correlation id 35310725. Possible cause: Request for offset 127413332 but we only have log segments in the range 429569 to 127413328. (kafka.server.ReplicaManager)
[2015-04-15 02:07:23,593] INFO Partition [xxx.prod,1] on broker 2: Shrinking ISR for partition [xxx.prod,1] from 2,6 to 2 (kafka.cluster.Partition)

Follower logs:

...
[2015-04-15 02:08:02,085] INFO Scheduling log segment 124662576 for log xxx.prod-1 for deletion. (kafka.log.Log)
[2015-04-15 02:08:02,086] INFO Scheduling log segment 126360465 for log xxx.prod-1 for deletion. (kafka.log.Log)
[2015-04-15 02:08:02,121] WARN [ReplicaFetcherThread-3-2], Replica 6 for partition [xxx.prod,1] reset its fetch offset from 429569 to current leader 2's start offset 429569 (kafka.server.ReplicaFetcherThread)
[2015-04-15 02:08:02,131] ERROR [ReplicaFetcherThread-3-2], Current offset 127413332 for partition [xxx.prod,1] out of range; reset offset to 429569 (kafka.server.ReplicaFetcherThread)

Relevant config:

num.network.threads=8
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
default.replication.factor=2
num.replica.fetchers=4
replica.fetch.max.bytes=1048576
replica.fetch.wait.max.ms=3000
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
replica.lag.max.messages=4000
controller.socket.timeout.ms=30000
controller.message.queue.size=100000
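
For reference, the size of the gap can be read straight off the leader's error line: the requested offset minus the upper end of the range the leader reports. Below is a minimal Python sketch of that arithmetic. The helper name offset_gap and the regular expression are my own and are written against the log text quoted in this post; they are not part of Kafka and may not match the wording in other versions.

```python
import re

# Matches the "Possible cause" text from the leader-side error quoted above.
# Assumption: the wording is exactly as it appears in this post.
PATTERN = re.compile(
    r"Request for offset (?P<requested>\d+) but we only have log segments "
    r"in the range (?P<start>\d+) to (?P<end>\d+)"
)


def offset_gap(log_line):
    """Return requested offset minus the leader's reported range end.

    Returns None if the line does not contain the expected message.
    """
    match = PATTERN.search(log_line)
    if match is None:
        return None
    return int(match.group("requested")) - int(match.group("end"))


line = (
    "Possible cause: Request for offset 127413332 but we only have "
    "log segments in the range 429569 to 127413328. (kafka.server.ReplicaManager)"
)
print(offset_gap(line))  # 4
```

For the error above this gives a gap of 4 messages, consistent with the 3-6 range I have been seeing.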