Hi All,

In our 3-node test cluster running Kafka 0.10.0, we faced this error:

FATAL [2017-07-06 07:30:42,962]
kafka.server.ReplicaFetcherThread:[Logging$class:fatal:110] -
[ReplicaFetcherThread-0-0] - [ReplicaFetcherThread-0-0], Halting because
log truncation is not allowed for topic Topic3, Current leader 0's latest
offset 41170020 is less than replica 3's latest offset 41170083
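
Reading the message literally, the follower (replica 3) holds more offsets than the current leader (0), and because unclean leader election is disabled the fetcher thread halts rather than truncating its log. Below is a minimal sketch of that decision as I understand it. It is a simplified model, not Kafka's actual code, and the function name and return values are made up for illustration:

```python
def follower_action(leader_end_offset, replica_end_offset, unclean_election_enabled):
    """Toy model of what a follower does when it compares its log end
    offset with the leader's. Hypothetical helper, not a Kafka API."""
    if replica_end_offset <= leader_end_offset:
        # Follower is behind or level with the leader: keep fetching.
        return "fetch"
    if unclean_election_enabled:
        # Follower is ahead, but data loss is tolerated: truncate to the leader.
        return "truncate"
    # Follower is ahead and truncation is not allowed: stop the broker.
    return "halt"


# Offsets taken from the log line above.
leader_offset = 41170020
replica_offset = 41170083
messages_at_risk = replica_offset - leader_offset
action = follower_action(leader_offset, replica_offset, unclean_election_enabled=False)
```

With the offsets from the log, the follower is 63 offsets ahead of the leader, which is what truncation would discard.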

The Kafka cluster is configured with:
replication_factor: 3, min_isr: 2, and unclean_leader_election: disabled

There were some machine issues, during which node 1 crashed and rejoined after
about 30 seconds. Ideally, since min_isr is set to 2, another node should
have taken over, but for some reason the ISR for some of the topic partitions
consisted of only node 1 just before it crashed.
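
For context, as far as I understand it, min_isr (min.insync.replicas) does not stop the ISR from shrinking. It only makes the broker reject writes sent with acks=all while the ISR is smaller than the minimum. A toy model of that rule follows; the function name is hypothetical and this is not broker code:

```python
def produce_allowed(isr, min_isr, acks):
    """Toy model: would a produce request be accepted?
    Only acks='all' requests are checked against min_isr."""
    if acks == "all":
        return len(isr) >= min_isr
    # Requests with acks=0 or acks=1 are not gated by min_isr in this model.
    return True


# The situation described above: ISR had shrunk to node 1 only.
isr_before_crash = {1}
accepted_all = produce_allowed(isr_before_crash, min_isr=2, acks="all")
accepted_one = produce_allowed(isr_before_crash, min_isr=2, acks="1")
```

Under this model, a single-node ISR blocks acks=all producers but still accepts acks=1 writes, so the lone ISR member can get ahead of the other replicas.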

It appears similar to issues described in:
https://issues.apache.org/jira/browse/KAFKA-3861
https://issues.apache.org/jira/browse/KAFKA-3410

What I wanted to know is:

(a) How should such errors be handled? ISR size is determined dynamically, and
it is quite possible that a node in trouble (for example, one hit by a network
disruption just before crashing) will shrink its ISR to itself.
(b) Is this issue addressed in any way in future Kafka versions like
0.11.0? Will https://issues.apache.org/jira/browse/KAFKA-1211 prevent this
situation?

--
thanks,
gaurav
