Hey,

We had the same issue as you.

I checked the code: in an unclean leader election, Kafka chooses the first
live replica from the partition's assignment list. If you describe a topic
with kafka-topics, you will see the list of brokers that hold a replica of
each partition, for example [1001, 1002, 1003]. Given that list, Kafka will
choose the first replica in it that is available (online).
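
To make the rule concrete, here is a minimal sketch of the selection as I
understand it. This is only an illustration in Python, not Kafka's actual
code; the function name and the set of live brokers are made up for the
example.

```python
def pick_unclean_leader(assignment, live_brokers):
    """Return the first broker in assignment order that is online.

    assignment:   replica list as shown by kafka-topics, e.g. [1001, 1002, 1003]
    live_brokers: set of broker ids that are currently online (hypothetical input)
    Returns None when no replica of the partition is online.
    """
    for broker_id in assignment:
        if broker_id in live_brokers:
            return broker_id
    return None


if __name__ == "__main__":
    # Leader 1001 is down, both followers are online: 1002 comes first in the list.
    print(pick_unclean_leader([1001, 1002, 1003], {1002, 1003}))
```

So the outcome depends only on the order of the list and on which brokers
are online, not on which replica has the most data.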

We use "acks=all" and "min.insync.replicas=2", so every acknowledged write
has reached at least one follower in addition to the leader. That should
mean that even if the leader is down and the remaining replicas have fallen
out of the ISR, one of the followers still has up-to-date data. You can
compare the log segments of the two followers with kafka-dump-log to see
which one is more up to date. With a partition reassignment you can change
the order of the followers in the assignment list so that this replica comes
first, and then trigger an unclean leader election for the reassigned
partitions. So it seems that this way, assuming "acks=all" and
"min.insync.replicas=2", we can recover without data loss. But only if my
assumption above is correct. And please test this before using it on live
data.
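
For the reordering step, a small sketch of how the reassignment file could
be generated. The topic name "my-topic", partition 0 and the choice of 1003
as the more up-to-date follower are placeholders; the JSON layout is the
one kafka-reassign-partitions reads via --reassignment-json-file. Treat it
as a starting point, not a tested procedure.

```python
import json


def reordered_assignment(replicas, preferred):
    """Move the preferred broker to the front, keeping the others in order."""
    if preferred not in replicas:
        raise ValueError(f"broker {preferred} is not in the assignment {replicas}")
    return [preferred] + [r for r in replicas if r != preferred]


def reassignment_json(topic, partition, replicas):
    """Build the JSON document for a single-partition reassignment."""
    doc = {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": replicas}
        ],
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Placeholder values: 1003 is assumed to be the most up-to-date follower.
    new_order = reordered_assignment([1001, 1002, 1003], preferred=1003)
    print(reassignment_json("my-topic", 0, new_order))
```

The same set of brokers stays assigned, only the order changes, so the
reassignment itself should not need to move any data.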

Peter

On Mon, 28 Jun 2021 at 09:53, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Mon, Jun 21, 2021 at 12:33 PM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
> >
> > In summary: is there a risk of data loss in such a scenario?  Is this
> risk avoidable and if so, what are
> > the prerequisites?
>
> Apologies if I messed up line breaks and that made reading harder. O:-)
>
> The question boils down to: is replica selection completely random in case
> of unclean leader election or not?
>
>
> Regards,
> --
> Alex
>
