Hello,

We are seeing an issue during rolling restarts where the replicas of a few partitions start lagging and never catch up. The log files for these partitions appear to be the same size on all the brokers, including the ones hosting the lagging replicas. The FailedPartitionsCount metric stays at 0, but the replicas remain stuck in that state until we manually either reassign the partitions or re-elect the leader. Some of the affected partitions don't even receive any data during the rolling reboots. These partitions have min.insync.replicas set to 1, but even so, shouldn't the replicas eventually catch up to the leader? As far as I can tell, ReplicaFetcherThread simply stopped fetching for those partitions.
Has anyone seen a similar issue?

Thanks,
Maruthi