For what it's worth, shutting down the entire cluster and then restarting
it did address this issue.

I'd love anyone's thoughts on what the "correct" fix would be here.
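
(For reference, the other option I was weighing, per the unclean leader
election question in my earlier mail, was temporarily enabling unclean leader
election on just the affected topics and turning it back off once leaders were
re-elected. If I'm reading the per-topic config docs right, that would be
something along the lines of:

% kafka-configs.sh --zookeeper $ZK_HP --entity-type topics \
    --entity-name __consumer_offsets --alter \
    --add-config unclean.leader.election.enable=true

followed later by --delete-config unclean.leader.election.enable to revert it.
I never got to test whether that alone would have brought the leaders back, so
treat it as an untested sketch.)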

On Fri, Apr 28, 2017 at 10:58 AM, James Brown <jbr...@easypost.com> wrote:

> The following is also appearing in the logs a lot, if anyone has any ideas:
>
> INFO Partition [easypost.syslog,7] on broker 1: Cached zkVersion [647] not
> equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
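>
> (For context, the partition state that the zkVersion check compares against
> can be read straight out of ZooKeeper; I believe something like this works,
> using the zookeeper-shell.sh that ships with Kafka:
>
> % zookeeper-shell.sh $ZK_HP get /brokers/topics/easypost.syslog/partitions/7/state
>
> which should print the leader, leader_epoch, and isr JSON for that partition
> plus the znode's version in the stat output.)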
>
> On Fri, Apr 28, 2017 at 10:43 AM, James Brown <jbr...@easypost.com> wrote:
>
>> We're running 0.10.1.0 on a five-node cluster.
>>
>> I was in the process of migrating some topics from two replicas to three
>> replicas when two of the five machines in this cluster crashed (brokers 2
>> and 3).
>>
>> After restarting them, all of the partitions that were previously assigned
>> to them are unavailable and showing "Leader: -1".
>>
>> Example kafka-topics output:
>>
>> % kafka-topics.sh --zookeeper $ZK_HP --describe  --unavailable-partitions
>> Topic: __consumer_offsets Partition: 9 Leader: -1 Replicas: 3,2,4 Isr:
>> Topic: __consumer_offsets Partition: 13 Leader: -1 Replicas: 3,2,4 Isr:
>> Topic: __consumer_offsets Partition: 17 Leader: -1 Replicas: 3,2,5 Isr:
>> Topic: __consumer_offsets Partition: 23 Leader: -1 Replicas: 5,2,1 Isr:
>> Topic: __consumer_offsets Partition: 25 Leader: -1 Replicas: 3,2,5 Isr:
>> Topic: __consumer_offsets Partition: 26 Leader: -1 Replicas: 3,2,1 Isr:
>> Topic: __consumer_offsets Partition: 30 Leader: -1 Replicas: 3,1,2 Isr:
>> Topic: __consumer_offsets Partition: 33 Leader: -1 Replicas: 1,2,4 Isr:
>> Topic: __consumer_offsets Partition: 35 Leader: -1 Replicas: 1,2,5 Isr:
>> Topic: __consumer_offsets Partition: 39 Leader: -1 Replicas: 3,1,2 Isr:
>> Topic: __consumer_offsets Partition: 40 Leader: -1 Replicas: 3,4,2 Isr:
>> Topic: __consumer_offsets Partition: 44 Leader: -1 Replicas: 3,1,2 Isr:
>> Topic: __consumer_offsets Partition: 45 Leader: -1 Replicas: 1,3,2 Isr:
>>
>> Note that I wasn't even moving any of the __consumer_offsets partitions,
>> so I'm not sure whether the in-progress reassignment is actually related
>> or just a red herring.
>>
>> The logs are full of
>>
>> ERROR [ReplicaFetcherThread-0-3], Error for partition [tracking.syslog,2]
>> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
>> server experienced an unexpected error when processing the request
>> (kafka.server.ReplicaFetcherThread)
>> ERROR [ReplicaFetcherThread-0-3], Error for partition [tracking.syslog,2]
>> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
>> server experienced an unexpected error when processing the request
>> (kafka.server.ReplicaFetcherThread)
>> ERROR [ReplicaFetcherThread-0-3], Error for partition [epostg.request_log_v1,0]
>> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
>> server experienced an unexpected error when processing the request
>> (kafka.server.ReplicaFetcherThread)
>> ERROR [ReplicaFetcherThread-0-3], Error for partition [epostg.request_log_v1,0]
>> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
>> server experienced an unexpected error when processing the request
>> (kafka.server.ReplicaFetcherThread)
>>
>> What can I do to fix this? Should I manually reassign every partition that
>> was led by broker 2 or 3 so that its replica set contains only the third,
>> surviving broker from its current replica set? Do I need to temporarily
>> enable unclean leader election?
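>>
>> If manual reassignment turns out to be the right approach, I'm assuming the
>> input to kafka-reassign-partitions.sh would look roughly like the following
>> (a throwaway file name, using __consumer_offsets partition 9 as the example
>> and shrinking it to broker 4, the one replica of that partition that didn't
>> crash; corrections welcome):
>>
>> % cat shrink.json
>> {"version":1,"partitions":[
>>   {"topic":"__consumer_offsets","partition":9,"replicas":[4]}
>> ]}
>> % kafka-reassign-partitions.sh --zookeeper $ZK_HP \
>>     --reassignment-json-file shrink.json --execute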
>>
>> I've never seen a cluster fail this way...
>>
>> --
>> James Brown
>> Engineer
>>
>
>
>
> --
> James Brown
> Engineer
>



-- 
James Brown
Engineer
