Hello James,

We received this exact same error this past Tuesday (we are on 0.8.2).  To
answer at least one of your bullet points -- this is a valid scenario. We
had the same questions, and I'm starting to think this is a bug. Thank you
for the reproduction steps!

I looked over the release notes to see whether newer versions include a
fix -- this bug fix looked the most related:
https://issues.apache.org/jira/browse/KAFKA-2143
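
For what it's worth, here is how I read the check behind that FATAL
message, written as a small Python sketch. This is a toy model based only
on the error text and the source comment quoted below, not Kafka's actual
code. The function name, the exception, and the simplified "normal" branch
are all my own assumptions.

```python
class LogTruncationNotAllowed(Exception):
    """Hypothetical stand-in for the broker halting with the FATAL message."""


def handle_offset_out_of_range(leader_latest_offset, replica_latest_offset,
                               unclean_election_enabled):
    """Return the offset the follower would resume fetching from.

    Toy model of the follower-side decision; names are illustrative only.
    """
    if leader_latest_offset < replica_latest_offset:
        if not unclean_election_enabled:
            # The follower is ahead of the leader and truncation is disallowed.
            raise LogTruncationNotAllowed(
                "Current leader's latest offset %d is less than replica's "
                "latest offset %d"
                % (leader_latest_offset, replica_latest_offset))
        # Truncation is allowed: drop the extra records and follow the leader.
        return leader_latest_offset
    # Simplified normal case: the follower is not ahead, nothing to truncate.
    return replica_latest_offset
```

With the numbers from the log line (leader at 0, replica at 151) and
unclean election disabled, this model raises instead of truncating, which
would match the halt we both saw.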

Thank you,

Tony

On Thu, Feb 25, 2016 at 3:46 PM, James Cheng <jch...@tivo.com> wrote:

> Hi,
>
> I ran into a scenario where one of my brokers would continually shut down,
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting
> because log truncation is not allowed for topic test, Current leader 1's
> latest offset 0 is less than replica 2's latest offset 151
> (kafka.server.ReplicaFetcherThread)
>
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
>
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
>
> 5. At this point, both brokers are in the ISR. Broker1 is the partition
> leader.
>
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2
> gets dropped out of ISR. Broker1 is still the leader. I can still write
> data to the partition.
>
> 7. Shut down broker1. Hard or controlled, doesn't matter.
>
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement
> or full hardware replacement)
>
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed
> because broker1 is down. At this point, the partition is offline. Can't
> write to it.
>
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2
> attempts to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting
> because log truncation is not allowed for topic test, Current leader 1's
> latest offset 0 is less than replica 2's latest offset 151
> (kafka.server.ReplicaFetcherThread)
>
> I am able to recover by setting unclean.leader.election.enable=true on my
> brokers.
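
[Inline note] Steps 5 through 10 above can be walked through with a toy
model. This is only a sketch under one assumption of mine, which may be
wrong: that with unclean election disabled, only brokers still listed in
the ISR are eligible to lead. The elect_leader function and the state
variables are hypothetical; the offsets are borrowed from the error message.

```python
def elect_leader(isr, live_brokers, all_replicas, unclean_election_enabled):
    """Pick a leader for one partition; None means the partition is offline."""
    for broker in isr:
        if broker in live_brokers:
            return broker
    if unclean_election_enabled:
        # Fall back to any live replica, accepting possible data loss.
        for broker in all_replicas:
            if broker in live_brokers:
                return broker
    return None


replicas = [1, 2]
log_end = {1: 151, 2: 151}   # steps 1-5: both brokers hold the same data
isr = [1, 2]

isr = [1]                    # step 6: broker2 is paused and leaves the ISR
live = {2}                   # steps 7 and 9: broker1 is down, broker2 resumed
log_end[1] = 0               # step 8: broker1's log directory is wiped
leader_step9 = elect_leader(isr, live, replicas, False)

live = {1, 2}                # step 10: broker1 is restarted
leader_step10 = elect_leader(isr, live, replicas, False)
```

In this model the partition has no leader at step 9, and at step 10
broker1 is chosen because it is the only ISR member, even though its log
is empty while broker2 still holds offset 151.
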
>
> I'm trying to understand a couple things:
> * Is my scenario a valid supported one, or is it along the lines of "don't
> ever do that"?
> * In step 10, why is broker1 allowed to resume leadership even though it
> has no data?
> * In step 10, why is it necessary to stop the entire broker due to one
> partition that is in this state? Wouldn't it be possible for the broker to
> continue to serve traffic for all the other topics, and just mark this one
> as unavailable?
> * Would it make sense to allow an operator to manually specify which
> broker they want to become the new master? This would give me more control
> over how much data loss I am willing to handle. In this case, I would want
> broker2 to become the new master. Or, is that possible and I just don't
> know how to do it?
> * Would it be possible to make unclean.leader.election.enable a
> per-topic configuration? This would let me control how much data loss I am
> willing to handle.
>
> Btw, the comment in the source code for that error message indicates:
>
> https://github.com/apache/kafka/blob/01aeea7c7bca34f1edce40116b7721335938b13b/core/src/main/scala/kafka/server/ReplicaFetcherThread.scala#L164-L166
>
>       // Prior to truncating the follower's log, ensure that doing so is
>       // not disallowed by the configuration for unclean leader election.
>       // This situation could only happen if the unclean election
>       // configuration for a topic changes while a replica is down. Otherwise,
>       // we should never encounter this situation since a non-ISR leader
>       // cannot be elected if disallowed by the broker configuration.
>
> But I don't believe that happened. I never changed the configuration. But
> I did venture into "unclean leader election" territory, so I'm not sure if
> the comment still applies.
>
> Thanks,
> -James
