Jason Gustafson created KAFKA-5490:
--------------------------------------

             Summary: Deletion of tombstones during cleaning should consider 
idempotent message retention
                 Key: KAFKA-5490
                 URL: https://issues.apache.org/jira/browse/KAFKA-5490
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Jason Gustafson
            Assignee: Jason Gustafson
            Priority: Critical
             Fix For: 0.11.0.1


The LogCleaner always preserves the message containing the last sequence 
number from a given ProducerId when doing a round of cleaning. This is 
necessary to ensure that the producer is not prematurely evicted, which would 
cause an OutOfOrderSequenceException. The problem with this approach is that 
the preserved message won't be considered again for cleaning until a new 
message with the same key is written to the topic. In general this could 
result in an accumulation of stale entries in the log, but the bigger problem 
is that the newer entry with the same key could be a tombstone. If we end up 
deleting this tombstone before a new record with the same key is written, then 
the old entry will resurface. For example, consider the following sequence of 
writes:

1. ProducerId=1, Key=A, Value=1
2. ProducerId=2, Key=A, Value=null (tombstone)

We will preserve the first entry until a new record with Key=A is written AND 
either ProducerId 1 has written a newer record with a larger sequence number or 
ProducerId 1 has expired. As long as the tombstone is 
preserved, there is no correctness violation: a consumer reading from the 
beginning will ignore the first entry after reading the tombstone. But it is 
possible that the tombstone entry will be removed from the log before a new 
record with Key=A is written. If that happens, then a consumer reading from the 
beginning would incorrectly observe the overwritten value.
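
The scenario above can be reproduced with a small toy model. This is only an 
illustrative sketch, not the actual LogCleaner code: the Record type, the 
clean() and replay() functions, and the tombstones_expired and 
expired_producers parameters are all hypothetical names introduced here. It 
assumes that the tombstone becomes eligible for removal once its retention 
has elapsed and ProducerId 2 no longer needs it to hold its last sequence 
number, while ProducerId 1 is still active.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    producer_id: int
    sequence: int
    key: str
    value: Optional[str]  # None marks a tombstone


def clean(log, tombstones_expired=False, expired_producers=frozenset()):
    """Toy cleaning pass (hypothetical, not Kafka's implementation).

    Retains a record if it is the latest one for its key, or if it carries
    the last sequence number of a producer that has not expired.
    """
    latest_offset = {}
    last_seq = {}
    for r in log:
        latest_offset[r.key] = max(latest_offset.get(r.key, -1), r.offset)
        last_seq[r.producer_id] = max(last_seq.get(r.producer_id, -1), r.sequence)

    retained = []
    for r in log:
        pins_producer = (
            r.producer_id not in expired_producers
            and last_seq[r.producer_id] == r.sequence
        )
        is_latest = latest_offset[r.key] == r.offset
        if r.value is None and tombstones_expired:
            # Assumption: an expired tombstone is no longer protected
            # merely by being the latest record for its key.
            is_latest = False
        if is_latest or pins_producer:
            retained.append(r)
    return retained


def replay(log):
    """State a consumer would build by reading the log from the beginning."""
    state = {}
    for r in log:
        if r.value is None:
            state.pop(r.key, None)
        else:
            state[r.key] = r.value
    return state


log = [
    Record(offset=0, producer_id=1, sequence=0, key="A", value="1"),
    Record(offset=1, producer_id=2, sequence=0, key="A", value=None),
]

# First pass: the tombstone is still retained, so Key=A reads as deleted.
first_pass = clean(log)

# Later pass: the tombstone is removed, but the first entry is still kept
# because it holds the last sequence number of ProducerId 1.
second_pass = clean(first_pass, tombstones_expired=True, expired_producers={2})
```

In this model, replay(first_pass) contains no entry for Key=A, whereas 
replay(second_pass) contains Key=A with Value=1 again, which is the 
resurfacing described above.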



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
