Jason Gustafson created KAFKA-5490:
--------------------------------------

             Summary: Deletion of tombstones during cleaning should consider idempotent message retention
                 Key: KAFKA-5490
                 URL: https://issues.apache.org/jira/browse/KAFKA-5490
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Jason Gustafson
            Assignee: Jason Gustafson
            Priority: Critical
             Fix For: 0.11.0.1

The LogCleaner always preserves the message containing the last sequence number from a given ProducerId when doing a round of cleaning. This is necessary to ensure that the producer is not prematurely evicted, which would cause an OutOfOrderSequenceException. The problem with this approach is that the preserved message won't be considered again for cleaning until a new message with the same key is written to the topic. In general this could result in an accumulation of stale entries in the log, but the bigger problem is that the newer entry with the same key could be a tombstone. If we end up deleting this tombstone before a new record with the same key is written, then the old entry will resurface.

For example, consider the following sequence of writes:

1. ProducerId=1, Key=A, Value=1
2. ProducerId=2, Key=A, Value=null (tombstone)

We will preserve the first entry until a new record with Key=A is written AND either ProducerId 1 has written a newer record with a larger sequence number or ProducerId 1 has expired. As long as the tombstone is preserved, there is no correctness violation: a consumer reading from the beginning will ignore the first entry after reading the tombstone. But it is possible that the tombstone will be removed from the log before a new record with Key=A is written. If that happens, a consumer reading from the beginning would incorrectly observe the overwritten value.
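
The failure mode can be reproduced with a small simulation. This is only a rough sketch of the behavior described above, not Kafka's LogCleaner: the Record type, clean() and read_from_beginning() are hypothetical names, the drop_tombstones flag stands in for the tombstone retention window having elapsed, and a third write (ProducerId=2, Key=B) is added on the assumption that the tombstone is then no longer the last sequence for ProducerId 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int
    producer_id: int
    sequence: int
    key: str
    value: Optional[str]  # None marks a tombstone


def clean(log: List[Record], drop_tombstones: bool) -> List[Record]:
    """One simplified cleaning pass (hypothetical model, not Kafka's code)."""
    latest_for_key: Dict[str, int] = {}
    last_for_producer: Dict[int, int] = {}
    for r in log:
        latest_for_key[r.key] = r.offset
        last_for_producer[r.producer_id] = r.offset

    kept = []
    for r in log:
        is_latest = latest_for_key[r.key] == r.offset
        is_last_seq = last_for_producer[r.producer_id] == r.offset
        expired_tombstone = r.value is None and drop_tombstones
        if is_last_seq:
            # Always preserved so the producer's sequence state survives.
            kept.append(r)
        elif is_latest and not expired_tombstone:
            kept.append(r)
    return kept


def read_from_beginning(log: List[Record]) -> Dict[str, str]:
    """Materialize the key/value view a fresh consumer would build."""
    view: Dict[str, str] = {}
    for r in log:
        if r.value is None:
            view.pop(r.key, None)
        else:
            view[r.key] = r.value
    return view


log = [
    Record(offset=0, producer_id=1, sequence=0, key="A", value="1"),
    Record(offset=1, producer_id=2, sequence=0, key="A", value=None),
    # Extra write assumed for this sketch only.
    Record(offset=2, producer_id=2, sequence=1, key="B", value="2"),
]

after_first = clean(log, drop_tombstones=False)
after_second = clean(after_first, drop_tombstones=True)

print(read_from_beginning(after_first))   # {'B': '2'}
print(read_from_beginning(after_second))  # {'A': '1', 'B': '2'}
```

After the first pass the tombstone still shadows the preserved entry, so a consumer sees no value for Key=A. After the second pass the tombstone is gone while the entry from ProducerId 1 is still retained as its last sequence, so Value=1 reappears.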