Matthias J. Sax created KAFKA-13499:
---------------------------------------

             Summary: Avoid restoring outdated records
                 Key: KAFKA-13499
                 URL: https://issues.apache.org/jira/browse/KAFKA-13499
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Matthias J. Sax


Kafka Streams has the config `windowstore.changelog.additional.retention.ms` to 
allow for an increased retention time on window store changelog topics.
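
For reference, a minimal sketch of how this config might be set. The class and 
method names are hypothetical and the 6h value is only an illustration; a real 
application would also need `application.id`, `bootstrap.servers`, and so on.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper, not part of Kafka Streams.
final class AdditionalRetentionConfigSketch {

    // Builds a Properties object carrying only the additional-retention setting.
    static Properties withAdditionalRetention(Duration additionalRetention) {
        Properties props = new Properties();
        // The config key is taken verbatim from the ticket text; the value is in milliseconds.
        props.put("windowstore.changelog.additional.retention.ms",
                  Long.toString(additionalRetention.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative value only: keep 6h of extra changelog data.
        Properties props = withAdditionalRetention(Duration.ofHours(6));
        System.out.println(props);
    }
}
```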

While an increased retention time can be useful, it can also lead to unnecessary 
restore cost, especially for stream-stream joins. Assume a stream-stream join 
with a 1h window size and a grace period of 1h. For this case, we only need 2h of 
data to restore. If we lag, `windowstore.changelog.additional.retention.ms` 
helps to prevent the broker from truncating data too early. However, if we 
don't lag and we need to restore, we still restore everything from the 
changelog, including records older than the 2h we actually need.

Instead of doing a seek-to-beginning, we could use the timestamp index to seek 
to the first offset at the start of the 2h "window" of data that we need to 
restore, skipping older records and avoiding unnecessary work.
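
A rough sketch of the idea, not a proposed implementation. It assumes the 
restore lower bound is "stream time minus window size minus grace period" and 
models the changelog as a plain array of record timestamps indexed by offset. 
All names here are hypothetical. In a real restore consumer, the lookup would 
presumably go through `Consumer#offsetsForTimes` followed by `seek`, which is 
the client-side entry point to the broker's timestamp index.

```java
import java.time.Duration;

// Hypothetical illustration; none of these names exist in Kafka Streams.
final class RestoreStartSketch {

    // Oldest timestamp still needed: records older than
    // (streamTime - windowSize - grace) are assumed to be expired.
    static long restoreStartTimestamp(long streamTimeMs, Duration windowSize, Duration grace) {
        return Math.max(0L, streamTimeMs - windowSize.toMillis() - grace.toMillis());
    }

    // Stand-in for a timestamp-index lookup: returns the earliest offset whose
    // record timestamp is >= targetTs, or -1 if no such record exists.
    static long firstOffsetAtOrAfter(long[] timestampsByOffset, long targetTs) {
        for (int offset = 0; offset < timestampsByOffset.length; offset++) {
            if (timestampsByOffset[offset] >= targetTs) {
                return offset;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long hourMs = Duration.ofHours(1).toMillis();

        // Toy changelog: one record per hour, offsets 0..10.
        long[] changelog = new long[11];
        for (int i = 0; i < changelog.length; i++) {
            changelog[i] = i * hourMs;
        }

        long streamTime = 10 * hourMs;
        long startTs = restoreStartTimestamp(streamTime, Duration.ofHours(1), Duration.ofHours(1));
        long startOffset = firstOffsetAtOrAfter(changelog, startTs);

        System.out.println("restore from timestamp " + startTs + ", offset " + startOffset);
    }
}
```

In this toy run the restore would start at offset 8 instead of offset 0. Because 
the lookup returns the earliest offset with a timestamp at or after the bound, 
every skipped record is older than the bound, even if timestamps in the 
changelog are not strictly ordered.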



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
