Cross-posting from SO: this is exactly the same question. See my answer there: http://stackoverflow.com/questions/41048041/kafka-deletes-segments-even-before-segment-size-is-reached/41065100#comment69338104_41065100

-Matthias

On 12/8/16 8:43 PM, Rodrigo Sandoval wrote:
> This is what Todd said:
>
> "Retention is going to be based on a combination of both the retention and
> segment size settings (as a side note, it's recommended to use
> log.retention.ms and log.segment.ms, not the hours config. That's there for
> legacy reasons, but the ms configs are more consistent). As messages are
> received by Kafka, they are written to the current open log segment for
> each partition. That segment is rotated when either the log.segment.bytes
> or the log.segment.ms limit is reached. Once that happens, the log segment
> is closed and a new one is opened. Only after a log segment is closed can
> it be deleted via the retention settings. Once the log segment is closed
> AND either all the messages in the segment are older than log.retention.ms
> OR the total partition size is greater than log.retention.bytes, then the
> log segment is purged.
>
> As a note, the default segment limit is 1 gibibyte. So if you've only
> written in 1k of messages, you have a long way to go before that segment
> gets rotated. This is why the retention is referred to as a minimum time.
> You can easily retain much more than you're expecting for slow topics."
>
> On Dec 9, 2016 02:38, "Rodrigo Sandoval" <rodrigo.madfe...@gmail.com> wrote:
>
>> Your understanding of segment.bytes and retention.ms is correct. But
>> Todd Palino said that only after the segment size has been reached, that
>> is, when the segment is "closed", PLUS all messages within the closed
>> segment are older than the retention policy defined (in this case
>> retention.ms), THEN the segment is deleted.
>>
>> At least according to my testing, it is not necessary to wait until the
>> segment is closed to delete it. Simply, if all messages in a segment (no
>> matter whether the segment reached the size defined by segment.bytes) are
>> older than the policy defined by retention.ms, THEN the segment is deleted.
>>
>> I have been playing a lot today with Kafka, and at least that is what I
>> figured out.
>>
>> On Dec 9, 2016 02:13, "Sachin Mittal" <sjmit...@gmail.com> wrote:
>>
>>> I think segment.bytes defines the size of a single log file before
>>> creating a new one.
>>> retention.ms defines the number of ms to wait on a log file before
>>> deleting it.
>>>
>>> So it is working as defined in the docs.
>>>
>>> On Fri, Dec 9, 2016 at 2:42 AM, Rodrigo Sandoval <rodrigo.madfe...@gmail.com> wrote:
>>>
>>>> What about the claim that a segment will be deleted when the segment
>>>> size is reached, plus every single message inside the segment is older
>>>> than the retention time?
>>>>
>>>> I have been playing with Kafka and I have the following:
>>>>
>>>> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic1 --config retention.ms=60000
>>>>
>>>> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic1 --config file.delete.delay.ms=40000
>>>>
>>>> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic1 --config segment.bytes=400000
>>>>
>>>> My understanding, according to your thoughts, is that a segment will be
>>>> deleted when the segment reaches the segment size defined above
>>>> (segment.bytes=400000) PLUS every single message within the segment is
>>>> older than the retention time defined above (retention.ms=60000).
>>>>
>>>> What I noticed is that a segment of just 35 bytes, which contained just
>>>> one message, was deleted after about a minute (maybe a little more).
>>>> Therefore, the segment size did not have to be met in order to delete it.
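
To make the two readings in this thread concrete, here is a small toy model in Python. It is only a sketch, not Kafka's code: the Segment class, the deletable() function and the require_closed flag are hypothetical names made up for illustration, and size-based retention (retention.bytes) is left out. With require_closed=True it follows the rule as Todd describes it (a segment must be closed before time-based retention applies); with require_closed=False it follows what Rodrigo reports observing.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size_bytes: int
    newest_message_ms: int  # timestamp of the newest message in the segment
    closed: bool            # True once the segment has been rolled


def expired(seg: Segment, now_ms: int, retention_ms: int) -> bool:
    # All messages are older than retention.ms iff the newest one is.
    return now_ms - seg.newest_message_ms > retention_ms


def deletable(seg: Segment, now_ms: int, retention_ms: int,
              require_closed: bool) -> bool:
    # require_closed=True: the "closed AND expired" rule quoted from Todd.
    # require_closed=False: the behaviour Rodrigo reports from his test.
    if require_closed and not seg.closed:
        return False
    return expired(seg, now_ms, retention_ms)


# Rodrigo's case: one 35-byte message, segment.bytes=400000 never reached,
# retention.ms=60000, checked a bit more than a minute later.
seg = Segment(size_bytes=35, newest_message_ms=0, closed=False)
now_ms = 61_000

as_described = deletable(seg, now_ms, 60_000, require_closed=True)
as_observed = deletable(seg, now_ms, 60_000, require_closed=False)
print(as_described, as_observed)
```

The only difference between the two readings is whether an open (active) segment is exempt from the time-based check, which is exactly the point under discussion.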