Cross-posted from SO; it is the exact same question. See my answer there:

http://stackoverflow.com/questions/41048041/kafka-deletes-segments-even-before-segment-size-is-reached/41065100#comment69338104_41065100


-Matthias

On 12/8/16 8:43 PM, Rodrigo Sandoval wrote:
> This is what Todd said:
> 
> "Retention is going to be based on a combination of both the retention and
> segment size settings (as a side note, it's recommended to use
> log.retention.ms and log.segment.ms, not the hours config. That's there for
> legacy reasons, but the ms configs are more consistent). As messages are
> received by Kafka, they are written to the current open log segment for
> each partition. That segment is rotated when either the log.segment.bytes
> or the log.segment.ms limit is reached. Once that happens, the log segment
> is closed and a new one is opened. Only after a log segment is closed can
> it be deleted via the retention settings. Once the log segment is closed
> AND either all the messages in the segment are older than log.retention.ms
> OR the total partition size is greater than log.retention.bytes, then the
> log segment is purged.
> 
> As a note, the default segment limit is 1 gibibyte. So if you've only
> written in 1k of messages, you have a long way to go before that segment
> gets rotated. This is why the retention is referred to as a minimum time.
> You can easily retain much more than you're expecting for slow topics."
> 
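
The rule as quoted above can be sketched roughly as follows. This is only a
model of the description in the quote, not Kafka's actual code: the Segment
class, the function names, and the 7-day time limit used in the example are
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size_bytes: int        # bytes written to the segment so far
    opened_ms: int         # time the segment was opened
    newest_msg_ms: int     # timestamp of the newest message in the segment
    closed: bool = False   # True once the segment has been rotated


def should_roll(seg, now_ms, segment_bytes, segment_ms):
    """Rotate when either the size limit or the time limit is reached."""
    return seg.size_bytes >= segment_bytes or now_ms - seg.opened_ms >= segment_ms


def should_delete(seg, now_ms, partition_bytes, retention_ms, retention_bytes=None):
    """Per the quoted description, only a closed segment is eligible."""
    if not seg.closed:
        return False
    all_too_old = now_ms - seg.newest_msg_ms > retention_ms
    partition_too_big = retention_bytes is not None and partition_bytes > retention_bytes
    return all_too_old or partition_too_big


# A slow topic: 1 KiB written, checked 61 s later with 60 s retention.
seg = Segment(size_bytes=1024, opened_ms=0, newest_msg_ms=0)
one_gib = 1024 ** 3
seven_days_ms = 7 * 24 * 60 * 60 * 1000   # assumed time limit for rotation
print(should_roll(seg, 61_000, one_gib, seven_days_ms))                 # False
print(should_delete(seg, 61_000, seg.size_bytes, retention_ms=60_000))  # False
```

Under this model the open segment survives the retention time because it has
not been rotated yet, which is the sense in which the quote calls retention a
minimum time.
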
> On Dec 9, 2016 02:38, "Rodrigo Sandoval" <rodrigo.madfe...@gmail.com> wrote:
> 
>> Your understanding of segment.bytes and retention.ms is correct. But
>> Todd Palino said that a segment is deleted only after it has reached the
>> segment size (that is, when the segment is "closed") AND all messages
>> within the closed segment are older than the defined retention policy
>> (in this case retention.ms).
>>
>> At least according to my testing, it is not necessary to wait until the
>> segment is closed to delete it. If all messages in a segment are older
>> than the policy defined by retention.ms, the segment is deleted, no
>> matter whether it has reached the size defined by segment.bytes.
>>
>> I have been playing with Kafka a lot today, and at least that is what I
>> figured out.
>>
>> On Dec 9, 2016 02:13, "Sachin Mittal" <sjmit...@gmail.com> wrote:
>>
>>> I think segment.bytes defines the size of a single log file before a new
>>> one is created.
>>> retention.ms defines the number of ms to wait on a log file before
>>> deleting it.
>>>
>>> So it is working as defined in the docs.
>>>
>>>
>>> On Fri, Dec 9, 2016 at 2:42 AM, Rodrigo Sandoval
>>> <rodrigo.madfe...@gmail.com> wrote:
>>>
>>>> So is it the case that a segment is deleted once the segment size is
>>>> reached and every single message inside the segment is older than the
>>>> retention time?
>>>>
>>>>
>>>> I have been playing with Kafka and I have the following:
>>>>
>>>> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic1
>>>> --config retention.ms=60000
>>>>
>>>> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic1
>>>> --config file.delete.delay.ms=40000
>>>>
>>>> bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic1
>>>> --config segment.bytes=400000
>>>>
>>>> My understanding, based on your explanation, is that a segment will be
>>>> deleted when it reaches the segment size defined above
>>>> (segment.bytes=400000) AND every single message within the segment is
>>>> older than the retention time defined above (retention.ms=60000).
>>>>
>>>> What I noticed is that a segment of just 35 bytes, which contained just
>>>> one message, was deleted after a minute (maybe a little more). So the
>>>> segment size did not have to be reached for the segment to be deleted.
>>>>
>>>
>>
> 
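
The observation in the original message (a 35-byte segment deleted after
about a minute) fits a time-only rule like the one Rodrigo describes from his
testing. A minimal sketch of that rule, using the values from the commands in
the thread; the function name is hypothetical and this is not Kafka's
implementation:

```python
def all_messages_expired(message_timestamps_ms, now_ms, retention_ms):
    """True when every message in the segment is older than retention_ms."""
    return all(now_ms - ts > retention_ms for ts in message_timestamps_ms)


RETENTION_MS = 60_000      # retention.ms from the thread
SEGMENT_BYTES = 400_000    # segment.bytes from the thread

segment_size = 35          # the single-message segment from the test
timestamps = [0]           # one message, written at t = 0 ms

print(segment_size >= SEGMENT_BYTES)                           # False: size limit not reached
print(all_messages_expired(timestamps, 59_000, RETENTION_MS))  # False: under one minute
print(all_messages_expired(timestamps, 61_000, RETENTION_MS))  # True: just over one minute
```

The size comparison plays no part in the expiry check here, which matches the
reported behavior: the segment became eligible by age alone.
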
