I quoted the wrong paragraph in my earlier response. The same KIP has a
section on log retention as well.

"Enforce time based log retention

To enforce time based log retention, the broker will check from the oldest
segment forward to the latest segment. For each segment, the broker checks
the last time index entry of a log segment. The timestamp will be the
latest timestamp of the messages in the log segment. So if that timestamp
expires, the broker will delete the log segment. The broker will stop at
the first segment which is not expired. i.e. the broker will not expire a
segment even if it is expired, unless all the older segment has been
expired."

If none of the messages in a segment has a timestamp, the segment's last
modified time will be used instead.
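
To make the "stop at the first unexpired segment" rule concrete, here is a
minimal Java sketch of the check (illustrative only: this is not Kafka's
actual code, and the Segment class and all names are made up):

import java.util.ArrayDeque;
import java.util.Deque;

public class RetentionCheck {

    // Hypothetical stand-in for a log segment. maxTimestampMs is the latest
    // message timestamp in the segment, or the file's last-modified time if
    // no message in the segment carries a timestamp.
    static class Segment {
        final String name;
        final long maxTimestampMs;

        Segment(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // Walk from the oldest segment forward, deleting each expired segment,
    // and stop at the first segment that has not expired.
    static void enforceRetention(Deque<Segment> segments, long retentionMs) {
        long now = System.currentTimeMillis();
        while (segments.size() > 1) { // keep at least the active segment
            Segment oldest = segments.peekFirst();
            if (now - oldest.maxTimestampMs <= retentionMs) {
                break; // first unexpired segment: stop here
            }
            System.out.println("deleting expired segment " + segments.pollFirst().name);
        }
    }

    public static void main(String[] args) {
        long hour = 3600 * 1000L;
        long now = System.currentTimeMillis();
        Deque<Segment> segments = new ArrayDeque<>();
        segments.add(new Segment("00000000000000000000.log", now - 30 * hour)); // expired
        segments.add(new Segment("00000000000000001000.log", now - 2 * hour));  // not expired
        segments.add(new Segment("00000000000000002000.log", now));             // active head
        enforceRetention(segments, 24 * hour); // log.retention.hours=24
    }
}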

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Thu, May 25, 2017 at 9:53 AM, Hans Jespersen <h...@confluent.io> wrote:

> 0.10.x format messages have timestamps within them so retention and
> expiring of messages isn't entirely based on the filesystem timestamp of
> the log segments anymore.
>
> From KIP-33 - https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index#KIP-33-Addatimebasedlogindex-Enforcetimebasedlogrolling
>
> "Enforce time based log rolling
>
> Currently time based log rolling is based on the creating time of the log
> segment. With this KIP, the time based rolling would be changed to only
> based on the message timestamp. More specifically, if the first message in
> the log segment has a timestamp, A new log segment will be rolled out if
> timestamp in the message about to be appended is greater than the timestamp
> of the first message in the segment + log.roll.ms. When
> message.timestamp.type=CreateTime, user should set
> max.message.time.difference.ms appropriately together with log.roll.ms to
> avoid frequent log segment roll out.
>
> During the migration phase, if the first message in a segment does not
> have a timestamp, the log rolling will still be based on the (current time
> - create time of the segment)."
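>
> In code form, the roll decision above is roughly the following (a minimal
> Java sketch of the KIP's description, not Kafka's actual implementation;
> the class and variable names are made up):
>
> public class RollCheck {
>
>     // Roll a new segment when the timestamp of the message about to be
>     // appended exceeds the first message's timestamp by more than
>     // log.roll.ms. A negative first timestamp stands in for "the first
>     // message has no timestamp", where this rule does not apply.
>     static boolean shouldRoll(long firstTsMs, long incomingTsMs, long logRollMs) {
>         return firstTsMs >= 0 && incomingTsMs > firstTsMs + logRollMs;
>     }
>
>     public static void main(String[] args) {
>         long rollMs = 7L * 24 * 3600 * 1000; // e.g. log.roll.ms of 7 days
>         System.out.println(shouldRoll(0, rollMs - 1, rollMs));  // false: within window
>         System.out.println(shouldRoll(0, rollMs + 1, rollMs));  // true: roll
>         System.out.println(shouldRoll(-1, rollMs + 1, rollMs)); // false: no timestamp
>     }
> }
>
> This is also why max.message.time.difference.ms matters with CreateTime:
> a message whose client-supplied timestamp is far from its neighbours
> could otherwise trigger an immediate roll.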
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670
>  */
>
> On Thu, May 25, 2017 at 12:44 AM, Milind Vaidya <kava...@gmail.com> wrote:
>
>> I have a 6-broker cluster.
>>
>> I upgraded it from 0.8.1.1 to 0.10.0.0.
>>
>> The upgrade of the producers, the cluster, and the consumers (Apache
>> Storm) went smoothly, without any errors. The protocol version was
>> initially kept at 0.8, and after the clients were upgraded it was
>> promoted to 0.10.
>>
>> Out of the 6 brokers, 3 are honouring log.retention.hours. For the other
>> 3, when the broker is restarted the timestamps of the segments change to
>> the current time. That leads to segments not getting deleted, and hence
>> the disk fills up.
>>
>> du -khc /disk1/kafka-broker/topic-1
>>
>> 71G     /disk1/kafka-broker/topic-1
>>
>> 71G     total
>>
>> Latest segment timestamp : May 25 07:34
>>
>> Oldest segment timestamp : May 25 07:16
>>
>>
>> It is impossible that 71 GB of data was collected in the mere 18 minutes
>> between those timestamps. log.retention.hours=24, and this is not a new
>> broker, so the oldest data should be around 24 hours old.
>>
>> As mentioned above, only 3 out of the 6 brokers are showing this
>> behaviour. Why is this happening?
>>
>
>
