Hi,

You mention topic size, but retention is configured per partition, so I will
explain using partition size. If you want to cap the combined size of all
partitions, take the target topic size, divide it by the number of
partitions, and set the retention.bytes setting to the result.
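To illustrate the arithmetic, here is a minimal sketch (the function name and the 12-partition example are my own, not from your setup), assuming partitions fill at roughly the same rate so total disk use is capped at about partitions * retention.bytes:

```python
# Sketch: derive a per-partition retention.bytes from a target total topic size.
# Assumption: partitions fill at roughly the same rate.

def per_partition_retention_bytes(target_topic_bytes: int, partitions: int) -> int:
    """Divide the desired total topic size evenly across partitions."""
    return target_topic_bytes // partitions

# Hypothetical example: cap a 12-partition topic at 10 GiB total.
target = 10 * 1024**3  # 10 GiB in bytes
print(per_partition_retention_bytes(target, 12))
```

Note that this is approximate: cleanup works on whole segments, so actual disk use can temporarily exceed this value by up to one segment per partition.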

A topic partition is stored as a series of segments; only the latest
segment, called the active segment, is written to.
If the topic cleanup policy is set to delete, Kafka removes only old
(inactive) segments, either when they contain records older than the
configured retention time or when the partition exceeds its maximum size.
In your case the per-partition retention size is 10 GB and the retention
time is 1 hour. With a produce rate of 11 GB per hour, the size-based
cleanup rule triggers before the time-based one.
The oldest inactive segments, which contain the oldest messages, are
deleted even if they were produced within the last hour. Old segments are
deleted until the partition size drops below the retention.bytes value, or
no inactive segments remain.
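That deletion order can be sketched with a toy model (an assumption of mine, simplifying Kafka's actual log cleaner): segments are kept oldest-first, the active segment is never deleted, and the oldest inactive segments are dropped until the partition fits within retention.bytes:

```python
# Simplified model of Kafka's size-based segment cleanup (not the real
# implementation). Sizes are in GB for readability.

def segments_after_cleanup(segment_sizes, retention_bytes):
    """Drop oldest inactive segments until total size <= retention_bytes."""
    segments = list(segment_sizes)
    # The last segment is the active one and is never a deletion candidate.
    while len(segments) > 1 and sum(segments) > retention_bytes:
        segments.pop(0)  # delete the oldest segment, regardless of record age
    return segments

# Your scenario: eleven 1 GB segments produced in the last hour,
# retention.bytes equivalent to 10 GB -> the oldest segment is deleted
# even though its records are younger than the 1 hour retention time.
print(segments_after_cleanup([1] * 11, 10))
```

The point of the model is that record age never enters the size check: only total partition size and segment order matter.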

Segment size and age can also be controlled with the segment.bytes and
segment.ms settings; a new segment is created when the active segment
exceeds either limit. Be aware that creating more files to track can cause
other performance issues due to CPU, RAM, and file system limitations, so
keep monitoring to get a feel for your cluster's performance.
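For reference, these settings can be changed at runtime with the kafka-configs.sh tool that ships with Kafka. The topic name and broker address below are placeholders for your own values:

```shell
# Assumption: topic "my-topic" exists and a broker listens on localhost:9092.
# Roll a new segment at 1 GiB or after 10 minutes, whichever comes first.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config segment.bytes=1073741824,segment.ms=600000
```

Smaller segments let size-based retention react more precisely, at the cost of more open files for the broker to manage.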

Producers should not notice these cleanups, unless the broker has to create
and clean segments too frequently, in which case there can be delays or
producer errors.

I hope this answers your question.

Kind regards,


Richard Bosch

Developer Advocate

Axual BV

https://axual.com/


On Thu, Nov 9, 2023 at 3:00 PM Yeikel Santana <em...@yeikel.com> wrote:

> Hi all,
>
> This might be a common question, but unfortunately, I couldn't find a
> reliable answer or documentation to guide me. There are various conflicting
> ideas.
>
> If a producer tries to ingest at a faster rate than the configuration set
> in the topic, what will happen?
>
> Example:
>
> - Topic size: 10 GB
> - Retention Period: 1h
> - Producer rate: 11 GB/h
>
>
> Will Kafka:
>
> - Aggressively delete older messages even if the retention period is
> greater than the age of the message?
> - Reject messages from the producer until there is room for new messages?
> - Potentially delete newer or older messages to make room?
> - Any other type of data handling?
>
> Thanks!
>
