I am saying that replication quotas will mitigate one of the potential
downsides of setting an infinite retention policy.

There is no clear-cut best-practice rule for or against setting an
extremely large retention period. It is a valid configuration, and there
are people who run this way.

The issues have more to do with the amount of data you expect to be stored
over the life of the system. If you have a Kafka cluster with petabytes of
data in it and a consumer comes along and blindly consumes from the
beginning, it will be getting a lot of data. So much so that this might be
considered an anti-pattern: the consuming apps might not behave as their
authors expect, and lots of clients operating this way would use a great
deal of network bandwidth.
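
For a rough sense of scale, here is a back-of-the-envelope sketch in
Python. The message rate, message size, replication factor, and consumer
throughput are made-up numbers for illustration, not measurements from any
real cluster, and the helper names are hypothetical; plug in your own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def stored_bytes(msgs_per_sec, avg_msg_bytes, days, replication_factor=1):
    """Bytes accumulated after `days` with no retention limit.

    Ignores compression and index overhead, so treat it as an estimate.
    """
    return msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY * days * replication_factor


def full_read_days(topic_bytes, consumer_bytes_per_sec):
    """Days one consumer needs to read the topic from the beginning at a steady rate."""
    return topic_bytes / consumer_bytes_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical workload: 10,000 msgs/s of 1 KB each, kept for 5 years.
    logical = stored_bytes(10_000, 1_000, days=5 * 365)
    on_disk = stored_bytes(10_000, 1_000, days=5 * 365, replication_factor=3)
    print(f"logical data: {logical / 1e12:.1f} TB")
    print(f"on disk (x3): {on_disk / 1e12:.1f} TB")
    # Hypothetical consumer reading at a sustained 50 MB/s.
    print(f"full re-read: {full_read_days(logical, 50e6):.0f} days")
```

With these assumed numbers the topic grows to roughly 1.6 PB of logical
data, and a single consumer starting from the beginning at 50 MB/s would
need about a year to catch up.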

Another way to avoid collecting too much data is to use compacted topics,
which are a special kind of topic that keeps the latest value for each key
forever but removes the older messages with the same key in order to
reduce the total number of messages stored.
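
As a rough mental model of what compaction leaves behind, here is a toy
Python sketch. It is not how the broker's log cleaner is implemented (that
runs in the background on closed log segments, and records without a key
are not handled here); the `compact` function and the sample records are
invented for illustration.

```python
def compact(log):
    """Keep only the newest record per key, preserving offsets and offset order.

    `log` is a list of (offset, key, value) tuples in offset order.
    A value of None is treated as a tombstone, i.e. a delete marker for the key.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    survivors = sorted(latest.values())     # offsets are unique, so this sorts by offset
    # Real Kafka keeps tombstones for a while (delete.retention.ms) before
    # dropping them; this sketch drops them immediately.
    return [rec for rec in survivors if rec[2] is not None]


if __name__ == "__main__":
    log = [
        (0, "user-1", "alice@old.example"),
        (1, "user-2", "bob@example"),
        (2, "user-1", "alice@new.example"),
        (3, "user-3", "carol@example"),
        (4, "user-3", None),  # tombstone: user-3 was deleted
    ]
    for record in compact(log):
        print(record)
```

On a real topic this behaviour is switched on with the topic-level setting
cleanup.policy=compact; the size of the compacted topic then tracks the
number of distinct keys rather than the total number of messages ever
written.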

How much data do you expect to store in your largest topic over the life of
the cluster?

-hans





/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Tue, Mar 14, 2017 at 10:36 AM, Joe San <codeintheo...@gmail.com> wrote:

> So that means with replication quotas, I can set the retention policy to be
> infinite?
>
> On Tue, Mar 14, 2017 at 6:25 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> > You might want to use the new replication quotas mechanism (i.e. network
> > throttling) to make sure that replication traffic doesn't negatively
> > impact your production traffic.
> >
> > For details, see:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
> >
> > This feature was added in 0.10.1
> >
> > -hans
> >
> > /**
> >  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
> >  * h...@confluent.io (650)924-2670
> >  */
> >
> > On Tue, Mar 14, 2017 at 10:09 AM, Joe San <codeintheo...@gmail.com> wrote:
> >
> > > Dear Kafka Users,
> > >
> > > What are the arguments against setting the retention policy on a Kafka
> > > topic to infinite? I was in an interesting discussion with one of my
> > > colleagues where he was suggesting to set the retention policy for a
> > > topic to be indefinite.
> > >
> > > So how does this play out when adding new broker partitions? Say, I have
> > > accumulated in my topic some gigabytes of data and now I realize that I
> > > have to scale up by adding another partition. Now is this going to pose
> > > me a problem? The partition rebalance has to happen and I'm not sure
> > > what the implications are with rebalancing a partition that has
> > > gigabytes of data.
> > >
> > > Any thoughts on this?
> > >
> > > Thanks and Regards,
> > > Jothi
> > >
> >
>
