Thanks for the reply, Jason.

Our topics' global retention is 4 days and, as we plan to set the
__consumer_offsets retention to the same interval, in the worst case we
won't lose any offsets that still matter: by the time an offset expires,
the data it points to will already have been rotated out anyway.
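
For reference, setting the topic-level retention to match would look
roughly like this (assuming the stock kafka-configs.sh tool; the ZooKeeper
host is a placeholder):

  bin/kafka-configs.sh --zookeeper zookeeper:2181 --alter \
    --entity-type topics --entity-name __consumer_offsets \
    --add-config retention.ms=345600000
  # 345600000 ms = 4 days, matching the global data retention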

Regarding the problems with the log cleaner:
1. I enabled log compaction on one of the brokers where the consumer
offsets partitions reside (and that broker was running out of memory due to
the huge size of these partitions); the broker-side cleaner settings
involved are sketched after this list.
2. Upon reboot, the broker actually truncated all of its logs and started
replicating afresh - TBs of data.
3. This slowed the entire cluster: we hit our Rx and Tx limits, and all the
other topics were affected as produce latencies spiked from milliseconds to
minutes.
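
For context, enabling compaction there comes down to the broker-side log
cleaner settings; a minimal sketch with illustrative values (the buffer
sizes are assumptions for illustration, not our production values):

  # server.properties (illustrative values)
  log.cleaner.enable=true
  log.cleaner.threads=1
  # total memory used for log deduplication across cleaner threads
  log.cleaner.dedupe.buffer.size=134217728
  log.cleaner.io.buffer.size=524288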

Theoretically, I was expecting the partitions to fetch only the diff of
the data from the leader, since this broker already had almost all of the
data for the to-be-compacted topics. But because the partitions were
compacted upon restart, replication started fetching all of the data from
the leader (which is not compacted). Hence, we took the broker out of
rotation (OOR). The other weird thing was that the consumers reset the
offsets of these partitions to the latest when they failed to commit the
offsets (this happened while this broker was still leader for some of the
partitions, with acks=-1 for the offsets topic; on shutdown the broker
takes some time, roughly the max lag in messages, before it drops out of
the ISR). We will investigate further why the offsets were reset and file
a JIRA.
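
For what it's worth, one guard against silently falling back to the latest
offset would be to make the reset behaviour explicit on the consumers; a
minimal sketch of the (new) consumer settings, with illustrative values
rather than our actual configuration:

  # consumer.properties (illustrative)
  auto.offset.reset=earliest   # or 'none' to fail fast instead of jumping to latest
  enable.auto.commit=false     # commit explicitly so commit failures are visible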

Now we are in a situation where we might run out of disk space on the
brokers hosting __consumer_offsets, so we are planning to change the
cleanup policy to delete to avoid that.
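Concretely, the change we have in mind is roughly the following (again
assuming kafka-configs.sh; the ZooKeeper host is a placeholder):

  bin/kafka-configs.sh --zookeeper zookeeper:2181 --alter \
    --entity-type topics --entity-name __consumer_offsets \
    --add-config cleanup.policy=delete

With cleanup.policy=delete and retention.ms set to match the 4-day data
retention, old segments of __consumer_offsets should be removed by the
normal retention mechanism rather than by the log cleaner. Please give your
inputs. Thanks.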

On Tue, Mar 8, 2016 at 7:10 AM, Jason Gustafson <ja...@confluent.io> wrote:

> This is actually a really good question. If you change the retention policy
> of the offsets topic, then in the worst case, consumer groups could lose
> their last committed positions and fall back to the auto reset behavior.
> However, if your consumers are not down for a long time and you set the
> retention to a reasonably long value, maybe you can get away with it? One
> downside is that the broker reads the entire offset log into an in-memory cache
> when it takes over leadership of one of the __consumer_offsets partitions.
> Hence the longer your retention time, the longer it will take for the new
> leader to read to the end of the log. There may be other consequences as
> well that I haven't thought of...
>
> Can you describe in a little more detail the problem that you found
> enabling the cleaner?
>
> -Jason
>
> On Sun, Mar 6, 2016 at 3:09 AM, Achanta Vamsi Subhash <
> achanta.va...@flipkart.com> wrote:
>
> > Hi,
> >
> > We tested this on our stage environment and it works fine if we change
> > the policy from compact to delete. Will there be any side effects if we
> > change it to delete for the __consumer_offsets topic?
> >
> > On Wed, Mar 2, 2016 at 4:43 PM, Achanta Vamsi Subhash <
> > achanta.va...@flipkart.com> wrote:
> >
> > > Hi all,
> > >
> > > We have a __consumer_offsets topic that has cleanup.policy=compact
> > > and log.cleaner.enable=false. What would happen if we change the
> > > cleanup.policy to delete? Will that treat the offsets topic the same
> > > as any other topic?
> > >
> > > We currently have log.cleaner.enable=false in our setup and the
> > > brokers hosting the offsets topic are using a lot of disk as it is
> > > never cleaned/compacted. We tried enabling log.cleaner.enable=true for
> > > the brokers hosting the offsets topic and that is leading to a lot of
> > > replicated data and is taking hours to finish.
> > >
> > > What is a better way to clean up the old segments of the
> > > __consumer_offsets topic?
> > >
> > > --
> > > Regards
> > > Vamsi Subhash
> > >
> >
> >
> >
> > --
> > Regards
> > Vamsi Subhash
> >
>



-- 
Regards
Vamsi Subhash
