I was having this problem with one of my __consumer_offsets partitions; I used reassignment to move the large partition onto a different set of machines (which forced the cleaner to run through them again) and after the new machines finished replicating, the partition was back down from 41GB to a nice trim 38MB.
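For anyone wanting to do the same: the sketch below shows the general shape of such a reassignment for Kafka 0.9.x. The broker IDs (4,5,6) and the ZooKeeper address are made-up placeholders; substitute your own, and pick a replica set different from the partition's current one so the new brokers re-replicate (and re-clean) the log from scratch.

```shell
# Write a reassignment plan moving __consumer_offsets-10 to a new
# (hypothetical) set of brokers. Adjust "replicas" to real broker IDs
# that differ from the partition's current assignment.
cat > reassign-offsets-10.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"__consumer_offsets","partition":10,"replicas":[4,5,6]}
]}
EOF

# Against a live cluster you would then run (commented out here;
# zk:2181 is a placeholder for your ZooKeeper connect string):
# bin/kafka-reassign-partitions.sh --zookeeper zk:2181 \
#   --reassignment-json-file reassign-offsets-10.json --execute
# ...and poll until it reports completion:
# bin/kafka-reassign-partitions.sh --zookeeper zk:2181 \
#   --reassignment-json-file reassign-offsets-10.json --verify
```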
On Fri, Oct 28, 2016 at 1:00 PM, Chi Hoang <chi.ho...@zuora.com> wrote:
> Hi,
> We have a 3-node cluster that is running 0.9.0.1, and recently saw that the
> "__consumer_offsets" topic on one of the nodes seems really skewed, with
> disk usage that looks like:
>
> 73G  ./__consumer_offsets-10
> 0    ./__consumer_offsets-7
> 0    ./__consumer_offsets-4
> 0    ./__consumer_offsets-1
> 0    ./__consumer_offsets-49
> 19G  ./__consumer_offsets-46
> 0    ./__consumer_offsets-43
> 0    ./__consumer_offsets-40
>
> This goes on for all 50 partitions. Upon inspection, we saw that a lot of
> the log files were old:
>
> ll __consumer_offsets-10
> total 76245192
> -rw-r--r-- 1 root root         0 Oct 7 20:14 00000000000000000000.index
> -rw-r--r-- 1 root root       901 Oct 7 20:14 00000000000000000000.log
> -rw-r--r-- 1 root root    157904 Oct 7 22:15 00000000000907046457.index
> -rw-r--r-- 1 root root 104855056 Oct 7 22:15 00000000000907046457.log
> -rw-r--r-- 1 root root    157904 Oct 7 22:51 00000000000909543421.index
> -rw-r--r-- 1 root root 104853568 Oct 7 22:51 00000000000909543421.log
> -rw-r--r-- 1 root root    157904 Oct 7 23:27 00000000000910806717.index
> -rw-r--r-- 1 root root 104853568 Oct 7 23:27 00000000000910806717.log
>
> We are using the default parameters for offset management, and our config
> output includes the following entries:
>
> log.cleaner.enable = true
> offsets.retention.minutes = 1440
>
> I tried looking through the issues on JIRA but didn't see a reported
> issue. Does anyone know what's going on, and how I can fix this?
>
> Thanks.

-- 
James Brown
Engineer