Can you post the complete error stack trace? Yes, you need to restart the
affected brokers. You can also tweak the log.cleaner.dedupe.buffer.size and
log.cleaner.io.buffer.size configs.
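For reference, a sketch of where those settings live (broker server.properties). The values below are illustrative only, not tuned recommendations; size them against your partition count and available heap:

```properties
# Illustrative values only -- not recommendations.
# Total memory for the cleaner's dedupe (offset map) buffers,
# shared across all cleaner threads (default is 128 MB).
log.cleaner.dedupe.buffer.size=268435456
# I/O buffer size used by the cleaner (default is 512 KB).
log.cleaner.io.buffer.size=1048576
```

Changes take effect after the broker restart, which also brings the dead cleaner thread back.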
Some related JIRAs:
https://issues.apache.org/jira/browse/KAFKA-3587
https://issues.apache.org/jira/browse/KAFKA-3894
https://issues.apache.org/jira/browse/KAFKA-3915

On Wed, Jul 13, 2016 at 10:36 PM, Lawrence Weikum <lwei...@pandora.com>
wrote:

> Oh interesting. I didn’t know about that log file until now.
>
> The only error that has been populated among all brokers showing this
> behavior is:
>
> ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
>
> Then we see many messages like this:
>
> INFO Compaction for partition [__consumer_offsets,30] is resumed
> (kafka.log.LogCleaner)
> INFO The cleaning for partition [__consumer_offsets,30] is aborted
> (kafka.log.LogCleaner)
>
> Using VisualVM, I do not see any log-cleaner threads in those brokers. I
> do see them in the brokers not showing this behavior, though.
>
> Any idea why the LogCleaner failed?
>
> As a temporary fix, should we restart the affected brokers?
>
> Thanks again!
>
> Lawrence Weikum
>
> On 7/13/16, 10:34 AM, "Manikumar Reddy" <manikumar.re...@gmail.com>
> wrote:
>
> Hi,
>
> Are you seeing any errors in log-cleaner.log? The log-cleaner thread can
> crash on certain errors.
>
> Thanks
> Manikumar
>
> On Wed, Jul 13, 2016 at 9:54 PM, Lawrence Weikum <lwei...@pandora.com>
> wrote:
>
> > Hello,
> >
> > We’re seeing strange behavior in Kafka 0.9.0.1 which occurs about every
> > other week. I’m curious if others have seen it and know of a solution.
> >
> > Setup and scenario:
> >
> > - Brokers were initially set up with log compaction turned off
> >
> > - After 30 days, log compaction was turned on
> >
> > - At that time, the number of open FDs was ~30K per broker
> >
> > - After 2 days, the __consumer_offsets topic was fully compacted;
> >   open FDs dropped to ~5K per broker
> >
> > - The cluster has been under normal load for roughly 7 days
> > - At the 7-day mark, the __consumer_offsets topic seems to have
> >   stopped compacting on two of the brokers, and on those brokers the FD
> >   count is up to ~25K
> >
> > We have tried rebalancing the partitions before. The first time, the
> > destination broker compacted the data fine and open FDs stayed low. The
> > second time, the destination broker kept the FDs open.
> >
> > In all the broker logs, we’re seeing messages like this:
> >
> > INFO [Group Metadata Manager on Broker 8]: Removed 0 expired offsets in
> > 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> >
> > There are only 4 consumers at the moment on the cluster; one topic with
> > 92 partitions.
> >
> > Is there a reason why log compaction might stop working, or why the
> > __consumer_offsets topic would start holding thousands of FDs?
> >
> > Thank you all for your help!
> >
> > Lawrence Weikum
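The two checks discussed above — whether the log-cleaner thread is still alive, and how many FDs the broker holds open — can also be scripted from the command line instead of VisualVM. A rough sketch, assuming a Linux host with a single broker JVM and the JDK's jps/jstack on the PATH:

```shell
#!/bin/sh
# Sketch only: assumes one Kafka broker JVM on this host (main class
# kafka.Kafka) and JDK tools (jps, jstack) available on the PATH.

# Find the broker's pid.
PID=$(jps -l | awk '/kafka\.Kafka/ {print $1}')

# 1) Is the cleaner thread alive? If this prints nothing, the
#    log-cleaner thread has died and a broker restart is needed.
jstack "$PID" | grep 'kafka-log-cleaner-thread'

# 2) Count the broker's open file descriptors. A count that keeps
#    climbing on a compacted topic suggests the cleaner is not running.
ls "/proc/$PID/fd" | wc -l
```

Running this periodically (e.g. from cron) would catch a dead cleaner well before the FD count climbs back to ~25K.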