Solved (we're using Kafka 0.8.1 and this was caused by the bug in dynamic topic config changes, KAFKA-1398)
Found the problem. We had changed retention.ms for this topic to 100,000 ms (100 seconds) earlier this month (using the kafka-topics.sh admin tool). Then, after Kafka had purged data, we proceeded to set the retention back to 1209600000 ms (14 days). However, the second change wasn't picked up by the brokers. (See TopicConfigManager.scala.)

On further debugging, we found that the znode added to /config/changes by AdminUtils.scala (via kafka-topics.sh) was getting deleted before all the brokers had a chance to read its content and update the topic configuration. The deletion was caused by a bug in TopicConfigManager.scala: a broker would delete the znode if it did not host the topic that changed. This race condition means that a broker might or might not see a dynamic topic configuration change. Until, of course, the broker is bounced, at which point it reads the configuration afresh from /config/topics.

So this was indeed the bug reported in KAFKA-1398. Dynamic topic config changes are horribly broken in Kafka 0.8.1, and we need to move to 0.8.1.1.

Sequence of events:

1. We set retention.ms = 100000.
2. The brokers hosting the topic (1, 2, 3) picked up that change.
3. We set retention.ms back to 1209600000.
4. One of the other three brokers would have picked up the change and gone ahead and deleted the znode, because it didn't host the topic.
5. By the time brokers 1, 2 and 3 reacted, the znode was no longer there, so they logged an error and moved on.

In summary: DO NOT USE kafka-topics.sh --alter --config <key>=<value> if you're on Kafka 0.8.1. If you have used it, do verify that all brokers are demonstrating the behaviour you expect (some commands that may help are below).

A symptom of this bug: if a topic is on all brokers, the znode added to /config/changes will never be removed. That also means that config changes to a topic that is on all brokers are safe.
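For the verification step, something along these lines should show what the cluster currently believes. The ZooKeeper address (zk1:2181) and topic name (mytopic) below are placeholders for your own ensemble and topic; zkCli.sh is the standard ZooKeeper shell, not part of Kafka:

    # What brokers re-read on restart; the retention.ms override should appear here:
    zkCli.sh -server zk1:2181 get /config/topics/mytopic

    # Change-notification znodes. With this bug they can vanish before every
    # broker has read them (or linger forever if the topic is on all brokers):
    zkCli.sh -server zk1:2181 ls /config/changes

    # What the admin tool reports for the topic, including config overrides:
    bin/kafka-topics.sh --zookeeper zk1:2181 --describe --topic mytopic

Keep in mind these only show the state in ZooKeeper. What matters is what each broker has read into memory, so also watch per-broker behaviour (e.g. when segments actually get deleted), or bounce the brokers so they re-read /config/topics.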
On Fri, Jul 25, 2014 at 11:07 AM, Kashyap Paidimarri <kashy...@gmail.com> wrote:

> Attached a transcript that explains what I'm seeing.
>
> On Fri, Jul 25, 2014 at 10:52 AM, Kashyap Paidimarri <kashy...@gmail.com> wrote:
>
>> No, we haven't configured that. We have a few hundred topics, but this
>> seems to be the only one affected (I did a quick check, not a thorough one).
>>
>> The relevant config params we have set in server.properties:
>>
>> log.dir=/var/lib/fk-3p-kafka/logs
>> log.flush.interval.messages=10000
>> log.flush.interval.ms=1000
>> log.retention.hours=168
>> log.segment.bytes=536870912
>> log.cleanup.interval.mins=1
>> log.retention.hours=336
>>
>> On Fri, Jul 25, 2014 at 10:11 AM, Jun Rao <jun...@gmail.com> wrote:
>>
>>> Have you configured log.retention.bytes?
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Thu, Jul 24, 2014 at 10:04 AM, Kashyap Paidimarri <kashy...@gmail.com> wrote:
>>>
>>> > We just noticed that one of our topics has been horribly misbehaving.
>>> >
>>> > *retention.ms* for the topic is set to 1209600000 ms.
>>> >
>>> > However, segments are getting scheduled for deletion as soon as a new
>>> > one is rolled over, and naturally consumers are running into a
>>> > kafka.common.OffsetOutOfRangeException whenever this happens.
>>> >
>>> > Is this a known bug? It is incredibly serious. We seem to have lost
>>> > about 40 million messages on a single topic and are yet to figure out
>>> > which topics are affected.
>>> >
>>> > I thought of restarting Kafka but figured I'd leave it untouched while
>>> > I figure out what I can capture for finding the root cause.
>>> >
>>> > Meanwhile, in order to keep from losing any more data, I have a
>>> > periodic job that is doing a *'cp -al'* of the partitions into a
>>> > separate folder. That way Kafka goes ahead and deletes the segment,
>>> > but the data is not lost from the filesystem.
>>> >
>>> > If this is an unseen bug, what should I save from the running instance?
>>> >
>>> > By the way, this has affected all partitions and replicas of the
>>> > topic, not just a specific host.
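For anyone wanting to copy the 'cp -al' safety net from the quoted mail, a minimal sketch is below. The partition directory and snapshot destination are made-up names; the log.dir is the one from the server.properties above. The snapshot must live on the same filesystem as the log dir, since cp -al creates hard links rather than copies:

    # Hard-link snapshot of one partition directory; run periodically
    # (e.g. from cron). Kafka can then delete its segment files, but the
    # data stays reachable through the hard links.
    SRC=/var/lib/fk-3p-kafka/logs/mytopic-0          # placeholder partition dir
    DST=/var/lib/fk-3p-kafka/snapshots/mytopic-0.$(date +%Y%m%d%H%M%S)
    mkdir -p "$(dirname "$DST")"
    cp -al "$SRC" "$DST"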