Hi team,

Our Kafka broker goes down once or twice a month because log file deletion fails.
We run a single-node Kafka broker, and it goes down every time it tries to delete log files during cleanup and fails. I am sharing the error logs below. We need a robust solution so that our Kafka broker does not go down like this every time.

Regards,
Neeraj Gulia

Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/dokutopic-0/00000000000000000000.index (No such file or directory)
    at java.base/java.io.RandomAccessFile.open0(Native Method)
    at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:345)
    at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
    at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
    at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:183)
    at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176)
    at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242)
    at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242)
    at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:508)
    at kafka.log.Log.$anonfun$roll$8(Log.scala:1954)
    at kafka.log.Log.$anonfun$roll$2(Log.scala:1954)
    at kafka.log.Log.roll(Log.scala:2387)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1749)
    at kafka.log.Log.deleteSegments(Log.scala:2387)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1737)
    at kafka.log.Log.deleteOldSegments(Log.scala:1806)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:1074)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:1071)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:1071)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:409)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
[2021-05-27 09:34:07,972] WARN [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,fliptopic-0,__consumer_offsets-25,webhook-events-0,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,dokutopic-0,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,post_payment_topic-0,__consumer_offsets-18,__consumer_offsets-37,topic-0,events-0,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,disbursementtopic-0,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40,faspaytopic-0 and stopped moving logs for partitions because they are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager)
[2021-05-27 09:34:07,974] WARN Stopping serving logs in dir /tmp/kafka-logs (kafka.log.LogManager)
[2021-05-27 09:34:07,983] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)