We have a 3-node Kafka cluster. I initially created 4 topics, then wrote a small shell script to create 150 topics:
    # $1 = file listing the topic names, $2 = ZooKeeper host
    TOPICS=$(< $1)
    for topic in $TOPICS
    do
        echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
        /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36
    done

Ten minutes later I saw messages like this:

    [2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7] Removing fetcher for partition [m3_registration,0] (kafka.server.ReplicaFetcherManager)

followed by:

    [2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for partition [m3_registration,22] to broker 8 (kafka.server.ReplicaFetcherThread)
    kafka.common.NotLeaderForPartitionException

A few minutes later, the following messages appeared and overwhelmed the logging system:

    [2013-08-13 11:46:35,916] ERROR error in loggedRunnable (kafka.utils.Utils$)
    java.io.FileNotFoundException: /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
            at java.io.FileOutputStream.open(Native Method)
            at java.io.FileOutputStream.<init>(FileOutputStream.java:194)

I restarted the service after discovering the problem. After a few minutes of attempting to recover, the Kafka service crashed with the following error:

    [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log 'm3_registration-29' (kafka.log.LogManager)
    [2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
    java.lang.IllegalStateException: Found log file with no corresponding index file.

There was no activity on the cluster after the topics were added.

What could have caused the crash and triggered the "Too many open files" exception?

What is the best way to recover so that the Kafka service can be restarted? (I am not sure the delete-topic command will work in this particular case, since none of the 3 brokers will start.)

How can we prevent this in the future?

Thanks so much in advance,
Vadim
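A rough back-of-the-envelope check of the numbers involved, written as a small shell sketch. It assumes that every partition replica keeps at least one file handle open (its active .log segment); that is an assumption, not something the logs above confirm, and index files, extra segments and network sockets would only add to the count. With 3 replicas on a 3-node cluster, every broker holds one replica of every partition. Only the 150 new topics are counted.

```shell
#!/usr/bin/env bash
# Rough estimate of partition replicas (and so, at minimum, open files) per broker.
# ASSUMPTION: at least one open file handle per partition replica (active .log segment).
# The 4 original topics are not included.

topics=150        # topics created by the script
partitions=36     # --partition 36
replicas=3        # --replica 3
brokers=3         # 3-node cluster

# Total partition replicas across the cluster, then per broker.
total_replicas=$(( topics * partitions * replicas ))
per_broker=$(( total_replicas / brokers ))

echo "partition replicas in total:   $total_replicas"
echo "partition replicas per broker: $per_broker"

# Open-files limit of the current shell; run as the user that starts Kafka.
limit=$(ulimit -n)
echo "open-files limit (ulimit -n):  $limit"

if [ "$limit" != "unlimited" ] && [ "$per_broker" -gt "$limit" ]; then
    echo "estimated open files per broker exceed the limit"
fi
```

Because `ulimit -n` reports the limit of the shell it runs in, the comparison is only meaningful when run under the same user and limits as the Kafka broker process.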