We have a 3-node Kafka cluster. I initially created 4 topics.
I then wrote a small shell script to create 150 more topics.

#!/bin/bash
# Arguments: $1 = file listing topic names, $2 = ZooKeeper host
TOPICS=$(< "$1")
for topic in $TOPICS
do
   # Print the command, then run it.
   echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
   /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic "$topic" \
      --zookeeper "$2:2181/kafka" --partition 36
done

About 10 minutes later I saw messages like this:
[2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7] Removing fetcher for partition [m3_registration,0] (kafka.server.ReplicaFetcherManager)
followed by
[2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for partition [m3_registration,22] to broker 8 (kafka.server.ReplicaFetcherThread)
kafka.common.NotLeaderForPartitionException

A few minutes later these were followed by the messages below, which
overwhelmed the logging system:
[2013-08-13 11:46:35,916] ERROR error in loggedRunnable (kafka.utils.Utils$)
java.io.FileNotFoundException: /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
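
For a rough sense of scale, the number of log files a broker has to keep open can be estimated from the topic settings above. This is only a sketch: the function name estimate_open_files is made up for illustration, and it assumes replicas are spread evenly across brokers and that each partition replica holds exactly one .log and one .index file open. Real usage would be higher (extra segments, sockets, the 4 original topics).

```shell
#!/bin/sh
# Hypothetical helper: estimate log files held open per broker.
# Assumption: one .log and one .index file per partition replica.
estimate_open_files() {
    topics=$1; partitions=$2; replicas=$3; brokers=$4
    # Partition replicas hosted on each broker, assuming an even spread
    per_broker=$(( topics * partitions * replicas / brokers ))
    # Two files (.log and .index) per partition replica
    echo $(( per_broker * 2 ))
}

# 150 topics, 36 partitions, replication factor 3, 3 brokers
estimate_open_files 150 36 3 3

# Soft limit on open files for the current shell, for comparison
ulimit -n
```

With the values from the script this gives 10800 files per broker, which can be compared against the limit that applies to the user running the broker.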

I restarted the service after discovering the problem. After a few minutes
of attempting to recover, the Kafka service crashed with the following error:

[2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log 'm3_registration-29' (kafka.log.LogManager)
[2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.IllegalStateException: Found log file with no corresponding index file.
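
To see which segments the startup check might be tripping over, something like the following could list .log files that have no .index file next to them. It is a minimal sketch: find_orphan_segments is a hypothetical name, it assumes segment files are named <offset>.log with a sibling <offset>.index in the same partition directory, and it only lists candidates without changing anything.

```shell
#!/bin/sh
# Hypothetical helper: print every .log file under a data directory
# that has no .index file with the same base name beside it.
find_orphan_segments() {
    data_dir=$1
    find "$data_dir" -type f -name '*.log' | while IFS= read -r log; do
        idx="${log%.log}.index"
        # Report the segment only when its index file is missing
        [ -f "$idx" ] || echo "$log"
    done
}

# Example (the data directory path is an assumption taken from the log above):
#   find_orphan_segments /home/kafka/data7
```

Whether the listed segments are safe to move or delete depends on the state of the other replicas, so the output is best treated as a starting point for inspection.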

There was no activity on the cluster after the topics were added.
What could have caused the crash and triggered the "too many open files"
exception?
What is the best way to recover so that the Kafka service can be restarted?
(I am not sure the delete topic command will work in this case, since none
of the 3 services will start.)
How can this be prevented in the future?

Thanks so much in advance,
Vadim
