We have a test cluster running Kafka 0.8 that is not behaving properly. It is
almost continuously spewing the following exception into its log:
2013-03-07 23:44:17,532 ERROR kafka.network.Processor: Closing socket for /10.10.2.123 because of error
java.io.IOException: Resource temporarily unavailable
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
    at kafka.log.FileMessageSet.writeTo(FileMessageSet.scala:133)
    at kafka.api.PartitionDataSend.writeTo(FetchResponse.scala:73)
    at kafka.network.MultiSend.writeTo(Transmission.scala:94)
    at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
    at kafka.network.MultiSend.writeCompletely(Transmission.scala:87)
    at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:128)
    at kafka.network.MultiSend.writeTo(Transmission.scala:94)
    at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
    at kafka.network.MultiSend.writeCompletely(Transmission.scala:87)
    at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:223)
    at kafka.network.Processor.write(SocketServer.scala:318)
    at kafka.network.Processor.run(SocketServer.scala:211)
    at java.lang.Thread.run(Thread.java:619)
And our consumer is reporting the following:
2013-03-07 23:46:09,736 INFO kafka.consumer.SimpleConsumer: Reconnect due to socket error:
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    at kafka.utils.Utils$.read(Utils.scala:373)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:67)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:124)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:122)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:161)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:161)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:161)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:160)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:160)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:160)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:159)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
2013-03-07 23:46:09,740 INFO kafka.consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1362697806347] removing fetcher on topic VTFull-enriched, partition 0
Several other environments run the same code without error. The server issuing
these log errors is running CentOS, but we have both Ubuntu and CentOS
environments that work fine.
The only thing that seems unusual is that this environment was brought up with
a replication factor of 2 even though there is only one broker (we have both
one- and two-broker clusters working fine). We have since purged all data and
ZooKeeper nodes and restarted the cluster from clean data, and the problem is
still happening.
We have one process writing data into the VTFull-enriched topic, which holds
14,000 messages in a single partition.
The consumer tries to read from offset 0 and hits this EOFException right
away; the app is not reading any messages at all.
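In case the configuration matters: the trace shows ConsumerFetcherManager, i.e. the high-level consumer, and our setup is essentially the defaults, roughly along these lines (a sketch, not our exact config; the ZooKeeper address and group id are placeholders):

```properties
# High-level consumer configuration sketch (0.8 property names, placeholder values)
zookeeper.connect=zk-host:2181
group.id=vtfull-reader
# Start from the earliest available offset when no committed offset exists,
# which matches our attempt to read from message 0
auto.offset.reset=smallest
```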
Any ideas on what to do?
Thanks,
Bob Jervis