Jiefu, Now even if the disk space is enough (less than 18%), when I run
it still gives me error where in the logs it says: [2015-07-14 23:08:48,735] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable) org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000 at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880) at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98) at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84) at kafka.server.KafkaServer.initZk(KafkaServer.scala:157) at kafka.server.KafkaServer.startup(KafkaServer.scala:82) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:29) at kafka.Kafka$.main(Kafka.scala:46) at kafka.Kafka.main(Kafka.scala) [2015-07-14 23:08:48,737] INFO [Kafka Server 1], shutting down (kafka.server.KafkaServer) I have checked that the zookeeper is running fine. Can anyone help why I got the error? Thanks. On Tue, Jul 14, 2015 at 10:24 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote: > But is there a way to let kafka override the old data if the disk is > filled? Or is it not necessary to use this figure? Thanks. > > On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du <yuheng.du.h...@gmail.com> > wrote: > >> Jiefu, >> >> I agree with you. I checked the hardware specs of my machines, each one >> of them has: >> >> RAM >> >> >> >> 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs >> >> Disk >> >> >> >> Two 1 TB 7.2K RPM 3G SATA HDDs >> >> For the throughput versus stored data test, it uses 5*10^10 messages, >> which has the total volume of 5TB, I made the replication factor to be 3, >> which means the total size including replicas would be 15TB, which >> apparently overwhelmed the two brokers I use. >> >> Thanks. >> >> best, >> Yuheng >> >> On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG <jg...@berkeley.edu> wrote: >> >>> Someone may correct me if I am incorrect, but how much disk space do you >>> have on these nodes? Your exception 'No space left on device' from one of >>> your brokers seems to suggest that you're full (after all you're writing >>> 500 million records). If this is the case I believe the expected behavior >>> for Kafka is to reject any more attempts to write data? >>> >>> On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du <yuheng.du.h...@gmail.com> >>> wrote: >>> >>> > Also, the log in another broker (not the bootstrap) says: >>> > >>> > [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error >>> > writing to highwatermark file: (kafka.server.ReplicaManager) >>> > >>> > >>> > [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47 >>> because >>> > of error (kafka.network.Process >>> > >>> > or) >>> > >>> > java.io.IOException: Connection reset by peer >>> > >>> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) >>> > >>> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) >>> > >>> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) >>> > >>> > at sun.nio.ch.IOUtil.read(IOUtil.java:197) >>> > >>> > at >>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) >>> > >>> > at kafka.utils.Utils$.read(Utils.scala:380) >>> > >>> > at >>> > >>> > >>> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) >>> > >>> > at kafka.network.Processor.read(SocketServer.scala:444) >>> > >>> > at kafka.network.Processor.run(SocketServer.scala:340) >>> > >>> > at java.lang.Thread.run(Thread.java:745) >>> > >>> > ........ >>> > >>> > java.io.IOException: No space left on device >>> > >>> > at java.io.FileOutputStream.writeBytes(Native Method) >>> > >>> > at java.io.FileOutputStream.write(FileOutputStream.java:345) >>> > >>> > at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) >>> > >>> > at sun.nio.cs.StreamEncoder.implFlushBuffe >>> > >>> > (END) >>> > >>> > On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du <yuheng.du.h...@gmail.com> >>> > wrote: >>> > >>> > > Hi Jiefu, Gwen, >>> > > >>> > > I am running the Throughput versus stored data test: >>> > > bin/kafka-run-class.sh >>> org.apache.kafka.clients.tools.ProducerPerformance >>> > > test 50000000000 100 -1 acks=1 bootstrap.servers= >>> > > esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 >>> > batch.size=8196 >>> > > >>> > > After around 50,000,000 messages were sent, I got a bunch of >>> connection >>> > > refused error as I mentioned before. I checked the logs on the >>> broker and >>> > > here is what I see: >>> > > >>> > > [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No >>> > > checkpointed highwatermark is found for partition [test,4] >>> > > (kafka.cluster.Partition) >>> > > >>> > > [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' >>> in 4 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' >>> in 3 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' >>> in 1 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' >>> in 0 >>> > > ms. (kafka.log.Log) >>> > > >>> > > [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to >>> unrecoverable >>> > > I/O error while handling produce request: (kafka.server.KafkaApis) >>> > > >>> > > kafka.common.KafkaStorageException: I/O exception in append to log >>> > 'test-0' >>> > > >>> > > at kafka.log.Log.append(Log.scala:266) >>> > > >>> > > at >>> > > >>> > >>> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379) >>> > > >>> > > at >>> > > >>> > >>> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365) >>> > > >>> > > at kafka.utils.Utils$.inLock(Utils.scala:535) >>> > > >>> > > at kafka.utils.Utils$.inReadLock(Utils.scala:541) >>> > > >>> > > at >>> > > kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365) >>> > > >>> > > at >>> > > >>> > >>> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291) >>> > > >>> > > at >>> > > >>> > >>> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282) >>> > > >>> > > at >>> > > >>> > >>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>> > > >>> > > at >>> > > >>> > >>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>> > > >>> > > at >>> > > >>> > >>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) >>> > > >>> > > at >>> > > >>> > >>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) >>> > > >>> > > at scala.coll >>> > > >>> > > >>> > > >>> > > Can you help me with this problem? Thanks. >>> > > >>> > > On Tue, Jul 14, 2015 at 5:12 PM, Yuheng Du <yuheng.du.h...@gmail.com >>> > >>> > > wrote: >>> > > >>> > >> I checked the logs on the brokers, it seems that the zookeeper or >>> the >>> > >> kafka server process is not running on this broker...Thank you >>> guys. I >>> > will >>> > >> see if it happens again. >>> > >> >>> > >> On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG <jg...@berkeley.edu> >>> wrote: >>> > >> >>> > >>> Hmm..yeah some error logs would be nice like Gwen pointed out. Do >>> any >>> > of >>> > >>> your brokers fall out of the ISR when sending messages? It seems >>> like >>> > >>> your >>> > >>> setup should be fine, so I'm not entirely sure. >>> > >>> >>> > >>> On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du < >>> yuheng.du.h...@gmail.com> >>> > >>> wrote: >>> > >>> >>> > >>> > Jiefu, >>> > >>> > >>> > >>> > I am performing these tests on a 6 nodes cluster in cloudlab (a >>> > >>> > infrastructure built for scientific research). I use 2 nodes as >>> > >>> producers, >>> > >>> > 2 as brokers only, and 2 as consumers. I have tested for each >>> > >>> individual >>> > >>> > machines and they work well. I did not use AWS. Thank you! >>> > >>> > >>> > >>> > On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG <jg...@berkeley.edu> >>> > >>> wrote: >>> > >>> > >>> > >>> > > Yuheng, are you performing these tests locally or using a >>> service >>> > >>> such as >>> > >>> > > AWS? I'd try using each separate machine individually first, >>> > >>> connecting >>> > >>> > to >>> > >>> > > the ZK/Kafka servers and ensuring that each is able to first >>> log >>> > and >>> > >>> > > consume messages independently. >>> > >>> > > >>> > >>> > > On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira < >>> > gshap...@cloudera.com >>> > >>> > >>> > >>> > > wrote: >>> > >>> > > >>> > >>> > > > Are there any errors on the broker logs? >>> > >>> > > > >>> > >>> > > > On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du < >>> > >>> yuheng.du.h...@gmail.com> >>> > >>> > > > wrote: >>> > >>> > > > > Jiefu, >>> > >>> > > > > >>> > >>> > > > > Thank you. The three producers can run at the same time. I >>> mean >>> > >>> > should >>> > >>> > > > they >>> > >>> > > > > be started at exactly the same time? (I have three consoles >>> > from >>> > >>> each >>> > >>> > > of >>> > >>> > > > > the three machines and I just start the console command >>> > manually >>> > >>> one >>> > >>> > by >>> > >>> > > > > one) Or a small variation of the starting time won't >>> matter? >>> > >>> > > > > >>> > >>> > > > > Gwen and Jiefu, >>> > >>> > > > > >>> > >>> > > > > I have started the three producers at three machines, >>> after a >>> > >>> while, >>> > >>> > > all >>> > >>> > > > of >>> > >>> > > > > them gives a java.net.ConnectException: >>> > >>> > > > > >>> > >>> > > > > [2015-07-14 12:56:46,352] WARN Error in I/O with >>> > >>> producer0-link-0/ >>> > >>> > > > > 192.168.1.1 (org.apache.kafka.common.network.Selector) >>> > >>> > > > > >>> > >>> > > > > java.net.ConnectException: Connection refused...... >>> > >>> > > > > >>> > >>> > > > > [2015-07-14 12:56:48,056] WARN Error in I/O with >>> > >>> producer1-link-0/ >>> > >>> > > > > 192.168.1.2 (org.apache.kafka.common.network.Selector) >>> > >>> > > > > >>> > >>> > > > > java.net.ConnectException: Connection refused..... >>> > >>> > > > > >>> > >>> > > > > What could be the cause? >>> > >>> > > > > >>> > >>> > > > > Thank you guys! >>> > >>> > > > > >>> > >>> > > > > >>> > >>> > > > > >>> > >>> > > > > >>> > >>> > > > > On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG < >>> > jg...@berkeley.edu> >>> > >>> > > wrote: >>> > >>> > > > > >>> > >>> > > > >> Yuheng, >>> > >>> > > > >> >>> > >>> > > > >> Yes, if you read the blog post it specifies that he's >>> using >>> > >>> three >>> > >>> > > > separate >>> > >>> > > > >> machines. There's no reason the producers cannot be >>> started at >>> > >>> the >>> > >>> > > same >>> > >>> > > > >> time, I believe. >>> > >>> > > > >> >>> > >>> > > > >> On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du < >>> > >>> > yuheng.du.h...@gmail.com >>> > >>> > > > >>> > >>> > > > >> wrote: >>> > >>> > > > >> >>> > >>> > > > >> > Hi, >>> > >>> > > > >> > >>> > >>> > > > >> > I am running the performance test for kafka. >>> > >>> > > > >> > https://gist.github.com/jkreps >>> > >>> > > > >> > /c7ddb4041ef62a900e6c >>> > >>> > > > >> > >>> > >>> > > > >> > For the "Three Producers, 3x async replication" >>> scenario, >>> > the >>> > >>> > > command >>> > >>> > > > is >>> > >>> > > > >> > the same as one producer: >>> > >>> > > > >> > >>> > >>> > > > >> > bin/kafka-run-class.sh >>> > >>> > > > org.apache.kafka.clients.tools.ProducerPerformance >>> > >>> > > > >> > test 50000000 100 -1 acks=1 >>> > >>> > > > >> > bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 >>> > >>> > > > >> > buffer.memory=67108864 batch.size=8196 >>> > >>> > > > >> > >>> > >>> > > > >> > So How to I run the test for three producers? Do I just >>> run >>> > >>> them >>> > >>> > on >>> > >>> > > > three >>> > >>> > > > >> > separate servers at the same time? Will there be some >>> error >>> > in >>> > >>> > this >>> > >>> > > > way >>> > >>> > > > >> > since the three producers can't be started at the same >>> time? >>> > >>> > > > >> > >>> > >>> > > > >> > Thanks. >>> > >>> > > > >> > >>> > >>> > > > >> > best, >>> > >>> > > > >> > Yuheng >>> > >>> > > > >> > >>> > >>> > > > >> >>> > >>> > > > >> >>> > >>> > > > >> >>> > >>> > > > >> -- >>> > >>> > > > >> >>> > >>> > > > >> Jiefu Gong >>> > >>> > > > >> University of California, Berkeley | Class of 2017 >>> > >>> > > > >> B.A Computer Science | College of Letters and Sciences >>> > >>> > > > >> >>> > >>> > > > >> jg...@berkeley.edu <elise...@berkeley.edu> | (925) >>> 400-3427 >>> > >>> > > > >> >>> > >>> > > > >>> > >>> > > >>> > >>> > > >>> > >>> > > >>> > >>> > > -- >>> > >>> > > >>> > >>> > > Jiefu Gong >>> > >>> > > University of California, Berkeley | Class of 2017 >>> > >>> > > B.A Computer Science | College of Letters and Sciences >>> > >>> > > >>> > >>> > > jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427 >>> > >>> > > >>> > >>> > >>> > >>> >>> > >>> >>> > >>> >>> > >>> -- >>> > >>> >>> > >>> Jiefu Gong >>> > >>> University of California, Berkeley | Class of 2017 >>> > >>> B.A Computer Science | College of Letters and Sciences >>> > >>> >>> > >>> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427 >>> > >>> >>> > >> >>> > >> >>> > > >>> > >>> >>> >>> >>> -- >>> >>> Jiefu Gong >>> University of California, Berkeley | Class of 2017 >>> B.A Computer Science | College of Letters and Sciences >>> >>> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427 >>> >> >> >