Jiefu,

Now, even though there is enough disk space (less than 18%), when I run

it still gives me an error, and the logs say:

[2015-07-14 23:08:48,735] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
        at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
        at kafka.server.KafkaServer.initZk(KafkaServer.scala:157)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:82)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:29)
        at kafka.Kafka$.main(Kafka.scala:46)
        at kafka.Kafka.main(Kafka.scala)
[2015-07-14 23:08:48,737] INFO [Kafka Server 1], shutting down (kafka.server.KafkaServer)

I have checked that ZooKeeper is running fine. Can anyone help me
understand why I get this error? Thanks.
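
One quick way to double-check that ZooKeeper is reachable from the broker machine itself (and not only running on its own host) is to send it the `ruok` four-letter command and look for the `imok` reply. The following is only a minimal sketch: the function name `zk_is_ok` is made up for illustration, 2181 is just ZooKeeper's usual client port, and it assumes four-letter commands are enabled on the server.

```python
import socket


def zk_is_ok(host, port=2181, timeout=6.0):
    """Return True if a ZooKeeper server answers 'imok' to the 'ruok' command.

    host and port are placeholders; 2181 is only ZooKeeper's usual client port.
    The 6-second default mirrors the 6000 ms timeout reported in the log.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")   # four-letter health-check command
            reply = sock.recv(4)    # a healthy server replies b"imok"
    except OSError:                 # refused, unreachable, or timed out
        return False
    return reply == b"imok"
```

Running `zk_is_ok(host, port)` on the broker machine, with the host and port taken from the broker's `zookeeper.connect` setting, would show whether the broker can reach ZooKeeper at all within the same timeout.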

On Tue, Jul 14, 2015 at 10:24 PM, Yuheng Du <yuheng.du.h...@gmail.com>
wrote:

> But is there a way to let Kafka overwrite the old data if the disk is
> filled? Or is it not necessary to use this figure? Thanks.
>
> On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du <yuheng.du.h...@gmail.com>
> wrote:
>
>> Jiefu,
>>
>> I agree with you. I checked the hardware specs of my machines; each one
>> of them has:
>>
>> RAM:  256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
>> Disk: Two 1 TB 7.2K RPM 3G SATA HDDs
>>
>> The throughput versus stored data test uses 5*10^10 messages, a total
>> volume of 5 TB. I set the replication factor to 3, so the total size
>> including replicas would be 15 TB, which apparently overwhelmed the two
>> brokers I use.
>>
>> Thanks.
>>
>> best,
>> Yuheng
>>
>> On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG <jg...@berkeley.edu> wrote:
>>
>>> Someone may correct me if I am incorrect, but how much disk space do you
>>> have on these nodes? Your exception 'No space left on device' from one of
>>> your brokers seems to suggest that you're full (after all you're writing
>>> 500 million records). If this is the case I believe the expected behavior
>>> for Kafka is to reject any more attempts to write data?
>>>
>>> On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du <yuheng.du.h...@gmail.com>
>>> wrote:
>>>
>>> > Also, the log in another broker (not the bootstrap) says:
>>> >
>>> > [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file:  (kafka.server.ReplicaManager)
>>> >
>>> > [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47 because of error (kafka.network.Processor)
>>> > java.io.IOException: Connection reset by peer
>>> >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>> >         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>> >         at kafka.utils.Utils$.read(Utils.scala:380)
>>> >         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>>> >         at kafka.network.Processor.read(SocketServer.scala:444)
>>> >         at kafka.network.Processor.run(SocketServer.scala:340)
>>> >         at java.lang.Thread.run(Thread.java:745)
>>> >
>>> > ........
>>> >
>>> > java.io.IOException: No space left on device
>>> >         at java.io.FileOutputStream.writeBytes(Native Method)
>>> >         at java.io.FileOutputStream.write(FileOutputStream.java:345)
>>> >         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
>>> >         at sun.nio.cs.StreamEncoder.implFlushBuffe
>>> >
>>> > (END)
>>> >
>>> > On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du <yuheng.du.h...@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi Jiefu, Gwen,
>>> > >
>>> > > I am running the Throughput versus stored data test:
>>> > > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
>>> > >
>>> > > After around 50,000,000 messages were sent, I got a bunch of connection
>>> > > refused errors, as I mentioned before. I checked the logs on the broker and
>>> > > here is what I see:
>>> > >
>>> > > [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No checkpointed highwatermark is found for partition [test,4] (kafka.cluster.Partition)
>>> > > [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
>>> > > [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0 ms. (kafka.log.Log)
>>> > >
>>> > > [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to unrecoverable I/O error while handling produce request:  (kafka.server.KafkaApis)
>>> > > kafka.common.KafkaStorageException: I/O exception in append to log 'test-0'
>>> > >         at kafka.log.Log.append(Log.scala:266)
>>> > >         at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
>>> > >         at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
>>> > >         at kafka.utils.Utils$.inLock(Utils.scala:535)
>>> > >         at kafka.utils.Utils$.inReadLock(Utils.scala:541)
>>> > >         at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
>>> > >         at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
>>> > >         at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
>>> > >         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>> > >         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>> > >         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>> > >         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>> > >         at scala.coll
>>> > >
>>> > >
>>> > >
>>> > > Can you help me with this problem? Thanks.
>>> > >
>>> > > On Tue, Jul 14, 2015 at 5:12 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
>>> > >
>>> > >> I checked the logs on the brokers. It seems that the zookeeper or the
>>> > >> kafka server process is not running on this broker... Thank you guys. I
>>> > >> will see if it happens again.
>>> > >>
>>> > >> On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG <jg...@berkeley.edu> wrote:
>>> > >>
>>> > >>> Hmm..yeah some error logs would be nice like Gwen pointed out. Do any of
>>> > >>> your brokers fall out of the ISR when sending messages? It seems like your
>>> > >>> setup should be fine, so I'm not entirely sure.
>>> > >>>
>>> > >>> On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
>>> > >>>
>>> > >>> > Jiefu,
>>> > >>> >
>>> > >>> > I am performing these tests on a 6-node cluster in CloudLab (an
>>> > >>> > infrastructure built for scientific research). I use 2 nodes as producers,
>>> > >>> > 2 as brokers only, and 2 as consumers. I have tested each machine
>>> > >>> > individually and they work well. I did not use AWS. Thank you!
>>> > >>> >
>>> > >>> > On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG <jg...@berkeley.edu> wrote:
>>> > >>> >
>>> > >>> > > Yuheng, are you performing these tests locally or using a service such as
>>> > >>> > > AWS? I'd try using each separate machine individually first, connecting to
>>> > >>> > > the ZK/Kafka servers and ensuring that each is able to first log and
>>> > >>> > > consume messages independently.
>>> > >>> > >
>>> > >>> > > On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>> > >>> > >
>>> > >>> > > > Are there any errors on the broker logs?
>>> > >>> > > >
>>> > >>> > > > On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
>>> > >>> > > > > Jiefu,
>>> > >>> > > > >
>>> > >>> > > > > Thank you. The three producers can run at the same time. I mean, should
>>> > >>> > > > > they be started at exactly the same time? (I have three consoles, one for
>>> > >>> > > > > each of the three machines, and I just start the console command manually
>>> > >>> > > > > one by one.) Or does a small variation in the starting time not matter?
>>> > >>> > > > >
>>> > >>> > > > > Gwen and Jiefu,
>>> > >>> > > > >
>>> > >>> > > > > I have started the three producers on three machines. After a while, all
>>> > >>> > > > > of them give a java.net.ConnectException:
>>> > >>> > > > >
>>> > >>> > > > > [2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link-0/192.168.1.1 (org.apache.kafka.common.network.Selector)
>>> > >>> > > > > java.net.ConnectException: Connection refused......
>>> > >>> > > > >
>>> > >>> > > > > [2015-07-14 12:56:48,056] WARN Error in I/O with producer1-link-0/192.168.1.2 (org.apache.kafka.common.network.Selector)
>>> > >>> > > > > java.net.ConnectException: Connection refused.....
>>> > >>> > > > >
>>> > >>> > > > > What could be the cause?
>>> > >>> > > > >
>>> > >>> > > > > Thank you guys!
>>> > >>> > > > >
>>> > >>> > > > >
>>> > >>> > > > >
>>> > >>> > > > >
>>> > >>> > > > > On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG <jg...@berkeley.edu> wrote:
>>> > >>> > > > >
>>> > >>> > > > >> Yuheng,
>>> > >>> > > > >>
>>> > >>> > > > >> Yes, if you read the blog post it specifies that he's using three separate
>>> > >>> > > > >> machines. There's no reason the producers cannot be started at the same
>>> > >>> > > > >> time, I believe.
>>> > >>> > > > >>
>>> > >>> > > > >> On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
>>> > >>> > > > >>
>>> > >>> > > > >> > Hi,
>>> > >>> > > > >> >
>>> > >>> > > > >> > I am running the performance test for kafka.
>>> > >>> > > > >> > https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
>>> > >>> > > > >> >
>>> > >>> > > > >> > For the "Three Producers, 3x async replication" scenario, the command is
>>> > >>> > > > >> > the same as for one producer:
>>> > >>> > > > >> >
>>> > >>> > > > >> > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
>>> > >>> > > > >> >
>>> > >>> > > > >> > So how do I run the test for three producers? Do I just run them on three
>>> > >>> > > > >> > separate servers at the same time? Will this cause errors, since the
>>> > >>> > > > >> > three producers can't be started at exactly the same time?
>>> > >>> > > > >> >
>>> > >>> > > > >> > Thanks.
>>> > >>> > > > >> >
>>> > >>> > > > >> > best,
>>> > >>> > > > >> > Yuheng
>>> > >>> > > > >> >
>>> > >>> > > > >>
>>> > >>> > > > >>
>>> > >>> > > > >>
>>> > >>> > > > >> --
>>> > >>> > > > >>
>>> > >>> > > > >> Jiefu Gong
>>> > >>> > > > >> University of California, Berkeley | Class of 2017
>>> > >>> > > > >> B.A Computer Science | College of Letters and Sciences
>>> > >>> > > > >>
>>> > >>> > > > >> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
>>> > >>> > > > >>
>>> > >>> > > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> >
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>
>>> > >>
>>> > >
>>> >
>>>
>>>
>>>
>>>
>>
>>
>
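
For reference, the sizing arithmetic from the quoted thread can be checked with a few lines of Python. This is only a rough sketch under stated assumptions: it reads the "100" in the test command as the record size in bytes, uses decimal terabytes, and ignores any on-disk overhead that Kafka adds per message or per segment. All variable names are illustrative.

```python
TB = 10**12  # decimal terabyte, in bytes

num_messages = 5 * 10**10   # record count from the test command
message_size = 100          # bytes per record (assumed meaning of "100")
replication_factor = 3      # as stated in the thread
brokers = 2                 # nodes used as brokers
disks_per_broker = 2        # "Two 1 TB ... SATA HDDs" per machine
disk_size = 1 * TB

raw_bytes = num_messages * message_size             # data written once
replicated_bytes = raw_bytes * replication_factor   # data including replicas
cluster_capacity = brokers * disks_per_broker * disk_size

print(f"raw data:         {raw_bytes / TB:.0f} TB")
print(f"with replicas:    {replicated_bytes / TB:.0f} TB")
print(f"cluster capacity: {cluster_capacity / TB:.0f} TB")
```

Under these assumptions the raw data alone (5 TB) already exceeds the 4 TB of raw disk across the two brokers, before replication is considered.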
