Jiefu,

I agree with you. I checked the hardware specs of my machines, each one of
them has:

RAM



256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs

Disk



Two 1 TB 7.2K RPM 3G SATA HDDs

For the throughput versus stored data test, it uses 5*10^10 messages, which
has the total volume of 5TB, I made the replication factor to be 3, which
means the total size including replicas would be 15TB, which apparently
overwhelmed the two brokers I use.

Thanks.

best,
Yuheng

On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG <jg...@berkeley.edu> wrote:

> Someone may correct me if I am incorrect, but how much disk space do you
> have on these nodes? Your exception 'No space left on device' from one of
> your brokers seems to suggest that you're full (after all you're writing
> 500 million records). If this is the case I believe the expected behavior
> for Kafka is to reject any more attempts to write data?
>
> On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du <yuheng.du.h...@gmail.com>
> wrote:
>
> > Also, the log in another broker (not the bootstrap) says:
> >
> > [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error
> > writing to highwatermark file:  (kafka.server.ReplicaManager)
> >
> >
> > [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47
> because
> > of error (kafka.network.Process
> >
> > or)
> >
> > java.io.IOException: Connection reset by peer
> >
> >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >
> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >
> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >
> >         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >
> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >
> >         at kafka.utils.Utils$.read(Utils.scala:380)
> >
> >         at
> >
> >
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> >
> >         at kafka.network.Processor.read(SocketServer.scala:444)
> >
> >         at kafka.network.Processor.run(SocketServer.scala:340)
> >
> >         at java.lang.Thread.run(Thread.java:745)
> >
> > ........
> >
> > java.io.IOException: No space left on device
> >
> >         at java.io.FileOutputStream.writeBytes(Native Method)
> >
> >         at java.io.FileOutputStream.write(FileOutputStream.java:345)
> >
> >         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> >
> >         at sun.nio.cs.StreamEncoder.implFlushBuffe
> >
> > (END)
> >
> > On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du <yuheng.du.h...@gmail.com>
> > wrote:
> >
> > > Hi Jiefu, Gwen,
> > >
> > > I am running the Throughput versus stored data test:
> > > bin/kafka-run-class.sh
> org.apache.kafka.clients.tools.ProducerPerformance
> > > test 50000000000 100 -1 acks=1 bootstrap.servers=
> > > esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
> > batch.size=8196
> > >
> > > After around 50,000,000 messages were sent, I got a bunch of connection
> > > refused error as I mentioned before. I checked the logs on the broker
> and
> > > here is what I see:
> > >
> > > [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No
> > > checkpointed highwatermark is found for partition [test,4]
> > > (kafka.cluster.Partition)
> > >
> > > [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0
> > > ms. (kafka.log.Log)
> > >
> > > [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to
> unrecoverable
> > > I/O error while handling produce request:  (kafka.server.KafkaApis)
> > >
> > > kafka.common.KafkaStorageException: I/O exception in append to log
> > 'test-0'
> > >
> > >         at kafka.log.Log.append(Log.scala:266)
> > >
> > >         at
> > >
> >
> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
> > >
> > >         at
> > >
> >
> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
> > >
> > >         at kafka.utils.Utils$.inLock(Utils.scala:535)
> > >
> > >         at kafka.utils.Utils$.inReadLock(Utils.scala:541)
> > >
> > >         at
> > > kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
> > >
> > >         at
> > >
> >
> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
> > >
> > >         at
> > >
> >
> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
> > >
> > >         at
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> > >
> > >         at
> > >
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> > >
> > >         at
> > >
> >
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> > >
> > >         at
> > >
> >
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> > >
> > >         at scala.coll
> > >
> > >
> > >
> > > Can you help me with this problem? Thanks.
> > >
> > > On Tue, Jul 14, 2015 at 5:12 PM, Yuheng Du <yuheng.du.h...@gmail.com>
> > > wrote:
> > >
> > >> I checked the logs on the brokers, it seems that the zookeeper or the
> > >> kafka server process is not running on this broker...Thank you guys. I
> > will
> > >> see if it happens again.
> > >>
> > >> On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG <jg...@berkeley.edu>
> wrote:
> > >>
> > >>> Hmm..yeah some error logs would be nice like Gwen pointed out. Do any
> > of
> > >>> your brokers fall out of the ISR when sending messages? It seems like
> > >>> your
> > >>> setup should be fine, so I'm not entirely sure.
> > >>>
> > >>> On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du <yuheng.du.h...@gmail.com
> >
> > >>> wrote:
> > >>>
> > >>> > Jiefu,
> > >>> >
> > >>> > I am performing these tests on a 6 nodes cluster in cloudlab (a
> > >>> > infrastructure built for scientific research). I use 2 nodes as
> > >>> producers,
> > >>> > 2 as brokers only, and 2 as consumers. I have tested for each
> > >>> individual
> > >>> > machines and they work well. I did not use AWS. Thank you!
> > >>> >
> > >>> > On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG <jg...@berkeley.edu>
> > >>> wrote:
> > >>> >
> > >>> > > Yuheng, are you performing these tests locally or using a service
> > >>> such as
> > >>> > > AWS? I'd try using each separate machine individually first,
> > >>> connecting
> > >>> > to
> > >>> > > the ZK/Kafka servers and ensuring that each is able to first log
> > and
> > >>> > > consume messages independently.
> > >>> > >
> > >>> > > On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira <
> > gshap...@cloudera.com
> > >>> >
> > >>> > > wrote:
> > >>> > >
> > >>> > > > Are there any errors on the broker logs?
> > >>> > > >
> > >>> > > > On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du <
> > >>> yuheng.du.h...@gmail.com>
> > >>> > > > wrote:
> > >>> > > > > Jiefu,
> > >>> > > > >
> > >>> > > > > Thank you. The three producers can run at the same time. I
> mean
> > >>> > should
> > >>> > > > they
> > >>> > > > > be started at exactly the same time? (I have three consoles
> > from
> > >>> each
> > >>> > > of
> > >>> > > > > the three machines and I just start the console command
> > manually
> > >>> one
> > >>> > by
> > >>> > > > > one) Or a small variation of the starting time won't matter?
> > >>> > > > >
> > >>> > > > > Gwen and Jiefu,
> > >>> > > > >
> > >>> > > > > I have started the three producers at three machines, after a
> > >>> while,
> > >>> > > all
> > >>> > > > of
> > >>> > > > > them gives a java.net.ConnectException:
> > >>> > > > >
> > >>> > > > > [2015-07-14 12:56:46,352] WARN Error in I/O with
> > >>> producer0-link-0/
> > >>> > > > > 192.168.1.1 (org.apache.kafka.common.network.Selector)
> > >>> > > > >
> > >>> > > > > java.net.ConnectException: Connection refused......
> > >>> > > > >
> > >>> > > > > [2015-07-14 12:56:48,056] WARN Error in I/O with
> > >>> producer1-link-0/
> > >>> > > > > 192.168.1.2 (org.apache.kafka.common.network.Selector)
> > >>> > > > >
> > >>> > > > > java.net.ConnectException: Connection refused.....
> > >>> > > > >
> > >>> > > > > What could be the cause?
> > >>> > > > >
> > >>> > > > > Thank you guys!
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG <
> > jg...@berkeley.edu>
> > >>> > > wrote:
> > >>> > > > >
> > >>> > > > >> Yuheng,
> > >>> > > > >>
> > >>> > > > >> Yes, if you read the blog post it specifies that he's using
> > >>> three
> > >>> > > > separate
> > >>> > > > >> machines. There's no reason the producers cannot be started
> at
> > >>> the
> > >>> > > same
> > >>> > > > >> time, I believe.
> > >>> > > > >>
> > >>> > > > >> On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du <
> > >>> > yuheng.du.h...@gmail.com
> > >>> > > >
> > >>> > > > >> wrote:
> > >>> > > > >>
> > >>> > > > >> > Hi,
> > >>> > > > >> >
> > >>> > > > >> > I am running the performance test for kafka.
> > >>> > > > >> > https://gist.github.com/jkreps
> > >>> > > > >> > /c7ddb4041ef62a900e6c
> > >>> > > > >> >
> > >>> > > > >> > For the "Three Producers, 3x async replication" scenario,
> > the
> > >>> > > command
> > >>> > > > is
> > >>> > > > >> > the same as one producer:
> > >>> > > > >> >
> > >>> > > > >> > bin/kafka-run-class.sh
> > >>> > > > org.apache.kafka.clients.tools.ProducerPerformance
> > >>> > > > >> > test 50000000 100 -1 acks=1
> > >>> > > > >> > bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > >>> > > > >> > buffer.memory=67108864 batch.size=8196
> > >>> > > > >> >
> > >>> > > > >> > So How to I run the test for three producers? Do I just
> run
> > >>> them
> > >>> > on
> > >>> > > > three
> > >>> > > > >> > separate servers at the same time? Will there be some
> error
> > in
> > >>> > this
> > >>> > > > way
> > >>> > > > >> > since the three producers can't be started at the same
> time?
> > >>> > > > >> >
> > >>> > > > >> > Thanks.
> > >>> > > > >> >
> > >>> > > > >> > best,
> > >>> > > > >> > Yuheng
> > >>> > > > >> >
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >> --
> > >>> > > > >>
> > >>> > > > >> Jiefu Gong
> > >>> > > > >> University of California, Berkeley | Class of 2017
> > >>> > > > >> B.A Computer Science | College of Letters and Sciences
> > >>> > > > >>
> > >>> > > > >> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
> > >>> > > > >>
> > >>> > > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > --
> > >>> > >
> > >>> > > Jiefu Gong
> > >>> > > University of California, Berkeley | Class of 2017
> > >>> > > B.A Computer Science | College of Letters and Sciences
> > >>> > >
> > >>> > > jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
> > >>> > >
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Jiefu Gong
> > >>> University of California, Berkeley | Class of 2017
> > >>> B.A Computer Science | College of Letters and Sciences
> > >>>
> > >>> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
> --
>
> Jiefu Gong
> University of California, Berkeley | Class of 2017
> B.A Computer Science | College of Letters and Sciences
>
> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
>

Reply via email to