For us, the majority of messages need not be persisted to disk, so we are interested in MemoryChannel. There has been a gradual performance degradation from 1.3.1 -> 1.4.0 -> 1.6.0. See the graph below, where I have a constant stream of messages (blue line). While this is happening, I swap different Flume versions into the agent. The orange line shows messages dropped (a flat line is when data is streamed to HDFS), and I have marked the flat lines with the different versions.
2015-07-22 19:48 GMT-07:00 Roshan Naik <[email protected]>:

>
> My guess is that most of you will probably use the File channel in
> production with the HDFS sink? In that scenario, the common observation
> seems to be that the File channel becomes the primary bottleneck. Going by
> Robert's observations, File channel throughput also seems to have dropped
> since v1.3.
>
> Robert, can you confirm how many data dirs were used for your readings
> with FCh?
>
> -roshan
>
>
> From: lohit <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, July 22, 2015 3:01 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: HDFS Sink performance
>
> Thanks for sharing these numbers, Robert. Out of curiosity, I ran the same
> experiment.
> Flume 1.3.1 has higher throughput than 1.6.0 (I was able to get a
> sustained 60MB/s with Flume 1.3.1).
> No config or setup change; just changing the Flume version shows this
> difference. We should probably look at the change set between 1.3.1 and
> 1.5 to see if there were any obvious changes.
>
> 2015-07-22 14:00 GMT-07:00 Robert B Hamilton <[email protected]>:
>
>> Here is a comparison between versions 1.3, 1.5, and 1.6.
>> I would estimate that the error bars are plus or minus 15%.
>>
>> All parameters are identical; between runs, all I change is the version
>> of Flume.
>> Lohit’s numbers are fairly consistent with this, because if we double the
>> sinks from my 4 to his 8 and assume linear scalability, we would expect to
>> get somewhere close to 30-40MB/s.
>>
>> It looks like the drop-off is more pronounced for the larger event size.
>> This is of concern to us because we are looking at this for a high-volume
>> feed with message sizes up to 80 kB.
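Robert's two claims in the paragraph above (doubling from 4 to 8 sinks should give roughly 30-40 MB/s, and larger events are hit harder) can be checked against the 1 kB and 25 kB rows of the "HDFSx4 sink, Memory channel" table that follows. A minimal sketch, assuming throughput scales perfectly linearly with the number of sinks (Robert's own stated assumption) and using only the numbers from that table; the helper names are illustrative:

```python
# Throughput (MB/s) from the "HDFSx4 sink, Memory channel" table:
# payload size (kB) -> {Flume version: MB/s}, all measured with 4 HDFS sinks.
MEASURED = {
    1: {"1.3": 27.0, "1.5": 17.0, "1.6": 20.0},
    25: {"1.3": 56.0, "1.5": 15.0, "1.6": 15.0},
}
SINKS_MEASURED = 4


def project_linear(rate_mb_s: float, sinks_from: int, sinks_to: int) -> float:
    """Project throughput to another sink count, ASSUMING perfectly linear scaling."""
    return rate_mb_s / sinks_from * sinks_to


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline` (negative = slower)."""
    return (value - baseline) / baseline * 100.0


if __name__ == "__main__":
    for payload_kb, rates in MEASURED.items():
        base = rates["1.3"]
        for version in ("1.5", "1.6"):
            rate = rates[version]
            print(
                f"{payload_kb:>2} kB v{version}: {rate:5.1f} MB/s "
                f"({relative_change(base, rate):+.0f}% vs v1.3), "
                f"8-sink projection {project_linear(rate, SINKS_MEASURED, 8):.0f} MB/s"
            )
```

For the 1 kB rows this projects 34 MB/s (v1.5) and 40 MB/s (v1.6) at 8 sinks, in line with the 30-40 MB/s estimate. The 25 kB rows sit about 73% below v1.3, against 26-37% for 1 kB, which matches the observation that larger events suffer more. With the stated error bars of plus or minus 15%, and no guarantee that sinks scale linearly, the projection is an upper-bound estimate rather than a prediction.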
>>
>> ------------------------------------------
>> HDFSx4 sink, Memory channel
>> ------------------------------------------
>> Payload    v1.3    v1.5    v1.6
>> (kB)       (MB/s)  (MB/s)  (MB/s)
>> -------    -----   -----   -----
>> 1          27      17      20
>> 25         56      15      15
>>
>>
>> From: Hari Shreedharan [mailto:[email protected]]
>> Sent: Wednesday, July 22, 2015 1:27 PM
>> To: [email protected]
>> Subject: Re: HDFS Sink performance
>>
>> That is a bit disconcerting. Are you using the same HDFS setup and same
>> config for both tests? Would it be possible for you to take a look at Flume
>> 1.6.0? Such drops in performance should be taken care of.
>>
>> Thanks,
>> Hari
>>
>> On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton <[email protected]> wrote:
>> My mailer totally scrambled the numbers, probably by inserting special
>> characters.
>> Sorry, here are the actual results....
>>
>> All rates in MB/s
>> Payload in KB
>>
>> Flume 1.3.1
>> Payload   Rate memch   Rate FCh
>> 25        34           29
>> 25        31           27.6
>> 25        50           23.3
>> 25        46.5         27.2
>> 50        31.3         23.8
>> 50        37.4         31.3
>> 50        32.3         31.8
>> 80        30.5         25.8
>> 80        46.2         25.2
>> 80        39.1         25.8
>> 80        56.5         25.1
>>
>> Flume 1.5
>> Payload   Rate memch   Rate FCh
>> 25        18.7         15.6
>> 50        18.3         17.3
>> 80        18.4         15.6
>>
>> -----Original Message-----
>> From: Robert B Hamilton [mailto:[email protected]]
>> Sent: Wednesday, July 22, 2015 11:00 AM
>> To: [email protected]
>> Subject: RE: HDFS Sink performance
>>
>> I only see that kind of throughput for event sizes of 25kB to 50kB or
>> larger.
>>
>> These particular tests were done on Flume version 1.3.1.
>> But because you asked, I did a few quick runs on 1.5.0.1 and added those
>> results below. The results are significantly different for 1.5, and I
>> wonder if this is a cause for concern.
>>
>> None of this has been peer reviewed, so it should be considered
>> tentative.
>>
>> As to the HDD, here is the result of a quick and dirty dd test.
>>
>> dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
>> 104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
>>
>> Source data: each record consists of random ASCII strings of constant
>> length (25k, 50k, or 80k depending on the run).
>> Source: spooldir
>> Channel: file channel with a single dataDir, or memory channel.
>> Sink: four HDFS sinks, SequenceFile, Text, batch size=10, rollInterval=20
>> seconds.
>>
>> Batch size was kept small because of memory channel capacity. Increasing
>> the batch size for the file channel did not improve performance, so I kept
>> it at 10.
>>
>> Here I have numbers for some runs where the payload is varied across 25K,
>> 50K, and 80K. I include the memory channel for comparison.
>>
>> Multiple runs were performed for each event size. As you can see, the
>> throughput can vary from run to run because these particular measurements
>> were done in an environment that is not tightly controlled. Think of them
>> as "in situ" measurements :)
>>
>> Flume 1.3.1 memory channel and file channel
>> -----------------------------------------------------
>> Payload   Rate memch   Rate filech
>> (kB)      (MB/s)       (MB/s)
>> -----------------------------------------------------
>> 25        34           29
>> 25        31           27.6
>> 25        50           23.3
>> 25        46.5         27.2
>> 50        31.2         23.8
>> 50        37.4         31.3
>> 50        32.3         31.8
>> 80        30.5         25.8
>> 80        46.2         25.2
>> 80        39.1         25.8
>> 80        56.5         25.1
>>
>>
>> Flume 1.5 File Channel and Memory Channel
>> ---------------------------------------------------
>> Event size   Rate memch   Rate filech
>> (KB)         (MB/s)       (MB/s)
>> ---------------------------------------------------
>> 25           18.7         15.6
>> 50           18.3         17.3
>> 80           18.4         15.6
>>
>> -----Original Message-----
>> From: Roshan Naik [mailto:[email protected]]
>> Sent: Friday, July 17, 2015 6:21 PM
>> To: [email protected]
>> Subject: Re: HDFS Sink performance
>>
>> I updated the Flume wiki with my measurements. I also added a section with
>> Hive sink measurements.
>>
>> https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>>
>> @Robert:
>> What sort of HDD are you using?
>> What is the event size?
>> Which version of Flume?
>>
>> -roshan
>>
>>
>> On 7/17/15 12:51 PM, "Robert B Hamilton" <[email protected]> wrote:
>>
>> >Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
>> >sinks per agent, and a file channel with a single dataDir.
>> >
>> >
>> >From: lohit [mailto:[email protected]]
>> >Sent: Wednesday, July 15, 2015 11:11 AM
>> >To: [email protected]
>> >Subject: HDFS Sink performance
>> >
>> >Hello,
>> >
>> >Does anyone have some numbers they can share on HDFS sink
>> >performance? From our testing, a single sink writing to HDFS
>> >(CompressedStream) and reading from MemoryChannel can only do about
>> >35000 events per second (each event is about 1K in size). After
>> >compression this turns out to be a ~10MB/s write stream to the HDFS
>> >file, which is pretty low. Our configuration looks like this:
>> >
>> >agent.sinks.hdfsSink.type = hdfs
>> >agent.sinks.hdfsSink.channel = memoryChannel
>> >agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>> >agent.sinks.hdfsSink.hdfs.codeC = lzo
>> >agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>> >agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>> >agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>> >agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>> >agent.sinks.hdfsSink.hdfs.rollCount = 0
>> >agent.sinks.hdfsSink.hdfs.batchSize = 10000
>> >agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>> >
>> >agent.channels.memoryChannel.type = memory
>> >
>> >agent.channels.memoryChannel.capacity = 3000000
>> >agent.channels.memoryChannel.transactionCapacity = 10000
>> >
>> >--
>> >Have a Nice Day!
>> >Lohit >> > >> > >> >Nothing in this message is intended to constitute an electronic >> >signature unless a specific statement to the contrary is included in >> this message. >> > >> >Confidentiality Note: This message is intended only for the person or >> >entity to which it is addressed. It may contain confidential and/or >> >privileged material. Any review, transmission, dissemination or other >> >use, or taking of any action in reliance upon this message by persons >> >or entities other than the intended recipient is prohibited and may be >> >unlawful. If you received this message in error, please contact the >> >sender and delete it from your computer. >> >> >> >> Nothing in this message is intended to constitute an electronic signature >> unless a specific statement to the contrary is included in this message. >> >> Confidentiality Note: This message is intended only for the person or >> entity to which it is addressed. It may contain confidential and/or >> privileged material. Any review, transmission, dissemination or other use, >> or taking of any action in reliance upon this message by persons or >> entities other than the intended recipient is prohibited and may be >> unlawful. If you received this message in error, please contact the sender >> and delete it from your computer. >> >> >> Nothing in this message is intended to constitute an electronic signature >> unless a specific statement to the contrary is included in this message. >> >> Confidentiality Note: This message is intended only for the person or >> entity to which it is addressed. It may contain confidential and/or >> privileged material. Any review, transmission, dissemination or other use, >> or taking of any action in reliance upon this message by persons or >> entities other than the intended recipient is prohibited and may be >> unlawful. If you received this message in error, please contact the sender >> and delete it from your computer. 
>
>
> --
> Have a Nice Day!
> Lohit
>

--
Have a Nice Day!
Lohit
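Robert's corrected 11:04 results quoted above list several 1.3.1 runs per payload size but only one 1.5 run, which makes the version gap hard to read at a glance. A minimal sketch that averages the 1.3.1 runs and compares each single 1.5 run against that mean, using only the numbers in that table (the function name is illustrative):

```python
from statistics import mean

# Flume 1.3.1 runs from Robert's corrected table (MB/s), keyed by payload (kB).
V131 = {
    "memory": {25: [34, 31, 50, 46.5], 50: [31.3, 37.4, 32.3], 80: [30.5, 46.2, 39.1, 56.5]},
    "file": {25: [29, 27.6, 23.3, 27.2], 50: [23.8, 31.3, 31.8], 80: [25.8, 25.2, 25.8, 25.1]},
}
# Flume 1.5 figures from the same message: a single run per payload size.
V15 = {
    "memory": {25: 18.7, 50: 18.3, 80: 18.4},
    "file": {25: 15.6, 50: 17.3, 80: 15.6},
}


def summarize() -> dict:
    """Return {(channel, payload_kb): (mean_1_3_1, v1_5, v1_5 / mean_1_3_1)}."""
    out = {}
    for channel, runs_by_payload in V131.items():
        for payload_kb, runs in runs_by_payload.items():
            avg = mean(runs)
            new = V15[channel][payload_kb]
            out[(channel, payload_kb)] = (avg, new, new / avg)
    return out


if __name__ == "__main__":
    for (channel, payload_kb), (avg, new, ratio) in sorted(summarize().items()):
        print(f"{channel:6} {payload_kb:2} kB: 1.3.1 mean {avg:5.1f}  1.5 {new:5.1f}  ratio {ratio:.2f}")
```

Against the 1.3.1 averages, the 1.5 runs come out at roughly 43-54% for the memory channel and 58-61% for the file channel. Because Robert describes these as "in situ" measurements in an environment that is not tightly controlled, and the 1.5 side is a single run per size, the ratios show a trend rather than a precise regression figure.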
