Here is a comparison between versions 1.3, 1.5, and 1.6.
I would estimate that error bars are plus or minus 15%.

All parameters are identical; the only thing I change between runs is the 
version of Flume.
Lohit’s numbers are fairly consistent with this: if we double the sinks 
from my 4 to his 8 and assume linear scalability, we would expect to get 
somewhere close to 30-40 MB/s.
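
As a rough sketch of that estimate: the snippet below assumes throughput scales in proportion to the number of sinks (a simplification) and takes roughly 15 to 20 MB/s as the 4-sink range for Flume 1.5/1.6, read off the table in this message. The function name is hypothetical.

```python
def scale_linear(rate_mb_s, sinks_measured, sinks_target):
    """Project throughput to a different sink count, assuming linear scaling."""
    return rate_mb_s * sinks_target / sinks_measured

# Approximate 4-sink range for Flume 1.5/1.6 in MB/s (from the table in this message).
low, high = 15, 20
print(scale_linear(low, 4, 8), scale_linear(high, 4, 8))  # 30.0 40.0
```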

It looks like the drop-off is more pronounced for the larger event size.  This 
is of concern to us because we are looking at Flume for a high-volume feed with 
message sizes up to 80 kB.

------------------------------------------
Four HDFS sinks, memory channel
------------------------------------------
Payload    v1.3     v1.5     v1.6
(kB)       (MB/s)   (MB/s)   (MB/s)
-------    ------   ------   ------
1          27       17       20
25         56       15       15
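
To put numbers on the drop-off, here is a small sketch that computes the relative change against 1.3 for each payload size. The rates are copied from the table above; the helper name is hypothetical, and the results should be read against the estimated error bars of plus or minus 15%.

```python
# Memory-channel rates in MB/s with four HDFS sinks, copied from the table above.
rates = {
    1: {"1.3": 27, "1.5": 17, "1.6": 20},
    25: {"1.3": 56, "1.5": 15, "1.6": 15},
}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

for payload_kb, by_version in rates.items():
    base = by_version["1.3"]
    for version in ("1.5", "1.6"):
        change = pct_change(base, by_version[version])
        print(f"{payload_kb} kB, 1.3 -> {version}: {change:+.0f}%")
```

This works out to a drop of roughly 26 to 37% at 1 kB and about 73% at 25 kB.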



From: Hari Shreedharan [mailto:[email protected]]
Sent: Wednesday, July 22, 2015 1:27 PM
To: [email protected]
Subject: Re: HDFS Sink performance

That is a bit disconcerting. Are you using the same HDFS setup and same config 
for both tests? Would it be possible for you to take a look at Flume 1.6.0? 
Such drops in performance should be taken care of.



Thanks,
Hari

On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton <[email protected]> 
wrote:
My mailer totally scrambled the numbers, probably by inserting special 
characters.
Sorry, here are the actual results....

All rates in MB/s
Payload in KB

Flume 1.3.1
Payload   Rate (memch)   Rate (filech)
25        34             29
25        31             27.6
25        50             23.3
25        46.5           27.2
50        31.3           23.8
50        37.4           31.3
50        32.3           31.8
80        30.5           25.8
80        46.2           25.2
80        39.1           25.8
80        56.5           25.1

Flume 1.5
Payload   Rate (memch)   Rate (filech)
25        18.7           15.6
50        18.3           17.3
80        18.4           15.6
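
Since the 1.3.1 runs vary quite a bit, one way to compare the two versions is to average the repeated runs per payload size first. The sketch below does that with the values from the tables above; the variable and function names are hypothetical, and the 1.5 figures are single runs, so the comparison is only indicative.

```python
from statistics import mean

# (payload kB, memory-channel MB/s, file-channel MB/s) for Flume 1.3.1, from the table above.
runs_131 = [
    (25, 34, 29), (25, 31, 27.6), (25, 50, 23.3), (25, 46.5, 27.2),
    (50, 31.3, 23.8), (50, 37.4, 31.3), (50, 32.3, 31.8),
    (80, 30.5, 25.8), (80, 46.2, 25.2), (80, 39.1, 25.8), (80, 56.5, 25.1),
]
# Single runs for Flume 1.5: payload kB -> (memory-channel MB/s, file-channel MB/s).
runs_15 = {25: (18.7, 15.6), 50: (18.3, 17.3), 80: (18.4, 15.6)}

def mean_by_payload(runs):
    """Average the memory-channel and file-channel rates for each payload size."""
    out = {}
    for payload in sorted({r[0] for r in runs}):
        mem = [r[1] for r in runs if r[0] == payload]
        fch = [r[2] for r in runs if r[0] == payload]
        out[payload] = (mean(mem), mean(fch))
    return out

means_131 = mean_by_payload(runs_131)
for payload, (mem, fch) in means_131.items():
    mem15, fch15 = runs_15[payload]
    print(f"{payload} kB: memch {mem:.1f} -> {mem15}, filech {fch:.1f} -> {fch15}")
```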

-----Original Message-----
From: Robert B Hamilton [mailto:[email protected]]
Sent: Wednesday, July 22, 2015 11:00 AM
To: [email protected]
Subject: RE: HDFS Sink performance

 I only see that kind of throughput for event sizes of 25kB to 50kB or larger.

These particular tests are done on flume version 1.3.1.
But because you asked,  I thought to do a few quick runs on 1.5.0.1 and added 
those results below.  The results are significantly different for 1.5 and I 
wonder if this is a cause for concern.

None of this has been peer reviewed so it should be considered as tentative.

As to the HDD, here is the result of a quick and dirty dd test.

  dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
   104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s


Source data: each record consists of random ASCII strings of constant length 
(25 kB, 50 kB, or 80 kB depending on the run).
Source: spooldir
Channel: file channel with a single dataDir, or memory channel.
Sink: four HDFS sinks, SequenceFile, Text, batch size=10, rollInterval=20 seconds.
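
For anyone wanting to reproduce a similar layout, here is a minimal sketch that generates a Flume properties snippet for the setup just described. It is not the actual test configuration: the agent and component names and all paths are hypothetical, "SequenceFile, Text" is assumed to mean hdfs.fileType and hdfs.writeFormat, and settings the message does not state (roll size and count, spooldir deserializer limits, channel capacity) are left out.

```python
def flume_config(agent="agent", n_sinks=4, use_file_channel=True):
    """Build a properties snippet: one spooldir source, one channel, n HDFS sinks."""
    sinks = [f"hdfs{i}" for i in range(1, n_sinks + 1)]
    lines = [
        f"{agent}.sources = spool",
        f"{agent}.channels = ch",
        f"{agent}.sinks = {' '.join(sinks)}",
        f"{agent}.sources.spool.type = spooldir",
        f"{agent}.sources.spool.spoolDir = /path/to/spool",  # hypothetical path
        f"{agent}.sources.spool.channels = ch",
    ]
    if use_file_channel:
        lines += [
            f"{agent}.channels.ch.type = file",
            f"{agent}.channels.ch.checkpointDir = /path/to/checkpoint",  # hypothetical path
            f"{agent}.channels.ch.dataDirs = /path/to/data",  # single dataDir, hypothetical path
        ]
    else:
        lines.append(f"{agent}.channels.ch.type = memory")
    for name in sinks:
        prefix = f"{agent}.sinks.{name}"
        lines += [
            f"{prefix}.type = hdfs",
            f"{prefix}.channel = ch",
            f"{prefix}.hdfs.path = /path/to/output",  # hypothetical path
            f"{prefix}.hdfs.filePrefix = {name}",  # distinct file names per sink
            f"{prefix}.hdfs.fileType = SequenceFile",
            f"{prefix}.hdfs.writeFormat = Text",
            f"{prefix}.hdfs.batchSize = 10",
            f"{prefix}.hdfs.rollInterval = 20",
        ]
    return "\n".join(lines)

print(flume_config())
```

Calling flume_config(use_file_channel=False) gives the memory-channel variant with the same four sinks.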

Batch size was kept small because of memory channel capacity. Increasing batch 
size for file channel did not improve performance so I kept it at 10.

Here I have numbers for some runs where the payload is varied among 25 kB, 
50 kB, and 80 kB. I include memory channel for comparison.

Multiple runs were performed for each event size. As you can see, the throughput 
can vary from run to run because these particular measurements were done in an 
environment that is not tightly controlled.  Think of them as "in situ" 
measurements :)

Flume 1.3.1 memory channel and file channel
-------------------------------------------------------
Payload   Rate (memch)   Rate (filech)
(kB)      (MB/s)         (MB/s)
-------------------------------------------------------
25        34             29
25        31             27.6
25        50             23.3
25        46.5           27.2
50        31.2           23.8
50        37.4           31.3
50        32.3           31.8
80        30.5           25.8
80        46.2           25.2
80        39.1           25.8
80        56.5           25.1


Flume 1.5 file channel and memory channel
---------------------------------------------------
Event size   Rate (memch)   Rate (filech)
(kB)         (MB/s)         (MB/s)
---------------------------------------------------
25           18.7           15.6
50           18.3           17.3
80           18.4           15.6

-----Original Message-----
From: Roshan Naik [mailto:[email protected]]
Sent: Friday, July 17, 2015 6:21 PM
To: [email protected]
Subject: Re: HDFS Sink performance

I updated the Flume wiki with my measurements. I also added a section with Hive 
sink measurements.

https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2


@Robert:
  What sort of HDD are you using?
  What is the event size?
  Which version of Flume?

-roshan




On 7/17/15 12:51 PM, "Robert B Hamilton" <[email protected]> wrote:

>Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
>sinks per agent, and with a file channel with a single dataDir.
>
>
>From: lohit [mailto:[email protected]]
>Sent: Wednesday, July 15, 2015 11:11 AM
>To: [email protected]
>Subject: HDFS Sink performance
>
>Hello,
>
>Does anyone have numbers they can share on HDFS sink performance?
>In our testing, a single sink writing to HDFS (CompressedStream) and
>reading from a MemoryChannel can only do about 35000 events per second
>(each event is about 1K in size). After compression this comes to a
>~10MB/s write stream to the HDFS file, which is pretty low. Our
>configuration looks like this:
>
>agent.sinks.hdfsSink.type = hdfs
>agent.sinks.hdfsSink.channel = memoryChannel
>agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>agent.sinks.hdfsSink.hdfs.codeC = lzo
>agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>agent.sinks.hdfsSink.hdfs.rollCount = 0
>agent.sinks.hdfsSink.hdfs.batchSize = 10000
>agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>
>agent.channels.memoryChannel.type = memory
>
>agent.channels.memoryChannel.capacity = 3000000
>agent.channels.memoryChannel.transactionCapacity = 10000
>
>--
>Have a Nice Day!
>Lohit
>
>
>Nothing in this message is intended to constitute an electronic
>signature unless a specific statement to the contrary is included in this 
>message.
>
>Confidentiality Note: This message is intended only for the person or
>entity to which it is addressed. It may contain confidential and/or
>privileged material. Any review, transmission, dissemination or other
>use, or taking of any action in reliance upon this message by persons
>or entities other than the intended recipient is prohibited and may be
>unlawful. If you received this message in error, please contact the
>sender and delete it from your computer.
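
As a rough cross-check of the figures in Lohit's original message, the event rate and event size give the uncompressed throughput of the single sink, and the ratio to the written rate gives the implied compression factor. This is only a sketch: "about 1K" is assumed to mean 1000 bytes, and the variable names are hypothetical.

```python
events_per_s = 35_000   # reported single-sink event rate
event_bytes = 1_000     # "about 1K"; assumed to be 1000 bytes here
written_mb_s = 10       # reported post-compression write rate to HDFS, MB/s

raw_mb_s = events_per_s * event_bytes / 1e6   # uncompressed throughput, MB/s
ratio = raw_mb_s / written_mb_s               # implied compression factor
print(raw_mb_s, ratio)  # 35.0 3.5
```

On these assumptions the single sink handles about 35 MB/s of uncompressed event data, which is the figure to compare against uncompressed rates reported elsewhere in the thread.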


