Thanks for the information, Roshan. I was able to find your email. From your experiment, the best you could get was 538K messages for a single agent, which you mentioned was about ~250MB/s. Do you know what the compression ratio was? Also, how much memory did you give the agent? These numbers are similar to what we are seeing: with 2 sinks we see about 50K messages/sec (1K each), so ~50MB/s.
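For reference, here is a rough sketch of the two-sink layout we are testing, following Hari's suggestion in the thread below (the sink names hdfsSink1/hdfsSink2 and the filePrefix values are just placeholders; the remaining settings mirror the single-sink config quoted further down):

agent.sinks = hdfsSink1 hdfsSink2

# Both sinks drain the same memory channel; distinct hdfs.filePrefix values
# keep them writing to different files so the HDFS client does not hit lease conflicts.
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = memoryChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
agent.sinks.hdfsSink1.hdfs.codeC = lzo
agent.sinks.hdfsSink1.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink1.hdfs.batchSize = 10000

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
agent.sinks.hdfsSink2.hdfs.codeC = lzo
agent.sinks.hdfsSink2.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink2.hdfs.batchSize = 10000

Adding more sinks this way should scale the aggregate write rate until the channel or compression becomes the bottleneck.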
2015-07-15 13:45 GMT-07:00 Roshan Naik <[email protected]>:

> Yes.. My bad.. Been meaning to do it… will try to do it this week.
> -roshan
>
> From: Hari Shreedharan <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, July 15, 2015 1:41 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: HDFS Sink performance
>
> Roshan - how about posting that on the Flume wiki?
>
> Thanks,
> Hari
>
> On Wed, Jul 15, 2015 at 1:07 PM, Roshan Naik <[email protected]> wrote:
>
>> Lohit,
>> You may want to search the mailing list for 'Flume perf measurements'.
>> You should find the recent measurements I posted.
>> -roshan
>>
>> From: lohit <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, July 15, 2015 11:19 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: HDFS Sink performance
>>
>> Thanks for the reply, Hari. Multiple sinks make sense, but this would
>> also mean a lot more files on HDFS. I will try multiple sinks and see
>> how fast this can go.
>> Given that a single HDFS stream can do much higher throughput, maybe
>> there is a way to have a thread pool for
>> SinkRunner-PollingRunner-DefaultSinkProcessor instead of a single
>> thread per sink.
>>
>> 2015-07-15 11:11 GMT-07:00 Hari Shreedharan <[email protected]>:
>>
>>> Hi Lohit,
>>>
>>> HDFS sinks (in fact, most sinks) are single-threaded by design. This
>>> is meant to make writing the sinks easier, but all channels can handle
>>> multiple sinks reading from them. So to improve efficiency, you
>>> basically configure several sinks that read off the same channel. Make
>>> sure, though, that each sink writes to files with different HDFS paths
>>> or different file prefixes (else the HDFS client API will complain
>>> about leases).
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Wed, Jul 15, 2015 at 9:10 AM, lohit <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does anyone have numbers they can share around HDFS sink performance?
>>>> From our testing, a single sink writing to HDFS (CompressedStream)
>>>> and reading from a MemoryChannel can only do about 35000 events per
>>>> second (each event is about 1K in size). After compression this turns
>>>> out to be a ~10MB/s write stream to the HDFS file, which is pretty
>>>> low. Our configuration looks like this:
>>>>
>>>> agent.sinks.hdfsSink.type = hdfs
>>>> agent.sinks.hdfsSink.channel = memoryChannel
>>>> agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>>> agent.sinks.hdfsSink.hdfs.codeC = lzo
>>>> agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>>> agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>>> agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>>> agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>>>> agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>>> agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>>>
>>>> agent.channels.memoryChannel.type = memory
>>>> agent.channels.memoryChannel.capacity = 3000000
>>>> agent.channels.memoryChannel.transactionCapacity = 10000
>>>>
>>>> --
>>>> Have a Nice Day!
>>>> Lohit
>>>
>>
>> --
>> Have a Nice Day!
>> Lohit
>

--
Have a Nice Day!
Lohit
