Our testing has shown up to 60 MB/s to HDFS when using 8 to 10 sinks per agent with a file channel that has a single dataDir.
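
A rough sketch of what such a multi-sink layout might look like, in case it helps. This is not our actual configuration: the sink names, the channel name, the `/data/flume/...` directories, and the per-sink `filePrefix` values are all made up for illustration, and only four sinks are shown to keep it short. The HDFS settings are copied from the configuration in the original message.

```properties
# Hypothetical agent: several HDFS sinks draining one file channel.
agent.channels = fileChannel
agent.sinks = hdfsSink1 hdfsSink2 hdfsSink3 hdfsSink4

# File channel with a single data directory (paths are placeholders).
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /data/flume/checkpoint
agent.channels.fileChannel.dataDirs = /data/flume/data
agent.channels.fileChannel.transactionCapacity = 10000

# Each sink takes its own batches from the shared channel.
# A distinct filePrefix per sink is assumed here so that sinks
# writing under the same path do not pick the same file name.
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = fileChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
agent.sinks.hdfsSink1.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink1.hdfs.codeC = lzo
agent.sinks.hdfsSink1.hdfs.batchSize = 10000

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
agent.sinks.hdfsSink2.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink2.hdfs.codeC = lzo
agent.sinks.hdfsSink2.hdfs.batchSize = 10000

agent.sinks.hdfsSink3.type = hdfs
agent.sinks.hdfsSink3.channel = fileChannel
agent.sinks.hdfsSink3.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink3.hdfs.filePrefix = sink3
agent.sinks.hdfsSink3.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink3.hdfs.codeC = lzo
agent.sinks.hdfsSink3.hdfs.batchSize = 10000

agent.sinks.hdfsSink4.type = hdfs
agent.sinks.hdfsSink4.channel = fileChannel
agent.sinks.hdfsSink4.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink4.hdfs.filePrefix = sink4
agent.sinks.hdfsSink4.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink4.hdfs.codeC = lzo
agent.sinks.hdfsSink4.hdfs.batchSize = 10000
```

The idea is simply that every sink runs in its own thread and drains the same channel in parallel, so adding sinks is the usual way to raise aggregate write throughput. The right sink count will depend on your event size, codec, and cluster, so treat the number as something to measure rather than a recommendation.
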
From: lohit [mailto:[email protected]]
Sent: Wednesday, July 15, 2015 11:11 AM
To: [email protected]
Subject: HDFS Sink performance

Hello,

Does anyone have numbers they can share on HDFS sink performance? In our testing, a single sink writing to HDFS (CompressedStream) and reading from a MemoryChannel can only do about 35,000 events per second (each event is about 1 KB in size). After compression this comes to a ~10 MB/s write stream to the HDFS file, which is pretty low. Our configuration looks like this:

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink.hdfs.codeC = lzo
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.writeFormat = Writable
agent.sinks.hdfsSink.hdfs.rollInterval = 3600
agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 10000
agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 3000000
agent.channels.memoryChannel.transactionCapacity = 10000

--
Have a Nice Day!
Lohit
