Hello,

Does anyone have numbers they can share on HDFS sink performance? In our testing, a single sink reading from a MemoryChannel and writing to HDFS (CompressedStream) can only do about 35,000 events per second (each event is about 1K in size). After compression this works out to a ~10MB/s write stream to the HDFS file, which is pretty low. Our configuration looks like this:
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink.hdfs.codeC = lzo
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.writeFormat = Writable
agent.sinks.hdfsSink.hdfs.rollInterval = 3600
agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 10000
agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 3000000
agent.channels.memoryChannel.transactionCapacity = 10000

--
Have a Nice Day!
Lohit
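
P.S. For anyone comparing against their own setup, here is a rough back-of-the-envelope check of the figures above. It is only a sketch: it assumes "1K" means 1024 bytes and "MB" means 10^6 bytes, and the helper names are made up for illustration. The measurements themselves were not this precise.

```python
# Rough throughput arithmetic for the figures quoted in this post.
# Assumptions (not measured): 1K = 1024 bytes, 1 MB = 1,000,000 bytes.

EVENTS_PER_SEC = 35_000          # observed sink rate
EVENT_BYTES = 1024               # "about 1K" per event (assumed exact here)
COMPRESSED_MB_PER_SEC = 10.0     # observed write rate to the HDFS file
BATCH_SIZE = 10_000              # hdfs.batchSize from the config


def raw_mb_per_sec(events_per_sec: int, event_bytes: int) -> float:
    """Uncompressed event volume entering the sink, in MB/s."""
    return events_per_sec * event_bytes / 1_000_000


def implied_compression_ratio(raw_mb: float, compressed_mb: float) -> float:
    """Ratio of uncompressed input to compressed output."""
    return raw_mb / compressed_mb


raw = raw_mb_per_sec(EVENTS_PER_SEC, EVENT_BYTES)
ratio = implied_compression_ratio(raw, COMPRESSED_MB_PER_SEC)
batches_per_sec = EVENTS_PER_SEC / BATCH_SIZE

print(f"uncompressed input : {raw:.2f} MB/s")
print(f"implied compression: {ratio:.2f}x")
print(f"batches per second : {batches_per_sec:.1f}")
```

So the sink is taking in roughly 35 MB/s of uncompressed events, which is an implied compression ratio of about 3.5x, and it completes only about 3.5 batches of 10000 events per second.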
