Hi all,

I am running a flume-ng performance test on an EC2 instance. I have a two-tier setup: an Avro client posts a 1 GB file to an Avro source, and an HDFS sink then writes the file to HDFS. The run takes about 33 s, and I wonder why it is that slow, because none of network, CPU, or memory is under pressure. In theory my network can sustain 100 MB/s, but Flume only reaches about 60 MB/s. How can I resolve this? Thanks a lot.

Below are my configuration and my test results.
# SOURCE
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = 10.0.2.13
a1.sources.r1.port = 9876
a1.sources.r1.threads = 10

# SINK (HDFS)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.filePrefix = packet
a1.sinks.k1.hdfs.batchSize= 5000
a1.sinks.k1.hdfs.fileSuffix = .snappy
a1.sinks.k1.hdfs.codeC = snappy
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 500000000
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.path = ....

# INTERCEPTORS (TIMESTAMP FOR HDFS PATH)
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i2.preserveExisting = false
a1.sources.r1.interceptors.i2.hostHeader = test-1

# CHANNEL (MEM), takes at most 1 GB of memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 5000
a1.channels.c1.byteCapacity = 1000000000

## bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

My client command:

time flume-ng avro-client -c /etc/flume-ng/conf -P /tmp/rpcProps -F /tmp/flume/tmp/test-1.tmp -Xmx1024m -Xms1024m -Xmn800m -Xss512k

Command result:

real 0m33.061s
user 0m13.689s
sys 0m5.504s
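As a cross-check on the numbers, here is a small Python sketch that turns the `time` output above into an average throughput. It assumes "1G" means exactly 1 GiB (1024^3 bytes), since the exact file size is not stated. The wall-clock time also includes JVM startup and shutdown of the avro-client, so the result is a rough end-to-end average, not the steady-state rate on the wire.

```python
import re

# `time` output copied from the run above.
TIME_OUTPUT = """real 0m33.061s
user 0m13.689s
sys 0m5.504s"""

# Assumption: "1G" means exactly 1 GiB; the real file size is not given.
FILE_BYTES = 1024 ** 3


def parse_seconds(label: str, text: str) -> float:
    """Convert a line such as 'real 0m33.061s' into seconds."""
    match = re.search(rf"{label}\s+(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"no '{label}' line in time output")
    return int(match.group(1)) * 60 + float(match.group(2))


real = parse_seconds("real", TIME_OUTPUT)
# CPU time spent by the avro-client process (user + kernel).
cpu = parse_seconds("user", TIME_OUTPUT) + parse_seconds("sys", TIME_OUTPUT)
# Average end-to-end rate; includes JVM startup, so it understates the peak rate.
mib_per_s = FILE_BYTES / real / 1024 ** 2

print(f"wall clock: {real:.3f} s, client CPU: {cpu:.3f} s")
print(f"average throughput: {mib_per_s:.1f} MiB/s")
```

With these inputs it reports about 31.0 MiB/s end to end, and 19.193 s of client CPU time (user + sys) out of 33.061 s of wall-clock time. That average is lower than the ~60 MB/s figure quoted above, which may have been measured differently (for example, as a peak network rate during the transfer).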
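One thing worth ruling out is a sizing mismatch in the memory channel. The sketch below assumes the usual Flume rule that a sink's batch size should not exceed the channel's transactionCapacity, which in turn should not exceed capacity, because each batch is taken from the channel in a single transaction. `parse_properties` and `check_sizing` are hypothetical helper names for illustration, not Flume APIs, and the parser only handles simple `key = value` lines.

```python
# Only the sizing-related lines from the agent config above.
CONFIG = """
a1.sinks.k1.hdfs.batchSize= 5000
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 5000
a1.channels.c1.byteCapacity = 1000000000
"""


def parse_properties(text: str) -> dict[str, str]:
    """Minimal key = value parser; skips blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_sizing(props: dict[str, str]) -> list[str]:
    """Assumed rule: hdfs.batchSize <= transactionCapacity <= capacity."""
    batch = int(props["a1.sinks.k1.hdfs.batchSize"])
    txn = int(props["a1.channels.c1.transactionCapacity"])
    cap = int(props["a1.channels.c1.capacity"])
    problems = []
    if batch > txn:
        problems.append(f"hdfs.batchSize {batch} > transactionCapacity {txn}")
    if txn > cap:
        problems.append(f"transactionCapacity {txn} > capacity {cap}")
    return problems


props = parse_properties(CONFIG)
print(check_sizing(props) or "sizing OK")
```

For the values above it prints "sizing OK", so the channel limits are at least mutually consistent. The same kind of limit is likely relevant to the avro-client's batch size set in /tmp/rpcProps (its contents are not shown), since the Avro source puts each received batch into the channel as one transaction.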
