I found one day that Flume's HTTP source implementation is somewhat outdated and not really optimized for performance. Our requirement includes processing more than 10k requests on a single node, but as Hemanth said, Flume's HTTP source processed only a few hundred per second. We decided to implement our own HTTP source based on Netty 4, and it processes 30~40k per second without much optimization, which perfectly meets our requirements.

Regards,
Adrian Seungjin Lee

-----Original Message-----
From: "Hari Shreedharan" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: 2015-11-15 (Sun) 16:37:38
Subject: Re: Flume benchmarking with HTTP source & File channel

Single-event batches are going to be really slow. Multiple reasons: protocol overhead, Flume channels being written to handle batches of events and not single events, etc.
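Hari's point about batching can be made concrete. Flume's stock HTTP source with the default JSONHandler accepts a JSON array of events per POST, each event an object with "headers" and "body" fields, so a client can amortize protocol and transaction overhead across many events. A minimal client sketch follows; the host and port come from the configuration later in this thread and are assumptions for illustration, not a tested setup:

```python
# Sketch of a client that sends events to Flume's HTTP source in batches.
# The default JSONHandler expects a JSON array of {"headers", "body"}
# objects, so N events can travel in one POST instead of N requests.
import json
import urllib.request

def build_batch(bodies, headers=None):
    """Serialize a list of event bodies into one JSONHandler payload."""
    events = [{"headers": headers or {}, "body": b} for b in bodies]
    return json.dumps(events).encode("utf-8")

def post_batch(payload, url="http://10.15.1.31:5005"):
    """POST one batch; host/port are illustrative, taken from the thread."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status  # Flume's HTTP source returns 200 on success

# One POST carrying 100 events instead of 100 single-event POSTs:
payload = build_batch([f"log line {i}" for i in range(100)])
# post_batch(payload)  # not called here; requires a running Flume agent
```

Raising the batch size this way attacks exactly the overhead Hari describes: each channel transaction then commits many events at once.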
On Saturday, November 14, 2015, Hemanth Abbina <[email protected]> wrote:

Hi Hari,

Thanks for the response. I haven't tried with a different source; will try that. We are sending through multiple HTTP clients (around 40 clients) and using a single event per batch. First, we would like to validate the maximum HTTP source EPS supported by a single Flume server (we are testing with 8 cores and 32 GB RAM) when single-event batches are sent from multiple clients. After confirming the EPS at this stage, we plan to check performance with batching and multi-node Flume support.

Thanks,
Hemanth

From: Hari Shreedharan [mailto:[email protected]]
Sent: Sunday, November 15, 2015 8:41 AM
To: [email protected]
Subject: Re: Flume benchmarking with HTTP source & File channel

Did you try with a different source? Is your sender multithreaded? Sending from a single thread would obviously be slow. How many messages per batch? The bigger your batch is, the better your performance will be.

On Saturday, November 14, 2015, Hemanth Abbina <[email protected]> wrote:

Thanks Gonzalo. Yes, it's a single server. First we would like to confirm the maximum throughput of a single server with this configuration. The size of each message is around 512 bytes. I have tried with an in-memory channel and a null sink too; performance increased by 50 requests/sec or so, not beyond that. In some forums, I have seen Flume benchmarks of 30K/40K per single node (I'm not sure about the configurations), so I am trying to find the maximum throughput of a single server.

From: Gonzalo Herreros [mailto:[email protected]]
Sent: Saturday, November 14, 2015 2:02 PM
To: user <[email protected]>
Subject: Re: Flume benchmarking with HTTP source & File channel

If that is just a single server, 600 messages per second doesn't sound bad to me. Depending on the size of each message, the network could be the limiting factor. I would try with the null sink and an in-memory channel. If that doesn't improve things, I would say you need more nodes to go beyond that.
Regards,
Gonzalo

On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <[email protected]> wrote:

Hi,

We have been trying to validate and benchmark Flume's performance for our production use. We have configured Flume with an HTTP source, file channel, and Kafka sink.

Hardware: 8 cores, 32 GB RAM, CentOS 6.5, disk: 500 GB HDD.

Flume configuration:

svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1

# HTTP source to receive events on port 5005
svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31
svcagent.sources.http-source.selector.type = multiplexing
svcagent.sources.http-source.selector.header = archival
svcagent.sources.http-source.selector.mapping.true = file-channel1
svcagent.sources.http-source.selector.default = file-channel1
#svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler

svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 5000

svcagent.channels.file-channel1.type = file
svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity=10000
svcagent.channels.file-channel1.capacity=50000
svcagent.channels.file-channel1.checkpointInterval=120000
svcagent.channels.file-channel1.checkpointOnClose=true
svcagent.channels.file-channel1.maxFileSize=536870912
svcagent.channels.file-channel1.use-fast-replay=false

When we tried to stream HTTP data from multiple clients (around 40 HTTP clients), we could get a maximum processing rate of 600 requests/sec, and not beyond that. We increased the Xmx setting of Flume to 4096 MB. We also tried with a null sink (instead of the Kafka sink) and did not get much performance improvement.
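For reference, the null-sink isolation test Hemanth describes might look like the following fragment. This is a hedged sketch reusing the agent and source names from the configuration above, not the poster's exact test config; swapping in a memory channel and null sink removes disk and Kafka from the path so only the HTTP source is measured:

```properties
# Hypothetical isolation test: memory channel + null sink.
# Names mirror the thread's configuration; capacities are illustrative.
svcagent.sinks = null-sink1
svcagent.channels = mem-channel1
svcagent.sources.http-source.channels = mem-channel1
svcagent.channels.mem-channel1.type = memory
svcagent.channels.mem-channel1.capacity = 100000
svcagent.channels.mem-channel1.transactionCapacity = 10000
svcagent.sinks.null-sink1.type = null
svcagent.sinks.null-sink1.channel = mem-channel1
```

That only ~50 requests/sec were gained this way suggests the source, not the channel or sink, dominates in this setup.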
So we assume the bottleneck is the HTTP source and the file channel. Could you please suggest any fine-tuning to improve the performance of this setup?

--regards
Hemanth

--
Thanks,
Hari
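On the fine-tuning question, Flume's file channel does expose a few throughput-relevant knobs. The fragment below is an illustrative sketch, not a verified fix: on a single 500 GB HDD, checkpoint and data-log writes contend for the same spindle, and the file channel accepts a comma-separated list of dataDirs that can be spread across disks. Paths are hypothetical:

```properties
# Hedged tuning sketch: separate disks for checkpoint and data logs.
svcagent.channels.file-channel1.checkpointDir = /disk1/flume/checkpoint
svcagent.channels.file-channel1.dataDirs = /disk2/flume/data,/disk3/flume/data
# Larger capacity gives the channel headroom when the sink stalls.
svcagent.channels.file-channel1.capacity = 1000000
svcagent.channels.file-channel1.transactionCapacity = 10000
```

Even so, per the thread's own measurements, batching on the client side is likely the larger win, since the file channel commits one transaction per HTTP request.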
