I found that Flume's HTTP source implementation is somewhat outdated 
and not really optimized for performance.
 
Our requirement is to process more than 10k requests per second on a single 
node, but as Hemanth reported, Flume's HTTP source handled only a few hundred 
per second.
 
We decided to implement our own HTTP source based on Netty 4; without much 
optimization it processes 30-40k requests per second, which perfectly meets 
our requirements.
 
Regards,
Adrian Seungjin Lee
  
 
-----Original Message-----
From: "Hari Shreedharan"<[email protected]> 
To: "[email protected]"<[email protected]>; 
Cc: 
Sent: 2015-11-15 (Sun) 16:37:38
Subject: Re: Flume benchmarking with HTTP source & File channel
 
Single-event batches are going to be really slow, for multiple reasons: 
protocol overhead, Flume channels being written to handle batches of events 
rather than single events, etc.

On Saturday, November 14, 2015, Hemanth Abbina <[email protected]> 
wrote:
Hi Hari,

 

Thanks for the response.

 

I haven't tried with a different source; I will try that.

We are sending through multiple HTTP clients (around 40 clients), using a 
single event per batch.

 

First, we would like to validate the maximum EPS the HTTP source can sustain 
on a single Flume server (we are testing with 8 cores and 32 GB RAM) when 
single-event batches are sent from multiple clients.

 

After confirming the EPS at this stage, we plan to check the performance with 
batching and multi-node Flume support.


 

Thanks,

Hemanth

 

From: Hari Shreedharan [mailto:[email protected]]


Sent: Sunday, November 15, 2015 8:41 AM

To: [email protected]

Subject: Re: Flume benchmarking with HTTP source & File channel

 

Did you try with a different source? Is your sender multithreaded? Sending from 
a single thread would obviously be slow. How many messages per batch? The 
bigger your batch, the better your performance will be.



On Saturday, November 14, 2015, Hemanth Abbina <[email protected]> 
wrote:




Thanks Gonzalo.

 

Yes, it's a single server. First we would like to confirm the maximum 
throughput of a single server with this configuration. The size of each 
message is around 512 bytes.

 

I have tried with an in-memory channel & null sink too. Performance increased 
by only about 50 requests/sec, not beyond that.

 

In some forums I have seen Flume benchmarks of 30K/40K per single node (I'm 
not sure about the configurations), so I am trying to find the maximum 
throughput of a single server.

 

From: Gonzalo Herreros [mailto:[email protected]]


Sent: Saturday, November 14, 2015 2:02 PM

To: user <[email protected]>

Subject: Re: Flume benchmarking with HTTP source & File channel

 

If that is just with a single server, 600 messages per sec doesn't sound bad to 
me.

Depending on the size of each message, the network could be the limiting 
factor.

I would try with the null sink and in memory channel. If that doesn't improve 
things I would say you need more nodes to go beyond that.

Regards,

Gonzalo


On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <[email protected]> 
wrote:




Hi,

 

We have been trying to validate & benchmark the Flume performance for our 
production use.

 

We have configured Flume to have HTTP source, File channel & Kafka sink.


Hardware: 8 core, 32 GB RAM, CentOS 6.5, Disk: 500 GB HDD.

Flume configuration:

svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1

# HTTP source to receive events on port 5005
svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31
svcagent.sources.http-source.selector.type = multiplexing
svcagent.sources.http-source.selector.header = archival
svcagent.sources.http-source.selector.mapping.true = file-channel1
svcagent.sources.http-source.selector.default = file-channel1
#svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler

svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 5000

svcagent.channels.file-channel1.type = file
svcagent.channels.file-channel1.checkpointDir = /etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs = /etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity = 10000
svcagent.channels.file-channel1.capacity = 50000
svcagent.channels.file-channel1.checkpointInterval = 120000
svcagent.channels.file-channel1.checkpointOnClose = true
svcagent.channels.file-channel1.maxFileSize = 536870912
svcagent.channels.file-channel1.use-fast-replay = false
  
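[Editor's note] One File Channel tuning worth mentioning for a setup like the one above: the Flume user guide recommends keeping checkpointDir and dataDirs on separate physical disks, and dataDirs accepts a comma-separated list of directories so writes can be spread across spindles. A sketch with placeholder paths (the directories are illustrative, not from this thread):

```
svcagent.channels.file-channel1.checkpointDir = /disk1/flume/checkpoint
svcagent.channels.file-channel1.dataDirs = /disk2/flume/data,/disk3/flume/data
```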

When we tried to stream HTTP data from multiple clients (around 40 HTTP 
clients), we could reach a maximum of 600 requests/sec and not beyond that. 
We also increased Flume's Xmx setting to 4096.

 

We have even tried with a Null Sink (instead of the Kafka sink) and did not 
see much performance improvement, so we assume the bottleneck is the HTTP 
source & File channel.

 

Could you please suggest any tuning to improve the performance of this setup?

 

--regards

Hemanth

-- 
Thanks,
Hari



 





