RE: Flume benchmarking with HTTP source & File channel

Hemanth Abbina Thu, 19 Nov 2015 20:20:07 -0800

Hi All,

These are the follow up observations & issues on the benchmarking.


Configuration is same as HTTP source -> File Channel -> Kafka Sink: When sent 
larger messages from the HTTP clients, observed EPS is around 140. Each single 
large message is batch of 100 individual log messages, so I can say the 
effective EPS is 14,000.

When I further increase the streaming rate from the clients, the file channel 
is overflowing and throwing errors “Error appending event to channel. Channel 
might be full. Unable to put batch on required channel: FileChannel 
file-channel1 { dataDirs: [/etc/flume-kafka/data]”.

I understood that the issue might be Kafka sink is slower than the HTTP source. 
How can I overcome this ? Tried creaking a sink group with load balancer 
support, but of no use.

Could you pleaes suggest me something to overcome the slow Kafka sink problem ?

svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1

svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31
svcagent.sources.http-source.handler 
=org.eiq.flume.JSONHandler.HTTPSourceJSONHandler

svcagent.sinks.kafka-sink1.type =  org.apache.flume.sink.kafka.KafkaSink
svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092,10.15.1.32:9093
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 100
svcagent.sinks.kafka-sink1.request.required.acks = 1
svcagent.sinks.kafka-sink1.send.buffer.bytes = 1310720

svcagent.channels.file-channel1.type=file
svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity=1000
svcagent.channels.file-channel1.capacity=10000
svcagent.channels.file-channel1.checkpointInterval=120000
svcagent.channels.file-channel1.checkpointOnClose=true
svcagent.channels.file-channel1.maxFileSize=536870912
svcagent.channels.file-channel1.use-fast-replay=false

From: 이승진 [mailto:[email protected]]
Sent: Sunday, November 15, 2015 7:49 PM
To: [email protected]
Subject: Re: Flume benchmarking with HTTP source & File channel


I found one day that Flume's HTTP source implementation is somewhat outdated 
and it's not really optimized for performance.



Our requirement includes processing more than 10k requests within a single 
node, but as Hemanth said, Flume's HTTP source processed a few hundreds per 
second.



We decided to implement our own Http source based on netty 4, and it processes 
30~40k per second which perfectly meet our requirements.(without much 
optimization)



Regards,

Adrian Seungjin Lee





-----Original Message-----
From: "Hari 
Shreedharan"<[email protected]<mailto:[email protected]>>
To: 
"[email protected]<mailto:[email protected]>"<[email protected]<mailto:[email protected]>>;
Cc:
Sent: 2015-11-15 (일) 16:37:38
Subject: Re: Flume benchmarking with HTTP source & File channel

Single event batches are going to be really slow. Multiple reasons - protocol 
overhead, flume channels written to handle batches of events and not single 
events etc

On Saturday, November 14, 2015, Hemanth Abbina 
<[email protected]<mailto:[email protected]>> wrote:

Hi Hari,



Thanks for the response.



I haven’t  tried with different source. Will try that.

We are sending through multiple HTTP clients (around 40 clients) and using 
single event per batch.



First, we would like to validate & see the max supported HTTP source EPS for a 
single Flume server ( we are testing with 8 core 32 GB RAM), when sent single 
event batch from multiple clients.



After confirming the EPS at this stage, we are planning to check  the 
performance with batching & multi node Flume support.



Thanks,

Hemanth



From: Hari Shreedharan [mailto:[email protected]]
Sent: Sunday, November 15, 2015 8:41 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Flume benchmarking with HTTP source & File channel



Did you try with a different source? Is your sender multithreaded? Sending from 
a single thread would obviously be slow. How many messages per batch? The 
bigger your batch is, better your perf will be

On Saturday, November 14, 2015, Hemanth Abbina 
<[email protected]<mailto:[email protected]>> wrote:

Thanks Gonzalo.



Yes, it’s a single server. First we would like to confirm the max throughput by 
a single server with this configuration. Size of each message is around 512 
bytes.



I have tried with in-memory & null sink too. Performance increased by 50 
requests/sec or so, not beyond that.



In some of the forums, I have seen Flume benchmark of 30K/40K per single node 
(I’m not sure about the configurations). So, trying to check the max throughput 
by a server.



From: Gonzalo Herreros [mailto:[email protected]]
Sent: Saturday, November 14, 2015 2:02 PM
To: user <[email protected]<mailto:[email protected]>>
Subject: Re: Flume benchmarking with HTTP source & File channel



If that is just with a single server, 600 messages per sec doesn't sound bad to 
me.
Depending on the size of each message, it could be the network the limiting 
factor.

I would try with the null sink and in memory channel. If that doesn't improve 
things I would say you need more nodes to go beyond that.

Regards,
Gonzalo

On Nov 14, 2015 7:40 AM, "Hemanth Abbina" 
<[email protected]<mailto:[email protected]>> wrote:

Hi,



We have been trying to validate & benchmark the Flume performance for our 
production use.



We have configured Flume to have HTTP source, File channel & Kafka sink.

Hardware : 8 Core, 32 GB RAM, CentOS6.5, Disk - 500 GB HDD.

Flume configuration:

svcagent.sources = http-source

svcagent.sinks = kafka-sink1

svcagent.channels = file-channel1



# HTTP source to read receive events on port 5005

svcagent.sources.http-source.type = http

svcagent.sources.http-source.channels = file-channel1

svcagent.sources.http-source.port = 5005

svcagent.sources.http-source.bind = 10.15.1.31



svcagent.sources.http-source.selector.type = multiplexing

svcagent.sources.http-source.selector.header = archival

svcagent.sources.http-source.selector.mapping.true = file-channel1

svcagent.sources.http-source.selector.default = file-channel1

#svcagent.sources.http-source.handler 
=org.eiq.flume.JSONHandler.HTTPSourceJSONHandler



svcagent.sinks.kafka-sink1.topic = flume-sink1

svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092<http://10.15.1.32:9092>

svcagent.sinks.kafka-sink1.channel = file-channel1

svcagent.sinks.kafka-sink1.batchSize = 5000



svcagent.channels.file-channel1.type = file

svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint

svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data

svcagent.channels.file-channel1.transactionCapacity=10000

svcagent.channels.file-channel1.capacity=50000

svcagent.channels.file-channel1.checkpointInterval=120000

svcagent.channels.file-channel1.checkpointOnClose=true

svcagent.channels.file-channel1.maxFileSize=536870912

svcagent.channels.file-channel1.use-fast-replay=false



When we tried to stream HTTP data, from multiple clients (around 40 HTTP 
clients), we could get a max processing of 600  requests/sec, and not beyond 
that. Increased the XMX setting of Flume to 4096.



Even we have tried with a Null Sink (instead of Kafka sink). Did not get much 
performance improvements. So, assuming the blockage is the HTTP source & File 
channel.



Could you please suggest any fine tunings to improve the performance of this 
setup.



--regards

Hemanth


--



Thanks,

Hari




--

Thanks,
Hari

[http://ack.mail.navercorp.com/readReceipt/notify/?img=rZbmFqKrFxJ0KrUYaqumKzp0FA2qKxtrKrKXpA2dM6EdKoUrKogrK6MZtzFXp6UmKLl5W63474lcWNFlbX30WLloWrdQaXkqpBigp4w9W6E5MBICMrC074eZpm%3D%3D.gif]

RE: Flume benchmarking with HTTP source & File channel

Reply via email to