Hi All,
Here are the follow-up observations and issues from the benchmarking.
The configuration is the same as before (HTTP source -> File channel -> Kafka
sink). When the HTTP clients send larger messages, the observed EPS is around
140. Each large message is a batch of 100 individual log messages, so the
effective EPS is 140 x 100 = 14,000.
When I increase the streaming rate from the clients further, the file channel
overflows and throws errors: “Error appending event to channel. Channel
might be full. Unable to put batch on required channel: FileChannel
file-channel1 { dataDirs: [/etc/flume-kafka/data]”.
I understand that the issue might be that the Kafka sink is slower than the
HTTP source. How can I overcome this? I tried creating a sink group with
load-balancer support, but it did not help (a multi-sink alternative is
sketched after the configuration below).
Could you please suggest a way to overcome the slow Kafka sink problem?
svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1
svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31
svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler
svcagent.sinks.kafka-sink1.type = org.apache.flume.sink.kafka.KafkaSink
svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092,10.15.1.32:9093
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 100
svcagent.sinks.kafka-sink1.request.required.acks = 1
svcagent.sinks.kafka-sink1.send.buffer.bytes = 1310720
svcagent.channels.file-channel1.type=file
svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity=1000
svcagent.channels.file-channel1.capacity=10000
svcagent.channels.file-channel1.checkpointInterval=120000
svcagent.channels.file-channel1.checkpointOnClose=true
svcagent.channels.file-channel1.maxFileSize=536870912
svcagent.channels.file-channel1.use-fast-replay=false
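One approach that is often suggested for a slow sink (a sketch only, not
something verified here): attach several independent Kafka sinks to the same
file channel. Unlike a load-balancing sink group, where a single sink runner
drives only one sink at a time, independent sinks each run on their own
thread and drain the channel in parallel. For example, extending the sink
section above:

svcagent.sinks = kafka-sink1 kafka-sink2
# kafka-sink2 mirrors kafka-sink1 but runs on its own sink-runner thread
svcagent.sinks.kafka-sink2.type = org.apache.flume.sink.kafka.KafkaSink
svcagent.sinks.kafka-sink2.topic = flume-sink1
svcagent.sinks.kafka-sink2.brokerList = 10.15.1.32:9092,10.15.1.32:9093
svcagent.sinks.kafka-sink2.channel = file-channel1
svcagent.sinks.kafka-sink2.batchSize = 100
svcagent.sinks.kafka-sink2.request.required.acks = 1

Raising batchSize (keeping it at or below the channel's transactionCapacity)
may also help, since every take transaction against the file channel costs
disk I/O.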
From: 이승진 [mailto:[email protected]]
Sent: Sunday, November 15, 2015 7:49 PM
To: [email protected]
Subject: Re: Flume benchmarking with HTTP source & File channel
I found at one point that Flume's HTTP source implementation is somewhat
outdated and not really optimized for performance.
Our requirement includes processing more than 10k requests per second on a
single node, but as Hemanth said, Flume's HTTP source processed only a few
hundred per second.
We decided to implement our own HTTP source based on Netty 4, and it
processes 30-40k per second (without much optimization), which perfectly
meets our requirements.
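For anyone considering the same route: a custom source only has to implement
Flume's source interfaces, and it is then referenced in the agent
configuration by its fully qualified class name. A minimal wiring sketch (the
class name below is hypothetical, not the one we actually use):

# hypothetical custom source class; port/bind are read in its configure() method
svcagent.sources.http-source.type = com.example.flume.NettyHttpSource
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005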
Regards,
Adrian Seungjin Lee
-----Original Message-----
From: "Hari Shreedharan" <[email protected]>
To: [email protected]
Cc:
Sent: 2015-11-15 (Sun) 16:37:38
Subject: Re: Flume benchmarking with HTTP source & File channel
Single-event batches are going to be really slow, for multiple reasons:
protocol overhead, Flume channels being written to handle batches of events
rather than single events, etc.
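For example, with the stock JSONHandler (a custom handler's wire format may
differ), a batch is simply a JSON array of events sent in one POST:

[
  {"headers" : {"host" : "client-01"}, "body" : "log event 1"},
  {"headers" : {"host" : "client-01"}, "body" : "log event 2"},
  {"headers" : {"host" : "client-01"}, "body" : "log event 3"}
]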
On Saturday, November 14, 2015, Hemanth Abbina <[email protected]> wrote:
Hi Hari,
Thanks for the response.
I haven’t tried with a different source; I will try that.
We are sending through multiple HTTP clients (around 40 clients) and using a
single event per batch.
First, we would like to validate the maximum HTTP source EPS supported by a
single Flume server (we are testing with 8 cores and 32 GB RAM) when
single-event batches are sent from multiple clients.
After confirming the EPS at this stage, we plan to check the performance with
batching and multi-node Flume support.
Thanks,
Hemanth
From: Hari Shreedharan [mailto:[email protected]]
Sent: Sunday, November 15, 2015 8:41 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Flume benchmarking with HTTP source & File channel
Did you try with a different source? Is your sender multithreaded? Sending from
a single thread would obviously be slow. How many messages per batch? The
bigger your batch is, the better your performance will be.
On Saturday, November 14, 2015, Hemanth Abbina <[email protected]> wrote:
Thanks Gonzalo.
Yes, it’s a single server. First we would like to confirm the maximum
throughput of a single server with this configuration. The size of each
message is around 512 bytes.
I have tried with an in-memory channel and a null sink too. Performance
increased by only about 50 requests/sec, not beyond that.
In some forums I have seen Flume benchmarks of 30K-40K per single node (I’m
not sure about the configurations), so I am trying to determine the maximum
throughput of a single server.
From: Gonzalo Herreros [mailto:[email protected]]
Sent: Saturday, November 14, 2015 2:02 PM
To: user <[email protected]<mailto:[email protected]>>
Subject: Re: Flume benchmarking with HTTP source & File channel
If that is just a single server, 600 messages per second doesn't sound bad to
me.
Depending on the size of each message, the network could be the limiting
factor.
I would try with the null sink and in-memory channel; if that doesn't improve
things, I would say you need more nodes to go beyond that.
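Concretely, that test could look something like this (a sketch; the channel
sizes are illustrative):

svcagent.sinks = null-sink1
svcagent.channels = mem-channel1
svcagent.sources.http-source.channels = mem-channel1
# the null sink discards events, isolating source + channel throughput
svcagent.sinks.null-sink1.type = null
svcagent.sinks.null-sink1.channel = mem-channel1
svcagent.channels.mem-channel1.type = memory
svcagent.channels.mem-channel1.capacity = 100000
svcagent.channels.mem-channel1.transactionCapacity = 10000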
Regards,
Gonzalo
On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <[email protected]> wrote:
Hi,
We have been trying to validate and benchmark Flume performance for our
production use.
We have configured Flume with an HTTP source, file channel, and Kafka sink.
Hardware: 8 cores, 32 GB RAM, CentOS 6.5, 500 GB HDD.
Flume configuration:
svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1
# HTTP source to receive events on port 5005
svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31
svcagent.sources.http-source.selector.type = multiplexing
svcagent.sources.http-source.selector.header = archival
svcagent.sources.http-source.selector.mapping.true = file-channel1
svcagent.sources.http-source.selector.default = file-channel1
#svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler
svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 5000
svcagent.channels.file-channel1.type = file
svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity=10000
svcagent.channels.file-channel1.capacity=50000
svcagent.channels.file-channel1.checkpointInterval=120000
svcagent.channels.file-channel1.checkpointOnClose=true
svcagent.channels.file-channel1.maxFileSize=536870912
svcagent.channels.file-channel1.use-fast-replay=false
When we streamed HTTP data from multiple clients (around 40 HTTP clients), we
could reach a maximum of 600 requests/sec, and not beyond that. We increased
Flume's -Xmx setting to 4096 MB.
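For reference, assuming the stock startup scripts are used, the heap setting
typically lives in conf/flume-env.sh:

# raise the Flume agent heap to 4 GB (variable name per the standard Flume layout)
export JAVA_OPTS="-Xmx4096m"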
We have even tried with a null sink (instead of the Kafka sink) and did not
get much performance improvement, so we assume the bottleneck is the HTTP
source and/or the file channel.
Could you please suggest any tuning to improve the performance of this setup?
--regards
Hemanth
--
Thanks,
Hari