I think your batch sizes are the key. What is your batch size from your source?
Thanks,
Hari Shreedharan

> On Nov 12, 2015, at 4:06 AM, Guillermo Ortiz <[email protected]> wrote:
>
> Yes, I also tried changing the capacity of the KafkaChannel because there
> is an example in the documentation, although the documentation doesn't say
> anything about what it means.
>
> Anyway, in the end I write messages to Kafka either from the spoolDir source
> or from a KafkaSink, and I take the measurement in Kafka. Maybe writing from
> a sink is not the same as writing directly through a channel. I thought the
> channel should be faster, since there are fewer pieces in the complete flow.
>
> Another theory: I took a look at the MemoryChannel and KafkaChannel code.
> It was a quick look, but I saw that KafkaChannel has to serialize the events
> with Avro, and I didn't see that transformation in MemoryChannel. There is a
> doCommit method, but I'm not sure when it is called.
>
>
> 2015-11-12 12:39 GMT+01:00 Gonzalo Herreros <[email protected]>:
> I think your expectations are not realistic.
> The MemoryChannel adds minimal overhead but is not reliable like the
> KafkaChannel.
> In the first case you can lose 10k messages if you are unlucky, while with the
> KafkaChannel you won't lose a single one.
> With more reliability you normally take a small performance hit.
>
> However, the differences you are seeing are too great, so I also believe it's
> related to the batch size.
> While the sink is using 10k batches, there is nothing configured for the
> KafkaChannel (it could be committing every message, or something like every 100).
> I'm not sure what the default batch size is there.
> In the documentation there are no properties for batch or transactionCapacity,
> but the example does set capacity and transactionCapacity. Not sure if
> they apply to this channel.
>
> Regards,
> Gonzalo
>
>
> On 12 November 2015 at 11:23, Ahmed Vila <[email protected]> wrote:
> Hi Guillermo,
>
> With KafkaSink you're passing 10k events at once to Kafka because batchSize
> (the transaction size) is that big.
>
> So it's important to know how big batchSize is in your source in order to be
> able to compare. Set it to 10k and check its performance again.
>
> Please keep in mind that Flume has to keep track of transactions and other
> housekeeping within any channel, so in my opinion a channel is supposed to be
> slower than a sink for the same output (Kafka, file or whatever).
>
>
>
> On Thu, Nov 12, 2015 at 12:05 PM, Guillermo Ortiz <[email protected]> wrote:
> Hello,
>
> I'm using Flume with Kafka and I don't understand some performance results
> that I'm getting.
>
> I have a topic with 3 nodes, 6 partitions, replication 2.
> I'm ingesting messages of 1100 bytes each with a spooling directory source.
>
> I tried Source-MemoryChannel-KafkaSink and I get about 50K messages/second
> (54 MB/s) in Kafka.
>
> If I use Source-KafkaChannel I only get about 1K messages/second (1.2 MB/s) in
> Kafka.
>
> I thought that I was going to get better performance with the KafkaChannel,
> and instead I'm getting about 50 times better throughput with the KafkaSink.
>
> The first configuration is:
> agent.sources = seqGenSrc
> agent.channels = memoryChannel
> agent.sinks = kafkaSink
>
> #Source configuration
> ...
>
> agent.sources.seqGenSrc.channels = memoryChannel
> agent.sinks.kafkaSink.channel = memoryChannel
> agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
> agent.sinks.kafkaSink.batchSize = 10000
> agent.sinks.kafkaSink.brokerList = ose10kafkaelk:9092,ose11kafkaelk:9092,ose12kafkaelk:9092
> agent.sinks.kafkaSink.topic = kafka-topic
> agent.sinks.kafkaSink.requiredAcks = -1
> agent.sinks.kafkaSink.channel = memoryChannel
>
> agent.channels.memoryChannel.type = memory
> agent.channels.memoryChannel.capacity = 100000
> agent.channels.memoryChannel.transactionCapacity = 10000
>
>
>
> The second is:
> agent.sources = seqGenSrc
> agent.channels = kafkaChannel
>
>
> # Describe/configure the source
> ###Configuration spoolDir source...
> ...
>
> # The channel can be defined as follows.
> agent.sources.seqGenSrc.channels = kafkaChannel
>
> agent.channels.kafkaChannel.type = org.apache.flume.channel.kafka.KafkaChannel
> agent.channels.kafkaChannel.brokerList=ose10kafkaelk:9092,ose11kafkaelk:9092,ose12kafkaelk:9092
> agent.channels.kafkaChannel.topic=kafka-topic3
> agent.channels.kafkaChannel.zookeeperConnect=ose10kafkaelk:2181
>
>
> --
> Best regards,
>
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail : [email protected]
> ---------------------------------------------------------------------
> This e-mail and any attachment is for authorised use by the intended
> recipient(s) only. This email contains confidential information. It should
> not be copied, disclosed to, retained or used by, any party other than the
> intended recipient.
> Any unauthorised distribution, dissemination or copying
> of this E-mail or its attachments, and/or any use of any information
> contained in them, is strictly prohibited and may be illegal. If you are not
> an intended recipient then please promptly delete this e-mail and any
> attachment and all copies and inform the sender directly via email. Any
> emails that you send to us may be monitored by systems or persons other than
> the named communicant for the purposes of ascertaining whether the
> communication complies with the law and company policies.
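
The batch-size suggestion made in this thread (set the source batch size to 10k and measure again) could look roughly like the fragment below. This is only a sketch under assumptions: it assumes the source really is a Flume Spooling Directory Source, as the thread describes, and that its batchSize property (the number of events handed to the channel per transaction) is what limits throughput here. The agent, source and channel names are taken from the second configuration in the thread; the rest of the source configuration is elided in the original and is left elided. The thread does not report whether this change closes the gap.

```properties
# Sketch only: second configuration with an explicit source batch size.
# Assumption: seqGenSrc is a Spooling Directory Source, as described in the thread.
agent.sources = seqGenSrc
agent.channels = kafkaChannel

# ... (rest of the source configuration, elided in the original)
agent.sources.seqGenSrc.channels = kafkaChannel

# Hand events to the channel in batches of 10000, matching the KafkaSink
# batchSize used in the first configuration. If this is left unset, the
# source falls back to its default, which is much smaller than 10000.
agent.sources.seqGenSrc.batchSize = 10000
```

With both setups using the same batch size, the messages/second measured in Kafka would at least be comparing like with like.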
