There are subtle but significant differences. The sink's "batchSize" setting specifies how many messages are taken from the channel in one transaction (as in any other sink). The Kafka property "batch.num.messages" (specified in the Flume config as "kafka.batch.num.messages") sets the batch size an asynchronous producer uses when sending messages to the broker. By default the producer is synchronous, so that property has no effect.
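
To make the interaction between the two settings concrete, here is a rough toy model in Python. It is not Flume or Kafka code: the function name `simulate` and its parameters are made up for illustration, the values 100 and 200 are the defaults mentioned in this reply, and the model ignores any time-based flushing the producer may do.

```python
def simulate(transactions, sink_batch=100, producer_batch=200, async_producer=False):
    """Toy model of a sink feeding a producer (hypothetical, not real Flume/Kafka code).

    Returns (sent_to_broker, committed_but_unsent) after the given number of
    sink transactions.
    """
    buffered = 0  # messages held only in the producer's memory
    sent = 0      # messages that reached the broker
    for _ in range(transactions):
        # The sink takes one batch from the channel and hands it to the producer.
        buffered += sink_batch
        if not async_producer:
            # Sync producer: the whole batch goes out before the sink commits.
            sent += buffered
            buffered = 0
        else:
            # Async producer: only full producer batches are sent.
            while buffered >= producer_batch:
                sent += producer_batch
                buffered -= producer_batch
        # The channel transaction commits here; anything still buffered has
        # already left the channel and would be lost if the process died.
    return sent, buffered


if __name__ == "__main__":
    print(simulate(1))                       # sync producer
    print(simulate(1, async_producer=True))  # async producer, one transaction
```

In this model the second value is the number of messages that have been committed out of the channel but not yet sent, which is where the risk with the async producer comes from.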

If you use the synchronous producer (the default), the messages taken from the channel as a batch (100 by default) are sent together to the Kafka broker. If you change the producer to async, it gets more complicated: "kafka.batch.num.messages" defaults to 200, so the sink takes 100 messages from the channel and commits that transaction, but those messages stay in memory until another 100 are taken (so there is a risk of losing messages).

I would stay away from the async producer in a Flume sink: you want the sink to control the pace (a file or memory channel will be faster), so that it doesn't need to buffer in memory and risk message loss. An async producer is useful when the client is an online application you don't want to delay.

To answer your question: if you don't specify any batching properties, it will deliver messages in batches of 100 by default, which is probably fine in most cases.

Hope that makes sense.

Regards,
Gonzalo

On 26 September 2015 at 05:19, Sharninder <[email protected]> wrote:

> Anyone ?
>
> > On 25-Sep-2015, at 3:51 PM, Sharninder <[email protected]> wrote:
> >
> > Hi,
> >
> > We want to move to the built-in kafka sink from our own custom implementation and I have a question about the batchsize config parameter.
> >
> > Looking at the code of the sink, I can tell that the batchsize is used to construct the list of keyed messages fed to the producer.
> >
> > My question is what is the difference between this variable and the kafka batch.num.messages parameter?
> >
> > Is the flume parameter necessary ?
> >
> > --
> > Sharninder
