That does make sense. Thanks, Gonzalo. We do use the async producer with the default Kafka batch.num.messages. We don't care about a few messages being lost in the event of a crash, so I think we'll continue using the async producer, but picking up X messages in a single transaction will surely help with reducing IO on the Flume server.
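To check my understanding of how the two settings interact, here is a rough toy model in Python. It is only a sketch, not the actual Flume or Kafka code: the names AsyncProducerModel and run_sink are made up for illustration, and the model ignores the producer's time-based flush, so it only shows the count-based behaviour described in the quoted reply (sink batchSize of 100, producer batch.num.messages of 200).

```python
from dataclasses import dataclass, field


@dataclass
class AsyncProducerModel:
    """Hypothetical stand-in for an async producer.

    Buffers messages in memory and only "sends" them once
    batch_num_messages have accumulated. Time-based flushing is
    deliberately left out of this model.
    """
    batch_num_messages: int = 200
    buffer: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def send(self, messages):
        self.buffer.extend(messages)
        # Flush full batches; anything left over stays in memory.
        while len(self.buffer) >= self.batch_num_messages:
            self.sent.extend(self.buffer[:self.batch_num_messages])
            del self.buffer[:self.batch_num_messages]


def run_sink(channel, producer, batch_size=100, transactions=1):
    """Hypothetical sink loop: take batch_size events per transaction,
    hand them to the producer, then treat the transaction as committed.

    Returns the number of events committed (removed from the channel).
    """
    committed = 0
    for _ in range(transactions):
        batch = channel[committed:committed + batch_size]
        if not batch:
            break
        producer.send(batch)
        committed += len(batch)  # channel considers these delivered
    return committed


if __name__ == "__main__":
    producer = AsyncProducerModel(batch_num_messages=200)
    committed = run_sink(list(range(1000)), producer, batch_size=100, transactions=3)
    # Events the channel has let go of but the broker has not yet seen.
    at_risk = committed - len(producer.sent)
    print(committed, len(producer.sent), at_risk)
```

In this model, after three sink transactions 300 events are committed on the channel side, 200 have gone to the broker, and 100 are sitting in the producer's memory, which is the window in which a crash would lose messages. Setting batch_num_messages equal to batch_size makes that window empty after every transaction.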
Thanks a lot.

--
Sharninder

On Sun, Sep 27, 2015 at 3:16 PM, Gonzalo Herreros <[email protected]> wrote:

> There are subtle but significant differences.
>
> When you configure "batchSize" in the sink, you are specifying how many
> messages are taken as a transaction from the channel at once (like in any
> other sink).
> The Kafka property "batch.num.messages" (which in the flume config
> is specified as "kafka.batch.num.messages"), on the other hand, specifies
> the batch size for sending messages to the broker from an asynchronous
> producer. By default the producer is synchronous, so that configuration
> property would do nothing.
>
> If you use the synchronous producer (which is the default), the messages
> taken from the channel as a batch (100 by default) will be sent together
> to the kafka broker.
> However, if you change the producer to async then it's more complicated:
> by default "kafka.batch.num.messages" is 200, so the Sink will take 100
> from the channel and commit that, but those messages will be kept in
> memory until another 100 are taken (so there is a risk of losing
> messages).
>
> I would stay away from the async producer in a Flume sink because you want
> the sink to control the pace (a file or memory channel will be faster) so
> it doesn't need to buffer in memory, risking message loss. An async
> producer is useful when the client is an online application you don't want
> to delay.
>
> Answering your question: if you don't specify any batching properties, by
> default it will deliver messages in batches of 100, which is probably good
> in most cases.
> Hope that makes sense.
>
> Regards,
> Gonzalo
>
>
> On 26 September 2015 at 05:19, Sharninder <[email protected]> wrote:
>
>> Anyone ?
>>
>> > On 25-Sep-2015, at 3:51 PM, Sharninder <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > We want to move to the built-in kafka sink from our own custom
>> > implementation and I have a question about the batchsize config
>> > parameter.
>> >
>> > Looking at the code of the sink, I can tell that the batchsize is used
>> > to construct the list of keyed messages fed to the producer.
>> >
>> > My question is what is the difference between this variable and the
>> > kafka batch.num.messages parameter?
>> >
>> > Is the flume parameter necessary ?
>> >
>> > --
>> > Sharninder
>> >
>> >
>>
>

--
--
Sharninder
