Thanks Joel for the response. Sure, I can achieve same by using some data structure before we push messages to Kafka. I was hoping if I can utilize any kafka capabilities specially as we are using async producer.
thanks On Wed, Jul 3, 2013 at 11:18 PM, Joel Koshy <jjkosh...@gmail.com> wrote: > Hi Nitin, > > > > We receive events from external source ( for example, facebook status > > update events). These events are pushed to kafka queue when received > There > > is possibility of duplicate event ( multiple facebook status update > events > > for same account in quick intervals ) coming again and gets pushed into > > kafka queue. At consumer end, we do not want to process duplicate events > ( > > connect to facebook and fetch the status). We would prefer not to have > > another data structure to single instance the events received. There is > > time limit in which whatever events received, we want to do single > > instancing ( at once fetch all the status update events received in five > > minutes for single account ). In async producer, events are anyways not > > written to broker synchronously. They are batched and then pushed after > > predefined time interval. That's perfect for us. We just want to look at > > the batch and delete duplicate events from it and then push it broker. > > However, this depends on your event rate - i.e., if it's a very high > event rate then the batch threshold is reached and send happens before > that time interval. Furthermore, even if event rate is low, the > de-duplication applies to a time-window of at most > queue.buffering.max.ms). > > In 0.7, batches would be taken and the call-back handler's > beforeSendingData method would be invoked on those batches. i.e., you > can achieve the same effect as you did in 0.7 by keeping keeping an > additional buffer (before the actual send) of > queue.buffering.max.messages (i.e., the batch size) and de-duping > within that buffer. > > Thanks, > > Joel > > > On Wed, Jul 3, 2013 at 3:07 AM, Joel Koshy <jjkosh...@gmail.com> wrote: > > > >> Callback handlers are no longer supported in 0.8. Can you go into why > >> the filtering needs to be done at this stage as opposed to before > >> actually sending to the producer? > >> > >> Thanks, > >> > >> Joel > >> > >> On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com> > wrote: > >> > Hello- > >> > > >> > Is CallbackHandler supported in Kafka 0.8 for async producers? > >> > > >> > If yes, can I use it to alter the batched messages before they are > pushed > >> > to broker? For example, I may want to delete some of the messages in > the > >> > batch based on some business logic in my application? > >> > > >> > If no, is there any alternate way? I want to do some kind of single > >> > instancing on messages pushed in kafka in last X minutes. > >> > > >> > thanks > >> > > > On Wed, Jul 3, 2013 at 1:33 AM, Nitin Supekar <ni...@ecinity.com> wrote: > > Hello- > > > > We receive events from external source ( for example, facebook status > > update events). These events are pushed to kafka queue when received > There > > is possibility of duplicate event ( multiple facebook status update > events > > for same account in quick intervals ) coming again and gets pushed into > > kafka queue. At consumer end, we do not want to process duplicate events > ( > > connect to facebook and fetch the status). We would prefer not to have > > another data structure to single instance the events received. There is > > time limit in which whatever events received, we want to do single > > instancing ( at once fetch all the status update events received in five > > minutes for single account ). In async producer, events are anyways not > > written to broker synchronously. They are batched and then pushed after > > predefined time interval. That's perfect for us. We just want to look at > > the batch and delete duplicate events from it and then push it broker. > > > > Possible? > > > > thanks > > > > > > On Wed, Jul 3, 2013 at 3:07 AM, Joel Koshy <jjkosh...@gmail.com> wrote: > > > >> Callback handlers are no longer supported in 0.8. Can you go into why > >> the filtering needs to be done at this stage as opposed to before > >> actually sending to the producer? > >> > >> Thanks, > >> > >> Joel > >> > >> On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com> > wrote: > >> > Hello- > >> > > >> > Is CallbackHandler supported in Kafka 0.8 for async producers? > >> > > >> > If yes, can I use it to alter the batched messages before they are > pushed > >> > to broker? For example, I may want to delete some of the messages in > the > >> > batch based on some business logic in my application? > >> > > >> > If no, is there any alternate way? I want to do some kind of single > >> > instancing on messages pushed in kafka in last X minutes. > >> > > >> > thanks > >> >