Ok, thanks for your answers.

On 3/22/17, 1:34 PM, "Cody Koeninger" <c...@koeninger.org> wrote:

If you're talking about reading the same message multiple times in a
failure situation, see
https://github.com/koeninger/kafka-exactly-once

If you're talking about producing the same message multiple times in a
failure situation, keep an eye on
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

If you're talking about producers just misbehaving and sending
different copies of what is essentially the same message from a domain
perspective, you have to dedupe that with your own logic.

On Wed, Mar 22, 2017 at 2:52 PM, Matt Deaver <mattrdea...@gmail.com> wrote:
> You have to handle de-duplication upstream or downstream. It might
> technically be possible to handle this in Spark, but you'll probably have a
> better time handling duplicates in the service that reads from Kafka.
>
> On Wed, Mar 22, 2017 at 1:49 PM, Maurin Lenglart <mau...@cuberonlabs.com> wrote:
>>
>> Hi,
>> We are trying to build a Spark Streaming solution that subscribes and
>> pushes to Kafka.
>>
>> But we are running into the problem of duplicate events.
>>
>> Right now, I am doing a "forEachRdd" and looping over the messages of
>> each partition to send those messages to Kafka.
>>
>> Is there any good way of solving that issue?
>>
>> Thanks
>
> --
> Regards,
>
> Matt
> Data Engineer
> https://www.linkedin.com/in/mdeaver
> http://mattdeav.pythonanywhere.com/
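For Cody's third case, where producers emit domain-level duplicates and you have to "dedupe that with your own logic", a minimal plain-Python sketch of per-batch deduplication is below. It assumes each message carries a unique `event_id` field; that field name, and the `dedupe_batch` helper itself, are hypothetical placeholders for whatever identifies a message in your domain.

```python
import json

def dedupe_batch(raw_messages, seen_ids=None):
    """Return parsed messages, dropping any whose event_id was already seen.

    `event_id` is a hypothetical unique key; substitute whatever field
    identifies a message in your domain. Passing the same `seen_ids` set
    across calls extends deduplication across batches.
    """
    if seen_ids is None:
        seen_ids = set()
    unique = []
    for raw in raw_messages:
        msg = json.loads(raw)
        key = msg["event_id"]
        if key in seen_ids:
            # Duplicate copy of an already-processed event; skip it.
            continue
        seen_ids.add(key)
        unique.append(msg)
    return unique

batch = [
    '{"event_id": "a1", "value": 1}',
    '{"event_id": "a2", "value": 2}',
    '{"event_id": "a1", "value": 1}',  # duplicate from a misbehaving producer
]
print([m["event_id"] for m in dedupe_batch(batch)])  # ['a1', 'a2']
```

Note that an in-memory set only deduplicates within one process and its lifetime; to deduplicate across restarts or across the executors of a Spark job, the seen-ID state would have to live in external shared storage (for example a key-value store) rather than a local set.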