Also, as I understand it, we either have to mark all messages with unique IDs
and then deduplicate them, or, if we only want to store the last message
processed per partition, we will need exactly the same number of partitions
in both clusters?
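
For the unique-ID option, a minimal sketch of what I mean: the producer stamps
each message with a UUID key so that consumers in either DC can deduplicate on
the ID rather than on offsets (which differ between the clusters). The broker
address, topic name, and serializers below are placeholders:

import java.util.Properties;
import java.util.UUID;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UniqueIdProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The UUID key travels with the message through mirror-maker, so the
            // destination cluster sees the same ID even though offsets differ.
            String messageId = UUID.randomUUID().toString();
            producer.send(new ProducerRecord<>("events", messageId, "payload"));
        }
    }
}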

On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Not sure if transactional messaging will help in this case, as at least for
> now it is still scoped to a single DC, i.e. a "transaction" is only
> defined within a Kafka cluster, not across clusters.
>
> Guozhang
>
> On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> jbringhu...@linkedin.com.invalid> wrote:
>
> > Hey Kane,
> >
> > When mirrormakers lose offsets on catastrophic failure, you generally
> > have two options. You can keep auto.offset.reset set to "latest" and
> > handle the loss of messages, or you can have it set to "earliest" and
> > handle the duplication of messages.
> >
> > Although we try to avoid duplicate messages overall, when failure
> > happens, we (mostly) take the "earliest" path and deal with the
> > duplication of messages.
> >
> > If your application can't process messages idempotently, you might be
> > able to get away with something like couchbase or memcached with a TTL
> > slightly higher than your Kafka retention time and use that to filter
> > duplicates. Another pattern may be to deduplicate messages in Hadoop
> > before taking action on them.
> >
> > -Jon
> >
> > P.S. An option in the future might be
> > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> >
> > On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > What's the best strategy for failover when using mirror-maker to
> > > replicate across datacenters? As I understand it, offsets in both
> > > datacenters will be different, so how should consumers be reconfigured
> > > to continue reading from the same point where they stopped without
> > > data loss and/or duplication?
> > >
> > > Thanks.
> >
> >
>
>
> --
> -- Guozhang
>
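
A rough sketch of the dedup-store approach Jon describes (an external store
with a TTL slightly longer than the Kafka retention), using an in-memory map
as a stand-in for couchbase/memcached; the broker address, topic, group id,
and TTL values below are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupConsumer {
    // Stand-in for couchbase/memcached: message ID -> time first seen.
    // A real deployment would use the cache's add-with-TTL operation so
    // entries expire on their own shortly after the Kafka retention window.
    private static final ConcurrentMap<String, Long> seen = new ConcurrentHashMap<>();
    private static final long TTL_MS = 8L * 24 * 60 * 60 * 1000; // retention + slack (placeholder)

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "dedup-example");             // placeholder
        props.put("auto.offset.reset", "earliest");         // prefer duplicates over loss
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                long now = System.currentTimeMillis();
                for (ConsumerRecord<String, String> record : records) {
                    String messageId = record.key();   // the UUID stamped by the producer
                    // putIfAbsent returns null only the first time an ID shows up,
                    // which mirrors a cache "add" that fails if the key already exists.
                    if (seen.putIfAbsent(messageId, now) == null) {
                        System.out.println("processing " + record.value());
                    }
                }
                // Crude expiry pass; a real cache would enforce the TTL itself.
                seen.values().removeIf(ts -> now - ts > TTL_MS);
            }
        }
    }
}

In a real setup the putIfAbsent plus manual expiry would be replaced by the
cache's atomic add-with-TTL call, so duplicate IDs are rejected in one round
trip and old entries age out without a sweep.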
