Hello again, the reviewers and I want to point out that PR-6295
includes the minor API change discussed in KIP-416, a new parameter in
SourceTask.commitRecord(), since KIP-382 depends on it. Consensus is to not
deprecate anything or alter any existing semantics. I've updated both KIPs
to
Hey y'all, I'm happy to announce that the PR for "MirrorMaker 2.0" is ready
for review, after a long spell in "draft".
https://github.com/apache/kafka/pull/6295
MirrorMaker 2.0 is in the Kafka 2.3.0 release plan. Please take a look so
we can get this merged.
Also, shameless plug: I'm giving a
Pippin, thanks for your interest. I will publish a PR soon (several
days?) which you'll be able to build and play with. Watch this space
:)
Ryanne
On Thu, Jan 24, 2019 at 5:19 PM Pippin Wallace wrote:
>
> I see that the Current state of KIP-382 recently changed from Voting to
> Accepted on
I see that the Current state of KIP-382 recently changed from Voting to
Accepted on Confluence page
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
I am just looking for a best guess as to when this might make it into an alpha,
beta, or GA release?
Regards,
Thanks Dong.
> 1. Currently, if a topic is created with "." in its name, would
it cause a correctness issue for this KIP?
Yes, RemoteClusterUtils would be confused by existing topics that have a
period, and MM2 might try to send records to existing topics if they happen
to be prefixed
Hey Ryanne,
Sorry I am late here. Thanks much for all the work! After reading through
the latest KIP and all the previous discussion, I have some questions below:
1. Currently, if a topic is created with "." in its name, would it
cause a correctness issue for this KIP? For example, will
> very
> > > > > > > > > important scenario. Right now open source community does
> not
> > > > have a
> > > > > > > > > standard solution to that.
> > > > > > > > >
> > > > > > > > > A few comments/questions are following:
> > > > > > > > >
> > > > > > > > > 1. Only relying on the t
> > destination offset? If so how would that be done?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
>
> > > > > > > > Notice that MM2 isn't really breaking anything here. The
> > problem
> > > is
> > > > > > that
> > > > > > > > you are using MirrorMaker to aggregate records from multiple
> > > >
> > > > With (2) you still get the nice DR semantics. The KTable will
> > > > represent
> > > > > > the
> > > > > > > latest account states aggregated across all clusters. If your
> > > > producers
> > > > > >
> > > > better
> > > > > > suited than Connect for this, but I won't rule it out for a
> future
> > > KIP.
> > > > > >
> > > > > > Thanks again!
> > > > > > Ryanne
> > > > > >
>
Account A + B are in Ireland
> > > > > > Account C + D are in Germany
> > > > > > Account E is in UK
> > > > > >
> > > > > > Let's call the Kafka cluster in Ireland Z, Germany Y and UK X
> > > > > >
> >
r unplanned outage in the UK.
> > > > > We move/shift the transactional processing of account E to Ireland.
> > > > >
> > > > > Now we end up with
> > > > > Z.account_state (holding state for accounts A + B + current state
> > for
> > >
get both current state and also outdated state for E, which
> > state
> > > > would it use? And this gets worse as it scales and you move the
> > > processing
> > > > of accounts around over time.
> > > >
> > > >
> > > > Li
Sönke, I can probably get a KIP together in the next several weeks, but
you're welcome to beat me to it :)
Ryanne
On Fri, Dec 21, 2018, 3:59 AM Sönke Liebau wrote:
> Hi Ryanne,
>
> just to briefly check in, am I understanding your mail correctly, that
> you want to pick up the
Hi Ryanne,
just to briefly check in, am I understanding your mail correctly, that
you want to pick up the "multi-cluster/herder/worker features" in a
different KIP at some time? If yes, please feel free to let me know if
I can provide any help on that front. Otherwise, I am also happy to
draft a
Jun, let's leave the REST API out of the KIP then.
I have been arguing that Connect wouldn't benefit from the
multi-cluster/herder/worker features we need in MM2, and that the effort
would result in a needlessly complex Connect REST API. But certainly two
separate APIs is inherently more complex
Hi, Ryanne,
Hmm, there are still quite a few MM2-specific REST APIs. Overall, I am
still not sure that having a separate dedicated MM2 cluster is better. From
an operational perspective, if someone is already running a Connect cluster
for other connectors, it seems that it's better to just run
.
> > > > > We move/shift the transactional processing of account E to Ireland.
> > > > >
> > > > > Now we end up with
> > > > > Z.account_state (holding state for accounts A + B + current state
> > for
> > > >
it use? And this gets worse as it scales and you move the
> > > processing
> > > > of accounts around over time.
> > > >
> > > >
> > > > Likewise the issue is the same without compacted state.
> > > >
> > > > Imagine order of s
le
> > > topic partition only. You only care for order by the account (not for
> > > offsets being same, we simply care that updates are in order and latest
> > > state is at head on restart we ignore offsets). So it doesn’t matter if
> > in
> > > Z cluster you see upd
Ryanne, thank you, this looks great and will be really appreciated by the
community.
My only comment at this point: is the REST API strictly necessary for this
KIP? Perhaps consider moving that to a different KIP since the main
contribution is more than sufficient. However that is just a
> So, if we want to add it, it seems it would be useful to do it in a
backward compatible way in the connect framework, rather than sth specific
to MM
Jun, that sgtm. The MirrorMaker driver I have right now creates multiple
Herders (for multiple Kafka clusters) and exposes them through a
Hi, Sonke, Ryanne,
Thanks for the explanation. To me, the single connect cluster model could
be useful for any connector, not just MM. So, if we want to add it, it
seems it would be useful to do it in a backward compatible way in the
connect framework, rather than sth specific to MM. I am not
Thanks Sönke, you're spot-on. I don't want MM2 to wait for Connect features
that don't exist yet, especially if MM2 is the primary use case for them.
Moreover, I think MM2 can drive and inform some of these features, which
only makes sense if we adopt MM2 first.
Ryanne
On Fri, Dec 14, 2018, 9:03
Hi Jun,
I believe Ryanne's idea is to run multiple workers per MM cluster-node, one
per target cluster. So in essence you'd specify three clusters in the MM
config and MM would then instantiate one worker per cluster. Every MM
connector would then be deployed to the appropriate (internal) worker
Hi, Ryanne,
Regarding the single connect cluster model, yes, the co-existence of a MM2
REST API and the nearly identical Connect API is one of my concerns.
Implementation wise, my understanding is that the producer URL in a
SourceTask is always obtained from the connect worker's configuration.
A1, E1, A2, A3, E2 as the ordering of the updates by account is
> > preserved.
> >
> > With the topic solution you're suggesting we would have no true way of
> > replaying and re-constituting the order between X.account_state and
> > Z.account_state topics in the c
messages will be in different
> topics and partitions.
>
>
>
>
>
>
> -Original Message-
> From: Ryanne Dolan
> Sent: Wednesday, December 12, 2018 4:37 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-382: MirrorMaker 2.0
>
> > You haven
4:37 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-382: MirrorMaker 2.0
> You haven’t described how one would handle the ordering issues and also
the compaction issues where transactional processing is master-master in
regions, where the processing is sticky to region but of fail
>
> -Original Message-
> From: Ryanne Dolan
> Sent: Wednesday, December 12, 2018 6:41 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-382: MirrorMaker 2.0
>
> > One based on hops using headers, and another based on topic naming.
>
> Michae
> Wikimedia currently implements 'master <-> master' replication by
manually prefixing topics with datacenter names, and then configuring
MirrorMaker to only replicate topics that begin with a DC name to another.
Andrew, this is a common approach and solves some of the problems I've
mentioned,
that you keep compatibility of the handler api interface in MM into
MM2.
-Original Message-
From: Ryanne Dolan
Sent: Wednesday, December 12, 2018 6:41 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-382: MirrorMaker 2.0
> One based on hops using headers, and another based on to
be processing in germany region currently and C43F2SA could be in uk
> region currently.
>
>
>
> Sent from my Samsung Galaxy smartphone.
> ---- Original message ----From: Andrew Otto
> Date: 11/12/2018 14:28 (GMT+00:00) To: dev@kafka.apache.org Subject:
> Re: [DIS
@kafka.apache.org Subject: Re: [DISCUSS]
KIP-382: MirrorMaker 2.0
Wikimedia currently implements 'master <-> master' replication by manually
prefixing topics with datacenter names, and then configuring MirrorMaker to
only replicate topics that begin with a DC name to another.
While having
Hi Ryanne,
We had an IM exchange about KIP-382 and Mirus a few weeks back, but I also
want to post here to publicly express my support. I'm the primary developer
of Mirus, which is a Kafka Connect based replication tool we wrote at
Salesforce to replace Mirror Maker internally. We open-sourced
Wikimedia currently implements 'master <-> master' replication by manually
prefixing topics with datacenter names, and then configuring MirrorMaker to
only replicate topics that begin with a DC name to another.
While having topics named with topological details is manageable, I
wouldn't say it is
So this is indeed what using headers with hops avoids: creating lots and lots
of topics __, so you can have more complex topology setups.
I ask why not support having two ways of setting up and closing the door?
One based on hops using headers, and another based on topic naming. After all
Hey Ryanne,
Thanks much for the KIP!
Though I don't have time to review this KIP in detail at this stage, I
think this KIP will be very useful to Apache Kafka users (particularly
global enterprise users) who need geo replication capability. Currently
Kafka users have to set up and manage MM
Jun, thanks for your time reviewing the KIP.
> In a MirrorSourceConnector, it seems that the offsets of the source will
be stored in a different cluster from the target cluster?
Jan Filipiak raised this issue as well, and suggested that no state be
tracked in the source cluster. I've since
Hi, Ryanne,
Thanks for the KIP. At the high level, this looks like a reasonable
proposal. A few comments below.
1. About using a single connector cluster to manage connectors accessing
multiple Kafka clusters. It's good that you brought this up. The following
are the tradeoffs that I see. The
Michael, thanks for the comments!
> would like to see support for this to be done by hops, as well [...]
This then allows ring (hops = number of brokers in the ring), mesh (every
cluster interconnected so hop=1), or even a tree (more fine grained setup)
cluster topology.
That's a good idea,
Re hops to stop the cycle and to allow a range of multi-cluster topologies, see
https://www.rabbitmq.com/federated-exchanges.html where something very similar
was done in RabbitMQ.
On 12/7/18, 12:47 AM, "Michael Pearce" wrote:
Nice proposal.
Some comments.
On the section around cycle
Nice proposal.
Some comments.
On the section around cycle detection.
I would like to see support for this to be done by hops as well, e.g. one
approach is to use a header for the number of hops: as MM2 replicates, it
increases the hop count, and you can make MM2 configurable to only
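To make the hop-count idea above concrete, here is a minimal sketch. It is only an illustration of the suggestion, not anything MM2 implements: the header name `mm2-hops`, the function `next_headers`, and the `max_hops` setting are all hypothetical, and headers are modelled as a plain dict rather than real Kafka record headers.

```python
from typing import Dict, Optional

# Hypothetical header name; nothing in the KIP or the thread defines one.
HOPS_HEADER = "mm2-hops"


def next_headers(headers: Dict[str, bytes], max_hops: int) -> Optional[Dict[str, bytes]]:
    """Return the headers for the replicated copy of a record.

    Returns None when the record has already travelled max_hops hops,
    meaning the replicator should not forward it any further.
    """
    # A record produced locally carries no hop header, so it counts as 0 hops.
    hops = int(headers.get(HOPS_HEADER, b"0").decode("ascii"))
    if hops >= max_hops:
        return None
    updated = dict(headers)  # copy, so the caller's headers are left untouched
    updated[HOPS_HEADER] = str(hops + 1).encode("ascii")
    return updated
```

With `max_hops=1`, a locally produced record is replicated once and the copy is never forwarded again, which is the behaviour one would want when every cluster is directly connected to every other.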
Sönke,
> The only thing that I could come up with is the limitation to a single
offset commit interval
Yes, and other internal properties, e.g. those used by the internal
consumers and producers, which, granted, probably are not often changed
from their defaults, but that apply to Connectors
Hi Ryanne,
when you say "Currently worker configs apply across the entire cluster,
which is limiting even for use-cases involving a single Kafka cluster.",
may I ask you to elaborate on those limitations a little?
The only thing that I could come up with is the limitation to a single
offset
Sönke,
I think so long as we can keep the differences at a very high level (i.e.
the "control plane"), there is little downside to MM2 and Connect
coexisting. I do expect them to converge to some extent, with features from
MM2 being pulled into Connect whenever this is possible without breaking
Hi Ryanne,
thanks for your response!
It seems like you have already done a lot of investigation into the
existing code and the solution design and all of what you write makes sense
to me. Would it potentially be worth adding this to the KIP, now that you
had to write it up because of me anyway?
Thanks Sönke.
> it just feels to me like an awful lot of Connect functionality would need
to be reimplemented or at least wrapped
Connect currently has two drivers, ConnectDistributed and
ConnectStandalone. Both set up a Herder, which manages Workers. I've
implemented a third driver which sets
Hi Ryanne,
thanks. I missed the remote to remote replication scenario in my train of
thought, you are right.
That being said I have to admit that I am not yet fully on board with the
concept, sorry. But I might just be misunderstanding what your intention
is. Let me try and explain what I think
Sönke, thanks for the feedback!
> the renaming policy [...] can be disabled [...] The KIP itself does not
mention this
Good catch. I've updated the KIP to call this out.
> "MirrorMaker clusters" I am not sure I fully understand the issue you are
trying to solve
MirrorMaker today is not
Hi Ryanne,
first of all, thanks for the KIP, great work overall and much needed I
think!
I have a small comment on the renaming policy, in one of the mails on this
thread you mention that this can be disabled (to replicate topic1 in
cluster A as topic1 on cluster B I assume). The KIP itself does
Hey y'all, I'd like to draw your attention to a new section in KIP-382 re
MirrorMaker Clusters:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382:+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-MirrorMakerClusters
A common concern I hear about using Connect for replication is that all
Dan, you've got it right. ACL sync will be done by MM2 automatically
(unless disabled) according to simple rules:
- If a principal has READ access on a topic in a source cluster, the same
principal should have READ access on downstream replicated topics ("remote
topics").
- Only MM2 has WRITE
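The first rule above (READ on a source topic implies READ on the remote topic) can be sketched roughly as follows. This is an assumption-laden illustration: `Acl` and `remote_read_acls` are hypothetical names, ACLs are modelled as simple tuples rather than Kafka's real ACL types, and the remote topic is assumed to be named by prefixing the source cluster alias and a period. The second rule is cut off in this message, so it is not modelled here.

```python
from typing import List, NamedTuple


class Acl(NamedTuple):
    """Simplified stand-in for an ACL binding (hypothetical, not Kafka's type)."""
    principal: str
    operation: str
    topic: str


def remote_read_acls(source_acls: List[Acl], source_alias: str,
                     separator: str = ".") -> List[Acl]:
    """Derive READ ACLs for remote topics from ACLs on the source cluster.

    Only READ bindings are carried over; any other operation is skipped.
    """
    return [
        Acl(acl.principal, "READ", f"{source_alias}{separator}{acl.topic}")
        for acl in source_acls
        if acl.operation == "READ"
    ]
```

In this sketch a WRITE binding on the source topic yields nothing downstream, so a principal never gains write access to a remote topic through the sync.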
Hi guys,
This is an exciting topic. Could I have a word here?
I understand there are many scenarios where we can apply MirrorMaker. I am at
the moment working on an active/active DC solution using MirrorMaker; our goal is
to allow the clients to fail over and connect to the other Kafka cluster (on the
Alex, thanks for the feedback.
> Would it be possible to utilize the
> Message Headers feature to prevent infinite recursion
This isn't necessary due to the topic renaming feature which already
prevents infinite recursion.
If you turn off topic renaming you lose cycle detection, so maybe we
Hey Ryanne,
Awesome KIP, excited to see improvements in MirrorMaker land, I particularly
like the reuse of Connect framework! Would it be possible to utilize the
Message Headers feature to prevent infinite recursion? For example, MM2
could stamp every message with a special header payload (e.g.
Thanks Harsha. Done.
On Fri, Oct 19, 2018 at 1:03 AM Harsha Chintalapani wrote:
> Ryanne,
>Makes sense. Can you please add this under rejected alternatives so
> that everyone has context on why it wasn’t picked.
>
> Thanks,
> Harsha
> On Oct 18, 2018, 8:02 AM -0700, Ryanne Dolan,
>
Ryanne,
Makes sense. Can you please add this under rejected alternatives so that
everyone has context on why it wasn’t picked.
Thanks,
Harsha
On Oct 18, 2018, 8:02 AM -0700, Ryanne Dolan wrote:
> Harsha, concerning uReplicator specifically, the project is a major
> inspiration for
Harsha, concerning uReplicator specifically, the project is a major
inspiration for MM2, but I don't think it is a good foundation for anything
included in Apache Kafka. uReplicator uses Helix to solve problems that
Connect also solves, e.g. REST API, live configuration changes, cluster
Jan, thanks for the share. Also similar are Pulsar's concepts of namespaces
and global topics. I don't think these need to be supported in Kafka
itself, but there are many benefits to adopting naming conventions along
these lines, esp for tooling, dashboards etc.
> use it to copy my messages from
then I just hope that in the midst of all these new features I can still
at least use it to copy my messages from A to B later.
Another hint you should be aware of:
https://cwiki.apache.org/confluence/display/KAFKA/Hierarchical+Topics
That was always a design I admired, with active / active
Harsha, yes I can do that. I'll update the KIP accordingly, thanks.
Ryanne
On Wed, Oct 17, 2018 at 12:18 PM Harsha wrote:
> Hi Ryanne,
>Thanks for the KIP. I am also curious about why not use the
> uReplicator design as the foundation given it already resolves some of the
>
Hi Ryanne,
Thanks for the KIP. I am also curious about why not use the
uReplicator design as the foundation given it already resolves some of the
fundamental issues in current MirrorMaker, updating the configs on the fly and
running the mirror maker agents in a worker model
Jan, these are two separate issues.
1) consumer coordination should not, ideally, involve unreliable or slow
connections. Naively, a KafkaSourceConnector would coordinate via the
source cluster. We can do better than this, but I'm deferring this
optimization for now.
2) exactly-once between two
This is not a performance optimisation. It's a fundamental design choice.
I never really took a look at how Streams does exactly-once. (It's a trap
anyway and you can usually deal with at-least-once downstream pretty
easily.) But I am very certain it's not gonna get anywhere if offset
commit and
> Oh - got it, it checks the entire prefix, which seems obvious to me in
retrospect :)
Rhys, I've changed the wording to make this more clear, thanks for calling
it out.
Ryanne
On Tue, Oct 16, 2018 at 4:16 PM McCaig, Rhys
wrote:
>
> > In your example, us-west.us-east.us-central.us-west.topic
> In your example, us-west.us-east.us-central.us-west.topic is an invalid
> "remote topic" name because us-west appears twice. MM2 will not replicate
> us-east.us-central.us-west.topic into us-west a second time, because the
> source topic already has us-west in the prefix. This is what I mean by
> Could you comment on the approach of
> your method vs. using other open source tools like Uber's uReplicator or
> the recently open-sourced Mirus from Salesforce?
Eno, a primary differentiator is that KIP-382 is "opinionated" about how
replication should be done, e.g. by applying topic renaming
> But one big obstacle in this was
always that group coordination happened on the source cluster.
Jan, thank you for bringing up this issue with legacy MirrorMaker. I
totally agree with you. This is one of several problems with MirrorMaker I
intend to solve in MM2, and I already have a design
no worries,
glad i could clarify
On 16.10.2018 15:14, Andrew Otto wrote:
> O ok apologies. Interesting!
>
> On Tue, Oct 16, 2018 at 9:06 AM Jan Filipiak
> wrote:
>
>> Hi Andrew,
>>
>> thanks for your message, you missed my point.
>>
>> Mirrormaker collocation with target is for sure
O ok apologies. Interesting!
On Tue, Oct 16, 2018 at 9:06 AM Jan Filipiak
wrote:
> Hi Andrew,
>
> thanks for your message, you missed my point.
>
> Mirrormaker collocation with target is for sure correct.
> But then group coordination happens across WAN which is unnecessary.
> And I request
Hi Andrew,
thanks for your message, you missed my point.
Mirrormaker collocation with target is for sure correct.
But then group coordination happens across WAN which is unnecessary.
And I request that this be thought about again.
I made a PR back then for zk Consumer to allow having 2 zookeeper
> I would generally say a LAN is better than a WAN for doing group
> coordination
For sure, but a LAN is better than a WAN for producing messages too. If
there is network congestion during network production, messages will be
dropped. With MirrorMaker currently, you can either skip these dropped
Hi,
Currently MirrorMaker is usually run collocated with the target cluster.
This is all nice and good. But one big obstacle in this was
always that group coordination happened on the source cluster. So when
the network was congested, you sometimes lose group membership and
have to
This update is much needed, thank you! Could you comment on the approach of
your method vs. using other open source tools like Uber's uReplicator or
the recently open-sourced Mirus from Salesforce? (
https://engineering.salesforce.com/open-sourcing-mirus-3ec2c8a38537). I
strongly believe
Rhys, thanks for your enthusiasm!
In your example, us-west.us-east.us-central.us-west.topic is an invalid
"remote topic" name because us-west appears twice. MM2 will not replicate
us-east.us-central.us-west.topic into us-west a second time, because the
source topic already has us-west in the
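The prefix check described above can be sketched in a few lines. This is only an illustration under stated assumptions: `remote_topic_name` and `should_replicate` are hypothetical helpers (not MM2's API), the separator is assumed to be a period, and the set of known cluster aliases is supplied by the caller.

```python
from typing import Set


def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Name a replicated topic by prefixing it with the source cluster alias."""
    return f"{source_alias}{separator}{topic}"


def should_replicate(topic: str, target_alias: str, known_aliases: Set[str],
                     separator: str = ".") -> bool:
    """Return False if target_alias already appears in the topic's cluster prefix.

    The whole prefix is checked, not just the first segment, so a topic that
    has already passed through the target cluster is never sent there again.
    """
    segments = topic.split(separator)
    for segment in segments[:-1]:
        if segment not in known_aliases:
            break  # the cluster prefix has ended; the rest is the topic name
        if segment == target_alias:
            return False
    return True
```

For example, with aliases `us-west`, `us-east` and `us-central`, `should_replicate("us-east.us-central.us-west.topic", "us-west", aliases)` is False, so the invalid name `us-west.us-east.us-central.us-west.topic` is never created. Note that in this sketch a pre-existing topic whose name happens to start with a cluster alias and a period would be misread as a remote topic.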
Hi Ryanne,
This KIP is fantastic. It provides a great vision for how MirrorMaker should
evolve in the Kafka project.
I have a question on cycle detection - In a scenario where I have 3 clusters
replicating between each other, it seems it may be easy to misconfigure the
connectors if auto
Hey y'all!
Please take a look at KIP-382:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
Thanks for your feedback and support.
Ryanne