Re: Mirror Maker 2.0 Queries

Ananya Sen Thu, 20 Aug 2020 11:21:24 -0700

Thanks a lot Ryanne. That was really very helpful.

On Thu, Aug 20, 2020, 11:49 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:


> > Can we configure tasks.max for each of these connectors separately?
>
> I don't believe that's currently possible. If you need fine-grained control
> over each Connector like that, you might consider running MM2's Connectors
> manually on a bunch of Connect clusters. This requires more effort to set
> up, but enables you to control the configuration of each Connector using
> the Connect REST API.
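
A minimal sketch of that manual approach, assuming a Connect cluster whose REST API is reachable at a hypothetical `http://localhost:8083` and two hypothetical cluster aliases `A` and `B`. It only builds the per-connector requests for the Connect REST API's `PUT /connectors/{name}/config` endpoint and does not send them. The connector names and task counts are illustrative (taken from the question quoted above), and a real configuration would need further settings such as the bootstrap servers of both clusters.

```python
import json

CONNECT_URL = "http://localhost:8083"  # assumption: address of a Connect worker

# Illustrative task counts per MM2 connector class (from the question above).
TASKS = {
    "MirrorSourceConnector": 3,
    "MirrorCheckpointConnector": 5,
    "MirrorHeartbeatConnector": 1,
}


def connector_request(class_name, tasks_max, source="A", target="B"):
    """Build (method, url, json_body) for one connector's configuration."""
    name = f"{source}-to-{target}-{class_name}"  # hypothetical naming scheme
    config = {
        "connector.class": f"org.apache.kafka.connect.mirror.{class_name}",
        "source.cluster.alias": source,
        "target.cluster.alias": target,
        "tasks.max": str(tasks_max),  # set independently for each connector
    }
    return "PUT", f"{CONNECT_URL}/connectors/{name}/config", json.dumps(config)


requests_to_send = [connector_request(c, n) for c, n in TASKS.items()]

if __name__ == "__main__":
    for method, url, body in requests_to_send:
        print(method, url, body)
```

Because each connector is submitted as its own request, `tasks.max` can differ per connector, which is the fine-grained control described above.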
>
> Ryanne
>
> On Thu, Aug 20, 2020 at 12:30 PM Ananya Sen <ananya281...@gmail.com>
> wrote:
>
> > Thanks, Ryanne. That answers my questions. I was actually missing this
> > "tasks.max" property. Thanks for pointing that out.
> >
> > Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of
> > connectors in a Mirror Maker Cluster:
> >
> >    1. MirrorSourceConnector - focus on replicating topic partitions
> >    2. MirrorCheckpointConnector - focus on replicating consumer groups
> >    3. MirrorHeartbeatConnector - focus on checking cluster availability
> >
> > *Can we configure tasks.max for each of these connectors separately? That
> > is, can I have 3 tasks for MirrorSourceConnector, 5
> > for MirrorCheckpointConnector, and 1 for MirrorHeartbeatConnector?*
> >
> >
> >
> > Regards
> > Ananya Sen
> >
> > On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan <ryannedo...@gmail.com>
> > wrote:
> >
> > > Ananya, see responses below.
> > >
> > > > Can this number of workers be configured?
> > >
> > > The number of workers is not exactly configurable, but you can control
> > > it by spinning up drivers and using the '--clusters' flag. A driver
> > > instance without '--clusters' will run one worker for each A->B
> > > replication flow. So e.g. if you've got two clusters being replicated
> > > bidirectionally, you'll have an A->B worker and a B->A worker on each
> > > MM2 driver.
> > >
> > > You can use the '--clusters' flag to limit which clusters are targeted
> > > by a given driver, which is useful in many ways, including to limit the
> > > number of workers for a given driver. So e.g. if you've got 10 clusters
> > > all being replicated in a full mesh, you can run a driver with
> > > '--clusters A' and it will have only 9 workers, one for each of the
> > > other clusters.
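
The worker counts above can be checked with a little arithmetic. This is only a toy model of the description, not MM2 code: the function names are hypothetical, and the `targets` argument stands in for the '--clusters' flag.

```python
from itertools import permutations


def flows(clusters):
    """All directed source->target replication flows in a full mesh."""
    return list(permutations(clusters, 2))


def workers_for_driver(clusters, targets=None):
    """Flows (one worker each) that a single driver would run.

    targets=None models a driver started without '--clusters';
    otherwise only flows whose target is in `targets` are kept.
    """
    return [(s, t) for s, t in flows(clusters) if targets is None or t in targets]


ten = [chr(ord("A") + i) for i in range(10)]  # clusters A..J

if __name__ == "__main__":
    print(len(workers_for_driver(["A", "B"])))  # two clusters, bidirectional
    print(len(workers_for_driver(ten, {"A"})))  # driver limited to target A
    print(len(workers_for_driver(ten)))         # unrestricted driver
```

In this model an unrestricted driver in a 10-cluster full mesh would carry a worker for every one of the 10 x 9 directed flows, which is why limiting the targets per driver is useful.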
> > >
> > > Also note that there is a configuration property 'tasks.max' that
> > > controls the number of tasks available to workers. Each A->B flow is
> > > replicated by a Herd of Workers (in Connect terminology), and Herds work
> > > on Tasks. By default, 'tasks.max' is one, which means there will only be
> > > one task for each Herd, regardless of how many drivers and workers you
> > > spin up. You definitely want to change this property. You can tweak this
> > > for each A->B replication flow independently to strike the right
> > > balance. If 'tasks.max' is the same as or more than the total number of
> > > topic-partitions being replicated, each topic-partition will be
> > > replicated in a dedicated task, which is probably not an efficient use
> > > of resources.
> > >
> > > > Is every topic partition given a new task?
> > >
> > > No, topic-partitions are spread out across tasks. Each topic's
> > > partitions are divided round-robin among available tasks. However, keep
> > > in mind that if 'tasks.max' is too high, you could end up with one
> > > topic-partition in each task.
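
A simplified sketch of the round-robin idea described above, not MM2's actual assignment code. The function name and the topic names are hypothetical. It shows how a moderate 'tasks.max' groups several topic-partitions per task, while a value at or above the partition count leaves one partition per task.

```python
def assign_round_robin(topic_partitions, tasks_max):
    """Deal topic-partitions out to tasks in turn.

    Never creates more tasks than there are topic-partitions.
    """
    if tasks_max < 1:
        raise ValueError("tasks_max must be at least 1")
    num_tasks = min(tasks_max, len(topic_partitions))
    tasks = [[] for _ in range(num_tasks)]
    for i, tp in enumerate(topic_partitions):
        tasks[i % num_tasks].append(tp)
    return tasks


# Hypothetical topics with 3 partitions each.
partitions = [(t, p) for t in ("orders", "payments") for p in range(3)]

if __name__ == "__main__":
    for tasks_max in (2, 10):
        print(tasks_max, assign_round_robin(partitions, tasks_max))
```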
> > >
> > > > Is every consumer group - topic pair given a new task for
> > > > replicating offsets?
> > >
> > > No, consumer-groups are also spread out across tasks. As with
> > > topic-partitions, 'tasks.max' applies.
> > >
> > > > How can I scale up the mirror maker instance so that I can have very
> > > > little lag?
> > >
> > > Tweak 'tasks.max' and spin up more driver instances.
> > >
> > > Ryanne
> > >
> > > On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen <ananya281...@gmail.com>
> > > wrote:
> > >
> > > > Thank you Ryanne for the quick response.
> > > > I further want to clarify a few points.
> > > >
> > > > The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> > > > Connect we have multiple workers and each worker has some assigned
> > > > tasks. To map this to Mirror Maker 2.0, a Mirror Maker driver will have
> > > > some workers.
> > > >
> > > > 1) Can this number of workers be configured?
> > > > 2) What is the default value of this worker configuration?
> > > > 3) Is every topic partition given a new task?
> > > > 4) Is every consumer group - topic pair given a new task for
> > > > replicating offsets?
> > > >
> > > > Also, consider a case where I have 1000 topics in a Kafka cluster and
> > > > each topic has a high amount of data + new data is being written at
> > > > high throughput. Now I want to set up a mirror maker 2.0 on this
> > > > cluster to replicate all the old data (which is retained in the topic)
> > > > as well as the new incoming data to a backup cluster. How can I scale
> > > > up the mirror maker instance so that I can have very little lag?
> > > >
> > > > On 2020/07/11 06:37:56, Ananya Sen <ananya281...@gmail.com> wrote:
> > > > > Hi
> > > > >
> > > > > I was exploring the Mirror maker 2.0. I read through this
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > > > documentation and I have a few questions.
> > > > >
> > > > >    1. For running mirror maker as a dedicated mirror maker cluster,
> > > > >    the documentation specifies a config file and a starter script. Is
> > > > >    this mirror maker process distributed?
> > > > >    2. I could not find any port configuration for the above mirror
> > > > >    maker process. So can we configure mirror maker itself to run as a
> > > > >    cluster, i.e. running the process instance across multiple servers
> > > > >    to avoid downtime due to a server crash?
> > > > >    3. If we could somehow run the mirror maker as a distributed
> > > > >    process, then does that mean that topic and consumer offset
> > > > >    replication will be shared among those mirror maker processes?
> > > > >    4. What is the default port of this mirror maker process and how
> > > > >    can we override it?
> > > > >
> > > > > Looking forward to your reply.
> > > > >
> > > > >
> > > > > Thanks & Regards
> > > > > Ananya Sen
> > > > >
> > > >
> > >
> >
>
