Thanks a lot, Ryanne. That was really very helpful.

On Thu, Aug 20, 2020, 11:49 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > Can we configure tasks.max for each of these connectors separately?
>
> I don't believe that's currently possible. If you need fine-grained control over each Connector like that, you might consider running MM2's Connectors manually on a bunch of Connect clusters. This requires more effort to set up, but enables you to control the configuration of each Connector using the Connect REST API.
>
> Ryanne
>
> On Thu, Aug 20, 2020 at 12:30 PM Ananya Sen <ananya281...@gmail.com> wrote:
>
> > Thanks, Ryanne. That answers my questions. I was actually missing this "tasks.max" property. Thanks for pointing that out.
> >
> > Furthermore, as per the KIP for MirrorMaker 2.0, there are 3 types of connectors in a MirrorMaker cluster:
> >
> > 1. MirrorSourceConnector - focuses on replicating topic partitions
> > 2. MirrorCheckpointConnector - focuses on replicating consumer groups
> > 3. MirrorHeartbeatConnector - focuses on checking cluster availability
> >
> > *Can we configure tasks.max for each of these connectors separately? That is, can I have 3 tasks for MirrorSourceConnector, 5 for MirrorCheckpointConnector, and 1 for MirrorHeartbeatConnector?*
> >
> > Regards
> > Ananya Sen
> >
> > On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> >
> > > Ananya, see responses below.
> > >
> > > > Can this number of workers be configured?
> > >
> > > The number of workers is not exactly configurable, but you can control it by spinning up drivers and using the '--clusters' flag. A driver instance without '--clusters' will run one worker for each A->B replication flow. So, e.g., if you've got two clusters being replicated bidirectionally, you'll have an A->B worker and a B->A worker on each MM2 driver.
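A minimal sketch of the worker count described above, under the assumption that every ordered pair of distinct clusters is an enabled replication flow (a full mesh). The helper `driver_flows` is hypothetical and not part of MM2; its optional `targets` argument mimics the '--clusters' flag by keeping only the flows whose target cluster is listed.

```python
from itertools import permutations
from typing import Iterable, List, Optional


def driver_flows(clusters: List[str], targets: Optional[Iterable[str]] = None) -> List[str]:
    """List the 'source->target' flows a single MM2 driver would run a worker for.

    Hypothetical model: every ordered pair of distinct clusters is treated as an
    enabled flow (full mesh). `targets` plays the role of '--clusters': when given,
    only flows whose target is in `targets` are kept; when omitted, all are kept.
    """
    wanted = set(clusters) if targets is None else set(targets)
    return [f"{src}->{dst}" for src, dst in permutations(clusters, 2) if dst in wanted]


if __name__ == "__main__":
    # Two clusters replicated bidirectionally: one A->B and one B->A worker.
    print(driver_flows(["A", "B"]))  # ['A->B', 'B->A']

    # Ten clusters in a full mesh, driver started with '--clusters A'.
    mesh = [chr(ord("A") + i) for i in range(10)]
    print(len(driver_flows(mesh, ["A"])))  # 9
```

Without a target filter, the same ten-cluster mesh would give a single driver 90 workers, which is why restricting targets per driver matters at that scale.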
> > >
> > > You can use the '--clusters' flag to limit which clusters are targeted by a given driver, which is useful in many ways, including limiting the number of workers for a given driver. So, e.g., if you've got 10 clusters all being replicated in a full mesh, you can run a driver with '--clusters A' and it will have only 9 workers, one for each of the other clusters.
> > >
> > > Also note that there is a configuration property 'tasks.max' that controls the number of tasks available to workers. Each A->B flow is replicated by a Herd of Workers (in Connect terminology), and Herds work on Tasks. By default, 'tasks.max' is one, which means there will only be one task for each Herd, regardless of how many drivers and workers you spin up. You definitely want to change this property. You can tweak it for each A->B replication flow independently to strike the right balance. If 'tasks.max' is equal to or greater than the total number of topic-partitions being replicated, each topic-partition will be replicated in a dedicated task, which is probably not an efficient use of resources.
> > >
> > > > Is every topic partition given a new task?
> > >
> > > No, topic-partitions are spread out across tasks. Each topic's partitions are divided round-robin among available tasks. However, keep in mind that if 'tasks.max' is too high, you could end up with one topic-partition in each task.
> > >
> > > > Is every consumer group - topic pair given a new task for replicating offsets?
> > >
> > > No, consumer groups are also spread out across tasks. As with topic-partitions, 'tasks.max' applies.
> > >
> > > > How can I scale up the MirrorMaker instance so that I can have very little lag?
> > >
> > > Tweak 'tasks.max' and spin up more driver instances.
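The round-robin spreading of topic-partitions over tasks can be sketched as follows. This is an illustrative model only, not MM2's actual implementation: `assign_round_robin` is a hypothetical helper, and it assumes the number of tasks created is the smaller of 'tasks.max' and the number of items, which matches the point that a very high 'tasks.max' leaves one topic-partition per task. Because the reply says consumer groups are spread the same way, the helper is generic over the item type.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def assign_round_robin(items: Sequence[T], tasks_max: int) -> Dict[int, List[T]]:
    """Spread items (topic-partitions or consumer groups) over tasks round-robin.

    Illustrative model, not MM2 source code: at most `tasks_max` tasks are used,
    and never more tasks than there are items, so item i goes to task i % n.
    """
    if tasks_max < 1:
        raise ValueError("tasks.max must be at least 1")
    num_tasks = min(tasks_max, len(items))
    assignment: Dict[int, List[T]] = {task: [] for task in range(num_tasks)}
    for i, item in enumerate(items):
        assignment[i % num_tasks].append(item)
    return assignment


if __name__ == "__main__":
    partitions = [("orders", p) for p in range(7)]
    # 3 tasks share 7 partitions: task 0 gets 0, 3, 6; task 1 gets 1, 4; task 2 gets 2, 5.
    print(assign_round_robin(partitions, 3))
    # With tasks.max >= 7, every partition ends up alone in its own task.
    print(len(assign_round_robin(partitions, 10)))  # 7
```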
> > >
> > > Ryanne
> > >
> > > On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen <ananya281...@gmail.com> wrote:
> > >
> > > > Thank you, Ryanne, for the quick response. I want to clarify a few more points.
> > > >
> > > > MirrorMaker 2.0 is based on the Kafka Connect framework. In Kafka Connect, we have multiple workers, and each worker has some assigned tasks. Mapping this to MirrorMaker 2.0, a MirrorMaker driver will have some workers.
> > > >
> > > > 1) Can this number of workers be configured?
> > > > 2) What is the default value of this worker configuration?
> > > > 3) Is every topic partition given a new task?
> > > > 4) Is every consumer group - topic pair given a new task for replicating offsets?
> > > >
> > > > Also, consider a case where I have 1000 topics in a Kafka cluster, each topic holds a large amount of data, and new data is being written at high throughput. Now I want to set up MirrorMaker 2.0 on this cluster to replicate all the old data (which is retained in the topics) as well as the new incoming data to a backup cluster. How can I scale up the MirrorMaker instance so that I can have very little lag?
> > > >
> > > > On 2020/07/11 06:37:56, Ananya Sen <ananya281...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I was exploring MirrorMaker 2.0. I read through this documentation
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > > >
> > > > > and I have a few questions.
> > > > >
> > > > > 1. For running MirrorMaker as a dedicated MirrorMaker cluster, the documentation specifies a config file and a starter script. Is this MirrorMaker process distributed?
> > > > > 2. I could not find any port configuration for the above MirrorMaker process. So can we configure MirrorMaker itself to run as a cluster, i.e. run the process across multiple servers to avoid downtime due to a server crash?
> > > > > 3. If we could somehow run MirrorMaker as a distributed process, does that mean that topic and consumer offset replication will be shared among those MirrorMaker processes?
> > > > > 4. What is the default port of this MirrorMaker process, and how can we override it?
> > > > >
> > > > > Looking forward to your reply.
> > > > >
> > > > > Thanks & Regards
> > > > > Ananya Sen
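Following Ryanne's suggestion to run MM2's connectors manually on a Connect cluster, here is a sketch that builds one request body per connector, each with its own 'tasks.max' (3, 5 and 1, as asked in the thread). The connector class names are MM2's real classes (MirrorSourceConnector, MirrorCheckpointConnector, MirrorHeartbeatConnector), and the config keys are the ones MM2 documents for running its connectors in a Connect cluster, though you should verify them against your Kafka version. The cluster aliases, bootstrap servers, connector names and the `connector_request` helper are hypothetical placeholders.

```python
import json
from typing import Dict

# Fully qualified class names of MM2's three connectors.
CONNECTOR_CLASSES = {
    "source": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "checkpoint": "org.apache.kafka.connect.mirror.MirrorCheckpointConnector",
    "heartbeat": "org.apache.kafka.connect.mirror.MirrorHeartbeatConnector",
}


def connector_request(kind: str, tasks_max: int, source: str, target: str,
                      source_servers: str, target_servers: str) -> Dict[str, object]:
    """Build a JSON body for POST /connectors on a Connect worker (hypothetical helper)."""
    return {
        "name": f"{source}-{target}-{kind}",  # hypothetical naming scheme
        "config": {
            "connector.class": CONNECTOR_CLASSES[kind],
            "source.cluster.alias": source,
            "target.cluster.alias": target,
            "source.cluster.bootstrap.servers": source_servers,
            "target.cluster.bootstrap.servers": target_servers,
            # Connect configs are string-valued, so tasks.max is sent as a string.
            "tasks.max": str(tasks_max),
        },
    }


if __name__ == "__main__":
    # 3 / 5 / 1 tasks per connector; aliases and hosts are placeholders.
    for kind, tasks in [("source", 3), ("checkpoint", 5), ("heartbeat", 1)]:
        body = connector_request(kind, tasks, "primary", "backup",
                                 "primary-kafka:9092", "backup-kafka:9092")
        print(json.dumps(body, indent=2))
```

Each printed body could then be sent to a Connect worker's REST API as `POST /connectors` (for example with curl against port 8083, Connect's usual default), and later adjusted per connector with `PUT /connectors/<name>/config`.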