I am trying to tune MirrorMaker configuration based on this doc:
<https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring+(MirrorMaker)#Kafkamirroring%28MirrorMaker%29-Consumerandsourceclustersocketbuffersizes>
and would like to know your recommendations.

Our configuration: We are doing inter-datacenter replication with 5 brokers
in each of the source and destination DCs and 2 MirrorMaker instances doing
the replication. We have about 4 topics with 4 partitions each.
I have been using ConsumerOffsetChecker to analyze lag as I tune.
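To track the lag over time while tuning, I parse the ConsumerOffsetChecker output with a small script. This is just a sketch in Python; it assumes the column layout Group, Topic, Pid, Offset, logSize, Lag, Owner, and the sample rows are from the output below:

```python
# Sketch: pull per-partition lag out of ConsumerOffsetChecker output.
# Assumed column layout: Group, Topic, Pid, Offset, logSize, Lag, Owner.

def parse_lag(lines):
    """Return {(topic, partition): lag} from ConsumerOffsetChecker rows."""
    lag = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or fields[0] == "Group":
            continue  # skip the header and any blank or wrapped lines
        _group, topic, pid, _offset, _log_size, lag_val, _owner = fields[:7]
        lag[(topic, int(pid))] = int(lag_val)
    return lag

sample = [
    "Group Topic Pid Offset logSize Lag Owner",
    "mirrormakerProd FunnelProto 0 554704539 554717088 12549 owner-0",
    "mirrormakerProd agent 2 35375 35375 0 owner-0",
]
print(parse_lag(sample))  # {('FunnelProto', 0): 12549, ('agent', 2): 0}
```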


   1. num.streams: We set num.streams=2 so that the 4 partitions are
   shared between the 2 MirrorMaker instances. Increasing num.streams
   beyond this did not improve performance. Is this expected?
   2. num.producers: We initially set num.producers=4 (assuming one
   producer thread per topic), then bumped it to num.producers=16, but saw
   no improvement in performance. Is this expected? How do we determine
   the optimum value for num.producers?
   3. Socket buffer sizes: We initially had the default values, then I
   changed socket.send.buffer.bytes on the source brokers;
   socket.receive.buffer.bytes and fetch.message.max.bytes in the
   MirrorMaker consumer properties; and socket.receive.buffer.bytes and
   socket.request.max.bytes on the destination brokers, all to
   1024*1024*1024 (1073741824). This did improve performance, but I could
   not get the lag below 100.
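For concreteness, here is a sketch of how the settings in item 3 map onto the property files (property names as in the 0.8 configuration docs; the 1073741824 value is the one we tried):

```properties
# source broker (server.properties)
socket.send.buffer.bytes=1073741824

# MirrorMaker consumer.properties
socket.receive.buffer.bytes=1073741824
fetch.message.max.bytes=1073741824

# destination broker (server.properties)
socket.receive.buffer.bytes=1073741824
socket.request.max.bytes=1073741824
```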

   Here is what our lag looks like after the above changes:

Group           Topic            Pid  Offset      logSize     Lag    Owner
mirrormakerProd FunnelProto      0    554704539   554717088   12549  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd FunnelProto      1    547370573   547383136   12563  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd FunnelProto      2    553124930   553125742   812    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd FunnelProto      3    552990834   552991650   816    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
mirrormakerProd agent            0    35438       35440       2      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd agent            1    35447       35448       1      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd agent            2    35375       35375       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd agent            3    35336       35336       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
mirrormakerProd internal_metrics 0    1930852823  1930917418  64595  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd internal_metrics 1    1937237324  1937301841  64517  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd internal_metrics 2    1945894901  1945904067  9166   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd internal_metrics 3    1946906932  1946915928  8996   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
mirrormakerProd jmx              0    485270038   485280882   10844  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd jmx              1    486363914   486374759   10845  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd jmx              2    491783842   491784826   984    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd jmx              3    485675629   485676643   1014   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1

In the MirrorMaker logs, I see that topic metadata is fetched every 10
minutes and the producer connections are then re-established for producing.
Is this normal? If it is continuously producing, why does it need to
reconnect to the destination brokers?
What else can we tune to bring the lag below 100? This is just a small set
of test data; the real production traffic will be much larger. How can we
compute the optimum configuration as traffic increases?
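One sizing rule I have been considering (my own assumption, not from the wiki page): the socket buffers only need to cover the TCP bandwidth-delay product of the inter-DC link, beyond which larger buffers should not help. A quick sketch with made-up link figures:

```python
# Sketch: size socket buffers from the bandwidth-delay product (BDP)
# of the inter-datacenter link.  Link speed and RTT here are examples,
# not measurements from our setup.

def socket_buffer_bytes(link_gbps, rtt_ms):
    """Minimum socket buffer (bytes) needed to keep a link of the given
    bandwidth full at the given round-trip time."""
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1000.0)
    return int(bits_in_flight / 8)

# Example: a 1 Gb/s link with 40 ms RTT needs ~5 MB of buffer,
# far below the 1 GB we configured above.
print(socket_buffer_bytes(1, 40))  # 5000000
```

If that estimate is right, our 1 GB buffers are well past the point of diminishing returns, and the remaining lag comes from somewhere else.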

Thanks for the help,
Raja.
