Is this just a metrics issue or is there an actual replication lag?
Valli <[email protected]> wrote on Fri, Aug 11, 2023 at 22:51:
>
> Hello HBase Community,
>
> We recently upgraded our HBase cluster from version 1.2.6 to 1.4.14 and
> have encountered an issue with replication lag in our Disaster Recovery
> (DR) cluster. We have two clusters in our setup: an active write cluster
> and a DR cluster that receives replication from the active cluster. The
> replication lag in the DR cluster has been building up, even though there
> are no direct writes to it.
>
> Here's a brief overview of the problem:
> - We have an active write cluster with no replication lag.
> - The DR cluster only receives replication from the active cluster and
>   doesn't have direct writes.
> - Replication lag builds up in the DR cluster over time, even though there
>   is no active write.
> - When a 'put' call is made in the DR cluster, the replication lag reduces
>   momentarily, but then starts building up again.
>
> We experienced a similar issue on version 1.4.9 in another cluster and
> applied the patch below for it.
>
> https://issues.apache.org/jira/browse/HBASE-22784
>
> Version 1.4.14 already contains this patch, yet we still see the issue.
>
> Please let us know if there are any specific configurations or adjustments
> we should make to address this problem. It's important for us to maintain
> a reliable DR setup, and any guidance or insights you can provide would be
> greatly appreciated.
>
> If anyone has experienced a similar issue after upgrading HBase or has any
> recommendations on how to troubleshoot and resolve replication lag in a DR
> cluster, please share your thoughts.
>
> Thank you in advance for your time and assistance. Your expertise and
> insights are invaluable to us as we work to resolve this issue and maintain
> the stability of our HBase setup.
>
> Best regards,
> Manimekalai K
> --
> *Regards,*
> *Manimekalai K*
