Re: My Hbase replication is not working

Duo Zhang Sun, 12 Dec 2021 06:36:29 -0800

We have fixed several replication-related issues that could cause data loss,
for example this one:

https://issues.apache.org/jira/browse/HBASE-26482

With serial replication, if some WAL files are missed, replication usually
gets stuck...
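
To illustrate why a missed WAL file can block serial replication, here is a toy Python model of the ordering rule. It is only a sketch and not HBase's actual implementation: `RegionState`, `barriers`, `last_pushed`, and `can_push` are names made up for this example, and the rule is simplified to "an edit may be shipped only once everything before the start of its barrier range has been replicated".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionState:
    """Hypothetical, simplified view of one region's replication progress."""
    barriers: List[int]  # sequence IDs at which the region was (re)opened
    last_pushed: int     # highest sequence ID already replicated to the peer


def can_push(state: RegionState, seq_id: int) -> bool:
    """Return True if an edit with seq_id may be shipped under serial ordering."""
    # Barriers at or below seq_id; the largest one starts the edit's range.
    earlier = [b for b in state.barriers if b <= seq_id]
    if not earlier:
        return True  # no barrier before this edit, so nothing to wait for
    range_start = max(earlier)
    # Every edit before the start of this range must already be replicated.
    return state.last_pushed >= range_start - 1


# The region was opened at sequence ID 1 and reopened at 100.
# Edits 61..99 sit in a WAL file that the replication source never picked up.
state = RegionState(barriers=[1, 100], last_pushed=60)

print(can_push(state, 61))   # True: allowed, but its WAL is missing, so nobody ships it
print(can_push(state, 120))  # False: blocked until edits up to 99 are replicated
```

In this model `last_pushed` never moves past 60, so every edit in the newer range waits forever, which matches the "stuck" behaviour described above.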

Mallikarjun <[email protected]> wrote on Sun, Dec 12, 2021 at 18:19:

> SyncTable is meant to be run manually, and only for a specific time period,
> when you suspect inconsistencies between the two clusters.
>
> As soon as you disable serial replication, it should resume replicating from
> the point where it got stuck. You can build dashboards from the JMX metrics
> exposed by the HMaster to track this, and set up alerts as well.
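
For reference, a HashTable/SyncTable run for one time window can be sketched like this in Python. The helpers only build the command lines; the table name, hash directory, and ZooKeeper quorum are placeholders, and the function names are made up for the example. The flags `--starttime`, `--endtime`, `--dryrun`, and `--sourcezkcluster` are the ones described in the HashTable/SyncTable section of the HBase book.

```python
from datetime import datetime, timezone
from typing import List


def to_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def hashtable_cmd(table: str, out_dir: str,
                  start: datetime, end: datetime) -> List[str]:
    """Command that hashes the source table for the given time window."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.HashTable",
        f"--starttime={to_millis(start)}",
        f"--endtime={to_millis(end)}",
        table, out_dir,
    ]


def synctable_cmd(source_hash_dir: str, source_zk: str,
                  source_table: str, target_table: str,
                  dry_run: bool = True) -> List[str]:
    """Command that compares, and optionally repairs, the target table."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.SyncTable",
        f"--dryrun={'true' if dry_run else 'false'}",
        f"--sourcezkcluster={source_zk}",
        source_hash_dir, source_table, target_table,
    ]


# Placeholder values, not taken from the cluster discussed in this thread.
start = datetime(2021, 12, 1, tzinfo=timezone.utc)
end = datetime(2021, 12, 12, tzinfo=timezone.utc)

print(" ".join(hashtable_cmd("my_table", "/hashes/my_table", start, end)))
print(" ".join(synctable_cmd("hdfs://primary-nn:8020/hashes/my_table",
                             "zk1,zk2,zk3:2181:/hbase",
                             "my_table", "my_table")))
```

Starting with `dry_run=True` only reports the differences; switching it off lets SyncTable write the missing cells to the target table.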
>
>
>
> On Sun, Dec 12, 2021, 3:33 PM Hamado Dene <[email protected]>
> wrote:
>
> > OK, perfect. How often should this sync run? I guess in this case it has
> > to be automated somehow, correct?
> > Since I will have to disable serial mode, do I first have to align the
> > tables manually, or will the region servers resume replicating from where
> > they were blocked as soon as I disable serial mode?
> >
> >
> >     On Sunday, 12 December 2021, 10:55:05 CET, Mallikarjun <
> > [email protected]> wrote:
> >
> >  https://hbase.apache.org/book.html#hashtable.synctable
> >
> > It copies the differences between tables for a specific time period.
> >
> > On Sun, Dec 12, 2021, 3:12 PM Hamado Dene <[email protected]>
> > wrote:
> >
> > > Interesting, thank you very much for the info. I'll try to disable serial
> > > replication. As for the "sync table utility", what do you mean? I am new
> > > to HBase and not yet familiar with all of its tools.
> > >
> > >
> > >
> > >    On Sunday, 12 December 2021, 10:15:01 CET, Mallikarjun <
> > > [email protected]> wrote:
> > >
> > >  We have faced issues with serial replication when one of the region
> > > servers of either cluster suffers a hardware failure, typically memory, as
> > > far as I understand. I could not spend enough time to reproduce it
> > > reliably and identify the root cause, so I don't know why it happens.
> > >
> > > The issue could be that your serial replication has deadlocked among the
> > > region servers: none of them can make progress because an older sequence
> > > ID has not been replicated, and that older sequence ID is not at the front
> > > of the queue, so it cannot be replicated either.
> > >
> > > Quick fix: disable serial replication temporarily so that out-of-order
> > > replication is allowed and replication is unblocked. This can result in
> > > some inconsistencies between the clusters, which can be fixed using the
> > > sync table utility since your setup is active-passive.
> > >
> > > Another fix: delete the barriers for each region in hbase:meta. Same
> > > consequence as above.
> > >
> > >
> > > On Sun, Dec 12, 2021, 2:24 PM Hamado Dene <[email protected]>
> > > wrote:
> > >
> > > > I'm using HBase 2.2.6 with Hadoop 2.8.5. Yes, serial replication is
> > > > enabled. This is my peer configuration:
> > > >
> > > >
> > > >
> > > > | Peer Id | Cluster Key | Endpoint | State | IsSerial | Bandwidth | ReplicateAll | Namespaces | Exclude Namespaces | Table Cfs | Exclude Table Cfs |
> > > > | replicav1 | acv-db10-hn,acv-db11-hn,acv-db12-hn:2181:/hbase |  | ENABLED | true | UNLIMITED | true |  |  |  |  |
> > > >
> > > >    On Sunday, 12 December 2021, 09:39:44 CET, Mallikarjun <
> > > > [email protected]> wrote:
> > > >
> > > >  Which version of HBase are you using? Is serial replication
> > > > enabled?
> > > >
> > > > ---
> > > > Mallikarjun
> > > >
> > > >
> > > > On Sun, Dec 12, 2021 at 1:54 PM Hamado Dene <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi HBase community,
> > > > >
> > > > > In our production installation we have two HBase clusters in two
> > > > > different datacenters. The primary datacenter replicates its data to
> > > > > the secondary datacenter. When we create tables, we create them first
> > > > > on the secondary datacenter and then on the primary, and then we set
> > > > > the replication scope to 1 on the primary. The peer pointing to the
> > > > > ZooKeeper quorum of the secondary cluster is configured on the primary.
> > > > > Initially, replication worked fine and data was replicated. We recently
> > > > > noticed that some tables are empty in the secondary datacenter, so
> > > > > most likely the data is no longer being replicated. I'm seeing lines
> > > > > like these in the logs:
> > > > >
> > > > >
> > > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db11%2C16020%2C1637849866921]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/db11-hd%2C16020%2C1637849866921.1637849874263 at position: -1
> > > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db09%2C16020%2C1637589840862]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/db09-hd%2C16020%2C1637589840862.1637589846870 at position: -1
> > > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db13%2C16020%2C1635424806449]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/db13%2C16020%2C1635424806449.1635424812985 at position: -1
> > > > >
> > > > >
> > > > >
> > > > > 2021-12-12 09:13:47,148 INFO  [rzv-db09-hd:16020Replication Statistics #0] regionserver.Replication: Normal source for cluster replicav1: Total replicated edits: 0, current progress:walGroup [db09%2C16020%2C1638791923537]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/WALs/rzv-db09-hd.rozzano.diennea.lan,16020,1638791923537/rzv-db09-hd.rozzano.diennea.lan%2C16020%2C1638791923537.1638791930213 at position: -1
> > > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db09%2C16020%2C1634401671527]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/rzv-db09-hd.rozzano.diennea.lan%2C16020%2C1634401671527.1634401679218 at position: -1
> > > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db10%2C16020%2C1637585899997]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/rzv-db10-hd.rozzano.diennea.lan%2C16020%2C1637585899997.1637585906625 at position: -1
> > > > >
> > > > >
> > > > >
> > > > > 2021-12-12 08:24:58,561 WARN [regionserver/rzv-db12-hd:16020.logRoller] regionserver.ReplicationSource: WAL group db12%2C16020%2C1638790692057 queue size: 187 exceeds value of replication.source.log.queue.warn: 2
> > > > >
> > > > > Do you have any info on what could be the problem?
> > > > >
> > > > > Thanks
> > > >
> > >
> >
>
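
A quick way to spot sources that look stalled is to scan the region server log for statistics lines that report zero replicated edits at position -1. Below is a rough Python sketch, assuming each statistics entry is on a single line in the log file. `find_stalled` and the regular expression are made up for this example, and the check is a heuristic rather than proof of a problem.

```python
import re
from typing import Iterable, List, Tuple
from urllib.parse import unquote

# Matches the tail of a replication statistics line as quoted in this thread.
PATTERN = re.compile(
    r"Total replicated edits: (?P<edits>\d+), current progress:\s*"
    r"walGroup \[(?P<group>[^\]]+)\]: currently replicating from: "
    r"(?P<wal>\S+) at position: (?P<pos>-?\d+)"
)


def find_stalled(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (walGroup, WAL path) pairs with 0 edits shipped at position -1."""
    stalled = []
    for line in lines:
        m = PATTERN.search(line)
        if m and int(m["edits"]) == 0 and int(m["pos"]) == -1:
            # WAL names are URL-encoded in the log (%2C is a comma).
            stalled.append((unquote(m["group"]), unquote(m["wal"])))
    return stalled


SAMPLE = (
    "Recovered source for cluster/machine(s) replicav1: Total replicated "
    "edits: 0, current progress:walGroup [db11%2C16020%2C1637849866921]: "
    "currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/"
    "db11-hd%2C16020%2C1637849866921.1637849874263 at position: -1"
)

for group, wal in find_stalled([SAMPLE]):
    print(group, wal)
```

Running it periodically over the log and alerting when the same WAL group keeps showing up would be one way to notice the problem before tables on the secondary fall behind.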
