[ https://issues.apache.org/jira/browse/HBASE-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur resolved HBASE-3604.
-------------------------------------

    Resolution: Duplicate

duplicate of HBASE-2231

> Two region servers think that they own the same region: data loss
> -----------------------------------------------------------------
>
>                 Key: HBASE-3604
>                 URL: https://issues.apache.org/jira/browse/HBASE-3604
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.0
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> I observed this on a 100 node cluster that is constantly doing about 500K
> ops/second.
> The region server on machine A was servicing IOs for a particular region.
> Then the machine went into a bad state where it was ping-able but not
> ssh-able. The master detected that there was a problem with machine A and
> reassigned the region to machine B. The regionserver on machine B opened the
> region and opened all the required HFiles for this region. After two hours,
> the NameNode received a delete request for one of the HFiles from machine A
> and happily renamed the file to HDFS-Trash. After another 3 hours or so, the
> regionserver on machine B tried to read contents from that HFile but failed
> because the file had been renamed earlier. The region server on B is now
> stuck, and data loss is possible.
> The problem stems from the fact that although the master and ZK reassigned
> the region, the old regionserver was possibly not dead.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira