[ https://issues.apache.org/jira/browse/HBASE-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang resolved HBASE-8502.
--------------------------------
    Resolution: Unresolved

Closed this one for now. I think the region got stuck in transition because hbck repair could have done something wrong. We can re-open it if we see the issue again without running hbck repair.

> Eternally stuck Region after split
> ----------------------------------
>
>                 Key: HBASE-8502
>                 URL: https://issues.apache.org/jira/browse/HBASE-8502
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: Dimitri Goldin
>            Priority: Critical
>         Attachments: hbase_lost_parent.txt, hbase_run.log, stuck_region_exception.txt
>
>
> Exact HBase version: 0.92.1-cdh4.1.2
> A couple of days ago I encountered a RIT problem with a single region.
> After an hbck run it started trying to assign a region, which then bounced
> between OFFLINE/PENDING_OPEN/OPENING for the following two days.
> This was due to a split gone wrong in some way, which led to several
> reference files being left in the region directory despite the two relevant
> HFiles being copied successfully to the daughter.
> I will try to give as many details as possible, but unfortunately I was
> unable to find any information about the split itself.
> Short thread about this issue on the users-ML:
> http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3c5182758b.1060...@neofonie.de%3E
> ===
> Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
> Daughter region in question: 79c619508659018ff3ef0887611eb8f7
> Rough sequence from the logs seems to be the following:
> ===
> * Received request to open region:
>   documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.
> * Setting up tabledescriptor config now ...
> * Opening of region {NAME =>
>   'documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.',
>   STARTKEY => '7128586022887322720',
>   ENDKEY => '7130716361635801616',
>   ENCODED => 79c619508659018ff3ef0887611eb8f7,} failed, marking as
>   FAILED_OPEN in ZK
> * File does not exist:
>   /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9
>
> [...]
> ===
> What happened was that somehow (and that's the question here) the daughter's
> region folder contained some left-over reference files that were causing the
> RegionServer to look up the parent region, which had already been deleted.
> Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
> ==
> 0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
> 47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
> 4f01ecd052ce464d81e79a62ea227d6b
> 4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
> eb7dbb09701d4353be24ca82481c4a7e
> ==
> I attached the full FileNotFound Exception.
> Please let me know if I can provide more information or help otherwise.
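
The directory listing in the report can be checked mechanically. The following is a minimal sketch, not an HBase tool: it assumes that a store file named `<hfile>.<encoded parent region>` is a reference file and that a name without a dot is a plain HFile, which matches the listing above. The names `parse_store_file`, `dangling_references`, `shadowed_references`, and `LISTING` are made up for illustration, and the set of existing regions would have to come from a real listing of the table directory.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class StoreFile(NamedTuple):
    name: str               # file name as it appears in the store directory
    hfile: str              # HFile id (the part before the first dot)
    parent: Optional[str]   # encoded parent region, or None for a plain HFile


def parse_store_file(name: str) -> StoreFile:
    """Split '<hfile>.<parent>' into its parts (assumed naming scheme)."""
    hfile, sep, parent = name.partition(".")
    return StoreFile(name, hfile, parent if sep else None)


def dangling_references(names: Iterable[str],
                        existing_regions: Set[str]) -> List[StoreFile]:
    """Reference files whose parent region directory is no longer present."""
    files = [parse_store_file(n) for n in names]
    return [f for f in files
            if f.parent is not None and f.parent not in existing_regions]


def shadowed_references(names: Iterable[str]) -> List[StoreFile]:
    """Reference files whose HFile id also exists as a plain file next to them."""
    files = [parse_store_file(n) for n in names]
    plain = {f.hfile for f in files if f.parent is None}
    return [f for f in files if f.parent is not None and f.hfile in plain]


# Contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d from the report.
LISTING = [
    "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b",
    "47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b",
    "4f01ecd052ce464d81e79a62ea227d6b",
    "4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b",
    "eb7dbb09701d4353be24ca82481c4a7e",
]

if __name__ == "__main__":
    # Only the daughter region still exists; the parent was already deleted.
    existing = {"79c619508659018ff3ef0887611eb8f7"}
    for f in dangling_references(LISTING, existing):
        print("dangling reference:", f.name)
```

With the parent region 5b9c16898a371de58f31f0bdf86b1f8b absent from the set of existing regions, all three dotted entries are reported, including the one for 0707b1ec4c6b41cf9174e0d2a1785fe9 that appears in the "File does not exist" message. Whether such files are safe to move aside is a separate question that this sketch does not answer.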