One of our production clusters had several region server failures. As a
result one of the tables is in an inconsistent state as reported by hbck.
We have tried the hbck repair commands, but none of them seems to work.
One region is stuck permanently in the pending open state.
The error reported in the region server (RS) log is about a StoreFile
that cannot be found. What is really strange is that the store file
reported as missing does not even belong to the region being opened.
We tried manually creating a directory in HDFS and copying the missing
file into it, but that causes hbck to report a region that is on HDFS but
not in META.
There are currently 4 inconsistencies:
ERROR: Region { meta =>
<tableName>,I.1521_D.1361689200_9,1369099149747.2123fc70fac804cd8d48ea4494cc8184.,
hdfs =>
hdfs://host:8020/hbase/tableName/2123fc70fac804cd8d48ea4494cc8184,
deployed => } not deployed on any region server.
ERROR: Region { meta => null, hdfs =>
hdfs://hostname:8020/hbase/tableName/450ed30b410e9d6d54ac53099039cb28,
deployed => } on HDFS, but not listed in META or deployed on any region
server
13/05/21 10:51:11 DEBUG util.HBaseFsck: There are 1769 region info entries
ERROR: There is a hole in the region chain between I.1521_D.1361689200_9
and I.1521_D.1362150000_8. You need to create a new .regioninfo and
region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between I.1_D.1368392400_9
and I.2020_D.1338948000_2. You need to create a new .regioninfo and
region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table <tableName>
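To illustrate what the "hole in the region chain" errors mean, here is a minimal Python sketch of the idea: each region covers a key range, and a hole is a stretch of keys between one region's end key and the next region's start key that no region covers. This is not HBaseFsck's actual code. The function name find_holes and the two-region sample are hypothetical; only the two boundary keys are taken from the first hole error above, and an empty key is assumed to mean "start of table" or "end of table".

```python
def find_holes(regions):
    """Return (end_key, next_start_key) pairs for uncovered key ranges.

    regions: iterable of (start_key, end_key) byte strings.
    Assumption: b"" as an end key means "up to the end of the table".
    """
    ordered = sorted(regions, key=lambda r: r[0])  # order by start key
    holes = []
    if not ordered:
        return holes
    covered_to = ordered[0][1]
    for start, end in ordered[1:]:
        if covered_to == b"":
            # An earlier region already runs to the end of the table.
            break
        if start > covered_to:
            # Nothing covers the keys in [covered_to, start).
            holes.append((covered_to, start))
        if end == b"" or end > covered_to:
            covered_to = end
    return holes


# Hypothetical two-region table using the boundary keys from the first
# hole error; the region that should sit between them is missing.
sample = [
    (b"", b"I.1521_D.1361689200_9"),
    (b"I.1521_D.1362150000_8", b""),
]
print(find_holes(sample))
```

Note that the first hole starts at I.1521_D.1361689200_9, which is also the start key of the region that the first error reports as not deployed.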
We are running HBase 0.94 (Apache) on Hadoop 1.0.3.
At this stage we are stuck and are looking for help. The cluster is in
an unbalanced state and region servers keep dying.
Thanks,
Jay