Hello, after several network outages in AWS (never ever run HBase there!), my HBase cluster was seriously damaged. After taking steps like restarting the namenodes, running hdfs fsck, and restarting all regionservers and the HBase master, I still have 8 offline regions that I am unable to bring online.
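For reference, the recovery attempts so far boil down to commands like the ones below (a dry-run sketch that only prints each command, assuming the default /hbase root dir; remove the echo to actually run them on the cluster):

```shell
# Dry-run checklist of the diagnosis/repair commands tried so far.
# echo just prints each command so the list can be reviewed before
# running anything against a live cluster.
for cmd in \
  "hdfs fsck /hbase -files -blocks" \
  "hbase hbck -details" \
  "hbase hbck -fixAssignments -fixMeta"; do
  echo "$cmd"
done
```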
When running hbck with any combination of repair parameters, it always gets stuck on messages like:

2016-04-20 03:26:16,812 INFO [hbasefsck-pool1-t45] util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 8fe9d66a1f4c4739dd1929e3c38bf951, NAME => 'MEDIA,\x01rvkUDKIuye0\x00YT,1460997677820.8fe9d66a1f4c4739dd1929e3c38bf951.', STARTKEY => '\x01rvkUDKIuye0\x00YT', ENDKEY => '\x01stefanonoferini/club-edition-17'}

When looking into the regionserver logs, I see messages like:

2016-04-19 23:27:54,969 ERROR [RS_OPEN_REGION-prod-aws-hbase-data-0010:16020-80] handler.OpenRegionHandler: Failed open of region=MEDIA,\x05JEklcNpOKos\x00YT,1461001150488.20d48fd40c94c7c81049cbc506de4ad4., starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

2016-04-19 23:27:54,957 INFO [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=3285448, freeSize=3198122040, maxSize=3201407488, heapSize=3285448, minSize=3041337088, minFactor=0.95, multiSize=1520668544, multiFactor=0.5, singleSize=760334272, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false

2016-04-19 23:27:54,957 INFO [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.700000

2016-04-19 23:27:54,962 INFO [StoreFileOpenerThread-F-1] regionserver.StoreFile$Reader: Loaded Delete Family Bloom (CompoundBloomFilter) metadata for 5eacfeb8a2eb419cb6fe348df0540145

2016-04-19 23:27:54,969 ERROR [RS_OPEN_REGION-prod-aws-hbase-data-0010:16020-80] regionserver.HRegion: Could not initialize all stores for the region=MEDIA,\x05JEklcNpOKos\x00YT,1461001150488.20d48fd40c94c7c81049cbc506de4ad4.
2016-04-19 23:27:54,969 WARN [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1054)
    at org.apache.hadoop.ipc.Client.call(Client.java:1449)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
    at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createStoreDir(HRegionFileSystem.java:171)
    at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:220)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:4973)
    at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:925)
    at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:922)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I tried all kinds of recovery magic, like restarting all components and cleaning ZK. I found this thread: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/31308 which suggests creating empty HFiles, but I'm a bit afraid to do that.

I'm using HBase 1.1.3 with Hadoop 2.7.1 (both binaries downloaded from their websites) on Ubuntu 14.04.

Thank you for any help,
Michal
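In case it helps anyone reproduce this: the missing store file can be pulled straight out of the FileNotFoundException line, and its parent store directory inspected on HDFS. A sketch (the hdfs dfs calls are commented out since they obviously need the live cluster):

```shell
# Extract the missing store-file path from the regionserver error above.
log='java.io.FileNotFoundException: File does not exist: /hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145'
path=${log##*File does not exist: }   # strip everything up to the path
echo "$path"
# On the cluster, confirm the file is really gone and see what is left
# of the store directory it belonged to:
# hdfs dfs -ls "$path"
# hdfs dfs -ls "${path%/*}"
```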