Hello,

After several network outages in AWS (never ever run HBase there!), my
HBase cluster was seriously damaged. After restarting the namenodes,
running hdfs fsck, and restarting all regionservers and the HBase master,
I still have 8 offline regions that I am unable to bring online.

When I run hbck with any combination of repair parameters, it always gets
stuck on messages like:

2016-04-20 03:26:16,812 INFO  [hbasefsck-pool1-t45] util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 8fe9d66a1f4c4739dd1929e3c38bf951, NAME => 'MEDIA,\x01rvkUDKIuye0\x00YT,1460997677820.8fe9d66a1f4c4739dd1929e3c38bf951.', STARTKEY => '\x01rvkUDKIuye0\x00YT', ENDKEY => '\x01stefanonoferini/club-edition-17'}

When looking into the regionserver logs, I see messages like:

2016-04-19 23:27:54,969 ERROR [RS_OPEN_REGION-prod-aws-hbase-data-0010:16020-80] handler.OpenRegionHandler: Failed open of region=MEDIA,\x05JEklcNpOKos\x00YT,1461001150488.20d48fd40c94c7c81049cbc506de4ad4., starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
2016-04-19 23:27:54,957 INFO  [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=3285448, freeSize=3198122040, maxSize=3201407488, heapSize=3285448, minSize=3041337088, minFactor=0.95, multiSize=1520668544, multiFactor=0.5, singleSize=760334272, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2016-04-19 23:27:54,957 INFO  [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.700000
2016-04-19 23:27:54,962 INFO  [StoreFileOpenerThread-F-1] regionserver.StoreFile$Reader: Loaded Delete Family Bloom (CompoundBloomFilter) metadata for 5eacfeb8a2eb419cb6fe348df0540145
2016-04-19 23:27:54,969 ERROR [RS_OPEN_REGION-prod-aws-hbase-data-0010:16020-80] regionserver.HRegion: Could not initialize all stores for the region=MEDIA,\x05JEklcNpOKos\x00YT,1461001150488.20d48fd40c94c7c81049cbc506de4ad4.
2016-04-19 23:27:54,969 WARN  [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1054)
        at org.apache.hadoop.ipc.Client.call(Client.java:1449)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
        at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
        at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createStoreDir(HRegionFileSystem.java:171)
        at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:220)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:4973)
        at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:925)
        at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:922)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
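
To see which store files the regionservers report as missing, the logs can be scanned for the "File does not exist" message. The following is only a minimal Python sketch, assuming each log entry sits on a single line in the real log file (unlike the mail-wrapped excerpts here) and that the last three path components are region, column family and HFile name, as in the error above. The names find_missing_files and split_store_path are purely illustrative and not part of HBase.

```python
import re

# Matches the HDFS path that follows "File does not exist:" in a log line.
MISSING_FILE_RE = re.compile(r"File does not exist:\s*(/\S+)")


def find_missing_files(log_text):
    """Return the unique missing HDFS paths, in order of first appearance."""
    seen = []
    for match in MISSING_FILE_RE.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


def split_store_path(path):
    """Split a store file path into (region, family, hfile).

    Assumes the path ends in .../<region>/<family>/<hfile>.
    """
    _prefix, region, family, hfile = path.rsplit("/", 3)
    return region, family, hfile


# Sample line taken from the regionserver log quoted in this message.
sample = (
    "java.io.IOException: java.io.IOException: java.io.FileNotFoundException: "
    "File does not exist: /hbase/data/default/MEDIA/"
    "ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145\n"
)

for missing in find_missing_files(sample):
    region, family, hfile = split_store_path(missing)
    print(region, family, hfile)
```

Running this over the full regionserver logs would give one line per distinct missing file, which makes it easier to check whether all 8 offline regions fail for the same reason.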

I have tried all kinds of recovery steps, such as restarting all components
and cleaning ZK, without success.

I found this thread:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/31308 which
suggests creating empty HFiles, but I'm a bit afraid to do this.

I'm using HBase 1.1.3 with Hadoop 2.7.1 (both downloaded as binaries from
their websites) on Ubuntu 14.04.

Thank you for any help.

Michal
