Sounds great! Thanks for the info. On Sun, Mar 15, 2015 at 2:00 AM, Andrew Purtell <apurt...@apache.org> wrote:
> We're hitting this type of HDFS issue in production too. Your best option
> is to kill the regionserver process forcefully, start a replacement, and
> let the region(s) affected recover. All edits should be persisted to the
> WAL regardless of what Ted said about flushing.
>
> We are working on the problem, please see HBASE-13238
>
> On Saturday, March 14, 2015, Kristoffer Sjögren <sto...@gmail.com> wrote:
>
> > I think I found the thread that is stuck. Is restarting the server
> > harmless in this state?
> >
> > "RS_CLOSE_REGION-hdfs-ix03.se-ix.delta.prod,60020,1424687995350-1" prio=10
> > tid=0x00007f75a0008000 nid=0x23ee in Object.wait() [0x00007f757d30b000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >      at java.lang.Object.wait(Native Method)
> >      at java.lang.Object.wait(Object.java:503)
> >      at org.apache.hadoop.hdfs.DFSOutputStream.waitAndQueueCurrentPacket(DFSOutputStream.java:1411)
> >      - locked <0x00000007544573e8> (a java.util.LinkedList)
> >      at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1479)
> >      - locked <0x0000000756780218> (a org.apache.hadoop.hdfs.DFSOutputStream)
> >      at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:173)
> >      at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:116)
> >      at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:102)
> >      - locked <0x0000000756780218> (a org.apache.hadoop.hdfs.DFSOutputStream)
> >      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
> >      at java.io.DataOutputStream.write(DataOutputStream.java:107)
> >      - locked <0x00000007543ef268> (a org.apache.hadoop.hdfs.client.HdfsDataOutputStream)
> >      at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
> >      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1061)
> >      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1047)
> >      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:952)
> >      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:935)
> >      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:844)
> >      at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:403)
> >      at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1272)
> >      at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:835)
> >      - locked <0x000000075d8b2110> (a java.lang.Object)
> >      at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:746)
> >      at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2348)
> >      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1580)
> >      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1479)
> >      at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:992)
> >      at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:956)
> >      - locked <0x000000075d97b628> (a java.lang.Object)
> >      at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:119)
> >      at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> >      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >      at java.lang.Thread.run(Thread.java:745)
> >
> > On Sat, Mar 14, 2015 at 9:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. flush the region manually using shell?
> > > I doubt that would work - you can give it a try.
> > > Please take a jstack of the region server in case you need to restart
> > > the server.
> > >
> > > BTW HBASE-10499 didn't go into 0.94 (maybe it should have). Please
> > > consider upgrading.
> > >
> > > Cheers
> > >
> > > On Sat, Mar 14, 2015 at 1:30 PM, Kristoffer Sjögren <sto...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ted
> > > >
> > > > Sorry I forgot to mention, hbase-0.94.6 cdh 4.4.
> > > >
> > > > Yeah, it was a pretty write-intensive scenario that I think triggered
> > > > it (importing a lot of datapoints into opentsdb).
> > > >
> > > > Do I flush the region manually using shell?
> > > >
> > > > Cheers,
> > > > -Kristoffer
> > > >
> > > > On Sat, Mar 14, 2015 at 9:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Which release of HBase are you using?
> > > > >
> > > > > I wonder if your cluster was hit with HBASE-10499.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Sat, Mar 14, 2015 at 1:13 PM, Kristoffer Sjögren <sto...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > It seems one of our region servers has been stuck closing a region
> > > > > > for almost 22 hours. Puts or gets eventually fail with an
> > > > > > exception [1].
> > > > > >
> > > > > > Is there any safe way to release the region, like restarting the
> > > > > > region server?
> > > > > >
> > > > > > Cheers,
> > > > > > -Kristoffer
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > > 2015-03-14 21:02:24,316 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> > > > > > Failed to unblock updates for region
> > > > > > tsdb,\x00\x00\x9ETU\xAC@\x00\x00\x01\x00\x00\xAD\x00\x00\x05\x00\x00\xA7,1426282871862.4512f92b3d81e9142542d3b458223b63.
> > > > > > 'IPC Server handler 9 on 60020' in 60000ms. The region is still busy.
> > > > > > 2015-03-14 21:02:24,316 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > > > > org.apache.hadoop.hbase.RegionTooBusyException: region is flushing
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2731)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:2002)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:2114)
> > > > > >   at sun.reflect.GeneratedMethodAccessor109.invoke(Unknown Source)
> > > > > >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > >   at java.lang.reflect.Method.invoke(Method.java:606)
> > > > > >   at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > > > >   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
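For anyone hitting this thread later: the recovery steps discussed above (Ted's jstack-first advice, Kristoffer's manual flush question, Andrew's forceful-kill fallback) can be sketched roughly as below. This is an illustrative sequence, not a tested procedure; the PID lookup, region name, and hbase install path are placeholders you'd substitute for your cluster.

```shell
# 1. Capture a thread dump from the stuck regionserver BEFORE touching it,
#    so the wedged flush/close stack is preserved for analysis (per Ted).
RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
jstack "$RS_PID" > /tmp/rs-jstack-$(date +%s).txt

# 2. Optionally attempt a manual flush of the affected region from the
#    HBase shell (Ted doubted this would help while the close is stuck,
#    since the flush itself is blocked on the HDFS write).
#    The region name below is the one from the log excerpt, abbreviated.
echo "flush 'tsdb,<start-key>,1426282871862.4512f92b3d81e9142542d3b458223b63.'" | hbase shell

# 3. Fallback (Andrew's advice): kill the regionserver forcefully and start
#    a replacement. Edits should be persisted in the WAL and get replayed
#    when the affected region(s) are reassigned and recovered.
kill -9 "$RS_PID"
./bin/hbase-daemon.sh start regionserver
```

The ordering matters mainly because the jstack is cheap and the evidence disappears once the process is killed; the flush attempt in step 2 can be skipped entirely if the thread dump already shows the flush path blocked in DFSOutputStream, as in this case.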