If early 2.0 or 2.1 versions, perhaps it is HBASE-21292?

S

On Thu, Jun 27, 2019 at 10:13 PM Stack <st...@duboce.net> wrote:
> On Wed, Jun 26, 2019 at 6:47 AM James Kebinger <jkebin...@hubspot.com> wrote:
>
>> Hello,
>> We're experimenting with meta table replication
>
> Replicating the meta table itself? What are you thinking?
>
>> but have found that the region servers hosting the replicas will get
>> extremely high priority RPC handler usage, sometimes all the way to 100%,
>> at which point clients start to experience errors - the priority RPC
>> handler usage is much higher than the region servers hosting
>> non-replicated meta ever experienced.
>
> The meta table will be hot on a busy cluster. Adding replication load will
> raise the temperature even more.
>
> Check out James Moore's suggestions. An HDFS issue could occupy a handler
> while it holds the id lock on a popular block.
>
> Can you figure out who is trying to read the removed file?
>
> Can't tell for sure from the snippet below, but are handlers frequently
> going out to HDFS to pull in blocks rather than fetching from the block
> cache? How is the server-side cache on the server hosting meta? Can you
> get more meta up into the block cache?
>
> If you disable replication, does it all settle down?
>
> S
>
>> Has anyone seen this behavior? Are we doing something wrong?
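Stack's point about a handler holding the id lock while out on a slow HDFS read can be sketched as follows. This is a minimal illustration of the per-block-id serialization pattern, not HBase's actual IdLock implementation; the class and method names here are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Minimal sketch of per-block-id serialization, in the spirit of HBase's
// IdLock (NOT the real implementation; names are illustrative).
// The first handler to ask for a block id becomes the loader (it goes to
// HDFS); every other handler for the same id parks until the loader is
// done. If the loader's HDFS read stalls, all waiters stall with it --
// which is what the IdLock.getLockEntry frames in the dump below suggest.
public class IdLockSketch {
    private final ConcurrentHashMap<Long, CountDownLatch> inFlight =
            new ConcurrentHashMap<>();

    /** Returns true if the caller should load the block itself. */
    public boolean tryBecomeLoader(long blockId) throws InterruptedException {
        CountDownLatch mine = new CountDownLatch(1);
        CountDownLatch existing = inFlight.putIfAbsent(blockId, mine);
        if (existing == null) {
            return true;        // we own the load for this block id
        }
        existing.await();       // park, like the handlers in the stack dump
        return false;           // someone else loaded it (e.g. into cache)
    }

    /** Called by the loader once the block is available. */
    public void releaseLoader(long blockId) {
        CountDownLatch latch = inFlight.remove(blockId);
        if (latch != null) {
            latch.countDown();  // wake every parked waiter
        }
    }
}
```

The upshot: one slow loader can park many priority handlers at once, so a popular meta block plus a flaky HDFS read shows up as a pile of WAITING handler threads even though per-thread CPU is near zero.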
>> Our settings changes are:
>>
>>   hbase.regionserver.meta.storefile.refresh.period: 30000
>>   hbase.meta.replica.count: 3
>>   hbase.master.hfilecleaner.ttl: 600000
>>
>> We're on CDH 5.9.0 (HBase 1.2).
>>
>> We see some occasional errors like this one below (at the time I'm
>> writing this, 30 minutes after the log entry, that file doesn't exist):
>>
>> java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR,
>> self=/REDACTED:11702, remote=REDACTED:50010, for file
>> /hbase/data/hbase/meta/1588230740/info/85c3e4b648eb437a8ef62ff9b39f525d,
>> for pool BP-812715994-REDACTED-1441218207856 block 1083723183_9985162
>>   at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
>>   at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
>>   at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:881)
>>   at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:759)
>>   at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
>>
>> Doing a stack dump of a region server process to see what the priority
>> handlers are up to reveals the following:
>>
>> 1 times (0% total cpu)
>> "RpcServer.Priority.general.handler=12,queue=0,port=60020" #138 daemon
>> prio=5 os_prio=0 tid=0x00007fc5471a3000 nid=0x3f33 waiting on condition
>> [0x00007fc4f5870000] cpu=0.0 start=2019-06-20 19:31:47
>>   java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for <0x00000003d94b6f18> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>>     at org.apache.hadoop.hbase.regionserver.HStore.replaceStoreFiles(HStore.java:1348)
>>     at org.apache.hadoop.hbase.regionserver.HStore.refreshStoreFilesInternal(HStore.java:652)
>> --
>>
>> 3 times (0% total cpu)
>> "RpcServer.Priority.general.handler=3,queue=1,port=60020" #129 daemon
>> prio=5 os_prio=0 tid=0x00007fc547190800 nid=0x3f2a waiting on condition
>> [0x00007fc4f6178000] cpu=0.0 start=2019-06-20 19:31:47
>>   java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for <0x0000000723bbee00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>     at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>>     at org.apache.hadoop.hdfs.DFSInputStream.getFirstToComplete(DFSInputStream.java:1359)
>>     at org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1327)
>>     at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1468)
>> --
>>
>> 5 times (0% total cpu)
>> "RpcServer.Priority.general.handler=2,queue=0,port=60020" #128 daemon
>> prio=5 os_prio=0 tid=0x00007fc54718f000 nid=0x3f29 in Object.wait()
>> [0x00007fc4f6279000] cpu=0.0 start=2019-06-20 19:31:47
>>   java.lang.Thread.State: WAITING (on object monitor)
>>     at java.lang.Object.wait(Native Method)
>>     at java.lang.Object.wait(Object.java:502)
>>     at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
>>     - locked <0x00000004c0878978> (a org.apache.hadoop.hbase.util.IdLock$Entry)
>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:407)
>>     at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:642)
>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:592)
>>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:294)
>> --
>>
>> 7 times (0% total cpu)
>> "RpcServer.Priority.general.handler=1,queue=1,port=60020" #127 daemon
>> prio=5 os_prio=0 tid=0x00007fc54718c800 nid=0x3f28 in Object.wait()
>> [0x00007fc4f637a000] cpu=0.0 start=2019-06-20 19:31:47
>>   java.lang.Thread.State: WAITING (on object monitor)
>>     at java.lang.Object.wait(Native Method)
>>     at java.lang.Object.wait(Object.java:502)
>>     at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
>>     - locked <0x0000000723bbef88> (a org.apache.hadoop.hbase.util.IdLock$Entry)
>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:407)
>>     at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:642)
>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:592)
>>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:294)
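For anyone reproducing the setup in the thread, the three settings James lists map onto hbase-site.xml like this. Values are copied from the email; this is a sketch of the configuration under discussion, not a recommendation.

```xml
<!-- Meta replica settings from the thread (CDH 5.9.0 / HBase 1.2). -->
<property>
  <name>hbase.meta.replica.count</name>
  <value>3</value>
</property>
<property>
  <!-- How often replica region servers refresh their view of store files, in ms. -->
  <name>hbase.regionserver.meta.storefile.refresh.period</name>
  <value>30000</value>
</property>
<property>
  <!-- How long archived HFiles are retained before deletion, in ms; presumably
       bumped so replicas are less likely to chase already-deleted files. -->
  <name>hbase.master.hfilecleaner.ttl</name>
  <value>600000</value>
</property>
```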