On Wed, Jun 26, 2019 at 6:47 AM James Kebinger <[email protected]>
wrote:

> Hello,
> We're experimenting with meta table replication


Replicating the meta table itself? What are you thinking?



> but have found that the
> region servers hosting the replicas will get extremely high priority RPC
> handler usage, sometimes all the way to 100% at which point clients start
> to experience errors - the priority RPC handler usage is much higher than
> the region servers hosting non-replicated meta ever experienced.
>

Meta Table will be hot on a busy cluster. Adding replication load will up
the temperature even more.

Check out James Moore's suggestions. An HDFS issue could occupy a handler
while it holds the id lock on a popular block.

Can you figure out who is trying to read the removed file?

Can't tell for sure from the snippet below, but are handlers frequently
going out to HDFS to pull in blocks rather than fetching them from the block
cache? How is the server-side cache doing on the server hosting meta? Can
you get more of meta up into the block cache?

If you disable replication, does everything settle down?

S


> Has anyone seen this behavior? Are we doing something wrong?
> Our settings changes are
>
> hbase.regionserver.meta.storefile.refresh.period: 30000
> hbase.meta.replica.count: 3
> hbase.master.hfilecleaner.ttl: 600000
>
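> (For reference, the same three settings expressed as an hbase-site.xml
> fragment; values are exactly the ones quoted above:
>
> <property>
>   <name>hbase.meta.replica.count</name>
>   <value>3</value>
> </property>
> <property>
>   <name>hbase.regionserver.meta.storefile.refresh.period</name>
>   <value>30000</value> <!-- ms; how often replicas re-read meta's store files -->
> </property>
> <property>
>   <name>hbase.master.hfilecleaner.ttl</name>
>   <value>600000</value> <!-- ms; delay before archived HFiles are deleted -->
> </property>
> )
>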
> We're on cdh 5.9.0 (hbase 1.2)
>
> We occasionally see errors like the one below (as I write this, 30 minutes
> after the log entry, that file no longer exists):
> java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR,
> self=/REDACTED:11702, remote=REDACTED:50010, for file
> /hbase/data/hbase/meta/1588230740/info/85c3e4b648eb437a8ef62ff9b39f525d,
> for pool BP-812715994-REDACTED-1441218207856 block 1083723183_9985162
>         at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
>         at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:881)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:759)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
>
> Doing a stack dump of a region server process to see what the priority
> handlers are up to reveals the following:
>
> 1 times (0% total cpu)
> "RpcServer.Priority.general.handler=12,queue=0,port=60020" #138 daemon
> prio=5 os_prio=0 tid=0x00007fc5471a3000 nid=0x3f33 waiting on condition
> [0x00007fc4f5870000] cpu=0.0 start=2019-06-20 19:31:47
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00000003d94b6f18> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.hadoop.hbase.regionserver.HStore.replaceStoreFiles(HStore.java:1348)
> at org.apache.hadoop.hbase.regionserver.HStore.refreshStoreFilesInternal(HStore.java:652)
> --
>
> 3 times (0% total cpu)
> "RpcServer.Priority.general.handler=3,queue=1,port=60020" #129 daemon
> prio=5 os_prio=0 tid=0x00007fc547190800 nid=0x3f2a waiting on condition
> [0x00007fc4f6178000] cpu=0.0 start=2019-06-20 19:31:47
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0000000723bbee00> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
> at org.apache.hadoop.hdfs.DFSInputStream.getFirstToComplete(DFSInputStream.java:1359)
> at org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1327)
> at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1468)
> --
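>
> Side note: the handlers above are parked in
> DFSInputStream.hedgedFetchBlockByteRange, i.e. the HDFS client's hedged-read
> path. If hedged reads are enabled, the client-side properties below govern
> how aggressively extra replicas are tried; the values here are illustrative
> only, not recommendations:
>
> <!-- hdfs-site.xml, client side -->
> <property>
>   <name>dfs.client.hedged.read.threadpool.size</name>
>   <value>10</value> <!-- 0 disables hedged reads -->
> </property>
> <property>
>   <name>dfs.client.hedged.read.threshold.millis</name>
>   <value>500</value> <!-- wait this long before hedging to another replica -->
> </property>
>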
>
> 5 times (0% total cpu)
> "RpcServer.Priority.general.handler=2,queue=0,port=60020" #128 daemon
> prio=5 os_prio=0 tid=0x00007fc54718f000 nid=0x3f29 in Object.wait()
> [0x00007fc4f6279000] cpu=0.0 start=2019-06-20 19:31:47
>    java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
> - locked <0x00000004c0878978> (a org.apache.hadoop.hbase.util.IdLock$Entry)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:407)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:642)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:592)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:294)
> --
>
> 7 times (0% total cpu)
> "RpcServer.Priority.general.handler=1,queue=1,port=60020" #127 daemon
> prio=5 os_prio=0 tid=0x00007fc54718c800 nid=0x3f28 in Object.wait()
> [0x00007fc4f637a000] cpu=0.0 start=2019-06-20 19:31:47
>    java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
> - locked <0x0000000723bbef88> (a org.apache.hadoop.hbase.util.IdLock$Entry)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:407)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:642)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:592)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:294)
>
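
The traces above show handlers queued at IdLock.getLockEntry: HBase serializes
loads of a given HFile block behind a per-id lock, so only one thread fetches a
block from HDFS while every other reader of that block waits. A toy sketch of
that pattern (this is not HBase's actual IdLock; PerIdLock and withLock are
made-up names for illustration) shows why one slow or failing read of a hot
meta block can park many priority handlers at once:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Toy sketch of per-id locking, in the spirit of HBase's IdLock (NOT the
// real implementation; the real one also cleans up entries). Threads asking
// for the same id serialize; threads asking for different ids proceed in
// parallel. One slow loader for a hot block id therefore stalls every
// handler that needs that block.
class PerIdLock {
    private final ConcurrentHashMap<Long, Object> locks = new ConcurrentHashMap<>();

    <T> T withLock(long id, Supplier<T> loader) {
        // One lock object per id; computeIfAbsent makes creation race-free.
        Object lock = locks.computeIfAbsent(id, k -> new Object());
        synchronized (lock) {
            // Only one caller per id runs the loader at a time, e.g. the
            // HDFS read that populates the block cache.
            return loader.get();
        }
    }
}
```

So if the loader for one popular meta block is stuck on a bad HDFS read, the
whole priority handler pool can drain into that single wait queue, which
matches the dumps above.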
