[ https://issues.apache.org/jira/browse/HDFS-16671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hui Fei resolved HDFS-16671.
----------------------------
    Fix Version/s: 1.3.0
       Resolution: Fixed

> RBF: RouterRpcFairnessPolicyController supports configurable permit acquire timeout
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-16671
>                 URL: https://issues.apache.org/jira/browse/HDFS-16671
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.3.0
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> RouterRpcFairnessPolicyController should support a configurable permit acquire
> timeout. The hardcoded 1s timeout is very long, and it caused an incident in our
> prod environment when one nameservice was busy.
> The optimal timeout should probably be less than p50(avgTime).
> All handlers in RBF were waiting to acquire the permit of the busy ns:
> {code:java}
> "IPC Server handler 12 on default port 8888" #2370 daemon prio=5 os_prio=0 tid=? nid=? waiting on condition [?]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <?> (a java.util.concurrent.Semaphore$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>       at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
>       at org.apache.hadoop.hdfs.server.federation.fairness.AbstractRouterRpcFairnessPolicyController.acquirePermit(AbstractRouterRpcFairnessPolicyController.java:56)
>       at org.apache.hadoop.hdfs.server.federation.fairness.DynamicRouterRpcFairnessPolicyController.acquirePermit(DynamicRouterRpcFairnessPolicyController.java:123)
>       at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.acquirePermit(RouterRpcClient.java:1500)
> {code}
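
The idea in the issue description (replace a hardcoded 1s wait on the per-nameservice semaphore with a configurable one) can be sketched roughly as follows. This is a minimal illustration only, not the Hadoop implementation: the class `PermitControllerSketch`, its methods, and the `acquireTimeoutMs` parameter are hypothetical names, and the only real API used is `java.util.concurrent.Semaphore.tryAcquire(long, TimeUnit)`, which also appears in the stack trace.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one semaphore per nameservice, with a configurable
// acquire timeout instead of a fixed 1 second.
class PermitControllerSketch {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  private final long acquireTimeoutMs; // assumed to come from configuration

  PermitControllerSketch(long acquireTimeoutMs) {
    this.acquireTimeoutMs = acquireTimeoutMs;
  }

  // Assign a fixed number of handler permits to a nameservice.
  void register(String nsId, int handlerCount) {
    permits.put(nsId, new Semaphore(handlerCount));
  }

  // Wait at most acquireTimeoutMs for a permit; false means "give up".
  boolean acquirePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore == null) {
      return false; // unknown nameservice in this sketch
    }
    try {
      return semaphore.tryAcquire(acquireTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  void releasePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore != null) {
      semaphore.release();
    }
  }

  public static void main(String[] args) {
    // A short timeout keeps a handler from parking for long on a busy ns.
    PermitControllerSketch controller = new PermitControllerSketch(50);
    controller.register("ns-busy", 1);

    boolean first = controller.acquirePermit("ns-busy");
    long start = System.nanoTime();
    boolean second = controller.acquirePermit("ns-busy"); // no permit left
    long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

    System.out.println("first=" + first + ", second=" + second
        + ", waited about " + waitedMs + " ms");
    controller.releasePermit("ns-busy");
  }
}
```

With a short timeout, a handler that cannot get a permit for the busy nameservice fails fast and becomes available for requests to other nameservices, rather than staying parked in `TIMED_WAITING` as in the trace above.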