Ming Ma created HDFS-6478:
-----------------------------

             Summary: RemoteException can't be retried properly for non-HA scenario
                 Key: HDFS-6478
                 URL: https://issues.apache.org/jira/browse/HDFS-6478
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ming Ma

For the HA case, the call stack is DFSClient -> RetryInvocationHandler -> ClientNamenodeProtocolTranslatorPB -> ProtobufRpcEngine. ProtobufRpcEngine throws ServiceException and expects the caller to unwrap it; ClientNamenodeProtocolTranslatorPB is the component that takes care of that.

{noformat}
at org.apache.hadoop.ipc.Client.call
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
at com.sun.proxy.$Proxy26.getFileInfo
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
at sun.reflect.GeneratedMethodAccessor24.invoke
at sun.reflect.DelegatingMethodAccessorImpl.invoke
at java.lang.reflect.Method.invoke
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
at com.sun.proxy.$Proxy27.getFileInfo
at org.apache.hadoop.hdfs.DFSClient.getFileInfo
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus
{noformat}

However, for the non-HA case, the call stack is DFSClient -> ClientNamenodeProtocolTranslatorPB -> RetryInvocationHandler -> ProtobufRpcEngine. Here RetryInvocationHandler sits below the translator, so it receives the still-wrapped ServiceException instead of the unwrapped RemoteException, and the call can't be retried properly.

{noformat}
at org.apache.hadoop.ipc.Client.call
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
at com.sun.proxy.$Proxy9.getListing
at sun.reflect.NativeMethodAccessorImpl.invoke0
at sun.reflect.NativeMethodAccessorImpl.invoke
at sun.reflect.DelegatingMethodAccessorImpl.invoke
at java.lang.reflect.Method.invoke
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
at com.sun.proxy.$Proxy9.getListing
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing
at org.apache.hadoop.hdfs.DFSClient.listPaths
{noformat}

Perhaps we can fix it by having NN wrap RetryInvocationHandler around ClientNamenodeProtocolTranslatorPB and the other PBs, instead of the current wrap order.
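
To illustrate why the wrap order matters, here is a minimal, self-contained sketch that does not use any Hadoop classes. Every name in it (WrapOrderSketch, RemoteEx, ServiceEx, NamenodeApi, engine, translator, retrying) is a hypothetical stand-in, and the retry rule (retry only when a RemoteEx is seen directly) is an assumed simplification, not the actual RetryInvocationHandler policy.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

public class WrapOrderSketch {
    // Hypothetical stand-ins for RemoteException and ServiceException.
    public static class RemoteEx extends java.io.IOException {
        public RemoteEx(String msg) { super(msg); }
    }
    public static class ServiceEx extends Exception {
        public ServiceEx(Throwable cause) { super(cause); }
    }

    // Hypothetical stand-in for the client protocol interface.
    public interface NamenodeApi {
        String getFileInfo(String path) throws Exception;
    }

    // "RPC engine": fails the first `failures` calls with a wrapped RemoteEx.
    public static NamenodeApi engine(int failures) {
        int[] remaining = {failures};
        return path -> {
            if (remaining[0]-- > 0) {
                throw new ServiceEx(new RemoteEx("simulated remote failure"));
            }
            return "info:" + path;
        };
    }

    // "Translator": unwraps ServiceEx and rethrows its cause.
    public static NamenodeApi translator(NamenodeApi inner) {
        return path -> {
            try {
                return inner.getFileInfo(path);
            } catch (ServiceEx e) {
                throw (Exception) e.getCause();
            }
        };
    }

    // "Retry handler": a dynamic proxy that retries only when it sees RemoteEx directly.
    public static NamenodeApi retrying(NamenodeApi inner, int maxRetries) {
        InvocationHandler handler = (proxy, method, args) -> {
            for (int attempt = 0; ; attempt++) {
                try {
                    return method.invoke(inner, args);
                } catch (InvocationTargetException ite) {
                    Throwable t = ite.getCause();
                    // A still-wrapped ServiceEx is not recognized, so it is not retried.
                    if (!(t instanceof RemoteEx) || attempt >= maxRetries) {
                        throw t;
                    }
                }
            }
        };
        return (NamenodeApi) Proxy.newProxyInstance(
                NamenodeApi.class.getClassLoader(),
                new Class<?>[] {NamenodeApi.class},
                handler);
    }

    // Returns the call result, or "failed: <exception class>" if the call throws.
    public static String callOrError(NamenodeApi api, String path) {
        try {
            return api.getFileInfo(path);
        } catch (Exception e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // HA-style order: retry handler outside the translator.
        NamenodeApi haOrder = retrying(translator(engine(1)), 3);
        // Non-HA-style order: translator outside the retry handler.
        NamenodeApi nonHaOrder = translator(retrying(engine(1), 3));
        System.out.println(callOrError(haOrder, "/a"));
        System.out.println(callOrError(nonHaOrder, "/a"));
    }
}
```

In this sketch the HA-style order recovers from the single simulated failure and returns `info:/a`. The non-HA-style order gives up on the first attempt and the caller sees `failed: RemoteEx`, because the retry proxy only ever saw the wrapped ServiceEx.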