Akira Ajisaka created HDFS-15555:
------------------------------------

             Summary: RBF: Refresh cacheNS when SocketException occurs
                 Key: HDFS-15555
                 URL: https://issues.apache.org/jira/browse/HDFS-15555
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: rbf
            Reporter: Akira Ajisaka
            Assignee: Akira Ajisaka


Problem:
When the active NameNode is restarted and is still loading the fsimage, 
DFSRouters slow down significantly.

Investigation:
While the restarted active NameNode is loading the fsimage, RouterRpcClient 
receives a SocketException. Because 
RouterRpcClient#isUnavailableException(IOException) returns false for a 
SocketException, MembershipNameNodeResolver#cacheNS is not refreshed. As a 
result, the order of the NameNodes returned by 
MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged 
and the restarted NameNode is still returned first, so RouterRpcClient keeps 
trying to connect to the NameNode that is loading the fsimage.
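
The classification problem described above can be illustrated with a small 
standalone sketch. This is not the RouterRpcClient code: the class 
UnavailableCheckSketch, the exception StandbySketchException and both methods 
are hypothetical names invented for illustration, and treating SocketException 
as unavailable is only one possible way to address the issue.

```java
import java.io.IOException;
import java.net.SocketException;

// Hypothetical stand-in for the check described above; not Hadoop code.
final class UnavailableCheckSketch {

  /** Hypothetical placeholder for the "NameNode is in standby" exception. */
  static final class StandbySketchException extends IOException {
    StandbySketchException(String msg) {
      super(msg);
    }
  }

  /** Models the reported behaviour: a SocketException is not counted. */
  static boolean isUnavailableBefore(IOException e) {
    return e instanceof StandbySketchException;
  }

  /** One possible change (an assumption): also count SocketException. */
  static boolean isUnavailableAfter(IOException e) {
    return e instanceof StandbySketchException
        || e instanceof SocketException;
  }

  public static void main(String[] args) {
    IOException reset = new SocketException("Connection reset");
    // With the first check the caller would not trigger a cache refresh.
    System.out.println("before: " + isUnavailableBefore(reset));
    // With the second check the same exception would trigger a refresh.
    System.out.println("after:  " + isUnavailableAfter(reset));
  }
}
```

Note that an instanceof test on SocketException also matches its subclasses, 
such as java.net.ConnectException.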

After it finishes loading the fsimage, the NameNode throws StandbyException. 
That exception is treated as an 'unavailable' exception, so the cacheNS is 
refreshed.
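
Why a missed refresh keeps the restarted NameNode first in the list can be 
pictured with a toy cache. This is a simplified model under stated 
assumptions, not the MembershipNameNodeResolver implementation: the class 
CachedOrderSketch, its State enum and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a cached NameNode ordering; all names are hypothetical.
final class CachedOrderSketch {

  // Declaration order doubles as sort priority: ACTIVE first.
  enum State { ACTIVE, STANDBY, UNAVAILABLE }

  // Latest known state for each NameNode.
  private final Map<String, State> reported = new HashMap<>();
  // Cached ordering; null means "recompute on the next lookup".
  private List<String> cached;

  void report(String namenode, State state) {
    reported.put(namenode, state);
  }

  /** Drops the cached ordering so the next lookup re-sorts. */
  void invalidate() {
    cached = null;
  }

  /** Returns the cached ordering, computing it only when it is missing. */
  List<String> getNamenodes() {
    if (cached == null) {
      List<String> sorted = new ArrayList<>(reported.keySet());
      sorted.sort((a, b) -> {
        int byState = reported.get(a).compareTo(reported.get(b));
        return byState != 0 ? byState : a.compareTo(b);
      });
      cached = sorted;
    }
    return cached;
  }

  public static void main(String[] args) {
    CachedOrderSketch resolver = new CachedOrderSketch();
    resolver.report("nn1", State.ACTIVE);
    resolver.report("nn2", State.STANDBY);
    System.out.println(resolver.getNamenodes());

    // nn1 restarts and nn2 takes over, but nothing invalidates the cache.
    resolver.report("nn1", State.UNAVAILABLE);
    resolver.report("nn2", State.ACTIVE);
    System.out.println(resolver.getNamenodes());

    // An exception classified as "unavailable" would lead to this call.
    resolver.invalidate();
    System.out.println(resolver.getNamenodes());
  }
}
```

In this model the second lookup still returns nn1 first, and only the lookup 
after invalidate() returns nn2 first.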

Workaround:
Instead of restarting the NameNode, stop it and wait 1 minute before starting 
it again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

