nemon lou created HADOOP-9655:
---------------------------------

             Summary: IPC Client call to the same host with multi thread takes 
very long time to report connection time out for many times 
                 Key: HADOOP-9655
                 URL: https://issues.apache.org/jira/browse/HADOOP-9655
             Project: Hadoop Common
          Issue Type: Bug
          Components: ipc
    Affects Versions: 2.0.4-alpha
            Reporter: nemon lou


When a machine powers off while a job is running, MRAppMaster finds that the 
tasks on that host have timed out and calls stopContainer for each container 
concurrently. However, the IPC layer handles these calls serially: for each 
call, the connection timeout exception takes a few minutes to be raised, after 
45 retries. As a result, the AM hangs for many hours waiting for stopContainer 
to finish.
The jstack output shows that most threads are stuck in Connection.addCall, 
waiting for a lock held by Connection.setupIOstreams.
(The setupIOstreams method runs slowly because of connection timeouts during 
setupConnection.)
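
The contention described above can be reproduced in miniature. The following 
is only a sketch, not the Hadoop code: the Connection class here is a 
hypothetical stand-in whose two synchronized methods borrow the names from the 
jstack output, the slow connect is simulated with a sleep, and the per-attempt 
timeout passed to worstCaseSeconds is an assumed value (only the 45 retries 
come from this report).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ConnectionLockSketch {

    /** Hypothetical stand-in for one IPC connection shared by all callers to a host. */
    static class Connection {
        private int queuedCalls = 0;

        // Holds the connection monitor for the whole (simulated) connect-and-retry loop.
        synchronized void setupIOstreams(long connectMillis, CountDownLatch started) {
            started.countDown();             // signal that the monitor is now held
            try {
                Thread.sleep(connectMillis); // sleeping does not release the monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Needs the same monitor, so it blocks while setupIOstreams is running.
        synchronized void addCall() {
            queuedCalls++;
        }
    }

    /** Returns how long (in ms) a caller was blocked in addCall during setup. */
    static long measureAddCallWaitMillis(long connectMillis) {
        Connection conn = new Connection();
        CountDownLatch started = new CountDownLatch(1);
        Thread setup = new Thread(() -> conn.setupIOstreams(connectMillis, started));
        setup.start();
        try {
            started.await();                 // wait until setup holds the monitor
            long begin = System.nanoTime();
            conn.addCall();                  // blocks until setupIOstreams returns
            long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
            setup.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Total delay if every call waits out all retries, one call after another. */
    static long worstCaseSeconds(int calls, int retries, int timeoutSeconds) {
        return (long) calls * retries * timeoutSeconds;
    }

    public static void main(String[] args) {
        System.out.println("addCall waited ~" + measureAddCallWaitMillis(300) + " ms");
        // 10 calls and a 20 s timeout are illustrative assumptions; 45 is from the report.
        System.out.println("serialized worst case: " + worstCaseSeconds(10, 45, 20) + " s");
    }
}
```

Because both methods synchronize on the same object, a caller that only wants 
to queue a call waits for the full connect-and-retry loop, and when the calls 
are handled serially the delays add up rather than overlap.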
