nemon lou created HADOOP-9655:
---------------------------------

             Summary: IPC Client call to the same host with multi thread takes very long time to report connection time out for many times
                 Key: HADOOP-9655
                 URL: https://issues.apache.org/jira/browse/HADOOP-9655
             Project: Hadoop Common
          Issue Type: Bug
          Components: ipc
    Affects Versions: 2.0.4-alpha
            Reporter: nemon lou

When a machine powers off while a job is running, MRAppMaster finds that the tasks on that host have timed out and then calls stopContainer for each container concurrently. The IPC layer, however, handles these calls serially: for each call, the connection timeout exception takes a few minutes to be raised, after 45 retries. As a result, the AM hangs for many hours waiting for stopContainer to finish.

The jstack output shows that most threads are stuck at Connection.addCall, waiting for a lock object held by Connection.setupIOstreams. (The setupIOstreams method runs slowly because of the connection timeout during setupConnection.)
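
The contention described above can be reproduced in miniature. The sketch below is a toy model, not Hadoop's Client code: SerializedConnectDemo, FakeConnection, runCalls and worstCaseMillis are hypothetical names, and the connect attempt is simulated with a sleep. It only assumes what the jstack output suggests, namely that addCall and setupIOstreams contend for the same lock, so concurrent callers end up waiting one after another. The 20-second connect timeout used in the estimate is an assumption; the report itself only gives the retry count of 45.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reported contention; this is not Hadoop's Client code. */
final class SerializedConnectDemo {

    /** Hypothetical stand-in for an IPC connection guarded by one monitor. */
    static final class FakeConnection {
        private final long setupMillis;

        FakeConnection(long setupMillis) {
            this.setupMillis = setupMillis;
        }

        /** Queues a call; blocks while another thread is inside setupIOstreams. */
        synchronized void addCall() {
            // Nothing to do in the model; only the lock acquisition matters.
        }

        /** Simulates a connect attempt that times out while holding the monitor. */
        synchronized void setupIOstreams() {
            try {
                Thread.sleep(setupMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs `threads` concurrent callers against one connection; returns elapsed ms. */
    static long runCalls(int threads, long setupMillis) {
        FakeConnection conn = new FakeConnection(setupMillis);
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                conn.addCall();
                conn.setupIOstreams();
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Rough upper estimate: every call pays every retry, one call at a time. */
    static long worstCaseMillis(long calls, long retries, long timeoutMillis) {
        return calls * retries * timeoutMillis;
    }

    public static void main(String[] args) {
        System.out.println("4 callers, 50 ms setup each: " + runCalls(4, 50) + " ms elapsed");
        System.out.println("estimate for 10 calls: "
                + worstCaseMillis(10, 45, 20_000) / 60_000 + " minutes");
    }
}
```

Although the four callers start concurrently, the elapsed time is roughly the sum of the individual setup delays, because each simulated connect attempt holds the shared lock. Under the assumed 20-second timeout, 45 retries come to about 15 minutes per call, and ten serialized stopContainer calls to about 150 minutes, which is in line with the AM hanging for hours.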