Hello, I am fairly new to the Hadoop framework, so I appreciate your patience in case my email is not entirely correct or the terminology is wrong. I have a working installation; however, I am facing a few issues:
1) I have run the PI example a number of times, using 4 slave nodes. Most of the time the runtime is about 31 secs. At other times it varies widely and goes up to 650 secs. What could be causing this? This is a dedicated cluster with no other workloads.
2) "nodemanager did not stop gracefully after 5 seconds: killing with kill -9" Every time during shutdown, the nodemanager is forcibly killed because it doesn't respond within 5 seconds. I dug through the logs and don't find anything off. The one thing I did find is noted in (3).
3) I see errors like the following: "2014-03-31 12:27:26,975 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_1396286812424_0001_01_000042" My searches indicate this happens because the connection to the appmaster is lost. I can't seem to find where the appmaster logs are.
4) Is the proxy server needed? I did not set "yarn.web-proxy.address", so it never starts. My understanding is that in this case it runs as part of the RM.
5) RDMA-based shuffle: Mellanox seems to have contributed code for an RDMA shuffle in place of the HTTP shuffle. Is this part of YARN? If so, how do I enable it? Is UDA required for the RDMA shuffle?
6) If I want to add support for a new file system, is there a tutorial on what needs to be implemented? I found that org.apache.hadoop.fs.FileSystem is the class to extend. However, sample code or documentation would help.
Appreciate the help.
Regards,
Casey