Hello,

I am fairly new to the Hadoop framework, so I would appreciate your patience
if my email is not entirely correct or the terminology is wrong. I have
a working installation, but I am facing a few issues:

1) I have run the PI example a number of times, using 4 slave nodes.
Most times the runtime is about 31 secs. Other times it varies widely
and goes up to 650 secs. What could be causing this? This is a dedicated
cluster with no other workloads.

2) "nodemanager did not stop gracefully after 5 seconds: killing with kill
-9" Every time during shutdown, the nodemanager is forcibly killed because
it doesn't respond within 5 seconds. I dug through the logs and didn't find
anything off. The one thing I did find is noted in (3).

3) I see errors as follows: "2014-03-31 12:27:26,975 ERROR [RMCommunicator
Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Container complete event for unknown container id
container_1396286812424_0001_01_000042" My searches indicate this is
because the connection to the appmaster is lost. I can't seem to find where
the appmaster logs are.
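
In case it helps narrow down which logs I should be looking for, here is a
small sketch of how I am reading the container id from that error. It assumes
the id has the form container_<cluster timestamp>_<app number>_<attempt>_<container
number>; the names parse_container_id and ContainerId are my own and not part
of Hadoop.

```python
from typing import NamedTuple


class ContainerId(NamedTuple):
    """Fields of a YARN container id (format assumed, see note above)."""
    cluster_timestamp: int  # RM start time in milliseconds
    app_seq: int            # application number within this RM instance
    attempt: int            # application attempt number
    container_seq: int      # container number within the attempt

    @property
    def application_id(self) -> str:
        # The application id reuses the timestamp and a 4-digit app number.
        return f"application_{self.cluster_timestamp}_{self.app_seq:04d}"


def parse_container_id(text: str) -> ContainerId:
    """Split an id like container_<ts>_<app>_<attempt>_<container> into fields."""
    parts = text.split("_")
    if len(parts) != 5 or parts[0] != "container":
        raise ValueError(f"unexpected container id format: {text!r}")
    return ContainerId(*(int(p) for p in parts[1:]))


if __name__ == "__main__":
    cid = parse_container_id("container_1396286812424_0001_01_000042")
    print(cid.application_id, cid.attempt, cid.container_seq)
```

If that reading is right, the logs I am after belong to application
application_1396286812424_0001, and I would expect something like
"yarn logs -applicationId application_1396286812424_0001" to return them,
assuming log aggregation is enabled on the cluster (I have not verified that
it is).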

4) Is the proxy server needed? I did not set "yarn.web-proxy.address", and
so it never starts on its own. My understanding is that it runs as part of
the RM in this case.

5) RDMA-based shuffle - Mellanox seems to have contributed code for an RDMA
shuffle instead of HTTP. Is this part of YARN? If yes, how do I enable it?
Is UDA required for RDMA shuffle?

6) If I want to provide support for a new file system, is there a tutorial
on what needs to be implemented? I found that
org.apache.hadoop.fs.FileSystem is the class to extend. However, some sample
code or documentation would help.

Appreciate the help.

Regards,
Casey
