Hello,
We have two HOD questions:
(1) For our current Torque PBS setup, the number of nodes requested by
HOD (-l nodes=X) corresponds to the number of CPUs allocated; however,
these nodes can be spread across various partially used or empty nodes.
Unfortunately, HOD does not appear to honour the
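
To illustrate the behaviour described in question (1), here is a toy model in Python. It is not Torque's actual scheduler: it simply assumes that each requested "node" is one CPU slot and that free slots are filled host by host. The host names and slot counts are made up.

```python
def allocate_slots(free_slots, requested):
    """Toy first-fit model: hand out `requested` CPU slots across hosts.

    free_slots: dict mapping host name -> number of free CPU slots.
    Returns a dict of host name -> slots taken from that host.
    """
    allocation = {}
    remaining = requested
    for host, free in free_slots.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[host] = take
            remaining -= take
    if remaining > 0:
        raise ValueError("not enough free slots for the request")
    return allocation


# Hypothetical cluster: two partially used hosts and one empty 4-CPU host.
free = {"hostA": 1, "hostB": 2, "hostC": 4}
result = allocate_slots(free, 4)
print(result)
print("distinct hosts:", len(result))
```

In this model a request for 4 is satisfied from three different hosts, which is the kind of spread the question describes.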
Craig,
Hello,
We have two HOD questions:
(1) For our current Torque PBS setup, the number of nodes requested by
HOD (-l nodes=X) corresponds to the number of CPUs allocated; however,
these nodes can be spread across various partially used or empty nodes.
Unfortunately, HOD does not appear to
Hemanth,
Just FYI, at Yahoo! we've set Torque to allocate separate nodes for
the number specified to HOD. In other words, the number corresponds to
the number of nodes, not processors. This has proved simpler to
manage. I forget right now, but I think you can make Torque behave
like this (to
Craig,
While HOD does not do this automatically, please note that since you
are bringing up a Map/Reduce cluster on the allocated nodes, you can
submit map/reduce parameters with which to bring up the cluster when
allocating jobs. The relevant options are
--gridservice-mapred.server-params (or
Hi Hemanth,
While HOD does not do this automatically, please note that since you
are bringing up a Map/Reduce cluster on the allocated nodes, you can
submit map/reduce parameters with which to bring up the cluster when
allocating jobs. The relevant options are
--gridservice-mapred.server-para
My Hadoop jobs don't start.
This is configured to use an existing DFS and to unpack a tarball with a
cut-down 0.16.0 config.
I have looked in the mom logs on the client machines and am not getting
anything meaningful.
The hadoop ports are biased by 1000 to allow another cluster to run on
this
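
As a rough sketch of what "biased by 1000" means for the ports above, the Python snippet below shifts a set of base ports by a fixed offset so a second cluster does not collide with the first. The service names and base port numbers are placeholders for illustration, not values taken from the poster's config.

```python
def bias_ports(base_ports, bias=1000):
    """Return a copy of base_ports with every port shifted by `bias`."""
    biased = {}
    for name, port in base_ports.items():
        shifted = port + bias
        # Valid TCP ports are 1..65535; refuse anything outside that range.
        if not 0 < shifted <= 65535:
            raise ValueError(f"{name}: port {shifted} is out of range")
        biased[name] = shifted
    return biased


# Placeholder service names and base ports, for illustration only.
base = {"jobtracker_http": 50030, "namenode_http": 50070}
print(bias_ports(base))
```

Keeping the offset in one place makes it easier to check that every port of the second cluster moved by the same amount.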
Jason Venner wrote:
My Hadoop jobs don't start.
This is configured to use an existing DFS and to unpack a tarball with
a cut-down 0.16.0 config.
I have looked in the mom logs on the client machines and am not
getting anything meaningful.
What is your HOD command line? Specifically, how did you
Well, this finally started to work, after we learned how to debug.
There were two issues. First, the Torque scp command was passing 3 arguments
instead of 2, and this was causing the error logs to get eaten.
Second, on our master node, the dfs hod is installed in a different place than
on the child nodes,
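
For the first issue (a copy command receiving 3 arguments instead of 2), one generic way to see what a caller really passes is to log the argument list before doing anything else. The Python sketch below is a hypothetical helper, not part of Torque or HOD, and it makes no claim about how Torque actually invokes scp; the function name, log format, and example arguments are all assumptions.

```python
import os
import tempfile


def record_invocation(argv, log_path, expected_args=2):
    """Append the argument list to log_path and report whether the count matches.

    argv follows the usual convention: argv[0] is the program name.
    """
    args = argv[1:]
    ok = len(args) == expected_args
    with open(log_path, "a") as log:
        log.write(f"argc={len(args)} expected={expected_args} ok={ok} args={args!r}\n")
    return ok


# Example: a made-up invocation carrying one argument too many.
log_file = os.path.join(tempfile.mkdtemp(), "copy_wrapper.log")
print(record_invocation(["copycmd", "-r", "src", "host:dest"], log_file))
```

A wrapper built around such a helper would leave a record of each call even when the command's own error output is lost.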