HOD questions

2008-12-17 Craig Macdonald
Hello, We have two HOD questions: (1) For our current Torque PBS setup, the number of nodes requested by HOD (-l nodes=X) corresponds to the number of CPUs allocated; however, these nodes can be spread across various partially occupied or empty nodes. Unfortunately, HOD does not appear to honour the…

Re: HOD questions

2008-12-17 Hemanth Yamijala
Craig, Hello, We have two HOD questions: (1) For our current Torque PBS setup, the number of nodes requested by HOD (-l nodes=X) corresponds to the number of CPUs allocated; however, these nodes can be spread across various partially occupied or empty nodes. Unfortunately, HOD does not appear to…

Re: HOD questions

2008-12-18 Craig Macdonald
Hemanth, Just FYI, at Yahoo! we've set Torque to allocate separate nodes for the number specified to HOD. In other words, the number corresponds to the number of nodes, not processors. This has proved simpler to manage. I forget right now, but I think you can make Torque behave like this (to…

Re: HOD questions

2008-12-18 Hemanth Yamijala
Craig, While HOD does not do this automatically, please note that since you are bringing up a Map/Reduce cluster on the allocated nodes, you can submit Map/Reduce parameters with which to bring up the cluster when allocating jobs. The relevant options are --gridservice-mapred.server-params (or…

Re: HOD questions

2008-12-19 Craig Macdonald
Hi Hemanth, While HOD does not do this automatically, please note that since you are bringing up a Map/Reduce cluster on the allocated nodes, you can submit Map/Reduce parameters with which to bring up the cluster when allocating jobs. The relevant options are --gridservice-mapred.server-para…

More HOD questions 0.16.0 - debug log enclosed - help with how to debug

2008-02-25 Jason Venner
My Hadoop jobs don't start. This is configured to use an existing DFS and to unpack a tarball with a cut-down 0.16.0 config. I have looked in the mom logs on the client machines and am not getting anything meaningful. The Hadoop ports are biased by 1000 to allow another cluster to run on this…

Re: More HOD questions 0.16.0 - debug log enclosed - help with how to debug

2008-02-26 Hemanth Yamijala
Jason Venner wrote: My Hadoop jobs don't start. This is configured to use an existing DFS and to unpack a tarball with a cut-down 0.16.0 config. I have looked in the mom logs on the client machines and am not getting anything meaningful. What is your hod command line? Specifically, how did you…

Re: More HOD questions 0.16.0 - debug log enclosed - help with how to debug - solved

2008-02-26 Jason Venner
Well, this finally started to work after we learned how to debug. There were two issues: (1) the Torque scp command was passing three arguments instead of two, and this was causing the error logs to get eaten. (2) On our master node, the DFS HOD is installed in a different place than on the child nodes,…