Problems running a HOD test cluster

2008-02-21 Thread Luca
Hello everyone, I've been trying to run HOD on a sample cluster with three nodes that already have Torque installed and (hopefully?) properly working. I also prepared a configuration file for hod, that I'm gonna paste at the end of this email. A few questions:
- is Java6 ok for HOD?
- I have

Re: Problems running a HOD test cluster

2008-02-22 Thread Allen Wittenauer
On 2/21/08 10:52 AM, "Luca" <[EMAIL PROTECTED]> wrote:
> A few questions:
> - is Java6 ok for HOD?
That's what we use.
> - I have an externally running HDFS cluster, as specified in
> [gridservice-hdfs]: how do I find out the fs_port of my cluster? Is it
> something specified in the hadoop-si

Re: Problems running a HOD test cluster

2008-02-22 Thread Jason Venner
We have been unable to get torque up and running. The magic value in the server_name file seems to elude us. We have tried localhost, 127.0.0.1, machine name, machine ip, fq machine name. Depending on what we use, we either get Unauthorized request or invalid entry qmgr obj= svr=default: Bad ACL

Re: Problems running a HOD test cluster

2008-02-22 Thread Allen Wittenauer
On 2/22/08 3:58 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> We have been unable to get torque up and running. The magic value in the
> server_name file seems to elude us.
The server_name should be the real hostname of the machine running pbs_server.
> We have tried localhost, 127.0.0.1, m

Re: Problems running a HOD test cluster

2008-02-25 Thread Luca
Allen Wittenauer wrote:
[2008-02-21 19:46:11,014] ERROR/40 torque:96 - qstat error: exit code: 153 | signal: False | core False
[2008-02-21 19:46:11,017] INFO/20 hadoop:451 - Ringmaster at : None.
I bet your ringmaster didn't come up. Check which nodes were allocated to your job via qstat