Re: [OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-17 Thread Dave Love

Ralph Castain writes:

> That’s an SGE error message - looks like your tmp file system on one
> of the remote nodes is full.

Yes; surely that just needs to be fixed, and I'd expect the host not to accept jobs in that state. It's not just going to break ompi.

> We don’t control where SGE puts it

Re: [OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-16 Thread Ralph Castain

That’s an SGE error message - looks like your tmp file system on one of the remote nodes is full. We don’t control where SGE puts its files, but it might be that your backend nodes are having issues with us doing a tree-based launch (i.e., where each backend daemon launches more daemons along th
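The diagnosis above comes down to one remote node's /tmp file system being full, so qrsh cannot write its per-job files. A quick way to spot such a node is to check free space in the temp directory on each execution host before submitting. The sketch below is a hypothetical helper, not part of SGE or Open MPI. The function name `tmp_has_space` and the 100 MiB threshold are assumptions. It only inspects the node it runs on, so it would have to be run on every execution host by whatever means your site uses.

```python
import shutil
import tempfile
from typing import Optional


def tmp_has_space(path: Optional[str] = None, min_free_mb: int = 100) -> bool:
    """Return True if the file system holding `path` has at least `min_free_mb` MiB free.

    Hypothetical pre-flight check; the 100 MiB default is an arbitrary assumption.
    """
    if path is None:
        # Defaults to the local temp directory (usually /tmp on Linux).
        path = tempfile.gettempdir()
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= min_free_mb * 1024 * 1024


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    status = "ok" if tmp_has_space(tmp) else "low on space"
    print(f"{tmp}: {status}")
```

A node reporting "low on space" is one where qrsh is likely to fail with "No space left on device", as in the original report.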

[OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-16 Thread Lane, William

I'm getting an error message early on:

[csclprd3-0-11:17355] [[36373,0],17] plm:rsh: using "/opt/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose" for launching
unable to write to file /tmp/285019.1.verylong.q/qrsh_error: No space left on device
[csclprd3-6-10:18352] [[36373,0],21] plm:rsh: usi