Ralph Castain writes:
> That’s an SGE error message - looks like your tmp file system on one
> of the remote nodes is full.
Yes; surely that just needs to be fixed, and I'd expect the host not to
accept jobs in that state. It's not just going to break ompi.
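To make "the host should not accept jobs in that state" concrete, here is a minimal sketch of a node-local health check, assuming Python is available on the nodes. The 1 GiB threshold is a hypothetical placeholder. Hooking such a check into SGE (for instance through a custom load sensor feeding a queue load threshold) is site-specific and not shown here.

```python
import shutil
import tempfile

# Hypothetical threshold: treat the node as unhealthy below 1 GiB free.
MIN_FREE_BYTES = 1 << 30


def tmp_has_room(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `min_free` bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free


if __name__ == "__main__":
    tmpdir = tempfile.gettempdir()  # usually /tmp on Linux; honours $TMPDIR
    if tmp_has_room(tmpdir):
        print(f"{tmpdir}: OK")
    else:
        print(f"{tmpdir}: less than {MIN_FREE_BYTES} bytes free; node should not take jobs")
```

The check looks at the base temporary directory rather than a per-job directory such as /tmp/285019.1.verylong.q, because the per-job directory is what fails to be written once the filesystem fills up.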
> We don’t control where SGE puts it
That’s an SGE error message - looks like your tmp file system on one of the
remote nodes is full. We don’t control where SGE puts its files, but it might
be that your backend nodes are having issues with us doing a tree-based launch
(i.e., where each backend daemon launches more daemons along th
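To picture the difference between a flat and a tree-based launch, here is a minimal sketch that gives each daemon a parent in a k-ary tree rooted at daemon 0. The fan-out and the numbering scheme are assumptions chosen for illustration; this is not Open MPI's actual routing algorithm. In a flat launch, every daemon's parent would simply be daemon 0.

```python
def tree_parent(vpid: int, fanout: int) -> int:
    """Parent of daemon `vpid` in a k-ary tree rooted at daemon 0 (hypothetical scheme)."""
    if vpid <= 0:
        raise ValueError("daemon 0 is the root and has no parent")
    return (vpid - 1) // fanout


def launch_plan(num_daemons: int, fanout: int) -> dict[int, list[int]]:
    """Map each daemon to the list of daemons it would start itself."""
    children: dict[int, list[int]] = {v: [] for v in range(num_daemons)}
    for v in range(1, num_daemons):
        children[tree_parent(v, fanout)].append(v)
    return children


if __name__ == "__main__":
    # With fanout=2, daemon 0 starts 1 and 2, daemon 1 starts 3 and 4, and so on,
    # so backend nodes must themselves be able to run the launcher (here, qrsh).
    for parent, kids in launch_plan(8, fanout=2).items():
        if kids:
            print(f"daemon {parent} launches {kids}")
```

In a tree launch, a remote node's own qrsh call, and therefore that node's /tmp, sits on the launch path. The quoted log is consistent with this: the qrsh launch is reported by daemon [[36373,0],17], not by daemon 0.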
I'm getting an error message early on:
[csclprd3-0-11:17355] [[36373,0],17] plm:rsh: using "/opt/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose" for launching
unable to write to file /tmp/285019.1.verylong.q/qrsh_error: No space left on device
[csclprd3-6-10:18352] [[36373,0],21] plm:rsh: usi