I agree there are two cases for pure-MPI mode: 1. The job fails for no apparent reason; 2. The job complains that the shared-memory file is on a network file system, which can be resolved with "export TMPDIR=/home/yanb/tmp", where /home/yanb/tmp is my local directory. The default TMPDIR points to a Lustre directory.
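As a rough sketch of the TMPDIR workaround for case 2, a job script could check which file system holds TMPDIR before calling mpirun and switch to a local directory only when needed. The helper names (is_network_fs_type, fs_type_of, choose_tmpdir) and the list of file-system types are hypothetical, and "stat -f -c %T" assumes GNU coreutils; only the /home/yanb/tmp path comes from this thread.

```shell
#!/bin/bash
# Sketch only: helper names and the file-system list are assumptions,
# not part of Open MPI or PBS Pro.

# Return 0 if the given file-system type name is a network/parallel one.
is_network_fs_type() {
    case "$1" in
        lustre|nfs|nfs4|gpfs|panfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the type of the file system that holds a directory
# (assumes GNU coreutils stat).
fs_type_of() {
    stat -f -c %T "$1"
}

# Print the directory to use as TMPDIR: the current one if it is local,
# otherwise the fallback passed as $1 (created if missing).
choose_tmpdir() {
    local current="${TMPDIR:-/tmp}"
    local fallback="$1"
    if is_network_fs_type "$(fs_type_of "$current")"; then
        mkdir -p "$fallback" && echo "$fallback"
    else
        echo "$current"
    fi
}

# Usage in a job script, before mpirun (left commented out here):
# export TMPDIR="$(choose_tmpdir /home/yanb/tmp)"
```

This only addresses the shared-memory warning; it does not explain case 1, where the job produces no output at all.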
There is no other output. I checked my job with "qstat -n" and found that processes were actually not started on compute nodes even though PBS Pro has "started" my job.

Beichuan

> 3. Then I test pure-MPI mode: OPENMP is turned off, and each compute node
> runs 16 processes (clearly shared-memory of MPI is used). Four combinations
> of "TMPDIR" and "TCP" are tested:
> case 1:
> #export TMPDIR=/home/yanb/tmp
> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
> mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
> output:
> Start Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
> End Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
> -bash: line 1: 448597 Terminated /var/spool/PBS/mom_priv/jobs/602244.service12.SC
> Start Epilogue v2.5 Mon Mar 3 15:50:51 EST 2014
> Statistics cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb,walltime=00:03:24
> End Epilogue v2.5 Mon Mar 3 15:50:52 EST 2014

It looks like you have two general cases:

1. The job fails for no apparent reason (like above), or
2. The job complains that your TMPDIR is on a shared filesystem

Right?

I think the real issue, then, is to figure out why your jobs are failing with no output. Is there anything in the stderr output?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users