Hi

If I remember right, there were issues in the past with setting TMPDIR
on an NFS share.  Maybe the same problem happens on Lustre?

http://arc.liv.ac.uk/pipermail/gridengine-users/2009-November/027767.html

FWIW, we leave it to the default local /tmp, and it works.
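
In case it helps, here is a minimal sketch of how a job script could guard against a network-mounted TMPDIR before calling mpirun. It assumes GNU coreutils "stat" (the "-f -c %T" form prints the filesystem type name), and the helper names fstype_of and is_network_fstype are made up for this example; they are not part of Open MPI or PBS. Only the two filesystem types mentioned in this thread are checked.

```shell
# Sketch only: fall back to the node-local /tmp when TMPDIR is unset
# or sits on NFS or Lustre. Helper names are hypothetical.

# Print the filesystem type of a path (assumes GNU coreutils stat).
fstype_of() {
    stat -f -c %T "$1" 2>/dev/null
}

# Succeed if the given type name is one of the network filesystems
# discussed in this thread.
is_network_fstype() {
    case "$1" in
        nfs*|lustre) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -z "${TMPDIR:-}" ] || is_network_fstype "$(fstype_of "$TMPDIR")"; then
    export TMPDIR=/tmp
fi
echo "TMPDIR=$TMPDIR"
```

Putting something like this ahead of the mpirun line would leave a TMPDIR that is already local untouched and change only the NFS or Lustre case.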

Gus Correa

On 03/03/2014 06:57 PM, Jeff Squyres (jsquyres) wrote:
How about setting TMPDIR to a local filesystem?


On Mar 3, 2014, at 3:43 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:

I agree there are two cases for pure-MPI mode: 1. The job fails for no apparent reason;
2. The job complains that the shared-memory file is on a network file system, which can
be resolved by "export TMPDIR=/home/yanb/tmp" (/home/yanb/tmp is my local directory). The
default TMPDIR points to a Lustre directory.

There is no other output. I checked my job with "qstat -n" and found that the processes
were not actually started on the compute nodes, even though PBS Pro had "started" my job.

Beichuan

3. Then I tested pure-MPI mode: OpenMP is turned off, and each compute node runs 16
processes (so MPI shared memory is clearly used). Four combinations of "TMPDIR" and "TCP"
are tested:
case 1:
#export TMPDIR=/home/yanb/tmp
TCP="--mca btl_tcp_if_include 10.148.0.0/16"
mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
output:
Start Prologue v2.5 Mon Mar  3 15:47:16 EST 2014
End Prologue v2.5 Mon Mar  3 15:47:16 EST 2014
-bash: line 1: 448597 Terminated              /var/spool/PBS/mom_priv/jobs/602244.service12.SC
Start Epilogue v2.5 Mon Mar  3 15:50:51 EST 2014
Statistics cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb,walltime=00:03:24
End Epilogue v2.5 Mon Mar  3 15:50:52 EST 2014

It looks like you have two general cases:

1. The job fails for no apparent reason (like above), or
2. The job complains that your TMPDIR is on a shared filesystem

Right?

I think the real issue, then, is to figure out why your jobs are failing with 
no output.

Is there anything in the stderr output?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


