One thing to check would be the time spent between MPI_Init and MPI_Finalize -
i.e., see whether the time difference is caused by differences in init and
finalize themselves. My guess is that this is the source - knowing it would help
us target the problem.
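
If it helps, here is a minimal sketch of that measurement (illustrative only; a
monotonic clock is used so the init/finalize calls themselves can be bracketed
without relying on MPI_Wtime outside the init/finalize window):

#include <mpi.h>
#include <stdio.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (link with -lrt on older glibc). */
static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(int argc, char **argv)
{
    double t0, t1, t2, t3;

    t0 = now();
    MPI_Init(&argc, &argv);
    t1 = now();

    /* ... the application work that is already timed with MPI_Wtime ... */

    t2 = now();
    MPI_Finalize();
    t3 = now();

    printf("MPI_Init: %.3f s   work: %.3f s   MPI_Finalize: %.3f s\n",
           t1 - t0, t2 - t1, t3 - t2);
    return 0;
}

Comparing those three numbers between the Intel MPI and Open MPI runs should
show where the extra wall time goes.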


On Mar 20, 2014, at 9:00 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:

> Here is an example of my data measured in seconds:
> 
> communication overhead = commuT + migraT + printT; compuT is the computational
> cost; totalT = compuT + communication overhead; overhead% denotes the
> percentage of communication overhead
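> (As a check: at iter 3999 in the intelmpi run below, 4.945993e-03 +
> 2.689362e-04 + 1.440048e-04 + 1.689100e-02 adds up to about 2.224994e-02,
> matching the totalT column.)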
> 
> intelmpi (walltime=00:03:51)
> iter   [commuT         migraT         printT]        compuT         totalT         overhead%
> 3999   4.945993e-03    2.689362e-04   1.440048e-04   1.689100e-02   2.224994e-02   2.343795e+01
> 5999   4.938126e-03    1.451969e-04   2.689362e-04   1.663089e-02   2.198315e-02   2.312373e+01
> 7999   4.904985e-03    1.490116e-04   1.451969e-04   1.678491e-02   2.198410e-02   2.298933e+01
> 9999   4.915953e-03    1.380444e-04   1.490116e-04   1.687193e-02   2.207494e-02   2.289473e+01
> 
> openmpi (walltime=00:04:32)
> iter   [commuT         migraT         printT]        compuT         totalT         overhead%
> 3999   3.574133e-03    1.139641e-04   1.089573e-04   1.598001e-02   1.977706e-02   1.864836e+01
> 5999   3.574848e-03    1.189709e-04   1.139641e-04   1.599526e-02   1.980305e-02   1.865278e+01
> 7999   3.571033e-03    1.168251e-04   1.189709e-04   1.601100e-02   1.981783e-02   1.860879e+01
> 9999   3.587008e-03    1.258850e-04   1.168251e-04   1.596618e-02   1.979589e-02   1.875587e+01
> 
> It can be seen that Open MPI is faster in both communication and computation
> as measured by MPI_Wtime calls, but the wall time reported by PBS Pro is larger.
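> 
> For reference, a minimal self-contained sketch of the usual MPI_Wtime
> bracketing used to collect per-phase numbers like these (the barrier and the
> spin loop are only stand-ins, not my actual code):
> 
> #include <mpi.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv)
> {
>     double commuT = 0.0, compuT = 0.0, t, x = 0.0;
>     int iter, i;
> 
>     MPI_Init(&argc, &argv);
> 
>     for (iter = 0; iter < 1000; ++iter) {
>         t = MPI_Wtime();
>         MPI_Barrier(MPI_COMM_WORLD);     /* stand-in for the real exchange */
>         commuT += MPI_Wtime() - t;
> 
>         t = MPI_Wtime();
>         for (i = 0; i < 100000; ++i)     /* stand-in for the real compute step */
>             x += 1e-9 * i;
>         compuT += MPI_Wtime() - t;
>     }
> 
>     printf("commuT %.6e  compuT %.6e  totalT %.6e  overhead%% %.2f  (x=%g)\n",
>            commuT, compuT, commuT + compuT,
>            100.0 * commuT / (commuT + compuT), x);
> 
>     MPI_Finalize();
>     return 0;
> }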
> 
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
> Sent: Thursday, March 20, 2014 15:08
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI job initializing problem
> 
> On 03/20/2014 04:48 PM, Beichuan Yan wrote:
>> Ralph and Noam,
>> 
>> Thanks for the clarifications, they are important. I could be wrong in
>> understanding the filesystem.
>> 
>> Spirit appears to use a scratch directory for shared-memory backing, which is
>> mounted on Lustre, and it does not seem to have local directories or does not
>> allow the user to change TMPDIR. Here is the info:
>> [compute node]$ stat -f -L -c %T /tmp
>> tmpfs
>> [compute node]$ stat -f -L -c %T /home/yanb/scratch
>> lustre
>> 
> 
> So, /tmp is a tmpfs, in memory/RAM.
> Maybe they don't grant write permission to regular users on /tmp?
> 
>> On another university supercomputer, I found the following:
>> node0448[~]$ stat -f -L -c %T /tmp
>> ramfs
>> node0448[~]$ stat -f -L -c %T /home/yanb/scratch/
>> lustre
>> 
>> Is this /tmp on the compute node a local directory? I don't know how to tell.
>> 
>> Thanks,
>> Beichuan
>> 
>> 
>> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph
>> Castain
>> Sent: Thursday, March 20, 2014 12:13
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>> 
>> 
>> On Mar 20, 2014, at 9:48 AM, Beichuan Yan <beichuan....@colorado.edu> wrote:
>> 
>>> Hi,
>>> 
>>> Today I tested OMPI v1.7.5rc5 and surprisingly, it works like a charm!
>>> 
>>> I found discussions related to this issue:
>>> 
>>> 1. http://www.open-mpi.org/community/lists/users/2011/11/17688.php
>>> The correct solution here is get your sys admin to make /tmp local. Making 
>>> /tmp NFS mounted across multiple nodes is a major "faux pas" in the Linux 
>>> world - it should never be done, for the reasons stated by Jeff.
>>> 
>>> my comment: for most clusters I have used, /tmp is NOT local. The Open MPI
>>> community may not enforce it.
>> 
>> We don't enforce anything, but /tmp being network mounted is a VERY
>> unusual situation in the cluster world, and strongly discouraged
>> 
>> 
>>> 
>>> 2. http://www.open-mpi.org/community/lists/users/2011/11/17684.php
>>> In the upcoming OMPI v1.7, we revamped the shared memory setup code such 
>>> that it'll actually use /dev/shm properly, or use some other mechanism 
>>> other than a mmap file backed in a real filesystem. So the issue goes away.
>>> 
>>> my comment: up to OMPI v1.7.4, this shmem issue is still there. However, it 
>>> is resolved in OMPI v1.7.5rc5. This is surprising.
>>> 
>>> Anyway, OMPI v1.7.5rc5 works well for multi-processes-on-one-node (shmem) 
>>> mode on Spirit. There is no need to tune TCP or IB parameters to use it. My 
>>> code just runs well:
>>> 
>>> My test data takes 20 minutes to run with OMPI v1.7.4, but needs less than 
>>> 1 minute with OMPI v1.7.5rc5. I don't know what the magic is. I am 
>>> wondering when OMPI v1.7.5 final will be released.
>>> 
>>> I will update performance comparison between Intel MPI and Open MPI.
>>> 
>>> Thanks,
>>> Beichuan
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>> Correa
>>> Sent: Friday, March 07, 2014 18:41
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>> 
>>> On 03/06/2014 04:52 PM, Beichuan Yan wrote:
>>>> No, I did all these and none worked.
>>>> 
>>>> I just found that, with exactly the same code, data, and job settings, a job
>>>> may really run one day and fail to run the other day. It is NOT repeatable.
>>>> I don't know what the problem is: hardware? OpenMPI? PBS Pro?
>>>> 
>>>> Anyway, I may have to give up using OpenMPI on that system and switch to
>>>> Intel MPI, which always works.
>>>> 
>>>> Thanks,
>>>> Beichuan
>>> 
>>> Well, this machine may have been set up to run only Intel MPI (DAPL?) and
>>> SGI MPI.
>>> It is a pity that it doesn't seem to work with OpenMPI.
>>> 
>>> In any case, good luck with your research project.
>>> 
>>> Gus Correa
>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>> Correa
>>>> Sent: Thursday, March 06, 2014 13:51
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>> 
>>>> On 03/06/2014 03:35 PM, Beichuan Yan wrote:
>>>>> Gus,
>>>>> 
>>>>> Yes, 10.148.0.0/16 is the IB subnet.
>>>>> 
>>>>> I did try others but none worked:
>>>>> #export
>>>>> TCP="--mca btl sm,openib"
>>>>> No run, no output
>>>> 
>>>> If I remember right, and unless this changed in recent OMPI versions, you
>>>> also need "self":
>>>> 
>>>> -mca btl sm,openib,self
>>>> 
>>>> Alternatively, you could rule out tcp:
>>>> 
>>>> -mca btl ^tcp
>>>> 
>>>>> 
>>>>> #export
>>>>> TCP="--mca btl sm,openib --mca btl_tcp_if_include 10.148.0.0/16"
>>>>> No run, no output
>>>>> 
>>>>> Beichuan
>>>> 
>>>> Likewise, "self" is missing here.
>>>> 
>>>> Also, I don't know if you can ask for openib and also add --mca
>>>> btl_tcp_if_include 10.148.0.0/16.
>>>> Note that one turns off tcp (I think), whereas the other requests a
>>>> tcp interface (or the IB interface with IPoIB functionality).
>>>> That combination sounds weird to me.
>>>> The OMPI developers may clarify whether this is a valid syntax combination.
>>>> 
>>>> I would try simply -mca btl sm,openib,self, which is likely to give you the
>>>> IB transport with verbs, plus shared memory intra-node, plus the (mandatory?)
>>>> self (loopback interface?).
>>>> In my experience, this will also help identify any malfunctioning IB HCA in
>>>> the nodes (with a failure/error message).
>>>> 
>>>> 
>>>> I hope it helps,
>>>> Gus Correa
>>>> 
>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>>> Correa
>>>>> Sent: Thursday, March 06, 2014 13:16
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>> 
>>>>> Hi Beichuan
>>>>> 
>>>>> So, it looks like that now the program runs, even though with specific 
>>>>> settings depending on whether you're using OMPI 1.6.5 or 1.7.4, right?
>>>>> 
>>>>> It looks like the problem now is performance, right?
>>>>> 
>>>>> System load affects performance, but unless the network is overwhelmed, 
>>>>> or perhaps the Lustre file system is hanging or too slow, I would think 
>>>>> that a walltime increase from 1min to 10min is not related to system 
>>>>> load, but something else.
>>>>> 
>>>>> Do you remember the setup that gave you 1min walltime?
>>>>> Was it the same that you sent below?
>>>>> Do you happen to know which nodes?
>>>>> Are you sharing nodes with other jobs, or are you running alone on the 
>>>>> nodes?
>>>>> Sharing with other processes may slow down your job.
>>>>> If you request all cores in the node, PBS should give you a full node 
>>>>> (unless they tricked PBS to think the nodes have more cores than they 
>>>>> actually do).
>>>>> How do you request the nodes in your #PBS directives?
>>>>> Do you request nodes and ppn, or do you request procs?
>>>>> 
>>>>> I suggest that you do:
>>>>> cat $PBS_NODEFILE
>>>>> in your PBS script, just to document which nodes are actually given to 
>>>>> you.
>>>>> 
>>>>> Also helpful to document/troubleshoot is to add -v and -tag-output to 
>>>>> your mpiexec command line.
>>>>> 
>>>>> 
>>>>> The difference in walltime could be due to some malfunction of IB HCAs on 
>>>>> the nodes, for instance.
>>>>> Since you are allowing (if I remember right) the use of TCP, OpenMPI will 
>>>>> try to use any interfaces that you did not rule out.
>>>>> If your mpiexec command line doesn't make any restriction, it will use 
>>>>> anything available, if I remember right.
>>>>> (Jeff will correct me in the next second.) If your mpiexec command line
>>>>> has --mca btl_tcp_if_include 10.148.0.0/16, it will use the 10.148.0.0/16
>>>>> subnet with TCP transport, I think.
>>>>> (Jeff will cut my list subscription after that one, for spreading
>>>>> misinformation.)
>>>>> 
>>>>> In either case my impression is that you may have left a door open to the 
>>>>> use of non-IB (and non-IB-verbs) transport.
>>>>> 
>>>>> Is 10.148.0.0/16 an InfiniBand subnet or an Ethernet subnet?
>>>>> 
>>>>> Did you remember Jeff's suggestion from a while ago to avoid TCP (over
>>>>> Ethernet or over IB), and stick to IB verbs?
>>>>> 
>>>>> 
>>>>> Is 10.148.0.0/16 the IB or the Ethernet subnet?
>>>>> 
>>>>> On 03/02/2014 02:38 PM, Jeff Squyres (jsquyres) wrote:
>>>>>>  Both 1.6.x and 1.7.x/1.8.x will need verbs.h to use the native
>>>>>> verbs  network stack.
>>>>>> 
>>>>>>  You can use emulated TCP over IB (e.g., using the OMPI TCP BTL),
>>>>>> but  it's nowhere near as fast/efficient the native verbs network stack.
>>>>>> 
>>>>> 
>>>>> 
>>>>> You could force the use of IB verbs with
>>>>> 
>>>>> -mca btl ^tcp
>>>>> 
>>>>> or with
>>>>> 
>>>>> -mca btl sm,openib,self
>>>>> 
>>>>> on the mpiexec command line.
>>>>> 
>>>>> In this case, if any of the IB HCAs on the nodes is bad, the job
>>>>> will abort with an error message, instead of running too slow (if
>>>>> it is using other networks).
>>>>> 
>>>>> There are also ways to tell OMPI to produce more verbose output, which
>>>>> may help diagnose the problem.
>>>>> ompi_info | grep verbose
>>>>> may give some hints (I confess I don't remember them).
>>>>> 
>>>>> 
>>>>> Believe me, this did happen to me, i.e., to run MPI programs in a
>>>>> cluster that had all sorts of non-homogeneous nodes, some with
>>>>> faulty IB HCAs, some with incomplete OFED installation, some that
>>>>> were not mounting shared file systems properly, etc.
>>>>> [I didn't administer that one!]
>>>>> Hopefully that is not the problem you are facing, but verbose
>>>>> output may help anyways.
>>>>> 
>>>>> I hope this helps,
>>>>> Gus Correa
>>>>> 
>>>>> 
>>>>> 
>>>>> On 03/06/2014 01:49 PM, Beichuan Yan wrote:
>>>>>> 1. For $TMPDIR and $TCP, there are four combinations obtained by commenting
>>>>>> them on/off (note the system's default TMPDIR=/work3/yanb):
>>>>>> export TMPDIR=/work1/home/yanb/tmp
>>>>>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>> 
>>>>>> 2. I tested the 4 combinations for OpenMPI 1.6.5 and OpenMPI 1.7.4
>>>>>> respectively in the pure-MPI mode (no OpenMP threads; 8 nodes, each node
>>>>>> runs 16 processes). The results are weird: of all 8 cases, only TWO of
>>>>>> them can run, and they run very slowly:
>>>>>> 
>>>>>> OpenMPI 1.6.5:
>>>>>> export TMPDIR=/work1/home/yanb/tmp
>>>>>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>> Warning: shared-memory file on /work1/home/yanb/tmp/
>>>>>> Runs, takes 10 minutes, slow
>>>>>> 
>>>>>> OpenMPI 1.7.4:
>>>>>> #export TMPDIR=/work1/home/yanb/tmp
>>>>>> #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>> Warning: shared-memory file on /work3/yanb/605832.SPIRIT/
>>>>>> Runs, takes 10 minutes, slow
>>>>>> 
>>>>>> So you see, a) OpenMPI 1.6.5 and 1.7.4 need different settings to run;
>>>>>> b) whether or not I specify TMPDIR, I get the shared-memory warning.
>>>>>> 
>>>>>> 3. But a few days ago, OpenMPI 1.6.5 worked great and took only 1 minute
>>>>>> (now it takes 10 minutes). I am so confused by the results.
>>>>>> Does the system load level or fluctuation, or PBS Pro, affect OpenMPI
>>>>>> performance?
>>>>>> 
>>>>>> Thanks,
>>>>>> Beichuan
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>>>> Correa
>>>>>> Sent: Tuesday, March 04, 2014 08:48
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>> 
>>>>>> Hi Beichuan
>>>>>> 
>>>>>> So, from "df" it looks like /home is /work1, right?
>>>>>> 
>>>>>> Also, "mount" shows only /work[1-4], not the other
>>>>>> 7 CWFS panfs (Panasas?), which apparently are not available in the 
>>>>>> compute nodes/blades.
>>>>>> 
>>>>>> I presume you have access and are using only some of the
>>>>>> /work[1-4]
>>>>>> (lustre) file systems for all your MPI and other software installation, 
>>>>>> right? Not the panfs, right?
>>>>>> 
>>>>>> Awkward that it doesn't work, because lustre is supposed to be a 
>>>>>> parallel file system, highly available to all nodes (assuming it is 
>>>>>> mounted on all nodes).
>>>>>> 
>>>>>> It also shows a small /tmp with a tmpfs file system, which is volatile, 
>>>>>> in memory:
>>>>>> 
>>>>>> http://en.wikipedia.org/wiki/Tmpfs
>>>>>> 
>>>>>> I would guess they don't let you write there, so TMPDIR=/tmp may not be 
>>>>>> a possible option, but this is just a wild guess.
>>>>>> Or maybe OMPI requires an actual non-volatile file system to write its 
>>>>>> shared memory auxiliary files and other stuff that normally goes on 
>>>>>> /tmp?  [Jeff, Ralph, help!!] I kind of remember some old discussion on 
>>>>>> this list about this, but maybe it was in another list.
>>>>>> 
>>>>>> [You could ask the sys admin about this, and perhaps what he
>>>>>> recommends to use to replace /tmp.]
>>>>>> 
>>>>>> Just in case they have some file system mount point mixup, you could try
>>>>>> perhaps TMPDIR=/work1/yanb/tmp (rather than /home). You could also try
>>>>>> TMPDIR=/work3/yanb/tmp, as, if I remember right, this is another file
>>>>>> system you have access to (not sure anymore, it may have been in the
>>>>>> previous emails).
>>>>>> Either way, you may need to create the tmp directory beforehand.
>>>>>> 
>>>>>> **
>>>>>> 
>>>>>> Any chances that this is an environment mixup?
>>>>>> 
>>>>>> Say, you may be inadvertently using the SGI MPI mpiexec. Using a
>>>>>> /full/path/to/mpiexec in your job may clarify this.
>>>>>> 
>>>>>> "which mpiexec" will tell, but since the environment on the compute 
>>>>>> nodes may not be exactly the same as in the login node, it may not be 
>>>>>> reliable information.
>>>>>> 
>>>>>> Or perhaps you may not be pointing to the OMPI libraries?
>>>>>> Are you exporting PATH and LD_LIBRARY_PATH on .bashrc/.tcshrc, with the 
>>>>>> OMPI items (bin and lib) *PREPENDED* (not appended), so as to take 
>>>>>> precedence over other possible/SGI/pre-existent MPI items?
>>>>>> 
>>>>>> Those are pretty (ugly) common problems.
>>>>>> 
>>>>>> **
>>>>>> 
>>>>>> I hope this helps,
>>>>>> Gus Correa
>>>>>> 
>>>>>> On 03/03/2014 10:13 PM, Beichuan Yan wrote:
>>>>>>> 1. info from a compute node
>>>>>>> -bash-4.1$ hostname
>>>>>>> r32i1n1
>>>>>>> -bash-4.1$ df -h /home
>>>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1
>>>>>>>                       1.2P  136T  1.1P  12% /work1
>>>>>>> -bash-4.1$ mount
>>>>>>> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
>>>>>>> tmpfs on /tmp type tmpfs (rw,size=150m)
>>>>>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>>>>>>> cpuset on /dev/cpuset type cpuset (rw)
>>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1 on /work1 type lustre (rw,flock)
>>>>>>> 10.148.18.76@o2ib:10.148.18.164@o2ib:/fs2 on /work2 type lustre (rw,flock)
>>>>>>> 10.148.18.104@o2ib:10.148.18.165@o2ib:/fs3 on /work3 type lustre (rw,flock)
>>>>>>> 10.148.18.132@o2ib:10.148.18.133@o2ib:/fs4 on /work4 type lustre (rw,flock)
>>>>>>> 
>>>>>>> 
>>>>>>> 2. For "export TMPDIR=/home/yanb/tmp", I created it beforehand, and I 
>>>>>>> did see mpi-related temporary files there when the job gets started.
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>>>>> Correa
>>>>>>> Sent: Monday, March 03, 2014 18:23
>>>>>>> To: Open MPI Users
>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>> 
>>>>>>> Hi Beichuan
>>>>>>> 
>>>>>>> OK, it says "unclassified.html", so I presume it is not a problem.
>>>>>>> 
>>>>>>> The web site says the computer is an SGI ICE X.
>>>>>>> I am not familiar to it, so what follows are guesses.
>>>>>>> 
>>>>>>> The SGI site brochure suggests that the nodes/blades have local disks:
>>>>>>> https://www.sgi.com/pdfs/4330.pdf
>>>>>>> 
>>>>>>> The file systems prefixed with IP addresses (work[1-4]) and with panfs 
>>>>>>> (cwfs and CWFS[1-6]) and a colon (:) are shared exports (not local), 
>>>>>>> but not necessarily NFS (panfs may be Panasas?).
>>>>>>> From this output it is hard to tell where /home is, but I would guess it
>>>>>>> is also shared (not local).
>>>>>>> Maybe "df -h /home" will tell.  Or perhaps "mount".
>>>>>>> 
>>>>>>> You may be logged in to a login/service node, so although it does have 
>>>>>>> a /tmp (your ls / shows tmp), this doesn't guarantee that the compute 
>>>>>>> nodes/blades also do.
>>>>>>> 
>>>>>>> Since your jobs failed when you specified TMPDIR=/tmp, I would guess 
>>>>>>> /tmp doesn't exist on the nodes/blades, or is not writable.
>>>>>>> 
>>>>>>> Did you try to submit a job with, say, "mpiexec -np 16 ls -ld /tmp"?
>>>>>>> This should tell whether /tmp exists on the nodes and whether it is writable.
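>>>>>>> 
>>>>>>> Equivalently, a tiny MPI test program could report, per rank, whether /tmp
>>>>>>> is writable; here is a rough sketch (just an illustration - the temp-file
>>>>>>> name template is arbitrary):
>>>>>>> 
>>>>>>> #include <mpi.h>
>>>>>>> #include <stdio.h>
>>>>>>> #include <stdlib.h>
>>>>>>> #include <unistd.h>
>>>>>>> 
>>>>>>> int main(int argc, char **argv)
>>>>>>> {
>>>>>>>     int rank, len, fd;
>>>>>>>     char host[MPI_MAX_PROCESSOR_NAME];
>>>>>>>     char tmpl[] = "/tmp/tmpcheck-XXXXXX";  /* arbitrary name, just for the test */
>>>>>>> 
>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>     MPI_Get_processor_name(host, &len);
>>>>>>> 
>>>>>>>     fd = mkstemp(tmpl);                    /* try to create a file under /tmp */
>>>>>>>     if (fd >= 0) {
>>>>>>>         printf("rank %d on %s: /tmp is writable\n", rank, host);
>>>>>>>         close(fd);
>>>>>>>         unlink(tmpl);
>>>>>>>     } else {
>>>>>>>         printf("rank %d on %s: cannot write to /tmp\n", rank, host);
>>>>>>>     }
>>>>>>> 
>>>>>>>     MPI_Finalize();
>>>>>>>     return 0;
>>>>>>> }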
>>>>>>> 
>>>>>>> A stupid question:
>>>>>>> When you tried your job with this:
>>>>>>> 
>>>>>>> export TMPDIR=/home/yanb/tmp
>>>>>>> 
>>>>>>> Did you create the directory /home/yanb/tmp beforehand?
>>>>>>> 
>>>>>>> Anyway, you may need to ask the help of a system administrator of this 
>>>>>>> machine.
>>>>>>> 
>>>>>>> Gus Correa
>>>>>>> 
>>>>>>> On 03/03/2014 07:43 PM, Beichuan Yan wrote:
>>>>>>>> Gus,
>>>>>>>> 
>>>>>>>> I am using this system: 
>>>>>>>> http://centers.hpc.mil/systems/unclassified.html#Spirit. I don't know 
>>>>>>>> exactly configurations of the file system. Here is the output of "df 
>>>>>>>> -h":
>>>>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>>>>> /dev/sda6             919G   16G  857G   2% /
>>>>>>>> tmpfs                  32G     0   32G   0% /dev/shm
>>>>>>>> /dev/sda5             139M   33M  100M  25% /boot
>>>>>>>> adfs3v-s:/adfs3/hafs14
>>>>>>>>                           6.5T  678G  5.5T  11% /scratch
>>>>>>>> adfs3v-s:/adfs3/hafs16
>>>>>>>>                           6.5T  678G  5.5T  11% /var/spool/mail
>>>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1
>>>>>>>>                           1.2P  136T  1.1P  12% /work1
>>>>>>>> 10.148.18.132@o2ib:10.148.18.133@o2ib:/fs4
>>>>>>>>                           1.2P  793T  368T  69% /work4
>>>>>>>> 10.148.18.104@o2ib:10.148.18.165@o2ib:/fs3
>>>>>>>>                           1.2P  509T  652T  44% /work3
>>>>>>>> 10.148.18.76@o2ib:10.148.18.164@o2ib:/fs2
>>>>>>>>                           1.2P  521T  640T  45% /work2
>>>>>>>> panfs://172.16.0.10/CWFS
>>>>>>>>                           728T  286T  443T  40% /p/cwfs
>>>>>>>> panfs://172.16.1.61/CWFS1
>>>>>>>>                           728T  286T  443T  40% /p/CWFS1
>>>>>>>> panfs://172.16.0.210/CWFS2
>>>>>>>>                           728T  286T  443T  40% /p/CWFS2
>>>>>>>> panfs://172.16.1.125/CWFS3
>>>>>>>>                           728T  286T  443T  40% /p/CWFS3
>>>>>>>> panfs://172.16.1.224/CWFS4
>>>>>>>>                           728T  286T  443T  40% /p/CWFS4
>>>>>>>> panfs://172.16.1.224/CWFS5
>>>>>>>>                           728T  286T  443T  40% /p/CWFS5
>>>>>>>> panfs://172.16.1.224/CWFS6
>>>>>>>>                           728T  286T  443T  40% /p/CWFS6
>>>>>>>> panfs://172.16.1.224/CWFS7
>>>>>>>>                           728T  286T  443T  40% /p/CWFS7
>>>>>>>> 
>>>>>>>> 1. My home directory is /home/yanb.
>>>>>>>> My simulation files are located at /work3/yanb.
>>>>>>>> The default TMPDIR set by system is just /work3/yanb
>>>>>>>> 
>>>>>>>> 2. I did try not to set TMPDIR and let it default, which is just case 
>>>>>>>> 1 and case 2.
>>>>>>>>       Case1: #export TMPDIR=/home/yanb/tmp
>>>>>>>>              TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>>              It fails with no apparent reason.
>>>>>>>>       Case2: #export TMPDIR=/home/yanb/tmp
>>>>>>>>              #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>>              It gives the warning about a shared-memory file on a network
>>>>>>>>              file system.
>>>>>>>> 
>>>>>>>> 3. With "export TMPDIR=/tmp", the job fails the same way, with no
>>>>>>>> apparent reason.
>>>>>>>> 
>>>>>>>> 4. FYI, "ls /" gives:
>>>>>>>> ELT    apps  cgroup  hafs1   hafs12  hafs2  hafs5  hafs8        home   
>>>>>>>> lost+found  mnt  p      root     selinux  tftpboot  var    work3
>>>>>>>> admin  bin   dev     hafs10  hafs13  hafs3  hafs6  hafs9        lib    
>>>>>>>> media       net  panfs  sbin     srv      tmp       work1  work4
>>>>>>>> app    boot  etc     hafs11  hafs15  hafs4  hafs7  hafs_x86_64  lib64  
>>>>>>>> misc        opt  proc   scratch  sys      usr       work2  workspace
>>>>>>>> 
>>>>>>>> Beichuan
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>>>>>> Correa
>>>>>>>> Sent: Monday, March 03, 2014 17:24
>>>>>>>> To: Open MPI Users
>>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>>> 
>>>>>>>> Hi Beichuan
>>>>>>>> 
>>>>>>>> If you are using the university cluster, chances are that /home is not 
>>>>>>>> local, but on an NFS share, or perhaps Lustre (which you may have 
>>>>>>>> mentioned before, I don't remember).
>>>>>>>> 
>>>>>>>> Maybe "df -h" will show what is local and what is not.
>>>>>>>> It works for NFS - it prefixes file systems with the server name - but I
>>>>>>>> don't know about Lustre.
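>>>>>>>> 
>>>>>>>> If you want to check programmatically from a job running on the compute
>>>>>>>> nodes, statfs() reports the filesystem magic number - the same information
>>>>>>>> that "stat -f" prints. A rough, Linux-only sketch (the default path is just
>>>>>>>> an example):
>>>>>>>> 
>>>>>>>> #include <stdio.h>
>>>>>>>> #include <sys/vfs.h>    /* statfs() on Linux */
>>>>>>>> 
>>>>>>>> int main(int argc, char **argv)
>>>>>>>> {
>>>>>>>>     const char *path = (argc > 1) ? argv[1] : "/tmp";
>>>>>>>>     struct statfs fs;
>>>>>>>> 
>>>>>>>>     if (statfs(path, &fs) != 0) {
>>>>>>>>         perror("statfs");
>>>>>>>>         return 1;
>>>>>>>>     }
>>>>>>>>     /* f_type is a filesystem magic number; compare it against the
>>>>>>>>        constants in /usr/include/linux/magic.h (TMPFS_MAGIC,
>>>>>>>>        NFS_SUPER_MAGIC, etc.) to see what kind of filesystem it is. */
>>>>>>>>     printf("%s: f_type = 0x%lx\n", path, (unsigned long)fs.f_type);
>>>>>>>>     return 0;
>>>>>>>> }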
>>>>>>>> 
>>>>>>>> Did you try just not setting TMPDIR and letting it default?
>>>>>>>> If the default TMPDIR is on Lustre (did you say this? anyway, I don't
>>>>>>>> remember), you could perhaps try to force it to /tmp:
>>>>>>>> export TMPDIR=/tmp
>>>>>>>> If the cluster nodes are diskful, /tmp is likely to exist and be local to
>>>>>>>> the cluster nodes.
>>>>>>>> [But the cluster nodes may be diskless ... :( ]
>>>>>>>> 
>>>>>>>> I hope this helps,
>>>>>>>> Gus Correa
>>>>>>>> 
>>>>>>>> On 03/03/2014 07:10 PM, Beichuan Yan wrote:
>>>>>>>>> How do I set TMPDIR to a local filesystem? Is /home/yanb/tmp a local
>>>>>>>>> filesystem? I don't know how to tell whether a directory is on a local
>>>>>>>>> file system or a network file system.
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of
>>>>>>>>> Jeff Squyres (jsquyres)
>>>>>>>>> Sent: Monday, March 03, 2014 16:57
>>>>>>>>> To: Open MPI Users
>>>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>>>> 
>>>>>>>>> How about setting TMPDIR to a local filesystem?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mar 3, 2014, at 3:43 PM, Beichuan Yan<beichuan....@colorado.edu>   
>>>>>>>>>     wrote:
>>>>>>>>> 
>>>>>>>>>> I agree there are two cases for pure-MPI mode: 1. the job fails with no
>>>>>>>>>> apparent reason; 2. the job complains about a shared-memory file on a
>>>>>>>>>> network file system, which can be resolved by "export
>>>>>>>>>> TMPDIR=/home/yanb/tmp" (/home/yanb/tmp is my local directory; the
>>>>>>>>>> default TMPDIR points to a Lustre directory).
>>>>>>>>>> 
>>>>>>>>>> There is no other output. I checked my job with "qstat -n" and found
>>>>>>>>>> that processes were actually not started on the compute nodes even
>>>>>>>>>> though PBS Pro had "started" my job.
>>>>>>>>>> 
>>>>>>>>>> Beichuan
>>>>>>>>>> 
>>>>>>>>>>> 3. Then I test pure-MPI mode: OPENMP is turned off, and each
>>>>>>>>>>> compute node runs 16 processes (clearly shared-memory of MPI is
>>>>>>>>>>> used). Four combinations of "TMPDIR" and "TCP" are tested:
>>>>>>>>>>> case 1:
>>>>>>>>>>> #export TMPDIR=/home/yanb/tmp
>>>>>>>>>>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>>>>> mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
>>>>>>>>>>> output:
>>>>>>>>>>> Start Prologue v2.5 Mon Mar  3 15:47:16 EST 2014
>>>>>>>>>>> End Prologue v2.5 Mon Mar  3 15:47:16 EST 2014
>>>>>>>>>>> -bash: line 1: 448597 Terminated              /var/spool/PBS/mom_priv/jobs/602244.service12.SC
>>>>>>>>>>> Start Epilogue v2.5 Mon Mar  3 15:50:51 EST 2014
>>>>>>>>>>> Statistics cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb,walltime=00:03:24
>>>>>>>>>>> End Epilogue v2.5 Mon Mar  3 15:50:52 EST 2014
>>>>>>>>>> 
>>>>>>>>>> It looks like you have two general cases:
>>>>>>>>>> 
>>>>>>>>>> 1. The job fails for no apparent reason (like above), or 2.
>>>>>>>>>> The job complains that your TMPDIR is on a shared filesystem
>>>>>>>>>> 
>>>>>>>>>> Right?
>>>>>>>>>> 
>>>>>>>>>> I think the real issue, then, is to figure out why your jobs are 
>>>>>>>>>> failing with no output.
>>>>>>>>>> 
>>>>>>>>>> Is there anything in the stderr output?
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jeff Squyres
>>>>>>>>>> jsquy...@cisco.com
>>>>>>>>>> For corporate legal information go to:
>>>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Jeff Squyres
>>>>>>>>> jsquy...@cisco.com
>>>>>>>>> For corporate legal information go to:
>>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
> 