I apologize, Ralph. I forgot to include the command line I use to invoke OpenMPI on 
SoGE:

qsub -q short.q -V -pe make 87 -b y mpirun -np 87 \
    --prefix /hpc/apps/mpi/openmpi/1.10.1/ --hetero-nodes \
    --mca btl ^sm --mca plm_base_verbose 5 \
    /hpc/home/lanew/mpi/openmpi/a_1_10_1.out

a_1_10_1.out is my OpenMPI test code binary compiled under OpenMPI 1.10.1.

Thanks for the quick response!

-Bill L.

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: Thursday, March 17, 2016 4:44 PM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 
4096 still required?

No, that shouldn’t be the issue any more - and that isn’t what the backtrace 
indicates. It looks instead like there was a problem with the shared memory 
backing file on a remote node, and that caused the vader shared memory BTL to 
segfault.

Try turning vader off and see if that helps. I’m not sure what you are using, 
but maybe “-mca btl ^vader” will suffice.
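
For reference, here is a minimal sketch of how that suggestion could be applied to the mpirun invocation given in the follow-up message, which already excludes the sm BTL. It relies on a single leading ^ negating the whole comma-separated list, so ^sm,vader excludes both shared memory BTLs. The function name build_mpirun_cmd is hypothetical, and the script only prints the command (a dry run) rather than submitting it.

```shell
#!/bin/sh
# Dry run: print the qsub/mpirun command line with the given BTLs excluded.
# build_mpirun_cmd is a hypothetical helper; paths and slot counts are copied
# from the command line quoted in this thread.
build_mpirun_cmd() {
    excluded="$1"   # comma-separated BTL names, e.g. "sm,vader"
    printf '%s' "mpirun -np 87 --prefix /hpc/apps/mpi/openmpi/1.10.1/ "
    printf '%s' "--hetero-nodes --mca btl ^${excluded} "
    printf '%s\n' "--mca plm_base_verbose 5 /hpc/home/lanew/mpi/openmpi/a_1_10_1.out"
}

# Print the full submission command instead of running it.
echo "qsub -q short.q -V -pe make 87 -b y $(build_mpirun_cmd sm,vader)"
```

In shells where ^ is special (csh, or zsh with extended globbing enabled), the value may need quoting, for example '^sm,vader'.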

Nathan - any other suggestions?


On Mar 17, 2016, at 4:40 PM, Lane, William <william.l...@cshs.org> wrote:

I remember that, years ago, OpenMPI (version 1.3.3) required the hard and soft open
files limits to be >= 4096 in order to function when large numbers of slots
were requested (with 1.3.3 this happened at roughly 85 slots). Is this requirement
still present in OpenMPI versions 1.10.1 and greater?

I'm now having some issues with OpenMPI version 1.10.1 that remind me
of the issues I had with 1.3.3, where OpenMPI worked fine as long as I didn't
request too many slots.

When I run ulimit -a (soft limits) I see:
open files                      (-n) 1024

ulimit -Ha (hard limits) gives:
open files                      (-n) 4096
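
A minimal sketch for comparing these two limits against a required minimum. The 4096 threshold is only the figure recalled above for 1.3.3, not a documented requirement of 1.10.1, and limit_below is a hypothetical helper name. Limits inside a batch job can differ from those in a login shell, so it may be more informative to run this through the scheduler on the compute nodes.

```shell
#!/bin/sh
# limit_below LIMIT MIN: succeed if LIMIT is a number smaller than MIN.
# "unlimited" is never treated as too low. (Hypothetical helper.)
limit_below() {
    [ "$1" != "unlimited" ] && [ "$1" -lt "$2" ]
}

required=4096              # assumed threshold, taken from the 1.3.3 experience
soft=$(ulimit -S -n)       # soft open-files limit of this shell
hard=$(ulimit -H -n)       # hard open-files limit of this shell

echo "open files: soft=$soft hard=$hard"
if limit_below "$soft" "$required"; then
    echo "soft limit is below $required"
fi
```

If the soft limit turns out to matter, an unprivileged job script can raise it as far as the hard limit with: ulimit -S -n "$hard"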

I'm getting errors of the form:
[csclprd3-5-5:15248] [[40732,0],0] plm:base:receive got update_proc_state for job [40732,1]
[csclprd3-6-12:30567] *** Process received signal ***
[csclprd3-6-12:30567] Signal: Bus error (7)
[csclprd3-6-12:30567] Signal code: Non-existant physical address (2)
[csclprd3-6-12:30567] Failing at address: 0x2b3d19f72000
[csclprd3-6-12:30568] *** Process received signal ***
[csclprd3-6-12:30567] [ 0] /lib64/libpthread.so.0(+0xf500)[0x2b3d0f71f500]
[csclprd3-6-12:30567] [ 1] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_shmem_mmap.so(+0x1524)[0x2b3d10cb0524]
[csclprd3-6-12:30567] [ 2] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_btl_vader.so(+0x3674)[0x2b3d18494674]
[csclprd3-6-12:30567] [ 3] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(mca_btl_base_select+0x117)[0x2b3d0f4b0b07]
[csclprd3-6-12:30567] [ 4] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x2b3d13d917b2]
[csclprd3-6-12:30567] [ 5] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(mca_bml_base_init+0x99)[0x2b3d0f4b0309]
[csclprd3-6-12:30567] [ 6] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_pml_ob1.so(+0x538c)[0x2b3d18ac238c]
[csclprd3-6-12:30567] [ 7] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(mca_pml_base_select+0x1e0)[0x2b3d0f4c1780]
[csclprd3-6-12:30567] [ 8] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(ompi_mpi_init+0x51d)[0x2b3d0f47317d]
[csclprd3-6-12:30567] [ 9] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(MPI_Init+0x170)[0x2b3d0f492820]
[csclprd3-6-12:30567] [10] /hpc/home/lanew/mpi/openmpi/a_1_10_1.out[0x400ad0]
[csclprd3-6-12:30567] [11] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b3d0f94bcdd]
[csclprd3-6-12:30567] [12] /hpc/home/lanew/mpi/openmpi/a_1_10_1.out[0x400999]
[csclprd3-6-12:30567] *** End of error message ***
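
The frames in mca_shmem_mmap.so and mca_btl_vader.so suggest the fault occurred while touching a memory-mapped shared memory file. One possible cause of a bus error in that situation, offered here only as an assumption and not something this trace confirms, is that the file system holding the backing file has run out of space. Below is a minimal sketch for checking free space on a node, assuming the file lives under $TMPDIR or /tmp (avail_kb is a hypothetical helper):

```shell
#!/bin/sh
# avail_kb DIR: print the free space, in KiB, of the file system holding DIR.
# Uses POSIX "df -P -k", whose second output line has the columns:
# filesystem, total, used, available, capacity, mount point.
avail_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

dir="${TMPDIR:-/tmp}"      # assumed location of the shared memory backing file
echo "$(uname -n): $(avail_kb "$dir") KiB free in $dir"
```

Run on the node named in the trace (csclprd3-6-12 here), this would show whether the assumed directory is close to full.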

Ugh.

Bill L.
IMPORTANT WARNING: This message is intended for the use of the person or entity 
to which it is addressed and may contain information that is privileged and 
confidential, the disclosure of which is governed by applicable law. If the 
reader of this message is not the intended recipient, or the employee or agent 
responsible for delivering it to the intended recipient, you are hereby 
notified that any dissemination, distribution or copying of this information is 
strictly prohibited. Thank you for your cooperation. 
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/03/28746.php

