Hi, I don't know if this helps, but it looks like the cause for me is btl_endpoint->endpoint_addr being NULL on this line:

             btl_endpoint->endpoint_addr->addr_inuse--;

That is, in mca_btl_tcp_proc_remove in ompi/mca/btl/tcp/btl_tcp_proc.c, if I put an "if (btl_endpoint->endpoint_addr)" check before the decrement, things apparently work...
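
For anyone who wants to see the shape of the workaround in isolation, here is a minimal sketch of the guard. The struct and function names (sketch_addr, sketch_endpoint, sketch_release_addr) are hypothetical stand-ins, not the real Open MPI definitions; only the two field names are taken from the line above. It shows the idea of skipping the decrement when no address is attached. It is not a tested patch against the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Open MPI structures.
 * Only the fields needed for this illustration are modelled. */
struct sketch_addr {
    int addr_inuse;                     /* endpoints currently using this address */
};

struct sketch_endpoint {
    struct sketch_addr *endpoint_addr;  /* NULL if no address was ever assigned */
};

/* Release the endpoint's address, if it has one.
 * Returns 1 if the counter was decremented, 0 if there was nothing to release. */
static int sketch_release_addr(struct sketch_endpoint *ep)
{
    assert(ep != NULL);
    if (ep->endpoint_addr == NULL) {
        return 0;                       /* the case that crashes without the guard */
    }
    ep->endpoint_addr->addr_inuse--;
    return 1;
}
```

A guard like this only avoids the crash. Why the endpoint reaches the removal path without an address is a separate question.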

Marcus G. Daniels wrote:
Hi all,

I built 1.0.2 on Fedora 5 for x86_64 on a cluster set up as described below, and I see the same behavior when I try to run a job. Any ideas on the cause?
Jeff Squyres wrote:
One additional question: are you using TCP as your communications
network, and if so, do either of the nodes that you are running on
have more than one TCP NIC? We recently fixed a bug for situations
where at least one node is on multiple TCP networks, not all of which
were shared by the nodes where the peer MPI processes were running.
If this situation describes your network setup (e.g., a cluster where
the head node has a public and a private network, and where the
cluster nodes only have a private network -- and your MPI process was
running on the head node and a compute node), can you try upgrading
to the latest 1.0.2 release candidate tarball:

http://www.open-mpi.org/software/ompi/v1.0/


$ mpiexec -machinefile ../bhost -np 9 ./ng
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
[0] func:/opt/openmpi/1.0.2a9/lib/libopal.so.0 [0x2aaaac062d0c]
[1] func:/lib64/tls/libpthread.so.0 [0x3b8d60c320]
[2] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xb5) [0x2aaaae6e4c65]
[3] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so [0x2aaaae6e2b09]
[4] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x157) [0x2aaaae6dfdd7]
[5] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x231) [0x2aaaae3cd1e1]
[6] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x94) [0x2aaaae1b1f44]
[7] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(ompi_mpi_init+0x3af) [0x2aaaabdd2d7f]
[8] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_Init+0x93) [0x2aaaabdbeb33]
[9] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_INIT+0x28) [0x2aaaabdce948]
[10] func:./ng(MAIN__+0x38) [0x4022a8]
[11] func:./ng(main+0xe) [0x4126ce]
[12] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3b8cb1c4bb]
[13] func:./ng [0x4021da]
*** End of error message ***

Bye,
Czarek
