-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We use a slightly modified openmpi-1.4.1.
The patch is here:

<diff>
--- ompi/mca/btl/tcp/btl_tcp_proc.c.orig 2010-03-23 14:01:28.000000000 +0100
+++ ompi/mca/btl/tcp/btl_tcp_proc.c 2010-03-23 14:01:50.000000000 +0100
@@ -496,7 +496,7 @@
                             local_interfaces[i]->ipv4_netmask)) {
                     weights[i][j] = CQ_PRIVATE_SAME_NETWORK;
                 } else {
-                    weights[i][j] = CQ_PRIVATE_DIFFERENT_NETWORK;
+                    weights[i][j] = CQ_NO_CONNECTION;
                 }
                 best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr;
             }
</diff>

I actually just discovered the existence of this patch. I'm planning to
run tests with a vanilla 1.4.1 and, if possible, a 1.4.2 ASAP.

On 05/31/2010 04:18 PM, Ralph Castain wrote:
> What OMPI version are you using?
>
> On May 31, 2010, at 5:37 AM, guillaume ranquet wrote:
>
> Hi,
> I'm new to the list and quite new to the world of MPI.
>
> A bit of background:
> I'm a sysadmin and have to provide a working environment (Debian base)
> for researchers to work with MPI: I'm _NOT_ an open-mpi user - I know
> C, but that's all.
>
> I compile openmpi with the following configure options: --prefix=/usr
> --with-openib=/usr --with-mx=/usr
> (yes, everything goes in /usr)
>
> When running an MPI application (any application) on a machine equipped
> with InfiniBand hardware, I get a segmentation fault during
> MPI_Finalize().
> The code runs fine on machines that have no InfiniBand devices.
>
> <code>
> #include <stdio.h>
> #include <unistd.h>   /* sleep() */
> #include <mpi.h>
>
> int main (int argc, char *argv[])
> {
>   /* volatile so that a debugger can change i and end the wait loop */
>   volatile int i = 0;
>   int rank, size;
>
>   MPI_Init (&argc, &argv);               /* starts MPI */
>   MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
>   MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
>   while (i == 0)   /* loops until i is changed, e.g. from gdb */
>     sleep(5);
>   printf ("Hello world from process %d of %d\n", rank, size);
>   MPI_Finalize();
>   return 0;
> }
> </code>
>
> My gdb-fu is quite rusty, but I get the vague idea it happens somewhere
> in MPI_Finalize() (I can probably dig a bit there to find exactly
> where, if it's relevant).
>
> I'm running it with:
> $ mpirun --mca orte_base_help_aggregate 0 --mca plm_rsh_agent oarsh
> -machinefile nodefile ./mpi_helloworld
>
> After various tests, it was suggested that I try recompiling openmpi
> with the --without-memory-manager option.
> It actually solves the issue and everything runs fine.
>
> From what I understand (correct me if I'm wrong), the "memory manager" is
> used with InfiniBand RDMA to keep a somewhat persistent memory region
> available on the device instead of destroying/recreating it every time.
> And thus it's only a "performance tuning" issue, one that disables the
> openmpi "leave_pinned" option?
>
> The various questions I have:
> Is this bug/behaviour known?
> If so, is there a better workaround?
> As I'm not an openmpi user, I don't really know if it's considered
> acceptable to have this option disabled?
> Does the list want more details on this bug?
>
>
> thanks,
> Guillaume Ranquet.
> Grid5000 support-staff.
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJMA893AAoJEEzIl7PMEAliCWIH/0aheCEvCDeDDhNvCuAetCbF
jny45swb8jmfNBVIYd9dTruBmU/1WKC0QBcyxWG0El6ST/xKfXMXGBpKf+tC2Hi1
GS2pz8YEW4x/m3dcVxCVQS9wZIpIG/JHcBqduQtGtlbLq51mTLoc1ygedkCqHjIA
jaimi9VXDyjyeNUV9Yby0zejLO2nRkR29bZ2+I8N8eiHw5lLkstyrQqjsF5d0R1i
Dvr7xtrYEDeqgrdTjv6Gb4BkEqatPH6QEFdS4SIGL/6BPhMgiV2MBn6G/Lsvvy6u
Z97CGwt9usicyxQpCLXtrPTpjUTcqLjlEx7iIVsFtpL4VzqlZYDMt2TXNfheRig=
=MtAr
-----END PGP SIGNATURE-----
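
For readers who don't know btl_tcp_proc.c, here is a rough sketch of what the patch quoted at the top of this message appears to change. It is not the Open MPI code: the enum ordering, the helper names (same_network, weigh_private_pair) and the `patched` flag are all made up for illustration. Only the three CQ_* names and the netmask comparison come from the diff.

```c
#include <stdint.h>

/* Connection-quality labels as named in the diff; the numeric ordering
 * used here is an assumption made for this sketch. */
enum conn_quality {
    CQ_NO_CONNECTION,
    CQ_PRIVATE_DIFFERENT_NETWORK,
    CQ_PRIVATE_SAME_NETWORK
};

/* Hypothetical helper: two IPv4 addresses (host byte order) are on the
 * same network if they agree on every bit covered by the netmask. */
static int same_network(uint32_t a, uint32_t b, uint32_t netmask)
{
    return (a & netmask) == (b & netmask);
}

/* Hypothetical helper: weight one (local, peer) pair of private
 * addresses. `patched` != 0 selects the behaviour after the diff. */
static enum conn_quality weigh_private_pair(uint32_t local, uint32_t peer,
                                            uint32_t netmask, int patched)
{
    if (same_network(local, peer, netmask)) {
        return CQ_PRIVATE_SAME_NETWORK;
    }
    /* Unpatched: different private networks still get a (low) weight.
     * Patched: such a pair is marked as not connectable at all. */
    return patched ? CQ_NO_CONNECTION : CQ_PRIVATE_DIFFERENT_NETWORK;
}
```

With a /24 netmask, 192.168.0.1 and 192.168.0.2 are weighted CQ_PRIVATE_SAME_NETWORK either way, while 192.168.0.1 and 10.0.0.1 go from CQ_PRIVATE_DIFFERENT_NETWORK to CQ_NO_CONNECTION once the patch is applied. Whether this has anything to do with the segfault in MPI_Finalize() is not established here, which is why the tests with a vanilla build are worth running.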