[OMPI users] ompi-restart fails with "found pid in use"

2010-05-14 Thread ananda.mudar
Hi, I am using Open MPI v1.3.4 with BLCR 0.8.2. I have been testing my Open MPI based program on a 3-node cluster (each node is an Intel Nehalem based dual quad-core) and I have been able to checkpoint and restart the program successfully multiple times. Recently I moved to a 15 node

[OMPI users] Segmentation fault at program end with 2+ processes

2010-05-14 Thread Paul-Michael Agapow
Apologies for the vague details of the problem I'm about to describe, but then I only understand it vaguely. Any pointers about the best directions for further investigation would be appreciated. Lengthy details follow: So I'm "MPI-izing" a pre-existing C++ program (not mine) and have run into som

[OMPI users] GM + OpenMPI bug ...

2010-05-14 Thread José Ignacio Aliaga Estellés
Hi, I think that I have found a bug in the implementation of the GM collective routines included in Open MPI. The version of the GM software is 2.0.30 for the PCI64 cards. Sometimes, when I broadcast a vector of 1024 integers using the MPI_Bcast call, some processor receives a bad packet.