Hi
I am using Open MPI v1.3.4 with BLCR 0.8.2. I have been testing my
Open MPI-based program on a 3-node cluster (each node is an Intel
Nehalem-based dual quad-core) and have successfully checkpointed and
restarted the program multiple times.
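For reference, the workflow I use looks roughly like this (assuming
Open MPI was built with checkpoint/restart support, i.e. --with-ft=cr
--with-blcr; ./my_app stands in for my program):

  # Launch with the checkpoint/restart framework enabled
  mpirun -np 24 -am ft-enable-cr ./my_app

  # From another shell, checkpoint the running job via the PID of mpirun
  ompi-checkpoint <PID of mpirun>

  # Restart from the resulting global snapshot
  ompi-restart ompi_global_snapshot_<PID of mpirun>.ckpt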
Recently I moved to a 15-node cluster.
Apologies for the vague details of the problem I'm about to describe,
but I only understand it vaguely myself. Any pointers on the best
directions for further investigation would be appreciated. Lengthy
details follow:
So I'm "MPI-izing" a pre-existing C++ program (not mine) and have run
into som
Hi,
I think I have found a bug in the implementation of the GM collective
routines included in Open MPI. The version of the GM software is
2.0.30 for the PCI64 cards.
Sometimes, when I broadcast a vector of 1024 integers using the
MPI_Bcast call, one of the processes receives a bad packet.
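A minimal test along these lines triggers it (just a sketch, not my
actual code; the fill pattern and root rank 0 are arbitrary choices):

  #include <mpi.h>
  #include <stdio.h>

  #define COUNT 1024

  int main(int argc, char **argv)
  {
      int buf[COUNT];
      int rank, i, errors = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Root fills the vector with a known pattern; the other ranks
         pre-fill with -1 so stale memory cannot mask a bad packet. */
      for (i = 0; i < COUNT; i++)
          buf[i] = (rank == 0) ? i : -1;

      MPI_Bcast(buf, COUNT, MPI_INT, 0, MPI_COMM_WORLD);

      /* Every rank checks what it actually received. */
      for (i = 0; i < COUNT; i++)
          if (buf[i] != i)
              errors++;

      if (errors > 0)
          printf("rank %d: %d corrupted elements\n", rank, errors);

      MPI_Finalize();
      return 0;
  }

Because the failure is intermittent, the test may need to run many
times before any rank reports corrupted elements.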