Re: [OMPI users] NFS and openmpi through different NICs

2009-12-14 Thread Bill Rankin
On 12/14/2009 11:11 PM, Dmitry Zaletnev wrote: > Hi, > is it possible to have NFS and openmpi running on different NICs? Yes. Just make sure that the two subnets for the NICs don't overlap and that your routing tables are correct. As for channel bonding, I'll let someone who has actually used it
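A minimal sketch of the "subnets don't overlap" check mentioned in the reply above, using Python's standard `ipaddress` module. The helper name `subnets_overlap` and the two example addresses are assumptions for illustration only; substitute the address and prefix length actually assigned to each NIC.

```python
import ipaddress


def subnets_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two networks share at least one address."""
    # strict=False accepts a host address with a prefix length
    # (e.g. "192.168.1.10/24") and reduces it to its network.
    net_a = ipaddress.ip_network(cidr_a, strict=False)
    net_b = ipaddress.ip_network(cidr_b, strict=False)
    return net_a.overlaps(net_b)


# Hypothetical addressing: one NIC carries NFS, the other MPI traffic.
nfs_nic = "192.168.1.10/24"
mpi_nic = "192.168.2.10/24"

if subnets_overlap(nfs_nic, mpi_nic):
    print("Subnets overlap: traffic may not be separated as intended")
else:
    print("Subnets are disjoint")
```

This only covers the addressing half of the advice; the routing tables on each node still need to be checked separately.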

[OMPI users] NFS and openmpi through different NICs

2009-12-14 Thread Dmitry Zaletnev
Hi, is it possible to have NFS and openmpi running on different NICs? By the way, is it possible to have openmpi using multiple NICs without hardware support for bonding? Thank you in advance. -- Dmitry

Re: [OMPI users] Hanging vs Stopping behaviour in communication failures

2009-12-14 Thread Constantinos Makassikis
Jeff Squyres wrote: On Dec 9, 2009, at 3:47 AM, Constantinos Makassikis wrote: sometimes when running Open MPI jobs, the application hangs. Looking at the output, I get the following error message: [ic17][[34562,1],74][../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv

Re: [OMPI users] Hanging vs Stopping behaviour in communication failures

2009-12-14 Thread Jeff Squyres
On Dec 9, 2009, at 3:47 AM, Constantinos Makassikis wrote: > sometimes when running Open MPI jobs, the application hangs. Looking at the > output, I get the following error message: > > [ic17][[34562,1],74][../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv > > ] mca_btl_

Re: [OMPI users] checkpoint opempi-1.3.3+sge62

2009-12-14 Thread Reuti
Hi, no, I have never tried Open MPI's checkpointing. But there are two Howtos from which you may get some ideas to integrate it with SGE: http://gridengine.sunsource.net/howto/checkpointing.html http://gridengine.sunsource.net/howto/APSTC-TB-2004-005.pdf (but Open MPI's checkpointing seems more

Re: [OMPI users] OpenMPI 1.4 RPM Spec file problem

2009-12-14 Thread Jeff Squyres
Jim and I iterated a bit off-list. Jim -- I committed a change to our specfile that makes it work for me. Before I release a 1.4-2 SRPM, could you give it a whirl? http://www.open-mpi.org/~jsquyres/unofficial/ On Dec 9, 2009, at 6:41 PM, Jim Kusznir wrote: > By the way, if I set build_a

[OMPI users] Disabling irqbalance service for better performance of MPI jobs

2009-12-14 Thread Rahul Nabar
I have already been using the processor and memory affinity options to bind the processes to specific cores. Does the presence of the irqbalance daemon matter? I have seen recommendations to disable it for a performance boost. Or is this irrelevant? I am running HPC jobs with no over- nor under-su

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2009-12-14 Thread Ashley Pittman
On Sun, 2009-12-13 at 19:04 +0100, Gijsbert Wiesenekker wrote: > The following routine gives a problem after some (not reproducible) > time on Fedora Core 12. The routine is a CPU usage friendly version of > MPI_Barrier. There are some proposals for Non-blocking collectives before the MPI forum cu

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2009-12-14 Thread Eugene Loh
Let's start with this: You generate non-blocking sends (MPI_Isend). Those sends are not completed anywhere. So, strictly speaking, they don't need to be executed. In practice, even if they are executed, they should be "completed" from the user program's point of view (MPI_Test, MPI_Wait, MP

Re: [OMPI users] checkpoint opempi-1.3.3+sge62

2009-12-14 Thread Sergio Díaz
Hi Reuti, Yes, I submitted a job with SGE and checkpointed the mpirun process by hand, after logging in to the MPI master node. Then I killed the job with qdel and after that I did the ompi-restart. I will try to integrate it with SGE by creating a ckpt environment, but I think that it could be a bit difficu

Re: [OMPI users] checkpoint opempi-1.3.3+sge62

2009-12-14 Thread Reuti
Hi, On 14.12.2009 at 17:05, Sergio Díaz wrote: I got a successful checkpoint with a fresh installation and without using the trunk. I can't understand why it is working now and before I could do a successful restart... Maybe there was something wrong in the openmpi installation and then the

Re: [OMPI users] checkpoint opempi-1.3.3+sge62

2009-12-14 Thread Sergio Díaz
Hi Josh, I got a successful checkpoint with a fresh installation and without using the trunk. I can't understand why it is working now and before I could do a successful restart... Maybe there was something wrong in the openmpi installation and then the metadata was created incorrectly. I wi