Re: [OMPI users] problems with HPLinpack over myrinet MX-10G

2007-02-14 Thread George Bosilca
On Feb 14, 2007, at 7:27 PM, Scott Atchley wrote: On Feb 14, 2007, at 12:33 PM, Alex Tumanov wrote: Hello, I recently tried running HPLinpack, compiled with OMPI, over myrinet MX interconnect. Running a simple hello world program works, but XHPL fails with an error occurring when it tries to

Re: [OMPI users] problems with HPLinpack over myrinet MX-10G

2007-02-14 Thread Scott Atchley
On Feb 14, 2007, at 12:33 PM, Alex Tumanov wrote: Hello, I recently tried running HPLinpack, compiled with OMPI, over myrinet MX interconnect. Running a simple hello world program works, but XHPL fails with an error occurring when it tries to MPI_Send: # mpirun -np 4 -H l0-0,c0-2 --prefix $MPI

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Jeff Squyres
I think Brian is referring to: https://svn.open-mpi.org/trac/ompi/changeset/12852 On Feb 14, 2007, at 1:02 PM, Brian W. Barrett wrote: On Feb 14, 2007, at 10:50 AM, Jeff Squyres wrote: On Feb 14, 2007, at 12:43 PM, Adrian Knoth wrote: Blah. We definitely need to work on our error me

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Mark Kosmowski
FWIW, what did MPICH say for the error? I followed the install.pdf that comes with mpich. They have you start up the daemon ring then run mpdtrace. This command tells each daemon instance to report the hostname. I don't remember the exact error message, but it was very clear that NODENAME was

Re: [OMPI users] NetPipe benchmark & spanning multiple interconnects

2007-02-14 Thread Alex Tumanov
For OpenIB + GM you are probably going to be limited by the memory bus. Take the InfiniBand NIC, it peaks at say 900 MBytes/Sec, the Myrinet 2-G will peak at say 250 MBytes/Sec. Unless you are doing direct DMAs from pre-registered host memory, you will not see 900 + 250 MBytes/Sec bandwidth. T
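The point about the memory bus can be put as a rough bound: the combined rate across two interconnects is at most the sum of the NIC peaks, and no more than what the memory bus can feed. A minimal sketch of that arithmetic follows. The 900 and 250 MBytes/Sec figures are the ones quoted above; the memory-bus figure and the function name `aggregate_bandwidth` are assumptions made only for illustration.

```python
def aggregate_bandwidth(nic_peaks_mb_s, memory_bus_limit_mb_s):
    """Rough upper bound (MBytes/Sec) when striping traffic across NICs.

    The sum of the NIC peaks is the best case; the memory bus caps it.
    """
    return min(sum(nic_peaks_mb_s), memory_bus_limit_mb_s)


if __name__ == "__main__":
    nic_peaks = [900, 250]      # InfiniBand and Myrinet 2-G peaks quoted above
    assumed_bus_limit = 1000    # assumption: placeholder, not a measured value
    print(aggregate_bandwidth(nic_peaks, assumed_bus_limit))
```

This ignores protocol overhead and memory copies, so a real NetPIPE run would likely come in below the bound.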

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Brian W. Barrett
On Feb 14, 2007, at 10:50 AM, Jeff Squyres wrote: On Feb 14, 2007, at 12:43 PM, Adrian Knoth wrote: Blah. We definitely need to work on our error messages. I think we could use gai_strerror(3) for this. If we could agree to get rid of SUSv2 and rely on RFC 3493 ;) It would not be too diff

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Jeff Squyres
On Feb 14, 2007, at 12:43 PM, Adrian Knoth wrote: Blah. We definitely need to work on our error messages. I think we could use gai_strerror(3) for this. If we could agree to get rid of SUSv2 and rely on RFC 3493 ;) It would not be too difficult to add gai_strerror() checking into confi

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Adrian Knoth
On Wed, Feb 14, 2007 at 12:32:46PM -0500, Jeff Squyres wrote: > > ... hostname worked, but my application hung and gave a connect() > > errno 110. adi@drcomp:~$ perl -e 'die$!=110' Connection timed out at -e line 1. > Blah. We definitely need to work on our error messages. I think we could use
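The errno lookup done with the Perl one-liner above can also be sketched in Python. This is only an illustration and not part of Open MPI; the helper name `describe_errno` is hypothetical, and the numeric value 110 for ETIMEDOUT is specific to Linux.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'SYMBOL: message' for an errno value (hypothetical helper)."""
    symbol = errno.errorcode.get(code, "UNKNOWN")
    return f"{symbol}: {os.strerror(code)}"


if __name__ == "__main__":
    # ETIMEDOUT is 110 on Linux; other platforms number it differently,
    # so the symbolic constant is used instead of the literal 110.
    print(describe_errno(errno.ETIMEDOUT))
```

Printing the symbolic name next to the message is one way an error report could avoid leaving the user with a bare "errno 110".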

[OMPI users] problems with HPLinpack over myrinet MX-10G

2007-02-14 Thread Alex Tumanov
Hello, I recently tried running HPLinpack, compiled with OMPI, over myrinet MX interconnect. Running a simple hello world program works, but XHPL fails with an error occurring when it tries to MPI_Send: # mpirun -np 4 -H l0-0,c0-2 --prefix $MPIHOME --mca btl mx,self /opt/hpl/openmpi-hpl/bin/xhpl

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Jeff Squyres
On Feb 14, 2007, at 12:28 PM, Mark Kosmowski wrote: Everything is working properly now. I needed to reinstall Linux on one of my nodes after a botched attempt at a network install - mpirun ... hostname worked, but my application hung and gave a connect() errno 110. At this point I decided to g

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Mark Kosmowski
Everything is working properly now. I needed to reinstall Linux on one of my nodes after a botched attempt at a network install - mpirun ... hostname worked, but my application hung and gave a connect() errno 110. At this point I decided to give up and try mpich instead. During the mpich sanity

Re: [OMPI users] SEGV in ompi_coll_tuned_reduce_generic (1.2b4r13488)

2007-02-14 Thread Jelena Pjesivac-Grbovic
Hello Lydia, what does the call to MPI_Reduce look like in your application? Is the code available? Thank you, Jelena On Wed, 14 Feb 2007, Lydia Heck wrote: When running either over myrinet or over gigabit one of our codes (Gadget2) it fails predictably with the following error message. F

[OMPI users] SEGV in ompi_coll_tuned_reduce_generic (1.2b4r13488)

2007-02-14 Thread Lydia Heck
When running either over myrinet or over gigabit one of our codes (Gadget2) it fails predictably with the following error message. From the back trace it looks as if the SEGV is in ompi_coll_tuned_reduce_generic. Have there been similar reports and/or is there a fix for this? Lydia Heck [m