Hello Sofia,

Your stack trace shows what I suspected: one process is stuck trying to connect to the other. Unfortunately, the stack does not give enough information to say why. My only suggestion is to step through a debuggable build of the code, starting from ompi_init_do_preconnect, find where the process calls connect, and check whether that call is failing. If you don't have a firewall, I am not sure what is blocking the connection. Either the address is somehow being mangled or something else is going on.

--td

Date: Mon, 22 Sep 2008 10:49:41 +0200
From: "Sofia Aparicio Secanellas" <sapari...@grpss.ssr.upm.es>
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <2F607CC2B43A422B80CEBBD540BFFE8B@aparicio1>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hello Terry,

I do not have an active firewall. I ran the following on both computers:

netstat -lnut

I enclose the results.

I also ran the following on both computers:

mpirun -np 2 --host 10.1.10.208,10.1.10.240 --mca mpi_preconnect_all 1 --prefix /usr/local -mca btl self,tcp -mca btl_tcp_if_include eth1 ./PruebaSumaParalela.out

I enclose the results.

Thank you.

Sofia

----- Original Message -----
From: "Terry Dontje" <terry.don...@sun.com>
To: <us...@open-mpi.org>
Sent: Friday, September 19, 2008 7:54 PM
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv

> Hello Sofia,
>
> After further reflection I wonder if you have a firewall that is
> preventing connections to certain ports.
>
> --td
>
> Terry Dontje wrote:
>> Hello Sofia,
>>
>> Ok, so I really wanted the stack from a run with "-mca
>> mpi_preconnect_all 1". I believe you'll see that one of the processes
>> is in init. However, the stack still probably will not help me help
>> you. What needs to happen is to step through the code in dbx while the
>> connection is being established. I am hoping you might find that the
>> connect call fails or that we've been given an interface that somehow
>> cannot reach the other node. However, specifying "-mca
>> btl_tcp_if_include eth1" should have forced things to use the
>> interface you need. So it really comes down to why we are not connecting
>> to the eth1 address. Are we failing to route to that address, is the
>> connect failing because we are trying to use a port that we are not
>> really allowed to use, or is it something else?
>>
>> I don't think it is a routing problem since you are able to reach each
>> node via ssh. Is there someone else on the list that might want to lend
>> a hand here? I feel like I am missing something obvious going on here.
>>
>> --td
>>> Date: Fri, 19 Sep 2008 16:09:11 +0200
>>> From: "Sofia Aparicio Secanellas" <sapari...@grpss.ssr.upm.es>
>>> Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
>>> To: "Open MPI Users" <us...@open-mpi.org>
>>> Message-ID: <1BBF50FE29F743B5829CC3785F47CADD@aparicio1>
>>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>
>>> Hello Terry,
>>>
>>> I have installed 1.2.7 and I obtain the same result.
>>>
>>> I will explain what I have done.
>>>
>>> 1. On my computer edu@10.1.10.240 I have added a new user called sofia.
>>> This way I have sofia@10.1.10.208 and sofia@10.1.10.240.
>>> 2. I have downloaded openmpi 1.2.7 from the openmpi website on both
>>> computers in /home/sofia/Desktop.
>>> 3. I have installed everything using "sudo ./configure", "sudo make" and
>>> "sudo make install".
>>> 4. To make ssh not ask me for a password, I have typed in
>>> sofia@10.1.10.208 "ssh-keygen -t dsa", "cd $HOME/.ssh" and "cp
>>> id_dsa.pub authorized_keys". I have copied the directory
>>> "/home/sofia/.ssh" from sofia@10.1.10.208 to /home/sofia/.ssh in
>>> sofia@10.1.10.240. The ssh command without password works on computer
>>> sofia@10.1.10.208 but computer sofia@10.1.10.208 asks me for a
>>> passphrase and for the password. Is this normal?
>>> 5. I have created a directory "/home/sofia/programasparalelos" on both
>>> computers and I have given permissions to the directory with "chmod
>>> 777".
>>> 6. I have copied on both computers in "/home/sofia/programasparalelos"
>>> the program "PruebaSumaParalela.c" (I have changed the program a
>>> little; I enclose the new program) and I have compiled it using "mpicc
>>> PruebaSumaParalela.c -o PruebaSumaParalela.out".
>>>
>>> 7. Now I run the program on both computers using the command:
>>>
>>> mpirun -np2 --host 10.1.10.208,10.1.10.240 --prefix /usr/local ./PruebaSumaParalela.out
>>>
>>> When I run the program I obtain 3 PIDs executing on every computer: 2
>>> of "./PruebaSumaParalela.out" and 1 of "mpirun -np2 --host
>>> 10.1.10.208,10.1.10.240 --prefix /usr/local ./PruebaSumaParalela.out". I
>>> enclose the results obtained on every computer for every
>>> "./PruebaSumaParalela.out".
>>>
>>> Thank you very much.
>>>
>>> Sofia
>>>
>>
>>
>
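
The question raised in this thread, whether a plain TCP connect between the two nodes succeeds at all, can be checked outside of MPI. The following is only a minimal sketch in Python, not part of Open MPI: the helper name try_connect and the port number in the usage note are hypothetical. As far as I know Open MPI chooses its own TCP ports at run time, so this tests general reachability between the hosts rather than the exact port the tcp BTL would use.

```python
import socket


def try_connect(host, port, timeout=3.0):
    """Attempt a single TCP connection and report what happened.

    Returns (True, "connected") on success, otherwise (False, detail),
    where detail names the exception (refused, timed out, unreachable...).
    """
    try:
        # create_connection resolves the address and calls connect() for us.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Timeouts, refusals and routing errors are all OSError subclasses.
        return False, f"{type(exc).__name__}: {exc}"
```

For example, with some listener running on 10.1.10.240 on an arbitrary placeholder port such as 5000, calling try_connect("10.1.10.240", 5000) from 10.1.10.208 should return (True, "connected"). A timeout rather than a refusal would suggest that packets are being dropped somewhere between the nodes.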
