Your command line is telling OMPI to run 17 processes. Since your hostfile 
indicates that only 16 of them are to run on 10.4.23.107 (which I assume is 
your PS3 node?), the remaining process is going to be run on 10.4.1.23 (I 
assume this is node1?).
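
You can check where each rank will land before digging further. Assuming your 
mpirun supports it (the 1.3 series should), --display-map prints the planned 
rank-to-node layout at launch:

cluster@bioclust:~$ mpirun --display-map --hostfile /etc/openmpi/openmpi-default-hostfile -np 17 /mnt/projects/PS3Cluster/Benchmark/pi

With slots=16 on 10.4.23.107, ranks 0-15 should map there and rank 16 to 
10.4.1.23.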

I would guess that the executable is compiled to run on the PS3, given the 
path you specified, so I would expect it to bomb on node1 - which is exactly 
what appears to be happening.
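
One quick way to confirm (assuming the binary sits on a mount visible from 
both machines): check its architecture with file:

cluster@bioclust:~$ file /mnt/projects/PS3Cluster/Benchmark/pi

If that reports a PowerPC/Cell ELF, node1's x86-64 cores simply can't execute 
it. On a heterogeneous cluster you generally need one binary per architecture - 
as a sketch, you could compile the program separately on each node so the same 
path resolves to the right build locally, or keep -np within the PS3's slot 
count so no rank lands on node1.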


On Nov 17, 2009, at 1:59 AM, Laurin Müller wrote:

> Hi,
>  
> I want to build a cluster with Open MPI.
>  
> 2 nodes:
> node 1: 4 x Amd Quad Core, ubuntu 9.04, openmpi 1.3.2
> node 2: Sony PS3, ubuntu 9.04, openmpi 1.3
>  
> Both can connect via ssh to each other and to themselves without a password.
>  
> I can run the sample program pi.c on both nodes separately (see below). But if 
> I try to start it on node1 with the --hostfile option to use node 2 "remotely", I 
> get this error:
>  
> cluster@bioclust:~$ mpirun --hostfile /etc/openmpi/openmpi-default-hostfile 
> -np 17 /mnt/projects/PS3Cluster/Benchmark/pi
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> My hostfile:
> cluster@bioclust:~$ cat /etc/openmpi/openmpi-default-hostfile
> 10.4.23.107 slots=16
> 10.4.1.23 slots=2
> I can see with top that the processors on node2 briefly start working, and 
> then it aborts on node1.
>  
> I use this sample/test program:
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
> int main(int argc, char *argv[])
> {
>       int    i, n;
>       double h, pi, x;
>       int    me, nprocs;
>       double piece;
> /* --------------------------------------------------- */
>       MPI_Init (&argc, &argv);
>       MPI_Comm_size (MPI_COMM_WORLD, &nprocs);
>       MPI_Comm_rank (MPI_COMM_WORLD, &me);
> /* --------------------------------------------------- */
>       if (me == 0)
>       {
>          printf("%s", "Input number of intervals:\n");
>          scanf ("%d", &n);
>       }
> /* --------------------------------------------------- */
>       MPI_Bcast (&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
> /* --------------------------------------------------- */
>       h     = 1. / (double) n;
>       piece = 0.;
>       for (i=me+1; i <= n; i+=nprocs)
>       {
>            x     = (i-1)*h;
>            piece = piece + ( 4/(1+(x)*(x)) + 4/(1+(x+h)*(x+h))) / 2 * h;
>       }
>       printf("%d: pi = %25.15f\n", me, piece);
> /* --------------------------------------------------- */
>       MPI_Reduce (&piece, &pi, 1, MPI_DOUBLE,
>                   MPI_SUM, 0, MPI_COMM_WORLD);
> /* --------------------------------------------------- */
>       if (me == 0)
>       {
>          printf("pi = %25.15f\n", pi);
>       }
> /* --------------------------------------------------- */
>       MPI_Finalize();
>       return 0;
> }
> It works on each node.
> node1:
> cluster@bioclust:~$ mpirun -np 4 /mnt/projects/PS3Cluster/Benchmark/pi
> Input number of intervals:
> 20
> 0: pi =         0.822248040052981
> 2: pi =         0.773339953424083
> 3: pi =         0.747089984650041
> 1: pi =         0.798498008827023
> pi =         3.141175986954128
>  
> node2:
> cluster@kasimir:~$ mpirun -np 2 /mnt/projects/PS3Cluster/Benchmark/pi
> Input number of intervals:
> 5
> 1: pi =         1.267463056905495
> 0: pi =         1.867463056905495
> pi =         3.134926113810990
> cluster@kasimir:~$
>  
> Thanks in advance,
> Laurin
> 
>  
>  
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
