Your command line is telling OMPI to run 17 processes. Since your hostfile indicates that only 16 of them are to run on 10.4.23.107 (which I assume is your PS3 node?), one process is going to be run on 10.4.1.23 (I assume this is node1?).
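To illustrate (hostfile lines are from your mail; the rank annotations are mine): Open MPI maps by slot by default, filling each host's slots in the order the hosts appear in the hostfile before moving on to the next host:

    10.4.23.107 slots=16   # ranks 0-15
    10.4.1.23   slots=2    # rank 16 (the second slot stays unused)

So -np 17 puts the first 16 ranks on 10.4.23.107 and the 17th on 10.4.1.23.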
I would guess that the executable is compiled to run on the PS3 given your specified path, so I would expect it to bomb on node1 - which is exactly what appears to be happening. (A quick way to check this is noted after the quoted message below.)

On Nov 17, 2009, at 1:59 AM, Laurin Müller wrote:

> Hi,
>
> I want to build a cluster with Open MPI.
>
> 2 nodes:
> node 1: 4 x AMD Quad Core, Ubuntu 9.04, Open MPI 1.3.2
> node 2: Sony PS3, Ubuntu 9.04, Open MPI 1.3
>
> Both can ssh to each other and to themselves without a password.
>
> I can run the sample program pi.c on both nodes separately (see below).
> But if I try to start it on node1 with the --hostfile option to use
> node2 remotely, I get this error:
>
> cluster@bioclust:~$ mpirun --hostfile /etc/openmpi/openmpi-default-hostfile -np 17 /mnt/projects/PS3Cluster/Benchmark/pi
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
>
> My hostfile:
>
> cluster@bioclust:~$ cat /etc/openmpi/openmpi-default-hostfile
> 10.4.23.107 slots=16
> 10.4.1.23 slots=2
>
> I can see with top that the processors of node2 begin to work briefly;
> then it aborts on node1.
>
> I use this sample/test program:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
>
> int main(int argc, char *argv[])
> {
>     int i, n;
>     double h, pi, x;
>     int me, nprocs;
>     double piece;
>     /* --------------------------------------------------- */
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>     MPI_Comm_rank(MPI_COMM_WORLD, &me);
>     /* --------------------------------------------------- */
>     if (me == 0)
>     {
>         printf("%s", "Input number of intervals:\n");
>         scanf("%d", &n);
>     }
>     /* --------------------------------------------------- */
>     MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
>     /* --------------------------------------------------- */
>     h = 1. / (double) n;
>     piece = 0.;
>     for (i = me + 1; i <= n; i += nprocs)
>     {
>         x = (i - 1) * h;
>         piece = piece + (4 / (1 + x * x) + 4 / (1 + (x + h) * (x + h))) / 2 * h;
>     }
>     printf("%d: pi = %25.15f\n", me, piece);
>     /* --------------------------------------------------- */
>     MPI_Reduce(&piece, &pi, 1, MPI_DOUBLE,
>                MPI_SUM, 0, MPI_COMM_WORLD);
>     /* --------------------------------------------------- */
>     if (me == 0)
>     {
>         printf("pi = %25.15f\n", pi);
>     }
>     /* --------------------------------------------------- */
>     MPI_Finalize();
>     return 0;
> }
>
> It works on each node.
>
> node1:
> cluster@bioclust:~$ mpirun -np 4 /mnt/projects/PS3Cluster/Benchmark/pi
> Input number of intervals:
> 20
> 0: pi = 0.822248040052981
> 2: pi = 0.773339953424083
> 3: pi = 0.747089984650041
> 1: pi = 0.798498008827023
> pi = 3.141175986954128
>
> node2:
> cluster@kasimir:~$ mpirun -np 2 /mnt/projects/PS3Cluster/Benchmark/pi
> Input number of intervals:
> 5
> 1: pi = 1.267463056905495
> 0: pi = 1.867463056905495
> pi = 3.134926113810990
> cluster@kasimir:~$
>
> Thanks in advance,
> Laurin
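P.S. A quick way to confirm what each copy of the executable was built for is the standard file(1) utility (run it on each node, in case /mnt/projects is node-local rather than a shared mount; the exact strings vary with your file version):

    cluster@bioclust:~$ file /mnt/projects/PS3Cluster/Benchmark/pi
    cluster@kasimir:~$ file /mnt/projects/PS3Cluster/Benchmark/pi

On the AMD box you want to see an x86-64 ELF executable; a binary built on the PS3 will report a PowerPC/Cell target instead and cannot run on node1. If both nodes really do see the same file, you'll need to build a copy per architecture and point each node at the right one before launching across both.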