We noticed that the attached MPI program stalls when run with Open MPI (version 1.2.6 or openmpi-1.3a1r18785).

compile: mpicc -o scattertest scattertest.c
run:     mpiexec -n 4 ./scattertest 10000

This is on a 32-bit Ubuntu system with 1 GByte of memory.
A test on a Debian system shows the same result; however, on a machine with 8 GByte of memory, the number 10000 must be increased before a stall occurs. The program runs fine when the number is lower:

  mpiexec -n 4 ./scattertest 10

or when the sm (shared memory) BTL is disabled:

  mpiexec -n 4 -mca btl ^sm ./scattertest 100000

or when the commented-out MPI_Barrier call is activated.

The same behaviour is observed when MPI_Scatterv, or MPI_Isend with MPI_Irecv, is used instead of MPI_Scatter.
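
For a sense of scale, the amount of user data that rank 0 scatters to the other ranks over a whole run can be estimated with a small helper. This is only a back-of-the-envelope sketch: the function name scatter_payload_bytes is made up for illustration and is not part of scattertest.c, and it counts payload only, not any buffering or headers inside Open MPI.

```c
#include <stddef.h>

/* Hypothetical helper, not part of scattertest.c.
 * Returns the number of payload bytes the root sends to the other ranks:
 * one block of `count` doubles per non-root rank per iteration.
 * The root's own block is not counted, since it never leaves the process. */
unsigned long long scatter_payload_bytes(unsigned long long times,
                                         unsigned long long nranks,
                                         unsigned long long count)
{
  if (nranks == 0)
    return 0;   /* no ranks, nothing is sent */

  return times * (nranks - 1) * count * sizeof(double);
}
```

With the values from the test program (l=200, 4 processes, 10000 repetitions) and 8-byte doubles, scatter_payload_bytes(10000, 4, 200) gives 48000000 bytes, or 4800 bytes per iteration.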

Please find attached:

  scattertest.c    : the test program
  config.log.bz2   : config.log from configuring openmpi-1.3a1r18785
  ompi_info--all.bz2: output from ompi_info --all
  ifconfig          : output of ifconfig

Regards,

Willem


--
Willem Vermin         tel (31)20 5923054/5923000
SARA, Kruislaan 415   fax (31)20 6683167
1098 SJ Amsterdam     wil...@sara.nl
Nederland
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char*argv[])
{
  double *x, *y;
  int i,times,rank,size;
  const int l=200;

  MPI_Init(&argc,&argv);

  times=10000;
  if (argc > 1)
    times = atoi(argv[1]);

  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  if (rank == 0)
    printf("scattertest, repetitions is %d\n",times);

  x=(double*) malloc(sizeof(double)*l*size);
  y=(double*) malloc(sizeof(double)*l);

  for (i=0; i<l*size; i++)
    x[i]=10.0;

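  // Scatter l doubles to every rank, 'times' times in a row,
  // with no synchronisation between the iterations.
  for (i=0; i<times; i++)
    {
      MPI_Scatter(x,l,MPI_DOUBLE,y,l,MPI_DOUBLE,0,MPI_COMM_WORLD);
      // Activating this barrier makes the stall go away:
//      if (i%100 == 0)
//        MPI_Barrier(MPI_COMM_WORLD);
    }

  printf("all is well on %d\n",rank);

  free(x);
  free(y);

  MPI_Finalize();

  return 0;
}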

Attachment: config.log.bz2
Description: application/bzip

Attachment: ompi_info--all.bz2
Description: application/bzip

eth0      Link encap:Ethernet  HWaddr 00:12:3F:2B:5D:77  
          inet addr:145.100.6.148  Bcast:145.100.6.255  Mask:255.255.255.0
          inet6 addr: fe80::212:3fff:fe2b:5d77/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3918360 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6003598 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2628497326 (2.4 GB)  TX bytes:1968595851 (1.8 GB)
          Interrupt:16 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:25381363 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25381363 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1022252832 (974.8 MB)  TX bytes:1022252832 (974.8 MB)
