Hi Martin,

Your code seems to have several issues in inform_my_completion: comm is used 
uninitialized inside the my_pack macro. And if the intention is that the 
MPI_Isend is executed by spawned processes, MPI_COMM_WORLD is probably the 
wrong communicator to use, since a spawned child's MPI_COMM_WORLD contains 
only the children, not the parent.
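
Something along these lines is probably what you want; this is only an 
untested sketch, assuming the senders were started via MPI_Comm_spawn:

/* Sketch: initialize comm once instead of leaving it uninitialized.
   Spawned children reach the parent through the intercommunicator
   returned by MPI_Comm_get_parent. */
MPI_Comm comm;
MPI_Comm_get_parent(&comm);        /* MPI_COMM_NULL if we were not spawned */
if (comm == MPI_COMM_NULL)
    comm = MPI_COMM_WORLD;         /* fallback for non-spawned runs */
/* ... pass this comm to MPI_Pack_size/MPI_Pack and to MPI_Isend ... */

With the intercommunicator, destination rank 0 in MPI_Isend refers to rank 0 
of the parent job rather than rank 0 among the children.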

Best
Joachim
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Martín Morales via 
users <users@lists.open-mpi.org>
Sent: Tuesday, September 13, 2022 11:07:06 PM
To: users@lists.open-mpi.org <users@lists.open-mpi.org>
Cc: Martín Morales <martineduardomora...@hotmail.com>
Subject: [OMPI users] Cygwin. Strange issue with MPI_Isend() and packed data

Hello over there.

We have a very strange issue when the program tries to send a non-blocking 
message with MPI_Isend() and packed data: if we run this send after some 
unnecessary code (see details below), it works; without it, it crashes.

This program uses dynamic spawning to launch processes. Below are some 
extracts of the code with comments, the environment specifications, and the 
error output.

Thanks in advance,

Martín


—



char *xmul_coord_transbuf = NULL, *transpt, *transend;
char *mpi_buffer;
int mpi_buffer_size;
int mpi_buf_position;   /* pack cursor used by my_pack below */
int mpi_pack_size;      /* scratch for MPI_Pack_size in my_pack */

void init_xmul_coord_buff(int siz) {
  unsigned long int i = (((unsigned long) siz + 7) & ~7);  /* round size up to a multiple of 8 */
  if (xmul_coord_transbuf == NULL) {
      transpt = xmul_coord_transbuf = (char *) malloc(512);
      transend = transpt + 508;
  }
  mpi_buffer = transpt;
  transpt += i;
  if (transpt >= transend) transpt = xmul_coord_transbuf;  /* wrap: reuse the 512-byte scratch as a ring */
  mpi_buf_position = 0;
  mpi_buffer_size = siz;
}

#define my_pack(x, mpi_type) { \
    MPI_Pack_size(1, mpi_type, comm, &mpi_pack_size); \
    MPI_Pack(&x, 1, mpi_type, mpi_buffer, mpi_buffer_size, &mpi_buf_position, comm); }

void inform_my_completion(double val, Fint imstopped) {
  int a, i = imstopped;
  MPI_Comm comm;
  MPI_Status status;
  MPI_Request request;
  if (!myslavenum) return;  // Note: myslavenum equals rank; there are 6 slaves in our test...
  init_xmul_coord_buff(sizeof(double) + sizeof(int));
  my_pack(val, MPI_DOUBLE);
  my_pack(i, MPI_INT);

#ifdef FUNNY_CODE
  // compiling with -DFUNNY_CODE, it works; otherwise it crashes with the message below...
  if (FALSE) {
      fprintf(stderr, "\r/////SLAVE %i - report to COORD... %.0f\n", myslavenum, val);
      fflush(stderr);
  }
#endif

  // this is done only ONCE, no reception even attempted in our test code
  MPI_Isend(mpi_buffer, mpi_buffer_size, MPI_PACKED, 0, XMUL_DONE,
            MPI_COMM_WORLD, &request);
}
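
For reference, the receive our coordinator would eventually post looks 
roughly like this; it's only a sketch (nothing like it runs in the failing 
test), reusing the XMUL_DONE tag and the 512-byte scratch size from above, 
with comm standing for whatever communicator the matching send uses:

char recvbuf[512];
MPI_Status status;
double val;
int imstopped, pos = 0;
/* receive one packed report from any slave, then unpack the two fields
   in the order they were packed */
MPI_Recv(recvbuf, 512, MPI_PACKED, MPI_ANY_SOURCE, XMUL_DONE, comm, &status);
MPI_Unpack(recvbuf, 512, &pos, &val, 1, MPI_DOUBLE, comm);
MPI_Unpack(recvbuf, 512, &pos, &imstopped, 1, MPI_INT, comm);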


-----------------------------
File compiled without optimization, linked with -O3

-----------------------------
Windows Version:
    Windows 10 Pro
Single machine, 4 CPUs (2 threads each)

-----------------------------
Cygwin Version:

$ uname -r
3.3.4(0.341/5/3)

-----------------------------
MPI version:

mpirun (Open MPI) 4.1.2

All processes started with MPI_Comm_spawn()

-----------------------------
Crash message at runtime:

[DESKTOP-N9KKTKD:00286] *** Process received signal ***
[DESKTOP-N9KKTKD:00286] Signal: Segmentation fault (11)
[DESKTOP-N9KKTKD:00286] Signal code: Address not mapped (23)
[DESKTOP-N9KKTKD:00286] Failing at address: 0xc9
Unable to print stack trace!
[DESKTOP-N9KKTKD:00286] *** End of error message ***
--------------------------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
[DESKTOP-N9KKTKD:00282] *** Process received signal ***
[DESKTOP-N9KKTKD:00282] Signal: Segmentation fault (11)
[DESKTOP-N9KKTKD:00282] Signal code: Address not mapped (23)
[DESKTOP-N9KKTKD:00282] Failing at address: 0xcb
Unable to print stack trace!
[DESKTOP-N9KKTKD:00282] *** End of error message ***

-----------------------------
Messages when exiting the master:

[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
(the same message is printed six times)
--------------------------------------------------------------------------
(null) noticed that process rank 5 with PID 0 on node DESKTOP-N9KKTKD exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
