Hi everyone, 

I'm currently implementing MPI-based communication in our parallel language middleware, POP-C++. It previously used TCP/IP sockets, but because of a project to port the language to a supercomputer, I have to use OpenMPI for communication. I have successfully replaced the old communication layer with MPI. However, I sometimes get the following error while running my program:

[clementon:58465] [[52825,3],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[clementon:58465] *** An error occurred in MPI_Comm_accept
[clementon:58465] *** on communicator MPI_COMM_WORLD
[clementon:58465] *** MPI_ERR_UNKNOWN: unknown error
[clementon:58465] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

Sometimes it is the MPI_Comm_connect side that fails:

MPI-COMBOX(client): Want to get a connection to 1318912000.0;tcp://192.168.59.176:33956+1318912002.0;tcp://192.168.59.176:54394:300
[ubuntu:19666] [[20125,3],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[ubuntu:19666] *** An error occurred in MPI_Comm_accept
[ubuntu:19666] *** on communicator MPI_COMM_WORLD
[ubuntu:19666] *** MPI_ERR_UNKNOWN: unknown error
[ubuntu:19666] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

So basically, I have one process waiting for a connection with MPI_Comm_accept (Comm.Accept, as I use the C++ bindings), and another process that wants to connect to it with MPI_Comm_connect (MPI::COMM_WORLD.Connect(port_name) ... ). It works fine most of the time. I suspect a problem with multiple threads: the process that receives connections has a second thread to serve requests.

* Process 1 connects to process 2
* Process 2, thread 1 registers the request
* Process 2, thread 1 waits for a new connection
* Process 2, thread 2 serves the pending request and might send data
* Another process might start a new connection to process 2
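
To make the structure concrete, here is a minimal sketch of the steps above in plain C++. It is not the POP-C++ code: the names Request, RequestQueue and run_server are made up for illustration, and the MPI calls are replaced by comments, so it only shows how the accepting thread hands requests over to the serving thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for an accepted connection. In the real program this
// would hold the intercommunicator returned by MPI_Comm_accept.
struct Request {
    int client_id;
};

// Thread-safe FIFO shared between the accepting and the serving thread.
class RequestQueue {
public:
    void push(Request r) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(r);
        }
        cv_.notify_one();
    }

    // Signals that no further requests will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a request is available. Returns false once the queue
    // has been closed and fully drained.
    bool pop(Request& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Request> q_;
    bool closed_ = false;
};

// Simulates process 2: thread 1 registers incoming requests, thread 2 serves
// them. Returns the client ids in the order they were served.
std::vector<int> run_server(int n_clients) {
    assert(n_clients >= 0);
    RequestQueue pending;
    std::vector<int> served;

    std::thread acceptor([&] {
        for (int i = 0; i < n_clients; ++i) {
            // Real code: blocking MPI_Comm_accept would go here.
            pending.push(Request{i});
        }
        pending.close();
    });

    std::thread worker([&] {
        Request r;
        while (pending.pop(r)) {
            // Real code: receive and send on the accepted communicator here.
            served.push_back(r.client_id);
        }
    });

    acceptor.join();
    worker.join();
    return served;
}
```

Calling run_server(3) simulates three clients connecting one after another. In the real program the two commented places are where both threads would be inside MPI at the same time, which is the situation I suspect.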

I'm running this code on Ubuntu 12.04 with OpenMPI 1.6.2 configured with --enable-mpi-thread-multiple. I have attached the output of ompi_info -all.
I'm also running the same code on Mac OS X 10.8.2 with OpenMPI 1.6.2, also configured with --enable-mpi-thread-multiple.

I'm not running on multiple nodes for the moment, just one node, and I'm already seeing this. As I said, I suspect a problem with multiple threads, but my configuration should allow multiple threads to make MPI calls.



Any help would be much appreciated.



Valentin Clement

--
Valentin Clement
Student trainee
Advanced Institute for Computational Science
Programming environment research team
RIKEN Institute
Kobe, Japan

Attachment: ompi-output.tar.bz2
Description: BZip2 compressed data

 

