Well, according to George Bosilca (http://www.open-mpi.org/community/lists/users/2005/02/0005.php), threads are supported in OpenMPI. The program I am trying to run works over the TCP stack, and the MX driver is thread-safe, so I guess the problem comes from the MX BTL or MTL.


Scott Atchley wrote:
Hi Francois,

I am not familiar with the internals of the OMPI code. Are you sure, however, that threads are fully supported yet? I was under the impression that thread support was still partial.

Can anyone else comment?


On Jun 8, 2009, at 8:43 AM, François Trahay wrote:

I'm encountering some issues when running a multithreaded program with
OpenMPI (trunk rev. 21380, configured with --enable-mpi-threads).
My program (included in the tar.bz2) uses several pthreads that perform
ping-pongs concurrently (thread #1 uses tag #1, thread #2 uses tag #2, etc.).
This program crashes over MX (with either the BTL or the MTL) with the following
error:

concurrent_ping_v2: pml_cm_recvreq.c:53:
mca_pml_cm_recv_request_completion: Assertion `0 ==
[joe0:01709] *** Process received signal ***
[joe0:01709] *** Process received signal ***
[joe0:01709] Signal: Segmentation fault (11)
[joe0:01709] Signal code: Address not mapped (1)
[joe0:01709] Failing at address: 0x1238949c4
[joe0:01709] Signal: Aborted (6)
[joe0:01709] Signal code:  (-6)
[joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
[joe0:01709] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7f5722cba065]
[joe0:01709] [ 2] /lib/libc.so.6(abort+0x183) [0x7f5722cbd153]
[joe0:01709] [ 3] /lib/libc.so.6(__assert_fail+0xe9) [0x7f5722cb3159]
[joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
[joe0:01709] [ 1]
[joe0:01709] [ 2]
[joe0:01709] [ 3]
[joe0:01709] [ 4]
[joe0:01709] [ 5]
[joe0:01709] [ 6]
[joe0:01709] [ 7]
[joe0:01709] [ 8]
[joe0:01709] [ 9]
[joe0:01709] [10] ./concurrent_ping_v2(client+0x123) [0x401404]
[joe0:01709] [11] /lib/libpthread.so.0 [0x7f57240b6faa]
[joe0:01709] [12] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
[joe0:01709] *** End of error message ***
[joe0:01709] [ 4]
[joe0:01709] [ 5]
[joe0:01709] [ 6]
[joe0:01709] [ 7]
[joe0:01709] [ 8]
[joe0:01709] [ 9]
[joe0:01709] [10]
[joe0:01709] [11] ./concurrent_ping_v2(client+0x123) [0x401404]
[joe0:01709] [12] /lib/libpthread.so.0 [0x7f57240b6faa]
[joe0:01709] [13] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
[joe0:01709] *** End of error message ***
mpirun noticed that process rank 1 with PID 1709 on node joe0 exited on
signal 6 (Aborted).

Any ideas?

Francois Trahay

users mailing list
