Hi, I have installed openmpi-1.9a1r27342 on Solaris 10 with Oracle Solaris Studio compiler 12.3.

rs0 fd1026 106 mpicc -showme
cc -I/usr/local/openmpi-1.9_64_cc/include -mt -m64 \
  -L/usr/local/openmpi-1.9_64_cc/lib64 -lmpi -lpicl -lm -lkstat \
  -llgrp -lsocket -lnsl -lrt -lm

I can run the following command.

rs0 fd1026 107 mpiexec -report-bindings -np 2 -bind-to hwthread date
[rs0.informatik.hs-fulda.de:19704] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0.informatik.hs-fulda.de:19704] MCW rank 1 bound to : [../B./../..][../../../..]
Mon Sep 17 13:07:34 CEST 2012
Mon Sep 17 13:07:34 CEST 2012

I get a segmentation fault if I increase the number of processes to 3.

rs0 fd1026 108 mpiexec -report-bindings -np 3 -bind-to hwthread date
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 19711 on node rs0.informatik.hs-fulda.de exited on signal 11 (Segmentation Fault).
--------------------------------------------------------------------------
[rs0:19713] *** Process received signal ***
[rs0:19713] Signal: Segmentation Fault (11)
[rs0:19713] Signal code: Invalid permissions (2)
[rs0:19713] Failing at address: 1000002e8
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x282640
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2c1488 [ Signal 11 (SEGV)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x28
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xab00
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xb7e4
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0xa20
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2997f4
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x299a20
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1920
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:19713] *** End of error message ***
...
(same output for the other two processes)

If I add "-bynode", I get a bus error.

rs0 fd1026 110 mpiexec -report-bindings -np 2 -bynode -bind-to hwthread date
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 19724 on node rs0.informatik.hs-fulda.de exited on signal 10 (Bus Error).
--------------------------------------------------------------------------
[rs0:19724] *** Process received signal ***
[rs0:19724] Signal: Bus Error (10)
[rs0:19724] Signal code: Invalid address alignment (1)
[rs0:19724] Failing at address: 1
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x282640
/lib/sparcv9/libc.so.1:0xd8684
/lib/sparcv9/libc.so.1:0xcc1f8
/lib/sparcv9/libc.so.1:0xcc404
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2c147c [ Signal 10 (BUS)]
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_hwloc_base_cset2str+0x28
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xab00
/usr/local/openmpi-1.9_64_cc/lib64/openmpi/mca_odls_default.so:0xb7e4
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_odls_base_default_launch_local+0xa20
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x2997f4
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:0x299a20
/usr/local/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:opal_libevent2019_event_base_loop+0x1e8
/usr/local/openmpi-1.9_64_cc/bin/orterun:orterun+0x1920
/usr/local/openmpi-1.9_64_cc/bin/orterun:main+0x24
/usr/local/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
[rs0:19724] *** End of error message ***
...
(same output for the other two processes)

I get a segmentation fault for the following commands:

mpiexec -report-bindings -np 2 -map-by slot -bind-to hwthread date
mpiexec -report-bindings -np 2 -map-by numa -bind-to hwthread date
mpiexec -report-bindings -np 2 -map-by node -bind-to hwthread date

I get a bus error for the following command:

mpiexec -report-bindings -np 2 -map-by socket -bind-to hwthread date

The following commands work.

rs0 fd1026 120 mpiexec -report-bindings -np 2 -map-by hwthread -bind-to hwthread date
[rs0.informatik.hs-fulda.de:19788] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0.informatik.hs-fulda.de:19788] MCW rank 1 bound to : [.B/../../..][../../../..]
Mon Sep 17 13:20:30 CEST 2012
Mon Sep 17 13:20:30 CEST 2012

rs0 fd1026 121 mpiexec -report-bindings -np 2 -map-by core -bind-to hwthread date
[rs0.informatik.hs-fulda.de:19793] MCW rank 0 bound to : [B./../../..][../../../..]
[rs0.informatik.hs-fulda.de:19793] MCW rank 1 bound to : [../B./../..][../../../..]
Mon Sep 17 13:21:06 CEST 2012
Mon Sep 17 13:21:06 CEST 2012

I think that the following output is correct, because I have a Sun M4000 server with two quad-core processors whose cores each support two hardware threads (2 x 4 x 2 = 16 hardware threads).

rs0 fd1026 124 mpiexec -report-bindings -np 2 -map-by board -bind-to hwthread date
--------------------------------------------------------------------------
The specified mapping policy is not recognized:

  Policy: BYBOARD

Please check for a typo or ensure that the option is a supported one.
--------------------------------------------------------------------------

In my opinion, I should be able to start and bind up to 16 processes if I map and bind to hwthreads, or am I wrong?

Thank you very much in advance for any help.

Kind regards

Siegmar