Do you know what r# of 1.6 you were trying to compile? Is this via the tarball or svn?

thanks,

--td

On 7/30/2012 9:41 AM, Daniel Junglas wrote:
Hi,

I compiled OpenMPI 1.6 on a 64bit Solaris ultrasparc machine.
Compilation and installation worked without a problem. However,
when trying to run an application with mpirun I always faced
this error:

[hostname:14798] [[50433,0],0] rmcast:init: setsockopt() failed on
MULTICAST_IF
         for multicast network xxx.xxx.xxx.xxx interface xxx.xxx.xxx.xxx
         Error: Invalid argument (22)
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 825
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 744
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 193
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../openmpi-1.6/orte/mca/rmcast/base/rmcast_base_select.c at line
56
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/ess/hnp/ess_hnp_module.c at line 233
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   orte_rmcast_base_select failed
   -->  Returned value Error (-1) instead of ORTE_SUCCESS


After some digging I found that the following patch seems to fix the
problem (at least the application seems to run correct now):
--- a/orte/mca/rmcast/udp/rmcast_udp.c  Tue Apr  3 16:30:29 2012
+++ b/orte/mca/rmcast/udp/rmcast_udp.c  Mon Jul 30 15:12:02 2012
@@ -936,9 +936,16 @@
          }
      } else {
          /* on the xmit side, need to set the interface */
+        void const *addrptr;
          memset(&inaddr, 0, sizeof(inaddr));
          inaddr.sin_addr.s_addr = htonl(chan->interface);
+#ifdef __sun
+        addrlen = sizeof(inaddr.sin_addr);
+        addrptr = (void *)&inaddr.sin_addr;
+#else
          addrlen = sizeof(struct sockaddr_in);
+        addrptr = (void *)&inaddr;
+#endif

          OPAL_OUTPUT_VERBOSE((2, orte_rmcast_base.rmcast_output,
                               "setup:socket:xmit interface
%03d.%03d.%03d.%03d",
@@ -945,7 +952,7 @@
                               OPAL_IF_FORMAT_ADDR(chan->interface)));

          if ((setsockopt(target_sd, IPPROTO_IP, IP_MULTICAST_IF,
-                        (void *)&inaddr, addrlen))<  0) {
+                        addrptr, addrlen))<  0) {
              opal_output(0, "%s rmcast:init: setsockopt() failed on
MULTICAST_IF\n"
                          "\tfor multicast network %03d.%03d.%03d.%03d
interface %03d.%03d.%03d.%03d\n\tError: %s (%d)",
                          ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
Can anybody confirm that the patch is good/correct? In particular
that the '__sun' part is the right thing to do?

Thanks,

Daniel


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>



Reply via email to