Ralph actually suggests that we just remove rmcast from 1.6.1.
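
For anyone who wants to evaluate the patch quoted below in the meantime, here is a minimal sketch of the call in question. IP_MULTICAST_IF is generally documented as taking a struct in_addr rather than a struct sockaddr_in; Linux also accepts some larger structures, which may be why the original code did not fail there. If that holds on the other platforms, passing &inaddr.sin_addr unconditionally might make the '__sun' branch unnecessary, but that is an untested assumption. The helper name set_multicast_if is hypothetical and is not part of Open MPI.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not Open MPI code): select the outgoing multicast
 * interface for a UDP socket. iface_host_order is the interface's IPv4
 * address in host byte order, as chan->interface appears to be in the
 * quoted patch. Returns the setsockopt() result: 0 on success, -1 on error. */
int set_multicast_if(int sd, uint32_t iface_host_order)
{
    struct in_addr ifaddr;

    memset(&ifaddr, 0, sizeof(ifaddr));
    ifaddr.s_addr = htonl(iface_host_order);

    /* Pass only the struct in_addr and its size, not a struct sockaddr_in. */
    return setsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF,
                      &ifaddr, sizeof(ifaddr));
}
```

Calling it with INADDR_ANY on a fresh SOCK_DGRAM socket should just select the default interface, which makes for a quick sanity check on a given platform.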

On Jul 30, 2012, at 10:15 AM, TERRY DONTJE wrote:

> Do you know what r# of 1.6 you were trying to compile? Is this via the
> tarball or svn?
>
> thanks,
>
> --td
>
> On 7/30/2012 9:41 AM, Daniel Junglas wrote:
>> Hi,
>>
>> I compiled OpenMPI 1.6 on a 64bit Solaris ultrasparc machine.
>> Compilation and installation worked without a problem. However,
>> when trying to run an application with mpirun I always faced
>> this error:
>>
>> [hostname:14798] [[50433,0],0] rmcast:init: setsockopt() failed on MULTICAST_IF
>> for multicast network xxx.xxx.xxx.xxx interface xxx.xxx.xxx.xxx
>> Error: Invalid argument (22)
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 825
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 744
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 193
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../openmpi-1.6/orte/mca/rmcast/base/rmcast_base_select.c at line 56
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/ess/hnp/ess_hnp_module.c at line 233
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_rmcast_base_select failed
>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>
>>
>> After some digging I found that the following patch seems to fix the
>> problem (at least the application seems to run correct now):
>> --- a/orte/mca/rmcast/udp/rmcast_udp.c Tue Apr 3 16:30:29 2012
>> +++ b/orte/mca/rmcast/udp/rmcast_udp.c Mon Jul 30 15:12:02 2012
>> @@ -936,9 +936,16 @@
>>          }
>>      } else {
>>          /* on the xmit side, need to set the interface */
>> +        void const *addrptr;
>>          memset(&inaddr, 0, sizeof(inaddr));
>>          inaddr.sin_addr.s_addr = htonl(chan->interface);
>> +#ifdef __sun
>> +        addrlen = sizeof(inaddr.sin_addr);
>> +        addrptr = (void *)&inaddr.sin_addr;
>> +#else
>>          addrlen = sizeof(struct sockaddr_in);
>> +        addrptr = (void *)&inaddr;
>> +#endif
>>
>>          OPAL_OUTPUT_VERBOSE((2, orte_rmcast_base.rmcast_output,
>>                               "setup:socket:xmit interface %03d.%03d.%03d.%03d",
>> @@ -945,7 +952,7 @@
>>                               OPAL_IF_FORMAT_ADDR(chan->interface)));
>>
>>          if ((setsockopt(target_sd, IPPROTO_IP, IP_MULTICAST_IF,
>> -                        (void *)&inaddr, addrlen)) < 0) {
>> +                        addrptr, addrlen)) < 0) {
>>              opal_output(0, "%s rmcast:init: setsockopt() failed on MULTICAST_IF\n"
>>                          "\tfor multicast network %03d.%03d.%03d.%03d interface %03d.%03d.%03d.%03d\n\tError: %s (%d)",
>>                          ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>> Can anybody confirm that the patch is good/correct? In particular
>> that the '__sun' part is the right thing to do?
>>
>> Thanks,
>>
>> Daniel
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/