Can you send your diff in unified form?

On May 11, 2011, at 4:05 PM, Peter Thompson wrote:

> We've gotten a few reports of problems with memory debugging when using
> OpenMPI under TotalView. Usually, TotalView will attach to the processes
> started after an MPI_Init. However, in the case where memory debugging is
> enabled, things seemed to run away or fail. My analysis showed that we had
> a number of core files left over from the attempt, and all were mpirun (or
> orterun) cores. It seemed to be a regression on our part, since testing
> seemed to indicate this worked okay before TotalView 8.9.0-0, so I filed an
> internal bug and passed it to engineering. After giving our engineer a
> brief tutorial on how to build a debug version of OpenMPI, he found what
> appears to be a problem in the code for orterun.c. He's made a slight
> change that fixes the issue in 1.4.2, 1.4.3, 1.4.4rc2 and 1.5.3, those being
> the versions he's tested with so far. He doesn't subscribe to this list
> that I know of, so I offered to pass this by the group. Of course, I'm not
> sure if this is exactly the right place to submit patches, but I'm sure you'd
> tell me where to put it if I'm in the wrong here. It's a short patch, so
> I'll cut and paste it, and attach as well, since cut and paste can do weird
> things to formatting.
>
> Credit goes to Ariel Burton for this patch. Of course he used TotalView to
> find this ;-) It shows up if you do 'mpirun -tv -np 4 ./foo' or 'totalview
> mpirun -a -np 4 ./foo'
>
> Cheers,
> PeterT
>
>
> more ~/patches/anbs-patch
> *** orte/tools/orterun/orterun.c 2010-04-13 13:30:34.000000000 -0400
> --- /home/anb/packages/openmpi-1.4.2/linux-x8664-iwashi/installation/bin/../../../src/openmpi-1.4.2/orte/tools/orterun/orterun.c 2011-05-09 20:28:16.588183000 -0400
> ***************
> *** 1578,1588 ****
>       }
>
>       if (NULL != env) {
>           size1 = opal_argv_count(env);
>           for (j = 0; j < size1; ++j) {
> !             putenv(env[j]);
>           }
>       }
>
>       /* All done */
>
> --- 1578,1600 ----
>       }
>
>       if (NULL != env) {
>           size1 = opal_argv_count(env);
>           for (j = 0; j < size1; ++j) {
> !             /* Use-after-Free error possible here. putenv does not copy
> !                the string passed to it, and instead stores only the pointer.
> !                env[j] may be freed later, in which case the pointer
> !                in environ will now be left dangling into a deallocated
> !                region.
> !                So we make a copy of the variable.
> !             */
> !             char *s = strdup(env[j]);
> !
> !             if (NULL == s) {
> !                 return OPAL_ERR_OUT_OF_RESOURCE;
> !             }
> !             putenv(s);
>           }
>       }
>
>       /* All done */
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/