Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-08 Thread Wesley Bland
Fixed in r25015.
On Fri, Aug 5, 2011 at 4:52 PM, Ralph Castain wrote:
> Thanks Wes - it isn't the print that's the issue, it's the fact that we have epochs that aren't being initialized, and whatever other problems that may be causing.
>
> On Aug 5, 2011, at 2:45 PM, Wesley Bland wrote:

Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-05 Thread Ralph Castain
Thanks for the explanation. It kinda begs a question, though - I've noticed that the assignment of epoch seems to circle around in a number of places. We call the ess_base function to get_epoch, and then we assign an epoch. But the base function doesn't actually seem to do much, if anything. It's somew

Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-05 Thread Thomas Herault
The warnings issued through ess_base_select.c:46 are annoying but harmless. Wesley is going to hunt them and remove them, but they are really issued because of the print: orte_ess_base_proc_get_epoch (ess_base_select.c:46) calls ORTE_NAME_PRINT(proc), which prints proc->epoch, before proc->epoc
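To make the ordering problem described above concrete, here is a minimal sketch in C. It is not the ORTE code: `demo_name_t`, `DEMO_EPOCH_INVALID`, `demo_name_init`, `demo_name_print` and `demo_get_epoch` are all hypothetical stand-ins invented for illustration. It assumes the pattern is "debug-print the name, then assign the epoch", and shows one way to keep that print from reading an indeterminate value: give the epoch a sentinel as soon as the name is set up.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a process name; NOT the real ORTE definition. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
    int32_t  epoch;
} demo_name_t;

/* Hypothetical sentinel meaning "epoch not assigned yet". */
#define DEMO_EPOCH_INVALID (-1)

/* Give every field a defined value, so a later read never touches
 * indeterminate memory (the kind of read valgrind reports). */
static void demo_name_init(demo_name_t *name, uint32_t jobid, uint32_t vpid)
{
    name->jobid = jobid;
    name->vpid  = vpid;
    name->epoch = DEMO_EPOCH_INVALID;
}

/* Format the name, including its epoch, into buf; returns snprintf's result. */
static int demo_name_print(char *buf, size_t len, const demo_name_t *name)
{
    return snprintf(buf, len, "[%u,%u,%d]",
                    (unsigned)name->jobid, (unsigned)name->vpid,
                    (int)name->epoch);
}

/* Hypothetical lookup: debug-print the name first, then assign the epoch.
 * The print is well defined only because demo_name_init set the sentinel. */
static int32_t demo_get_epoch(demo_name_t *name, char *dbg, size_t len)
{
    demo_name_print(dbg, len, name);
    name->epoch = 0;   /* placeholder for a real lookup */
    return name->epoch;
}
```

If `demo_name_init` were skipped, the `%d` in `demo_name_print` would read an uninitialized `epoch`, which is the kind of access the warnings in this thread point at, even when the printed value is otherwise unused.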

Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-05 Thread Ralph Castain
Thanks Wes - it isn't the print that's the issue, it's the fact that we have epochs that aren't being initialized, and whatever other problems that may be causing.
On Aug 5, 2011, at 2:45 PM, Wesley Bland wrote:
> I don't think these are anything to worry about since they're all print st

Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-05 Thread Wesley Bland
I don't think these are anything to worry about since they're all print statements, but I will work on these tonight.
On Fri, Aug 5, 2011 at 3:03 PM, Jeff Squyres wrote:
> Ralph and I are trying to track down the mysterious ORTE error.
>
> In doing so, I have found at least one fairly repeatable

Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-05 Thread Jeff Squyres
BTW, the -1 file has an invalid free in it that we just fixed. That's not part of the epoch value issue, of course. :-)
On Aug 5, 2011, at 3:03 PM, Jeff Squyres wrote:
> Ralph and I are trying to track down the mysterious ORTE error.
>
> In doing so, I have found at least one fairly repeata

[OMPI devel] Uninitialized ORTE epoch values

2011-08-05 Thread Jeff Squyres
Ralph and I are trying to track down the mysterious ORTE error. In doing so, I have found at least one fairly repeatable error on my cluster: when running the ibm/dynamic/spawn test through SLURM, where we mpirun 3 procs and then we MPI_COMM_SPAWN 3 more. Running the orteds through valgrind,