Re: [OMPI devel] C style rules / reformatting

2021-05-18 Thread Wesley Bland via devel
Hey folks, As a datapoint, in MPICH, we use GNU indent for this. Script to format code (either one file or the entire code tree) - https://github.com/pmodels/mpich/blob/main/maint/code-cleanup.bash

Re: [OMPI devel] devel Digest, Vol 3756, Issue 1

2019-10-22 Thread Wesley Bland via devel
No. The MPI Forum will be meeting at the Microsoft office in the same place as the last time it was held in Portland: https://www.mpi-forum.org/meetings/2018/02/logistics 1414 NW Northrup Street The website hasn’t yet been updated to include the logistics. > On Oct 22, 2019, at 1:09 PM, Ju-Hyo

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25093

2011-08-30 Thread Wesley Bland
> > > Pavel (Pasha) Shamis > --- > Application Performance Tools Group > Computer Science and Math Division > Oak Ridge National Laboratory > > > > > > > On Aug 26, 2011, at 6:18 PM, Wesley Bland wrote: > >> The epoch and resilient ORTE code

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25093

2011-08-26 Thread Wesley Bland
The epoch and resilient ORTE code is now macro'd away. To enable it, use --enable-resilient-orte, which defines ORTE_ENABLE_EPOCH and ORTE_RESIL_ORTE. -- Wesley On Aug 26, 2011, at 6:16 PM, wbl...@osl.iu.edu wrote: > Author: wbland > Date: 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011) > New Revision: 25
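For readers unfamiliar with the pattern, "macro'd away" means the epoch code is compiled only when the configure-time macro is defined. The sketch below assumes that --enable-resilient-orte simply makes ORTE_ENABLE_EPOCH defined; the real ORTE macros may instead be defined to 0/1 (in which case `#if ORTE_ENABLE_EPOCH` is the right test). The struct and function names are hypothetical, not the actual ORTE types.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical process-name layout. Assumption: the resilient build makes
 * ORTE_ENABLE_EPOCH defined; otherwise the epoch field is compiled out. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
#if defined(ORTE_ENABLE_EPOCH)
    uint32_t epoch;   /* present only in resilient builds */
#endif
} example_proc_name_t;

/* Returns 1 if this build carries epoch tracking, 0 otherwise. */
static int example_epoch_enabled(void)
{
#if defined(ORTE_ENABLE_EPOCH)
    return 1;
#else
    return 0;
#endif
}

/* Prints a name; the epoch is printed only when it was compiled in. */
static void example_print_name(const example_proc_name_t *name)
{
#if defined(ORTE_ENABLE_EPOCH)
    printf("[%u,%u,%u]\n", (unsigned)name->jobid, (unsigned)name->vpid,
           (unsigned)name->epoch);
#else
    printf("[%u,%u]\n", (unsigned)name->jobid, (unsigned)name->vpid);
#endif
}
```

Compiling the same file with `-DORTE_ENABLE_EPOCH` adds the third field and the three-part output; without it, default builds pay no size or code cost for the resilience feature.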

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread Wesley Bland
I just checked in a fix (I hope). I think the problem was that the errmgr was removing children from the list of odls children without using the mutex to prevent race conditions. Let me know if the MTT is still having problems tomorrow. Wes > I am seeing the intel test suite tests MPI_Errhandler_
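The fix described above amounts to taking the list's mutex around every removal, not just around insertion and traversal. Below is a minimal sketch of that pattern using POSIX threads; `child_t` and `child_list_t` are hypothetical stand-ins for the odls child list, not the real ORTE types or OPAL locking API.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the odls child list. */
typedef struct child {
    int vpid;
    struct child *next;
} child_t;

typedef struct {
    pthread_mutex_t lock;   /* guards every access to head */
    child_t *head;
} child_list_t;

static void child_list_init(child_list_t *list)
{
    pthread_mutex_init(&list->lock, NULL);
    list->head = NULL;
}

static int child_list_add(child_list_t *list, int vpid)
{
    child_t *c = malloc(sizeof(*c));
    if (c == NULL) return -1;
    c->vpid = vpid;
    pthread_mutex_lock(&list->lock);
    c->next = list->head;
    list->head = c;
    pthread_mutex_unlock(&list->lock);
    return 0;
}

/* Unlink and free the child with this vpid while holding the lock, so a
 * concurrent walker never observes a half-unlinked node. Returns 1 if found. */
static int child_list_remove(child_list_t *list, int vpid)
{
    int found = 0;
    pthread_mutex_lock(&list->lock);
    for (child_t **pp = &list->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->vpid == vpid) {
            child_t *victim = *pp;
            *pp = victim->next;
            free(victim);
            found = 1;
            break;   /* stop before pp would touch freed memory */
        }
    }
    pthread_mutex_unlock(&list->lock);
    return found;
}

static int child_list_count(child_list_t *list)
{
    int n = 0;
    pthread_mutex_lock(&list->lock);
    for (child_t *c = list->head; c != NULL; c = c->next) n++;
    pthread_mutex_unlock(&list->lock);
    return n;
}
```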

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25015

2011-08-08 Thread Wesley Bland
of the function where the other initializations are. On Mon, Aug 8, 2011 at 11:41 AM, Barrett, Brian W wrote: > On 8/8/11 9:34 AM, "Jeff Squyres" wrote: > > >On Aug 8, 2011, at 11:30 AM, Wesley Bland wrote: > > > >> The reason is because valgrind was complaining

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25015

2011-08-08 Thread Wesley Bland
The reason is that valgrind was complaining about uninitialized values that were passed into proc_get_epoch. I saw the same warnings from valgrind when I ran it. I added code to initialize the values to what should really be the default values, and the warnings went away. Since the process_nam
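Valgrind's "uninitialised value" warnings go away once every field is given a defined value before it is read. A minimal sketch of that idea follows; the sentinel values, struct, and `example_get_epoch` (a stand-in for proc_get_epoch) are all hypothetical, since the thread does not say what the real ORTE defaults are.

```c
#include <stdint.h>

/* Hypothetical defaults; the real ORTE sentinel values are not given here. */
#define EXAMPLE_JOBID_INVALID UINT32_MAX
#define EXAMPLE_VPID_INVALID  UINT32_MAX
#define EXAMPLE_EPOCH_MIN     0u

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
    uint32_t epoch;
} example_name_t;

/* Static initializer so declarations can be born fully defined. */
#define EXAMPLE_NAME_INITIALIZER \
    { EXAMPLE_JOBID_INVALID, EXAMPLE_VPID_INVALID, EXAMPLE_EPOCH_MIN }

/* Runtime initializer for names that are allocated or reused. */
static void example_name_init(example_name_t *name)
{
    name->jobid = EXAMPLE_JOBID_INVALID;
    name->vpid = EXAMPLE_VPID_INVALID;
    name->epoch = EXAMPLE_EPOCH_MIN;
}

/* Hypothetical stand-in for proc_get_epoch(): reading name->epoch is only
 * well-defined if the caller initialized it first. */
static uint32_t example_get_epoch(const example_name_t *name)
{
    return name->epoch;
}
```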

Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-08 Thread Wesley Bland
Fixed in r25015. On Fri, Aug 5, 2011 at 4:52 PM, Ralph Castain wrote: > Thanks Wes - it isn't the print that's the issue, it's the fact that we > have epochs that aren't being initialized, and what else that may be causing > to have problems. > > > On Aug 5

Re: [OMPI devel] Uninitialized ORTE epoch values

2011-08-05 Thread Wesley Bland
I don't think these are anything to worry about since they're all print statements, but I will work on these tonight. On Fri, Aug 5, 2011 at 3:03 PM, Jeff Squyres wrote: > Ralph and I are trying to track down the mysterious ORTE error. > > In doing so, I have found at least one fairly repeatable

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-23 Thread Wesley Bland
Committed in r24815. On Thursday, June 23, 2011 at 4:19 PM, Ralph Castain wrote: > > On Jun 23, 2011, at 2:14 PM, Wesley Bland wrote: > > Maybe before the ORTED saw the signal, it detected a communication failure > > and reacted to that. > > Quite possible. However, r

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-23 Thread Wesley Bland
uld have been equally okay to simply call > "opal_event_dispatch" while waiting for the callback. > > All applications have to cycle the progress engine. > > > On Jun 23, 2011, at 1:18 PM, Wesley Bland wrote: > > Josh, > > > > There were a couple of
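The point quoted above is that a caller waiting for a callback must keep driving the progress engine; spinning on a flag alone never lets the callback run. The sketch below models this with a hypothetical single-threaded engine in which each `example_progress` call stands in for one event-loop pass (the role opal_event_dispatch plays in the quote). None of these names are the OPAL API.

```c
#include <stdbool.h>

/* Hypothetical event engine: the reply callback fires after a fixed number
 * of progress passes. This is a model, not the OPAL event library. */
typedef struct {
    int events_until_reply;           /* passes until the reply "arrives" */
    void (*callback)(void *cbdata);
    void *cbdata;
} example_engine_t;

/* One pass of the engine; may invoke the pending callback. */
static void example_progress(example_engine_t *eng)
{
    if (eng->events_until_reply > 0 && --eng->events_until_reply == 0) {
        eng->callback(eng->cbdata);
    }
}

static void example_reply_cb(void *cbdata)
{
    bool *done = cbdata;
    *done = true;
}

/* Wait for the callback by cycling the engine. Returns the number of passes
 * used, or -1 if max_passes elapsed without the callback firing. */
static int example_wait_for_reply(example_engine_t *eng, bool *done, int max_passes)
{
    int passes = 0;
    while (!*done && passes < max_passes) {
        example_progress(eng);
        passes++;
    }
    return *done ? passes : -1;
}
```

The `max_passes` cap is an added assumption so the sketch cannot hang if no reply ever arrives.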

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-23 Thread Wesley Bland
4 Pid 3843 -- Initalized > orte_abort: Name [[60292,1],3,0] Host: smoky14 Pid 3843 -- Calling Abort > [jjhursey@smoky14 system] echo $? > 3 > ---- > > Any ideas on what I might be doing wrong? > > I tried with both calling 'o

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-23 Thread Wesley Bland
Last reminder (I hope). The RFC goes in at COB today. Wesley

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-18 Thread Wesley Bland
new patch on Friday COB (the RFC gave us 2 weeks to review the original patch). Would waiting until next Thursday/Friday COB be too disruptive? That should give me and maybe Ralph enough time to test and send any further feedback. > > Thanks, > Josh > > On Jun 17, 2011, at 5:59 PM, Wesl

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-17 Thread Wesley Bland
refactoring, which should > probably be done once in the trunk instead of two possibly disruptive > commits. > > -- Josh > > On Fri, Jun 17, 2011 at 5:18 PM, Wesley Bland wrote: >> This is a reminder that the Resilient ORTE RFC is set to go into the trunk >> on Monda

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-17 Thread Wesley Bland
ected normal termination issues). Please take another look at it if you have any interest. The code can be found here: https://bitbucket.org/wesbland/resilient-orte/ Thanks, Wesley Bland

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-08 Thread Wesley Bland
On Tuesday, June 7, 2011 at 4:55 PM, Josh Hursey wrote: - orte_errmgr.post_startup() start the persistent RML message. There does not seem to be a shutdown version of this (to deregister the RML message at orte_finalize time). Was this intentional, or just missed? I just missed that one. I've ad
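The review comment is about symmetry: a persistent receive posted at startup should have a matching cancel at finalize. Below is a minimal sketch of that pairing over a hypothetical receive table; the tag value, table, and `example_errmgr_*` hooks are illustrative only and are not the RML or errmgr API.

```c
#include <stdbool.h>

/* Hypothetical receive table standing in for RML persistent receives. */
#define EXAMPLE_MAX_RECVS 8
#define EXAMPLE_TAG_FAILURE_NOTICE 42   /* hypothetical tag value */

typedef void (*example_recv_cb_t)(int tag);

typedef struct {
    int tag;
    example_recv_cb_t cb;
    bool persistent;   /* stays posted after each match until cancelled */
    bool active;
} example_recv_t;

static example_recv_t example_recvs[EXAMPLE_MAX_RECVS];

/* Post a receive in the first free slot; 0 on success, -1 if full. */
static int example_recv_register(int tag, example_recv_cb_t cb, bool persistent)
{
    for (int i = 0; i < EXAMPLE_MAX_RECVS; i++) {
        if (!example_recvs[i].active) {
            example_recvs[i] = (example_recv_t){ tag, cb, persistent, true };
            return 0;
        }
    }
    return -1;
}

/* Cancel the receive posted on tag; 0 on success, -1 if none is posted. */
static int example_recv_cancel(int tag)
{
    for (int i = 0; i < EXAMPLE_MAX_RECVS; i++) {
        if (example_recvs[i].active && example_recvs[i].tag == tag) {
            example_recvs[i].active = false;
            return 0;
        }
    }
    return -1;
}

static int example_recv_active_count(void)
{
    int n = 0;
    for (int i = 0; i < EXAMPLE_MAX_RECVS; i++) {
        if (example_recvs[i].active) n++;
    }
    return n;
}

static void example_failure_cb(int tag) { (void)tag; /* handle a failure notice */ }

/* Startup hook posts the persistent receive once... */
static int example_errmgr_post_startup(void)
{
    return example_recv_register(EXAMPLE_TAG_FAILURE_NOTICE, example_failure_cb, true);
}

/* ...and the matching finalize hook cancels it, leaving nothing registered. */
static int example_errmgr_finalize(void)
{
    return example_recv_cancel(EXAMPLE_TAG_FAILURE_NOTICE);
}
```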

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-07 Thread Wesley Bland
We are definitely targeting ORTED failures here. If an ORTED fails, then any other ORTEDs connected to it will notice and report the failure. Of course, if the failure is in an application process, then the ORTED on that node will be the only one to detect it. Also, if an ORTED is lost, all of the applicatio

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-07 Thread Wesley Bland
> > Perhaps it would help if you folks could provide a little explanation about > how you use epoch? While the value sounds similar, your explanations are > beginning to sound very different from what we are doing and/or had > envisioned. > > I'm not sure how you can talk about an epoch being

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-07 Thread Wesley Bland
On Tuesday, June 7, 2011 at 12:14 PM, Ralph Castain wrote: > > > On Tue, Jun 7, 2011 at 9:45 AM, Wesley Bland (mailto:wbl...@eecs.utk.edu)> wrote: > > To address your concerns about putting the epoch in the process name > > structure, putting it in there rat

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-07 Thread Wesley Bland
> I know I'll have merge conflicts with my state machine branch, which would be > ready for commit in the same time frame, but I'll hold off on that one and > deal with the merge issues on my side. > > > > On Tue, Jun 7, 2011 at 8:46 AM, Wesley Bland (mailto:wbl..

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-07 Thread Wesley Bland
This could certainly work alongside another ORCM or any other fault detection/prediction/recovery mechanism. Most of the code is just dedicated to keeping the epoch up to date and tracking the status of the processes. The underlying idea was to provide a way for the application to decide what it
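To make the epoch idea concrete, here is one common way an epoch counter can be used: each process has a counter that advances when it is declared failed, and anything tagged with an older epoch is recognizably stale. This is an illustration of the general technique under that assumption, not the Resilient ORTE implementation; all names and the fixed table size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process status table: one epoch and liveness flag per vpid. */
#define EXAMPLE_NPROCS 4

typedef struct {
    uint32_t epoch[EXAMPLE_NPROCS];
    bool alive[EXAMPLE_NPROCS];
} example_status_t;

static void example_status_init(example_status_t *st)
{
    for (int i = 0; i < EXAMPLE_NPROCS; i++) {
        st->epoch[i] = 0;
        st->alive[i] = true;
    }
}

/* Record a failure: mark the process dead and advance its epoch, so any
 * name or message carrying the old epoch can be detected as stale. */
static void example_mark_failed(example_status_t *st, int vpid)
{
    st->alive[vpid] = false;
    st->epoch[vpid]++;
}

/* Accept a message only if the sender's epoch matches the current one;
 * a restarted process would carry the new epoch. */
static bool example_is_current(const example_status_t *st, int vpid, uint32_t msg_epoch)
{
    return st->epoch[vpid] == msg_epoch;
}
```

Keeping the counter per process (rather than one global counter) means a failure elsewhere in the job does not invalidate names for processes that are still healthy.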

Re: [OMPI devel] Open MPI Developers Meeting Agenda

2011-05-02 Thread Wesley Bland
Josh, Do you have a time that the meetings will be starting tomorrow for the Open MPI meeting? I'm sorry if I've missed it on the list. Thanks, Wesley Bland On Wed, Apr 6, 2011 at 11:09 AM, Joshua Hursey wrote: > Reminder: > If you are interested in attending the May 3-5 Ope

Re: [OMPI devel] [OMPI svn] svn:open-mpi r23628

2010-08-19 Thread Wesley Bland
So just to clarify, this means that we don't need to worry about having more than one errmgr module handling a single failure and therefore don't have to set the stack_state (which is now gone anyway). Am I reading this correctly? Thanks, Wesley On Thu, Aug 19, 2010 at 9:09 AM, wrote: > Author

Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-03-10 Thread Wesley Bland
Josh, You mentioned some MCA parameters that you would include in the email, but I don't see those parameters anywhere. Could you please put those in here to make testing easier for people. Wesley On Wed, Mar 10, 2010 at 1:26 PM, Josh Hursey wrote: > Yesterday evening George, Thomas and I dis