Re: [OMPI devel] How to add a schedule algorithm to the pml

2010-09-22 Thread Joshua Hursey
For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel > --

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r23936

2010-10-26 Thread Joshua Hursey
had before. >> >> It's a feature! >> >> -- >> Jeff Squyres >> jsquy...@cisco.com

Re: [OMPI devel] 1.5.x plans

2010-11-01 Thread Joshua Hursey
rom people who weren't there on the call today? >> >> -- >> Jeff Squyres >> jsquy...@cisco.com > > Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

[OMPI devel] Fwd: [all-osl-users] Fwd: Servers reboot on Wednesday (11/24) morning starting at 8:00

2010-11-22 Thread Joshua Hursey
l and Alessandro > > This is supposed to be a quick reboot for the new kernel to kick in. So > if this is a problem for you, please let us know ASAP and we can > reschedule the machine you do not want to reboot on Wednesday. > > Thanks and have a Happy Thanksgiving holiday. > > Bruce > Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Joshua Hursey
Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Joshua Hursey
1/30/2010 09:00 AM, Jeff Squyres wrote: >> On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote: >> >> >>> Can you make a v1.7 milestone on Trac, so I can move some of my tickets? >>> >> Done. >> > I have a question about Josh's recent ticket mo

Re: [OMPI devel] Some questions about checkpoint/restart (16)

2010-12-22 Thread Joshua Hursey
k == 0) { > printf(" rank=%d loop=%d \n",rank,i); fflush(stdout); > } > } > if (rank == 0) { >printf(" rank=%d 60 seconds sleeping finished \n",rank); fflush(stdout); > } > > MPI_Barrier(MPI_COMM_WORLD); > if (rank == 0) { >printf(" rank=%d executes Finalize \n",rank); fflush(stdout); > } > MPI_Finalize(); > if (rank == 0) { >printf(" rank=%d program end \n",rank); fflush(stdout); > } > return(0); > } > > Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] Some questions about checkpoint/restart (16)

2010-12-29 Thread Joshua Hursey
by my simple test > program. > > Best regards, > Takayuki Seki. > > Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] Change in communication between process (RMAPS)

2011-01-06 Thread Joshua Hursey
jsquy...@cisco.com > > Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] Building Open MPI components outside of the sourcetree

2011-01-20 Thread Joshua Hursey
>>>> -- >>>> Jeff Squyres >>>> jsquy...@cisco.com > > Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-25 Thread Joshua Hursey
the MPI_FINALIZE(), but with one process ompi-checkpoint and > ompi-restart work great. > > Best regards. > > Hugo Meyer > > Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-26 Thread Joshua Hursey
I'm going to try with the trunk head, and then I'll let you know how it goes. > > Best regards. > > Hugo Meyer > > 2011/1/25 Joshua Hursey > > Can you try with the current trunk head (r24296)? > I just committed a fix for the C/R functionality in which rest

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-27 Thread Joshua Hursey
e application since automatic >recovery cannot occur. > Internal Name: [[62740,1],0] > MCW Rank: 0 > > ------ > [clus9:18082] 1 more process has sent help message help-orte-errmgr-hnp.txt / > autor_f

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-31 Thread Joshua Hursey
ike the automatic recovery is jumping in while migrating, which should not be happening. I'll take a look and see if I can reproduce locally. Thanks, Josh > > Am I using the ompi-migrate command in the right way, or am I missing > something? Because the first attempt didn'

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-31 Thread Joshua Hursey
-up text file of the output. It might show us where things are going wrong: orte_debug_daemons=1 errmgr_base_verbose=20 snapc_full_verbose=20 -- Josh On Jan 31, 2011, at 9:46 AM, Joshua Hursey wrote: > > On Jan 31, 2011, at 6:47 AM, Hugo Meyer wrote:

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-31 Thread Joshua Hursey
end of the > file i put the output of the second terminal. > > Best Regards > > Hugo Meyer > > 2011/1/31 Joshua Hursey > So I was not able to reproduce this issue. > > A couple notes: > - You can see the node-to-process-rank mapping using the '-display

[OMPI devel] Return status of MPI_Probe()

2011-03-21 Thread Joshua Hursey
e. Can anyone shed some light on this topic for me? Thanks, Josh -------- Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://users.nccs.gov/~jjhursey

Re: [OMPI devel] Return status of MPI_Probe()

2011-03-22 Thread Joshua Hursey
e will >> be used for any MPI exception that occurs during a call to MPI for the >> respective object. > > george. > > On Mar 21, 2011, at 16:50 , Joshua Hursey wrote: > >> If MPI_Probe() encounters an error causing it to exit with the >> 'statu

Re: [OMPI devel] Return status of MPI_Probe()

2011-03-22 Thread Joshua Hursey
R for the 1.4 and > 1.5. > > Thanks, >george. > > > On Mar 22, 2011, at 09:04 , Joshua Hursey wrote: > >> George, >> >> I agree that it is difficult to come up with a good scenario, outside of >> resilience, in which MPI_Probe would ret

[OMPI devel] Fwd: [devel-core] Open MPI Developers Meeting

2011-03-30 Thread Joshua Hursey
Rich wanted to make this available to a broader audience. Re-posting to the devel list. Begin forwarded message: > From: Joshua Hursey > Date: March 30, 2011 9:14:03 AM CDT > Subject: [devel-core] Open MPI Developers Meeting > > It has been requested that we have a face-t

Re: [OMPI devel] Fwd: [devel-core] Open MPI Developers Meeting

2011-04-01 Thread Joshua Hursey
If you are planning on attending, please let Rich (rlgraham -at- ornl -dot- gov) and me know as soon as possible. Thanks, Josh On Mar 30, 2011, at 10:36 AM, Joshua Hursey wrote: > Rich wanted to make this available to a broader audience. Re-posting to the > devel list. > > Begin

[OMPI devel] Open MPI Developers Meeting Agenda

2011-04-06 Thread Joshua Hursey
Collective) - Testing infrastructure (MTT) Keep sending agenda items to the list (or me directly if you would rather). I hope to have the agenda sketched out by the teleconf on 4/12 so we can fine tune it on the call. Thanks, Josh Joshua Hursey Postdoctoral

Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread Joshua Hursey
On Apr 22, 2011, at 1:20 PM, N.M. Maclaren wrote: > On Apr 22 2011, Ralph Castain wrote: > >> Several of us are. Josh and George (plus teammates), and some other outside >> folks, are working the MPI side of it. >> >> I'm working only the ORTE side of the problem. >> >> Quite a bit of capabil

Re: [OMPI devel] Open MPI error

2011-04-25 Thread Joshua Hursey
Yeah, this sounds like the limit on the number of app contexts that we can use in ORTE. Since ompi-restart uses N app contexts to restart a job (one for each process in the original job), it is possible to hit this limit. I suspect that it should not be too difficult to c

[OMPI devel] Open MPI Meeting

2011-05-03 Thread Joshua Hursey
We will be starting a few min late. I'll hang around the visitor center but if you don't see me send me an email directly.

[OMPI devel] OMPI Meeting Schedule Change

2011-05-05 Thread Joshua Hursey
For those interested in joining the WebEx this afternoon, the schedule has been adjusted slightly. So please see the updated schedule on the wiki. -- Josh

Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-09 Thread Joshua Hursey
The code to ask for the abort of the other processes >>>> in the group defined by the communicator is commented out. Since one >>>> process calling abort currently causes all processes in the job to >>>> abort, it has not been a big deal. However as

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-09 Thread Joshua Hursey
>>>>>> ordering (which will be what the approach can do), and can enforce >>>>>> that >>>>>> all callbacks will be called. I would rather prefer this approach. >>>>>> >>>>>> george. >>>>>> >>>>>> On Jun 9, 2

Re: [OMPI devel] RFC: Resilient ORTE

2011-06-18 Thread Joshua Hursey
. Please take another look at it if you have any interest. The code > >> can be found here: > >> https://bitbucket.org/wesbland/resilient-orte/ > >> Thanks, > >> Wesley Bland > > > > > > > > -- > > Joshua Hursey > > Postdoctoral Research Associate > > Oak Ridge National Laboratory > > http://users.nccs.gov/~jjhursey

Re: [OMPI devel] carto vs. hwloc

2009-12-16 Thread Joshua Hursey
Currently, I am working on process migration and automatic recovery based on checkpoint/restart. WRT the PML stack, this works by rewiring the BTLs after restart of the migrated/recovered MPI process(es). There is a fair amount of work in getting this right with respect to both the runtime and t

[OMPI devel] Fwd: [osl-staff] [all-osl-users] OSL systems maintenance

2009-12-29 Thread Joshua Hursey
FYI. This will affect the Open MPI Trac and SVN on Wednesday morning. Begin forwarded message: > From: "Kim, DongInn" > Date: December 28, 2009 3:55:28 PM EST > To: all-osl-us...@osl.iu.edu > Subject: [osl-staff] [all-osl-users] OSL systems maintenance > Reply-To: Internal OSL staff mailing list

[OMPI devel] Fwd: Update on CS mail problem

2010-01-08 Thread Joshua Hursey
You may have noticed that some of the messages from this morning were marked as a virus (prefixed with [PMX:VIRUS]). This was caused by the problem described below by Rob. This affected the various mailing lists (including all the Open MPI project lists) that were hosted by IU. The admins at IU

Re: [OMPI devel] New feature for SVN commit messages

2010-02-05 Thread Joshua Hursey
Is this functionality still working? I added 'cmr:v1.5.1' to r22564 and it did not create a ticket. I noticed a few of the tickets manually created yesterday also cited this problem. -- Josh On Feb 3, 2010, at 8:23 AM, Jeff Squyres wrote: > A little while ago, IU added the feature of automatic

[OMPI devel] v1.4 broken

2010-02-17 Thread Joshua Hursey
I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB BTL last night. The error was: - btl_openib_component.c: In function 'init_one_device': btl_openib_component.c:2089: error: 'mca_btl_openib_component_t' has no member named 'default_recv_qps' --

[OMPI devel] Build issue: mpi_portable_platform.h

2010-03-12 Thread Joshua Hursey
I noticed the following build error on the OMPI trunk (r22821) on IU's Odin machine: make[3]: *** No rule to make target `mpi_portable_platform.h', needed by `all-am'. Stop. I took a quick pass through the svn commit log and did not see anything that would have broken this. Any thoughts on w

Re: [OMPI devel] Build issue: mpi_portable_platform.h

2010-03-12 Thread Joshua Hursey
at least r22789. > > Hope, this helps? > > Best regards, > RAiner > > > On Friday 12 March 2010 04:17:41 pm Joshua Hursey wrote: >> I noticed the following build error on the OMPI trunk (r22821) on IU's Odin >> machine: make[3]: *** No rule to make target

Re: [OMPI devel] Build issue: mpi_portable_platform.h

2010-03-12 Thread Joshua Hursey
files to build up a .hgignore file. I run this every time I svn > up on my hg+svn tree. > > > On Mar 12, 2010, at 3:06 PM, Joshua Hursey wrote: > >> I think I figured it out. The error was coming from a Mercurial branch >> cloned from my internal HG+SVN branch. HG pr

Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-03-23 Thread Joshua Hursey
Just a reminder that this RFC will go into the trunk this evening unless there are strong objections. We intend to let this soak for a few days then bring it over to the 1.5 series (after the 1.5.0 release). -- Josh On Mar 15, 2010, at 9:26 AM, Josh Hursey wrote: > (Updated RFC, per offline d

Re: [OMPI devel] trunk breakage

2010-05-22 Thread Joshua Hursey
Along with this, the exit code from mpirun is not correct. It is returning 1, even when the run was successful. This is showing up in MTT, where the trivial test suite is failing things like 'hello world' since the return code is not what was expected. Ralph is looking into this, but I just wan

Re: [OMPI devel] trunk breakage

2010-05-22 Thread Joshua Hursey
ote: > > On May 22, 2010, at 8:43 AM, Joshua Hursey wrote: > >> Along with this, the exit code from mpirun is not correct. It is returning >> 1, even when the run was successful. This is showing up in MTT, where the >> trivial test suite is failing things like 'hel

[OMPI devel] RFC: Checkpoint/Restart Advancements and Bug Fixes

2010-07-31 Thread Joshua Hursey
WHAT: Checkpoint/Restart-based automatic recovery and process migration, advanced checkpoint storage, C/R-enabled debugging, MPI Extension API for C/R, and some bug fixes. WHY: This commit includes a variety of checkpoint/restart advancements that have been pending on a temporary branch for a l

Re: [OMPI devel] RFC: Checkpoint/Restart Advancements and Bug Fixes

2010-08-10 Thread Joshua Hursey
Committed in r23587 :) On Jul 31, 2010, at 12:51 PM, Joshua Hursey wrote: > WHAT: > Checkpoint/Restart-based automatic recovery and process migration, advanced > checkpoint storage, C/R-enabled debugging, MPI Extension API for C/R, and > some bug fixes. > > WHY: > T

Re: [OMPI devel] Question on MCA_BASE_METADATA_PARAM_NONE

2010-08-23 Thread Joshua Hursey
t > the intended recipient, you should not disseminate, distribute or copy this > e-mail. Please notify the sender immediately and destroy all copies of this > message and any attachments. > > WARNING: Computer viruses can be transmitted via email. The recipient should > check

Re: [OMPI devel] Checkpoint/restart question

2010-08-26 Thread Joshua Hursey
may > not be allowed nor to the correct thing in openmpi?! > > Thanks for any ideas/help/pointers to more information! > > Tomas

Re: [OMPI devel] nit-pick: typo in README (1.4.3rc1 and 1.5rc5)

2010-08-26 Thread Joshua Hursey
tment Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Question on the members of ompi_crcp_bkmrk_pml_drain_message_ref_t and ompi_crcp_bkmrk_pml_traffic_message_ref_t

2010-08-26 Thread Joshua Hursey
Joshua Hursey Postdoctoral Research Associate Oak Ridge National Laboratory http://www.cs.indiana.edu/~jjhursey

Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Joshua Hursey
On Jul 11, 2007, at 8:09 AM, Terry D. Dontje wrote: Jeff Squyres wrote: On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote: 2. It may be useful to have some high-level parameters to specify a specific run-time environment, since ORTE has multiple, related frameworks (e.g., RAS and PLS).

Re: [OMPI devel] Notes on building and running Open MPI on Red Storm

2007-07-12 Thread Joshua Hursey
Thanks for the heads up. I've noticed this warning on the Cray systems here at ORNL, and haven't had a chance to put the fix in yet. This function is exposed in non-CR builds as a user interface item. If the user requests a checkpoint of an MPI job that was not compiled with C/R (or doesn't