For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
--
had before.
>>
>> It's a feature!
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
rom people who weren't there on the call today?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
l and Alessandro
>
> This is supposed to be a quick reboot for the new kernel to kick in. So
> if this is a problem for you, please let us know ASAP and we can
> reschedule the machine you do not want to reboot on Wednesday.
>
> Thanks and have a Happy Thanksgiving holiday.
>
> Bruce
>
1/30/2010 09:00 AM, Jeff Squyres wrote:
>> On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:
>>
>>
>>> Can you make a v1.7 milestone on Trac, so I can move some of my tickets?
>>>
>> Done.
>>
> I have a question about Josh's recent ticket mo
k == 0) {
>             printf(" rank=%d loop=%d \n", rank, i); fflush(stdout);
>         }
>     }
>     if (rank == 0) {
>         printf(" rank=%d 60 seconds sleeping finished \n", rank); fflush(stdout);
>     }
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     if (rank == 0) {
>         printf(" rank=%d executes Finalize \n", rank); fflush(stdout);
>     }
>     MPI_Finalize();
>     if (rank == 0) {
>         printf(" rank=%d program end \n", rank); fflush(stdout);
>     }
>     return(0);
> }
by my simple test
> program.
>
> Best regards,
> Takayuki Seki.
the MPI_FINALIZE(), but with one process ompi-checkpoint and
> ompi-restart work great.
>
> Best regards.
>
> Hugo Meyer
I'm going to try with the trunk head, and then I'll let you know how it goes.
>
> Best regards.
>
> Hugo Meyer
>
> 2011/1/25 Joshua Hursey
>
> Can you try with the current trunk head (r24296)?
> I just committed a fix for the C/R functionality in which rest
e application since automatic
>recovery cannot occur.
> Internal Name: [[62740,1],0]
> MCW Rank: 0
>
> ------
> [clus9:18082] 1 more process has sent help message help-orte-errmgr-hnp.txt /
> autor_f
ike the automatic recovery is jumping in
while migrating, which should not be happening. I'll take a look and see if I
can reproduce locally.
Thanks,
Josh
>
> Am I using the ompi-migrate command in the right way, or am I missing
> something? Because the first attempt didn'
-up text file of the
output. It might show us where things are going wrong:
orte_debug_daemons=1
errmgr_base_verbose=20
snapc_full_verbose=20
-- Josh
On Jan 31, 2011, at 9:46 AM, Joshua Hursey wrote:
>
> On Jan 31, 2011, at 6:47 AM, Hugo Meyer wrote:
>
end of the
> file i put the output of the second terminal.
>
> Best Regards
>
> Hugo Meyer
>
> 2011/1/31 Joshua Hursey
> So I was not able to reproduce this issue.
>
> A couple notes:
> - You can see the node-to-process-rank mapping using the '-display
e.
Can anyone shed some light on this topic for me?
Thanks,
Josh
e will
>> be used for any MPI exception that occurs during a call to MPI for the
>> respective object.
>
> george.
>
> On Mar 21, 2011, at 16:50 , Joshua Hursey wrote:
>
>> If MPI_Probe() encounters an error causing it to exit with the
>> 'statu
R for the 1.4 and
> 1.5.
>
> Thanks,
>george.
>
>
> On Mar 22, 2011, at 09:04 , Joshua Hursey wrote:
>
>> George,
>>
>> I agree that it is difficult to come up with a good scenario, outside of
>> resilience, in which MPI_Probe would ret
Rich wanted to make this available to a broader audience. Re-posting to the
devel list.
Begin forwarded message:
> From: Joshua Hursey
> Date: March 30, 2011 9:14:03 AM CDT
> Subject: [devel-core] Open MPI Developers Meeting
>
> It has been requested that we have a face-t
If you are planning on attending, please let Rich (rlgraham -at- ornl -dot-
gov) and me know as soon as possible.
Thanks,
Josh
On Mar 30, 2011, at 10:36 AM, Joshua Hursey wrote:
> Rich wanted to make this available to a broader audience. Re-posting to the
> devel list.
>
> Begin
Collective)
- Testing infrastructure (MTT)
Keep sending agenda items to the list (or me directly if you would rather). I
hope to have the agenda sketched out by the teleconf on 4/12 so we can fine
tune it on the call.
Thanks,
Josh
On Apr 22, 2011, at 1:20 PM, N.M. Maclaren wrote:
> On Apr 22 2011, Ralph Castain wrote:
>
>> Several of us are. Josh and George (plus teammates), and some other outside
>> folks, are working the MPI side of it.
>>
>> I'm working only the ORTE side of the problem.
>>
>> Quite a bit of capabil
Yeah this sounds like the limitation of the number of app contexts that we can
use in ORTE. Since ompi-restart uses N app contexts to restart a job (one for
each process in the original job), then it is possible that we can hit this
limitation.
I suspect that it should not be too difficult to c
We will be starting a few minutes late. I'll hang around the visitor center, but
if you don't see me, send me an email directly.
For those interested in joining the WebEx this afternoon, the schedule has been
adjusted slightly. So please see the updated schedule on the wiki.
-- Josh
The code to ask for the abort of the other processes
>>>> in the group defined by the communicator is commented out. Since one
>>>> process calling abort currently causes all processes in the job to
>>>> abort, it has not been a big deal. However as
>>>>>> ordering (which will be what the approach can do), and can enforce that
>>>>>> all callbacks will be called. I would prefer this approach.
>>>>>>
>>>>>> george.
>>>>>>
>>>>>> On Jun 9, 2
. Please take another look at it if you have any interest. The code
> >> can be found here:
> >> https://bitbucket.org/wesbland/resilient-orte/
> >> Thanks,
> >> Wesley Bland
Currently, I am working on process migration and automatic recovery based on
checkpoint/restart. WRT the PML stack, this works by rewiring the BTLs after
restart of the migrated/recovered MPI process(es). There is a fair amount of
work in getting this right with respect to both the runtime and t
FYI. This will affect the Open MPI Trac and SVN on Wednesday morning.
Begin forwarded message:
> From: "Kim, DongInn"
> Date: December 28, 2009 3:55:28 PM EST
> To: all-osl-us...@osl.iu.edu
> Subject: [osl-staff] [all-osl-users] OSL systems maintenance
> Reply-To: Internal OSL staff mailing list
You may have noticed that some of the messages from this morning were marked as
a virus (prefixed with [PMX:VIRUS]). This was caused by the problem described
below by Rob. This affected the various mailing lists (including all the Open
MPI project lists) that were hosted by IU.
The admins at IU
Is this functionality still working?
I added 'cmr:v1.5.1' to r22564 and it did not create a ticket. I noticed a few
of the tickets manually created yesterday also cited this problem.
-- Josh
On Feb 3, 2010, at 8:23 AM, Jeff Squyres wrote:
> A little while ago, IU added the feature of automatic
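
As a rough illustration of the keyword mentioned above, the following C sketch pulls the target out of a `cmr:` token in a commit message. It is an assumption-laden example only: the actual hook run at IU is not shown in this thread, the matching rule (token ends at the first whitespace) is a guess, and `find_cmr_target` is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: look for "cmr:" in a commit message and copy the
 * token that follows it (up to the next whitespace) into `out`.
 * Returns 1 if a non-empty token was found, 0 otherwise.
 * The matching rule is an assumption, not the real hook's behavior. */
int find_cmr_target(const char *msg, char *out, size_t outlen)
{
    const char *p = strstr(msg, "cmr:");
    if (p == NULL || outlen == 0) {
        return 0;
    }
    p += 4; /* skip past "cmr:" */

    size_t n = strcspn(p, " \t\r\n"); /* token length up to whitespace */
    if (n == 0) {
        return 0;
    }
    if (n >= outlen) {
        n = outlen - 1; /* truncate to fit the caller's buffer */
    }
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

With a message such as "Fix build\ncmr:v1.5.1", the helper would fill the buffer with "v1.5.1"; a message without the keyword leaves the buffer untouched and returns 0.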
I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB
BTL last night. The error was:
-
btl_openib_component.c: In function 'init_one_device':
btl_openib_component.c:2089: error: 'mca_btl_openib_component_t' has no member
named 'default_recv_qps'
--
I noticed the following build error on the OMPI trunk (r22821) on IU's Odin
machine:
make[3]: *** No rule to make target `mpi_portable_platform.h', needed by
`all-am'. Stop.
I took a quick pass through the svn commit log and did not see anything that
would have broken this. Any thoughts on w
at least r22789.
>
> Hope, this helps?
>
> Best regards,
> Rainer
>
>
> On Friday 12 March 2010 04:17:41 pm Joshua Hursey wrote:
>> I noticed the following build error on the OMPI trunk (r22821) on IU's Odin
>> machine: make[3]: *** No rule to make target
files to build up a .hgignore file. I run this every time I svn
> up on my hg+svn tree.
>
>
> On Mar 12, 2010, at 3:06 PM, Joshua Hursey wrote:
>
>> I think I figured it out. The error was coming from a Mercurial branch
>> cloned from my internal HG+SVN branch. HG pr
Just a reminder that this RFC will go into the trunk this evening unless there
are strong objections.
We intend to let this soak for a few days then bring it over to the 1.5 series
(after the 1.5.0 release).
-- Josh
On Mar 15, 2010, at 9:26 AM, Josh Hursey wrote:
> (Updated RFC, per offline d
Along with this, the exit code from mpirun is not correct. It is returning 1,
even when the run was successful. This is showing up in MTT, where the trivial
test suite is failing things like 'hello world' since the return code is not
what was expected.
Ralph is looking into this, but I just wan
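
The kind of check that fails in this situation can be sketched in C: run a command, read back its exit code, and compare it with the expected value. This is only an illustration under stated assumptions; `run_and_get_exit_code` and `check_run` are hypothetical helper names, and MTT's real logic is not shown in this thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run `cmd` through the shell and return its exit
 * code, or -1 if it could not be run or did not exit normally
 * (for example, it was killed by a signal). */
int run_and_get_exit_code(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}

/* Hypothetical harness-style check: returns 1 only when the observed
 * exit code equals the expected one, and prints both for the log. */
int check_run(const char *cmd, int expected)
{
    int code = run_and_get_exit_code(cmd);
    printf("%s -> exit code %d (expected %d)\n", cmd, code, expected);
    return code == expected;
}
```

Under this sketch, a call such as `check_run("mpirun -np 2 ./hello", 0)` (with `./hello` standing in for any trivial test binary) would report a failure whenever mpirun returns 1, even if the program itself ran correctly, which matches the symptom described above.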
ote:
>
> On May 22, 2010, at 8:43 AM, Joshua Hursey wrote:
>
>> Along with this, the exit code from mpirun is not correct. It is returning
>> 1, even when the run was successful. This is showing up in MTT, where the
>> trivial test suite is failing things like 'hel
WHAT:
Checkpoint/Restart-based automatic recovery and process migration, advanced
checkpoint storage, C/R-enabled debugging, MPI Extension API for C/R, and some
bug fixes.
WHY:
This commit includes a variety of checkpoint/restart advancements that have
been pending on a temporary branch for a l
Committed in r23587
:)
On Jul 31, 2010, at 12:51 PM, Joshua Hursey wrote:
> WHAT:
> Checkpoint/Restart-based automatic recovery and process migration, advanced
> checkpoint storage, C/R-enabled debugging, MPI Extension API for C/R, and
> some bug fixes.
>
> WHY:
> T
t
> the intended recipient, you should not disseminate, distribute or copy this
> e-mail. Please notify the sender immediately and destroy all copies of this
> message and any attachments.
>
> WARNING: Computer viruses can be transmitted via email. The recipient should
> check
may
> not be allowed nor do the correct thing in Open MPI?!
>
> Thanks for any ideas/help/pointers to more information!
>
> Tomas
tment Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey
On Jul 11, 2007, at 8:09 AM, Terry D. Dontje wrote:
Jeff Squyres wrote:
On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote:
2. It may be useful to have some high-level parameters to specify a
specific run-time environment, since ORTE has multiple, related
frameworks (e.g., RAS and PLS).
Thanks for the heads up. I've noticed this warning on the Cray
systems here at ORNL, and haven't had a chance to put the fix in yet.
This function is exposed in non-CR builds as a user interface item.
If the user requests a checkpoint of an MPI job that was not compiled
with C/R (or doesn't