Re: [OMPI devel] 5.0.x bug release blockers hard stop

2023-02-21 Thread Josh Hursey via devel
. Please levy any objections to this here.   Thanks, William -- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] First Open MPI v5.0.x docs have been merged

2022-03-08 Thread Josh Hursey via devel
t was complete.  There's a LOT of docs there, but you'll also see a "to do" page, and a bunch of "TODO" items in the HTML docs.   We merged "early" so that we could get wider testing of the configury, and we need people to stop updating the old docs.  It was time to switch everyone over to the new system, and move forward from there. Please, please, please make edits to the docs.  Using RST makes the docs pretty darn easy. --  Jeff Squyres jsquy...@cisco.com -- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] Configure --help results

2021-02-03 Thread Josh Hursey via devel
ternal PMIx/PRRTE are being built, but that is probably worth the tradeoff.   Anyone else have thoughts or better ideas?   Brian -- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] --mca coll choices

2020-04-07 Thread Josh Hursey via devel
: Hi Josh, It makes sense, thanks. Is there a debug flag that prints out which component is chosen? Regards, Luis On 07/04/2020 19:42, Josh Hursey via devel wrote: Good question. The reason for this behavior is that the Open MPI coll(ective) framework does not require tha

Re: [OMPI devel] --mca coll choices

2020-04-07 Thread Josh Hursey via devel
ymptom of another issue (e.g., a memory problem).   mca_coll_base_comm_select(MPI_COMM_WORLD) failed    --> Returned "Not found" (-13) instead of "Success" (0) Can you please help? Regards, Luis The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -- Josh Hursey IBM Spectrum MPI Developer

[OMPI devel] CI Testing Coordination

2020-02-24 Thread Josh Hursey via devel
.   https://github.com/open-mpi/ompi-ci-tests/issues/1 I flagged some folks on the Issue but wanted to post here in case I missed anyone that might be interested. If you can fill it out this week then we can try to meet early next week to start our discussion. Thanks, Josh -- Josh Hursey IBM

[OMPI devel] Call for Participation: Inaugural PMIx Standard Quarterly Administrative Steering Committee (ASC) Meeting

2019-09-06 Thread Josh Hursey via devel
://github.com/pmix/pmix-standard [3] PMIx website: https://pmix.org -- Josh Hursey IBM Spectrum MPI Developer ___ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] Fwd: System Runtime Interfaces: What’s the Best Way Forward?

2019-02-07 Thread Josh Hursey
/hpc-runtime-wg/LGaHyZ0jRvE/n2t9MeSkDgAJ Thanks, Josh -- Forwarded message - From: Josh Hursey Date: Fri, Jan 18, 2019 at 6:55 PM Subject: System Runtime Interfaces: What’s the Best Way Forward? To: I'd like to share this meeting announcement with the PMIx community. I

Re: [OMPI devel] IBM CI re-enabled.

2018-10-19 Thread Josh Hursey
-- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] Upcoming nightly tarball URL changes

2018-03-22 Thread Josh Hursey
master-201803170305-bf3dd8a > master-201803082122-0f345c0 > master-201803160306-e08e580 > > -- > > Best regards, Boris Karasev.

Re: [OMPI devel] Upcoming nightly tarball URL changes

2018-03-13 Thread Josh Hursey
-mpi.org/ > nightly/open-mpi/master > > Thanks, > > Brian -- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] ROMIO support in OpenMPI

2017-11-10 Thread Josh Hursey
hing i can resume when needed (we just have to agree on > romio version 3.2 vs 3.3a2). > > if you are going to SC, i'll spend quite some time at the RIST booth #219 > and i will be happy to discuss > this face to face. > > > Cheers, > > Gilles > > On 11/8/2017

Re: [OMPI devel] ROMIO support in OpenMPI

2017-11-08 Thread Josh Hursey
-- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] MTT database

2017-10-17 Thread Josh Hursey
. -- Josh On Thu, Oct 12, 2017 at 12:30 PM, Josh Hursey wrote: > It looks like the AWS database is unreachable. From the dashboard it looks > like we exhausted disk space, which might be the root cause. From the logs > I can see on the Dashboard, it looks like the psql daemon failed to

Re: [OMPI devel] MTT database

2017-10-12 Thread Josh Hursey
not connect to the ompidb database; submit this > run later. > > Howard -- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] MTT / Open MPI Visibility of New Failures

2017-07-19 Thread Josh Hursey
We are putting a pin in this for now. Amazon had an idea they wanted to run with and report back. Once they have more info then we'll try to setup another meeting. On Tue, Jul 11, 2017 at 3:20 PM, Josh Hursey wrote: > In the Open MPI face-to-face meeting we had a long discussion abou

Re: [OMPI devel] MTT / Open MPI Visibility of New Failures

2017-07-12 Thread Josh Hursey
s and tendencies can be displayed by > jenkins > > Cheers, > > Gilles > > On Wed, Jul 12, 2017 at 5:20 AM, Josh Hursey > wrote: > > In the Open MPI face-to-face meeting we had a long discussion about how > to > > better harness MTT such that new failures are iden

[OMPI devel] MTT / Open MPI Visibility of New Failures

2017-07-11 Thread Josh Hursey
responsible for maintaining their "known to fail" list. If the "failed, but we expected to pass" number is >0 then this is a 'new failure' and an email to the community is generated. -- Josh Hursey IBM Spectrum MPI Developer _
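The triage rule sketched in this snippet — each organization keeps a "known to fail" list, and only a "failed, but we expected to pass" count greater than zero triggers a community email — can be illustrated with a small sketch. Names here are hypothetical, not MTT's actual API:

```python
def triage(results, known_to_fail):
    """Split test results into expected and unexpected failures.

    results: dict mapping test name -> "pass" or "fail"
    known_to_fail: set of test names this org expects to fail
    """
    unexpected = [t for t, r in results.items()
                  if r == "fail" and t not in known_to_fail]
    expected = [t for t, r in results.items()
                if r == "fail" and t in known_to_fail]
    # Per the proposal: only a "failed, but we expected to pass"
    # count > 0 constitutes a new failure worth emailing about.
    return {"new_failures": unexpected,
            "known_failures": expected,
            "should_email": len(unexpected) > 0}
```

A failure already on the list stays quiet; any failure off the list raises the alarm.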

Re: [OMPI devel] [3.0.0rc1] ppc64/gcc-4.8.3 check failure (regression).

2017-07-05 Thread Josh Hursey
; /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/ > > > openmpi-3.0.0rc1/test/class/opal_fifo.c:109:26: warning: assignment > > > discards 'volatile' qualifier from pointer target type [enabled by > default] > > > for (count = 0, item = fifo->opal_fifo_head.data.item ; item != > > > &fifo->opal_fifo_ghost ; -- Josh Hursey IBM Spectrum MPI Developer

Re: [OMPI devel] Mellanox Jenkins

2017-06-22 Thread Josh Hursey

Re: [OMPI devel] Open MPI 3.x branch naming

2017-06-02 Thread Josh Hursey
o do if they had > local changes to a branch based on v3.x. In theory, it shouldn’t be much > work to clean all that up. But theory and practice don’t always match when > using git :). > > Brian

Re: [OMPI devel] count = -1 for reduce

2017-05-05 Thread Josh Hursey

Re: [OMPI devel] OMPI devel] Travis: one thing that might help

2017-02-09 Thread Josh Hursey
ged. > >>>> unfortunatly, master could not build any more because there was a > >> indeed > >>>> a conflict that git had no way to detect. > >>> I agree -- building pushes is not a bad thing for exactly the reason > >> you cite.

Re: [OMPI devel] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)

2016-10-22 Thread Josh Hursey
move requires -no- changes to any of your MTT client setups. Let me know if you have any issues. -- Josh On Fri, Oct 21, 2016 at 9:53 PM, Josh Hursey wrote: > I have taken down the MTT Reporter at mtt.open-mpi.org while we finish up > the migration. I'll send out another email when

Re: [OMPI devel] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)

2016-10-21 Thread Josh Hursey
I have taken down the MTT Reporter at mtt.open-mpi.org while we finish up the migration. I'll send out another email when everything is up and running again. On Fri, Oct 21, 2016 at 10:17 AM, Josh Hursey wrote: > Reminder that the MTT will go offline starting at *Noon US Eastern (11

Re: [OMPI devel] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)

2016-10-21 Thread Josh Hursey
, Oct 19, 2016 at 10:14 AM, Josh Hursey wrote: > Based on current estimates we need to extend the window of downtime for > MTT to 24 hours. > > *Start time*: *Fri., Oct. 21, 2016 at Noon US Eastern* (11 am US Central) > *End time*: *Sat., Oct. 22, 2016 at Noon US Eastern* (estimated

Re: [OMPI devel] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)

2016-10-19 Thread Josh Hursey
any questions or concerns. On Tue, Oct 18, 2016 at 10:59 AM, Josh Hursey wrote: > We are moving this downtime to *Friday, Oct. 21 from 2-5 pm US Eastern*. > > We hit a snag with the AWS configuration that we are working through. > > On Sun, Oct 16, 2016 at 9:53 AM, Josh Hursey

Re: [OMPI devel] MTT Server Downtime - Tues., Oct. 18, 2016

2016-10-18 Thread Josh Hursey
We are moving this downtime to *Friday, Oct. 21 from 2-5 pm US Eastern*. We hit a snag with the AWS configuration that we are working through. On Sun, Oct 16, 2016 at 9:53 AM, Josh Hursey wrote: > I will announce this on the Open MPI developer's teleconf on Tuesday, > befo

[OMPI devel] MTT Server Downtime - Tues., Oct. 18, 2016

2016-10-16 Thread Josh Hursey
le to access MTT using the mtt.open-mpi.org URL. No changes are needed in your MTT client setup, and all permalinks are expected to still work after the move. Let me know if you have any questions or concerns about the move. -- Josh Hursey IBM Spectrum MPI Deve

Re: [OMPI devel] ompi-release repo is now closed; long live the ompi repo!

2016-09-22 Thread Josh Hursey
Long live the ompi repo! > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: http://www.cisco.com/web/ > about/doing_business/legal/cri/

Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-07 Thread Josh Hursey
032d70 t opal_atomic_unlock > >>>>>>> 00033a30 t opal_atomic_unlock > >>>>>>> 00047034 T opal_atomic_wmb > >>>>>>> 000324d0 t opal_lifo_pop_atomic > >>>>>>> 000cc260 t opal_lifo_pop_atomic > >>>>>>>

Re: [OMPI devel] Jenkins setup

2016-07-22 Thread Josh Hursey
There are two pages that I know of: * https://github.com/open-mpi/ompi/wiki/PRJenkins * https://github.com/open-mpi/ompi/wiki/PRJenkinsSetupFirewall I wrote the second one, and it includes some details on how to setup the Pull Request Builder. You probably don't need all of it, since it also has

Re: [OMPI devel] LANL jenkins update

2016-07-21 Thread Josh Hursey
Awesome. Can you add that to this wiki page for future reference: https://github.com/open-mpi/ompi/wiki/PRJenkins#how-to-re-trigger-jenkins-testing On Thu, Jul 21, 2016 at 10:39 AM, Pritchard Jr., Howard wrote: > Hi Folks, > > The LANL/(soon to not be iu) jenkins should now work with > > bot:la

Re: [OMPI devel] Migration of mailman mailing lists

2016-07-18 Thread Josh Hursey
Now that netloc has rolled into hwloc, I think it is safe to kill the netloc lists. mtt-devel-core and mtt-announce should be kept. They probably need to be cleaned. But the hope is that we release MTT at some point in the near-ish future. On Mon, Jul 18, 2016 at 10:20 AM, Jeff Squyres (jsquyres)

Re: [OMPI devel] [2.0.0rc4] dlopen_test crash with xlc

2016-07-07 Thread Josh Hursey
t; IBM XL C/C++ for Linux, V13.1.2 (5725-C73, 5765-J08) > Version: 13.01.0002. > > $ [...]/configure -prefix=[] --enable-debug \ > CC=xlc CXX=xlC FC=xlf --disable-mpi-fortran > > > There is no xlf installation on this system. > > -Paul > > > On Thu,

Re: [OMPI devel] [2.0.0rc4] dlopen_test crash with xlc

2016-07-07 Thread Josh Hursey
Paul, What was the configure string you used for this? We have a Jenkins CI mechanism for XL, but it is using XLC 13.1.3 on that system and we haven't been running 'make check'. I have another system that has XLC 13.1.2 that I can test on as well. I'm not sure if I'll be able to fix without Nathan

Re: [OMPI devel] Additional bot:retest target for IBM Jenkins: bot:ibm:retest

2016-07-01 Thread Josh Hursey
ld seem like some knowledge that we should > capture for others to use. > > > > > On Jul 1, 2016, at 9:37 AM, Josh Hursey wrote: > > > > I added a feature to IBM's Jenkins setup yesterday that other orgs doing > Jenkins CI testing might find helpful to add as well. &

[OMPI devel] Additional bot:retest target for IBM Jenkins: bot:ibm:retest

2016-07-01 Thread Josh Hursey
I added a feature to IBM's Jenkins setup yesterday that other orgs doing Jenkins CI testing might find helpful to add as well. We have the retest target for all of the Jenkins systems to re-test a particular PR: *bot:retest* I wanted an additional target that would trigger only the IBM Jenkins t

Re: [OMPI devel] RFC: Public Test Repo

2016-05-20 Thread Josh Hursey
here, if you like. > > > On May 19, 2016, at 8:24 AM, Josh Hursey wrote: > > I think talking to the MPICH folks about creating a common test pool might > be useful. More useful would be to get the MPI Forum to 'bless' it and take > input from all of the MPI vendors. Maybe

Re: [OMPI devel] RFC: Public Test Repo

2016-05-19 Thread Josh Hursey
a > discussion w the MPICH folks to see if a) their test suite is general > enough for all MPI implementations, and B) if they would accept a bunch of > random tests from us? > > And if not, I think I'd like to understand better the value add that we > can provide by making an

Re: [OMPI devel] Github pricing plan changes announced today

2016-05-18 Thread Josh Hursey
Related to this conversation, I am proposing that we as a community try to cultivate some public tests. See the following link for the start of the discussion. https://www.open-mpi.org/community/lists/devel/2016/05/18997.php We won't be able to open up the ompi-tests repo to the public. But we mi

[OMPI devel] RFC: Public Test Repo

2016-05-18 Thread Josh Hursey
WHAT: Create a public test repo (ompi-tests-public) to collect WHY: ompi-tests is private, and difficult/impossible to open up. There is a demand for a public collection of unit tests. This repo would allow us to cultivate such a collection of unit tests. WHERE: open-mpi GitHub project TIMEOUT:

Re: [OMPI devel] New Github labels

2016-05-11 Thread Josh Hursey
:+1: This is helpful. Thanks! On Wed, May 11, 2016 at 10:19 AM, Ralph Castain wrote: > Hi folks > > For PRs on the master, I have added two labels: > > Target: 2.x > Target: 1.10 > > These are intended to mark that this PR should be ported to the target > branch once it has been committed to the

Re: [OMPI devel] [PATCH] Fix for xlc-13.1.0 ICE (hwloc)

2016-05-09 Thread Josh Hursey
y_alias], > -[int * p_value __attribute__ ((__may_alias__));], > +[struct { int i; } __attribute__ ((__may_alias__)) * p_value;], > [], > []) > > > -Paul [proving that I am good for more than just *breaking* other people's > software

Re: [OMPI devel] [2.0.0rc2] xlc-13.1.0 ICE (hwloc)

2016-05-06 Thread Josh Hursey
Brice: Can you take a look at Paul's patch here: https://www.open-mpi.org/community/lists/devel/2016/05/18918.php Thanks, Josh On Thu, May 5, 2016 at 4:28 PM, Jeff Squyres (jsquyres) wrote: > On May 5, 2016, at 5:27 PM, Josh Hursey wrote: > > > > Since this also happe

Re: [OMPI devel] [2.0.0rc2] xlc-13.1.0 ICE (hwloc)

2016-05-05 Thread Josh Hursey
Thanks. I can confirm that too. I have a power7 with xlc -qversion: IBM XL C/C++ for Linux, V12.1 (5765-J03, 5725-C73) Version: 12.01.. And it built v2.0.0rc2 fine. Unfortunately, I don't have access to a power7 system with v13.1. We might have to just note this in the release notes. So

Re: [OMPI devel] opal/mca/dl/ opal_dl_open with NULL fname - assert?

2016-05-05 Thread Josh Hursey
that you could pass NULL as fname: > > https://github.com/open-mpi/ompi/blob/master/opal/mca/dl/dl.h#L67 > > These asserts can safely be removed. Thanks! > > > > On May 5, 2016, at 3:40 PM, Josh Hursey wrote: > > > > We noticed that there is

[OMPI devel] opal/mca/dl/ opal_dl_open with NULL fname - assert?

2016-05-05 Thread Josh Hursey
We noticed that there is an assert(fname) in both of the dl components: * https://github.com/open-mpi/ompi/blob/master/opal/mca/dl/dlopen/dl_dlopen_module.c#L53 * https://github.com/open-mpi/ompi/blob/master/opal/mca/dl/libltdl/dl_libltdl_module.c#L21 But according to the dl.h, NULL should be

Re: [OMPI devel] [2.0.0rc2] xlc build failure (inline asm)

2016-05-04 Thread Josh Hursey
ot accepting a > subset of the grammar. > > -Paul [Sent from my phone] > > On Wednesday, May 4, 2016, Nathan Hjelm wrote: > >> >> Go ahead, I don't have access to xlc so I couldn't verify myself. I >> don't fully understand why the last : can be omitte

Re: [OMPI devel] [2.0.0rc2] xlc build failure (inline asm)

2016-05-04 Thread Josh Hursey
Did someone pick this up to merge into master & v2.x? I can confirm that Paul's patch fixes the issue for XL compilers. I didn't see a PR for it, but can file one if no one has yet. On Mon, May 2, 2016 at 6:55 PM, Paul Hargrove wrote: > It appears that xlc's support for gcc-style inline asm doe

Re: [OMPI devel] [2.0.0rc2] xlc-13.1.0 ICE (hwloc)

2016-05-03 Thread Josh Hursey
ersion: 13.01.0002. > > However, it is worth noting that my understanding from IBM docs is that > the xlc for ppc64el is a VERY different compiler. > Specifically it uses the Clang front-end rather than IBM's own. > > -Paul > > > On Tue, May 3, 2016 at 7:47 AM, Jo

Re: [OMPI devel] [2.0.0rc2] xlc-13.1.0 ICE (hwloc)

2016-05-03 Thread Josh Hursey
Paul, What generation of the power arch are you using? We have successfully built (a few weeks ago) with the xlc compiler 13.1.3 on a Power 8 (ppc64le). It might be related to big vs. little endian, but I wonder if it is something that was fixed in a point release of the xlc compiler. Are you ab

Re: [OMPI devel] 1.10.3rc MTT failures

2016-04-25 Thread Josh Hursey
IBM had a stale version of ompi-tests. I have sync'ed that repo, and will try again later today. The loop spawn error will take some digging. I'll see what we can find. On Mon, Apr 25, 2016 at 9:14 AM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > This is a known bug that is bein

[OMPI devel] mpirun --verbose && opal_output_verbose() on 0

2016-03-24 Thread Josh Hursey
I was rummaging through the code today and made two observations related to verbose. Neither is terribly critical, but probably worth making the developer community aware. (2) might need to be fixed. (1) mpirun --verbose It does not seem to do anything in the current master (and probably for quite

Re: [OMPI devel] ORTE headers in OPAL source

2014-10-19 Thread Josh Hursey
mca_btl_sm_component.sm_seg->shmem_ds.seg_name); > > > Do you have an idea how to fix those two? The first variable > orte_cr_continue_like_restart could probably be moved but I am not sure > how to handle the sstore call. > > Adrian > &

Re: [OMPI devel] ORTE headers in OPAL source

2014-08-09 Thread Josh Hursey
Those calls should be protected with the CR FT #define - If I remember correctly. We were using the sstore to track the shared memory file names so we could clean them up on restart. I'm not sure if the sstore framework is necessary in this location, since we should be able to tell opal_crs and it

Re: [OMPI devel] 1-question developer poll

2014-04-17 Thread Josh Hursey
For Open MPI: - Primary: Mercurial (hosted on BitBucket - better deal for academia) - Secondary: Git (hosted on either BitBucket or GitHub) - SVN only to commit back For other projects: - SVN - Becoming less commonly used, but still used for some projects like Open MPI - Mercurial and Git - e

Re: [OMPI devel] orte-restart and PATH

2014-03-14 Thread Josh Hursey
It looks like I did not add the prefix path to the binary name before fork/exec in orte-restart. There is a string variable that you can use to get the appropriate prefix: opal_install_dirs.prefix from opal/mca/installdirs/installdirs.h It's the same one that Ralph mentioned that orterun uses
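The fix described here — prepending the install prefix to a bare binary name before fork/exec, the way orterun does — amounts to building an absolute path from opal_install_dirs.prefix. A Python stand-in for that C logic (the helper name and the prefix/bin layout are assumptions for illustration):

```python
import os

def resolve_restart_binary(prefix, name):
    """Prepend the install prefix to a bare binary name before exec,
    mirroring the orterun behavior mentioned in the thread.
    Absolute paths are left untouched."""
    if os.path.isabs(name):
        return name
    # opal_install_dirs.prefix plus the conventional bin/ subdirectory
    return os.path.join(prefix, "bin", name)
```

So a bare `opal-restart` becomes `<prefix>/bin/opal-restart`, while an explicit absolute path passes through unchanged.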

Re: [OMPI devel] Fix compiler warnings in FT code

2014-03-05 Thread Josh Hursey
>> >> >> On Mon, Mar 03, 2014 at 05:42:13PM +0100, Adrian Reber wrote: >> > I will prepare a patch that moves the parameter initialization >> somewhere else >> > and will not remove it. Do you think the other parts of the patch can be >> > applied (without ss

Re: [OMPI devel] Fix compiler warnings in FT code

2014-03-03 Thread Josh Hursey
ct() removal)? > > > On Mon, Mar 03, 2014 at 10:07:36AM -0600, Josh Hursey wrote: > > It should probably be moved to the component initialization of the sstore > > stage component since those parameters are how the user controls where to > > store those files. I think there

Re: [OMPI devel] Fix compiler warnings in FT code

2014-03-03 Thread Josh Hursey
> On Mon, Mar 03, 2014 at 07:17:19AM -0600, Josh Hursey wrote: > > It looks like you removed a number of sstore stage MCA parameters. Did > they > > move somewhere else? or do you have a different way to set those > parameters? > > > > Other than that it lo

Re: [OMPI devel] Fix compiler warnings in FT code

2014-03-03 Thread Josh Hursey
It looks like you removed a number of sstore stage MCA parameters. Did they move somewhere else? or do you have a different way to set those parameters? Other than that it looks good to me. On Mon, Mar 3, 2014 at 5:29 AM, Adrian Reber wrote: > I have a simple patch which fixes the remaining co

Re: [OMPI devel] startup sstore orte/mca/ess/base/ess_base_std_tool.c

2014-02-21 Thread Josh Hursey
+1 On Fri, Feb 21, 2014 at 10:04 AM, Ralph Castain wrote: > looks fine to me > > > On Feb 21, 2014, at 6:23 AM, Adrian Reber wrote: > > > To restart a process using orte-restart I need sstore initialized when > > running as a tool. This is currently missing. The new code is > > > > #if OPAL_EN

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Josh Hursey
Yep. For the checkpoint/continue that patch looks good. On Tue, Feb 18, 2014 at 11:30 AM, Adrian Reber wrote: > On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote: > > So when a process is restarted with CRIU, does it resume execution after > > the criu_dump() or

Re: [OMPI devel] OPAL_CRS_* meaning

2014-02-18 Thread Josh Hursey
> restart the process? I would have expected opal_crs.restart() is used > > for restart. I am confused. Looking at CRS/BLCR checkpoint() seems to > > only checkpoint and restart() seems to only restart. The comment in > > opal/mca/crs/crs.h says the same as you say. > > > &

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Josh Hursey
s_criu_checkpoint() I am using criu_dump() to > checkpoint the process and the plan is to use criu_restore() in > opal_crs_criu_restart() (which I have not yet implemented). > > On Mon, Feb 17, 2014 at 03:45:49PM -0600, Josh Hursey wrote: > > It look fine except that the res

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Josh Hursey
It looks fine except that the restart state is not flagged. When a process is restarted does it resume execution inside the criu_dump() function? If so, is there a way to tell from its return code (or some other mechanism) that it is being restarted versus continuing after checkpointing? On Mon, F

Re: [OMPI devel] OPAL_CRS_* meaning

2014-02-17 Thread Josh Hursey
These values indicate the current state of the checkpointing lifecycle. In particular CONTINUE/RESTART are set by the checkpointer in the CRS (all others are used by the INC mechanism). In the opal_crs.checkpoint() call the checkpointer will capture the program state and it is possible to emerge fr
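The lifecycle described here — a single checkpoint() call from which control can emerge either as CONTINUE (in the original process, after its state is captured) or RESTART (in a process resurrected from the image) — resembles the fork()-style pattern used by checkpointers like BLCR and CRIU. A toy illustration of why callers must branch on the returned state; the state names echo OPAL_CRS_*, but the checkpointer itself is simulated, not real OPAL code:

```python
CONTINUE, RESTART, ERROR = "CONTINUE", "RESTART", "ERROR"

def checkpoint(save_image, restarting=False):
    """Simulated CRS checkpoint: the same call site is reached twice.

    In a real CRS the 'restarting' flag is not an argument; it is
    implied by which process is executing: the original process sees
    CONTINUE, a restarted one sees RESTART.
    """
    if restarting:
        return RESTART
    save_image.append("program state")  # capture state into the image
    return CONTINUE

image = []
state = checkpoint(image)
if state == CONTINUE:
    pass  # resume normal execution after the checkpoint is taken
elif state == RESTART:
    pass  # rebuild anything not captured in the image (PIDs, sockets)
```

The INC mechanism handles the other states; only CONTINUE/RESTART come from the checkpointer itself, which is why both arms must exist at the call site.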

Re: [OMPI devel] new CRS component added (criu)

2014-02-07 Thread Josh Hursey
That is fantastic! Thanks for the hard work so far getting the C/R infrastructure back in place. On Fri, Feb 7, 2014 at 3:46 PM, Adrian Reber wrote: > I have created a new CRS component using criu (criu.org) to support > checkpoint/restart in Open MPI. My current patch only provides the > frame

Re: [OMPI devel] C/R and orte_oob

2014-02-07 Thread Josh Hursey
In the original implementation, the OOB ft_event did not do much of anything on checkpoint preparation and continue. We did not even close the sockets. However, during restart the OOB will need to renegotiate the socket connections - usually by calling the finalization function (close stale sockets

Re: [OMPI devel] SNAPC: dynamic send buffers

2014-01-29 Thread Josh Hursey
Looks good to me too. On Wed, Jan 29, 2014 at 11:00 AM, Ralph Castain wrote: > Looks good to me! > > On Jan 29, 2014, at 8:52 AM, Adrian Reber wrote: > > > Thanks for pointing out orte_rml_recv_callback(). It does just what I > > need. I removed my own callback and I am now using > orte_rml_re

Re: [OMPI devel] [PATCH] use ORTE_PROC_IS_APP

2014-01-23 Thread Josh Hursey
That should be ok. On Thu, Jan 23, 2014 at 10:17 AM, Ralph Castain wrote: > Sure - no issues with me > > > On Jan 23, 2014, at 7:10 AM, Adrian Reber wrote: > > > Selecting SNAPC requires the information if it is an app or not: > > > > int orte_snapc_base_select(bool seed, bool app); > > > > Th

Re: [OMPI devel] [PATCH] make orte-checkpoint communicate with orterun again

2014-01-23 Thread Josh Hursey
+1 On Thu, Jan 23, 2014 at 10:16 AM, Ralph Castain wrote: > Looks correct to me - you are right in that you cannot release the buffer > until after the send completes. We don't copy the data underneath to save > memory and time. > > > On Jan 23, 2014, at 6:51 AM, Adrian Reber wrote: > > > Foll

Re: [OMPI devel] callback debugging

2014-01-20 Thread Josh Hursey
If it is the application, then there is probably a barrier in the app_coord_init() to make sure all the applications are up and running. After this point then the global coordinator knows that the application can be checkpointed. I don't think orte-checkpoint should be calling a barrier - from wha

Re: [OMPI devel] return value of opal_compress_base_register() in opal/mca/compress/base/compress_base_open.c

2014-01-07 Thread Josh Hursey
erbose(10, > opal_compress_base_framework.framework_output, > > -"compress:open: FT is not enabled, > skipping!"); > > -return OPAL_SUCCESS; > > -} > > - > > /* Open up all available components */ > >

Re: [OMPI devel] return value of opal_compress_base_register() in opal/mca/compress/base/compress_base_open.c

2014-01-02 Thread Josh Hursey
I think the only reason I protected that framework is to reduce the overhead of an application using a build of Open MPI with CR support, but not enabling it at runtime. Nothing in the compress framework depends on the CR infrastructure (although the CR infrastructure can use the compress framework

Re: [OMPI devel] [PATCH v3 1/2] Trying to get the C/R code to compile again. (recv_*_nb)

2014-01-02 Thread Josh Hursey
+1 On Thu, Dec 19, 2013 at 4:04 PM, Ralph Castain wrote: > Looks okay to me. On the places where you need to block while waiting for > an answer, you can use OMPI_WAIT_FOR_COMPLETION - this will spin on > opal_progress until the condition is met. We use it elsewhere for similar > purposes. > >

Re: [OMPI devel] [PATCH v3 2/2] Trying to get the C/R code to compile again. (send_*_nb)

2014-01-02 Thread Josh Hursey
(Sorry for the delay, just catching up on email after the holidays) I think that looks good too. On Thu, Dec 19, 2013 at 4:01 PM, Ralph Castain wrote: > +1 from me > > > On Dec 19, 2013, at 12:54 PM, Adrian Reber wrote: > > > From: Adrian Reber > > > > This patch changes all send/send_buffer

Re: [OMPI devel] [PATCH v2 1/2] Trying to get the C/R code to compile again. (recv_*_nb)

2014-01-02 Thread Josh Hursey
(Sorry for the delay, just catching up on email after the holidays) I agree with Ralph. You can remove the old function signatures, but keep the places where you replace a blocking send/recv with a non-blocking version. Then I think it is good. Thanks, Josh On Wed, Dec 18, 2013 at 9:52 AM, Ral

Re: [OMPI devel] OMPI developer's meeting today

2013-12-12 Thread Josh Hursey
I think we had C/R slated for Friday morning - is that still correct? I have a meeting at 10 that just popped up, so I can only attend for an hour if we start at 9. I just wanted to confirm that that was the current plan or if it had been modified. -- Josh On Thu, Dec 12, 2013 at 6:53 AM, Jeff Sq

Re: [OMPI devel] [PATCH v2] Trying to get the C/R code to compile again. (last)

2013-12-09 Thread Josh Hursey
With the modification that Ralph mentioned below, I think the patch is good to go. Thanks! On Mon, Dec 9, 2013 at 2:19 PM, Ralph Castain wrote: > On Dec 9, 2013, at 10:07 AM, Ralph Castain wrote: > > > I see some things in here that concern me. First, there are variables > being added to func

Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-06 Thread Josh Hursey
; > https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting > > > > On Dec 6, 2013, at 9:30 AM, Josh Hursey wrote: > > > Since the blocking semantics are important for correctness of the prior > code, I would not just replace send_buffer with send_buffer_nb. This makes >

Re: [OMPI devel] [PATCH 4/4] Trying to get the C/R code to compile again. (last)

2013-12-06 Thread Josh Hursey
Did the mca_base_component_distill_checkpoint_ready parameter go away? Its intention was to allow a user to have a build with C/R compiled in and then choose at runtime if they want to restrict their component selection to just C/R enabled components or not. I have reservations about that part of the

Re: [OMPI devel] [PATCH 3/4] Trying to get the C/R code to compile again. (recv_*_nb)

2013-12-06 Thread Josh Hursey
Per my other email, I would suggest #ifdef comments instead of nonblocking replacements for the blocking calls. After that modification, I think this patch is fine. As was mentioned previously, we will need to go back (after things compile) and figure out a new model for this behavior. For the exi

Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-06 Thread Josh Hursey
Since the blocking semantics are important for correctness of the prior code, I would not just replace send_buffer with send_buffer_nb. This makes the semantics incorrect, and will make things confusing later when you try to sort out prior calls to send_buffer_nb with those that you replaced. As a
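The point here is that a blocking send guarantees completion before the caller proceeds, so mechanically substituting send_buffer_nb changes the semantics; the interim fix suggested elsewhere in the thread is to spin on the progress engine until a completion flag flips (the OMPI_WAIT_FOR_COMPLETION idiom). A stripped-down analogue of that pattern; all names are illustrative, not the ORTE RML API:

```python
pending = []    # sends scheduled but not yet completed
completed = []  # buffers whose completion callback has fired

def send_nb(buffer, on_complete):
    """Hypothetical non-blocking send: queue it; the callback fires later."""
    pending.append((buffer, on_complete))

def progress():
    """One pass of the progress engine: complete the oldest pending send."""
    if pending:
        buf, cb = pending.pop(0)
        cb(buf)

def send_blocking(buffer):
    """Emulate a blocking send by spinning on progress() until the
    completion callback fires, rather than returning immediately the
    way a bare non-blocking substitute would."""
    done = [False]
    def mark(buf):
        completed.append(buf)
        done[0] = True
    send_nb(buffer, mark)
    while not done[0]:
        progress()

send_blocking(b"payload")  # returns only after completion fires
```

Swapping the bare send_nb() in where send_blocking() is expected would let the caller race ahead of delivery, which is exactly the correctness hazard the email warns about.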

Re: [OMPI devel] [PATCH 1/4] Trying to get the C/R code to compile again. (void value not ignored)

2013-12-06 Thread Josh Hursey
This patch looks good to me. Let me look at some of the others. On Fri, Dec 6, 2013 at 7:14 AM, Jeff Squyres (jsquyres) wrote: > Let's see what Josh says (he said he'd review the patches today). I'm > guessing he'll be ok with this one, but let's see. > > > On Dec 6, 2013, at 6:25 AM, Adrian Re

Re: [OMPI devel] Annual OMPI membership review: SVN accounts

2013-07-10 Thread Josh Hursey
Keep my account active. On Tue, Jul 9, 2013 at 11:54 AM, Tim Mattox wrote: > What, my SVN account is still there? > I'm just a lurker now, so please remove timattox from SVN > > On Mon, Jul 8, 2013 at 6:32 PM, Jeff Squyres (jsquyres) < > jsquy...@cisco.com> wrote: > >> >> Indiana >> ===

Re: [OMPI devel] [EXTERNAL] Re: RFC: Python-generated Fortran wrappers

2013-05-22 Thread Josh Hursey
Is this Python 2.x or 3.x? I ask because they are not 100% compatible due to changes in the language syntax, meaning that not all 2.x-compliant Python programs work with a 3.x interpreter. IIRC there is a way to write a 2.x-compliant Python program so that it is also 3.x compliant, but my Python knowled
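As the email recalls, a Python 2 program can indeed be written so that it also runs under Python 3. A common recipe (a generic sketch, not the actual wrapper-generator script under discussion) is to pull the 3.x behavior into 2.x via `__future__` imports and stick to the syntax subset both versions share:

```python
# Runs unchanged under both Python 2.6+ and Python 3.x.
# Under Python 3 these __future__ imports are harmless no-ops.
from __future__ import print_function, division

def describe(values):
    # sum()/len() uses true division in both versions thanks to __future__.
    mean = sum(values) / len(values)
    # %-formatting works identically in 2.x and 3.x.
    return "n=%d mean=%.1f" % (len(values), mean)

print(describe([1, 2, 3, 4]))  # print-as-a-function works in both
```

In practice, avoiding 2.x-only constructs (e.g. `print` as a statement, `dict.iteritems()`) covers most of the gap for a code generator of this kind.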

Re: [OMPI devel] CRIU checkpoint support in Open-MPI?

2012-12-11 Thread Josh Hursey
+1 I would be interested in seeing support for this in Open MPI. On Thu, Dec 6, 2012 at 8:14 AM, George Bosilca wrote: > Samuel, > > Yes, all contributions are welcomed. It should be almost trivial to write > a new backend in Open MPI to support what the kernel developers will agree > to add as

Re: [OMPI devel] Open MPI MTT is moving

2012-11-05 Thread Josh Hursey
experience any problems with the new server. -- Josh On Fri, Nov 2, 2012 at 9:26 AM, Josh Hursey wrote: > Reminder that we will be shutting down the MTT submission and reporter > services this weekend to migrate it to another machine. The MTT > services will go offline at COB today, and b

Re: [OMPI devel] Open MPI MTT is moving

2012-11-02 Thread Josh Hursey
Reminder that we will be shutting down the MTT submission and reporter services this weekend to migrate it to another machine. The MTT services will go offline at COB today, and be brought back by Monday morning. On Wed, Oct 31, 2012 at 7:54 AM, Jeff Squyres wrote: > *** IF YOU RUN MTT, YOU NEED

Re: [OMPI devel] ORCA - Another runtime supported

2012-08-22 Thread Josh Hursey
Yeah. I am having trouble finding cycles to spend on this at the moment. We know what we need to do to finish it so that isn't the barrier at the moment. Just hands on deck. I hope to get back to it soon, but I cannot put time bounds on soon at the moment :/ -- Josh On Wed, Aug 22, 2012 at 10:17

Re: [OMPI devel] Quiet Time on Trunk - ORCA Integration

2012-06-27 Thread Josh Hursey
Josh Hursey wrote: > ORCA was backed out of the trunk in r26676. > > Once we fix the linking issue, we will bring this back. > > Sorry for the noise folks. The trunk is open again. > > -- Josh > > On Tue, Jun 26, 2012 at 9:04 PM, Josh Hursey wrote: >> So I'm sp

Re: [OMPI devel] Quiet Time on Trunk - ORCA Integration

2012-06-26 Thread Josh Hursey
ORCA was backed out of the trunk in r26676. Once we fix the linking issue, we will bring this back. Sorry for the noise folks. The trunk is open again. -- Josh On Tue, Jun 26, 2012 at 9:04 PM, Josh Hursey wrote: > So I'm spinning my wheels on this one. I am going to need someone wit

Re: [OMPI devel] Quiet Time on Trunk - ORCA Integration

2012-06-26 Thread Josh Hursey
I never saw it. -- Josh On Tue, Jun 26, 2012 at 8:25 PM, Josh Hursey wrote: > So I can confirm that it is not linking properly on the Mac. It -is- > running correctly on Linux (which is where I have been testing). > > From what I can tell this is a linking issue specific to the Mac. I'

Re: [OMPI devel] Quiet Time on Trunk - ORCA Integration

2012-06-26 Thread Josh Hursey
on here would be appreciated as I dig further. -- Josh On Tue, Jun 26, 2012 at 7:40 PM, Josh Hursey wrote: > That is odd. I did not see that when testing on Linux. I'll take a look. > > -- josh > > On Tue, Jun 26, 2012 at 7:37 PM, Ralph Castain wrote: >> FWIW: it buil

Re: [OMPI devel] Quiet Time on Trunk - ORCA Integration

2012-06-26 Thread Josh Hursey
would suggest backing this out and the > two of us looking at it in the morning. Somehow, you've lost the progress > loop that was driving the RTE thru orte_init. > > > On Jun 26, 2012, at 3:44 PM, Josh Hursey wrote: > >> r26670 is the first of the ORCA commits. I a

Re: [OMPI devel] Quiet Time on Trunk - ORCA Integration

2012-06-26 Thread Josh Hursey
r26670 is the first of the ORCA commits. I am switching machines for testing. Hang on for a couple more hours while the initial testing is underway. -- Josh On Tue, Jun 26, 2012 at 4:34 PM, Josh Hursey wrote: > I am requesting a quiet time on the trunk for ORCA integration > starting -no

[OMPI devel] Quiet Time on Trunk - ORCA Integration

2012-06-26 Thread Josh Hursey
I am requesting a quiet time on the trunk for ORCA integration starting -now- (as previously announced). I will post back when everything is committed and ready to go. Some reading while you are waiting: http://www.open-mpi.org/community/lists/devel/2012/06/11109.php Thanks, Josh -- Joshua Hu

Re: [OMPI devel] RFC: Pineapple Runtime Interposition Project

2012-06-26 Thread Josh Hursey
start the quiet time at 4:30 pm US Eastern. I will send an email at that time to the devel list, and send a followup when I'm all done. Again: Quiet time starts at 4:30 pm US Eastern for the Open MPI trunk. Thanks! Josh On Tue, Jun 26, 2012 at 10:29 AM, Josh Hursey wrote: > I like the {O}
