Re: [OMPI devel] Fix a hang in carto_base_select() if carto_module_init() fails

2011-07-08 Thread nadia . derbey
Yes, sure! Agreed. Regards, -- Nadia Derbey Phone: +33 (0)4 76 29 77 62 devel-boun...@open-mpi.org wrote on 07/08/2011 02:10:22 AM: > From: Jeff Squyres > To: Open MPI Developers > Date: 07/08/2011 02:10 AM > Subject: Re: [OMPI devel] Fix a hang in carto_base_select() if > ca

Re: [OMPI devel] known limitation or bug in hwloc?

2011-08-29 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 08/29/2011 04:20:30 PM: > From: Ralph Castain > To: Open MPI Developers > Date: 08/29/2011 04:26 PM > Subject: Re: [OMPI devel] known limitation or bug in hwloc? > Sent by: devel-boun...@open-mpi.org > > Actually, I'll eat those words. I was looking at the

Re: [OMPI devel] known limitation or bug in hwloc?

2011-08-29 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 08/29/2011 05:57:59 PM: > From: Ralph Castain > To: Open MPI Developers > Date: 08/29/2011 05:58 PM > Subject: Re: [OMPI devel] known limitation or bug in hwloc? > Sent by: devel-boun...@open-mpi.org > > On Aug 29, 2011, at 8:35 AM, nadia.der...@bull.net w

Re: [OMPI devel] known limitation or bug in hwloc?

2011-08-30 Thread nadia . derbey
Thanks a lot, Ralph! Regards, -- Nadia Derbey Phone: +33 (0)4 76 29 77 62 devel-boun...@open-mpi.org wrote on 08/29/2011 06:12:13 PM: > From: Ralph Castain > To: Open MPI Developers > Date: 08/29/2011 06:12 PM > Subject: Re: [OMPI devel] known limitation or bug in hwloc? > Env

Re: [OMPI devel] known limitation or bug in hwloc?

2011-08-30 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 08/29/2011 06:59:49 PM: > From: Brice Goglin > To: Open MPI Developers > Date: 08/29/2011 07:00 PM > Subject: Re: [OMPI devel] known limitation or bug in hwloc? > Sent by: devel-boun...@open-mpi.org > > I am playing with those aspects right now (it's plann

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-06 Thread nadia . derbey
Resending, as I didn't get any answer... Regards, Nadia -- Nadia Derbey devel-boun...@open-mpi.org wrote on 01/27/2012 05:38:34 PM: > From: "nadia.derbey" > To: Open MPI Developers > Date: 01/27/2012 05:35 PM > Subject: [OMPI devel] btl/openib: get_

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 02/09/2012 12:18:20 PM: > From: Jeff Squyres > To: Open MPI Developers > Date: 02/09/2012 12:18 PM > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see > processes as bound if the job has been launched by srun > Sent by: devel-boun...@op

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM: > From: Brice Goglin > To: Open MPI Developers > Date: 02/09/2012 12:20 PM > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see > processes as bound if the job has been launched by srun > Sent by: devel-boun...@op

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM: > From: Ralph Castain > To: Open MPI Developers > Date: 02/09/2012 01:32 PM > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see > processes as bound if the job has been launched by srun > Sent by: devel-boun...@o

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-16 Thread nadia . derbey
's no need for any other patch: the fix you committed was the only one needed to fix the issue. Could you please move it to v1.5 (do I need to file a CMR)? Thanks! -- Nadia Derbey devel-boun...@open-mpi.org wrote on 02/09/2012 06:00:48 PM: > From: Jeff Squyres > To: Open MPI Develop

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 02/17/2012 08:36:54 AM: > From: Brice Goglin > To: de...@open-mpi.org > Date: 02/17/2012 08:37 AM > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see > processes as bound if the job has been launched by srun > Sent by: devel-boun...@open-

[OMPI devel] bug in opal_generic_simple_pack_function()

2013-11-25 Thread Nadia Derbey
Stack->disp - pConvertor->pBaseBuf ); +source_base - pStack->disp - pConvertor->pBaseBuf - pData->lb ); DO_DEBUG( opal_output( 0, "pack save stack stack_pos %d pos_desc %d count_desc %d disp %ld\n", pConvertor->stack_po

Re: [OMPI devel] bug in opal_generic_simple_pack_function()

2013-11-25 Thread Nadia Derbey
drawn by hand. George. On Nov 25, 2013, at 11:40, Nadia Derbey <nadia.der...@bull.net> wrote: Hi, I'm currently working on a bug occurring at the client site with openmpi when calling MPI_Sendreceive() on datatypes built by the application. I think I've found

Re: [OMPI devel] bug in opal_generic_simple_pack_function()

2013-11-25 Thread Nadia Derbey
he case for your example, I don't think they are related. I guess we should look at all the patches in the opal/datatype and ompi/datatype over the last 13 months (the starting point of the 1.6.3). George. On Nov 25, 2013, at 14:10 , Nadia Derbey <mailto:nadia.der...@bull.net

[OMPI devel] btl_openib_receive_queues mca param not always taken into account

2014-07-11 Thread Nadia Derbey
gards, -- Nadia Derbey # HG changeset patch # Parent 4cb09323aca44faec7d027586ffa94e7d9681989 btl/openib: when specifying the receive_queues as an mca param to bypass the XRC settings, the XRC settings in the .ini file are taken into account nevertheless if we use the default QPs value di

[OMPI devel] RFC: Diagnostic framework for MPI

2009-05-26 Thread Nadia Derbey
'm submitting this RFC to have your opinion about its usefulness, or even to know if there's an already existing mechanism to do this job. Regards, Nadia -- Nadia Derbey

Re: [OMPI devel] RFC: Diagnostic framework for MPI

2009-05-26 Thread Nadia Derbey
up with additional (probably better) > ways of implementing this extension. My point here was simply to > ensure you knew that the basic mechanism already exists, and to > stimulate some thought as to how to use it for your proposed purpose. > > I would be happy to help you do so as

[OMPI devel] problem in the ORTE notifier framework

2009-05-26 Thread Nadia Derbey
Hi, While having a look at the notifier framework under orte, I noticed that the way it is written, the init routine for the selected module cannot be called. Attached is a small patch that fixes this issue. Regards, Nadia ORTE notifier module init routine is never called: orte_notifier.init che

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
>george. > > > > On May 27, 2009, at 06:59 , Ralph Castain wrote: > > > > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...) > > > > > > #if WANT_NOTIFIER_VERBOSE > > > opal_atomic_increment(counter); > > > if (counter > threshold) { > > > orte_notifier.api(.) > > > } > > > #endif > > > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > -- Nadia Derbey
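The counter-and-threshold macro quoted in this message can be sketched in portable C11. This is only an illustration under stated assumptions: `NOTIFIER_VERBOSE`, `demo_notify`, and `simulate_events` are hypothetical names, C11 `<stdatomic.h>` stands in for `opal_atomic_increment`, and nothing here is ORTE's actual notifier API. The point it shows is that the whole call site disappears when the build-time switch is off, which is the zero-overhead property discussed in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Assumed build-time switch; with 0 the macro below compiles to nothing. */
#ifndef WANT_NOTIFIER_VERBOSE
#define WANT_NOTIFIER_VERBOSE 1
#endif

/* Stand-in for a notifier API entry point (hypothetical, not ORTE's). */
static int notifications_sent = 0;

static void demo_notify(const char *msg)
{
    notifications_sent++;
    fprintf(stderr, "notify: %s\n", msg);
}

#if WANT_NOTIFIER_VERBOSE
/* Atomically bump the counter; notify once it has passed the threshold.
 * atomic_fetch_add returns the previous value, hence the "+ 1". */
#define NOTIFIER_VERBOSE(notify_fn, counter, threshold, msg)      \
    do {                                                          \
        if (atomic_fetch_add((counter), 1) + 1 > (threshold)) {   \
            notify_fn(msg);                                       \
        }                                                         \
    } while (0)
#else
#define NOTIFIER_VERBOSE(notify_fn, counter, threshold, msg) do { } while (0)
#endif

/* Fire `events` occurrences and return how many notifications resulted. */
static int simulate_events(int events, int threshold)
{
    atomic_int counter;
    int before = notifications_sent;

    assert(events >= 0);
    atomic_init(&counter, 0);
    for (int i = 0; i < events; i++) {
        NOTIFIER_VERBOSE(demo_notify, &counter, threshold, "event over threshold");
    }
    return notifications_sent - before;
}
```

With the switch enabled, `simulate_events(5, 3)` notifies on the fourth and fifth events only. Note that the sketch compares the incremented value returned by the atomic operation rather than re-reading the counter, which avoids a second, non-atomic read.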

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
> The Big Question will how to do this with zero performance > impact when it is not being used. This has always been the > difficult issue when trying to implement any kind of > monitoring inside

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
ver, that > definitely will impact the critical code path, so Terry's caution is > definitely a concern. > > > On Thu, May 28, 2009 at 12:55 AM, Nadia Derbey > wrote: > On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote: > > Excellent points;

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
as usual. > > Can the file name ( openmpi-priv-mca-params.conf ) also be configurable ? No, it isn't, presently, but this can be changed if needed. Regards, Nadia > > Rich > > > On 9/3/09 5:23 AM, "Nadia Derbey" wrote: > > > > What: Define

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
On Fri, 2009-09-04 at 10:05 +0300, Jeff Squyres wrote: > On Sep 3, 2009, at 12:23 PM, Nadia Derbey wrote: > > > What: Define a way for the system administrator to prevent users from > > overwriting the default system-wide MCA parameters settings. > > > > In

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
, or do you get a single-line version number? > I get the same. The reason is simple : > > $ hg tip > changeset: 9:f11244ed72b5 > tag: tip > user:Nadia Derbey > date:Thu Sep 03 14:21:47 2009 +0200 > summary: up to changeset c4b117c5439b >

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
values (or even a set of discrete values) for any such parameter. Then, any higher priority setting will be done only if the new value belongs to the declared set. But actually, maybe that extension is not desirable at all. In that case, I agree that your proposal is a very good compromise: a single parser (though it should be enhanced) and a single configuration file. Regards, Nadia -- Nadia Derbey

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
re about some mca params values and where those system-wide params should not be *unintentionally* set to different values. Regards, Nadia > > :-( > > > On Sep 4, 2009, at 12:42 AM, Jeff Squyres wrote: > > > On Sep 4, 2009, at 8:26 AM, Nadia Derbey wrote: > > >

Re: [OMPI devel] version number issues

2009-09-07 Thread Nadia Derbey
On Sat, 2009-09-05 at 11:33 +0300, Jeff Squyres wrote: > On Sep 4, 2009, at 2:56 PM, Nadia Derbey wrote: > > > Actually, I didn't have the problem on my side, because hg is not > > known > > in my build environment. Never noticed these lines: > > > >

[OMPI devel] mca_btl_openib_post_srr() posts to an uncreated SRQ when ibv_resize_cq() has failed

2009-10-23 Thread Nadia Derbey
raded mode in terms of performance. 2. Fix mca_bml_r2_add_btls() to exit cleanly if an error occurs during btl_add_procs(). FYI, I tested solution #1 and it worked. Any suggestions or comments would be welcome. Regards, Nadia -- Nadia Derbey

Re: [OMPI devel] mca_btl_openib_post_srr() posts to an uncreated SRQ when ibv_resize_cq() has failed

2009-11-26 Thread Nadia Derbey
> we have too few CQ entries to be useful (e.g., 0 or some higher number > > that is still "too small"), or fail the BTL altogether...? > > > > On Oct 23, 2009, at 10:10 AM, Nadia Derbey wrote: > > > >> Hi, > >> > >> Yesterday I ha

Re: [OMPI devel] VT config.h.in

2010-01-19 Thread Nadia Derbey
-- Nadia Derbey

[OMPI devel] HOSTNAME environment variable

2010-01-22 Thread Nadia Derbey
Hi, I'm wondering whether the HOSTNAME environment variable shouldn't be handled as a "special case" when the orted daemons launch the remote jobs. This particularly applies to batch schedulers where the caller's environment is copied to the remote job: we are inheriting a $HOSTNAME which is the n

Re: [OMPI devel] HOSTNAME environment variable

2010-01-22 Thread Nadia Derbey
> Torque - and as soon as Nadia confirms, on SLURM as well. > > I know that on Torque it was an innocent mistake where a line got added to > the launch code that shouldn't have... > > On Jan 22, 2010, at 8:07 AM, N.M. Maclaren wrote: > > > On Jan 22 2010, Na

Re: [OMPI devel] HOSTNAME environment variable

2010-01-22 Thread Nadia Derbey
ld definitely mess things up for > > more than OMPI. > > > > Are you sure that SLURM is propagating the environment (something I have > > never seen before)? Or is OMPI mistakenly picking it up and propagating it? > > > > On Jan 22, 2010, at 7:25 AM, Na

[OMPI devel] PATCH: remove trailing colon at the end of the generated LD_LIBRARY_PATH

2010-02-16 Thread Nadia Derbey
Hi, The mpivars.sh generated in openmpi.spec might in some cases lead to an LD_LIBRARY_PATH that contains a trailing ":". This happens if LD_LIBRARY_PATH is originally unset. This means that the current directory is included in the search path for the loader, which might not be the desired result.
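The underlying rule is that a colon-separated search path must only get a separator when there is already something to separate. The actual patch touches the shell script generated by openmpi.spec and is not reproduced here; the following is merely a C sketch of the same join logic, with `prepend_path` being a hypothetical helper and the directory names made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "dir" or "dir:existing" into `out`.
 * When `existing` is NULL or empty, no ':' is emitted, so the result
 * never contains an empty element (which the loader would treat as
 * the current directory). Returns 0 on success, -1 if `out` is too small. */
static int prepend_path(char *out, size_t outlen,
                        const char *dir, const char *existing)
{
    int n;

    assert(out != NULL && dir != NULL);
    if (existing == NULL || existing[0] == '\0') {
        n = snprintf(out, outlen, "%s", dir);
    } else {
        n = snprintf(out, outlen, "%s:%s", dir, existing);
    }
    /* snprintf returns the length it wanted to write; detect truncation. */
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

A caller would typically pass `getenv("LD_LIBRARY_PATH")` as `existing`, which is NULL exactly in the unset case described above.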

Re: [OMPI devel] PATCH: remove trailing colon at the end of the generated LD_LIBRARY_PATH

2010-02-18 Thread Nadia Derbey
On Wed, 2010-02-17 at 17:14 -0500, Jeff Squyres wrote: > Looks good to me! > > Please commit and file CMRs for v1.4 and v1.5 (assuming this patch applies > cleanly to both branches). Not sure I have the rights to do these things? Regards, Nadia > > > On Feb 16, 2010, at

[OMPI devel] typo in opal/event/evutil.h ?

2010-02-26 Thread Nadia Derbey
ne ev_int64_t signed __int64 -#elif _EVENT_SIZEOF_LONG_LONG == 8 +#elif SIZEOF_LONG_LONG == 8 #define ev_uint64_t unsigned long long #define ev_int64_t long long #elif SIZEOF_LONG == 8 Regards, Nadia -- Nadia Derbey

Re: [OMPI devel] typo in opal/event/evutil.h ?

2010-02-26 Thread Nadia Derbey
this file is for me: changeset: 17413:32687831ca9e user:brbarret date:Thu Feb 04 05:38:30 2010 + summary: Update libevent to 1.4.13 But maybe something got messed here in our repo, will check. Regards, Nadia > > On Fri, Feb 26, 2010 at 3:48 AM, Nadia Der

Re: [OMPI devel] RFC 1/1: improvements to the "notifier" framework and ORTE WDC

2010-03-30 Thread Nadia Derbey
art RFC to bring the SOS and WDC > > branches > > to the trunk. This only brings in the "notifier" changes from the > > SOS > > branch, while the rest of the branch will be brought over after the > > timeout of the second RFC. -- Nadia Derbey

[OMPI devel] problem when binding to socket on a single socket node

2010-04-09 Thread Nadia Derbey
where I see a justification for this test (see attached patch). And maybe both solutions could be mixed. Regards, Nadia -- Nadia Derbey Do not test actual process binding in obvious cases diff -r 0b851b2e7934 orte/mca/odls/default/odls_default_module.c --- a/orte/mca/odls/default/odls_defaul

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-09 Thread Nadia Derbey
s like 1. the call to OPAL_PAFFINITY_PROCESS_IS_BOUND is still there in odls_default_fork_local_proc() 2. OPAL_PAFFINITY_PROCESS_IS_BOUND() is defined the same way. But I'll give it a try with the latest trunk. Regards, Nadia > On Apr 9, 2010, at 3:39 AM, Nadia Derbey wrote: >

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Nadia Derbey
socket. In the other path, what we are doing is checking if we have set one or more bits in a mask after having actually set them: don't you think it's useless? That's why I'm suggesting to call the last check only if orte_odls_globals.bound is true. Regards, Nadia > >

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Nadia Derbey
tion. It > seems to me that whether or not we were externally bound is irrelevant. Even > if the overall result is what you want, I think a more logically > understandable test would help others reading the code. > > But first we need to resolve the question: should this scen

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-13 Thread Nadia Derbey
nal bind. People actually use that as a means of suballocating > > > nodes, so the test needs to be there. Again, if the user said "bind to > > > socket", but none of that socket's cores are assigned for our use, that > > > is an error. > > > > >

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-13 Thread Nadia Derbey
On Tue, 2010-04-13 at 01:27 -0600, Ralph Castain wrote: > On Apr 13, 2010, at 1:02 AM, Nadia Derbey wrote: > > > On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote: > >> By definition, if you bind to all available cpus in the OS, you are > >> bound to nothing (i.