[OMPI devel] confusion between slot and procs on mca/rmaps

2010-11-30 Thread Damien Guinier
Hi all, Most of the time there is no difference between a "proc" and a "slot". But when you use "mpirun -cpus-per-proc X", a slot has X procs. In orte/mca/rmaps/base/rmaps_base_common_mappers.c there is a confusion between proc and slot. This little error impacts the mapping behaviour: on the latest OMPI version…

Re: [OMPI devel] confusion between slot and procs on mca/rmaps

2010-12-01 Thread Damien Guinier
is that "bycore" causes us to set the "bynode" flag by mistake. Did you check that? BTW: when running cpus-per-proc, a slot doesn't have X processes. I suspect this is just a language thing, but it will create confusion. A slot consists of X cpus - we still assign on

Re: [OMPI devel] confusion between slot and procs on mca/rmaps

2010-12-01 Thread Damien Guinier
Oops. OK, you can commit it. The whole problem is with the word "procs": in the source code it is used to mean both "processes" and "cores". On 01/12/2010 11:37, Damien Guinier wrote: OK, you can commit it. The whole problem is with the word "procs": in the source code, "proce…

[OMPI devel] setenv MPI_ROOT

2011-02-08 Thread Damien Guinier
…customer who uses the BPS and LSF batch managers. Thanks, Damien Guinier - diff -r 486ca4bfca95 contrib/dist/linux/openmpi.spec --- a/contrib/dist/linux/openmpi.spec Mon Feb 07 15:40:31 2011 +0100 +++ b/contrib/dist/linux/openmpi.spec Tue Feb 08 14:30:01 2011 +0100 @@ -514,6 +514,10…

[OMPI devel] BTL preferred_protocol , large message

2011-03-08 Thread Damien Guinier
Hi Jeff, I'm working on optimizing large-message exchange. My optimization consists of "choosing the best protocol for each large message". In fact: - for each device, the way to choose the best protocol is different; - the fastest protocol for a given device depends on that device's hardware and…
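A hypothetical sketch of the idea described here, not an actual Open MPI BTL interface: each device would expose a callback reporting its preferred protocol for a given large-message size, since the best choice depends on the device's hardware. All names and thresholds below are illustrative:

    #include <stddef.h>
    #include <stdio.h>

    typedef enum {
        PROTO_SEND,        /* pipelined copy-in/copy-out send */
        PROTO_RDMA_PUT,    /* rendezvous using RDMA write     */
        PROTO_RDMA_GET     /* rendezvous using RDMA read      */
    } large_msg_proto_t;

    /* Each device/BTL would provide its own callback of this shape. */
    typedef large_msg_proto_t (*preferred_protocol_fn_t)(size_t msg_len);

    /* Example callback for a hypothetical RDMA-capable device; the
     * 128 KiB threshold is purely illustrative. */
    static large_msg_proto_t example_device_preference(size_t msg_len)
    {
        return (msg_len < 128 * 1024) ? PROTO_SEND : PROTO_RDMA_GET;
    }

    int main(void)
    {
        preferred_protocol_fn_t choose = example_device_preference;
        printf("4 MiB message -> protocol %d\n", (int)choose(4u * 1024 * 1024));
        return 0;
    }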

[OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Damien Guinier
Hi all, From my tests it is impossible to use "btl:tcp" with "grpcomm:hier". The "grpcomm:hier" module is important because the "srun" launch protocol can't use any other "grpcomm" module. You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you create a ring (like IMB sendrecv…
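A simple reproducer along the lines of the IMB "Sendrecv" ring pattern mentioned above; per the report, running something like this over btl:tcp with grpcomm:hier (for example when launched via srun) should trigger the problem:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, token = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        /* Each rank sends to its right neighbour and receives from its left. */
        MPI_Sendrecv(&rank,  1, MPI_INT, right, 0,
                     &token, 1, MPI_INT, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %d from %d\n", rank, token, left);
        MPI_Finalize();
        return 0;
    }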

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Damien Guinier
You are welcome. I'm happy you found this fix so quickly. Thanks to all. Damien. On 17/03/2011 03:27, Ralph Castain wrote: Okay, I fixed this in r24536. Sorry for the problem, Damien - thanks for catching it! Went unnoticed because the folks at the Labs always use IB. On Mar 16, 2011, at 7:20…

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Damien Guinier
Yes please, this fix was requested by Bull customers. Damien. On 17/03/2011 15:44, Jeff Squyres wrote: Does this need to be CMR'ed to 1.4 and/or 1.5? On Mar 16, 2011, at 10:27 PM, Ralph Castain wrote: Okay, I fixed this in r24536. Sorry for the problem, Damien - thanks for catching it! We…

[OMPI devel] MPI_finalize with srun

2009-12-07 Thread Damien Guinier
Hi Ralph, I have found a bug in the 'grpcomm' 'hier' module. This bug creates an infinite loop in MPI_Finalize. In this module the barrier is executed as an allgather with a data length of zero. This allgather function can enter an infinite loop, depending on the rank execution order. In orte/mca/grpcomm/…
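An MPI-level analogy of the pattern described in this report, not the ORTE grpcomm:hier code itself: a barrier expressed as an allgather that carries zero bytes of payload, so its completion only indicates that every rank reached the call:

    #include <mpi.h>

    /* A "barrier" built from an allgather with a zero-byte contribution
     * from every rank: no data is exchanged, completion only signals
     * that everyone arrived. */
    static void barrier_as_zero_byte_allgather(MPI_Comm comm)
    {
        char sendbuf[1], recvbuf[1];   /* never actually read or written */
        MPI_Allgather(sendbuf, 0, MPI_BYTE, recvbuf, 0, MPI_BYTE, comm);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        barrier_as_zero_byte_allgather(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }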

[OMPI devel] using hnp_always_use_plm

2009-12-18 Thread Damien Guinier
Hi Ralph, In Open MPI I am working on a small new feature: hnp_always_use_plm. - To launch the final application, mpirun uses "orted via plm (the process lifecycle management module)" on remote nodes, but a local "fork()" on its own node. So the first compute node does not use the same method as the other compute nodes. Some debug tools (…
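A pseudo-C sketch of the intent behind the proposed flag; the helper names are hypothetical and this is not the actual ORTE code. By default mpirun (the HNP) fork()s its local processes and only uses the plm/orted path for remote nodes; with hnp_always_use_plm set, even the local node would go through the plm so every compute node is launched the same way:

    #include <stdio.h>

    /* Hypothetical stand-ins for the two real launch paths. */
    static void launch_via_local_fork(void) { puts("fork()/exec() directly on the HNP node"); }
    static void launch_via_plm_orted(void)  { puts("launch through the plm (orted)"); }

    /* With hnp_always_use_plm set, even the HNP's own node goes through
     * the plm path, so all nodes are started uniformly. */
    static void launch_procs_on_node(int node_is_local, int hnp_always_use_plm)
    {
        if (node_is_local && !hnp_always_use_plm) {
            launch_via_local_fork();
        } else {
            launch_via_plm_orted();
        }
    }

    int main(void)
    {
        launch_procs_on_node(1, 0);  /* default: local fork on mpirun's node */
        launch_procs_on_node(1, 1);  /* proposed flag: uniform plm launch    */
        return 0;
    }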

Re: [OMPI devel] using hnp_always_use_plm

2009-12-18 Thread Damien Guinier
…ed using the hnp ess module, as it will then try to track its own launches and totally forget that it is a remote orted with slightly different responsibilities. If you need it to execute a different plm on the backend, please let me know - it is a trivial change to allow specification of remot…

[OMPI devel] Openmpi with slurm : salloc -c

2010-02-26 Thread Damien Guinier
Hi Ralph, I have found a minor bug in the MCA component ras slurm. It behaves incorrectly with the "X number of processors per task" feature. In the file orte/mca/ras/slurm/ras_slurm_module.c, line 356: - the node slot number is divided by the "cpus_per_task" information, but "cpu…
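A small illustration of the division described in the report; the whole program is hypothetical and assumes the node's initial slot count equals its cpu count. With "salloc -c 2" the SLURM cpus_per_task value is 2, and dividing by it changes how many slots the ras slurm module records for the node:

    #include <stdio.h>

    int main(void)
    {
        int cpus_on_node  = 8;   /* hypothetical node size              */
        int cpus_per_task = 2;   /* from salloc -c 2 via the SLURM env  */

        /* Slot count after the division described in the report. */
        int slots = cpus_on_node / cpus_per_task;
        printf("slots recorded for the node: %d\n", slots);
        return 0;
    }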

Re: [OMPI devel] Openmpi with slurm : salloc -c

2010-03-02 Thread Damien Guinier
…s there seem to be happy with the way this behaves... Let me know what you find out. On Feb 26, 2010, at 9:45 AM, Damien Guinier wrote: Hi Ralph, I have found a minor bug in the MCA component ras slurm. It behaves incorrectly with the "X number of processors per task" feat…

[OMPI devel] Refresh the libevent to 1.4.13.

2010-06-07 Thread Damien Guinier
Hi all, A recent update of libevent seems to cause a regression on our side. On my cluster of 32-cpu nodes, processes launched by srun hang in opal_event_loop(). We see a deadlock in MPI_Init (endlessly looping in opal_event_loop()) when we launch processes with pure srun on 32-core nodes. He…

[OMPI devel] srun + Intel OpenMP = SIGSEGV

2010-06-15 Thread Damien Guinier
Using Intel OpenMP in conjunction with srun seems to cause a segmentation fault, at least in the 1.5 branch. After a long time tracking this strange bug, I finally found out that the slurmd ess component was corrupting the __environ structure. This results in a crash in Intel OpenMP, which cal…
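A generic illustration of the class of bug described here (corrupting the process environment so that a later consumer of __environ reads invalid memory); this is not the actual slurmd ess code, only one classic way the environment array can end up pointing at dead memory:

    #include <stdio.h>
    #include <stdlib.h>

    static void corrupt_environment(void)
    {
        char buf[32];
        snprintf(buf, sizeof(buf), "MY_VAR=%d", 42);
        /* putenv() keeps a pointer to 'buf' inside the environment; once
         * this function returns, that entry points at dead stack memory. */
        putenv(buf);
    }

    int main(void)
    {
        corrupt_environment();
        /* A later consumer of the environment (here getenv, in the original
         * report the Intel OpenMP runtime) may now read garbage or crash. */
        const char *v = getenv("MY_VAR");
        printf("MY_VAR=%s\n", v ? v : "(null)");
        return 0;
    }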