Re: [OMPI users] The effect of --bind-to in the presence of PE=N in --map-by

2016-05-19 Thread tmishima
Hi Ralph, I didn't notice your reply, so our mails might have crossed. As you pointed out, I might be wrong. Give me some time to recall everything about the bind-to directive. Regards, Tetsuya On 2016/05/20 8:36:53, "users" wrote in "Re: [OMPI users] The effect of --bind-to in the presence of PE=N in

Re: [OMPI users] The effect of --bind-to in the presence of PE=N in --map-by

2016-05-19 Thread tmishima
Yes, a slot is nearly equal to a core. Ralph would know the difference very well; please ask him about the details. Tetsuya On 2016/05/20 8:29:41, "users" wrote in "Re: [OMPI users] The effect of --bind-to in the presence of PE=N in --map-by": > Thank you, Tetsuya. So is a slot = core? > > On Thu, May

Re: [OMPI users] The effect of --bind-to in the presence of PE=N in --map-by

2016-05-19 Thread tmishima
Hi Saliya and Ralph, I guess Ralph is confusing "bind-to core" with "bind-to slot". As far as I remember, when you add the "PE=N" option to the map-by directive, you can only use "bind-to slot". So if you want to bind a process to specific slots (almost the same as cores), you should use "bind-to slot".

Re: [OMPI users] Any changes to rmaps in 1.10.2?

2016-01-28 Thread tmishima
Hi Ben and Ralph, just a very short comment. The error message shows the hardware detection doesn't work well, because it says the number of cpus is zero. > >   #cpus-per-proc:  1 > >   number of cpus:  0 > >   map-by:  BYSOCKET:NOOVERSUBSCRIBE Regards, Tetsuya > Thanks Ralph, > > > > T

Re: [OMPI users] hostfile without slots

2015-10-07 Thread tmishima
Hi, In addition to Ralph's explanation, you can change this default behavior using the MCA param orte_set_default_slots. For example, by setting it to "none" you can disable the auto-detection of the slot count, which makes it compatible with openmpi-1.6.X. Regards, Tetsuya Mishima
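For illustration, that parameter can be set on the mpirun command line in the usual MCA way (the hostfile and program names here are hypothetical):
  mpirun --mca orte_set_default_slots none -hostfile myhosts ./myprog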

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread tmishima
I'm doing quite well, thank you. I'm involved in a big project, so I'm very busy now, but I still try to keep watching these mailing lists. Regards, Tetsuya Mishima On 2015/10/06 8:17:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM": > Ah, yes - thanks! It’s been so long

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-05 Thread tmishima
Hi Ralph, it's been a long time. The option "map-by core" does not work when pe=N > 1 is specified, so you should use "map-by slot:pe=N", as far as I remember. Regards, Tetsuya Mishima On 2015/10/06 5:40:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM": > Hmmm…okay, try

Re: [OMPI users] How does binding option affect network traffic?

2014-08-29 Thread tmishima
Hi, Your cluster is very similar to ours, where Torque and Open MPI are installed. I would use this cmd line: #PBS -l nodes=2:ppn=12 mpirun --report-bindings -np 16  Here --map-by socket:pe=1 and -bind-to core are assumed as the default settings. Then, you can run 10 jobs independently and simultaneously
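Spelled out in full, a script along those lines might look like this sketch (the executable name is hypothetical, and the map/bind options are written out even though the text above notes they are the defaults):
  #PBS -l nodes=2:ppn=12
  mpirun --report-bindings -np 16 --map-by socket:pe=1 --bind-to core ./myprog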

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-27 Thread tmishima
Hi, Here is a very simple patch, but Ralph might have a different idea, so I'd like him to decide how to treat it. As far as I checked, I believe it has no side effects. (See attached file: patch.bind-to-none) Tetsuya > Hi, > > Am 27.08.2014 um 09:57 schrieb Tetsuya Mishima: > > > Hi Reuti and R

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-20 Thread tmishima
Oscar, As I mentioned before, I've never used SGE, so please ask for Reuti's advice. The only thing I can tell you is that you have to use the openmpi 1.8 series to use the -map-by slot:pe=N option. Tetsuya > Hi > > Well, with qconf -sq one.q I got the following: > > [oscar@aguia free-noise]$ qconf -sq one

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-20 Thread tmishima
Reuti, Sorry for confusing you. Under the managed condition, the -np option is actually not necessary. So, this cmd line also works for me with Torque. $ qsub -l nodes=10:ppn=N $ mpirun -map-by slot:pe=N ./inverse.exe At least, Ralph confirmed it worked with Slurm and I confirmed it with Torque as shown

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-20 Thread tmishima
Reuti, If you want to allocate 10 procs with N threads, the Torque script below should work for you: qsub -l nodes=10:ppn=N mpirun -map-by slot:pe=N -np 10 -x OMP_NUM_THREADS=N ./inverse.exe Then, Open MPI automatically reduces the logical slot count to 10 by dividing the real slot count 10N by b
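With a concrete N (say N=4, an assumed value), that recipe becomes the following; 10 nodes x 4 slots gives 40 real slots, which divided by pe=4 yields the 10 logical slots mentioned above:
  qsub -l nodes=10:ppn=4
  mpirun -map-by slot:pe=4 -np 10 -x OMP_NUM_THREADS=4 ./inverse.exe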

Re: [OMPI users] OpenMPI 1.8.1 runs more OpenMP Threads on the same core

2014-06-27 Thread tmishima
Hi Luigi, Please try: --map-by slot:pe=4 Probably Ralph is very busy, so something slipped his memory... Regards, Tetsuya > Hi all, > My system is a 64 core, with Debian 3.2.57 64 bit, GNU gcc 4.7, kernel Linux 3.2.0 and OpenMPI 1.8.1. > I developed an application to matching proteins files u

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-08 Thread tmishima
It's a good idea to provide a default setting for the modifier pe. Okay, I can take a look and review it, but I'm a bit busy now, so please give me a few days. Regards, Tetsuya > Okay, I revised the command line option to be a little more user-friendly. You can now specify the equivalent of the old

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-06 Thread tmishima
Hi Dan, Please try: mpirun -np 2 --map-by socket:pe=8 ./hello or mpirun -np 2 --map-by slot:pe=8 ./hello You cannot bind 8 cpus to the object "core", which has only one cpu. This limitation started with the 1.8 series. The object "socket" has 8 cores in your case, so you can do it. And, the object

Re: [OMPI users] How to replace --cpus-per-proc by --map-by

2014-03-27 Thread tmishima
Mapping and binding are related to so-called process affinity. It's a bit difficult for me to explain ... so please see the URL below (especially the first half, pages 1 to 20): http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation A

Re: [OMPI users] How to replace --cpus-per-proc by --map-by

2014-03-27 Thread tmishima
Hi Saliya, What you want to do is map-by node. So please try the following: -np 2 --map-by node:pe=4 --bind-to core You might not need to add --bind-to core, because it's the default binding. Tetsuya > Hi, > > I see in v.1.7.5rc5 --cpus-per-proc is deprecated and is advised to replace by --map-by :PE=N.
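Written out as a full command line (program name hypothetical), that suggestion is:
  mpirun -np 2 --map-by node:pe=4 --bind-to core ./myprog
Assuming at least two nodes are allocated, each of the 2 ranks should land on a different node and be bound to 4 cores there.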

Re: [OMPI users] coll_ml_priority in openmpi-1.7.5

2014-03-24 Thread tmishima
I ran our application using the final version of openmpi-1.7.5 again with coll_ml_priority = 90. Then, coll/ml was actually activated and I got these error messages as shown below: [manage][[11217,1],0][coll_ml_lmngr.c:265:mca_coll_ml_lmngr_alloc] COLL-ML List manager is empty. [manage][[11217,1
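For reference, the priority used in the test above is set through the usual MCA mechanism (the application name here is hypothetical):
  mpirun --mca coll_ml_priority 90 ./myapp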

[OMPI users] cleanup of round robin mappers

2014-03-24 Thread tmishima
Hi Ralph, I tried to improve the checking for mapping-too-low and fixed a minor problem in the rmaps_rr.c file. Please see the attached patch file. 1) Regarding mapping-too-low, in the future we'll have larger L1/L2/L3 caches or other architectures, and in that case, the need to map by a lower object leve

Re: [OMPI users] coll_ml_priority in openmpi-1.7.5

2014-03-21 Thread tmishima
I could roughly understand what coll_ml is and how you are going to treat it, thanks. As Ralph pointed out, I didn't see that coll_ml was really used; I just thought the slowdown meant it was used. I'll check it later. It might be due to the expensive connectivity computation. Tetsuya > One of

[OMPI users] coll_ml_priority in openmpi-1.7.5

2014-03-20 Thread tmishima
Hi Ralph, congratulations on releasing the new openmpi-1.7.5. By the way, openmpi-1.7.5rc3 has been slowing down our application on smaller test data, where the time-consuming part of our application is a so-called sparse solver. It's negligible with medium or large data - more practi

Re: [OMPI users] another corner case hangup in openmpi-1.7.5rc3

2014-03-18 Thread tmishima
I confirmed your fix worked good for me. But, I guess at least we should add the line "daemons->updated = false;" in the last if-clause, although I'm not sure how the variable is used. Is it okay, Ralph? Tetsuya > Understood, and your logic is correct. It's just that I'd rather each launcher de

Re: [OMPI users] another corner case hangup in openmpi-1.7.5rc3

2014-03-17 Thread tmishima
I do not understand your fix yet, but it would be better, I guess. I'll check it later, but for now please let me explain what I thought: If some nodes are allocated, it doesn't go through this part, because opal_list_get_size(&nodes) > 0 at this location. 1590    if (0 == opal_list_get_size(&nodes)

[OMPI users] another corner case hangup in openmpi-1.7.5rc3

2014-03-17 Thread tmishima
Hi Ralph, I found another corner-case hangup in openmpi-1.7.5rc3. Condition: 1. allocate some nodes using an RM such as TORQUE. 2. request only the head node when executing the job with the -host or -hostfile option. Example: 1. allocate node05,node06 using TORQUE. 2. request node05 only with the -host op

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-13 Thread tmishima
I happened to misspell a hostname, and then it hung. [mishima@manage ~]$ mpirun -np 6 -host node05,nod06 ~/mis/openmpi/demos/myprog nod06: Unknown host mpirun: abort is already in progress...hit ctrl-c again to forcibly terminate Tetsuya > No problem - we appreciate you taking the time to confir

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-13 Thread tmishima
Hi Ralph, I'm late to your release again due to TD. At that time, I manually applied #4386 and #4383 to the 1.7 branch - namely openmpi-1.7.5rc2 - and did the check. I might have made some mistake. Now, I found that openmpi-1.7.5rc3 has just been released and confirmed it works fine, thanks. Tetsuya > It's

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-13 Thread tmishima
Sorry for disturbing you - please keep going ... Tetsuya > Yes, I know - I am just finishing the fix now. > > On Mar 12, 2014, at 8:48 PM, tmish...@jcity.maeda.co.jp wrote: > > > > > > > Hi Ralph, this problem is not fixed completely by today's latest > > ticket #4383, I guess ... > > > > https://sv

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread tmishima
Hi Ralph, this problem is not fixed completely by today's latest ticket #4383, I guess ... https://svn.open-mpi.org/trac/ompi/ticket/4383 For example, in the case of returning with ORTE_ERR_SILENT from line 514 in the rmaps_rr_mapper.c file, the problem still occurs. I executed the job under the unm

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread tmishima
Thanks, Jeff. I really understood the situation. Tetsuya > This all seems to be a side-effect of r30942 -- see: > > https://svn.open-mpi.org/trac/ompi/ticket/4365 > > > On Mar 12, 2014, at 5:13 AM, wrote: > > > > > > > Hi Ralph, > > > > I installed openmpi-1.7.5rc2 and applied r31019 to it. >

[OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread tmishima
Hi Ralph, I installed openmpi-1.7.5rc2 and applied r31019 to it. As far as I confirmed, the rmaps framework worked fine. However, by chance, I noticed that a single ctrl+c could not terminate a running job; typing it twice was necessary. Is this your expected behavior? I didn't use ctrl+c to abo

Re: [OMPI users] incorrect verbose output in bind_downwards

2014-03-11 Thread tmishima
Ralph, sorry. I missed a problem in the hwloc_base_util.c file. The "static int build_map" still depends on the opal_hwloc_topology. (Please see attached patch file) (See attached file: patch.hwloc_base_util) Tetsuya > Ralph, sorry for late confirmation. It worked for me, thanks. > > Tetsuya >

Re: [OMPI users] incorrect verbose output in bind_downwards

2014-03-11 Thread tmishima
Ralph, sorry for late confirmation. It worked for me, thanks. Tetsuya > I fear that would be a bad thing to do as it would disrupt mpirun's operations. However, I did fix the problem by adding the topology as a param to the pretty-print functions. Please see: > > https://svn.open-mpi.org/trac/o

[OMPI users] incorrect verbose output in bind_downwards

2014-03-10 Thread tmishima
Hi Ralph, I would report one more small thing. The verbose output in bind_downwards sometimes gives an incorrect binding-map when I use heterogeneous nodes with different topologies. I confirmed that this patch fixed the problem: --- rmaps_base_binding.

Re: [OMPI users] new map-by-obj has a problem

2014-03-03 Thread tmishima
Hi Ralph, I misunderstood the point of the problem. The problem is that BIND_TO_OBJ is re-tried and done in orte_ess_base_proc_binding @ ess_base_fns.c, although you try to BIND_TO_NONE in rmaps_rr_mapper.c when it's oversubscribed. Furthermore, binding in orte_ess_base_proc_binding does not sup

Re: [OMPI users] new map-by-obj has a problem

2014-03-02 Thread tmishima
Hi Ralph, I have tested your fix - 30895. I'm afraid to say I found a mistake. You should include "SETTING BIND_TO_NONE" in the above if-clause at lines 74, 256, 511, 656. Otherwise, just the warning message disappears but binding to core is still overwritten by binding to none. Please see the attach

[OMPI users] Duplicated ticket

2014-03-01 Thread tmishima
Hi Ralph, The root cause of ticket #3893, which I reported about the lama mapper, would be the same as that of #4035 - both are related to inverted topologies. Please delete/close #3893 as a duplicate. https://svn.open-mpi.org/trac/ompi/ticket/3893 https://svn.open-mpi.org/trac/ompi/ticket/4035 Tetsuya Mishima

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread tmishima
Hi Ralph, I understood what you meant. I often use float for our application. float c = (float)(unsigned int a - unsigned int b) could be a very huge number if a < b. So I always carefully cast unsigned int to int when I subtract them. I didn't know/mind int d = (unsigned int a - unsigned in
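To make the pitfall concrete (illustrative values, 32-bit unsigned ints on a typical two's-complement system): if a = 2 and b = 3, then a - b wraps around to 4294967295 (2^32 - 1), so (float)(a - b) is roughly 4.29e9 instead of the intended -1.0, whereas (float)(int)(a - b) recovers -1.0.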

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread tmishima
Yes, indeed. In the future, when machines have many, many cores, we will have to take care of num_procs overflowing. Tetsuya > Cool - easily modified. Thanks! > > Of course, you understand (I'm sure) that the cast does nothing to protect the code from blowing up if we overrun the var. I

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread tmishima
Hi Ralph, I'm a little bit late to your release. I found a minor mistake in byobj_span - an integer casting problem. --- rmaps_rr_mappers.30892.c  2014-03-01 08:31:50 +0900 +++ rmaps_rr_mappers.c  2014-03-01 08:33:22 +0900 @@ -689,7 +689,7 @@ } /* compute how many objs need an extra pro

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
Hi Ralph, I can't operate our cluster for a few days, sorry. But for now, I'm narrowing down the cause by browsing the source code. My best guess is line 529. opal_hwloc_base_get_obj_by_type will reset the object pointer to the first one when you move on to the next node. 529

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
Just checking the difference, nothing of great significance... Anyway, I guess it's due to the behavior when the slot count is missing (regarded as slots=1) and it's oversubscribed unintentionally. I'm going out now, so I can't verify it quickly. If I provide the correct slot counts, it will work, I g

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
Hi Ralph, this is just for your information. I tried to restore the previous orte_rmaps_rr_byobj. Then I get the result below with this command line: mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog Data

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
They have 4 cores/socket and 2 sockets, for a total of 4 x 2 = 8 cores each. Here is the output of lstopo. [mishima@manage round_robin]$ rsh node05 Last login: Tue Feb 18 15:10:15 from manage [mishima@node05 ~]$ lstopo Machine (32GB) NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB) L2 L#0

[OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
Hi Ralph, I'm afraid to say your new "map-by obj" causes another problem. I get an overload message with this command line as shown below: mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map ~/mis/openmpi/demos/myprog

Re: [OMPI users] OpenMPI 1.7.5 and "--map-by" new syntax

2014-02-26 Thread tmishima
Hi, this help message might be just a simple mistake. Please try: mpirun -np 20 --map-by ppr:5:socket -bind-to core osu_alltoall There's no explanation available yet as far as I know, because it's still an alpha version. Tetsuya Mishima > Dear all, > > I am playing with Open MPI 1.7.5 and with th
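For reference, the suggested line implies simple arithmetic: with an assumed 2 sockets per node and 2 nodes, ppr:5:socket places 5 procs per socket x 2 sockets x 2 nodes = 20 ranks, matching -np 20:
  mpirun -np 20 --map-by ppr:5:socket -bind-to core osu_alltoall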

Re: [OMPI users] map-by node with openmpi-1.7.5a1

2014-02-22 Thread tmishima
Okay, I will verify your patch. It's exciting for me. Please give me some time. I was reviewing rmaps_rr_mappers.c simultaneously with your work, and I found some additional minor problems. To avoid confusion, I will report them after this job is completed. By the way, the binding problem will be a

Re: [OMPI users] map-by node with openmpi-1.7.5a1

2014-02-19 Thread tmishima
Hi Ralph, I've found the fix. Please check the attached patch file. At this moment, nodes in the hostfile should be listed in ascending order of slot size when we use "map-by node" or "map-by obj:span". The problem is that the hostfile created by Torque on our cluster always lists the allocated nodes i

[OMPI users] map-by node with openmpi-1.7.5a1

2014-02-18 Thread tmishima
Hi Ralph, I did an overall verification of the rr_mapper, and I found another problem with "map-by node". As far as I checked, "map-by obj" other than node worked fine. I myself do not use "map-by node", but I'd like to report it to improve the reliability of 1.7.5. It seems too difficult for me to resolve

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-14 Thread tmishima
You found it in a dream, interesting! Tetsuya Mishima > Thanks - hit me in the middle of the night over here that we had missed some options, but nice to find you had also seen it. Slightly modified patch will be applied and brought over to 1.7.5 > > > On Feb 13, 2014, at 10:16 PM, tmish..

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-14 Thread tmishima
Please try the attached patch - from r30723. (See attached file: patch.rmaps_base_frame.from_r30723) Tetsuya Mishima > Thanks for the prompt help. > Could you please resend the patch as an attachment which can be applied with the "patch" command; my mail client messes up long lines. > > > On Fri, Feb 14, 2014

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-14 Thread tmishima
Thanks. I'm not familiar with the mindist mapper, but obviously the check for ORTE_MAPPING_BYDIST is missing. In addition, ORTE_MAPPING_PPR is missing again by my mistake. Please try this patch. #if OPAL_HAVE_HWLOC } else if (ORTE_MAPPING_BYCORE == ORTE_GET_MAPPING_POLICY (mapping)) {

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-13 Thread tmishima
Sorry, one more shot - byslot was dropped! if (NULL == spec) { /* check for map-by object directives - we set the * ranking to match if one was given */ if (ORTE_MAPPING_GIVEN & ORTE_GET_MAPPING_DIRECTIVE(mapping)) { if (ORTE_MAPPING_BYSLOT == OR

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-13 Thread tmishima
I've found it. Please add 2 lines (770, 771) in rmaps_base_frame.c: 747 if (NULL == spec) { 748 /* check for map-by object directives - we set the 749 * ranking to match if one was given 750 */ 751 if (ORTE_MAPPING_GIVEN & ORTE_GET_MAPPING

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-13 Thread tmishima
You are welcome, Ralph. But, after fixing it, I'm facing another problem when I use the ppr option: [mishima@manage openmpi-1.7.4]$ mpirun -np 2 -map-by ppr:1:socket -bind-to socket -report-bindings ~/mis/openmpi/demos/myprog [manage.cluster:28057] [[25570,0],0] ORTE_ERROR_LOG: Not implemented in f

[OMPI users] one more finding in openmpi-1.7.5a1

2014-02-13 Thread tmishima
Hi Ralph, I would report one more finding in openmpi-1.7.5a1. Because the ORTE_MAPPING_BY... values are not a bit-field expression, you should not use "&" to compare them in orte_rmaps_base_set_ranking_policy in rmaps_base_frame.c: 747 if (NULL == spec) { 748 /* check for map-by objec
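To illustrate the point (with made-up values, not the real constants): if ORTE_MAPPING_BYSLOT were 1 and ORTE_MAPPING_BYCORE were 3, then (mapping & ORTE_MAPPING_BYSLOT) would be non-zero even when the policy is BYCORE. Since the BY... values are plain enumerators rather than single-bit flags, the policy extracted with ORTE_GET_MAPPING_POLICY has to be compared with "==", not tested with "&".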

Re: [OMPI users] "bind-to l3chace" with r30643 in ticket #4240 dosen't work

2014-02-12 Thread tmishima
Thanks. I myself have no request. Ralph might have something ... Regards, Tetsuya Mishima > Is there anything we could do in hwloc to improve this? (I don't even > know the exact piece of code you are refering to) > Brice > > > Le 12/02/2014 02:46, Ralph Castain a écrit : > > Okay, I fixed it.

Re: [OMPI users] "bind-to l3chace" with r30643 in ticket #4240 dosen't work

2014-02-11 Thread tmishima
Okay, I understood. Actually, so far I do not have a definite problem with that. If I encounter some problems in the future, I will tell you. Regards, Tetsuya Mishima > Guess I disagree - it isn't a question of what the code can handle, but rather user expectation. If you specify a defini

[OMPI users] "bind-to l3chace" with r30643 in ticket #4240 dosen't work

2014-02-11 Thread tmishima
Hi Ralph, Since ticket #4240 has already been marked as fixed, I'm sending this email to you. (I don't know whether I can add comments to a fixed ticket.) When I tried to bind the process to l3cache, it didn't work, as shown below: (the host manage has the normal topology - not inverted) [mishima@manag

Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output

2014-01-28 Thread tmishima
Thanks, Ralph. I'm happy to hear that. By the way, openmpi-1.7.4rc2 works fine for me. Tetsuya Mishima > Let me clarify: the functionality will remain as it is useful to many. What we need to do is somehow capture that command in the current map-by parameter so we avoid issues like the one you

Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output

2014-01-27 Thread tmishima
Thank you for your comment, Ralph. I understand your explanation, including "it's too late". The ppr option is convenient for us because our environment is quite heterogeneous (it gives flexibility in the number of procs). I hope you do not deprecate ppr in a future release and apply my proposal someda

Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output

2014-01-27 Thread tmishima
Hi Ralph, it seems you are rounding the final turn toward releasing 1.7.4! I hope this will be my final request for openmpi-1.7.4 as well. I mostly use the rr_mapper but sometimes use the ppr_mapper. I have a simple request to ask you to improve its usability. Namely, I propose to remove the redefining-policy-chec

Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output

2014-01-27 Thread tmishima
Thanks, Ralph. I quickly checked the fix. It worked fine for me. Tetsuya Mishima > I fixed that in today's final cleanup > > On Jan 27, 2014, at 3:17 PM, tmish...@jcity.maeda.co.jp wrote: > > > > As for the NEWS - it is actually already correct. We default to map-by > core, not slot, as of 1.7.

Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output

2014-01-27 Thread tmishima
> As for the NEWS - it is actually already correct. We default to map-by core, not slot, as of 1.7.4. Is that correct? As far as I can tell from browsing the source code, map-by slot is used if np <= 2. [mishima@manage openmpi-1.7.4rc2r30425]$ cat -n orte/mca/rmaps/base/rmaps_base_map_job.c ... 107
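A quick way to observe which default is actually chosen, using the verbose flag that appears elsewhere in these threads (program name hypothetical; the expected mappers follow the code reading above):
  mpirun -np 2 -mca rmaps_base_verbose 10 ./myprog   (expect byslot)
  mpirun -np 4 -mca rmaps_base_verbose 10 ./myprog   (expect bycore)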

[OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output

2014-01-26 Thread tmishima
Hi Ralph, I tried the latest nightly snapshot, openmpi-1.7.4rc2r30425.tar.gz. Almost everything works fine, except that unexpected output appears as below: [mishima@node04 ~]$ mpirun -cpus-per-proc 4 ~/mis/openmpi/demos/myprog App launch reported: 3 (out of 3) daemons - 8 (out of 12) procs ..

Re: [OMPI users] default num_procs of round_robin_mapper with cpus-per-proc option

2014-01-25 Thread tmishima
I forgot to tell you one thing. The slot is an exception: it has no size in the logic of 1.7.4. That's why it always works with cpus-per-proc. Tetsuya Mishima

Re: [OMPI users] default num_procs of round_robin_mapper with cpus-per-proc option

2014-01-25 Thread tmishima
Hi Ralph, Thank you for your comment. I agree with your conclusion to leave it as it is. As far as I checked, this behavior also happens when I try to bind to objects which are smaller than ncpus-per-proc, i.e., l1cache, l2cache and so on. So, if it is easy to know the number of co

Re: [OMPI users] default num_procs of round_robin_mapper with cpus-per-proc option

2014-01-23 Thread tmishima
Thanks for your explanation, Ralph. But it's really subtle for me to understand ... Anyway, I'd like to report what I found through the verbose output. "-map-by core" calls "bind in place" as below: [mishima@manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by co

Re: [OMPI users] default num_procs of round_robin_mapper with cpus-per-proc option

2014-01-22 Thread tmishima
Thanks, Ralph. I have one more question. I'm sorry to ask you so many things ... Could you tell me the difference between "map-by slot" and "map-by core"? From my understanding, slot is a synonym of core, but their behaviors using openmpi-1.7.4rc2 with the cpus-per-proc option are quite differe

[OMPI users] default num_procs of round_robin_mapper with cpus-per-proc option

2014-01-22 Thread tmishima
Hi Ralph, I want to ask you one more thing about the default setting of num_procs when we don't specify the -np option and we set cpus-per-proc > 1. In this case, the round_robin_mapper sets num_procs = num_slots as below: rmaps_rr.c: 130    if (0 == app->num_procs) { 131        /* set t
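Concretely (assuming a 12-slot allocation, as in the earlier report): with -cpus-per-proc 4 and no -np, setting num_procs = num_slots asks for 12 procs x 4 cpus = 48 cpus, instead of the 12 / 4 = 3 procs that would actually fit the allocation.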

Re: [OMPI users] hostfile issue of openmpi-1.7.4rc2

2014-01-19 Thread tmishima
Hi Ralph, I confirmed that it worked quite well for my purpose. Thank you very much. I would point out just one small thing. Since the debug information in the rank-file block is useful even when a host is initially detected, the OPAL_OUTPUT_VERBOSE on line 302 should be outside the else-clause, as

Re: [OMPI users] hostfile issue of openmpi-1.7.4rc2

2014-01-19 Thread tmishima
Thank you for your fix. I will try it tomorrow. Before that, although I could not understand everything, let me ask some questions about the new hostfile.c. 1. Lines 244-248 are included in the else-clause, which might cause a memory leak (it seems to me). Should they be outside the clause? 244

Re: [OMPI users] hostfile issue of openmpi-1.7.4rc2

2014-01-17 Thread tmishima
Hi Ralph, I'm sorry that my explanation was not enough ... This is the summary of my situation: 1. I manually create a hostfile as shown below. 2. I use mpirun to start the job without Torque, which means I'm running in an unmanaged environment. 3. Firstly, ORTE detects 8 slots on each host (
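A minimal hostfile sketch of the situation described, using the node names that appear in these threads - each host listed once with no slots= value, so ORTE auto-detects 8 slots per host:
  node05
  node06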

Re: [OMPI users] hostfile issue of openmpi-1.7.4rc2

2014-01-17 Thread tmishima
No, I didn't use Torque this time. This issue occurs only when it is not in a managed environment - namely, when orte_managed_allocation is false (and orte_set_slots is NULL). Under Torque management, it works fine. I hope you can understand the situation. Tetsuya Mishima > I'm sorry, bu

[OMPI users] hostfile issue of openmpi-1.7.4rc2

2014-01-16 Thread tmishima
Hi Ralph, I encountered the hostfile issue again where slots are counted by listing the node multiple times. This should have been fixed by r29765 - "Fix hostfile parsing for the case where RMs count slots". The difference is whether an RM is used or not. At that time, I executed mpirun through the Torque manager. Th

[OMPI users] btl_tcp_use_nagle is negated in openmpi-1.7.4rc1

2014-01-08 Thread tmishima
_btl_tcp_component.tcp_not_use_nodelay); In spite of this negation, the socket option was set by tcp_not_use_nodelay the same as before in btl_tcp_endpoint.c. I think line 515 should be: optval = !mca_btl_tcp_component.tcp_not_use_nodelay; /* tmishima */ I already confirmed that this fix worke

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-25 Thread tmishima
entinel), opal_tree_item_t ); /* tmishima */ #if OPAL_ENABLE_DEBUG /* These refcounts should never be used in assertions because they should never be removed from this list, added to another list, etc. So set them to sentinel values. */ tree->opal_tree_sentinel.opal_tree_item_ref

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-25 Thread tmishima
tree->get_key = get_key; opal_tree_get_root(tree)->opal_tree_num_children = 0 ; /* added by tmishima */ } Then, these errors all disappeared and openmpi with lama worked fine. As I told you before, I built openmpi with PGI 13.10. As far as I checked, no error was detected by valgrind with openmpi built

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-23 Thread tmishima
Hi Ralph, Here is the output when I add "-mca rmaps_base_verbose 10 --display-map", and where it stopped (via gdb), which shows it stopped in a function of lama. I usually use PGI 13.10, so I tried changing to the GNU compiler. Then it works. Therefore, this problem depends on the compiler. That's a

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-21 Thread tmishima
Ralph, thanks. I'll try it on Tuesday. Let me confirm one thing. I don't pass "--with-libevent" when I build openmpi. Is there any possibility that it builds with an external libevent automatically? Tetsuya Mishima > Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to your cmd line and

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-21 Thread tmishima
Thank you, Ralph. Then, this problem should depend on our environment. But, at least, the inversion problem is not the cause, because node05 has a normal hierarchy order. I cannot connect to our cluster now. On Tuesday, when I'm back in my office, I'll send you a further report. Before that, please let me know y

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Ralph, Thank you very much. I tried many things such as: mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca rmaps_lama_bind 1c myprog But every attempt failed. At least they were accepted by openmpi-1.7.3, as far as I remember. Anyway, please check it when you have time, because u

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Ralph, I'm glad to hear that, thanks. By the way, yesterday I tried to check how lama in 1.7.4rc treats the numa node. Then, even with this simple command line, it froze without any message: mpirun -np 2 -host node05 -mca rmaps lama myprog Could you check what happened? Is it better to ope

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Brice, Thank you for your comment. I understand what you mean. My opinion was based just on an easy way to adjust the code for the inversion of hierarchy in the object tree. Tetsuya Mishima > I don't think there's any such difference. > Also, all these NUMA architectures are reported the sam

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Ralph, The numa-node in AMD Magny-Cours/Interlagos is so-called ccNUMA (cache-coherent NUMA), which seems to be a little different from the traditional numa defined in openmpi. I notice that the ccNUMA object is almost the same as the L3cache object. So "-bind-to l3cache" or "-map-by l3cache" is valid for
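On such a node, a sketch of the equivalent mapping/binding (program name hypothetical):
  mpirun -np 4 --map-by l3cache --bind-to l3cache ./myprog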

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-20 Thread tmishima
> [mishima@manage demos]$ qsub -I -l nodes=node03:ppn=32 > qsub: waiting for job 8265.manage.cluster to start

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-19 Thread tmishima
> ...correct. We map a socket until full, then move to the next. What you want is --map-by socket:span

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
I can wait for it to be fixed in 1.7.5 or later, because putting "-bind-to numa" and "-map-by numa" at the same time works as a workaround. Thanks, Tetsuya Mishima > Yeah, it will impact everything that uses hwloc topology maps, I fear. > > One side note: you'll need to add --hetero-nodes to your c
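Written out, the workaround mentioned above would look like this (host and program names hypothetical, using the 32-core node from this thread):
  mpirun -np 2 -host node03 -map-by numa -bind-to numa ./myprog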

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
I think it's normal for AMD Opterons with 8/16 cores, such as Magny-Cours or Interlagos. Because they usually have 2 numa nodes in a cpu (socket), a numa-node cannot include a socket. This type of hierarchy would be natural. (node03 is a Dell PowerEdge R815, and maybe quite common, I guess) By the way,

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
Hi Ralph, I found the reason. I attached the main part of the output for the 32-core node (node03) and the 8-core node (node05) at the bottom. From this information, the socket of node03 includes the numa-node. On the other hand, the numa-node of node05 includes the socket. The direction of the object tree is opposite. Since

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread tmishima
Hi, here is the output with "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5". Please see the attached file. (See attached file: output.txt) Regards, Tetsuya Mishima > Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to your cmd line and let's see what it thinks it foun

[OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread tmishima
Hi, I report one more problem with openmpi-1.7.4rc1, which is more serious. For our 32-core nodes (AMD Magny-Cours based), which have 8 numa-nodes, "-bind-to numa" does not work. Without this option, it works. For your information, at the bottom of this mail, I added the lstopo information of the no

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-18 Thread tmishima
> [mishima@node04 ~]$ cd ~/Desktop/openmpi-1.7/demos/ > [mishima@node04 demos]$ mpirun -np 8 -report-bindings -cpus-per-proc 4 -map-by

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-18 Thread tmishima
> ...socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-18 Thread tmishima
> The problem is cpus-per-proc with the -map-by option under the Torque manager. It doesn't work as shown below. I guess you can get the same

Re: [OMPI users] tcp of openmpi-1.7.3 under our environment is very slow

2013-12-18 Thread tmishima
Hi Jeff, I tried with processor binding enabled, using both openmpi-1.7.3 and 1.7.4rc1, but I got the same results as with no binding. In addition, the core mapping of 1.7.4rc1 seems to be strange, though that has no relation to the tcp slowdown. Regards, Tetsuya Mishima [mishima@node08 OMB-3.1.1]$ mpirun -V

[OMPI users] tcp of openmpi-1.7.3 under our environment is very slow

2013-12-16 Thread tmishima
Hi, I usually use an infiniband network, where openmpi-1.7.3 and 1.6.5 work fine. The other day, I had a chance to use a tcp network (1GbE), and I noticed that my application with openmpi-1.7.3 was quite a bit slower than with openmpi-1.6.5. So, I ran the OSU MPI Bandwidth Test v3.1.1 as shown below, which shows b
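For reference, a bandwidth run of this kind pinned to the tcp BTL can be requested explicitly (host names hypothetical; "--mca btl tcp,self" is the standard selection syntax in this series):
  mpirun -np 2 -host node07,node08 --mca btl tcp,self ./osu_bw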

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-11 Thread tmishima
> The problem is cpus-per-proc with the -map-by option under the Torque manager. It doesn't work as shown below. I guess you can get the same

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-10 Thread tmishima
> ...socket 1[core 11[hwt 0]]: [./././././././.][B/B/B/B/./././.][./././././././.][./././././././.] > [node03.cluster:18128] MCW rank 3 bound to socket 1[core 12[hwt 0]],

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-10 Thread tmishima
> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.] > [node03.cluster:18128] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-10 Thread tmishima
t;>>> > >>>> Firstly, I noticed that your hostfile can work and mine can not. > >>>> > >>>> Your host file: > >>>> cat hosts > >>>> bend001 slots=12 > >>>> > >>>> My host file: >
