[OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque

2014-04-01 Thread tmishima
Hi Ralph, I saw another hangup with openmpi-1.8 when I used more than 4 nodes (having 8 cores each) under managed state by Torque. Although I'm not sure you can reproduce it with SLURM, at least with Torque it can be reproduced in this way: [mishima@manage ~]$ qsub -I -l nodes=4:ppn=8 qsub: wai

[OMPI devel] v1.8 warnings

2014-04-01 Thread Ralph Castain
Would the respective parties please clean these up for v1.8.1? common_verbs_find_ports.c:164: warning: 'transport_name_to_str' defined but not used btl_openib_component.c: In function 'btl_openib_component_init': btl_openib_component.c:2696: warning: unused variable 'qp_index' In file included

[OMPI devel] One more v1.8 warning

2014-04-01 Thread Ralph Castain
memheap_buddy.c:93:5: warning: "__SIZEOF_LONG__" is not defined

Re: [OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque

2014-04-01 Thread Ralph Castain
I tracked it down - not Torque specific, but impacts all managed environments. Will fix On Apr 1, 2014, at 2:23 AM, tmish...@jcity.maeda.co.jp wrote: > > Hi Ralph, > > I saw another hangup with openmpi-1.8 when I used more than 4 nodes > (having 8 cores each) under managed state by Torque. Al

[OMPI devel] Seeking input for an RFC

2014-04-01 Thread Joshua Ladd
Soliciting input from the community: WHAT: Modify PML cm component to remove unnecessary initializations, optimizing blocking operations WHY: Remove overhead in fast-path by allowing a "direct mode" increases single packet latency HOW: In PML cm, even if the request starts and ends wit

[OMPI devel] Problem of running 'mpirun' on a cross-compiled openmpi-1.6.5 for armv7

2014-04-01 Thread Allan Wu
Hello everyone, I am trying to run OpenMPI-1.6.5 under Linux on a system based on the ARM Cortex A9. The Linux system and the hardware are provided by Xilinx Inc., and for those who may have related experience, the system is called Zynq, which is an embedded SoC system with ARM cores and FPGA fabrics. X

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Ralph Castain
This change just looks wrong - you can't split the variables on a "space" as there is no way to know how many variables there might be, and thus how to parse the rest of the cmd line. At best, you need a comma-delimited list. Please fix this! Ralph On Tue, Apr 1, 2014 at 2:14 PM, wrote: > Aut

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Mike Dubman
Not sure what you mean, could you please provide an example? A comma is often used as part of the value, here is an example: -mca base_env_list "HCOLL_BCOL=basesmuma,mlnx_p2p HCOLL_SBGP=basesmuma,p2p HCOLL_ML_USE_KNOMIAL_ALLREDUCE=1" On Wed, Apr 2, 2014 at 2:12 AM, Ralph Castain wrote: > This chan
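
The point about delimiters in the message above can be illustrated with a minimal sketch. It is not the Open MPI implementation; `parse_env_list` is a hypothetical helper written only for illustration. It assumes the whole list arrives as one quoted string, splits it on whitespace, and leaves commas inside each value untouched:

```python
def parse_env_list(value):
    """Split a space-delimited list of NAME=VALUE pairs into a dict.

    Hypothetical helper for illustration only; it is not taken from
    the Open MPI source. Commas are treated as part of the value.
    """
    env = {}
    for item in value.split():
        name, sep, val = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {item!r}")
        env[name] = val
    return env


# The quoted value from the message above.
example = (
    "HCOLL_BCOL=basesmuma,mlnx_p2p "
    "HCOLL_SBGP=basesmuma,p2p "
    "HCOLL_ML_USE_KNOMIAL_ALLREDUCE=1"
)

for name, val in parse_env_list(example).items():
    print(name, "->", val)
```

This only works if the shell passes the list as a single argument, which is why the quoting matters: without the quotes, each space-separated pair would reach the command line as a separate argument.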

Re: [OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque

2014-04-01 Thread tmishima
Thanks Ralph. Tetsuya > I tracked it down - not Torque specific, but impacts all managed environments. Will fix > > > On Apr 1, 2014, at 2:23 AM, tmish...@jcity.maeda.co.jp wrote: > > > > > Hi Ralph, > > > > I saw another hangup with openmpi-1.8 when I used more than 4 nodes > > (having 8 cores

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Ralph Castain
If you are expecting the user to put quotes around the string, then you better tell them that in the help message. Otherwise, they would do what I did - simply list the envars with a space in-between, and everything fails. Also, you need to update mpirun.1in to reflect this new option or else nobo

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Ralph Castain
Actually, the more I think about this, the more puzzled I get. We already have a mechanism for forwarding and/or setting envars, though it applies to the daemons, who then drop it down to the apps. Just use the "-x" option, which will either forward the current value of the envar, or you can set th

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Mike Dubman
Yes, it is expected that the "string" value should be quoted. Will clarify it in the help message and man page. Thanks for spotting. The underlying libraries used from OMPI (mxm,psm,hcoll,glibc,pmi2,slurm,...) all have shell environment variables to control their behavior. It is unreasonable to exp

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Mike Dubman
The "-x var=val" cannot be placed into mca params file. On Wed, Apr 2, 2014 at 2:34 AM, Mike Dubman wrote: > yes, it is expected that the "string" value should be quoted. will clarify > it in the help message and man page. Thanks for spotting. > > The underlying libraries used from OMPI > (mxm,p

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Ralph Castain
Understood - my point, however, was that we can easily add that capability to the mca params file. This would seem far preferable to creating a completely new, parallel way of setting envars. Why not do it that way? On Tue, Apr 1, 2014 at 4:34 PM, Mike Dubman wrote: > The "-x var=val" cannot

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Mike Dubman
The mca param file treats any key=val as an mca parameter only. Adding parser support for something that is not an mca param would require changing the file syntax, and it would look bad, i.e.: mca btl = sm,self,openib env DISPLAY = console:0 I think the current implementation is less intrusive and re-use

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-01 Thread Ralph Castain
Hmmm... I'll have to ponder this one. Regardless, it doesn't belong in the 1.8 series as it isn't a bug fix. Meanwhile, let's table it until Jeff returns next week and discuss it at the weekly telecon. It just seems bothersome that we now have two orthogonal ways of specifying the same thing - thi