On Feb 24, 2011, at 2:59 PM, Henderson, Brent wrote:

> [snip]
> They really can't be all SLURM_PROCID=0 - that is supposed to be unique for
> the job - right? It appears that the SLURM_PROCID is inherited from the
> orted parent - which makes a fair amount of sense given how things are
> launched.

That's correct, and I can agree with your sentiment. However, our design goals were to provide a consistent *Open MPI* experience across different launchers. Providing native access to the actual underlying launcher was a secondary goal. Balancing those two, you can see why we chose the model we did: our orted provides (nearly) the same functionality across all environments. In SLURM's case, we propagate [seemingly] nonsensical SLURM_PROCID values to the individual processes, but they only look nonsensical if you make an assumption about how Open MPI uses SLURM's launcher.

More specifically, our goal is to provide consistent *Open MPI information* (e.g., through the OMPI_COMM_WORLD* env variables) -- not to emulate what SLURM would have done if the MPI processes had been launched individually through srun. Even more specifically: we don't think that the exact underlying launching mechanism that OMPI uses is of interest to most users; we encourage them to use our portable mechanisms, which keep working even if they move to another cluster with a different launcher.

Admittedly, that does make things a little more challenging if you have to support multiple MPI implementations, and although that's an important consideration to us, it's not our first priority.

> Now to answer the other question - why are there some variables missing. It
> appears that when the orted processes are launched - via srun but only one
> per node, it is a subset of the main allocation and thus some of the
> environment variables are not the same (or missing entirely) as compared to
> launching them directly with srun on the full allocation. This also makes
> sense to me at some level, so I'm at peace with it now. :)

Ah, good.

> Last thing before I go. Please let me apologize for not being clear on what
> I disagreed with Ralph about in my last note.
> Clearly he nailed the orted
> launching process and spelled it out very clearly, but I don't believe that
> HP-MPI is not doing anything special to copy/fix up the SLURM environment
> variables. Hopefully that was clear by the body of that message.

No worries; you were perfectly clear.

Thanks!

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/