> Or one could tell OMPI to do what you really want it to do using map-by and 
> bind-to options, perhaps putting them in the default MCA param file.

Nod.  Agreed, but far too complicated for 98% of our users.
> 
> Or you could enable cgroups in slurm so that OMPI sees the binding envelope - 
> it will respect it.

We’ve configured cgroups from the beginning.
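
For anyone wanting to check what envelope a process actually sees inside an allocation, here is a minimal sketch. It assumes Linux and a Python interpreter on the compute node; the helper name visible_cpus is just illustrative. It reports the CPU affinity mask of the calling process, which on a cpuset-constrained job should normally be limited to the allocated cores. It says nothing about what OMPI itself detects.

```python
import os


def visible_cpus():
    """Return the sorted CPU ids the calling process is allowed to run on.

    Uses os.sched_getaffinity, which is only available on Linux.
    """
    return sorted(os.sched_getaffinity(0))


if __name__ == "__main__":
    cpus = visible_cpus()
    print(f"{len(cpus)} CPU(s) visible to this process: {cpus}")
```

Running this under srun inside a job and comparing the result with the requested core count is one rough way to confirm that the cgroup restriction is in effect.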

> The problem is that OMPI isn’t seeing the requested binding envelope and 
> thinks resources are available that really aren’t, and so it gets confused 
> about how to map things. Slurm expresses that envelope in an envar, but the 
> name and syntax keep changing over the releases, and we just can’t track it 
> all the time.

Understood.

> I’m not sure what “slurm_nodeid” is - where does this come from?

Sorry, it was S_JOB_NODEID from spank.h.  I ended up changing my approach to 
the tmpdir creation because of this and the fact that the job’s UID/GID were 
not available in the SPANK routine where I needed them.  I would hope that 
S_JOB_NODEID maps to the exported environment variable SLURM_NODEID, but I 
don’t know that for sure.
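
A minimal sketch of reading the exported value from a task's environment, assuming SLURM_NODEID is set there as an integer string. The helper name slurm_node_id is hypothetical, and this does not by itself show that the value matches S_JOB_NODEID; that would still need to be compared against what the SPANK plugin reports.

```python
import os


def slurm_node_id(env=None):
    """Return SLURM_NODEID as an int, or None if it is unset or not numeric.

    `env` defaults to the process environment; a dict can be passed for testing.
    """
    env = os.environ if env is None else env
    raw = env.get("SLURM_NODEID")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # Unexpected format: report "unknown" rather than guessing.
        return None


if __name__ == "__main__":
    print(f"SLURM_NODEID = {slurm_node_id()}")
```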

Thanks for the feedback,

Charlie

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
