Re: [OMPI devel] Confusion about slots

2016-03-24 Thread Atchley, Scott
Gilles, Sorry, I do not have any more details about NERSC’s usage. Scott

Re: [OMPI devel] Confusion about slots

2016-03-24 Thread Gilles Gouaillardet
Scott, out of curiosity ... generally speaking, and when HT is more efficient, how is it used?
- flat MPI, with one task per thread
- Hybrid MPI+OpenMP, where a task is bound to a core or socket, but never to a thread
Cheers, Gilles
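The two usage patterns in the question above differ only in how many MPI tasks are launched per node and how many threads each task runs. A minimal sketch of that arithmetic follows, assuming the node shape mentioned elsewhere in this thread (2 processors, 16 cores in total) and assuming 2 hardware threads per core. The `Node` class and `layouts` function are hypothetical names used for illustration only; they are not part of Open MPI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one compute node."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # 2 with HT/SMT2 (an assumption here)


def layouts(node):
    """Return {layout name: (MPI tasks per node, threads per task)}."""
    cores = node.sockets * node.cores_per_socket
    hwthreads = cores * node.threads_per_core
    return {
        # Flat MPI: one single-threaded task per hardware thread.
        "flat_mpi_per_thread": (hwthreads, 1),
        # Hybrid: one task per core, its threads fill that core's hardware threads.
        "hybrid_per_core": (cores, node.threads_per_core),
        # Hybrid: one task per socket, its threads fill the whole socket.
        "hybrid_per_socket": (node.sockets, hwthreads // node.sockets),
    }


# Assumed node: 2 sockets x 8 cores x 2 hardware threads.
result = layouts(Node(sockets=2, cores_per_socket=8, threads_per_core=2))
for name, (tasks, threads) in result.items():
    print(f"{name}: {tasks} tasks x {threads} threads")
```

In every layout the product of tasks and threads equals the number of hardware threads; only the split between MPI and OpenMP changes.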

Re: [OMPI devel] Confusion about slots

2016-03-24 Thread Atchley, Scott
Hi Aurélien, I have said the same thing to many users over the years. Our colleagues at NERSC, however, have found that 20% of their codes work better when using HT. Some codes benefit from SMT2 (i.e. HT) and even SMT4 (available on Power8) in order to provide enough latency hiding of memory

Re: [OMPI devel] Confusion about slots

2016-03-23 Thread Aurélien Bouteiller
To add to what Ralf said, you probably do not want to use Hyper Threads for HPC workloads, as that generally results in very poor performance (as you noticed). Set the number of slots to the number of real cores (not HT); that would yield optimal results 95% of the time. Aurélien
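As a rough illustration of the advice above, the sketch below counts physical cores rather than hardware threads and emits a hostfile line in the `node0 slots=N` form quoted elsewhere in this thread. The topology list is made-up sample data (2 sockets, 8 cores each, 2 hardware threads per core, all assumed), and `physical_core_count` and `hostfile_line` are hypothetical helpers, not Open MPI functions. On a real machine the (socket, core) pairs would have to come from a topology tool.

```python
def physical_core_count(cpu_topology):
    """Count distinct (socket, core) pairs.

    cpu_topology holds one (socket_id, core_id) pair per logical CPU,
    so sibling hardware threads of one core appear as duplicates.
    """
    return len(set(cpu_topology))


def hostfile_line(hostname, slots):
    """Format one hostfile entry in the 'host slots=N' style."""
    return f"{hostname} slots={slots}"


# Assumed sample data: 2 sockets x 8 cores, each core listed twice (2 HT siblings).
topology = [(s, c) for s in range(2) for c in range(8) for _ in range(2)]

line = hostfile_line("node0", physical_core_count(topology))
print(line)
```

With this sample topology there are 32 logical CPUs but 16 distinct cores, so the generated entry requests 16 slots.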

Re: [OMPI devel] Confusion about slots

2016-03-23 Thread Federico Reghenzani
Ok, I've investigated further today; it seems "--map-by hwthread" does not remove the problem. However, if I specify "node0 slots=32" in the hostfile, it runs much slower than when I specify only "node0". In both cases I run mpirun with -np 32. So I'm quite sure I didn't understand what slots are.

[OMPI devel] Confusion about slots

2016-03-22 Thread Federico Reghenzani
Hi guys, I'm really confused about *slots* in resource allocation: I thought that slots are the number of processes spawnable in a certain node, so it should correspond to the number of Processing Elements of the node. For example, on each of my nodes I have 2 processors, total 16 cores with