Gilles,
Sorry, I do not have any more details about NERSC’s usage.
Scott
> On Mar 24, 2016, at 9:49 AM, Gilles Gouaillardet
> wrote:
>
> Scott,
>
> out of curiosity ...
>
> generally speaking, and when HT is more efficient, how is it used?
> - flat MPI,
Scott,
out of curiosity ...
generally speaking, and when HT is more efficient, how is it used?
- flat MPI, with one task per thread
- hybrid MPI+OpenMP, where a task is bound to a core or socket, but never
to a thread
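
To make the two options concrete, here is a rough sketch of the layouts
being compared. It assumes a node with 2 sockets, 8 cores per socket and
2 hardware threads per core; those numbers, and the names Layout and
layouts, are made up for illustration and say nothing about how any
particular site actually runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layout:
    name: str
    mpi_tasks: int         # MPI tasks per node
    threads_per_task: int  # OpenMP threads inside each task
    bound_to: str          # hardware object each task is bound to


def layouts(sockets: int, cores_per_socket: int, threads_per_core: int) -> List[Layout]:
    """Enumerate the flat and hybrid layouts for one node (illustrative only)."""
    cores = sockets * cores_per_socket
    hwthreads = cores * threads_per_core
    return [
        # Flat MPI: one single-threaded task per hardware thread.
        Layout("flat MPI", hwthreads, 1, "hwthread"),
        # Hybrid: one task per core, its OpenMP threads share the core's hardware threads.
        Layout("hybrid, bound to core", cores, threads_per_core, "core"),
        # Hybrid: one task per socket, its OpenMP threads fill the whole socket.
        Layout("hybrid, bound to socket", sockets, cores_per_socket * threads_per_core, "socket"),
    ]


if __name__ == "__main__":
    # Assumed node shape: 2 sockets x 8 cores x 2 hardware threads.
    for layout in layouts(2, 8, 2):
        print(f"{layout.name}: {layout.mpi_tasks} tasks x "
              f"{layout.threads_per_task} threads, bound to {layout.bound_to}")
```

In every case the node runs the same number of software threads; what
changes is how many of them are separate MPI tasks and what each task is
bound to.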
Cheers,
Gilles
On Thursday, March 24, 2016, Atchley, Scott wrote:
Hi Aurélien,
I have said the same thing to many users over the years. Our colleagues at
NERSC, however, have found that 20% of their codes work better when using HT.
Some codes benefit from SMT2 (i.e. HT) and even SMT4 (available on Power8) in
order to provide enough latency hiding of memory
To add to what Ralf said, you probably do not want to use hyperthreads for HPC
workloads, as that generally results in very poor performance (as you noticed).
Set the number of slots to the number of real cores (not hardware threads);
that would yield optimal results 95% of the time.
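
As a rough illustration of the difference between real cores and hardware
threads, the sketch below counts distinct (socket, core) pairs and prints
the matching hostfile line. The topology is an assumption (2 sockets,
8 cores each, 2 hardware threads per core, consistent with the 16 cores
and slots=32 mentioned in this thread), and the helper names are
hypothetical; on a real node the pairs would come from a tool such as
hwloc rather than being generated.

```python
from typing import Iterable, List, Tuple


def example_topology(sockets: int = 2, cores_per_socket: int = 8,
                     threads_per_core: int = 2) -> List[Tuple[int, int]]:
    """Return one (socket_id, core_id) pair per hardware thread (assumed layout)."""
    return [(s, c)
            for s in range(sockets)
            for c in range(cores_per_socket)
            for _ in range(threads_per_core)]


def count_physical_cores(cpus: Iterable[Tuple[int, int]]) -> int:
    """Hardware threads on the same core share a (socket_id, core_id) pair."""
    return len(set(cpus))


def hostfile_line(host: str, slots: int) -> str:
    """Format a hostfile entry in the 'host slots=N' form used in this thread."""
    return f"{host} slots={slots}"


if __name__ == "__main__":
    topo = example_topology()
    cores = count_physical_cores(topo)
    print(f"{len(topo)} hardware threads, {cores} cores")
    print(hostfile_line("node0", cores))
```

With that line in the hostfile, mpirun would then be started with -np 16
rather than -np 32.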
Aurélien
--
OK, I investigated further today, and it seems that "--map-by hwthread" does
not solve the problem. However, if I specify "node0 slots=32" in the hostfile,
it runs much slower than when I specify only "node0". In both cases I run
mpirun with -np 32, so I am fairly sure I have not understood what slots
are.
Hi guys,
I'm really confused about *slots* in resource allocation: I thought that
slots were the number of processes that can be spawned on a given node, so
they should correspond to the number of Processing Elements of the node. For example,
on each of my nodes I have 2 processors, total 16 cores with