To add to what Ralf said, you probably do not want to use Hyper Threads for HPC
workloads, as that generally results in very poor performance (as you noticed).
Set the number of slots to the number of real cores (not hardware threads);
that should yield optimal results 95% of the time.
Aurélien
--
OK, I've investigated further today: it seems "--map-by hwthread" does not
remove the problem. However, if I specify "node0 slots=32" in the hostfile,
it runs much slower than specifying only "node0". In both cases I run
mpirun with -np 32, so I'm quite sure I haven't understood what slots are.
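For illustration, here is a hostfile sketch along the lines of the advice above. The node name and core counts are assumptions; the point is that "slots" caps how many ranks Open MPI places on the node, so setting it to the hardware-thread count (e.g. 32 on a 16-core machine) lets ranks share physical cores:

```
# hostfile sketch -- assuming node0 has 16 physical cores / 32 hardware threads
# slots = physical cores, so ranks are not doubled up on cores
node0 slots=16
```

Then launch with a matching rank count, e.g. `mpirun -np 16 --hostfile hostfile ./app`.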
Dominic,
I can only recommend you write a small, self-contained program that writes
the data in parallel, and then checks from task 0 only that the data was
written as you expected.
Feel free to take some time reading MPI-IO tutorials.
If you are still struggling with your code, I will try to help.
I am open to any suggestions to make the code better, especially if the way
it's coded now is wrong.
I believe what the MPI_TYPE_INDEXED call is trying to do is this:
I have a domain of, for example, 8 hexahedral elements (a 2x2x2-cell domain)
that has 27 unique connectivity nodes (3x3x3 nodes).
Dominik,
with MPI_Type_indexed, array_of_displacements is an int[],
so yes, there is a risk of overflow.
With MPI_Type_create_hindexed, on the other hand, array_of_displacements
is an MPI_Aint[].
note
array_of_displacements
    Displacement for each block, in multiples of oldtype extent
Hi Gilles,
I believe I have found the problem. Initially I thought it might be an MPI
issue, since it occurred inside an MPI function. However, I am now sure
that the problem is an overflow of 4-byte signed integers.
I am dealing with computational domains that have a