Hi,

In one of our codes we want to create a log of events that happen in
the MPI processes, where the number of these events and their timing
are unpredictable.

So I implemented a simple test code in which process 0 creates a
thread that just busy-waits for messages from any process and writes
them to stdout/stderr/a log file as they arrive. The test code is at
https://github.com/angel-devicente/thread_io and the same idea went
into our "real" code.

As far as I could see, this behaves very nicely: there are no
deadlocks, no lost messages, and the performance penalty is minimal for
the real application this is intended for.
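
In case it helps to picture the scheme, below is a minimal sketch in C
(MPI + pthreads) of the idea. It is NOT the actual thread_io code: the
tags LOG_TAG/STOP_TAG and the message format are invented for the
example, and the only real requirement is that the MPI library provides
MPI_THREAD_MULTIPLE.

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define LOG_TAG  100   /* invented tag for log messages        */
#define STOP_TAG 101   /* invented tag to shut the logger down */

/* Logger thread on rank 0: busy-waits (polls) for messages from any
 * rank and writes them out until a STOP message arrives.            */
static void *logger(void *arg)
{
    char buf[256];
    MPI_Status st;
    int flag, done = 0;
    (void)arg;
    while (!done) {
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
        if (!flag) continue;                 /* nothing yet, keep polling */
        MPI_Recv(buf, (int)sizeof buf, MPI_CHAR, st.MPI_SOURCE, st.MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (st.MPI_TAG == STOP_TAG)
            done = 1;
        else
            fprintf(stderr, "[rank %d] %s\n", st.MPI_SOURCE, buf);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    pthread_t tid;
    char msg[256];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        pthread_create(&tid, NULL, logger, NULL);

    /* Every rank reports one "event"; a synchronous send guarantees the
     * message has been matched by the logger before we hit the barrier,
     * so the STOP message below is always the last one processed.      */
    snprintf(msg, sizeof msg, "event from rank %d", rank);
    MPI_Ssend(msg, (int)strlen(msg) + 1, MPI_CHAR, 0, LOG_TAG, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
        MPI_Send(msg, 1, MPI_CHAR, 0, STOP_TAG, MPI_COMM_WORLD);
        pthread_join(tid, NULL);
    }
    MPI_Finalize();
    return 0;
}

In the real code the logger of course runs for the whole execution and
the events can arrive at any time; the sketch only fixes the shutdown
so that the example terminates cleanly.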

But then I found that on a local cluster the performance was much worse
with the locally installed OpenMPI than with my own OpenMPI
installation (same gcc and OpenMPI versions): one test went from ~5s to
~5min 50s. Checking the OpenMPI configuration details, I found that the
locally installed OpenMPI was configured to use the Mellanox IB driver,
and in particular the hcoll component was somehow killing performance:

running with

mpirun  --mca coll_hcoll_enable 0 -np 51 ./test_t

was taking ~5s, while enabling hcoll was killing performance, as stated
above (when run on a single node the performance also drops, but only
by about a factor of 2).
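
For completeness, and assuming the usual Open MPI conventions, these
should be equivalent ways to check for and disable hcoll without
touching the mpirun line (the parameter name is the one used above):

ompi_info | grep hcoll                        # was the hcoll component built in?
export OMPI_MCA_coll_hcoll_enable=0           # same as the --mca option, via the environment
echo "coll_hcoll_enable = 0" >> ~/.openmpi/mca-params.conf   # or the per-user MCA parameters file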

Has anyone seen anything like this? Perhaps a newer Mellanox driver
would solve the problem?

We were planning on making our code public, but before we do so I want
to understand under which conditions this problem can appear with the
"Threaded I/O" approach and, if possible, how to get rid of it
completely.

Any help/pointers appreciated.
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/
