Hi there,
I am facing some problems in an Open MPI application. Part of the
application is composed of a sender and a receiver. The problem is that the
sender is so much faster than the receiver, which causes the receiver's
memory to be completely used up, aborting the application.
I would like to kn
Hi all,
this is more of a hardware or system configuration question, but
I hope people on this list have some experience with it.
I have just added a new ConnectX IB card to a cluster with InfiniHost cards,
and now no MPI programs work. Even OFED's tests do not work.
For example ib_send_*, ib_write_* just segfault on th
Don't know which SSI project you are referring to... I only know the
OpenSSI project, and I was one of the first who subscribed to its
mailing list (since 2001).
http://openssi.org/cgi-bin/view?page=openssi.html
I don't think those OpenSSI clusters are designed for tens of
thousands of nodes, and
Is anything done at the kernel level portable (e.g. to Windows)? It
*can* be, in principle at least (by putting appropriate #ifdef's in
the code), but I am wondering if it is in reality.
Also, in 2005 there was an attempt to add SSI (Single System
Image) functionality to the then-current 2.6
Srinivas,
There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
if you can checkpoint an MPI task and restart it on a new node, then
this is also "process migration".
Of course, doing a checkpoint & restart can be slower than pure
in-kernel process migration, but the advantage is
It also depends on what part of migration interests you - do you want to
look at the MPI part of the problem (reconnecting MPI transports, ensuring
messages are not lost, etc.) or the RTE part of the problem (where to restart
processes, detecting failures, etc.)?
On Aug 24, 2011, at 7:04 A
Hi Folks,
the problem could be solved by using the same compiler settings for writing
out and reading in. Writing out was done with -trace (Intel compiler), and
reading in was done without any additional options.
Best wishes
Alexander
> Hi Folks,
>
> I have problems retrieving my data that I have w
Hi Folks,
I have problems retrieving my data that I have written out with MPI parallel
IO. In tests everything works fine, but within a larger environment, the data
read in differ from those written out.
Here is the setup of my experiment:
# the writer #
program parallel_io
use mpi