On Jan 13, 2010, at 9:58 PM, SpiduS Okami wrote:

> I would like to know if someone could help me with the following error:
> 
> [fenrir][[9567,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
>  mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> 
> I am trying to run the hpcc program on a Beowulf-type cluster with 2, 3, and 4
> machines. When I use 10.000 problems and up, it gives me this error. Does
> anyone know what could be causing this, and how I can solve it?

This *usually* means that an MPI process has died unexpectedly; one of its 
peers noticed that it died by the fact that a socket closed.
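
To illustrate what the surviving peer sees, here is a minimal Python sketch (not Open MPI code) that reproduces the same errno on a loopback TCP connection. The function name provoke_reset is hypothetical, and SO_LINGER with a zero timeout is used only to force a reset deterministically. It assumes a Linux/Unix host; a real crash may show up as a reset or as a plain end-of-file, depending on what was in flight.

```python
import errno
import socket
import struct


def provoke_reset():
    """Return the errno the surviving peer sees when the other end aborts.

    Hypothetical helper for illustration only; assumes a Unix-like host.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # l_onoff=1, l_linger=0: close() aborts the connection with a TCP RST
    # instead of an orderly FIN. This stands in for a process that died.
    server_side.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
    )
    server_side.close()
    listener.close()

    try:
        client.recv(1024)  # the "surviving" peer tries to read
        return None        # no reset observed
    except ConnectionResetError as exc:
        return exc.errno   # ECONNRESET, which is 104 on Linux
    finally:
        client.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

The point is only that the error is reported by the process that is still alive; the interesting failure is on the other machine.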

You might want to poke around and see if there are corefiles or somesuch that 
explain why an MPI process died...?
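
As a rough starting point for that search, something like the following Python sketch could be run on each node. The helper names core_dumps_enabled and find_core_files are hypothetical, and the "core*" file name pattern is an assumption: on Linux the actual name and location depend on /proc/sys/kernel/core_pattern.

```python
import glob
import os
import resource


def core_dumps_enabled():
    """True if the soft core-size limit of this process allows core files."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0  # 0 means core dumps are disabled


def find_core_files(directory):
    """List regular files in `directory` whose names start with 'core'.

    Assumption: cores are written into the job's working directory under
    the traditional name; adjust the pattern for your system.
    """
    pattern = os.path.join(directory, "core*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


if __name__ == "__main__":
    print("core dumps enabled:", core_dumps_enabled())
    for path in find_core_files(os.getcwd()):
        print("possible core file:", path)
```

If the limit is 0, no core file will have been written, so it may need to be raised in the environment that launches the MPI processes before re-running the job.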

-- 
Jeff Squyres
jsquy...@cisco.com

