On Jan 13, 2010, at 9:58 PM, SpiduS Okami wrote:

> I would like to know if someone could help me with the following error:
>
> [fenrir][[9567,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>
> I am trying to run the hpcc program in a beowulf type cluster with 2,3 and 4
> machines. When I use 10.000 problems and up it gives me this error. Any one
> know what could be this? and how can I solve this problem.

This *usually* means that an MPI process has died unexpectedly; one of its peers noticed that it died by the fact that a socket closed. You might want to poke around and see if there are corefiles or somesuch that explain why an MPI process died...?

-- 
Jeff Squyres
jsquy...@cisco.com
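
For illustration only, here is a minimal Python sketch of how a surviving peer ends up seeing "Connection reset by peer" when the other end of a TCP socket goes away abruptly. It is not Open MPI code. The function name `provoke_reset` is hypothetical, and the sketch assumes a POSIX system where the linger structure is two C ints. It forces a TCP reset with `SO_LINGER` as a stand-in for a crashed process; a real crash may produce the reset by a different route, for example when the process dies with unread data still pending on the socket.

```python
import errno
import socket
import struct


def provoke_reset():
    """Return the errno the surviving peer sees after an abrupt close."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    survivor = socket.create_connection(listener.getsockname())
    survivor.settimeout(5)  # do not hang forever if no reset arrives
    victim, _ = listener.accept()

    # SO_LINGER with a zero timeout makes close() send a TCP RST instead of
    # the usual FIN. This imitates a peer that vanished unexpectedly.
    victim.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    victim.close()

    try:
        survivor.recv(1024)  # the read fails once the RST has arrived
        return None          # no error seen (not expected in this sketch)
    except OSError as exc:
        return exc.errno
    finally:
        survivor.close()
        listener.close()


if __name__ == "__main__":
    print(provoke_reset() == errno.ECONNRESET)
```

On Linux, `errno.ECONNRESET` is 104, which matches the `(104)` in the message above. The error is reported by the process that survived, so the cause is more likely to be found on the node whose process exited.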