Re: [OMPI users] Certain files for mpi missing when building mpi4py

2016-08-30 Thread Gilles Gouaillardet
Sam, at first you mentioned Open MPI 1.7.3. Though this is now a legacy version, you posted to the right place. Then you ran # python setup.py build --mpicc=/usr/lib64/mpich/bin/mpicc, and this is MPICH, which is a very reputable MPI implementation, but not Open MPI. So I do invite you to use
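One way to catch this kind of mismatch (mpi4py packaged for Open MPI, but built or run against an MPICH mpicc) is to ask the MPI library itself which implementation it is. A minimal sketch, not from the thread: the helper `mpi_flavor` is hypothetical, and `MPI.Get_library_version()` wraps the MPI-3 call MPI_Get_library_version, so very old stacks such as Open MPI 1.7.x may not expose it.

```python
def mpi_flavor(version_string):
    """Classify an MPI library version string by vendor."""
    s = version_string.lower()
    if "open mpi" in s or "open-mpi" in s:
        return "Open MPI"
    if "mpich" in s:
        return "MPICH"
    return "unknown"

def report_mpi_flavor():
    # Requires a working mpi4py install; import is deferred so the
    # pure-Python helper above stays usable without MPI present.
    from mpi4py import MPI
    return mpi_flavor(MPI.Get_library_version())
```

Calling `report_mpi_flavor()` from a script launched with the Open MPI mpirun should print "Open MPI"; "MPICH" would confirm the build picked up the wrong implementation.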

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread Jingchao Zhang
Thank you! The patch fixed the problem. I did multiple tests with your program and another application. No more process hangs! Cheers, Dr. Jingchao Zhang Holland Computing Center University of Nebraska-Lincoln 402-472-6400 From: users

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread Jingchao Zhang
Yes, I can definitely help to test the patch. Jingchao From: users on behalf of r...@open-mpi.org Sent: Tuesday, August 30, 2016 2:23:12 PM To: Open MPI Users Subject: Re: [OMPI users] stdin issue with

[OMPI users] bug? "The system limit on number of children a process can have was reached"

2016-08-30 Thread Jason Maldonis
Hello everyone, I am using openmpi-1.10.2 and calling the `spawn_multiple` MPI function inside a for-loop. My program spawns N workers in each iteration of the for-loop, makes some changes to the input for the next iteration, and then proceeds to the next iteration. After a few iterations

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
Oh my - that indeed illustrated the problem!! It is indeed a race condition on the backend orted. I’ll try to fix it - probably have to send you a patch to test? > On Aug 30, 2016, at 1:04 PM, Jingchao Zhang wrote: > > $mpirun -mca state_base_verbose 5 ./a.out < test.in > >

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
Well, that helped a bit. For some reason, your system is skipping a step in the launch state machine, and so we never hit the step where we setup the IO forwarding system. Sorry to keep poking, but I haven’t seen this behavior anywhere else, and so I have no way to replicate it. Must be a

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread Jingchao Zhang
Yes, all procs were launched properly. I added “-mca plm_base_verbose 5” to the mpirun command. Please see attached for the results. $mpirun -mca plm_base_verbose 5 ./a.out < test.in I mentioned in my initial post that the test job can run properly for the 1st time. But if I kill the job and

[OMPI users] Certain files for mpi missing when building mpi4py

2016-08-30 Thread Mahdi, Sam
Hi everyone, I am using Linux Fedora. I downloaded/installed openmpi-1.7.3-1.fc20(64-bit) and openmpi-devel-1.7.3-1.fc20(64-bit), as well as pypar-openmpi-2.1.5_108-3.fc20(64-bit) and python3-mpi4py-openmpi-1.3.1-1.fc20(64-bit). The problem I am having is building mpi4py using the mpicc

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
Hmmm...well, the problem appears to be that we aren’t setting up the input channel to read stdin. This happens immediately after the application is launched - there is no “if” clause or anything else in front of it. The only way it wouldn’t get called is if all the procs weren’t launched, but

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread Jingchao Zhang
I checked again and as far as I can tell, everything was setup correctly. I added "HCC debug" to the output message to make sure it's the correct plugin. The updated outputs: $ mpirun ./a.out < test.in [c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 35 for process
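Independent of the patch being tested in this thread, a common way to sidestep mpirun's stdin forwarding (the iof framework seen in the log above) is to not use `mpirun ... < test.in` at all: rank 0 reads the file and broadcasts its contents. A minimal sketch, assuming the input fits in memory and reusing the `test.in` filename from the thread.

```python
def broadcast_text(comm, path):
    """Rank 0 reads the file; every rank returns the same text.
    Avoids relying on stdin forwarding, which only targets one rank."""
    text = None
    if comm.Get_rank() == 0:
        with open(path) as f:
            text = f.read()
    # Lowercase bcast pickles arbitrary Python objects in mpi4py.
    return comm.bcast(text, root=0)
```

With this, the job is launched as `mpirun -np N python script.py` and `broadcast_text(MPI.COMM_WORLD, "test.in")` replaces reading from stdin.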

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-08-30 Thread Gilles Gouaillardet
In the absence of a clear error message, the btl_tcp_frag related error messages can suggest a process was killed by the oom-killer. This is not your case, since rank 0 died because of an illegal instruction. Are you running under a batch manager? On which architecture? Do your compute nodes have
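The oom-killer hypothesis mentioned above can be checked by scanning the kernel log on each node (the output of dmesg, or /var/log/messages). A small sketch, not part of the thread: the exact log wording varies by kernel version, so the patterns below are assumptions.

```python
def find_oom_kills(log_lines):
    """Return log lines that look like oom-killer activity."""
    patterns = ("out of memory", "oom-killer", "killed process")
    return [line for line in log_lines
            if any(p in line.lower() for p in patterns)]
```

Feeding it `subprocess.check_output(["dmesg"]).decode().splitlines()` on each compute node would show whether any MPI rank was killed for memory pressure; an empty result supports looking elsewhere (here, the illegal instruction on rank 0).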

[OMPI users] job aborts "readv failed: Connection reset by peer"

2016-08-30 Thread Mahmood Naderan
Hi, An MPI job is running on two nodes and everything seems to be fine. However, in the middle of the run, the program aborts with the following error [compute-0-1.local][[47664,1],14][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)