Hi Jason,

No, my main problem is with running mpirun --np <any number>. I am unable to run any command on multiple processors using mpi4py and Open MPI together: whenever I type a command that uses both, I receive no output. I was wondering whether it was a compatibility issue.
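To be concrete, by "a command that uses both" I mean launching even a minimal mpi4py script under mpirun. A sketch of such a test (the file name hello_mpi.py is just a placeholder):

  # hello_mpi.py -- minimal mpi4py check, meant to be launched under mpirun
  from mpi4py import MPI

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()
  size = comm.Get_size()
  print("Hello from rank %d of %d" % (rank, size))

A run such as "mpirun --np 4 python hello_mpi.py" should print one line per rank if mpi4py and the Open MPI installation agree; commands like this are what give me no output.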
On Thu, Oct 6, 2016 at 10:54 AM, <users-requ...@lists.open-mpi.org> wrote:

> Today's Topics:
>
>    1. Re: centos 7.2 openmpi from repo, stdout issue (Emre Brookes)
>    2. openmpi and mpi4py compatibility (Mahdi, Sam)
>    3. Re: openmpi and mpi4py compatibility (Jason Maldonis)
>    4. Re: openmpi and mpi4py compatibility (Lisandro Dalcin)
>    5. Re: MPI + system() call + Matlab MEX crashes (Gilles Gouaillardet)
>    6. Re: MPI + system() call + Matlab MEX crashes (Bennet Fauber)
>    7. Using Open MPI with multiple versions of GCC and G++ (Aditya)
>    8. Re: Using Open MPI with multiple versions of GCC and G++ (Jeff Squyres (jsquyres))
>    9. Re: [EXTERNAL] Using Open MPI with multiple versions of GCC and G++ (Simon Hammond)
>   10. Crash during MPI_Finalize (George Reeke)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 05 Oct 2016 14:00:28 -0500
> From: Emre Brookes <e...@biochem.uthscsa.edu>
> Subject: Re: [OMPI users] centos 7.2 openmpi from repo, stdout issue
>
> Thank you for the sanity check and recommendations.
> I will post my results here when resolved.
>
> Jeff Squyres (jsquyres) wrote:
> > We did have some kind of stdout/stderr truncation issue a little while
> > ago, but I don't remember what version it specifically affected.
> >
> > I would definitely update to at least Open MPI 1.10.4 (lots of bug fixes
> > since 1.10.0). Better would be to update to Open MPI 2.0.1 -- that's the
> > current generation and where all of our work is going these days.
> >
> >> On Oct 5, 2016, at 9:23 AM, Emre Brookes <e...@biochem.uthscsa.edu> wrote:
> >>
> >> $ cat /etc/redhat-release
> >> CentOS Linux release 7.2.1511 (Core)
> >>
> >> $ yum list installed | grep openmpi
> >> openmpi.x86_64        1.10.0-10.el7    @base
> >> openmpi-devel.x86_64  1.10.0-10.el7    @base
> >>
> >> (1) When I run
> >>   $ mpirun -H myhosts -np myprocs executable
> >> the job runs fine and outputs correctly to stdout
> >>
> >> (2) When I run
> >>   $ mpirun -H myhosts -np myprocs executable > stdout.log
> >> the stdout.log file prematurely ends (without full output)
> >> ... but the mpi executable itself seems to keep running forever until
> >> manually terminated with a "kill".
> >>
> >> (3) When I run
> >>   $ mpirun -H myhosts -np myprocs executable | cat > stdout.log
> >> the job runs fine and outputs correctly to the stdout.log file
> >>
> >> I tried playing with a 'stdbuf' prefix to the command, but this didn't
> >> seem to help. I would like (2) to work, but have resorted to (3).
> >>
> >> I tried digging around in the parameters after seeing
> >> https://github.com/open-mpi/ompi/issues/341 and thinking it might be
> >> something similar, but didn't see any poll or epoll in .conf.
> >> I am hesitant to try to compile from scratch and get away from the repo
> >> release cycle.
> >>
> >> Is this a known bug? If so, and if it has been fixed, would you recommend
> >> I install the latest stable rpm of 1.10.4-1 from
> >> https://www.open-mpi.org/software/ompi/v1.10/ ?
> >>
> >> Thanks,
> >> Emre
>
> ------------------------------
>
> Message: 2
> Date: Wed, 5 Oct 2016 12:29:28 -0700
> From: "Mahdi, Sam" <sam.mahdi....@my.csun.edu>
> Subject: [OMPI users] openmpi and mpi4py compatibility
>
> Hi everyone,
>
> I had a quick question regarding the compatibility of openmpi and mpi4py.
> I have openmpi 1.7.3 and mpi4py 1.3.1. I know these are older versions of
> each, but I was having some problems running a program that uses mpi4py
> and openmpi, and I wanted to make sure it wasn't a compatibility issue
> between the 2 versions of these programs.
>
> Sincerely,
> Sam
>
> ------------------------------
>
> Message: 3
> Date: Wed, 5 Oct 2016 14:51:04 -0500
> From: Jason Maldonis <maldo...@wisc.edu>
> Subject: Re: [OMPI users] openmpi and mpi4py compatibility
>
> Hi Sam,
>
> I am not a developer but I am using mpi4py with openmpi-1.10.2. For that
> version, most of the functionality works, but I think there are some issues
> with the mpi_spawn commands. Are you using the spawn commands?
>
> I have no experience with the versions you are using, but I thought I'd
> chime in just in case you ran into a similar issue as me.
>
> Best,
> Jason
>
> Jason Maldonis
> Research Assistant of Professor Paul Voyles
> Materials Science Grad Student
> University of Wisconsin, Madison
> maldo...@wisc.edu
>
> On Wed, Oct 5, 2016 at 2:29 PM, Mahdi, Sam <sam.mahdi....@my.csun.edu> wrote:
> > [original question quoted above; trimmed]
>
> ------------------------------
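A quick way to check whether an mpi4py build and the Open MPI that mpirun belongs to actually match is to ask mpi4py what it was built against. A small sketch (the file name check_mpi.py is made up):

  # check_mpi.py -- report which MPI this mpi4py build talks to
  import mpi4py
  from mpi4py import MPI

  name, version = MPI.get_vendor()   # e.g. ('Open MPI', (1, 7, 3))
  print("mpi4py version: %s" % mpi4py.__version__)
  print("MPI vendor:     %s %s" % (name, version))
  print("MPI standard:   %s.%s" % MPI.Get_version())

Run it once without mpirun and once under mpirun. If the vendor and version reported here are not the Open MPI installation that mpirun comes from, silent hangs or missing output are a plausible symptom.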
> Message: 4
> Date: Wed, 5 Oct 2016 23:24:33 +0300
> From: Lisandro Dalcin <dalc...@gmail.com>
> Subject: Re: [OMPI users] openmpi and mpi4py compatibility
>
> On 5 October 2016 at 22:29, Mahdi, Sam <sam.mahdi....@my.csun.edu> wrote:
> > [original question quoted above; trimmed]
>
> Hi, I'm the author of mpi4py. Could you elaborate on the issues you
> experienced? I would start by disabling MPI_Init_threads() from mpi4py, for
> you have to add the following code:
>
>   import mpi4py.rc
>   mpi4py.rc.threaded = False
>   from mpi4py import MPI
>
> But you have to do it at the VERY BEGINNING of your code, more precisely,
> the two first lines should be used before any attempt to "from mpi4py
> import MPI".
>
> PS: Any chance you can use a newer version of mpi4py, maybe even a git
> checkout of the master branch?
>
> --
> Lisandro Dalcin
> Research Scientist
> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
> Extreme Computing Research Center (ECRC)
> King Abdullah University of Science and Technology (KAUST)
> http://ecrc.kaust.edu.sa/
>
> ------------------------------
>
> Message: 5
> Date: Thu, 6 Oct 2016 09:03:55 +0900
> From: Gilles Gouaillardet <gil...@rist.or.jp>
> Subject: Re: [OMPI users] MPI + system() call + Matlab MEX crashes
>
> Juraj,
>
> if i understand correctly, the "master" task calls MPI_Init(), and then
> fork&exec matlab.
>
> In some cases (lack of hardware support), fork cannot even work. but
> let's assume it is fine for now.
>
> Then, if i read between the lines, matlab calls mexFunction that MPI_Init().
>
> As far as i am concerned, that cannot work.
>
> The blocker is that a child cannot call MPI_Init() if its parent already
> called MPI_Init()
>
> Fortunatly, you have some options :-)
>
> 1) start matlab from mpirun.
>    for example, if you want one master, two slaves and matlab, you can do
>    something like
>      mpirun -np 1 master : -np 1 matlab : -np 2 slave
>
> 2) MPI_Comm_spawn matlab
>    master can MPI_Comm_spawn() matlab, and then matlab can merge the parent
>    communicator, and communicate to master and slaves
>
> 3) use the approach suggested by Dmitry
>    /* this is specific to matlab, and i have no experience with it */
>
> One last point, MPI_Init() can be invoked only once per task
> (e.g. if your mexFunction does
>   MPI_Init(); work(); MPI_Finalize();
> then it can be invoked only once per mpirun
>
> Cheers,
>
> Gilles
>
> On 10/5/2016 6:41 PM, Dmitry N. Mikushin wrote:
> > Hi Juraj,
> >
> > Although MPI infrastructure may technically support forking, it's
> > known that not all system resources can correctly replicate themselves
> > to forked process. For example, forking inside MPI program with active
> > CUDA driver will result into crash.
> >
> > Why not to compile down the MATLAB into a native library and link it
> > with the MPI application directly? E.g. like here:
> > https://www.mathworks.com/matlabcentral/answers/98867-how-do-i-create-a-c-shared-library-from-mex-files-using-the-matlab-compiler?requestedDomain=www.mathworks.com
> >
> > Kind regards,
> > - Dmitry Mikushin.
> >
> > 2016-10-05 11:32 GMT+03:00 juraj2...@gmail.com <juraj2...@gmail.com>:
> >>
> >> Hello,
> >>
> >> I have an application in C++ (main.cpp) that is launched with
> >> multiple processes via mpirun. Master process calls matlab via
> >> system('matlab -nosplash -nodisplay -nojvm -nodesktop -r "interface"'),
> >> which executes simple script interface.m that calls mexFunction
> >> (mexsolve.cpp) from which I try to set up communication with the rest
> >> of the processes launched at the beginning together with the master
> >> process. When I run the application as listed below on two different
> >> machines I experience:
> >>
> >> 1) crash at MPI_Init() in the mexFunction() on cluster machine with
> >> Linux 4.4.0-22-generic
> >>
> >> 2) error in MPI_Send() shown below on local machine with
> >> Linux 3.10.0-229.el7.x86_64
> >>
> >> [archimedes:31962] shmem: mmap: an error occurred while determining
> >> whether or not
> >> /tmp/openmpi-sessions-1007@archimedes_0/58444/1/shared_mem_pool.archimedes
> >> could be created.
> >> [archimedes:31962] create_and_attach: unable to create shared memory BTL
> >> coordinating structure :: size 134217728
> >> [archimedes:31962] shmem: mmap: an error occurred while determining
> >> whether or not
> >> /tmp/openmpi-sessions-1007@archimedes_0/58444/1/0/vader_segment.archimedes.0
> >> could be created.
> >> [archimedes][[58444,1],0][../../../../../opal/mca/btl/tcp/btl_tcp_endpoint.c:800:mca_btl_tcp_endpoint_complete_connect]
> >> connect() to <MY_IP> failed: Connection refused (111)
> >>
> >> I launch application as following:
> >>   mpirun --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 1 -np 2 -npernode 1 ./main
> >>
> >> I have openmpi-2.0.1 configured with --prefix=${INSTALLDIR}
> >> --enable-mpi-fortran=all --with-pmi --disable-dlopen
> >>
> >> For more details, the code is here: https://github.com/goghino/matlabMpiC
> >>
> >> Thanks for any suggestions!
> >>
> >> Juraj
>
> ------------------------------
>
> Message: 6
> Date: Wed, 5 Oct 2016 20:38:25 -0400
> From: Bennet Fauber <ben...@umich.edu>
> Subject: Re: [OMPI users] MPI + system() call + Matlab MEX crashes
>
> Matlab may have its own MPI installed. It definitely does if you have
> the parallel computing toolbox. If you have that, it could be causing
> problems. If you can, you might consider compiling your Matlab
> application into a standalone executable, then call that from your own
> program. That bypasses the Matlab user interface and may prove more
> tractable. See the documentation for mcc if you have that.
>
> http://www.mathworks.com/help/compiler/mcc.html
>
> If you have that toolbox.
>
> -- bennet
>
> On Wed, Oct 5, 2016 at 8:03 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> > [Gilles's reply and the earlier messages in this thread quoted in full; trimmed]
>
> ------------------------------
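As an aside on option 2 in Gilles's list: in mpi4py the spawn-and-merge pattern looks roughly like the sketch below. This is only an illustration of the pattern -- in the thread above the parent is a C program and the child is MATLAB, so the file name child_script.py is purely a stand-in for whatever command gets spawned.

  # parent side: sketch of MPI_Comm_spawn followed by a merge
  import sys
  from mpi4py import MPI

  # spawn one child process running a (made-up) child_script.py
  intercomm = MPI.COMM_SELF.Spawn(sys.executable,
                                  args=['child_script.py'],
                                  maxprocs=1)
  merged = intercomm.Merge(high=False)   # parent + child in one intracommunicator
  print("parent is rank %d of %d in merged comm"
        % (merged.Get_rank(), merged.Get_size()))
  merged.Free()
  intercomm.Disconnect()

  # child side (child_script.py): obtain the parent communicator and merge
  from mpi4py import MPI

  parent = MPI.Comm.Get_parent()
  merged = parent.Merge(high=True)
  print("child is rank %d of %d in merged comm"
        % (merged.Get_rank(), merged.Get_size()))
  merged.Free()
  parent.Disconnect()

After the merge, parent and child can talk over an ordinary intracommunicator, which is essentially what Gilles describes for the master/MATLAB case.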
> Message: 7
> Date: Thu, 6 Oct 2016 19:26:15 +0530
> From: Aditya <adityasarma...@gmail.com>
> Subject: [OMPI users] Using Open MPI with multiple versions of GCC and G++
>
> Hello,
>
> I'm a senior year Computer Science student working on Parallel Clustering
> Algorithms at BITS Pilani, India. I have a few questions about using mpicc
> and mpicxx with multiple versions of gcc / g++.
>
> I am using Ubuntu 12.04 equipped with gcc 4.6.4. The currently installed
> mpicc is bound to gcc 4.6.4. I want mpicc to be bound with gcc-5 that I have
> installed in my pc.
>
> Is there a way to do the binding to gcc as a compiler flag or something of
> that sort.
>
> PS: Please do reply if you have a solution. I am unable to run a hybrid
> code on my pc because of this issue.
>
> Regards,
> Aditya.
>
> ------------------------------
>
> Message: 8
> Date: Thu, 6 Oct 2016 14:12:23 +0000
> From: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Using Open MPI with multiple versions of GCC and G++
>
> On Oct 6, 2016, at 9:56 AM, Aditya <adityasarma...@gmail.com> wrote:
> > [original question quoted above; trimmed]
>
> Especially with C++, the Open MPI team strongly recommends you building
> Open MPI with the target versions of the compilers that you want to use.
> Unexpected things can happen when you start mixing versions of compilers
> (particularly across major versions of a compiler). To be clear: compilers
> are *supposed* to be compatible across multiple versions (i.e., compile a
> library with one version of the compiler, and then use that library with an
> application compiled by a different version of the compiler), but a)
> there's other issues, such as C++ ABI issues and other run-time
> bootstrapping that can complicate things, and b) bugs in forward and
> backward compatibility happen.
>
> The short answer is in this FAQ item:
> https://www.open-mpi.org/faq/?category=mpi-apps#override-wrappers-after-v1.0
> Substituting the gcc 5 compiler may work just fine.
>
> But the *safer* answer is that you might want to re-build Open MPI with
> the specific target compiler.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ------------------------------
>
> Message: 9
> Date: Thu, 6 Oct 2016 08:15:28 -0600
> From: Simon Hammond <sdha...@sandia.gov>
> Subject: Re: [OMPI users] [EXTERNAL] Using Open MPI with multiple versions of GCC and G++
>
> Can you try setting the environment variable OMPI_CXX=<put the path to
> gcc-5 here>
>
> Then run:
>
>   mpicxx -v
>
> and see what version it says its running. You may have to be careful
> mixing the versions too far apart.
>
> S.
>
> Si Hammond
> Scalable Computer Architectures
> Center for Computing Research
> Sandia National Laboratories, NM, USA
>
> > [original question quoted above; trimmed]
>
> ------------------------------
>
> Message: 10
> Date: Thu, 06 Oct 2016 13:53:59 -0400
> From: George Reeke <re...@mail.rockefeller.edu>
> Subject: [OMPI users] Crash during MPI_Finalize
>
> Dear colleagues,
> I have a parallel MPI application written in C that works normally in
> a serial version and in the parallel version in the sense that all
> numerical output is correct.
> When it tries to shut down, it gives the following console error messsage:
>
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[51524,1],0]
>   Exit code:    13
> -----End quoted console text-----
>
> The Process name given is not the number of any Linux process.
> The Exit code given seems to be any number in the range 12 to 17.
> The core dumps produced do not have usable backtrace information.
> There is no output on stderr (besides my debug messages).
> The last message written by rank 0 node on stdout and flushed is lost.
> I cannot determine the cause of the problem.
>
> Let me be as explicit as possible:
> OS RHEL 6.8, compiler gcc 4.4.7 with -g, no optimization
> Version of MPI (RedHat package): openmpi-1.10-1.10.2-2.el6.x86_64
> The startup command is like this:
>   mpirun --output-filename junk -mca btl_tcp_if_include lo -n 1 cnsP0 NOSP : -n 3 cnsPn < v8tin/dan
>
> cnsP0 is a master code that reads a control file (specified after the
> '<' on the command line). The other executables (cnsPn) only send and
> receive messages and do math, no file IO. I get same results with
> 3 or 4 compute nodes.
> Early in startup, another process is started via MPI_Comm_spawn.
> I suspect this is relevant to the problem, although simple test
> programs using the same setup complete normally. This process,
> andmsg, receives status or debug information asynchronously via
> messages from the other processes and writes them to stderr.
> I have tried many versions of the shutdown code, all with the same
> result. Here is one version (debug writes (using fwrite() and
> fflush()) are deleted, comments modified for clarity):
>
> Application code (cnsP0 and cnsPn):
>
>   /* Everything works OK up to here (stdout and debug output). */
>   int rc, ival = 0;
>   /* In next line, NC.dmsgid is rank # of andmsg process and
>    * NC.commd is intercommunicator to it.  andmsg counts these
>    * shutdown messages, one from each app node. */
>   rc = MPI_Send(&ival, 1, MPI_INT, NC.dmsgid, SHUTDOWN_ANDMSG, NC.commd);
>   /* This message confirms that andmsg got 4 SHUTDOWN messages.
>    * "is_host(NC.node)" returns 1 if this is the rank 0 node. */
>   if (is_host(NC.node)) {
>       MPI_Recv(&ival, 1, MPI_INT, NC.dmsgid, CLOSING_ANDMSG, NC.commd,
>                MPI_STATUS_IGNORE);
>   }
>   /* Results are similar with or without this barrier.  Debug lines
>    * written on stderr from all nodes after barrier appear OK. */
>   rc = MPI_Barrier(NC.commc);   /* NC.commc is original world comm */
>   /* Behavior is same with or without this extra message exchange,
>    * which I added to keep andmsg from terminating before the
>    * barrier among the other nodes completes. */
>   if (is_host(NC.node)) {
>       rc = MPI_Send(&ival, 1, MPI_INT, NC.dmsgid, SHUTDOWN_ANDMSG, NC.commd);
>   }
>   /* Behavior is same with or without this disconnect */
>   rc = MPI_Comm_disconnect(&NC.commd);
>   rc = MPI_Finalize();
>   exit(0);
>
> Spawned process (andmsg) code extract:
>
>   if (num2stop <= 0) {   /* Countdown of shutdown messages received */
>       int rc;
>       /* This message confirms to main app that shutdown messages
>        * were received from all nodes. */
>       rc = MPI_Send(&num2stop, 1, MPI_INT, NC.hostid, CLOSING_ANDMSG, NC.commd);
>       /* Receive extra synch message commented above */
>       rc = MPI_Recv(&sdmsg, 1, MPI_INT, NC.hostid, MPI_ANY_TAG, NC.commd,
>                     MPI_STATUS_IGNORE);
>       sleep(1);   /* Results are same with or without this sleep */
>       /* Results are same with or without this disconnect */
>       rc = MPI_Comm_disconnect(&NC.commd);
>       rc = MPI_Finalize();
>       exit(0);
>   }
>
> I would much appreciate any suggestions how to debug this.
> From the suggestions at the community help web page, here is more
> information:
> config.log file, bzipped version, is attached.
> ompi_info --all bzipped output is attached.
> I am not sending information from other nodes or network config--for
> test purposes, all processes are running on the one node, my laptop
> with i7 processor. I set the "-mca btl_tcp_if_include lo" parameter
> earlier when I got an error message about a refused connection
> (that my code never asked for in the first place). This got rid
> of that error message but application still fails and dumps.
>
> Thanks,
> George Reeke
>
> ------------------------------
>
> End of users Digest, Vol 3619, Issue 1
> **************************************