Hello,

On Wed, Dec 07, 2016 at 10:19:10AM -0500, Noam Bernstein wrote:
> > On Dec 7, 2016, at 10:07 AM, Christof Koehler
> > <christof.koeh...@bccms.uni-bremen.de> wrote:
> >
> > I really think the hang is a consequence of
> > unclean termination (in the sense that the non-root ranks are not
> > terminated) and probably not the cause, in my interpretation of what I
> > see. Would you have any suggestion to catch signals sent between orterun
> > (mpirun) and the child tasks ?
>
> Do you know where in the code the termination call is? Is it actually
> calling mpi_abort(), or just doing something ugly like calling fortran
> “stop”? If the latter, would that explain a possible hang?

Well, basically it tries to use wannier90 (LWANNIER=.TRUE.). The wannier90
input contains an error: a restart is requested, and the wannier90.chk file
with the restart information is missing.
"
Exiting.......
 Error: restart requested but wannier90.chk file not found
"
So it must terminate.

The termination happens in libwannier.a, source file io.F90:

    write(stdout,*) 'Exiting.......'
    write(stdout, '(1x,a)') trim(error_msg)
    close(stdout)
    stop "wannier90 error: examine the output/error file for details"

So it calls stop as you assumed.

> Presumably someone here can comment on what the standard says about the
> validity of terminating without mpi_abort.

Well, probably stop is not a good way to terminate then.

My main point was the change relative to 1.10 anyway :-)

>
> Actually, if you’re willing to share enough input files to reproduce, I could
> take a look. I just recompiled our VASP with openmpi 2.0.1 to fix a crash
> that was apparently addressed by some change in the memory allocator in a
> recent version of openmpi. Just e-mail me if that’s the case.

I think that is no longer necessary ? In principle it is no problem, but
it is at the end of a (small) GW calculation, the Si tutorial example.
So the mail would be a bit larger due to the WAVECAR.

>
> Noam
>
>
> ____________
> ||
> |U.S. NAVAL|
> |_RESEARCH_|
> LABORATORY
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628 F +1 202 404 7546
> https://www.nrl.navy.mil <https://www.nrl.navy.mil/>

--
Dr. rer. nat. Christof Köhler       email: c.koeh...@bccms.uni-bremen.de
Universitaet Bremen/ BCCMS          phone: +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.12       fax: +49-(0)421-218-62770
28359 Bremen

PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users