Jeff/All,

OK ... so if I follow your lead and build a version without the PBS TM integration (--with-tm) and it works, I should be able to report this as an incompatibility bug between the latest version of PBS Pro (10.2.0.93147) and the latest version of Open MPI (1.4.2), right? Do I report that to my friends at Open MPI, to my friends at PBS Pro (Altair), or to both?

Thanks for your help. I will let you know what the result is ...

rbw

Richard Walsh
Parallel Applications and Systems Manager
CUNY HPC Center, Staten Island, NY
718-982-3319
612-382-4620

Mighty the Wizard
Who found me at sunrise
Sleeping, and woke me
And learn'd me Magic!
________________________________________
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Jeff Squyres [jsquy...@cisco.com]
Sent: Thursday, June 10, 2010 11:52 AM
To: Open MPI Users
Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...

Not offhand, but just to close the loop on a question from your first mail: this should not be a memory manager issue (i.e., not related to IB).

As Ralph noted, this is a segv in the launcher (mpirun, in this case) -- in the tm_init() function call (TM is the launcher helper library in PBS/Torque). Open MPI (mpirun, in this case) calls tm_init() to set up the PBS launcher -- it's the first PBS-specific function call that we make. If tm_init() fails, it may indicate that something fairly basic is busted in that support library.

On Jun 10, 2010, at 11:12 AM, Richard Walsh wrote:

>
> Ralph/Jeff,
>
> Yes, the change was intentional. I have upgraded PBS as well and built
> 1.4.2 pointing to the new PBS via a symbolic link to 'default' which allows
> one to control the actual default without changing the path. I did the same
> thing on the non-IB system which seems to be working fine with 1.4.2. This
> would suggest that this is not the issue.
>
> It is possible that the PBS build in the IB system was flawed, but it looked
> normal. I could rebuild it. The PBS libraries (as well as MPI) are in a
> shared location that is NFS mounted on the compute nodes so things should be
> in sync, but I will verify this.
>
> Any other suggestions ... ??
>
> rbw
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
> Jeff Squyres [jsquy...@cisco.com]
> Sent: Thursday, June 10, 2010 11:00 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
>
> On Jun 10, 2010, at 10:57 AM, Ralph Castain wrote:
>
> > That error would indicate something wrong with the pbs connection - it is
> > tm_init that is crashing. I note that you did --with-tm pointing to a
> > different location - was that intentional? Could be something wrong with
> > that pbs build
>
> ...and make sure that the support libs for TM/PBS are the same between the
> node you're building on and all the nodes where OMPI will be running.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> Think green before you print this email.

--
Jeff Squyres
jsquy...@cisco.com
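
One way to act on the advice quoted above (that the TM/PBS support libs should be the same on the build node and on every node where OMPI runs) is to compare checksums of the library file. What follows is only a minimal Python sketch under stated assumptions, not something taken from this thread: the helper names sha256_of and differing_hosts are made up for illustration, and the path to the PBS TM library under your --with-tm prefix has to be supplied by you.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so a large shared library does not
    have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_hosts(digests, reference):
    """Return the sorted host labels whose digest differs from the reference.

    `digests` maps a host label (hypothetical, e.g. "build" or "node01")
    to the hex digest collected on that host; `reference` is the label
    of the host to compare against, typically the build node.
    """
    ref_digest = digests[reference]
    return sorted(host for host, d in digests.items() if d != ref_digest)
```

The idea would be to run sha256_of on the build node and on each compute node against the same library path, gather the results into a dict keyed by host, and pass that dict to differing_hosts with the build node as the reference. An empty list means the copies match byte for byte; it says nothing about whether the library itself is sound, so a tm_init() crash could still come from the PBS build.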