Jeff/All,

OK ... so if I follow your lead and build a version without the PBS TM
(--with-tm=) integration and it works, I should be able to report this as an
incompatibility bug between the latest version of PBS Pro (10.2.0.93147) and
the latest version of OpenMPI (1.4.2), right? Do I report that to my friends at
OpenMPI, my friends at PBS Pro (Altair), or both?

Thanks for your help.  I will let you know what the result is ...

rbw


   Richard Walsh
   Parallel Applications and Systems Manager
   CUNY HPC Center, Staten Island, NY
   718-982-3319
   612-382-4620

   Mighty the Wizard
   Who found me at sunrise
   Sleeping, and woke me
   And learn'd me Magic!
________________________________________
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Jeff Squyres [jsquy...@cisco.com]
Sent: Thursday, June 10, 2010 11:52 AM
To: Open MPI Users
Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...

Not offhand, but just to close the loop on a question from your first mail: 
this should not be a memory manager issue (i.e., not related to IB).

As Ralph noted, this is a segv in the launcher (mpirun, in this case), in the
tm_init() function call (TM is the launcher helper library in PBS/Torque).
Open MPI (mpirun, in this case) calls tm_init() to set up the PBS launcher;
it's the first PBS-specific function call that we make. If tm_init() fails, it
may indicate that something fairly basic is busted in that support library.
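If the TM support library itself is suspect, one cheap check (in the spirit of the advice quoted later in this thread to make sure the TM/PBS support libs are the same on the build node and on every node where Open MPI runs) is to compare file digests across nodes. The following is a minimal Python sketch, assuming you can run a script on each node (for example over ssh) and pass it the same list of library paths. The script and the paths you give it are hypothetical; which files to check depends on your install. Each node prints a SHA-256 manifest, and compare_manifests() lists the files whose digest differs from the build node's.

```python
import hashlib
import json
import sys
from pathlib import Path

MISSING = "<missing>"  # sentinel so absent files also show up as mismatches


def digest_files(paths):
    """Return {path: SHA-256 hex digest} for each file; missing files map to MISSING."""
    manifest = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            manifest[str(p)] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            manifest[str(p)] = MISSING
    return manifest


def compare_manifests(reference, other):
    """Return the sorted paths whose digest in `other` differs from `reference`."""
    names = sorted(set(reference) | set(other))
    return [n for n in names if reference.get(n) != other.get(n)]


if __name__ == "__main__":
    # Hypothetical usage on each node: python libcheck.py <lib path> [<lib path> ...]
    print(json.dumps(digest_files(sys.argv[1:]), indent=2, sort_keys=True))
```

Comparing digests rather than timestamps or sizes catches a rebuilt library that happens to keep the same size. It also works whether the copy is node-local or reached through an NFS mount.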


On Jun 10, 2010, at 11:12 AM, Richard Walsh wrote:

>
> Ralph/Jeff,
>
> Yes, the change was intentional. I have upgraded PBS as well and built
> 1.4.2 pointing to the new PBS via a symbolic link to 'default', which allows
> one to control the actual default without changing the path. I did the same
> thing on the non-IB system, which seems to be working fine with 1.4.2. This
> would suggest that this is not the issue.
>
> It is possible that the PBS build on the IB system was flawed, but it looked
> normal. I could rebuild it. The PBS libraries (as well as MPI) are in a
> shared location that is NFS mounted on the compute nodes, so things should be
> in sync, but I will verify this.
>
> Any other suggestions ... ??
>
> rbw
>
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Jeff Squyres [jsquy...@cisco.com]
> Sent: Thursday, June 10, 2010 11:00 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
>
> On Jun 10, 2010, at 10:57 AM, Ralph Castain wrote:
>
> > That error would indicate something wrong with the PBS connection; it is
> > tm_init that is crashing. I note that you did --with-tm pointing to a
> > different location. Was that intentional? Could be something wrong with
> > that PBS build.
>
> ...and make sure that the support libs for TM/PBS are the same between the 
> node you're building on and all the nodes where OMPI will be running.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> Think green before you print this email.


--
Jeff Squyres
jsquy...@cisco.com

