Is this worth an FAQ item?  I.e., if specific versions of PBS Pro are broken, 
should we make that google-able on our FAQ, at least?


On Jul 27, 2011, at 2:21 PM, Ralph Castain wrote:

> Great - thanks!
> 
> On Jul 27, 2011, at 12:16 PM, Justin Wood wrote:
> 
>> I heard back from my Altair contact this morning.  He told me that they did 
>> in fact make a change in some version of 10.x that broke this.  They don't 
>> have a workaround for v10, but he said it was fixed in v11.x.
>> 
>> I built OpenMPI 1.5.3 this morning with PBSPro v11.0, and it works fine.  I 
>> don't get any segfaults.
>> 
>> -Justin.
>> 
>> On 07/26/2011 05:49 PM, Ralph Castain wrote:
>>> I don't believe we ever got anywhere with this due to lack of response. If 
>>> you get some info on what happened to tm_init, please pass it along.
>>> 
>>> Best guess: something changed in a recent PBS Pro release. Since none of us 
>>> have access to it, we don't know what's going on. :-(
>>> 
>>> 
>>> On Jul 26, 2011, at 10:10 AM, Wood, Justin Contractor, SAIC wrote:
>>> 
>>>> I'm having a problem using OpenMPI under PBS Pro 10.4.  I tried both 1.4.3 
>>>> and 1.5.3, both behave the same.  I'm able to run just fine if I don't use 
>>>> PBS and go direct to the nodes.  Also, if I run under PBS and use only 1 
>>>> node, it works fine, but as soon as I span nodes, I get the following:
>>>> 
>>>> [a4ou-n501:07366] *** Process received signal ***
>>>> [a4ou-n501:07366] Signal: Segmentation fault (11)
>>>> [a4ou-n501:07366] Signal code: Address not mapped (1)
>>>> [a4ou-n501:07366] Failing at address: 0x3f
>>>> [a4ou-n501:07366] [ 0] /lib64/libpthread.so.0 [0x3f2b20eb10]
>>>> [a4ou-n501:07366] [ 1] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(discui_+0x84) [0x2affa453765c]
>>>> [a4ou-n501:07366] [ 2] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(diswsi+0xc3) [0x2affa4534c6f]
>>>> [a4ou-n501:07366] [ 3] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0 [0x2affa453290c]
>>>> [a4ou-n501:07366] [ 4] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(tm_init+0x1fe) [0x2affa4532bf8]
>>>> [a4ou-n501:07366] [ 5] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0 [0x2affa452691c]
>>>> [a4ou-n501:07366] [ 6] mpirun [0x404c17]
>>>> [a4ou-n501:07366] [ 7] mpirun [0x403e28]
>>>> [a4ou-n501:07366] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3f2a61d994]
>>>> [a4ou-n501:07366] [ 9] mpirun [0x403d59]
>>>> [a4ou-n501:07366] *** End of error message ***
>>>> Segmentation fault
>>>> 
>>>> I searched the archives and found a similar issue from last year:
>>>> 
>>>> http://www.open-mpi.org/community/lists/users/2010/02/12084.php
>>>> 
>>>> The last update I saw was that someone was going to contact Altair and 
>>>> have them look at why it was failing to do the tm_init.  Does anyone have 
>>>> an update to this, and has anyone been able to run successfully using 
>>>> recent versions of PBSPro?  I've also contacted our rep at Altair, but he 
>>>> hasn't responded yet.
>>>> 
>>>> Thanks, Justin.
>>>> 
>>>> Justin Wood
>>>> Systems Engineer
>>>> FNMOC | SAIC
>>>> 7 Grace Hopper, Stop 1
>>>> Monterey, CA
>>>> justin.g.wood....@navy.mil
>>>> justin.g.w...@saic.com
>>>> office: 831.656.4671
>>>> mobile: 831.869.1576
>>>> 
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

