I'm afraid our Xgrid support has lagged, and Apple hasn't shown much interest in 
MPI + Xgrid support -- much less HPC.  :-\

Have you seen the FAQ items about Xgrid?

    http://www.open-mpi.org/faq/?category=osx#xgrid-howto
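
Since falling back to ssh is mentioned below, here is a rough sketch of what 
that route could look like. It assumes passwordless ssh already works between 
the two machines, that Open MPI is installed in the same location on both, and 
that test_hello exists at the same path on each. The hostnames are 
hypothetical placeholders, and the slot counts are only a guess from the 
description of the two machines.

```shell
# Write a hostfile listing both machines.
# NOTE: "macpro-quad.local" and "macpro-duo.local" are made-up names;
# substitute the real hostnames. slots=4 assumes four cores on each box.
cat > myhosts <<'EOF'
macpro-quad.local slots=4
macpro-duo.local slots=4
EOF

# Launch across both machines using the hostfile (left commented out here
# because it needs a working ssh setup between the nodes):
# mpirun --hostfile myhosts -np 8 ./test_hello
```

If the job still tries to launch through Xgrid, it may be worth checking 
whether any Xgrid-related environment variables are set in the shell.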


On Aug 4, 2011, at 4:16 AM, Christopher Jones wrote:

> Hi there,
> 
> I'm currently trying to set up a small xgrid between two mac pros (a single 
> quadcore and a 2 duo core), where both are directly connected via an ethernet 
> cable. I've set up xgrid using the password authentication (rather than the 
> kerberos), and from what I can tell in the Xgrid admin tool it seems to be 
> working. However, once I try a simple hello world program, I get this error:
> 
> chris-joness-mac-pro:~ chrisjones$ mpirun -np 4 ./test_hello
> mpirun noticed that job rank 0 with PID 381 on node xgrid-node-0 exited on 
> signal 15 (Terminated). 
> 1 additional process aborted (not shown)
> 2011-08-04 10:02:16.329 mpirun[350:903] *** Terminating app due to uncaught 
> exception 'NSInvalidArgumentException', reason: '*** 
> -[NSKVONotifying_XGConnection<0x1001325a0> finalize]: called when collecting 
> not enabled'
> *** Call stack at first throw:
> (
>       0   CoreFoundation                      0x00007fff814237b4 
> __exceptionPreprocess + 180
>       1   libobjc.A.dylib                     0x00007fff84fe8f03 
> objc_exception_throw + 45
>       2   CoreFoundation                      0x00007fff8143e631 
> -[NSObject(NSObject) finalize] + 129
>       3   mca_pls_xgrid.so                    0x00000001002a9ce3 
> -[PlsXGridClient dealloc] + 419
>       4   mca_pls_xgrid.so                    0x00000001002a9837 
> orte_pls_xgrid_finalize + 40
>       5   libopen-rte.0.dylib                 0x000000010002d0f9 
> orte_pls_base_close + 249
>       6   libopen-rte.0.dylib                 0x0000000100012027 
> orte_system_finalize + 119
>       7   libopen-rte.0.dylib                 0x000000010000e968 
> orte_finalize + 40
>       8   mpirun                              0x00000001000011ff orterun + 
> 2042
>       9   mpirun                              0x0000000100000a03 main + 27
>       10  mpirun                              0x00000001000009e0 start + 52
>       11  ???                                 0x0000000000000004 0x0 + 4
> )
> terminate called after throwing an instance of 'NSException'
> [chris-joness-mac-pro:00350] *** Process received signal ***
> [chris-joness-mac-pro:00350] Signal: Abort trap (6)
> [chris-joness-mac-pro:00350] Signal code:  (0)
> [chris-joness-mac-pro:00350] [ 0] 2   libSystem.B.dylib                   
> 0x00007fff81ca51ba _sigtramp + 26
> [chris-joness-mac-pro:00350] [ 1] 3   ???                                 
> 0x00000001000cd400 0x0 + 4295808000
> [chris-joness-mac-pro:00350] [ 2] 4   libstdc++.6.dylib                   
> 0x00007fff830965d2 __tcf_0 + 0
> [chris-joness-mac-pro:00350] [ 3] 5   libobjc.A.dylib                     
> 0x00007fff84fecb39 _objc_terminate + 100
> [chris-joness-mac-pro:00350] [ 4] 6   libstdc++.6.dylib                   
> 0x00007fff83094ae1 _ZN10__cxxabiv111__terminateEPFvvE + 11
> [chris-joness-mac-pro:00350] [ 5] 7   libstdc++.6.dylib                   
> 0x00007fff83094b16 _ZN10__cxxabiv112__unexpectedEPFvvE + 0
> [chris-joness-mac-pro:00350] [ 6] 8   libstdc++.6.dylib                   
> 0x00007fff83094bfc 
> _ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception + 0
> [chris-joness-mac-pro:00350] [ 7] 9   libobjc.A.dylib                     
> 0x00007fff84fe8fa2 object_getIvar + 0
> [chris-joness-mac-pro:00350] [ 8] 10  CoreFoundation                      
> 0x00007fff8143e631 -[NSObject(NSObject) finalize] + 129
> [chris-joness-mac-pro:00350] [ 9] 11  mca_pls_xgrid.so                    
> 0x00000001002a9ce3 -[PlsXGridClient dealloc] + 419
> [chris-joness-mac-pro:00350] [10] 12  mca_pls_xgrid.so                    
> 0x00000001002a9837 orte_pls_xgrid_finalize + 40
> [chris-joness-mac-pro:00350] [11] 13  libopen-rte.0.dylib                 
> 0x000000010002d0f9 orte_pls_base_close + 249
> [chris-joness-mac-pro:00350] [12] 14  libopen-rte.0.dylib                 
> 0x0000000100012027 orte_system_finalize + 119
> [chris-joness-mac-pro:00350] [13] 15  libopen-rte.0.dylib                 
> 0x000000010000e968 orte_finalize + 40
> [chris-joness-mac-pro:00350] [14] 16  mpirun                              
> 0x00000001000011ff orterun + 2042
> [chris-joness-mac-pro:00350] [15] 17  mpirun                              
> 0x0000000100000a03 main + 27
> [chris-joness-mac-pro:00350] [16] 18  mpirun                              
> 0x00000001000009e0 start + 52
> [chris-joness-mac-pro:00350] [17] 19  ???                                 
> 0x0000000000000004 0x0 + 4
> [chris-joness-mac-pro:00350] *** End of error message ***
> Abort trap
> 
> 
> I've seen this error in a previous mailing, and it seems that the issue has 
> something to do with forcing everything to use kerberos (SSO). However, I 
> noticed that in the computer being used as an agent, this option is grayed out 
> in the Xgrid sharing configuration (I have no idea why). I would therefore 
> ask if it is absolutely necessary to use SSO to get openmpi to run with 
> xgrid, or am I missing something with the password setup? Seems that the 
> kerberos option is much more complicated, and I may even want to switch to 
> just using openmpi with ssh.
> 
> Many thanks,
> Chris
> 
> 
> Chris Jones
> Post-doctoral Research Assistant, 
> 
> Department of Microbiology
> Swedish University of Agricultural Sciences
> Uppsala, Sweden
> phone: +46 (0)18 67 3222
> email: chris.jo...@slu.se
> 
> Department of Soil and Environmental Microbiology
> National Institute for Agronomic Research
> Dijon, France
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

