Forgot: you probably need an equals sign after the btl arg.

Howard Pritchard <hpprit...@gmail.com> wrote on Wed, 22 March 2017 at 18:11:
> Hi Goetz
>
> Thanks for trying these other versions. Looks like a bug. Could you post
> the config.log output from your build of the 2.1.0 to the list?
>
> Also could you try running the job using this extra command line arg to
> see if the problem goes away?
>
> mpirun --mca btl ^vader (rest of your args)
>
> Howard
>
> Götz Waschk <goetz.was...@gmail.com> wrote on Wed, 22 March 2017 at 13:09:
>
> On Wed, Mar 22, 2017 at 7:46 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
> > Hi Goetz,
> >
> > Would you mind testing against the 2.1.0 release or the latest from the
> > 1.10.x series (1.10.6)?
>
> Hi Howard,
>
> after sending my mail I have tested both 1.10.6 and 2.1.0 and I have
> received the same error. I have also tested outside of slurm using
> ssh, same problem.
>
> Here's the message from 2.1.0:
> [pax11-10:21920] *** Process received signal ***
> [pax11-10:21920] Signal: Bus error (7)
> [pax11-10:21920] Signal code: Non-existant physical address (2)
> [pax11-10:21920] Failing at address: 0x2b5d5b752290
> [pax11-10:21920] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b5d446e9370]
> [pax11-10:21920] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_vader.so(mca_btl_vader_frag_init+0x70)[0x2b5d531645e0]
> [pax11-10:21920] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x211)[0x2b5d44f607c1]
> [pax11-10:21920] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_vader.so(+0x2b51)[0x2b5d53162b51]
> [pax11-10:21920] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_prepare+0x3f)[0x2b5d5bb0a17f]
> [pax11-10:21920] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0xa7a)[0x2b5d5bafe0aa]
> [pax11-10:21920] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x399)[0x2b5d44480429]
> [pax11-10:21920] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2b5d444486ab]
> [pax11-10:21920] [ 8] IMB-MPI1[0x40b2ff]
> [pax11-10:21920] [ 9] IMB-MPI1[0x402646]
> [pax11-10:21920] [10] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b5d44917b35]
> [pax11-10:21920] [11] IMB-MPI1[0x401f79]
> [pax11-10:21920] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 320 with PID 21920 on node pax11-10
> exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
>
> Regards, Götz Waschk
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
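
For reference, a minimal sketch of two ways to exclude the vader BTL, assuming a stock Open MPI install: the space-separated `--mca` form quoted in this thread, and the `OMPI_MCA_btl` environment variable, which is the form where an equals sign appears. The `mpirun` line is left as a comment because the remaining arguments are not given here.

```shell
# Command-line form, as quoted in the thread (key and value separated by a space):
#   mpirun --mca btl ^vader (rest of your args)

# Environment form: Open MPI reads MCA parameters from OMPI_MCA_<name>.
# The value is quoted so the shell passes the leading ^ through literally.
export OMPI_MCA_btl='^vader'

# Show what a subsequently launched mpirun would inherit from this shell.
echo "OMPI_MCA_btl=${OMPI_MCA_btl}"
```

Either form only deselects the shared-memory transport for testing; it does not address the underlying bus error.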