Forgot: you probably need an equals sign after the btl arg.
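
One form of the same setting that does use an equals sign is the environment variable, since Open MPI reads MCA parameters from OMPI_MCA_<name> variables. A minimal sketch, assuming a POSIX shell; the quoting is only there so the shell does not treat '^' specially, and the echo is just an illustrative check:

```shell
# Exclude the vader shared-memory BTL via the environment instead of the
# mpirun command line (assumption: a POSIX-style shell such as sh or bash).
export OMPI_MCA_btl='^vader'

# Equivalent command-line form with the value quoted:
#   mpirun --mca btl '^vader' (rest of your args)

# Illustrative check that the variable is set as intended.
echo "OMPI_MCA_btl=${OMPI_MCA_btl}"
```

With the variable exported, mpirun can be started with the usual arguments and no extra --mca option.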

Howard Pritchard <hpprit...@gmail.com> wrote on Wed, 22 March 2017 at
18:11:

> Hi Goetz
>
> Thanks for trying these other versions.  Looks like a bug.  Could you post
> the config.log output from your 2.1.0 build to the list?
>
> Also could you try running the job using this extra command line arg to
> see if the problem goes away?
>
> mpirun --mca btl ^vader (rest of your args)
>
> Howard
>
> Götz Waschk <goetz.was...@gmail.com> wrote on Wed, 22 March 2017 at
> 13:09:
>
> On Wed, Mar 22, 2017 at 7:46 PM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
> > Hi Goetz,
> >
> > Would you mind testing against the 2.1.0 release or the latest from the
> > 1.10.x series (1.10.6)?
>
> Hi Howard,
>
> After sending my mail I tested both 1.10.6 and 2.1.0 and got the same
> error. I also tested outside of Slurm using ssh, with the same problem.
>
> Here's the message from 2.1.0:
> [pax11-10:21920] *** Process received signal ***
> [pax11-10:21920] Signal: Bus error (7)
> [pax11-10:21920] Signal code: Non-existant physical address (2)
> [pax11-10:21920] Failing at address: 0x2b5d5b752290
> [pax11-10:21920] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b5d446e9370]
> [pax11-10:21920] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_vader.so(mca_btl_vader_frag_init+0x70)[0x2b5d531645e0]
> [pax11-10:21920] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x211)[0x2b5d44f607c1]
> [pax11-10:21920] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_vader.so(+0x2b51)[0x2b5d53162b51]
> [pax11-10:21920] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_prepare+0x3f)[0x2b5d5bb0a17f]
> [pax11-10:21920] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0xa7a)[0x2b5d5bafe0aa]
> [pax11-10:21920] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x399)[0x2b5d44480429]
> [pax11-10:21920] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2b5d444486ab]
> [pax11-10:21920] [ 8] IMB-MPI1[0x40b2ff]
> [pax11-10:21920] [ 9] IMB-MPI1[0x402646]
> [pax11-10:21920] [10] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b5d44917b35]
> [pax11-10:21920] [11] IMB-MPI1[0x401f79]
> [pax11-10:21920] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 320 with PID 21920 on node pax11-10
> exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
>
>
> Regards, Götz Waschk
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>