Have the admin try running the ibv_ud_pingpong test - that will exercise the 
portion of the system under discussion.


> On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
> 
> This is what I heard from the administrator:
> 
> "The tests that work are the simple utilities ib_read_lat and ib_read_bw,
> which measure latency and bandwidth between two nodes. They are part of
> the "perftest" repo package."
> 
> On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" <esal...@gmail.com> wrote:
> This happens at MPI_Init. I've attached the full error message.
> 
> The sys admin mentioned Infiniband utility tests ran OK. I'll contact him for 
> more details and let you know.
> 
> Thank you,
> Saliya
> 
> On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> Where does the error occur?
> MPI_Init?
> MPI_Finalize?
> In between?
> 
> In the first case, the bug is likely a mishandled error case,
> which means OpenMPI is unlikely to be the root cause of the crash.
> 
> Did you check that Infiniband is up and running on your cluster?
> 
> Cheers,
> 
> Gilles 
> 
> Saliya Ekanayake <esal...@gmail.com> wrote:
> It's been a while on this, but we are still having trouble getting OpenMPI to 
> work with Infiniband on this cluster. We tried the latest 1.8.4 as well, but 
> the result is the same.
> 
> To recap, we get the following error when MPI initializes (in the simple 
> Hello world C example) with Infiniband. Everything works fine if we 
> explicitly turn off openib with --mca btl ^openib.
> 
> This is the error I got after debugging with gdb as you suggested.
> 
> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize: 
> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) 
> (&m->cm_recv_msg_queue))->obj_magic_id' failed.
> 
> Thank you,
> Saliya
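
A minimal sketch, assuming the assertion above is a debug-build sanity check: it compares the object's obj_magic_id against a fixed constant, and such a check fails if the object was never constructed (or was already destroyed). The C code below shows one way that can happen when an init routine exits early and finalize still runs, which would be consistent with the "mishandled error case" suggested elsewhere in this thread. Every sketch_* name is hypothetical and only the constant is copied from the assertion message; this is not Open MPI's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same constant that appears in the assertion message. */
#define SKETCH_MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

/* Hypothetical stand-in for an object header (not Open MPI's real type). */
typedef struct {
    uint64_t obj_magic_id;
} sketch_object_t;

/* Hypothetical module that owns one queue object. */
typedef struct {
    sketch_object_t recv_queue;
} sketch_module_t;

/* Construction stamps the object with the magic value. */
void sketch_construct(sketch_object_t *obj)
{
    obj->obj_magic_id = SKETCH_MAGIC_ID;
}

/* True only if the object currently carries the stamp. */
bool sketch_is_constructed(const sketch_object_t *obj)
{
    return obj->obj_magic_id == SKETCH_MAGIC_ID;
}

/* Destruction asserts on the stamp, then clears it. */
void sketch_destruct(sketch_object_t *obj)
{
    assert(sketch_is_constructed(obj));
    obj->obj_magic_id = 0;
}

/* Init that may fail before the queue is constructed.
 * setup_ok is a hypothetical flag standing in for hardware setup. */
int sketch_module_init(sketch_module_t *m, bool setup_ok)
{
    memset(m, 0, sizeof(*m));
    if (!setup_ok) {
        return -1; /* early exit: recv_queue was never constructed */
    }
    sketch_construct(&m->recv_queue);
    return 0;
}

/* Finalize that tolerates a failed init by destructing only what was
 * constructed. Returns true if the queue was actually destructed. */
bool sketch_module_finalize(sketch_module_t *m)
{
    if (!sketch_is_constructed(&m->recv_queue)) {
        return false;
    }
    sketch_destruct(&m->recv_queue);
    return true;
}
```

Calling sketch_destruct() directly on a module whose init failed would trip the assert, which is the pattern the error message resembles. Whether that is what actually happens in udcm_module_finalize is not established in this thread.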
> 
> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> Thank you, Jeff. I'll try this and let you know.
> 
> Saliya 
> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
> I am sorry for the delay; I've been caught up in SC deadlines.  :-(
> 
> I don't see anything blatantly wrong in this output.
> 
> Two things:
> 
> 1. Can you try a nightly v1.8.4 snapshot tarball?  This will check to see if 
> whatever the bug is has been fixed for the upcoming release:
> 
>     http://www.open-mpi.org/nightly/v1.8/
> 
> 2. Build Open MPI with the --enable-debug option (note that this adds a 
> slight-but-noticeable performance penalty).  When you run, it should dump a 
> core file.  Load that core file in a debugger and see where it is failing 
> (i.e., file and line in the OMPI source).
> 
> We don't usually have to resort to asking users to perform #2, but there's no 
> additional information to give a clue as to what is happening.  :-(
> 
> 
> 
> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> 
> > Hi Jeff,
> >
> > You are probably busy, but just checking if you had a chance to look at 
> > this.
> >
> > Thanks,
> > Saliya
> >
> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> > Hi Jeff,
> >
> > I've attached a tar file with information.
> >
> > Thank you,
> > Saliya
> >
> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > Looks like it's failing in the openib BTL setup.
> >
> > Can you send the info listed here?
> >
> >     http://www.open-mpi.org/community/help/
> >
> >
> >
> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am using OpenMPI 1.8.1 on a Linux cluster that we recently set up. It 
> > > builds fine, but when I try to run even the simplest hello.c program, 
> > > it segfaults. Any suggestions on how to correct this?
> > >
> > > The steps I did and error message are below.
> > >
> > > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info is attached.
> > > 2. cd to examples directory and mpicc hello_c.c
> > > 3. mpirun -np 2 ./a.out
> > > 4. Error text is attached.
> > >
> > > Please let me know if you need more info.
> > >
> > > Thank you,
> > > Saliya
> > >
> > >
> > > --
> > > Saliya Ekanayake esal...@gmail.com
> > > Cell 812-391-4914 Home 812-961-6383
> > > http://saliya.org
> > > <ompi_info.txt><error.txt>
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > Link to this post: 
> > > http://www.open-mpi.org/community/lists/users/2014/11/25668.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: 
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25672.php
> >
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25717.php
> 
> 
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25723.php
> 
> 
> 
> -- 
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26074.php
> 
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26078.php
