Thank you, Ralph. Running ibv_ud_pingpong produced a warning about memory limits similar to [1], and setting ulimit -l unlimited resolved the problem.
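For reference, ulimit -l controls RLIMIT_MEMLOCK, the maximum amount of memory a process may lock into RAM. InfiniBand verbs register (pin) memory, and for unprivileged processes that pinned memory counts against this limit, so a low limit can break the openib BTL. A limit raised in an interactive shell is not necessarily inherited by processes that mpirun launches on other nodes. Below is a minimal C sketch, assuming a Linux/glibc system, that reads the limit from inside a process. The helper names memlock_unlimited and check_memlock are illustrative and are not part of Open MPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>

/* Illustrative helper (not an Open MPI API): true when the soft
 * locked-memory limit is unlimited, which is what "ulimit -l unlimited" sets. */
static bool memlock_unlimited(const struct rlimit *rl)
{
    assert(rl != NULL);
    return rl->rlim_cur == RLIM_INFINITY;
}

/* Illustrative helper: reads RLIMIT_MEMLOCK for the calling process and
 * prints it. Returns 0 on success, -1 if getrlimit() fails. */
static int check_memlock(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }

    if (memlock_unlimited(&rl)) {
        printf("locked-memory limit: unlimited\n");
    } else {
        /* getrlimit() reports bytes; "ulimit -l" displays KiB. */
        printf("locked-memory limit: %llu KiB (soft)\n",
               (unsigned long long)(rl.rlim_cur / 1024));
    }
    return 0;
}
```

Calling check_memlock() near the top of main, before MPI_Init, shows the limit that each rank actually inherited on its node.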
[1] http://lists.openfabrics.org/pipermail/general/2007-June/036941.html

Saliya

On Sun, Dec 28, 2014 at 5:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Have the admin try running the ibv_ud_pingpong test - that will exercise
> the portion of the system under discussion.
>
> On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> What I heard from the administrator is:
>
> "The tests that work are the simple utilities ib_read_lat and ib_read_bw
> that measure latency and bandwidth between two nodes. They are part of
> the "perftest" repo package."
>
> On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" <esal...@gmail.com> wrote:
>
>> This happens at MPI_Init. I've attached the full error message.
>>
>> The sysadmin mentioned that the InfiniBand utility tests ran OK. I'll contact
>> him for more details and let you know.
>>
>> Thank you,
>> Saliya
>>
>> On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Where does the error occur?
>>> MPI_Init?
>>> MPI_Finalize?
>>> In between?
>>>
>>> In the first case, the bug is likely a mishandled error case,
>>> which means Open MPI is unlikely to be the root cause of the crash.
>>>
>>> Did you check that InfiniBand is up and running on your cluster?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> Saliya Ekanayake <esal...@gmail.com> wrote:
>>> It's been a while on this, but we are still having trouble getting
>>> OpenMPI to work with InfiniBand on this cluster. We tried the latest 1.8.4
>>> as well, but it's still the same.
>>>
>>> To recap, we get the following error when MPI initializes (in the simple
>>> Hello world C example) with InfiniBand. Everything works fine if we
>>> explicitly turn off openib with --mca btl ^openib
>>>
>>> This is the error I got after debugging with gdb as you suggested.
>>>
>>> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize:
>>> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>>>
>>> Thank you,
>>> Saliya
>>>
>>> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>
>>>> Thank you, Jeff. I'll try this and let you know.
>>>> Saliya
>>>> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>>>
>>>>> I am sorry for the delay; I've been caught up in SC deadlines. :-(
>>>>>
>>>>> I don't see anything blatantly wrong in this output.
>>>>>
>>>>> Two things:
>>>>>
>>>>> 1. Can you try a nightly v1.8.4 snapshot tarball? This will check to
>>>>> see if whatever the bug is has been fixed for the upcoming release:
>>>>>
>>>>> http://www.open-mpi.org/nightly/v1.8/
>>>>>
>>>>> 2. Build Open MPI with the --enable-debug option (note that this adds
>>>>> a slight-but-noticeable performance penalty). When you run, it should
>>>>> dump a core file. Load that core file in a debugger and see where it is
>>>>> failing (i.e., file and line in the OMPI source).
>>>>>
>>>>> We don't usually have to resort to asking users to perform #2, but
>>>>> there's no additional information to give a clue as to what is happening.
>>>>> :-(
>>>>>
>>>>> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>>>
>>>>> > Hi Jeff,
>>>>> >
>>>>> > You are probably busy, but just checking if you had a chance to look at this.
>>>>> >
>>>>> > Thanks,
>>>>> > Saliya
>>>>> >
>>>>> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>>> > Hi Jeff,
>>>>> >
>>>>> > I've attached a tar file with information.
>>>>> >
>>>>> > Thank you,
>>>>> > Saliya
>>>>> >
>>>>> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>>> > Looks like it's failing in the openib BTL setup.
>>>>> >
>>>>> > Can you send the info listed here?
>>>>> >
>>>>> > http://www.open-mpi.org/community/help/
>>>>> >
>>>>> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>>> >
>>>>> > > Hi,
>>>>> > >
>>>>> > > I am using OpenMPI 1.8.1 on a Linux cluster that we recently set up.
>>>>> > > It builds fine, but when I try to run even the simplest hello.c
>>>>> > > program, it causes a segfault. Any suggestions on how to correct this?
>>>>> > >
>>>>> > > The steps I followed and the error message are below.
>>>>> > >
>>>>> > > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info output is attached.
>>>>> > > 2. cd to the examples directory and run mpicc hello_c.c
>>>>> > > 3. mpirun -np 2 ./a.out
>>>>> > > 4. Error text is attached.
>>>>> > >
>>>>> > > Please let me know if you need more info.
>>>>> > >
>>>>> > > Thank you,
>>>>> > > Saliya
>>>>> > >
>>>>> > > --
>>>>> > > Saliya Ekanayake esal...@gmail.com
>>>>> > > Cell 812-391-4914 Home 812-961-6383
>>>>> > > http://saliya.org
>>>>> > >
>>>>> > > <ompi_info.txt><error.txt>
>>>>> > > _______________________________________________
>>>>> > > users mailing list
>>>>> > > us...@open-mpi.org
>>>>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> > > Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25668.php
>>>>> >
>>>>> > --
>>>>> > Jeff Squyres
>>>>> > jsquy...@cisco.com
>>>>> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> --
>>> Saliya Ekanayake
>>> Ph.D. Candidate | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> Cell 812-391-4914
>>> http://saliya.org
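About the assertion quoted in the gdb output: it compares obj_magic_id against a sentinel built as (0xdeafbeedULL << 32) + 0xdeafbeedULL, i.e. 0xdeafbeeddeafbeed. With --enable-debug, Open MPI objects appear to carry such a magic ID so that tearing down an object that was never constructed, or was already destructed, fails loudly instead of corrupting memory. One plausible reading, consistent with Gilles's suggestion of a mishandled error case, is that udcm_module_finalize ran on a receive queue that was not in a constructed state. The sketch below illustrates only the general sentinel pattern; obj_t, obj_construct, obj_destruct and obj_is_live are hypothetical names, not Open MPI's opal_object_t code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same construction as the constant in the quoted assertion:
 * 0xdeafbeed in both 32-bit halves, i.e. 0xdeafbeeddeafbeed. */
#define OBJ_MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

/* Hypothetical object header; NOT Open MPI's opal_object_t. */
typedef struct {
    uint64_t magic_id; /* holds OBJ_MAGIC_ID only while the object is live */
    int payload;
} obj_t;

/* Hypothetical constructor: stamps the sentinel. */
static void obj_construct(obj_t *o)
{
    o->magic_id = OBJ_MAGIC_ID;
    o->payload = 0;
}

/* Hypothetical destructor: unless NDEBUG is defined, tearing down an
 * object that was never constructed, or was already destructed, aborts
 * here with an "Assertion ... failed" message. */
static void obj_destruct(obj_t *o)
{
    assert(o->magic_id == OBJ_MAGIC_ID);
    o->magic_id = 0; /* cleared so a second destruct is also caught */
}

/* Non-aborting liveness check, useful for tests and logging. */
static bool obj_is_live(const obj_t *o)
{
    return o->magic_id == OBJ_MAGIC_ID;
}
```

Calling obj_destruct on a zero-initialized obj_t without calling obj_construct first triggers the same kind of assertion failure as the one in the thread.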