We just fixed the segv (see https://svn.open-mpi.org/trac/ompi/changeset/31073, 
if you care).

The issue was an errant large array on the stack in debug builds, which would 
cause JVMs to run out of stack space.

The fix is on the SVN trunk now; it will be on the v1.7 branch shortly.


On Mar 11, 2014, at 5:06 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

> I just tested with "ml" turned off as you suggested, but unfortunately it 
> didn't solve the issue. 
> 
> However, I found that by explicitly setting --mca btl ^tcp the code worked on 
> up to 4 nodes, each running 8 procs. If I don't specify this, it simply 
> fails even on one node with 8 procs.
> 
> Thank you,
> Saliya
> 
> 
> On Tue, Mar 11, 2014 at 4:35 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> Looks like we still have a bug in one of our components -- can you try:
> 
>     mpirun --mca coll ^ml ...
> 
> This will deactivate the "ml" collective component.  See if that enables you 
> to run (this particular component has nothing to do with Java).
> 
> 
> On Mar 11, 2014, at 1:33 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> 
> > Just tested that this happens even with the simple Hello.java program given 
> > in the OMPI distribution.
> >
> > I've made a tarball containing details of the error adhering to 
> > http://www.open-mpi.org/community/help/. Please let me know if I have 
> > missed any info necessary.
> >
> > Thank you,
> > Saliya
> >
> >
> >
> >
> > On Mon, Mar 10, 2014 at 10:46 AM, Jeff Squyres (jsquyres) 
> > <jsquy...@cisco.com> wrote:
> > Greetings, and thanks for trying out our Java bindings.
> >
> > Can you provide some more details?  E.g., is there a particular program 
> > you're running that incurs these problems?  Or is there even a particular 
> > MPI function that you're using that results in this segv (e.g., perhaps we 
> > have a specific bug somewhere)?
> >
> > Can you reduce the segv to a small example that we can reproduce (and 
> > therefore fix)?
> >
> >
> > On Mar 10, 2014, at 12:05 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > > I have 8 nodes, each with 2 quad-core sockets. The nodes also have IB 
> > > > connectivity. I am trying to run the OMPI Java bindings in OMPI trunk revision 
> > > > 30301 with 8 procs per node, totaling 64 procs. This gives a SIGSEGV error 
> > > > as below.
> > >
> > > I wonder if you have any suggestion to resolve this?
> > >
> > > Thank you,
> > > Saliya
> > >
> > > # A fatal error has been detected by the Java Runtime Environment:
> > > #
> > > #  SIGSEGV (0xb) at pc=0x000000313867b75b, pid=12229, tid=47864973515072
> > > #
> > > > # JRE version: Java(TM) SE Runtime Environment (8.0-b118) (build 1.8.0-ea-b118)
> > > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b60 mixed mode linux-amd64 compressed oops)
> > > # Problematic frame:
> > > # C  [libc.so.6+0x7b75b]  memcpy+0x15b
> > >
> > >
> > > --
> > > Saliya Ekanayake esal...@gmail.com
> > > http://saliya.org
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: 
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> >
> >
> > --
> > Saliya Ekanayake esal...@gmail.com
> > Cell 812-391-4914 Home 812-961-6383
> > http://saliya.org
> > <hellobug.tar.gz>
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> 
> 
> -- 
> Saliya Ekanayake esal...@gmail.com 
> Cell 812-391-4914 Home 812-961-6383
> http://saliya.org


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
