Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-14 Thread Saliya Ekanayake
This is really great news!! I'll test the trunk on our cluster. Thank you, Saliya On Fri, Mar 14, 2014 at 4:44 PM, Jeff Squyres (jsquyres) wrote: > We just fixed the segv (see > https://svn.open-mpi.org/trac/ompi/changeset/31073, if you care). > > The issue was an errant large array on the sta

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-14 Thread Jeff Squyres (jsquyres)
We just fixed the segv (see https://svn.open-mpi.org/trac/ompi/changeset/31073, if you care). The issue was an errant large array on the stack in debug builds, which would cause JVMs to run out of stack space. The fix is on the SVN trunk now; it will be on the v1.7 branch shortly. On Mar 11,

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-13 Thread Ralph Castain
We haven't figured it out yet - it seems somewhat erratic as your observations don't match anything we are seeing on our machines. We know the coll/ml component is causing trouble for Java applications (but nothing else, oddly enough), but that doesn't match your experience. On Mar 12, 2014, a

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-13 Thread Saliya Ekanayake
Just checking if there's some solution for this. Thank you, Saliya On Tue, Mar 11, 2014 at 10:54 PM, Saliya Ekanayake wrote: > I forgot to mention that I tried the hello.c version instead of Java and > it too failed in a similar manner, but > > 1. On a single node with --mca btl ^tcp it went up

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Saliya Ekanayake
I forgot to mention that I tried the hello.c version instead of Java and it too failed in a similar manner, but 1. On a single node with --mca btl ^tcp it went up to 24 procs before failing 2. On 8 nodes with --mca btl ^tcp it could go only up to 16 procs On Tue, Mar 11, 2014 at 5:06 PM, Saliya

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Saliya Ekanayake
I just tested with "ml" turned off as you suggested, but unfortunately it didn't solve the issue. However, I found that by explicitly setting --mca btl ^tcp the code worked on up to 4 nodes, each running 8 procs. If I don't specify this it'll simply fail even on one node with 8 procs. Thank yo

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Jeff Squyres (jsquyres)
Looks like we still have a bug in one of our components -- can you try: mpirun --mca coll ^ml ... This will deactivate the "ml" collective component. See if that enables you to run (this particular component has nothing to do with Java). On Mar 11, 2014, at 1:33 AM, Saliya Ekanayake wrot

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Ralph Castain
Seems odd - the Java code is passing all tests on my Linux boxes. A quick glance shows it failing on memcpy on your machine during MPI_Init, which would make one suspect either an uninitialized variable or something not getting loaded correctly. Oscar, Jose? Any thoughts? Ralph On Mar 10, 201

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Saliya Ekanayake
Just tested that this happens even with the simple Hello.java program given in the OMPI distribution. I've made a tarball containing details of the error adhering to http://www.open-mpi.org/community/help/. Please let me know if I have missed any info necessary. Thank you, Saliya On Mon, Mar 10,

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-10 Thread Jeff Squyres (jsquyres)
Greetings, and thanks for trying out our Java bindings. Can you provide some more details? E.g., is there a particular program you're running that incurs these problems? Or is there even a particular MPI function that you're using that results in this segv (e.g., perhaps we have a specific bu

[OMPI users] SIGSEV when running OMPI Java binding

2014-03-10 Thread Saliya Ekanayake
Hi, I have 8 nodes, each with 2 quad-core sockets. Also, the nodes have IB connectivity. I am trying to run the OMPI Java binding in OMPI trunk revision 30301 with 8 procs per node, totaling 64 procs. This gives a SIGSEGV error as below. I wonder if you have any suggestion to resolve this? Thank you, S