Thanks, Nate. We will give the test a try.

----------

Sent from my smartphone, so please excuse the typos.

Howard
On Aug 5, 2015 2:42 PM, "Nate Chambers" <ncham...@usna.edu> wrote:

> Howard,
>
> Thanks for looking at all this. Adding System.gc() did not cause it to
> segfault. The segfault still comes much later in the processing.
>
> I was able to reduce my code to a single test file without other
> dependencies; it is attached. The code simply opens a text file and reads
> its lines one by one. Once finished, it closes the file, reopens it, and
> reads the lines again. On my system, it repeats this about four times
> before the segfault fires. Obviously this code makes no sense on its own,
> but it's based on our actual code, which reads millions of lines of data
> and does various processing on it.
>
> Attached is a tweets.tgz file that you can uncompress to get an input
> directory. The text file inside is just the same line repeated over and
> over. Run it as:
>
> *java MPITestBroke tweets/*
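>
> (The attached file itself isn't reproduced here, so the listing below is
> only a rough sketch of what such a test looks like; the directory handling
> and the pass count are assumptions.)
>
>     import java.io.BufferedReader;
>     import java.io.File;
>     import java.io.FileReader;
>     import mpi.MPI;
>
>     public class MPITestBroke {
>         public static void main(String[] args) throws Exception {
>             MPI.Init(args);                          // written MPI.init() elsewhere in this thread
>             File dir = new File(args[0]);            // e.g. tweets/
>             for (int pass = 0; pass < 10; pass++) {  // reportedly dies around pass 4
>                 for (File f : dir.listFiles()) {
>                     BufferedReader reader = new BufferedReader(new FileReader(f));
>                     long lines = 0;
>                     while (reader.readLine() != null) {
>                         lines++;                     // no MPI calls, just plain file I/O
>                     }
>                     reader.close();
>                     System.out.println("pass " + pass + ": " + lines + " lines");
>                 }
>             }
>             MPI.Finalize();
>         }
>     }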
>
>
> Nate
>
>
>
>
>
> On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
>
>> Hi Nate,
>>
>> Sorry for the delay in getting back. Thanks for the sanity check. You
>> may have a point about the args string passed to MPI.init: Open MPI
>> doesn't need anything from it, but it is one difference from our test
>> cases, since your app takes an argument.
>>
>> Would you mind adding a
>>
>> System.gc()
>>
>> call immediately after the MPI.init call and seeing whether the GC blows
>> up with a segfault?
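>>
>> In other words, something like this right after your existing init call
>> (just a sketch):
>>
>>     MPI.Init(args);   // your existing init call
>>     System.gc();      // added: force a collection immediately afterwards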
>>
>> Also, it may be interesting to add -verbose:jni to the java part of your
>> command line.
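>>
>> For example, adapting the mpirun command from your earlier mail:
>>
>>     mpirun -np 2 java -verbose:jni -mx4g FeaturizeDay datadir/ days.txt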
>>
>> We'll do some experiments here with the init string arg.
>>
>> Is your app open source, so that we could download it and try to
>> reproduce the problem locally?
>>
>> thanks,
>>
>> Howard
>>
>>
>> 2015-08-04 18:52 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>
>>> Sanity checks pass. Both Hello.java and Ring.java run correctly with the
>>> expected output.
>>>
>>> Does MPI.init(args) expect anything from those command-line args?
>>>
>>>
>>> Nate
>>>
>>>
>>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard <hpprit...@gmail.com>
>>> wrote:
>>>
>>>> Hello Nate,
>>>>
>>>> As a sanity check of your installation, could you compile the
>>>> examples/*.java programs with the mpijavac you've installed and verify
>>>> that they run correctly? I'd just be interested in Hello.java and
>>>> Ring.java.
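>>>>
>>>> Roughly along these lines (the process count is arbitrary; this assumes
>>>> mpijavac and mpirun from the same install are on your PATH):
>>>>
>>>>     cd examples
>>>>     mpijavac Hello.java
>>>>     mpijavac Ring.java
>>>>     mpirun -np 2 java Hello
>>>>     mpirun -np 2 java Ring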
>>>>
>>>> Howard
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2015-08-04 14:34 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>
>>>>> Sure, I reran configure with CC=gcc and then make install; I think
>>>>> that's the proper way to do it. Attached is my config.log. The behavior
>>>>> when running our code appears to be the same: the output is the same
>>>>> error I pasted in my email above, and it occurs when calling MPI.init().
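>>>>>
>>>>> (Roughly this sequence; the prefix and the --enable-mpi-java flag here
>>>>> are assumptions, and the exact options are in the attached config.log.)
>>>>>
>>>>>     ./configure CC=gcc CXX=g++ --enable-mpi-java --prefix=$HOME/openmpi
>>>>>     make
>>>>>     make install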
>>>>>
>>>>> I'm not great at debugging this sort of stuff, but happy to try things
>>>>> out if you need me to.
>>>>>
>>>>> Nate
>>>>>
>>>>>
>>>>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard <hpprit...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Nate,
>>>>>>
>>>>>> As a first step toward addressing this, could you please try building
>>>>>> Open MPI with gcc rather than the Intel compilers?
>>>>>>
>>>>>> We've been doing a lot of work recently on the Java bindings, but we
>>>>>> have never tried any compilers other than gcc when working with them.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Howard
>>>>>>
>>>>>>
>>>>>> 2015-08-03 17:36 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>
>>>>>>> We've been struggling with this error for a while, so we're hoping
>>>>>>> someone more knowledgeable can help!
>>>>>>>
>>>>>>> Our Java MPI code exits with a segfault during its normal operation,
>>>>>>> *but the segfault occurs before our code ever uses MPI functionality
>>>>>>> like sending/receiving.* We've removed all message calls and any use
>>>>>>> of MPI.COMM_WORLD from the code. The segfault occurs if we call
>>>>>>> MPI.init(args) in our code, and does not if we comment that line out.
>>>>>>> Further vexing us, the crash doesn't happen at the point of the
>>>>>>> MPI.init call, but later on in the program. I don't have an
>>>>>>> easy-to-run example here because our non-MPI code is so large and
>>>>>>> complicated. We have run simpler test programs with MPI and the
>>>>>>> segfault does not occur.
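>>>>>>>
>>>>>>> (A minimal skeleton of that pattern, purely illustrative since the
>>>>>>> real code can't easily be shared; the class name is taken from the
>>>>>>> command shown below.)
>>>>>>>
>>>>>>>     import mpi.MPI;
>>>>>>>
>>>>>>>     public class FeaturizeDay {
>>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>>             MPI.Init(args);  // commenting this out makes the segfault go away
>>>>>>>             // ... lots of ordinary Java processing, no MPI calls at all ...
>>>>>>>             // ... the crash fires somewhere in here, well after init ...
>>>>>>>             MPI.Finalize();
>>>>>>>         }
>>>>>>>     }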
>>>>>>>
>>>>>>> We have isolated the line where the segfault occurs. However, if we
>>>>>>> comment that line out, the program runs longer but then segfaults at a
>>>>>>> seemingly arbitrary (though deterministic) point later in the code.
>>>>>>> Does anyone have tips on how to debug this? We have tried several
>>>>>>> flags with mpirun, but found no good clues.
>>>>>>>
>>>>>>> We have also tried several Open MPI versions, including the stable
>>>>>>> 1.8.7 release and the most recent 1.8.8rc1.
>>>>>>>
>>>>>>>
>>>>>>> ATTACHED
>>>>>>> - config.log from installation
>>>>>>> - output from `ompi_info -all`
>>>>>>>
>>>>>>>
>>>>>>> OUTPUT FROM RUNNING
>>>>>>>
>>>>>>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>>>>>>> ...
>>>>>>> some normal output from our code
>>>>>>> ...
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that process rank 0 with PID 29646 on node r9n69
>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>
>>>>>>> --------------------------------------------------------------------------