Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Ralph Castain
Just took a glance thru 249 and have a few suggestions on it - will pass them along tomorrow. I think the right solution is to (a) dump opal_identifier_t in favor of using opal_process_name_t everywhere in the opal layer, (b) typedef orte_process_name_t to opal_process_name_t, and (c) leave

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Ralph Castain
Oh yeah - that would indeed be very bad :-(

> On Oct 26, 2014, at 6:06 PM, Kawashima, Takahiro wrote:
>
> Siegmar, Oscar,
>
> I suspect that the problem is calling mca_base_var_register
> without initializing OPAL in JNI_OnLoad.
>
> ompi/mpi/java/c/mpi_MPI.c:
>

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Kawashima, Takahiro
Siegmar, Oscar,

I suspect that the problem is calling mca_base_var_register without initializing OPAL in JNI_OnLoad.

ompi/mpi/java/c/mpi_MPI.c:

    jint JNI_OnLoad(JavaVM *vm, void *reserved)
    {
        libmpi = dlopen("libmpi."

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Gilles Gouaillardet
No :-( I need to do some extra work to stop declaring orte_process_name_t and ompi_process_name_t variables. #249 will make things much easier. One option is to use opal_process_name_t everywhere, or to typedef the orte and ompi types to the opal one. Another (lightweight but error-prone, imho) is to change

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Ralph Castain
Will PR#249 solve it? If so, we should just go with it, as I suspect that is the long-term solution.

> On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet wrote:
>
> It looks like we faced a similar issue:
> opal_process_name_t is 64 bits aligned whereas

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Gilles Gouaillardet
It looks like we faced a similar issue: opal_process_name_t is 64-bit aligned, whereas orte_process_name_t is 32-bit aligned. If you run on an alignment-sensitive CPU such as SPARC and you are unlucky (i.e. a name happens to land on an address that is 4-byte but not 8-byte aligned), you can run into this issue. I will make a patch for this shortly.

Ralph Castain

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Ralph Castain
Afraid this must be something about the SPARC - just ran on a Solaris 11 x86 box and everything works fine.

> On Oct 26, 2014, at 8:22 AM, Siegmar Gross wrote:
>
> Hi Gilles,
>
> I wanted to explore which function is called, when I call MPI_Init
> in

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Siegmar Gross
Hi Gilles,

I wanted to explore which function is called when I call MPI_Init in a C program, because this function should be called from a Java program as well. Unfortunately, C programs once more break with a bus error for openmpi-dev-124-g91e9686 on Solaris. I assume that's the reason why I get

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Siegmar Gross
Hi Gilles,

thank you very much for the quick tutorial. Unfortunately, I still can't get a backtrace.

> You might need to configure with --enable-debug and add -g -O0
> to your CFLAGS and LDFLAGS
>
> Then once you attach with gdb, you have to find the thread that is polling:
> thread 1
> bt
>