Ralph,

This is also a solution.
The pro is that it seems more lightweight than PR #249.
The two cons I can see are:
- opal_process_name_t alignment goes from 64 to 32 bits
- some functions (opal_hash_table_*) take a uint64_t as an argument, so we
still need to use memcpy (see the sketch below the list) in order to
  * guarantee 64-bit alignment on some archs (such as SPARC)
  * avoid an ugly cast such as uint64_t id = *(uint64_t *)&process_name;
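
To make the memcpy point concrete, here is a minimal sketch, assuming
opal_process_name_t is just a pair of 32-bit fields (the field names and the
helper function are only for illustration, not the actual Open MPI code):

    #include <stdint.h>
    #include <string.h>

    /* assumed layout: two 32-bit fields, so the struct itself may only be
     * 32-bit aligned in memory */
    typedef struct {
        uint32_t jobid;
        uint32_t vpid;
    } opal_process_name_t;

    uint64_t name_to_key(const opal_process_name_t *name)
    {
        uint64_t key;
        /* memcpy avoids the misaligned 64-bit load that
         *     uint64_t id = *(uint64_t *)name;
         * can trigger (a SIGBUS) on alignment-sensitive CPUs such as SPARC */
        memcpy(&key, name, sizeof(key));
        return key;
    }

The resulting key can then be passed to the opal_hash_table_* functions that
expect a uint64_t, regardless of how the name struct itself is aligned.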

As far as I am concerned, I am fine with your suggestion to
dump opal_identifier_t.
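
For the record, here is how I read your (a)/(b)/(c) below, as a rough sketch
only (the typedef arrangement is assumed from your description, not taken from
the actual headers):

    #include <stdint.h>

    /* (a) opal_identifier_t goes away; the OPAL layer uses
     *     opal_process_name_t everywhere (same assumed layout as above) */
    typedef struct {
        uint32_t jobid;
        uint32_t vpid;
    } opal_process_name_t;

    /* (b) the ORTE layer reuses the OPAL type directly */
    typedef opal_process_name_t orte_process_name_t;

    /* (c) the OMPI layer keeps its name typedef'd to whatever the RTE
     *     component provides (ORTE here; other RTEs can decide for
     *     themselves) */
    typedef orte_process_name_t ompi_process_name_t;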

About the patch: did you mean you have something ready that I can apply to my
PR, or do you expect me to make the changes myself (I am fine doing that if
needed)?

Cheers,

Gilles

On 2014/10/27 11:04, Ralph Castain wrote:
> Just took a glance thru 249 and have a few suggestions on it - will pass them 
> along tomorrow. I think the right solution is to (a) dump opal_identifier_t 
> in favor of using opal_process_name_t everywhere in the opal layer, (b) 
> typedef orte_process_name_t to opal_process_name_t, and (c) leave 
> ompi_process_name_t as typedef’d to the RTE component in the MPI layer. This 
> lets other RTEs decide for themselves how they want to handle it.
>
> If you add changes to your branch, I can pass you a patch with my suggested 
> alterations.
>
>> On Oct 26, 2014, at 5:55 PM, Gilles Gouaillardet 
>> <gilles.gouaillar...@gmail.com> wrote:
>>
>> No :-(
>> I need some extra work to stop declaring orte_process_name_t and
>> ompi_process_name_t variables.
>> #249 will make things much easier.
>> One option is to use opal_process_name_t everywhere, or typedef the orte and
>> ompi types to the opal one.
>> Another option (lightweight but error-prone imho) is to change the variable
>> declarations only.
>> Any thoughts?
>>
>> Ralph Castain <r...@open-mpi.org> wrote:
>>> Will PR#249 solve it? If so, we should just go with it as I suspect that is 
>>> the long-term solution.
>>>
>>>> On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet 
>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>> It looks like we faced a similar issue:
>>>> opal_process_name_t is 64-bit aligned whereas orte_process_name_t is 32-bit
>>>> aligned. If you run on an alignment-sensitive CPU such as SPARC and you
>>>> are not lucky (so to speak), you can run into this issue.
>>>> I will make a patch for this shortly.
>>>>
>>>> Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Afraid this must be something about the Sparc - just ran on a Solaris 11 
>>>>> x86 box and everything works fine.
>>>>>
>>>>>
>>>>>> On Oct 26, 2014, at 8:22 AM, Siegmar Gross 
>>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>
>>>>>> Hi Gilles,
>>>>>>
>>>>>> I wanted to explore which function is called when I call MPI_Init
>>>>>> in a C program, because this function should be called from a Java
>>>>>> program as well. Unfortunately, C programs break with a Bus Error
>>>>>> once more for openmpi-dev-124-g91e9686 on Solaris. I assume that's
>>>>>> the reason why I get no useful backtrace for my Java program.
>>>>>>
>>>>>> tyr small_prog 117 mpicc -o init_finalize init_finalize.c
>>>>>> tyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>>>>>> ...
>>>>>> (gdb) run -np 1 init_finalize
>>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 
>>>>>> init_finalize
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> [New Thread 1 (LWP 1)]
>>>>>> [New LWP    2        ]
>>>>>> [tyr:19240] *** Process received signal ***
>>>>>> [tyr:19240] Signal: Bus Error (10)
>>>>>> [tyr:19240] Signal code: Invalid address alignment (1)
>>>>>> [tyr:19240] Failing at address: ffffffff7bd1c10c
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdcc04
>>>>>> /lib/sparcv9/libc.so.1:0xd8b98
>>>>>> /lib/sparcv9/libc.so.1:0xcc70c
>>>>>> /lib/sparcv9/libc.so.1:0xcc918
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c
>>>>>>  [ Signal 10 (BUS)]
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103e8
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0x374
>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:PMPI_Init+0x2a8
>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:main+0x20
>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:_start+0x7c
>>>>>> [tyr:19240] *** End of error message ***
>>>>>> --------------------------------------------------------------------------
>>>>>> mpiexec noticed that process rank 0 with PID 0 on node tyr exited on 
>>>>>> signal 10 (Bus Error).
>>>>>> --------------------------------------------------------------------------
>>>>>> [LWP    2         exited]
>>>>>> [New Thread 2        ]
>>>>>> [Switching to Thread 1 (LWP 1)]
>>>>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
>>>>>> satisfy query
>>>>>> (gdb) bt
>>>>>> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from 
>>>>>> /usr/lib/sparcv9/ld.so.1
>>>>>> #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
>>>>>> #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
>>>>>> #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
>>>>>> #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
>>>>>> #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
>>>>>> #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
>>>>>> #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
>>>>>> #8  0xffffffff7ec87f60 in vm_close (loader_data=0x0, 
>>>>>> module=0xffffffff7c901fe0)
>>>>>>  at ../../../openmpi-dev-124-g91e9686/opal/libltdl/loaders/dlopen.c:212
>>>>>> #9  0xffffffff7ec85534 in lt_dlclose (handle=0x100189b50)
>>>>>>  at ../../../openmpi-dev-124-g91e9686/opal/libltdl/ltdl.c:1982
>>>>>> #10 0xffffffff7ecaabd4 in ri_destructor (obj=0x1001893a0)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:382
>>>>>> #11 0xffffffff7eca9504 in opal_obj_run_destructors (object=0x1001893a0)
>>>>>>  at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:446
>>>>>> #12 0xffffffff7ecaa474 in mca_base_component_repository_release (
>>>>>>  component=0xffffffff7b1236f0 <mca_oob_tcp_component>)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:240
>>>>>> #13 0xffffffff7ecac774 in mca_base_component_unload (
>>>>>>  component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:47
>>>>>> #14 0xffffffff7ecac808 in mca_base_component_close (
>>>>>>  component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:60
>>>>>> #15 0xffffffff7ecac8dc in mca_base_components_close (output_id=-1, 
>>>>>>  components=0xffffffff7f14ba58 <orte_oob_base_framework+80>, skip=0x0)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:86
>>>>>> #16 0xffffffff7ecac844 in mca_base_framework_components_close (
>>>>>>  framework=0xffffffff7f14ba08 <orte_oob_base_framework>, skip=0x0)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:66
>>>>>> #17 0xffffffff7efcaf58 in orte_oob_base_close ()
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/orte/mca/oob/base/oob_base_frame.c:112
>>>>>> #18 0xffffffff7ecc136c in mca_base_framework_close (
>>>>>>  framework=0xffffffff7f14ba08 <orte_oob_base_framework>)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_framework.c:187
>>>>>> #19 0xffffffff7be07858 in rte_finalize ()
>>>>>>  at 
>>>>>> ../../../../../openmpi-dev-124-g91e9686/orte/mca/ess/hnp/ess_hnp_module.c:857
>>>>>> #20 0xffffffff7ef338a4 in orte_finalize ()
>>>>>>  at ../../openmpi-dev-124-g91e9686/orte/runtime/orte_finalize.c:66
>>>>>> #21 0x000000010000723c in orterun (argc=4, argv=0xffffffff7fffe0b8)
>>>>>>  at 
>>>>>> ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1103
>>>>>> #22 0x0000000100003e80 in main (argc=4, argv=0xffffffff7fffe0b8)
>>>>>>  at ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/main.c:13
>>>>>> (gdb) 
>>>>>>
>>>>>> Kind regards
>>>>>>
>>>>>> Siegmar
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thank you very much for the quick tutorial. Unfortunately, I still
>>>>>>> can't get a backtrace.
>>>>>>>
>>>>>>>> You might need to configure with --enable-debug and add -g -O0
>>>>>>>> to your CFLAGS and LDFLAGS
>>>>>>>>
>>>>>>>> Then once you attach with gdb, you have to find the thread that is
>>>>>>>> polling:
>>>>>>>> thread 1
>>>>>>>> bt
>>>>>>>> thread 2
>>>>>>>> bt
>>>>>>>> and so on until you find the right thread.
>>>>>>>> If _dbg is a local variable, you need to select the right frame
>>>>>>>> before you can change the value:
>>>>>>>> get the frame number from bt (generally 1 under Linux)
>>>>>>>> f <frame number>
>>>>>>>> set _dbg=0
>>>>>>>>
>>>>>>>> I hope this helps.
>>>>>>> "--enable-debug" is one of my default options. Now I used the
>>>>>>> following command to configure Open MPI. I always start the
>>>>>>> build process in an empty directory and I always remove
>>>>>>> /usr/local/openmpi-1.9.0_64_gcc, before I install a new version.
>>>>>>>
>>>>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 112 head config.log \
>>>>>>> | grep openmpi
>>>>>>> $ ../openmpi-dev-124-g91e9686/configure
>>>>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc
>>>>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64
>>>>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin
>>>>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include
>>>>>>> JAVA_HOME=/usr/local/jdk1.8.0
>>>>>>> LDFLAGS=-m64 -g -O0 CC=gcc CXX=g++ FC=gfortran
>>>>>>> CFLAGS=-m64 -D_REENTRANT -g -O0
>>>>>>> CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp
>>>>>>> CPPFLAGS=-D_REENTRANT CXXCPPFLAGS=
>>>>>>> --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java
>>>>>>> --enable-heterogeneous --enable-mpi-thread-multiple
>>>>>>> --with-threads=posix --with-hwloc=internal --without-verbs
>>>>>>> --with-wrapper-cflags=-std=c11 -m64 --enable-debug
>>>>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 113 
>>>>>>>
>>>>>>>
>>>>>>> "gbd" doesn't allow any backtrace for any thread.
>>>>>>>
>>>>>>> tyr java 124 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>> ...
>>>>>>> (gdb) attach 18876
>>>>>>> Attaching to process 18876
>>>>>>> [New process 18876]
>>>>>>> Retry #1:
>>>>>>> Retry #2:
>>>>>>> Retry #3:
>>>>>>> Retry #4:
>>>>>>> 0x7eadcb04 in ?? ()
>>>>>>> (gdb) info threads
>>>>>>> [New LWP 12]
>>>>>>> [New LWP 11]
>>>>>>> [New LWP 10]
>>>>>>> [New LWP 9]
>>>>>>> [New LWP 8]
>>>>>>> [New LWP 7]
>>>>>>> [New LWP 6]
>>>>>>> [New LWP 5]
>>>>>>> [New LWP 4]
>>>>>>> [New LWP 3]
>>>>>>> [New LWP 2]
>>>>>>> Id   Target Id         Frame 
>>>>>>> 12   LWP 2             0x7eadc6b0 in ?? ()
>>>>>>> 11   LWP 3             0x7eadcbb8 in ?? ()
>>>>>>> 10   LWP 4             0x7eadcbb8 in ?? ()
>>>>>>> 9    LWP 5             0x7eadcbb8 in ?? ()
>>>>>>> 8    LWP 6             0x7eadcbb8 in ?? ()
>>>>>>> 7    LWP 7             0x7eadcbb8 in ?? ()
>>>>>>> 6    LWP 8             0x7ead8b0c in ?? ()
>>>>>>> 5    LWP 9             0x7eadcbb8 in ?? ()
>>>>>>> 4    LWP 10            0x7eadcbb8 in ?? ()
>>>>>>> 3    LWP 11            0x7eadcbb8 in ?? ()
>>>>>>> 2    LWP 12            0x7eadcbb8 in ?? ()
>>>>>>> * 1    LWP 1             0x7eadcb04 in ?? ()
>>>>>>> (gdb) thread 1
>>>>>>> [Switching to thread 1 (LWP 1)]
>>>>>>> #0  0x7eadcb04 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcb04 in ?? ()
>>>>>>> #1  0x7eaca12c in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 2
>>>>>>> [Switching to thread 2 (LWP 12)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 3
>>>>>>> [Switching to thread 3 (LWP 11)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 4
>>>>>>> [Switching to thread 4 (LWP 10)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 5
>>>>>>> [Switching to thread 5 (LWP 9)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 6
>>>>>>> [Switching to thread 6 (LWP 8)]
>>>>>>> #0  0x7ead8b0c in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7ead8b0c in ?? ()
>>>>>>> #1  0x7eacbcb0 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 7
>>>>>>> [Switching to thread 7 (LWP 7)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 8
>>>>>>> [Switching to thread 8 (LWP 6)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 9
>>>>>>> [Switching to thread 9 (LWP 5)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 10
>>>>>>> [Switching to thread 10 (LWP 4)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 11
>>>>>>> [Switching to thread 11 (LWP 3)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) bt
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>> stack?)
>>>>>>> (gdb) thread 12
>>>>>>> [Switching to thread 12 (LWP 2)]
>>>>>>> #0  0x7eadc6b0 in ?? ()
>>>>>>> (gdb) 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I also tried to set _dbg in all available frames without success.
>>>>>>>
>>>>>>> (gdb) f 1
>>>>>>> #1  0x7eacb46c in ?? ()
>>>>>>> (gdb) set _dbg=0
>>>>>>> No symbol table is loaded.  Use the "file" command.
>>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>>>>>>> Reading symbols from 
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>>>>>>> (gdb) f 1
>>>>>>> #1  0x7eacb46c in ?? ()
>>>>>>> (gdb) set _dbg=0
>>>>>>> No symbol "_dbg" in current context.
>>>>>>> (gdb) f 2
>>>>>>> #0  0x00000000 in ?? ()
>>>>>>> (gdb) set _dbg=0
>>>>>>> No symbol "_dbg" in current context.
>>>>>>> (gdb) 
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>> With "list" I get source code from mpi_CartComm.c and not from 
>>>>>>> mpi_MPI.c.
>>>>>>> If I switch threads, "list" continues in the old file.
>>>>>>>
>>>>>>> (gdb) thread 1
>>>>>>> [Switching to thread 1 (LWP 1)]
>>>>>>> #0  0x7eadcb04 in ?? ()
>>>>>>> (gdb) list 36
>>>>>>> 31          distributed under the License is distributed on an "AS IS" 
>>>>>>> BASIS,
>>>>>>> 32          WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either 
>>>>>>> express or implied.
>>>>>>> 33          See the License for the specific language governing 
>>>>>>> permissions and
>>>>>>> 34          limitations under the License.
>>>>>>> 35      */
>>>>>>> 36      /*
>>>>>>> 37       * File         : mpi_CartComm.c
>>>>>>> 38       * Headerfile   : mpi_CartComm.h
>>>>>>> 39       * Author       : Sung-Hoon Ko, Xinying Li
>>>>>>> 40       * Created      : Thu Apr  9 12:22:15 1998
>>>>>>> (gdb) thread 2
>>>>>>> [Switching to thread 2 (LWP 12)]
>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>> (gdb) list
>>>>>>> 41       * Revision     : $Revision: 1.6 $
>>>>>>> 42       * Updated      : $Date: 2003/01/16 16:39:34 $
>>>>>>> 43       * Copyright: Northeast Parallel Architectures Center
>>>>>>> 44       *            at Syracuse University 1998
>>>>>>> 45       */
>>>>>>> 46      #include "ompi_config.h"
>>>>>>> 47      
>>>>>>> 48      #include <stdlib.h>
>>>>>>> 49      #ifdef HAVE_TARGETCONDITIONALS_H
>>>>>>> 50      #include <TargetConditionals.h>
>>>>>>> (gdb) 
>>>>>>>
>>>>>>>
>>>>>>> Do you have any idea what's going wrong, or whether I must use a
>>>>>>> different symbol table?
>>>>>>>
>>>>>>>
>>>>>>> Kind regards
>>>>>>>
>>>>>>> Siegmar
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>>
>>>>>>>> Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>>>> Hi Gilles,
>>>>>>>>>
>>>>>>>>> I changed _dbg to a static variable, so that it is visible in the
>>>>>>>>> library, but unfortunately still not in the symbol table.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> tyr java 419 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so 
>>>>>>>>> | grep -i _dbg
>>>>>>>>> [271]   |  1249644|     4|OBJT |LOCL |0    |18     |_dbg.14258
>>>>>>>>> tyr java 420 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>>>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>>>> ...
>>>>>>>>> (gdb) attach 13019
>>>>>>>>> Attaching to process 13019
>>>>>>>>> [New process 13019]
>>>>>>>>> Retry #1:
>>>>>>>>> Retry #2:
>>>>>>>>> Retry #3:
>>>>>>>>> Retry #4:
>>>>>>>>> 0x7eadcb04 in ?? ()
>>>>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>>>>>>>>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>>>>>>>>> (gdb) set var _dbg.14258=0
>>>>>>>>> No symbol "_dbg" in current context.
>>>>>>>>> (gdb) 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>>
>>>>>>>>> Siegmar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Unfortunately I didn't get anything useful. It's probably my fault,
>>>>>>>>>> because I'm still not very familiar with gdb or any other debugger.
>>>>>>>>>> I did the following things.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 1st window:
>>>>>>>>>> -----------
>>>>>>>>>>
>>>>>>>>>> tyr java 174 setenv OMPI_ATTACH 1
>>>>>>>>>> tyr java 175 mpijavac InitFinalizeMain.java 
>>>>>>>>>> warning: [path] bad path element
>>>>>>>>>> "/usr/local/openmpi-1.9.0_64_gcc/lib64/shmem.jar":
>>>>>>>>>> no such file or directory
>>>>>>>>>> 1 warning
>>>>>>>>>> tyr java 176 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2nd window:
>>>>>>>>>> -----------
>>>>>>>>>>
>>>>>>>>>> tyr java 379 ps -aef | grep java
>>>>>>>>>> noaccess  1345     1   0   May 22 ?         113:23 /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 
>>>>>>>>>> fd1026  3661 10753   0 14:09:12 pts/14      0:00 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>> fd1026  3677 13371   0 14:16:55 pts/2       0:00 grep java
>>>>>>>>>> fd1026  3663  3661   0 14:09:12 pts/14      0:01 java -cp /home/fd1026/work/skripte/master/parallel/prog/mpi/java:/usr/local/jun
>>>>>>>>>> tyr java 380 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>>>>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>>>>> ...
>>>>>>>>>> (gdb) attach 3663
>>>>>>>>>> Attaching to process 3663
>>>>>>>>>> [New process 3663]
>>>>>>>>>> Retry #1:
>>>>>>>>>> Retry #2:
>>>>>>>>>> Retry #3:
>>>>>>>>>> Retry #4:
>>>>>>>>>> 0x7eadcb04 in ?? ()
>>>>>>>>>> (gdb) symbol-file 
>>>>>>>>>> /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>>>>>>>>>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>>>>>>>>>> (gdb) set var _dbg=0
>>>>>>>>>> No symbol "_dbg" in current context.
>>>>>>>>>> (gdb) set var JNI_OnLoad::_dbg=0
>>>>>>>>>> No symbol "_dbg" in specified context.
>>>>>>>>>> (gdb) set JNI_OnLoad::_dbg=0
>>>>>>>>>> No symbol "_dbg" in specified context.
>>>>>>>>>> (gdb) info threads
>>>>>>>>>> [New LWP 12]
>>>>>>>>>> [New LWP 11]
>>>>>>>>>> [New LWP 10]
>>>>>>>>>> [New LWP 9]
>>>>>>>>>> [New LWP 8]
>>>>>>>>>> [New LWP 7]
>>>>>>>>>> [New LWP 6]
>>>>>>>>>> [New LWP 5]
>>>>>>>>>> [New LWP 4]
>>>>>>>>>> [New LWP 3]
>>>>>>>>>> [New LWP 2]
>>>>>>>>>> Id   Target Id         Frame 
>>>>>>>>>> 12   LWP 2             0x7eadc6b0 in ?? ()
>>>>>>>>>> 11   LWP 3             0x7eadcbb8 in ?? ()
>>>>>>>>>> 10   LWP 4             0x7eadcbb8 in ?? ()
>>>>>>>>>> 9    LWP 5             0x7eadcbb8 in ?? ()
>>>>>>>>>> 8    LWP 6             0x7eadcbb8 in ?? ()
>>>>>>>>>> 7    LWP 7             0x7eadcbb8 in ?? ()
>>>>>>>>>> 6    LWP 8             0x7ead8b0c in ?? ()
>>>>>>>>>> 5    LWP 9             0x7eadcbb8 in ?? ()
>>>>>>>>>> 4    LWP 10            0x7eadcbb8 in ?? ()
>>>>>>>>>> 3    LWP 11            0x7eadcbb8 in ?? ()
>>>>>>>>>> 2    LWP 12            0x7eadcbb8 in ?? ()
>>>>>>>>>> * 1    LWP 1             0x7eadcb04 in ?? ()
>>>>>>>>>> (gdb) 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It seems that "_dbg" is unknown and unavailable.
>>>>>>>>>>
>>>>>>>>>> tyr java 399 grep _dbg /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/*
>>>>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:        volatile int _dbg = 1;
>>>>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:        while (_dbg) poll(NULL, 0, 1);
>>>>>>>>>> tyr java 400 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i _dbg
>>>>>>>>>> tyr java 401 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i JNI_OnLoad
>>>>>>>>>> [1057]  |              139688|                 444|FUNC |GLOB |0    |11     |JNI_OnLoad
>>>>>>>>>> tyr java 402 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> How can I set _dbg to zero to continue mpiexec? I also tried to
>>>>>>>>>> set a breakpoint on the function JNI_OnLoad, but it seems that the
>>>>>>>>>> function isn't called before the SIGSEGV.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> tyr java 177 unsetenv OMPI_ATTACH 
>>>>>>>>>> tyr java 178 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>>>>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>>>>> ...
>>>>>>>>>> (gdb) b mpi_MPI.c:JNI_OnLoad
>>>>>>>>>> No source file named mpi_MPI.c.
>>>>>>>>>> Make breakpoint pending on future shared library load? (y or [n]) y
>>>>>>>>>>
>>>>>>>>>> Breakpoint 1 (mpi_MPI.c:JNI_OnLoad) pending.
>>>>>>>>>> (gdb) run -np 1 java InitFinalizeMain 
>>>>>>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 
>>>>>>>>>> java InitFinalizeMain
>>>>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>>>>> [New Thread 1 (LWP 1)]
>>>>>>>>>> [New LWP    2        ]
>>>>>>>>>> #
>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>> #
>>>>>>>>>> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3518, tid=2
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> tyr java 381 cat InitFinalizeMain.java 
>>>>>>>>>> import mpi.*;
>>>>>>>>>>
>>>>>>>>>> public class InitFinalizeMain
>>>>>>>>>> {
>>>>>>>>>> public static void main (String args[]) throws MPIException
>>>>>>>>>> {
>>>>>>>>>>  MPI.Init (args);
>>>>>>>>>>  System.out.print ("Hello!\n");
>>>>>>>>>>  MPI.Finalize ();
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The SIGSEGV happens in MPI.Init(args), because I can print a message
>>>>>>>>>> before I call the method.
>>>>>>>>>>
>>>>>>>>>> tyr java 192 unsetenv OMPI_ATTACH
>>>>>>>>>> tyr java 193 mpijavac InitFinalizeMain.java
>>>>>>>>>> tyr java 194 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>> Before MPI.Init()
>>>>>>>>>> #
>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>> #
>>>>>>>>>> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3697, tid=2
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Any ideas how I can continue? I couldn't find a C function for
>>>>>>>>>> MPI.Init() in a C file. Do you know which function is called first,
>>>>>>>>>> so that I can set a breakpoint? By the way, I get the same error
>>>>>>>>>> on Solaris 10 x86_64.
>>>>>>>>>>
>>>>>>>>>> tyr java 388 ssh sunpc1
>>>>>>>>>> ...
>>>>>>>>>> sunpc1 java 106 mpijavac InitFinalizeMain.java
>>>>>>>>>> sunpc1 java 107 uname -a
>>>>>>>>>> SunOS sunpc1 5.10 Generic_147441-21 i86pc i386 i86pc Solaris
>>>>>>>>>> sunpc1 java 108 isainfo -k
>>>>>>>>>> amd64
>>>>>>>>>> sunpc1 java 109 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>> #
>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>> #
>>>>>>>>>> #  SIGSEGV (0xb) at pc=0xfffffd7fff1d77f0, pid=20256, tid=2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you very much for any help in advance.
>>>>>>>>>>
>>>>>>>>>> Kind regards
>>>>>>>>>>
>>>>>>>>>> Siegmar
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>
>>>>>>>>>>>> How did you configure Open MPI? Which Java version did you use?
>>>>>>>>>>>>
>>>>>>>>>>>> I just found a regression, and you currently have to explicitly add
>>>>>>>>>>>> CFLAGS=-D_REENTRANT CPPFLAGS=-D_REENTRANT
>>>>>>>>>>>> to your configure command line.
>>>>>>>>>>> I added "-D_REENTRANT" to my command.
>>>>>>>>>>>
>>>>>>>>>>> ../openmpi-dev-124-g91e9686/configure 
>>>>>>>>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc \
>>>>>>>>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64 \
>>>>>>>>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>>>>>>>>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include \
>>>>>>>>>>> JAVA_HOME=/usr/local/jdk1.8.0 \
>>>>>>>>>>> LDFLAGS="-m64" CC="gcc" CXX="g++" FC="gfortran" \
>>>>>>>>>>> CFLAGS="-m64 -D_REENTRANT" CXXFLAGS="-m64" FCFLAGS="-m64" \
>>>>>>>>>>> CPP="cpp" CXXCPP="cpp" \
>>>>>>>>>>> CPPFLAGS="-D_REENTRANT" CXXCPPFLAGS="" \
>>>>>>>>>>> --enable-mpi-cxx \
>>>>>>>>>>> --enable-cxx-exceptions \
>>>>>>>>>>> --enable-mpi-java \
>>>>>>>>>>> --enable-heterogeneous \
>>>>>>>>>>> --enable-mpi-thread-multiple \
>>>>>>>>>>> --with-threads=posix \
>>>>>>>>>>> --with-hwloc=internal \
>>>>>>>>>>> --without-verbs \
>>>>>>>>>>> --with-wrapper-cflags="-std=c11 -m64" \
>>>>>>>>>>> --enable-debug \
>>>>>>>>>>> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc
>>>>>>>>>>>
>>>>>>>>>>> I use Java 8.
>>>>>>>>>>>
>>>>>>>>>>> tyr openmpi-1.9 112 java -version
>>>>>>>>>>> java version "1.8.0"
>>>>>>>>>>> Java(TM) SE Runtime Environment (build 1.8.0-b132)
>>>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
>>>>>>>>>>> tyr openmpi-1.9 113 
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately I still get a SIGSEGV with openmpi-dev-124-g91e9686.
>>>>>>>>>>> I have applied your patch and will try to debug my small Java
>>>>>>>>>>> program tomorrow or next week and then let you know the result.
