Ralph, this is also a solution.

The pro is that it seems more lightweight than PR #249. The two cons I can see are:
- opal_process_name_t alignment goes from 64 to 32 bits
- some functions (opal_hash_table_*) take a uint64_t as argument, so we still need to use memcpy in order to
  * guarantee 64-bit alignment on some archs (such as sparc)
  * avoid an ugly cast such as uint64_t id = *(uint64_t *)&process_name;
(see the sketch below)
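For illustration only, here is a minimal, self-contained sketch of that second point. It is not the actual Open MPI code: the struct and field names below are hypothetical stand-ins for opal_process_name_t, and the uint64_t key would in practice be handed to the opal_hash_table_*_uint64 functions.

    /*
     * Sketch: why memcpy is needed when a 32-bit-aligned process name
     * must be passed to an API that expects a uint64_t key.
     * The struct only mirrors the general shape of opal_process_name_t
     * (two 32-bit fields); the real definition lives in the opal layer.
     */
    #include <stdint.h>
    #include <inttypes.h>
    #include <string.h>
    #include <stdio.h>

    typedef struct {
        uint32_t jobid;   /* hypothetical field names, for illustration */
        uint32_t vpid;
    } example_process_name_t;  /* only 32-bit aligned, as in this proposal */

    int main(void)
    {
        example_process_name_t name = { 42, 7 };
        uint64_t key;

        /* Safe on all archs: copy the 8 bytes regardless of alignment. */
        memcpy(&key, &name, sizeof(key));

        /*
         * The "ugly cast" alternative:
         *     uint64_t key = *(uint64_t *)&name;
         * dereferences a uint64_t pointer that may only be 32-bit aligned,
         * which raises SIGBUS on alignment-sensitive CPUs such as sparc
         * (and is undefined behavior in C in any case).
         */

        printf("key = 0x%016" PRIx64 "\n", key);
        return 0;
    }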
as far as i am concerned, i am fine with your proposed suggestion to dump opal_identifier_t. about the patch, did you mean you have something ready i can apply to my PR ? or do you expect me to do the changes (i am ok to do it if needed) Cheers, Gilles On 2014/10/27 11:04, Ralph Castain wrote: > Just took a glance thru 249 and have a few suggestions on it - will pass them > along tomorrow. I think the right solution is to (a) dump opal_identifier_t > in favor of using opal_process_name_t everywhere in the opal layer, (b) > typedef orte_process_name_t to opal_process_name_t, and (c) leave > ompi_process_name_t as typedef’d to the RTE component in the MPI layer. This > lets other RTEs decide for themselves how they want to handle it. > > If you add changes to your branch, I can pass you a patch with my suggested > alterations. > >> On Oct 26, 2014, at 5:55 PM, Gilles Gouaillardet >> <gilles.gouaillar...@gmail.com> wrote: >> >> No :-( >> I need some extra work to stop declaring orte_process_name_t and >> ompi_process_name_t variables. >> #249 will make things much easier. >> One option is to use opal_process_name_t everywhere or typedef orte and ompi >> types to the opal one. >> An other (lightweight but error prone imho) is to change variable >> declaration only. >> Any thought ? >> >> Ralph Castain <r...@open-mpi.org> wrote: >>> Will PR#249 solve it? If so, we should just go with it as I suspect that is >>> the long-term solution. >>> >>>> On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet >>>> <gilles.gouaillar...@gmail.com> wrote: >>>> >>>> It looks like we faced a similar issue : >>>> opal_process_name_t is 64 bits aligned wheteas orte_process_name_t is 32 >>>> bits aligned. If you run an alignment sensitive cpu such as sparc and you >>>> are not lucky (so to speak) you can run into this issue. >>>> i will make a patch for this shortly >>>> >>>> Ralph Castain <r...@open-mpi.org> wrote: >>>>> Afraid this must be something about the Sparc - just ran on a Solaris 11 >>>>> x86 box and everything works fine. >>>>> >>>>> >>>>>> On Oct 26, 2014, at 8:22 AM, Siegmar Gross >>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote: >>>>>> >>>>>> Hi Gilles, >>>>>> >>>>>> I wanted to explore which function is called, when I call MPI_Init >>>>>> in a C program, because this function should be called from a Java >>>>>> program as well. Unfortunately C programs break with a Bus Error >>>>>> once more for openmpi-dev-124-g91e9686 on Solaris. I assume that's >>>>>> the reason why I get no useful backtrace for my Java program. >>>>>> >>>>>> tyr small_prog 117 mpicc -o init_finalize init_finalize.c >>>>>> tyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec >>>>>> ... 
>>>>>> (gdb) run -np 1 init_finalize >>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 >>>>>> init_finalize >>>>>> [Thread debugging using libthread_db enabled] >>>>>> [New Thread 1 (LWP 1)] >>>>>> [New LWP 2 ] >>>>>> [tyr:19240] *** Process received signal *** >>>>>> [tyr:19240] Signal: Bus Error (10) >>>>>> [tyr:19240] Signal code: Invalid address alignment (1) >>>>>> [tyr:19240] Failing at address: ffffffff7bd1c10c >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdcc04 >>>>>> /lib/sparcv9/libc.so.1:0xd8b98 >>>>>> /lib/sparcv9/libc.so.1:0xcc70c >>>>>> /lib/sparcv9/libc.so.1:0xcc918 >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c >>>>>> [ Signal 10 (BUS)] >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103e8 >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0x374 >>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:PMPI_Init+0x2a8 >>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:main+0x20 >>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:_start+0x7c >>>>>> [tyr:19240] *** End of error message *** >>>>>> -------------------------------------------------------------------------- >>>>>> mpiexec noticed that process rank 0 with PID 0 on node tyr exited on >>>>>> signal 10 (Bus Error). 
>>>>>> -------------------------------------------------------------------------- >>>>>> [LWP 2 exited] >>>>>> [New Thread 2 ] >>>>>> [Switching to Thread 1 (LWP 1)] >>>>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to >>>>>> satisfy query >>>>>> (gdb) bt >>>>>> #0 0xffffffff7f6173d0 in rtld_db_dlactivity () from >>>>>> /usr/lib/sparcv9/ld.so.1 >>>>>> #1 0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1 >>>>>> #2 0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1 >>>>>> #3 0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1 >>>>>> #4 0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1 >>>>>> #5 0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1 >>>>>> #6 0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1 >>>>>> #7 0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1 >>>>>> #8 0xffffffff7ec87f60 in vm_close (loader_data=0x0, >>>>>> module=0xffffffff7c901fe0) >>>>>> at ../../../openmpi-dev-124-g91e9686/opal/libltdl/loaders/dlopen.c:212 >>>>>> #9 0xffffffff7ec85534 in lt_dlclose (handle=0x100189b50) >>>>>> at ../../../openmpi-dev-124-g91e9686/opal/libltdl/ltdl.c:1982 >>>>>> #10 0xffffffff7ecaabd4 in ri_destructor (obj=0x1001893a0) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:382 >>>>>> #11 0xffffffff7eca9504 in opal_obj_run_destructors (object=0x1001893a0) >>>>>> at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:446 >>>>>> #12 0xffffffff7ecaa474 in mca_base_component_repository_release ( >>>>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:240 >>>>>> #13 0xffffffff7ecac774 in mca_base_component_unload ( >>>>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:47 >>>>>> #14 0xffffffff7ecac808 in mca_base_component_close ( >>>>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:60 >>>>>> #15 0xffffffff7ecac8dc in mca_base_components_close (output_id=-1, >>>>>> components=0xffffffff7f14ba58 <orte_oob_base_framework+80>, skip=0x0) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:86 >>>>>> #16 0xffffffff7ecac844 in mca_base_framework_components_close ( >>>>>> framework=0xffffffff7f14ba08 <orte_oob_base_framework>, skip=0x0) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:66 >>>>>> #17 0xffffffff7efcaf58 in orte_oob_base_close () >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/orte/mca/oob/base/oob_base_frame.c:112 >>>>>> #18 0xffffffff7ecc136c in mca_base_framework_close ( >>>>>> framework=0xffffffff7f14ba08 <orte_oob_base_framework>) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_framework.c:187 >>>>>> #19 0xffffffff7be07858 in rte_finalize () >>>>>> at >>>>>> ../../../../../openmpi-dev-124-g91e9686/orte/mca/ess/hnp/ess_hnp_module.c:857 >>>>>> #20 0xffffffff7ef338a4 in orte_finalize () >>>>>> at ../../openmpi-dev-124-g91e9686/orte/runtime/orte_finalize.c:66 >>>>>> #21 0x000000010000723c in orterun (argc=4, argv=0xffffffff7fffe0b8) >>>>>> at >>>>>> ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1103 >>>>>> #22 0x0000000100003e80 
in main (argc=4, argv=0xffffffff7fffe0b8) >>>>>> at ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/main.c:13 >>>>>> (gdb) >>>>>> >>>>>> Kind regards >>>>>> >>>>>> Siegmar >>>>>> >>>>>> >>>>>> >>>>>>> thank you very much for the quick tutorial. Unfortunately I still >>>>>>> can't get a backtrace. >>>>>>> >>>>>>>> You might need to configure with --enable-debug and add -g -O0 >>>>>>>> to your CFLAGS and LDFLAGS >>>>>>>> >>>>>>>> Then once you attach with gdb, you have to find the thread that is >>>>>>>> polling : >>>>>>>> thread 1 >>>>>>>> bt >>>>>>>> thread 2 >>>>>>>> bt >>>>>>>> and so on until you find the good thread >>>>>>>> If _dbg is a local variable, you need to select the right frame >>>>>>>> before you can change the value : >>>>>>>> get the frame number from bt (generally 1 under linux) >>>>>>>> f <frame number> >>>>>>>> set _dbg=0 >>>>>>>> >>>>>>>> I hope this helps >>>>>>> "--enable-debug" is one of my default options. Now I used the >>>>>>> following command to configure Open MPI. I always start the >>>>>>> build process in an empty directory and I always remove >>>>>>> /usr/local/openmpi-1.9.0_64_gcc, before I install a new version. >>>>>>> >>>>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 112 head config.log \ >>>>>>> | grep openmpi >>>>>>> $ ../openmpi-dev-124-g91e9686/configure >>>>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc >>>>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64 >>>>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin >>>>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include >>>>>>> JAVA_HOME=/usr/local/jdk1.8.0 >>>>>>> LDFLAGS=-m64 -g -O0 CC=gcc CXX=g++ FC=gfortran >>>>>>> CFLAGS=-m64 -D_REENTRANT -g -O0 >>>>>>> CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp >>>>>>> CPPFLAGS=-D_REENTRANT CXXCPPFLAGS= >>>>>>> --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java >>>>>>> --enable-heterogeneous --enable-mpi-thread-multiple >>>>>>> --with-threads=posix --with-hwloc=internal --without-verbs >>>>>>> --with-wrapper-cflags=-std=c11 -m64 --enable-debug >>>>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 113 >>>>>>> >>>>>>> >>>>>>> "gbd" doesn't allow any backtrace for any thread. >>>>>>> >>>>>>> tyr java 124 /usr/local/gdb-7.6.1_64_gcc/bin/gdb >>>>>>> GNU gdb (GDB) 7.6.1 >>>>>>> ... >>>>>>> (gdb) attach 18876 >>>>>>> Attaching to process 18876 >>>>>>> [New process 18876] >>>>>>> Retry #1: >>>>>>> Retry #2: >>>>>>> Retry #3: >>>>>>> Retry #4: >>>>>>> 0x7eadcb04 in ?? () >>>>>>> (gdb) info threads >>>>>>> [New LWP 12] >>>>>>> [New LWP 11] >>>>>>> [New LWP 10] >>>>>>> [New LWP 9] >>>>>>> [New LWP 8] >>>>>>> [New LWP 7] >>>>>>> [New LWP 6] >>>>>>> [New LWP 5] >>>>>>> [New LWP 4] >>>>>>> [New LWP 3] >>>>>>> [New LWP 2] >>>>>>> Id Target Id Frame >>>>>>> 12 LWP 2 0x7eadc6b0 in ?? () >>>>>>> 11 LWP 3 0x7eadcbb8 in ?? () >>>>>>> 10 LWP 4 0x7eadcbb8 in ?? () >>>>>>> 9 LWP 5 0x7eadcbb8 in ?? () >>>>>>> 8 LWP 6 0x7eadcbb8 in ?? () >>>>>>> 7 LWP 7 0x7eadcbb8 in ?? () >>>>>>> 6 LWP 8 0x7ead8b0c in ?? () >>>>>>> 5 LWP 9 0x7eadcbb8 in ?? () >>>>>>> 4 LWP 10 0x7eadcbb8 in ?? () >>>>>>> 3 LWP 11 0x7eadcbb8 in ?? () >>>>>>> 2 LWP 12 0x7eadcbb8 in ?? () >>>>>>> * 1 LWP 1 0x7eadcb04 in ?? () >>>>>>> (gdb) thread 1 >>>>>>> [Switching to thread 1 (LWP 1)] >>>>>>> #0 0x7eadcb04 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcb04 in ?? () >>>>>>> #1 0x7eaca12c in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 2 >>>>>>> [Switching to thread 2 (LWP 12)] >>>>>>> #0 0x7eadcbb8 in ?? 
() >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac2638 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 3 >>>>>>> [Switching to thread 3 (LWP 11)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac25a8 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 4 >>>>>>> [Switching to thread 4 (LWP 10)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac2638 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 5 >>>>>>> [Switching to thread 5 (LWP 9)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac2638 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 6 >>>>>>> [Switching to thread 6 (LWP 8)] >>>>>>> #0 0x7ead8b0c in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7ead8b0c in ?? () >>>>>>> #1 0x7eacbcb0 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 7 >>>>>>> [Switching to thread 7 (LWP 7)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac25a8 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 8 >>>>>>> [Switching to thread 8 (LWP 6)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac25a8 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 9 >>>>>>> [Switching to thread 9 (LWP 5)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac2638 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 10 >>>>>>> [Switching to thread 10 (LWP 4)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac25a8 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 11 >>>>>>> [Switching to thread 11 (LWP 3)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) bt >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> #1 0x7eac25a8 in ?? () >>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt >>>>>>> stack?) >>>>>>> (gdb) thread 12 >>>>>>> [Switching to thread 12 (LWP 2)] >>>>>>> #0 0x7eadc6b0 in ?? () >>>>>>> (gdb) >>>>>>> >>>>>>> >>>>>>> >>>>>>> I also tried to set _dbg in all available frames without success. >>>>>>> >>>>>>> (gdb) f 1 >>>>>>> #1 0x7eacb46c in ?? () >>>>>>> (gdb) set _dbg=0 >>>>>>> No symbol table is loaded. Use the "file" command. >>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so >>>>>>> Reading symbols from >>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done. >>>>>>> (gdb) f 1 >>>>>>> #1 0x7eacb46c in ?? () >>>>>>> (gdb) set _dbg=0 >>>>>>> No symbol "_dbg" in current context. >>>>>>> (gdb) f 2 >>>>>>> #0 0x00000000 in ?? () >>>>>>> (gdb) set _dbg=0 >>>>>>> No symbol "_dbg" in current context. >>>>>>> (gdb) >>>>>>> ... >>>>>>> >>>>>>> >>>>>>> With "list" I get source code from mpi_CartComm.c and not from >>>>>>> mpi_MPI.c. >>>>>>> If a switch threads, "list" continues in the old file. 
>>>>>>> >>>>>>> (gdb) thread 1 >>>>>>> [Switching to thread 1 (LWP 1)] >>>>>>> #0 0x7eadcb04 in ?? () >>>>>>> (gdb) list 36 >>>>>>> 31 distributed under the License is distributed on an "AS IS" >>>>>>> BASIS, >>>>>>> 32 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either >>>>>>> express or implied. >>>>>>> 33 See the License for the specific language governing >>>>>>> permissions and >>>>>>> 34 limitations under the License. >>>>>>> 35 */ >>>>>>> 36 /* >>>>>>> 37 * File : mpi_CartComm.c >>>>>>> 38 * Headerfile : mpi_CartComm.h >>>>>>> 39 * Author : Sung-Hoon Ko, Xinying Li >>>>>>> 40 * Created : Thu Apr 9 12:22:15 1998 >>>>>>> (gdb) thread 2 >>>>>>> [Switching to thread 2 (LWP 12)] >>>>>>> #0 0x7eadcbb8 in ?? () >>>>>>> (gdb) list >>>>>>> 41 * Revision : $Revision: 1.6 $ >>>>>>> 42 * Updated : $Date: 2003/01/16 16:39:34 $ >>>>>>> 43 * Copyright: Northeast Parallel Architectures Center >>>>>>> 44 * at Syracuse University 1998 >>>>>>> 45 */ >>>>>>> 46 #include "ompi_config.h" >>>>>>> 47 >>>>>>> 48 #include <stdlib.h> >>>>>>> 49 #ifdef HAVE_TARGETCONDITIONALS_H >>>>>>> 50 #include <TargetConditionals.h> >>>>>>> (gdb) >>>>>>> >>>>>>> >>>>>>> Do you have any ideas, what's going wrong or if I must use a different >>>>>>> symbol table? >>>>>>> >>>>>>> >>>>>>> Kind regards >>>>>>> >>>>>>> Siegmar >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Gilles >>>>>>>> >>>>>>>> >>>>>>>> Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote: >>>>>>>>> Hi Gilles, >>>>>>>>> >>>>>>>>> I changed _dbg to a static variable, so that it is visible in the >>>>>>>>> library, but unfortunately still not in the symbol table. >>>>>>>>> >>>>>>>>> >>>>>>>>> tyr java 419 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so >>>>>>>>> | grep -i _dbg >>>>>>>>> [271] | 1249644| 4|OBJT |LOCL |0 |18 |_dbg.14258 >>>>>>>>> tyr java 420 /usr/local/gdb-7.6.1_64_gcc/bin/gdb >>>>>>>>> GNU gdb (GDB) 7.6.1 >>>>>>>>> ... >>>>>>>>> (gdb) attach 13019 >>>>>>>>> Attaching to process 13019 >>>>>>>>> [New process 13019] >>>>>>>>> Retry #1: >>>>>>>>> Retry #2: >>>>>>>>> Retry #3: >>>>>>>>> Retry #4: >>>>>>>>> 0x7eadcb04 in ?? () >>>>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so >>>>>>>>> Reading symbols from >>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done. >>>>>>>>> (gdb) set var _dbg.14258=0 >>>>>>>>> No symbol "_dbg" in current context. >>>>>>>>> (gdb) >>>>>>>>> >>>>>>>>> >>>>>>>>> Kind regards >>>>>>>>> >>>>>>>>> Siegmar >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> unfortunately I didn't get anything useful. It's probably my fault, >>>>>>>>>> because I'm still not very familiar with gdb or any other debugger. >>>>>>>>>> I did the following things. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 1st window: >>>>>>>>>> ----------- >>>>>>>>>> >>>>>>>>>> tyr java 174 setenv OMPI_ATTACH 1 >>>>>>>>>> tyr java 175 mpijavac InitFinalizeMain.java >>>>>>>>>> warning: [path] bad path element >>>>>>>>>> "/usr/local/openmpi-1.9.0_64_gcc/lib64/shmem.jar": >>>>>>>>>> no such file or directory >>>>>>>>>> 1 warning >>>>>>>>>> tyr java 176 mpiexec -np 1 java InitFinalizeMain >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2nd window: >>>>>>>>>> ----------- >>>>>>>>>> >>>>>>>>>> tyr java 379 ps -aef | grep java >>>>>>>>>> noaccess 1345 1 0 May 22 ? 
113:23 >>>>>>>>>> /usr/java/bin/java -server -Xmx128m >>>>>>> -XX:+UseParallelGC >>>>>>>>> -XX:ParallelGCThreads=4 >>>>>>>>>> fd1026 3661 10753 0 14:09:12 pts/14 0:00 mpiexec -np 1 java >>>>>>>>>> InitFinalizeMain >>>>>>>>>> fd1026 3677 13371 0 14:16:55 pts/2 0:00 grep java >>>>>>>>>> fd1026 3663 3661 0 14:09:12 pts/14 0:01 java -cp >>>>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/java:/usr/local/jun >>>>>>>>>> tyr java 380 /usr/local/gdb-7.6.1_64_gcc/bin/gdb >>>>>>>>>> GNU gdb (GDB) 7.6.1 >>>>>>>>>> ... >>>>>>>>>> (gdb) attach 3663 >>>>>>>>>> Attaching to process 3663 >>>>>>>>>> [New process 3663] >>>>>>>>>> Retry #1: >>>>>>>>>> Retry #2: >>>>>>>>>> Retry #3: >>>>>>>>>> Retry #4: >>>>>>>>>> 0x7eadcb04 in ?? () >>>>>>>>>> (gdb) symbol-file >>>>>>>>>> /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so >>>>>>>>>> Reading symbols from >>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done. >>>>>>>>>> (gdb) set var _dbg=0 >>>>>>>>>> No symbol "_dbg" in current context. >>>>>>>>>> (gdb) set var JNI_OnLoad::_dbg=0 >>>>>>>>>> No symbol "_dbg" in specified context. >>>>>>>>>> (gdb) set JNI_OnLoad::_dbg=0 >>>>>>>>>> No symbol "_dbg" in specified context. >>>>>>>>>> (gdb) info threads >>>>>>>>>> [New LWP 12] >>>>>>>>>> [New LWP 11] >>>>>>>>>> [New LWP 10] >>>>>>>>>> [New LWP 9] >>>>>>>>>> [New LWP 8] >>>>>>>>>> [New LWP 7] >>>>>>>>>> [New LWP 6] >>>>>>>>>> [New LWP 5] >>>>>>>>>> [New LWP 4] >>>>>>>>>> [New LWP 3] >>>>>>>>>> [New LWP 2] >>>>>>>>>> Id Target Id Frame >>>>>>>>>> 12 LWP 2 0x7eadc6b0 in ?? () >>>>>>>>>> 11 LWP 3 0x7eadcbb8 in ?? () >>>>>>>>>> 10 LWP 4 0x7eadcbb8 in ?? () >>>>>>>>>> 9 LWP 5 0x7eadcbb8 in ?? () >>>>>>>>>> 8 LWP 6 0x7eadcbb8 in ?? () >>>>>>>>>> 7 LWP 7 0x7eadcbb8 in ?? () >>>>>>>>>> 6 LWP 8 0x7ead8b0c in ?? () >>>>>>>>>> 5 LWP 9 0x7eadcbb8 in ?? () >>>>>>>>>> 4 LWP 10 0x7eadcbb8 in ?? () >>>>>>>>>> 3 LWP 11 0x7eadcbb8 in ?? () >>>>>>>>>> 2 LWP 12 0x7eadcbb8 in ?? () >>>>>>>>>> * 1 LWP 1 0x7eadcb04 in ?? () >>>>>>>>>> (gdb) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> It seems that "_dbg" is unknown and unavailable. >>>>>>>>>> >>>>>>>>>> tyr java 399 grep _dbg >>>>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/* >>>>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c: >>>>>>>>>> volatile >>>>>>> int _dbg = 1; >>>>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c: >>>>>>>>>> while >>>>>>> (_dbg) poll(NULL, 0, 1); >>>>>>>>>> tyr java 400 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i >>>>>>>>>> _dbg >>>>>>>>>> tyr java 401 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i >>>>>>>>>> JNI_OnLoad >>>>>>>>>> [1057] | 139688| 444|FUNC |GLOB |0 >>>>>>>>>> |11 |JNI_OnLoad >>>>>>>>>> tyr java 402 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> How can I set _dbg to zero to continue mpiexec? I also tried to >>>>>>>>>> set a breakpoint for function JNI_OnLoad, but it seems, that the >>>>>>>>>> function isn't called before SIGSEGV. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> tyr java 177 unsetenv OMPI_ATTACH >>>>>>>>>> tyr java 178 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec >>>>>>>>>> GNU gdb (GDB) 7.6.1 >>>>>>>>>> ... >>>>>>>>>> (gdb) b mpi_MPI.c:JNI_OnLoad >>>>>>>>>> No source file named mpi_MPI.c. >>>>>>>>>> Make breakpoint pending on future shared library load? (y or [n]) y >>>>>>>>>> >>>>>>>>>> Breakpoint 1 (mpi_MPI.c:JNI_OnLoad) pending. 
>>>>>>>>>> (gdb) run -np 1 java InitFinalizeMain >>>>>>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 >>>>>>>>>> java InitFinalizeMain >>>>>>>>>> [Thread debugging using libthread_db enabled] >>>>>>>>>> [New Thread 1 (LWP 1)] >>>>>>>>>> [New LWP 2 ] >>>>>>>>>> # >>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>>>>> # >>>>>>>>>> # SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3518, tid=2 >>>>>>>>>> ... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> tyr java 381 cat InitFinalizeMain.java >>>>>>>>>> import mpi.*; >>>>>>>>>> >>>>>>>>>> public class InitFinalizeMain >>>>>>>>>> { >>>>>>>>>> public static void main (String args[]) throws MPIException >>>>>>>>>> { >>>>>>>>>> MPI.Init (args); >>>>>>>>>> System.out.print ("Hello!\n"); >>>>>>>>>> MPI.Finalize (); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> SIGSEGV happens in MPI.Init(args), because I can print a message >>>>>>>>>> before I call the method. >>>>>>>>>> >>>>>>>>>> tyr java 192 unsetenv OMPI_ATTACH >>>>>>>>>> tyr java 193 mpijavac InitFinalizeMain.java >>>>>>>>>> tyr java 194 mpiexec -np 1 java InitFinalizeMain >>>>>>>>>> Before MPI.Init() >>>>>>>>>> # >>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>>>>> # >>>>>>>>>> # SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3697, tid=2 >>>>>>>>>> ... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Any ideas, how I can continue? I couldn't find a C function for >>>>>>>>>> MPI.Init() in a C file. Do you know, which function is called first, >>>>>>>>>> so that I can set a breakpoint? By the way, I get the same error >>>>>>>>>> for Solaris 10 x86_64. >>>>>>>>>> >>>>>>>>>> tyr java 388 ssh sunpc1 >>>>>>>>>> ... >>>>>>>>>> sunpc1 java 106 mpijavac InitFinalizeMain.java >>>>>>>>>> sunpc1 java 107 uname -a >>>>>>>>>> SunOS sunpc1 5.10 Generic_147441-21 i86pc i386 i86pc Solaris >>>>>>>>>> sunpc1 java 108 isainfo -k >>>>>>>>>> amd64 >>>>>>>>>> sunpc1 java 109 mpiexec -np 1 java InitFinalizeMain >>>>>>>>>> # >>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>>>>> # >>>>>>>>>> # SIGSEGV (0xb) at pc=0xfffffd7fff1d77f0, pid=20256, tid=2 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thank you very much for any help in advance. >>>>>>>>>> >>>>>>>>>> Kind regards >>>>>>>>>> >>>>>>>>>> Siegmar >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> thank you very much for your help. >>>>>>>>>>> >>>>>>>>>>>> how did you configure openmpi ? which java version did you use ? >>>>>>>>>>>> >>>>>>>>>>>> i just found a regression and you currently have to explicitly add >>>>>>>>>>>> CFLAGS=-D_REENTRANT CPPFLAGS=-D_REENTRANT >>>>>>>>>>>> to your configure command line >>>>>>>>>>> I added "-D_REENTRANT" to my command. 
>>>>>>>>>>> >>>>>>>>>>> ../openmpi-dev-124-g91e9686/configure >>>>>>>>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc \ >>>>>>>>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64 \ >>>>>>>>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin \ >>>>>>>>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include \ >>>>>>>>>>> JAVA_HOME=/usr/local/jdk1.8.0 \ >>>>>>>>>>> LDFLAGS="-m64" CC="gcc" CXX="g++" FC="gfortran" \ >>>>>>>>>>> CFLAGS="-m64 -D_REENTRANT" CXXFLAGS="-m64" FCFLAGS="-m64" \ >>>>>>>>>>> CPP="cpp" CXXCPP="cpp" \ >>>>>>>>>>> CPPFLAGS="-D_REENTRANT" CXXCPPFLAGS="" \ >>>>>>>>>>> --enable-mpi-cxx \ >>>>>>>>>>> --enable-cxx-exceptions \ >>>>>>>>>>> --enable-mpi-java \ >>>>>>>>>>> --enable-heterogeneous \ >>>>>>>>>>> --enable-mpi-thread-multiple \ >>>>>>>>>>> --with-threads=posix \ >>>>>>>>>>> --with-hwloc=internal \ >>>>>>>>>>> --without-verbs \ >>>>>>>>>>> --with-wrapper-cflags="-std=c11 -m64" \ >>>>>>>>>>> --enable-debug \ >>>>>>>>>>> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc >>>>>>>>>>> >>>>>>>>>>> I use Java 8. >>>>>>>>>>> >>>>>>>>>>> tyr openmpi-1.9 112 java -version >>>>>>>>>>> java version "1.8.0" >>>>>>>>>>> Java(TM) SE Runtime Environment (build 1.8.0-b132) >>>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode) >>>>>>>>>>> tyr openmpi-1.9 113 >>>>>>>>>>> >>>>>>>>>>> Unfortunately I still get a SIGSEGV with openmpi-dev-124-g91e9686. >>>>>>>>>>> I have applied your patch and will try to debug my small Java >>>>>>>>>>> program tomorrow or next week and then let you know the result. >>>>>>>>>> _______________________________________________ >>>>>>>>>> users mailing list >>>>>>>>>> us...@open-mpi.org >>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>> Link to this post: >>>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25581.php >>>>>>>>> _______________________________________________ >>>>>>>>> users mailing list >>>>>>>>> us...@open-mpi.org >>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>> Link to this post: >>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25582.php >>>>>>> _______________________________________________ >>>>>>> users mailing list >>>>>>> us...@open-mpi.org >>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>> Link to this post: >>>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25584.php >>>>>> _______________________________________________ >>>>>> users mailing list >>>>>> us...@open-mpi.org >>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>> Link to this post: >>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25585.php >>>>> _______________________________________________ >>>>> users mailing list >>>>> us...@open-mpi.org >>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>> Link to this post: >>>>> http://www.open-mpi.org/community/lists/users/2014/10/25586.php >>>> _______________________________________________ >>>> users mailing list >>>> us...@open-mpi.org >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>> Link to this post: >>>> http://www.open-mpi.org/community/lists/users/2014/10/25587.php >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>> Link to this post: >>> http://www.open-mpi.org/community/lists/users/2014/10/25588.php >> _______________________________________________ >> users mailing list >> 
us...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >> Link to this post: >> http://www.open-mpi.org/community/lists/users/2014/10/25589.php > _______________________________________________ > users mailing list > us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2014/10/25592.php