Hi Siegmar,

> I think that it must have to do with MPI, because everything
> works fine on Linux and my Java program works fine with an older
> MPI version (openmpi-1.8.2a1r31804) as well.
Yes, I also think it must have to do with MPI, but on the java process side, not the mpiexec process side.

When you run a Java MPI program via mpiexec, the mpiexec process launches a java process. When the java process (your Java program) calls an MPI method, the native part (written in C/C++) of the MPI library is called. It runs in the java process, not in the mpiexec process. I suspect that part.

> On Solaris things are different.

Are you referring to the following difference? After this line,

> 881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);

Linux shows

> orte_job_state_to_str (state=1)
>     at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
> 217 switch(state) {

but Solaris shows

> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
> 122 if (NULL == name) {

Each macro is defined as:

#define ORTE_ACTIVATE_JOB_STATE(j, s)                                       \
    do {                                                                    \
        orte_job_t *shadow=(j);                                             \
        opal_output_verbose(1, orte_state_base_framework.framework_output,  \
                            "%s ACTIVATE JOB %s STATE %s AT %s:%d",         \
                            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),             \
                            (NULL == shadow) ? "NULL" :                     \
                            ORTE_JOBID_PRINT(shadow->jobid),                \
                            orte_job_state_to_str((s)),                     \
                            __FILE__, __LINE__);                            \
        orte_state.activate_job_state(shadow, (s));                         \
    } while(0);

#define ORTE_NAME_PRINT(n) \
    orte_util_print_name_args(n)

#define ORTE_JOBID_PRINT(n) \
    orte_util_print_jobids(n)

I'm not sure, but I think this is only a difference in the order in which the arguments of opal_output_verbose() are evaluated. ORTE_NAME_PRINT, ORTE_JOBID_PRINT and orte_job_state_to_str() are all arguments of the same call, and C does not specify the order in which function arguments are evaluated, so the compiler may choose a different order on each platform. On Linux, orte_job_state_to_str() is evaluated first and orte_util_print_jobids() after it; on Solaris, orte_util_print_name_args() and orte_util_print_jobids() come first, and your full Solaris trace shows orte_job_state_to_str(state=1) being called after them. So I think it's not an important difference.

You showed the following lines.
> > > orterun (argc=5, argv=0xffffffff7fffe0d8)
> > >     at
> > > ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
> > > 1084 while (orte_event_base_active) {
> > > (gdb)
> > > 1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
> > > (gdb)

I'm not familiar with this code, but I think this part (in the mpiexec process) only waits for the java process to terminate (normally or abnormally). So I think the problem is not in the mpiexec process but in the java process.

Regards,
Takahiro

> Hi Takahiro,
>
> > mpiexec and java run as distinct processes. Your JRE message
> > says java process raises SEGV. So you should trace the java
> > process, not the mpiexec process. And more, your JRE message
> > says the crash happened outside the Java Virtual Machine in
> > native code. So usual Java program debugger is useless.
> > You should trace native code part of the java process.
> > Unfortunately I don't know how to debug such one.
>
> I think that it must have to do with MPI, because everything
> works fine on Linux and my Java program works fine with an older
> MPI version (openmpi-1.8.2a1r31804) as well.
>
> linpc1 x 112 mpiexec -np 1 java InitFinalizeMain
> Hello!
> linpc1 x 113
>
> Therefore I single stepped through the program on Linux as well
> and found a difference launching the process. On Linux I get the
> following sequence.
>
> Breakpoint 1, rsh_launch (jdata=0x614aa0)
>     at
> ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
> 876 if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
> (gdb) s
> 881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
> (gdb) s
> orte_job_state_to_str (state=1)
>     at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
> 217 switch(state) {
> (gdb)
> 221 return "PENDING INIT";
> (gdb)
> 317 }
> (gdb)
> orte_util_print_jobids (job=4294967295)
>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
> 170 ptr = get_print_name_buffer();
> (gdb)
>
>
>
> On Solaris things are different.
>
> Breakpoint 1, rsh_launch (jdata=0x100125250)
>     at
> ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
> 876 if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
> (gdb) s
> 881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
> (gdb) s
> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
> 122 if (NULL == name) {
> (gdb)
> 142 job = orte_util_print_jobids(name->jobid);
> (gdb)
> orte_util_print_jobids (job=2673410048)
>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
> 170 ptr = get_print_name_buffer();
> (gdb)
>
>
>
> Is this normal or is it the reason for the crash on Solaris?
>
>
> Kind regards
>
> Siegmar
>
>
>
> > The log file output by JRE may help you.
> > > # An error report file with more information is saved as:
> > > #
> > > /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
> >
> > Regards,
> > Takahiro
> >
> > > Hi,
> > >
> > > I installed openmpi-dev-124-g91e9686 on Solaris 10 Sparc with
> > > gcc-4.9.1 to track down the error with my small Java program.
> > > I started single stepping in orterun.c at line 1081 and
> > > continued until I got the segmentation fault. I get
> > > "jdata = 0x0" in version openmpi-1.8.2a1r31804, which is the
> > > last one which works with Java in my environment, while I get
> > > "jdata = 0x100125250" in this version. Unfortunately I don't
> > > know which files or variables are important to look at. Perhaps
> > > somebody can look at the following lines of code and tell me,
> > > which information I should provide to solve the problem. I know
> > > that Solaris isn't any longer on your list of supported systems,
> > > but perhaps we can get it working again, if you tell me what
> > > you need and I do the debugging.
> > >
> > >
> > > /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> > > GNU gdb (GDB) 7.6.1
> > > ...
> > > (gdb) run -np 1 java InitFinalizeMain
> > > Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec \
> > > -np 1 java InitFinalizeMain
> > > [Thread debugging using libthread_db enabled]
> > > [New Thread 1 (LWP 1)]
> > > [New LWP 2 ]
> > > #
> > > # A fatal error has been detected by the Java Runtime Environment:
> > > #
> > > # SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13064, tid=2
> > > ...
> > > [LWP 2 exited]
> > > [New Thread 2 ]
> > > [Switching to Thread 1 (LWP 1)]
> > > sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be
> > > found to satisfy query
> > > (gdb) thread 1
> > > [Switching to thread 1 (LWP 1 )]
> > > #0 0xffffffff7f6173d0 in rtld_db_dlactivity () from
> > > /usr/lib/sparcv9/ld.so.1
> > > (gdb) b orterun.c:1081
> > > Breakpoint 1 at 0x1000070dc: file
> > > ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c, line
> > > 1081.
> > > (gdb) r
> > > The program being debugged has been started already.
> > > Start it from the beginning?
(y or n) y > > > > > > Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 java > > > InitFinalizeMain > > > [Thread debugging using libthread_db enabled] > > > [New Thread 1 (LWP 1)] > > > [New LWP 2 ] > > > [Switching to Thread 1 (LWP 1)] > > > > > > Breakpoint 1, orterun (argc=5, argv=0xffffffff7fffe0d8) > > > at > > > ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1081 > > > 1081 rc = orte_plm.spawn(jdata); > > > (gdb) print jdata > > > $1 = (orte_job_t *) 0x100125250 > > > (gdb) s > > > rsh_launch (jdata=0x100125250) > > > at > > > ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876 > > > 876 if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) { > > > (gdb) s > > > 881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT); > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122 > > > 122 if (NULL == name) { > > > (gdb) > > > 142 job = orte_util_print_jobids(name->jobid); > > > (gdb) > > > orte_util_print_jobids (job=2502885376) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170 > > > 170 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd990) > > > at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > > > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_jobids 
(job=2502885376) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172 > > > 172 if (NULL == ptr) { > > > (gdb) > > > 178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 182 if (ORTE_JOBID_INVALID == job) { > > > (gdb) > > > 184 } else if (ORTE_JOBID_WILDCARD == job) { > > > (gdb) > > > 187 tmp1 = ORTE_JOB_FAMILY((unsigned long)job); > > > (gdb) > > > 188 tmp2 = ORTE_LOCAL_JOBID((unsigned long)job); > > > (gdb) > > > 189 snprintf(ptr->buffers[ptr->cntr++], > > > (gdb) > > > 193 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 194 } > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143 > > > 143 vpid = orte_util_print_vpids(name->vpid); > > > (gdb) > > > orte_util_print_vpids (vpid=0) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260 > > > 260 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd9a0) > > > at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > > > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_vpids (vpid=0) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262 > > > 262 if (NULL == ptr) { > > > (gdb) > > > 268 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 272 if (ORTE_VPID_INVALID == vpid) { > > > (gdb) > > > 274 } else if 
(ORTE_VPID_WILDCARD == vpid) { > > > (gdb) > > > 277 snprintf(ptr->buffers[ptr->cntr++], > > > (gdb) > > > 281 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 282 } > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146 > > > 146 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60) > > > at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > > > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148 > > > 148 if (NULL == ptr) { > > > (gdb) > > > 154 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 158 snprintf(ptr->buffers[ptr->cntr++], > > > (gdb) > > > 162 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 163 } > > > (gdb) > > > orte_util_print_jobids (job=4294967295) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170 > > > 170 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60) > > > at 
../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > > > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_jobids (job=4294967295) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172 > > > 172 if (NULL == ptr) { > > > (gdb) > > > 178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 182 if (ORTE_JOBID_INVALID == job) { > > > (gdb) > > > 183 snprintf(ptr->buffers[ptr->cntr++], > > > ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]"); > > > (gdb) > > > 193 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 194 } > > > (gdb) > > > orte_job_state_to_str (state=1) at > > > ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217 > > > 217 switch(state) { > > > (gdb) > > > 221 return "PENDING INIT"; > > > (gdb) > > > 317 } > > > (gdb) > > > opal_output_verbose (level=1, output_id=0, > > > format=0xffffffff7f14dd98 <orte_job_states> > > > "\336\257\276\355\336\257\276\355") > > > at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373 > > > 373 va_start(arglist, format); > > > (gdb) > > > 369 { > > > (gdb) > > > 370 if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS && > > > (gdb) > > > 377 } > > > (gdb) > > > orte_state_base_activate_job_state (jdata=0x100125250, state=1) > > > at > > > ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:33 > > > 33 opal_list_item_t *itm, *any=NULL, *error=NULL; > > > (gdb) > > > 37 for (itm = opal_list_get_first(&orte_job_states); > > > (gdb) > > > opal_list_get_first (list=0xffffffff7f14dd98 <orte_job_states>) > > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:320 > > > 320 
opal_list_item_t* item = > > > (opal_list_item_t*)list->opal_list_sentinel.opal_list_next; > > > (gdb) > > > 324 assert(1 == item->opal_list_item_refcount); > > > (gdb) > > > 325 assert( list == item->opal_list_item_belong_to ); > > > (gdb) > > > 328 return item; > > > (gdb) > > > 329 } > > > (gdb) > > > orte_state_base_activate_job_state (jdata=0x100125250, state=1) > > > at > > > ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:38 > > > 38 itm != opal_list_get_end(&orte_job_states); > > > (gdb) > > > opal_list_get_end (list=0xffffffff7f14dd98 <orte_job_states>) > > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:399 > > > 399 return &(list->opal_list_sentinel); > > > (gdb) > > > 400 } > > > (gdb) > > > orte_state_base_activate_job_state (jdata=0x100125250, state=1) > > > at > > > ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:37 > > > 37 for (itm = opal_list_get_first(&orte_job_states); > > > (gdb) > > > 40 s = (orte_state_t*)itm; > > > (gdb) > > > 41 if (s->job_state == ORTE_JOB_STATE_ANY) { > > > (gdb) > > > 45 if (s->job_state == ORTE_JOB_STATE_ERROR) { > > > (gdb) > > > 48 if (s->job_state == state) { > > > (gdb) > > > 49 OPAL_OUTPUT_VERBOSE((1, > > > orte_state_base_framework.framework_output, > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122 > > > 122 if (NULL == name) { > > > (gdb) > > > 142 job = orte_util_print_jobids(name->jobid); > > > (gdb) > > > orte_util_print_jobids (job=2502885376) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170 > > > 170 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd880) > > > at 
../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > > > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_jobids (job=2502885376) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172 > > > 172 if (NULL == ptr) { > > > (gdb) > > > 178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 182 if (ORTE_JOBID_INVALID == job) { > > > (gdb) > > > 184 } else if (ORTE_JOBID_WILDCARD == job) { > > > (gdb) > > > 187 tmp1 = ORTE_JOB_FAMILY((unsigned long)job); > > > (gdb) > > > 188 tmp2 = ORTE_LOCAL_JOBID((unsigned long)job); > > > (gdb) > > > 189 snprintf(ptr->buffers[ptr->cntr++], > > > (gdb) > > > 193 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 194 } > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143 > > > 143 vpid = orte_util_print_vpids(name->vpid); > > > (gdb) > > > orte_util_print_vpids (vpid=0) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260 > > > 260 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd890) > > > at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > 
> > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_vpids (vpid=0) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262 > > > 262 if (NULL == ptr) { > > > (gdb) > > > 268 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 272 if (ORTE_VPID_INVALID == vpid) { > > > (gdb) > > > 274 } else if (ORTE_VPID_WILDCARD == vpid) { > > > (gdb) > > > 277 snprintf(ptr->buffers[ptr->cntr++], > > > (gdb) > > > 281 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 282 } > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146 > > > 146 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950) > > > at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > > > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_name_args (name=0x100118380 <orte_process_info+104>) > > > at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148 > > > 148 if (NULL == ptr) { > > > (gdb) > > > 154 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 158 snprintf(ptr->buffers[ptr->cntr++], > > > (gdb) > > > 162 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 163 } > > > (gdb) > > > 
orte_util_print_jobids (job=4294967295) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170 > > > 170 ptr = get_print_name_buffer(); > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92 > > > 92 if (!fns_init) { > > > (gdb) > > > 101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr); > > > (gdb) > > > opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950) > > > at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163 > > > 163 *valuep = pthread_getspecific(key); > > > (gdb) > > > 164 return OPAL_SUCCESS; > > > (gdb) > > > 165 } > > > (gdb) > > > get_print_name_buffer () at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102 > > > 102 if (OPAL_SUCCESS != ret) return NULL; > > > (gdb) > > > 104 if (NULL == ptr) { > > > (gdb) > > > 113 return (orte_print_args_buffers_t*) ptr; > > > (gdb) > > > 114 } > > > (gdb) > > > orte_util_print_jobids (job=4294967295) at > > > ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172 > > > 172 if (NULL == ptr) { > > > (gdb) > > > 178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) { > > > (gdb) > > > 182 if (ORTE_JOBID_INVALID == job) { > > > (gdb) > > > 183 snprintf(ptr->buffers[ptr->cntr++], > > > ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]"); > > > (gdb) > > > 193 return ptr->buffers[ptr->cntr-1]; > > > (gdb) > > > 194 } > > > (gdb) > > > orte_job_state_to_str (state=1) at > > > ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217 > > > 217 switch(state) { > > > (gdb) > > > 221 return "PENDING INIT"; > > > (gdb) > > > 317 } > > > (gdb) > > > opal_output_verbose (level=1, output_id=-1, format=0x1 <Address 0x1 out > > > of > > > bounds>) > > > at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373 > > > 373 va_start(arglist, format); > > > (gdb) > > > 369 { > > > (gdb) > > > 370 if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS && > > > (gdb) > > > 377 } > > > (gdb) > > > orte_state_base_activate_job_state (jdata=0x100125250, 
state=1) > > > at > > > ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:54 > > > 54 if (NULL == s->cbfunc) { > > > (gdb) > > > 62 caddy = OBJ_NEW(orte_state_caddy_t); > > > (gdb) > > > opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>, > > > file=0xffffffff7f034c08 > > > "../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c", > > > > > > line=62) at > > > ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:249 > > > 249 opal_object_t* object = opal_obj_new(type); > > > (gdb) > > > opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>) > > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:465 > > > 465 assert(cls->cls_sizeof >= sizeof(opal_object_t)); > > > (gdb) > > > 470 object = (opal_object_t *) malloc(cls->cls_sizeof); > > > (gdb) > > > 472 if (0 == cls->cls_initialized) { > > > (gdb) > > > 473 opal_class_initialize(cls); > > > (gdb) > > > opal_class_initialize (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>) > > > at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:79 > > > 79 assert(cls); > > > (gdb) > > > 84 if (1 == cls->cls_initialized) { > > > (gdb) > > > 87 opal_atomic_lock(&class_lock); > > > (gdb) > > > opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>) > > > at > > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:397 > > > 397 while( !opal_atomic_cmpset_acq_32( &(lock->u.lock), > > > (gdb) > > > opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>, > > > oldval=0, > > > newval=1) > > > at > > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:107 > > > 107 rc = opal_atomic_cmpset_32(addr, oldval, newval); > > > (gdb) > > > opal_atomic_cmpset_32 (addr=0xffffffff7ee89bf0 <class_lock>, oldval=0, > > > newval=1) > > > at > > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93 > > > 93 int32_t ret = newval; > > > (gdb) > > > 95 __asm__ __volatile__("casa [%1] " 
ASI_P ", %2, %0" > > > (gdb) > > > 98 return (ret == oldval); > > > (gdb) > > > 99 } > > > (gdb) > > > opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>, > > > oldval=0, > > > newval=1) > > > at > > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:108 > > > 108 opal_atomic_rmb(); > > > (gdb) > > > opal_atomic_rmb () at > > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:63 > > > 63 MEMBAR("#LoadLoad"); > > > (gdb) > > > 64 } > > > (gdb) > > > opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>, > > > oldval=0, > > > newval=1) > > > at > > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:110 > > > 110 return rc; > > > (gdb) > > > 111 } > > > (gdb) > > > opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>) > > > at > > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:403 > > > 403 } > > > (gdb) > > > opal_class_initialize (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>) > > > at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:93 > > > 93 if (1 == cls->cls_initialized) { > > > (gdb) > > > 103 cls->cls_depth = 0; > > > (gdb) > > > 104 cls_construct_array_count = 0; > > > (gdb) > > > 105 cls_destruct_array_count = 0; > > > (gdb) > > > 106 for (c = cls; c; c = c->cls_parent) { > > > (gdb) > > > 107 if( NULL != c->cls_construct ) { > > > (gdb) > > > 108 cls_construct_array_count++; > > > (gdb) > > > 110 if( NULL != c->cls_destruct ) { > > > (gdb) > > > 111 cls_destruct_array_count++; > > > (gdb) > > > 113 cls->cls_depth++; > > > (gdb) > > > 106 for (c = cls; c; c = c->cls_parent) { > > > (gdb) > > > 107 if( NULL != c->cls_construct ) { > > > (gdb) > > > 110 if( NULL != c->cls_destruct ) { > > > (gdb) > > > 113 cls->cls_depth++; > > > (gdb) > > > 106 for (c = cls; c; c = c->cls_parent) { > > > (gdb) > > > 122 (void > > > (**)(opal_object_t*))malloc((cls_construct_array_count + > > > (gdb) > > > 123 > > > cls_destruct_array_count + 2) > > > * > > 
> (gdb) > > > 122 (void > > > (**)(opal_object_t*))malloc((cls_construct_array_count + > > > (gdb) > > > 121 cls->cls_construct_array = > > > (gdb) > > > 125 if (NULL == cls->cls_construct_array) { > > > (gdb) > > > 130 cls->cls_construct_array + cls_construct_array_count + 1; > > > (gdb) > > > 129 cls->cls_destruct_array = > > > (gdb) > > > 136 cls_construct_array = cls->cls_construct_array + > > > cls_construct_array_count; > > > (gdb) > > > 137 cls_destruct_array = cls->cls_destruct_array; > > > (gdb) > > > 139 c = cls; > > > (gdb) > > > 140 *cls_construct_array = NULL; /* end marker for the > > > constructors */ > > > (gdb) > > > 141 for (i = 0; i < cls->cls_depth; i++) { > > > (gdb) > > > 142 if( NULL != c->cls_construct ) { > > > (gdb) > > > 143 --cls_construct_array; > > > (gdb) > > > 144 *cls_construct_array = c->cls_construct; > > > (gdb) > > > 146 if( NULL != c->cls_destruct ) { > > > (gdb) > > > 147 *cls_destruct_array = c->cls_destruct; > > > (gdb) > > > 148 cls_destruct_array++; > > > (gdb) > > > 150 c = c->cls_parent; > > > (gdb) > > > 141 for (i = 0; i < cls->cls_depth; i++) { > > > (gdb) > > > 142 if( NULL != c->cls_construct ) { > > > (gdb) > > > 146 if( NULL != c->cls_destruct ) { > > > (gdb) > > > 150 c = c->cls_parent; > > > (gdb) > > > 141 for (i = 0; i < cls->cls_depth; i++) { > > > (gdb) > > > 152 *cls_destruct_array = NULL; /* end marker for the > > > destructors */ > > > (gdb) > > > 154 cls->cls_initialized = 1; > > > (gdb) > > > 155 save_class(cls); > > > (gdb) > > > save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>) > > > at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:188 > > > 188 if (num_classes >= max_classes) { > > > (gdb) > > > 189 expand_array(); > > > (gdb) > > > expand_array () at > > > ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:201 > > > 201 max_classes += increment; > > > (gdb) > > > 202 classes = (void**)realloc(classes, sizeof(opal_class_t*) * > > > max_classes); > > > (gdb) > > > 203 if 
(NULL == classes) {
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 208 classes[i] = NULL;
> > > (gdb)
> > > 207 for (i = num_classes; i < max_classes; ++i) {
> > > (gdb)
> > > 210 }
> > > (gdb)
> > > save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
> > > at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:192
> > > 192 classes[num_classes] = cls->cls_construct_array;
> > > (gdb)
> > > 193 ++num_classes;
> > > (gdb)
> > > 194 }
> > > (gdb)
> > > opal_class_initialize (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
> > > at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:159
> > > 159 opal_atomic_unlock(&class_lock);
> > > (gdb)
> > > opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
> > > at
> > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:409
> > > 409 opal_atomic_wmb();
> > > (gdb)
> > > opal_atomic_wmb () at
> > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:69
> > > 69 MEMBAR("#StoreStore");
> > > (gdb)
> > > 70 }
> > > (gdb)
> > > opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
> > > at
> > > ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:410
> > > 410 lock->u.lock=OPAL_ATOMIC_UNLOCKED;
> > > (gdb)
> > > 411 }
> > > (gdb)
> > > opal_class_initialize (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
> > > at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:160
> > > 160 }
> > > (gdb)
> > > opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
> > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:475
> > > 475 if (NULL != object) {
> > > (gdb)
> > > 476 object->obj_class = cls;
> > > (gdb)
> > > 477 object->obj_reference_count = 1;
> > > (gdb)
> > > 478 opal_obj_run_constructors(object);
> > > (gdb)
> > > opal_obj_run_constructors (object=0x1001bfcf0)
> > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:420
> > > 420 assert(NULL != object->obj_class);
> > > (gdb)
> > > 422 cls_construct = object->obj_class->cls_construct_array;
> > > (gdb)
> > > 423 while( NULL != *cls_construct ) {
> > > (gdb)
> > > 424 (*cls_construct)(object);
> > > (gdb)
> > > orte_state_caddy_construct (caddy=0x1001bfcf0)
> > > at
> > > ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_frame.c:84
> > > 84 memset(&caddy->ev, 0, sizeof(opal_event_t));
> > > (gdb)
> > > 85 caddy->jdata = NULL;
> > > (gdb)
> > > 86 }
> > > (gdb)
> > > opal_obj_run_constructors (object=0x1001bfcf0)
> > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:425
> > > 425 cls_construct++;
> > > (gdb)
> > > 423 while( NULL != *cls_construct ) {
> > > (gdb)
> > > 427 }
> > > (gdb)
> > > opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
> > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:480
> > > 480 return object;
> > > (gdb)
> > > 481 }
> > > (gdb)
> > > opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>, file=0xffffffff7f034c08
> > > "../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c",
> > >
> > > line=62) at
> > > ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:250
> > > 250 object->obj_magic_id = OPAL_OBJ_MAGIC_ID;
> > > (gdb)
> > > 251 object->cls_init_file_name = file;
> > > (gdb)
> > > 252 object->cls_init_lineno = line;
> > > (gdb)
> > > 253 return object;
> > > (gdb)
> > > 254 }
> > > (gdb)
> > > orte_state_base_activate_job_state (jdata=0x100125250, state=1)
> > > at
> > > ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:63
> > > 63 if (NULL != jdata) {
> > > (gdb)
> > > 64 caddy->jdata = jdata;
> > > (gdb)
> > > 65 caddy->job_state = state;
> > > (gdb)
> > > 66 OBJ_RETAIN(jdata);
> > > (gdb)
> > > opal_obj_update (inc=1, object=0x100125250)
> > > at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:497
> > > 497 return opal_atomic_add_32(&(object->obj_reference_count),
> > > inc);
> > > (gdb)
> > > opal_atomic_add_32 (addr=0x100125260, delta=1)
> > > at
> > > ../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:63
> > > 63 oldval = *addr;
> > > (gdb)
> > > 64 } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval +
> > > delta));
> > > (gdb)
> > > opal_atomic_cmpset_32 (addr=0x100125260, oldval=1, newval=2)
> > > at
> > > ../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93
> > > 93 int32_t ret = newval;
> > > (gdb)
> > > 95 __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0"
> > > (gdb)
> > > 98 return (ret == oldval);
> > > (gdb)
> > > 99 }
> > > (gdb)
> > > opal_atomic_add_32 (addr=0x100125260, delta=1)
> > > at
> > > ../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:65
> > > 65 return (oldval + delta);
> > > (gdb)
> > > 66 }
> > > (gdb)
> > > orte_state_base_activate_job_state (jdata=0x100125250, state=1)
> > > at
> > > ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:66
> > > 66 OBJ_RETAIN(jdata);
> > > (gdb)
> > > 68 opal_event_set(orte_event_base, &caddy->ev, -1,
> > > OPAL_EV_WRITE, s->cbfunc, caddy);
> > > (gdb)
> > > 69 opal_event_set_priority(&caddy->ev, s->priority);
> > > (gdb)
> > > 70 opal_event_active(&caddy->ev, OPAL_EV_WRITE, 1);
> > > (gdb)
> > > 71 return;
> > > (gdb)
> > > 105 }
> > > (gdb)
> > > rsh_launch (jdata=0x100125250)
> > > at
> > > ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:883
> > > 883 return ORTE_SUCCESS;
> > > (gdb)
> > > 884 }
> > > (gdb)
> > > orterun (argc=5, argv=0xffffffff7fffe0d8)
> > > at
> > > ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
> > > 1084 while (orte_event_base_active) {
> > > (gdb)
> > > 1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
> > > (gdb)
> > > 1084 while (orte_event_base_active) {
> > > (gdb)
> > > 1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
> > > (gdb)
> > > 1084 while (orte_event_base_active) {
> > > (gdb)
> > > 1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
> > > (gdb)
> > > #
> > > # A fatal error has been detected by the Java Runtime Environment:
> > > #
> > > # SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13080, tid=2
> > > #
> > > # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build
> > > 1.8.0-b132)
> > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode
> > > solaris-sparc
> > > compressed oops)
> > > # Problematic frame:
> > > # 1084 while (orte_event_base_active) {
> > > (gdb)
> > > 1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
> > > (gdb)
> > > C [libc.so.1+0x3c7f0] strlen+0x50
> > > #
> > > # Failed to write core dump. Core dumps have been disabled. To enable
> > > core
> > > dumping, try "ulimit -c unlimited" before starting Java again
> > > #
> > > # An error report file with more information is saved as:
> > > #
> > > /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
> > > #
> > > # If you would like to submit a bug report, please visit:
> > > # http://bugreport.sun.com/bugreport/crash.jsp
> > > # The crash happened outside the Java Virtual Machine in native code.
> > > # See problematic frame for where to report the bug.
> > > #
> > > --------------------------------------------------------------------------
> > > mpiexec noticed that process rank 0 with PID 0 on node tyr exited on
> > > signal 6
> > > (Abort).
> > > --------------------------------------------------------------------------
> > > 1084 while (orte_event_base_active) {
> > > (gdb)
> > > 1089 orte_odls.kill_local_procs(NULL);
> > > (gdb)
> > >
> > >
> > > Thank you very much for any help in advance.
> > >
> > > Kind regards
> > >
> > > Siegmar