Siegmar,

How did you configure Open MPI? Which Java version did you use?

I just found a regression: you currently have to explicitly add
CFLAGS=-D_REENTRANT CPPFLAGS=-D_REENTRANT
to your configure command line.
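
For example, a full invocation might look like this (illustrative only:
the prefix is the install path that appears later in this thread, and
any other options are whatever you normally use):

./configure --prefix=/usr/local/openmpi-1.9.0_64_gcc --enable-debug \
    CFLAGS=-D_REENTRANT CPPFLAGS=-D_REENTRANT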

If you want to debug this issue (I cannot reproduce it on a Solaris 11
x86 virtual machine), you can apply the attached patch, make sure you
configure with --enable-debug, and run

OMPI_ATTACH=1 mpiexec -n 1 java InitFinalizeMain

Then attach gdb to the *java* process, set the _dbg local variable to
zero, and continue. You should get a clean stack trace, and hopefully
we will be able to help.
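
A minimal sketch of the attach step (the pid and frame number below are
placeholders; with the patch applied, the java process spins in poll()
inside JNI_OnLoad until _dbg is cleared):

ps -ef | grep java        # find the pid of the spinning java process
gdb -p <pid>              # attach gdb to the java process, not mpiexec
(gdb) bt                  # the busy-wait should show up in JNI_OnLoad
(gdb) frame 1             # select the JNI_OnLoad frame that holds _dbg
(gdb) set var _dbg = 0    # leave the while (_dbg) poll(NULL, 0, 1) loop
(gdb) continue            # run on to the SIGSEGV, then take a backtrace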

Cheers,

Gilles

On 2014/10/24 0:03, Siegmar Gross wrote:
> Hello Oscar,
>
> do you have time to look into my problem? Probably Takahiro has a
> point and gdb behaves differently on Solaris and Linux, so that
> the differing outputs have no meaning. I tried to debug my Java
> program, but without success so far, because I wasn't able to get
> into the Java program to set a breakpoint or to see the code. Have
> you succeeded in debugging an mpiJava program? If so, how must I call
> gdb (I normally use "gdb mpiexec" and then "run -np 1 java ...")?
> What can I do to get helpful information to track the error down?
> I have attached the error log file. Perhaps you can see if something
> is going wrong with the Java interface. Thank you very much for your
> help and any hints for the usage of gdb with mpiJava in advance.
> Please let me know if I can provide anything else.
>
>
> Kind regards
>
> Siegmar
>
>
>>> I think that it must have to do with MPI, because everything
>>> works fine on Linux and my Java program works fine with an older
>>> MPI version (openmpi-1.8.2a1r31804) as well.
>> Yes. I also think it must have to do with MPI.
>> But on the java process side, not the mpiexec process side.
>>
>> When you run a Java MPI program via mpiexec, the mpiexec process
>> launches a java process. When the java process (your Java program)
>> calls an MPI method, the native part (written in C/C++) of the MPI
>> library is called. It runs in the java process, not in the mpiexec
>> process. I suspect that part.
>>
>>> On Solaris things are different.
>> Are you referring to the following difference?
>> After this line,
>>> 881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
>> Linux shows
>>> orte_job_state_to_str (state=1)
>>>     at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
>>> 217         switch(state) {
>> but Solaris shows
>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
>>> 122         if (NULL == name) {
>> Each macro is defined as:
>>
>> #define ORTE_ACTIVATE_JOB_STATE(j, s)                                   \
>>     do {                                                                \
>>         orte_job_t *shadow=(j);                                         \
>>         opal_output_verbose(1, orte_state_base_framework.framework_output, \
>>                             "%s ACTIVATE JOB %s STATE %s AT %s:%d",  \
>>                             ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),         \
>>                             (NULL == shadow) ? "NULL" :                 \
>>                             ORTE_JOBID_PRINT(shadow->jobid),         \
>>                             orte_job_state_to_str((s)),                 \
>>                             __FILE__, __LINE__);                     \
>>         orte_state.activate_job_state(shadow, (s));                     \
>>     } while(0);
>>
>> #define ORTE_NAME_PRINT(n) \
>>     orte_util_print_name_args(n)
>>
>> #define ORTE_JOBID_PRINT(n) \
>>     orte_util_print_jobids(n)
>>
>> I'm not sure, but I think gdb on Solaris steps into
>> orte_util_print_name_args, while gdb on Linux doesn't step into
>> orte_util_print_name_args and orte_util_print_jobids for some
>> reason, or orte_job_state_to_str is evaluated before them.
>>
>> So I think it's not an important difference.
>>
>> You showed the following lines.
>>>>> orterun (argc=5, argv=0xffffffff7fffe0d8)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
>>>>> 1084        while (orte_event_base_active) {
>>>>> (gdb) 
>>>>> 1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
>>>>> (gdb) 
>> I'm not familiar with this code, but I think this part (in the
>> mpiexec process) is only waiting for the java process to terminate
>> (normally or abnormally). So I think the problem is not in the
>> mpiexec process but in the java process.
>>
>> Regards,
>> Takahiro
>>
>>> Hi Takahiro,
>>>
>>>> mpiexec and java run as distinct processes. Your JRE message
>>>> says the java process raises SEGV. So you should trace the java
>>>> process, not the mpiexec process. Moreover, your JRE message
>>>> says the crash happened outside the Java Virtual Machine in
>>>> native code, so a usual Java program debugger is useless.
>>>> You should trace the native code part of the java process.
>>>> Unfortunately I don't know how to debug such a case.
>>> I think that it must have to do with MPI, because everything
>>> works fine on Linux and my Java program works fine with an older
>>> MPI version (openmpi-1.8.2a1r31804) as well.
>>>
>>> linpc1 x 112 mpiexec -np 1 java InitFinalizeMain
>>> Hello!
>>> linpc1 x 113 
>>>
>>> Therefore I single-stepped through the program on Linux as well
>>> and found a difference when launching the process. On Linux I get
>>> the following sequence.
>>>
>>> Breakpoint 1, rsh_launch (jdata=0x614aa0)
>>>     at ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
>>> 876         if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
>>> (gdb) s
>>> 881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
>>> (gdb) s
>>> orte_job_state_to_str (state=1)
>>>     at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
>>> 217         switch(state) {
>>> (gdb) 
>>> 221             return "PENDING INIT";
>>> (gdb) 
>>> 317     }
>>> (gdb) 
>>> orte_util_print_jobids (job=4294967295)
>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
>>> 170         ptr = get_print_name_buffer();
>>> (gdb) 
>>>
>>>
>>>
>>> On Solaris things are different.
>>>
>>> Breakpoint 1, rsh_launch (jdata=0x100125250)
>>>     at ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
>>> 876         if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
>>> (gdb) s
>>> 881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
>>> (gdb) s
>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
>>> 122         if (NULL == name) {
>>> (gdb) 
>>> 142         job = orte_util_print_jobids(name->jobid);
>>> (gdb) 
>>> orte_util_print_jobids (job=2673410048)
>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
>>> 170         ptr = get_print_name_buffer();
>>> (gdb) 
>>>
>>>
>>>
>>> Is this normal or is it the reason for the crash on Solaris?
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> The log file output by the JRE may help you.
>>>>> # An error report file with more information is saved as:
>>>>> # /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
>>>> Regards,
>>>> Takahiro
>>>>
>>>>> Hi,
>>>>>
>>>>> I installed openmpi-dev-124-g91e9686 on Solaris 10 Sparc with
>>>>> gcc-4.9.1 to track down the error with my small Java program.
>>>>> I started single-stepping in orterun.c at line 1081 and
>>>>> continued until I got the segmentation fault. I get
>>>>> "jdata = 0x0" in version openmpi-1.8.2a1r31804, which is the
>>>>> last one that works with Java in my environment, while I get
>>>>> "jdata = 0x100125250" in this version. Unfortunately I don't
>>>>> know which files or variables are important to look at. Perhaps
>>>>> somebody can look at the following lines of code and tell me
>>>>> which information I should provide to solve the problem. I know
>>>>> that Solaris is no longer on your list of supported systems,
>>>>> but perhaps we can get it working again if you tell me what
>>>>> you need and I do the debugging.
>>>>>
>>>>> /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>>>>> GNU gdb (GDB) 7.6.1
>>>>> ...
>>>>> (gdb) run -np 1 java InitFinalizeMain 
>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec \
>>>>>   -np 1 java InitFinalizeMain
>>>>> [Thread debugging using libthread_db enabled]
>>>>> [New Thread 1 (LWP 1)]
>>>>> [New LWP    2        ]
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13064, tid=2
>>>>> ...
>>>>> [LWP    2         exited]
>>>>> [New Thread 2        ]
>>>>> [Switching to Thread 1 (LWP 1)]
>>>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be
>>>>>   found to satisfy query
>>>>> (gdb) thread 1
>>>>> [Switching to thread 1 (LWP    1        )]
>>>>> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
>>>>> (gdb) b orterun.c:1081
>>>>> Breakpoint 1 at 0x1000070dc: file
>>>>> ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c, line 1081.
>>>>> (gdb) r
>>>>> The program being debugged has been started already.
>>>>> Start it from the beginning? (y or n) y
>>>>>
>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 java 
>>>>> InitFinalizeMain
>>>>> [Thread debugging using libthread_db enabled]
>>>>> [New Thread 1 (LWP 1)]
>>>>> [New LWP    2        ]
>>>>> [Switching to Thread 1 (LWP 1)]
>>>>>
>>>>> Breakpoint 1, orterun (argc=5, argv=0xffffffff7fffe0d8)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1081
>>>>> 1081        rc = orte_plm.spawn(jdata);
>>>>> (gdb) print jdata
>>>>> $1 = (orte_job_t *) 0x100125250
>>>>> (gdb) s
>>>>> rsh_launch (jdata=0x100125250)
>>>>>     at ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
>>>>> 876         if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
>>>>> (gdb) s    
>>>>> 881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
>>>>> 122         if (NULL == name) {
>>>>> (gdb) 
>>>>> 142         job = orte_util_print_jobids(name->jobid);
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=2502885376) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
>>>>> 170         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd990)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=2502885376) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
>>>>> 172         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 182         if (ORTE_JOBID_INVALID == job) {
>>>>> (gdb) 
>>>>> 184         } else if (ORTE_JOBID_WILDCARD == job) {
>>>>> (gdb) 
>>>>> 187             tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
>>>>> (gdb) 
>>>>> 188             tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
>>>>> (gdb) 
>>>>> 189             snprintf(ptr->buffers[ptr->cntr++], 
>>>>> (gdb) 
>>>>> 193         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 194     }
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143
>>>>> 143         vpid = orte_util_print_vpids(name->vpid);
>>>>> (gdb) 
>>>>> orte_util_print_vpids (vpid=0) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260
>>>>> 260         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd9a0)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_vpids (vpid=0) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262
>>>>> 262         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 268         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 272         if (ORTE_VPID_INVALID == vpid) {
>>>>> (gdb) 
>>>>> 274         } else if (ORTE_VPID_WILDCARD == vpid) {
>>>>> (gdb) 
>>>>> 277             snprintf(ptr->buffers[ptr->cntr++], 
>>>>> (gdb) 
>>>>> 281         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 282     }
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146
>>>>> 146         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148
>>>>> 148         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 154         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 158         snprintf(ptr->buffers[ptr->cntr++], 
>>>>> (gdb) 
>>>>> 162         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 163     }
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=4294967295) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
>>>>> 170         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=4294967295) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
>>>>> 172         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 182         if (ORTE_JOBID_INVALID == job) {
>>>>> (gdb) 
>>>>> 183             snprintf(ptr->buffers[ptr->cntr++], 
>>>>> ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]");
>>>>> (gdb) 
>>>>> 193         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 194     }
>>>>> (gdb) 
>>>>> orte_job_state_to_str (state=1) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
>>>>> 217         switch(state) {
>>>>> (gdb) 
>>>>> 221             return "PENDING INIT";
>>>>> (gdb) 
>>>>> 317     }
>>>>> (gdb) 
>>>>> opal_output_verbose (level=1, output_id=0, 
>>>>>     format=0xffffffff7f14dd98 <orte_job_states> 
>>>>> "\336\257\276\355\336\257\276\355")
>>>>>     at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373
>>>>> 373             va_start(arglist, format);
>>>>> (gdb) 
>>>>> 369     {
>>>>> (gdb) 
>>>>> 370         if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS &&
>>>>> (gdb) 
>>>>> 377     }
>>>>> (gdb) 
>>>>> orte_state_base_activate_job_state (jdata=0x100125250, state=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:33
>>>>> 33          opal_list_item_t *itm, *any=NULL, *error=NULL;
>>>>> (gdb) 
>>>>> 37          for (itm = opal_list_get_first(&orte_job_states);
>>>>> (gdb) 
>>>>> opal_list_get_first (list=0xffffffff7f14dd98 <orte_job_states>)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:320
>>>>> 320         opal_list_item_t* item = 
>>>>> (opal_list_item_t*)list->opal_list_sentinel.opal_list_next;
>>>>> (gdb) 
>>>>> 324         assert(1 == item->opal_list_item_refcount);
>>>>> (gdb) 
>>>>> 325         assert( list == item->opal_list_item_belong_to );
>>>>> (gdb) 
>>>>> 328         return item;
>>>>> (gdb) 
>>>>> 329     }
>>>>> (gdb) 
>>>>> orte_state_base_activate_job_state (jdata=0x100125250, state=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:38
>>>>> 38               itm != opal_list_get_end(&orte_job_states);
>>>>> (gdb) 
>>>>> opal_list_get_end (list=0xffffffff7f14dd98 <orte_job_states>)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:399
>>>>> 399         return &(list->opal_list_sentinel);
>>>>> (gdb) 
>>>>> 400     }
>>>>> (gdb) 
>>>>> orte_state_base_activate_job_state (jdata=0x100125250, state=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:37
>>>>> 37          for (itm = opal_list_get_first(&orte_job_states);
>>>>> (gdb) 
>>>>> 40              s = (orte_state_t*)itm;
>>>>> (gdb) 
>>>>> 41              if (s->job_state == ORTE_JOB_STATE_ANY) {
>>>>> (gdb) 
>>>>> 45              if (s->job_state == ORTE_JOB_STATE_ERROR) {
>>>>> (gdb) 
>>>>> 48              if (s->job_state == state) {
>>>>> (gdb) 
>>>>> 49                  OPAL_OUTPUT_VERBOSE((1, 
>>>>> orte_state_base_framework.framework_output,
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
>>>>> 122         if (NULL == name) {
>>>>> (gdb) 
>>>>> 142         job = orte_util_print_jobids(name->jobid);
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=2502885376) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
>>>>> 170         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd880)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=2502885376) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
>>>>> 172         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 182         if (ORTE_JOBID_INVALID == job) {
>>>>> (gdb) 
>>>>> 184         } else if (ORTE_JOBID_WILDCARD == job) {
>>>>> (gdb) 
>>>>> 187             tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
>>>>> (gdb) 
>>>>> 188             tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
>>>>> (gdb) 
>>>>> 189             snprintf(ptr->buffers[ptr->cntr++], 
>>>>> (gdb) 
>>>>> 193         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 194     }
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143
>>>>> 143         vpid = orte_util_print_vpids(name->vpid);
>>>>> (gdb) 
>>>>> orte_util_print_vpids (vpid=0) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260
>>>>> 260         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd890)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_vpids (vpid=0) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262
>>>>> 262         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 268         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 272         if (ORTE_VPID_INVALID == vpid) {
>>>>> (gdb) 
>>>>> 274         } else if (ORTE_VPID_WILDCARD == vpid) {
>>>>> (gdb) 
>>>>> 277             snprintf(ptr->buffers[ptr->cntr++], 
>>>>> (gdb) 
>>>>> 281         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 282     }
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146
>>>>> 146         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
>>>>>     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148
>>>>> 148         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 154         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 158         snprintf(ptr->buffers[ptr->cntr++], 
>>>>> (gdb) 
>>>>> 162         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 163     }
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=4294967295) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
>>>>> 170         ptr = get_print_name_buffer();
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
>>>>> 92          if (!fns_init) {
>>>>> (gdb) 
>>>>> 101         ret = opal_tsd_getspecific(print_args_tsd_key, 
> (void**)&ptr);
>>>>> (gdb) 
>>>>> opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
>>>>> 163         *valuep = pthread_getspecific(key);
>>>>> (gdb) 
>>>>> 164         return OPAL_SUCCESS;
>>>>> (gdb) 
>>>>> 165     }
>>>>> (gdb) 
>>>>> get_print_name_buffer () at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
>>>>> 102         if (OPAL_SUCCESS != ret) return NULL;
>>>>> (gdb) 
>>>>> 104         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 113         return (orte_print_args_buffers_t*) ptr;
>>>>> (gdb) 
>>>>> 114     }
>>>>> (gdb) 
>>>>> orte_util_print_jobids (job=4294967295) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
>>>>> 172         if (NULL == ptr) {
>>>>> (gdb) 
>>>>> 178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
>>>>> (gdb) 
>>>>> 182         if (ORTE_JOBID_INVALID == job) {
>>>>> (gdb) 
>>>>> 183             snprintf(ptr->buffers[ptr->cntr++], 
>>>>> ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]");
>>>>> (gdb) 
>>>>> 193         return ptr->buffers[ptr->cntr-1];
>>>>> (gdb) 
>>>>> 194     }
>>>>> (gdb) 
>>>>> orte_job_state_to_str (state=1) at 
>>>>> ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
>>>>> 217         switch(state) {
>>>>> (gdb) 
>>>>> 221             return "PENDING INIT";
>>>>> (gdb) 
>>>>> 317     }
>>>>> (gdb) 
>>>>> opal_output_verbose (level=1, output_id=-1, format=0x1 <Address 0x1 out of bounds>)
>>>>>     at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373
>>>>> 373             va_start(arglist, format);
>>>>> (gdb) 
>>>>> 369     {
>>>>> (gdb) 
>>>>> 370         if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS &&
>>>>> (gdb) 
>>>>> 377     }
>>>>> (gdb) 
>>>>> orte_state_base_activate_job_state (jdata=0x100125250, state=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:54
>>>>> 54                  if (NULL == s->cbfunc) {
>>>>> (gdb) 
>>>>> 62                  caddy = OBJ_NEW(orte_state_caddy_t);
>>>>> (gdb) 
>>>>> opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>, 
>>>>>     file=0xffffffff7f034c08 
>>>>>
> "../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c", 
>>>>> line=62) at 
> ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:249
>>>>> 249         opal_object_t* object = opal_obj_new(type);
>>>>> (gdb) 
>>>>> opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:465
>>>>> 465         assert(cls->cls_sizeof >= sizeof(opal_object_t));
>>>>> (gdb) 
>>>>> 470         object = (opal_object_t *) malloc(cls->cls_sizeof);
>>>>> (gdb) 
>>>>> 472         if (0 == cls->cls_initialized) {
>>>>> (gdb) 
>>>>> 473             opal_class_initialize(cls);
>>>>> (gdb) 
>>>>> opal_class_initialize (cls=0xffffffff7f14c7d8 
> <orte_state_caddy_t_class>)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:79
>>>>> 79          assert(cls);
>>>>> (gdb) 
>>>>> 84          if (1 == cls->cls_initialized) {
>>>>> (gdb) 
>>>>> 87          opal_atomic_lock(&class_lock);
>>>>> (gdb) 
>>>>> opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:397
>>>>> 397        while( !opal_atomic_cmpset_acq_32( &(lock->u.lock),
>>>>> (gdb) 
>>>>> opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>, 
> oldval=0, 
>>>>> newval=1)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:107
>>>>> 107        rc = opal_atomic_cmpset_32(addr, oldval, newval);
>>>>> (gdb) 
>>>>> opal_atomic_cmpset_32 (addr=0xffffffff7ee89bf0 <class_lock>, oldval=0, 
> newval=1)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93
>>>>> 93         int32_t ret = newval;
>>>>> (gdb) 
>>>>> 95         __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0"
>>>>> (gdb) 
>>>>> 98         return (ret == oldval);
>>>>> (gdb) 
>>>>> 99      }
>>>>> (gdb) 
>>>>> opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>, 
> oldval=0, 
>>>>> newval=1)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:108
>>>>> 108        opal_atomic_rmb();
>>>>> (gdb) 
>>>>> opal_atomic_rmb () at 
>>>>> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:63
>>>>> 63          MEMBAR("#LoadLoad");
>>>>> (gdb) 
>>>>> 64      }
>>>>> (gdb) 
>>>>> opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>, 
> oldval=0, 
>>>>> newval=1)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:110
>>>>> 110        return rc;
>>>>> (gdb) 
>>>>> 111     }
>>>>> (gdb) 
>>>>> opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:403
>>>>> 403     }
>>>>> (gdb) 
>>>>> opal_class_initialize (cls=0xffffffff7f14c7d8 
> <orte_state_caddy_t_class>)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:93
>>>>> 93          if (1 == cls->cls_initialized) {
>>>>> (gdb) 
>>>>> 103         cls->cls_depth = 0;
>>>>> (gdb) 
>>>>> 104         cls_construct_array_count = 0;
>>>>> (gdb) 
>>>>> 105         cls_destruct_array_count  = 0;
>>>>> (gdb) 
>>>>> 106         for (c = cls; c; c = c->cls_parent) {
>>>>> (gdb) 
>>>>> 107             if( NULL != c->cls_construct ) {
>>>>> (gdb) 
>>>>> 108                 cls_construct_array_count++;
>>>>> (gdb) 
>>>>> 110             if( NULL != c->cls_destruct ) {
>>>>> (gdb) 
>>>>> 111                 cls_destruct_array_count++;
>>>>> (gdb) 
>>>>> 113             cls->cls_depth++;
>>>>> (gdb) 
>>>>> 106         for (c = cls; c; c = c->cls_parent) {
>>>>> (gdb) 
>>>>> 107             if( NULL != c->cls_construct ) {
>>>>> (gdb) 
>>>>> 110             if( NULL != c->cls_destruct ) {
>>>>> (gdb) 
>>>>> 113             cls->cls_depth++;
>>>>> (gdb) 
>>>>> 106         for (c = cls; c; c = c->cls_parent) {
>>>>> (gdb) 
>>>>> 122             (void 
> (**)(opal_object_t*))malloc((cls_construct_array_count +
>>>>> (gdb) 
>>>>> 123                                                
> cls_destruct_array_count + 2) 
>>>>> *
>>>>> (gdb) 
>>>>> 122             (void 
> (**)(opal_object_t*))malloc((cls_construct_array_count +
>>>>> (gdb) 
>>>>> 121         cls->cls_construct_array = 
>>>>> (gdb) 
>>>>> 125         if (NULL == cls->cls_construct_array) {
>>>>> (gdb) 
>>>>> 130             cls->cls_construct_array + cls_construct_array_count + 
> 1;
>>>>> (gdb) 
>>>>> 129         cls->cls_destruct_array =
>>>>> (gdb) 
>>>>> 136         cls_construct_array = cls->cls_construct_array + 
>>>>> cls_construct_array_count;
>>>>> (gdb) 
>>>>> 137         cls_destruct_array  = cls->cls_destruct_array;
>>>>> (gdb) 
>>>>> 139         c = cls;
>>>>> (gdb) 
>>>>> 140         *cls_construct_array = NULL;  /* end marker for the 
> constructors */
>>>>> (gdb) 
>>>>> 141         for (i = 0; i < cls->cls_depth; i++) {
>>>>> (gdb) 
>>>>> 142             if( NULL != c->cls_construct ) {
>>>>> (gdb) 
>>>>> 143                 --cls_construct_array;
>>>>> (gdb) 
>>>>> 144                 *cls_construct_array = c->cls_construct;
>>>>> (gdb) 
>>>>> 146             if( NULL != c->cls_destruct ) {
>>>>> (gdb) 
>>>>> 147                 *cls_destruct_array = c->cls_destruct;
>>>>> (gdb) 
>>>>> 148                 cls_destruct_array++;
>>>>> (gdb) 
>>>>> 150             c = c->cls_parent;
>>>>> (gdb) 
>>>>> 141         for (i = 0; i < cls->cls_depth; i++) {
>>>>> (gdb) 
>>>>> 142             if( NULL != c->cls_construct ) {
>>>>> (gdb) 
>>>>> 146             if( NULL != c->cls_destruct ) {
>>>>> (gdb) 
>>>>> 150             c = c->cls_parent;
>>>>> (gdb) 
>>>>> 141         for (i = 0; i < cls->cls_depth; i++) {
>>>>> (gdb) 
>>>>> 152         *cls_destruct_array = NULL;  /* end marker for the 
> destructors */
>>>>> (gdb) 
>>>>> 154         cls->cls_initialized = 1;
>>>>> (gdb) 
>>>>> 155         save_class(cls);
>>>>> (gdb) 
>>>>> save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:188
>>>>> 188         if (num_classes >= max_classes) {
>>>>> (gdb) 
>>>>> 189             expand_array();
>>>>> (gdb) 
>>>>> expand_array () at 
> ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:201
>>>>> 201         max_classes += increment;
>>>>> (gdb) 
>>>>> 202         classes = (void**)realloc(classes, sizeof(opal_class_t*) * 
>>>>> max_classes);
>>>>> (gdb) 
>>>>> 203         if (NULL == classes) {
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 208             classes[i] = NULL;
>>>>> (gdb) 
>>>>> 207         for (i = num_classes; i < max_classes; ++i) {
>>>>> (gdb) 
>>>>> 210     }
>>>>> (gdb) 
>>>>> save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:192
>>>>> 192         classes[num_classes] = cls->cls_construct_array;
>>>>> (gdb) 
>>>>> 193         ++num_classes;
>>>>> (gdb) 
>>>>> 194     }
>>>>> (gdb) 
>>>>> opal_class_initialize (cls=0xffffffff7f14c7d8 
> <orte_state_caddy_t_class>)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:159
>>>>> 159         opal_atomic_unlock(&class_lock);
>>>>> (gdb) 
>>>>> opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:409
>>>>> 409        opal_atomic_wmb();
>>>>> (gdb) 
>>>>> opal_atomic_wmb () at 
>>>>> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:69
>>>>> 69          MEMBAR("#StoreStore");
>>>>> (gdb) 
>>>>> 70      }
>>>>> (gdb) 
>>>>> opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
>>>>>     at 
> ../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:410
>>>>> 410        lock->u.lock=OPAL_ATOMIC_UNLOCKED;
>>>>> (gdb) 
>>>>> 411     }
>>>>> (gdb) 
>>>>> opal_class_initialize (cls=0xffffffff7f14c7d8 
> <orte_state_caddy_t_class>)
>>>>>     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:160
>>>>> 160     }
>>>>> (gdb) 
>>>>> opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:475
>>>>> 475         if (NULL != object) {
>>>>> (gdb) 
>>>>> 476             object->obj_class = cls;
>>>>> (gdb) 
>>>>> 477             object->obj_reference_count = 1;
>>>>> (gdb) 
>>>>> 478             opal_obj_run_constructors(object);
>>>>> (gdb) 
>>>>> opal_obj_run_constructors (object=0x1001bfcf0)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:420
>>>>> 420         assert(NULL != object->obj_class);
>>>>> (gdb) 
>>>>> 422         cls_construct = object->obj_class->cls_construct_array;
>>>>> (gdb) 
>>>>> 423         while( NULL != *cls_construct ) {
>>>>> (gdb) 
>>>>> 424             (*cls_construct)(object);
>>>>> (gdb) 
>>>>> orte_state_caddy_construct (caddy=0x1001bfcf0)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_frame.c:84
>>>>> 84          memset(&caddy->ev, 0, sizeof(opal_event_t));
>>>>> (gdb) 
>>>>> 85          caddy->jdata = NULL;
>>>>> (gdb) 
>>>>> 86      }
>>>>> (gdb) 
>>>>> opal_obj_run_constructors (object=0x1001bfcf0)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:425
>>>>> 425             cls_construct++;
>>>>> (gdb) 
>>>>> 423         while( NULL != *cls_construct ) {
>>>>> (gdb) 
>>>>> 427     }
>>>>> (gdb) 
>>>>> opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:480
>>>>> 480         return object;
>>>>> (gdb) 
>>>>> 481     }
>>>>> (gdb) 
>>>>> opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>, 
>>>>>     file=0xffffffff7f034c08 
>>>>>
> "../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c", 
>>>>> line=62) at 
> ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:250
>>>>> 250         object->obj_magic_id = OPAL_OBJ_MAGIC_ID;
>>>>> (gdb) 
>>>>> 251         object->cls_init_file_name = file;
>>>>> (gdb) 
>>>>> 252         object->cls_init_lineno = line;
>>>>> (gdb) 
>>>>> 253         return object;
>>>>> (gdb) 
>>>>> 254     }
>>>>> (gdb) 
>>>>> orte_state_base_activate_job_state (jdata=0x100125250, state=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:63
>>>>> 63                  if (NULL != jdata) {
>>>>> (gdb) 
>>>>> 64                      caddy->jdata = jdata;
>>>>> (gdb) 
>>>>> 65                      caddy->job_state = state;
>>>>> (gdb) 
>>>>> 66                      OBJ_RETAIN(jdata);
>>>>> (gdb) 
>>>>> opal_obj_update (inc=1, object=0x100125250)
>>>>>     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:497
>>>>> 497         return opal_atomic_add_32(&(object->obj_reference_count), 
> inc);
>>>>> (gdb) 
>>>>> opal_atomic_add_32 (addr=0x100125260, delta=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:63
>>>>> 63            oldval = *addr;
>>>>> (gdb) 
>>>>> 64         } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval + 
> delta));
>>>>> (gdb) 
>>>>> opal_atomic_cmpset_32 (addr=0x100125260, oldval=1, newval=2)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93
>>>>> 93         int32_t ret = newval;
>>>>> (gdb) 
>>>>> 95         __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0"
>>>>> (gdb) 
>>>>> 98         return (ret == oldval);
>>>>> (gdb) 
>>>>> 99      }
>>>>> (gdb) 
>>>>> opal_atomic_add_32 (addr=0x100125260, delta=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:65
>>>>> 65         return (oldval + delta);
>>>>> (gdb) 
>>>>> 66      }
>>>>> (gdb) 
>>>>> orte_state_base_activate_job_state (jdata=0x100125250, state=1)
>>>>>     at 
>>>>>
> ../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:66
>>>>> 66                      OBJ_RETAIN(jdata);
>>>>> (gdb) 
>>>>> 68                  opal_event_set(orte_event_base, &caddy->ev, -1, 
>>>>> OPAL_EV_WRITE, s->cbfunc, caddy);
>>>>> (gdb) 
>>>>> 69                  opal_event_set_priority(&caddy->ev, s->priority);
>>>>> (gdb) 
>>>>> 70                  opal_event_active(&caddy->ev, OPAL_EV_WRITE, 1);
>>>>> (gdb) 
>>>>> 71                  return;
>>>>> (gdb) 
>>>>> 105     }
>>>>> (gdb) 
>>>>> rsh_launch (jdata=0x100125250)
>>>>>     at 
>>>>>
> ../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:883
>>>>> 883         return ORTE_SUCCESS;
>>>>> (gdb) 
>>>>> 884     }
>>>>> (gdb) 
>>>>> orterun (argc=5, argv=0xffffffff7fffe0d8)
>>>>>     at 
> ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
>>>>> 1084        while (orte_event_base_active) {
>>>>> (gdb) 
>>>>> 1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
>>>>> (gdb) 
>>>>> 1084        while (orte_event_base_active) {
>>>>> (gdb) 
>>>>> 1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
>>>>> (gdb) 
>>>>> 1084        while (orte_event_base_active) {
>>>>> (gdb) 
>>>>> 1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
>>>>> (gdb) 
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13080, tid=2
>>>>> #
>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
>>>>> # Problematic frame:
>>>>> # 1084      while (orte_event_base_active) {
>>>>> (gdb) 
>>>>> 1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
>>>>> (gdb) 
>>>>> C  [libc.so.1+0x3c7f0]  strlen+0x50
>>>>> #
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>> # An error report file with more information is saved as:
>>>>> # /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>> # The crash happened outside the Java Virtual Machine in native code.
>>>>> # See problematic frame for where to report the bug.
>>>>> #
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec noticed that process rank 0 with PID 0 on node tyr exited on signal 6
>>>>> (Abort).
>>>>> --------------------------------------------------------------------------
>>>>> 1084        while (orte_event_base_active) {
>>>>> (gdb) 
>>>>> 1089        orte_odls.kill_local_procs(NULL);
>>>>> (gdb) 
>>>>>
>>>>>
>>>>> Thank you very much for any help in advance.
>>>>>
>>>>> Kind regards
>>>>>
>>>>> Siegmar

diff --git a/ompi/mpi/java/c/mpi_MPI.c b/ompi/mpi/java/c/mpi_MPI.c
index 7c3a3ba..219da6e 100644
--- a/ompi/mpi/java/c/mpi_MPI.c
+++ b/ompi/mpi/java/c/mpi_MPI.c
@@ -62,6 +62,7 @@
 #include <sys/stat.h>
 #endif
 #include <dlfcn.h>
+#include <poll.h>

 #include "opal/util/output.h"
 #include "opal/datatype/opal_convertor.h"
@@ -121,6 +122,11 @@ OBJ_CLASS_INSTANCE(ompi_java_buffer_t,
  */
 jint JNI_OnLoad(JavaVM *vm, void *reserved)
 {
+    char *env = getenv("OMPI_ATTACH");
+    if (NULL != env && 0 < atoi(env)) {
+        volatile int _dbg = 1;
+        while (_dbg) poll(NULL, 0, 1);
+    }
     libmpi = dlopen("libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);

     if(libmpi == NULL)
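
For reference, the InitFinalizeMain test program used throughout this
thread is presumably something like the following minimal mpiJava
program (a reconstruction; only the class name and the "Hello!" output
are known from the thread):

import mpi.*;

public class InitFinalizeMain {

    public static void main(String[] args) throws MPIException {
        MPI.Init(args);               // native part of libmpi runs inside the java process
        System.out.println("Hello!"); // matches the output shown on linpc1
        MPI.Finalize();
    }
}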
