Hello Siegmar,

If your Java program only calls MPI.Init and MPI.Finalize, you don't need to debug the Java side. The JNI layer is very thin, so I don't think the problem is in Java. Also, if the process crashes on the JNI side, a Java debugger won't give you useful information.
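
For reference, a minimal program of that kind (just a sketch; I am assuming this is roughly what your InitFinalizeMain looks like, since the source wasn't posted) shows how little happens on the Java side:

import mpi.MPI;
import mpi.MPIException;

public class InitFinalizeMain {
    public static void main(String[] args) throws MPIException {
        MPI.Init(args);                  // first call into the native (JNI) part of Open MPI
        System.out.println("Hello!");
        MPI.Finalize();                  // second and last native call
    }
}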

But if you want to debug the 2 processes, you can do the following.

You must launch 2 instances of the Java debugger (jdb, or an IDE such as NetBeans or Eclipse) listening on port 8000.

The 2 processes must be launched with the necessary parameters to attach to the listening debuggers:

mpirun -np 2 java -agentlib:jdwp=transport=dt_socket,\
server=n,address=localhost:8000 Hello

I checked it with NetBeans and it works.
Here you can find more details about debugging:

http://docs.oracle.com/javase/8/docs/technotes/guides/jpda/conninv.html

Regards,
Oscar

On 23/10/14 17:03, Siegmar Gross wrote:
Hello Oscar,

do you have time to look into my problem? Probably Takahiro has a
point and gdb behaves differently on Solaris and Linux, so that
the differing outputs have no meaning. I tried to debug my Java
program, but without success so far, because I wasn't able to get
into the Java program to set a breakpoint or to see the code. Have
you succeeded in debugging an mpiJava program? If so, how must I call
gdb (I normally use "gdb mpiexec" and then "run -np 1 java ...")?
What can I do to get helpful information to track the error down?
I have attached the error log file. Perhaps you can see if something
is going wrong with the Java interface. Thank you very much in advance
for your help and any hints on using gdb with mpiJava.
Please let me know if I can provide anything else.


Kind regards

Siegmar


I think that it must have to do with MPI, because everything
works fine on Linux and my Java program works fine with an older
MPI version (openmpi-1.8.2a1r31804) as well.
Yes, I also think it must have something to do with MPI.
But on the java process side, not the mpiexec process side.

When you run a Java MPI program via mpiexec, the mpiexec process
launches a java process. When the java process (your Java program)
calls an MPI method, the native part (written in C/C++) of the MPI
library is called. It runs in the java process, not in the mpiexec
process. I suspect that part.
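
If you want to look at that native part, one possible approach (just a
sketch, I have not tried it on Solaris) is to make the Java program
print its own PID and pause before MPI.Init, so that gdb can be
attached to the java process instead of the mpiexec process:

import java.lang.management.ManagementFactory;
import mpi.MPI;

public class InitFinalizeMain {
    public static void main(String[] args) throws Exception {
        // The runtime name is usually "pid@hostname" on HotSpot; print it
        // so you know which process to attach to with "gdb -p <pid>".
        System.out.println("java process: "
            + ManagementFactory.getRuntimeMXBean().getName());
        Thread.sleep(30000);   // time window to attach gdb and set breakpoints
        MPI.Init(args);        // the native part of the library runs inside this process
        System.out.println("Hello!");
        MPI.Finalize();
    }
}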

On Solaris things are different.
Are you referring to the following difference?
After this line,
881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
Linux shows
orte_job_state_to_str (state=1)
     at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217         switch(state) {
but Solaris shows
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122         if (NULL == name) {
Each macro is defined as:

#define ORTE_ACTIVATE_JOB_STATE(j, s)                                   \
     do {                                                                \
         orte_job_t *shadow=(j);                                         \
         opal_output_verbose(1, orte_state_base_framework.framework_output, \
                             "%s ACTIVATE JOB %s STATE %s AT %s:%d",  \
                             ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),         \
                             (NULL == shadow) ? "NULL" :                 \
                             ORTE_JOBID_PRINT(shadow->jobid),                \
                             orte_job_state_to_str((s)),                 \
                             __FILE__, __LINE__);                       \
         orte_state.activate_job_state(shadow, (s));                     \
     } while(0);

#define ORTE_NAME_PRINT(n) \
     orte_util_print_name_args(n)

#define ORTE_JOBID_PRINT(n) \
     orte_util_print_jobids(n)

I'm not sure, but I think gdb on Solaris steps into
orte_util_print_name_args, while gdb on Linux doesn't step into
orte_util_print_name_args and orte_util_print_jobids for some
reason, or orte_job_state_to_str is evaluated before them.

So I think it's not an important difference.

You showed the following lines.
orterun (argc=5, argv=0xffffffff7fffe0d8)
     at
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
1084        while (orte_event_base_active) {
(gdb)
1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
I'm not familiar with this code, but I think this part (in the mpiexec
process) is only waiting for the java process to terminate (normally
or abnormally). So I think the problem is not in the mpiexec process
but in the java process.

Regards,
Takahiro

Hi Takahiro,

mpiexec and java run as distinct processes. Your JRE message
says the java process raises SIGSEGV. So you should trace the java
process, not the mpiexec process. Furthermore, your JRE message
says the crash happened outside the Java Virtual Machine in
native code, so the usual Java program debugger is useless.
You should trace the native-code part of the java process.
Unfortunately I don't know how to debug that.
I think that it must have to do with MPI, because everything
works fine on Linux and my Java program works fine with an older
MPI version (openmpi-1.8.2a1r31804) as well.

linpc1 x 112 mpiexec -np 1 java InitFinalizeMain
Hello!
linpc1 x 113

Therefore I single-stepped through the program on Linux as well
and found a difference when launching the process. On Linux I get the
following sequence.

Breakpoint 1, rsh_launch (jdata=0x614aa0)
     at
../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
876         if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
(gdb) s
881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
(gdb) s
orte_job_state_to_str (state=1)
     at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217         switch(state) {
(gdb)
221             return "PENDING INIT";
(gdb)
317     }
(gdb)
orte_util_print_jobids (job=4294967295)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170         ptr = get_print_name_buffer();
(gdb)



On Solaris things are different.

Breakpoint 1, rsh_launch (jdata=0x100125250)
     at
../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
876         if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
(gdb) s
881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
(gdb) s
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122         if (NULL == name) {
(gdb)
142         job = orte_util_print_jobids(name->jobid);
(gdb)
orte_util_print_jobids (job=2673410048)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170         ptr = get_print_name_buffer();
(gdb)



Is this normal or is it the reason for the crash on Solaris?


Kind regards

Siegmar







The log file output by the JRE may help you.
# An error report file with more information is saved as:
#
/home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
Regards,
Takahiro

Hi,

I installed openmpi-dev-124-g91e9686 on Solaris 10 Sparc with
gcc-4.9.1 to track down the error with my small Java program.
I started single-stepping in orterun.c at line 1081 and
continued until I got the segmentation fault. I get
"jdata = 0x0" in version openmpi-1.8.2a1r31804, which is the
last one which works with Java in my environment, while I get
"jdata = 0x100125250" in this version. Unfortunately I don't
know which files or variables are important to look at. Perhaps
somebody can look at the following lines of code and tell me,
which information I should provide to solve the problem. I know
that Solaris isn't any longer on your list of supported systems,
but perhaps we can get it working again, if you tell me what
you need and I do the debugging.

/usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 1 java InitFinalizeMain
Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec \
   -np 1 java InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13064, tid=2
...
[LWP    2         exited]
[New Thread 2        ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be
   found to satisfy query
(gdb) thread 1
[Switching to thread 1 (LWP    1        )]
#0  0xffffffff7f6173d0 in rtld_db_dlactivity () from
/usr/lib/sparcv9/ld.so.1
(gdb) b orterun.c:1081
Breakpoint 1 at 0x1000070dc: file
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c, line
1081.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 java
InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[Switching to Thread 1 (LWP 1)]

Breakpoint 1, orterun (argc=5, argv=0xffffffff7fffe0d8)
     at
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1081
1081        rc = orte_plm.spawn(jdata);
(gdb) print jdata
$1 = (orte_job_t *) 0x100125250
(gdb) s
rsh_launch (jdata=0x100125250)
     at

../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
876         if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
(gdb) s
881             ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122         if (NULL == name) {
(gdb)
142         job = orte_util_print_jobids(name->jobid);
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd990)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172         if (NULL == ptr) {
(gdb)
178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182         if (ORTE_JOBID_INVALID == job) {
(gdb)
184         } else if (ORTE_JOBID_WILDCARD == job) {
(gdb)
187             tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
(gdb)
188             tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
(gdb)
189             snprintf(ptr->buffers[ptr->cntr++],
(gdb)
193         return ptr->buffers[ptr->cntr-1];
(gdb)
194     }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143
143         vpid = orte_util_print_vpids(name->vpid);
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260
260         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd9a0)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262
262         if (NULL == ptr) {
(gdb)
268         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
272         if (ORTE_VPID_INVALID == vpid) {
(gdb)
274         } else if (ORTE_VPID_WILDCARD == vpid) {
(gdb)
277             snprintf(ptr->buffers[ptr->cntr++],
(gdb)
281         return ptr->buffers[ptr->cntr-1];
(gdb)
282     }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146
146         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148
148         if (NULL == ptr) {
(gdb)
154         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
158         snprintf(ptr->buffers[ptr->cntr++],
(gdb)
162         return ptr->buffers[ptr->cntr-1];
(gdb)
163     }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172         if (NULL == ptr) {
(gdb)
178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182         if (ORTE_JOBID_INVALID == job) {
(gdb)
183             snprintf(ptr->buffers[ptr->cntr++],
ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]");
(gdb)
193         return ptr->buffers[ptr->cntr-1];
(gdb)
194     }
(gdb)
orte_job_state_to_str (state=1) at
../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217         switch(state) {
(gdb)
221             return "PENDING INIT";
(gdb)
317     }
(gdb)
opal_output_verbose (level=1, output_id=0,
     format=0xffffffff7f14dd98 <orte_job_states>
"\336\257\276\355\336\257\276\355")
     at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373
373             va_start(arglist, format);
(gdb)
369     {
(gdb)
370         if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS &&
(gdb)
377     }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
     at

../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:33
33          opal_list_item_t *itm, *any=NULL, *error=NULL;
(gdb)
37          for (itm = opal_list_get_first(&orte_job_states);
(gdb)
opal_list_get_first (list=0xffffffff7f14dd98 <orte_job_states>)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:320
320         opal_list_item_t* item =
(opal_list_item_t*)list->opal_list_sentinel.opal_list_next;
(gdb)
324         assert(1 == item->opal_list_item_refcount);
(gdb)
325         assert( list == item->opal_list_item_belong_to );
(gdb)
328         return item;
(gdb)
329     }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
     at

../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:38
38               itm != opal_list_get_end(&orte_job_states);
(gdb)
opal_list_get_end (list=0xffffffff7f14dd98 <orte_job_states>)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:399
399         return &(list->opal_list_sentinel);
(gdb)
400     }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
     at

../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:37
37          for (itm = opal_list_get_first(&orte_job_states);
(gdb)
40              s = (orte_state_t*)itm;
(gdb)
41              if (s->job_state == ORTE_JOB_STATE_ANY) {
(gdb)
45              if (s->job_state == ORTE_JOB_STATE_ERROR) {
(gdb)
48              if (s->job_state == state) {
(gdb)
49                  OPAL_OUTPUT_VERBOSE((1,
orte_state_base_framework.framework_output,
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122         if (NULL == name) {
(gdb)
142         job = orte_util_print_jobids(name->jobid);
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd880)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172         if (NULL == ptr) {
(gdb)
178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182         if (ORTE_JOBID_INVALID == job) {
(gdb)
184         } else if (ORTE_JOBID_WILDCARD == job) {
(gdb)
187             tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
(gdb)
188             tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
(gdb)
189             snprintf(ptr->buffers[ptr->cntr++],
(gdb)
193         return ptr->buffers[ptr->cntr-1];
(gdb)
194     }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143
143         vpid = orte_util_print_vpids(name->vpid);
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260
260         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd890)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262
262         if (NULL == ptr) {
(gdb)
268         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
272         if (ORTE_VPID_INVALID == vpid) {
(gdb)
274         } else if (ORTE_VPID_WILDCARD == vpid) {
(gdb)
277             snprintf(ptr->buffers[ptr->cntr++],
(gdb)
281         return ptr->buffers[ptr->cntr-1];
(gdb)
282     }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146
146         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
     at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148
148         if (NULL == ptr) {
(gdb)
154         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
158         snprintf(ptr->buffers[ptr->cntr++],
(gdb)
162         return ptr->buffers[ptr->cntr-1];
(gdb)
163     }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950)
     at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172         if (NULL == ptr) {
(gdb)
178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182         if (ORTE_JOBID_INVALID == job) {
(gdb)
183             snprintf(ptr->buffers[ptr->cntr++],
ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]");
(gdb)
193         return ptr->buffers[ptr->cntr-1];
(gdb)
194     }
(gdb)
orte_job_state_to_str (state=1) at
../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217         switch(state) {
(gdb)
221             return "PENDING INIT";
(gdb)
317     }
(gdb)
opal_output_verbose (level=1, output_id=-1, format=0x1 <Address 0x1 out
of
bounds>)
     at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373
373             va_start(arglist, format);
(gdb)
369     {
(gdb)
370         if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS &&
(gdb)
377     }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
     at

../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:54
54                  if (NULL == s->cbfunc) {
(gdb)
62                  caddy = OBJ_NEW(orte_state_caddy_t);
(gdb)
opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>,
     file=0xffffffff7f034c08

"../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c",
line=62) at
../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:249
249         opal_object_t* object = opal_obj_new(type);
(gdb)
opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:465
465         assert(cls->cls_sizeof >= sizeof(opal_object_t));
(gdb)
470         object = (opal_object_t *) malloc(cls->cls_sizeof);
(gdb)
472         if (0 == cls->cls_initialized) {
(gdb)
473             opal_class_initialize(cls);
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:79
79          assert(cls);
(gdb)
84          if (1 == cls->cls_initialized) {
(gdb)
87          opal_atomic_lock(&class_lock);
(gdb)
opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:397
397        while( !opal_atomic_cmpset_acq_32( &(lock->u.lock),
(gdb)
opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>,
oldval=0,
newval=1)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:107
107        rc = opal_atomic_cmpset_32(addr, oldval, newval);
(gdb)
opal_atomic_cmpset_32 (addr=0xffffffff7ee89bf0 <class_lock>, oldval=0,
newval=1)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93
93         int32_t ret = newval;
(gdb)
95         __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0"
(gdb)
98         return (ret == oldval);
(gdb)
99      }
(gdb)
opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>,
oldval=0,
newval=1)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:108
108        opal_atomic_rmb();
(gdb)
opal_atomic_rmb () at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:63
63          MEMBAR("#LoadLoad");
(gdb)
64      }
(gdb)
opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>,
oldval=0,
newval=1)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:110
110        return rc;
(gdb)
111     }
(gdb)
opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:403
403     }
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:93
93          if (1 == cls->cls_initialized) {
(gdb)
103         cls->cls_depth = 0;
(gdb)
104         cls_construct_array_count = 0;
(gdb)
105         cls_destruct_array_count  = 0;
(gdb)
106         for (c = cls; c; c = c->cls_parent) {
(gdb)
107             if( NULL != c->cls_construct ) {
(gdb)
108                 cls_construct_array_count++;
(gdb)
110             if( NULL != c->cls_destruct ) {
(gdb)
111                 cls_destruct_array_count++;
(gdb)
113             cls->cls_depth++;
(gdb)
106         for (c = cls; c; c = c->cls_parent) {
(gdb)
107             if( NULL != c->cls_construct ) {
(gdb)
110             if( NULL != c->cls_destruct ) {
(gdb)
113             cls->cls_depth++;
(gdb)
106         for (c = cls; c; c = c->cls_parent) {
(gdb)
122             (void
(**)(opal_object_t*))malloc((cls_construct_array_count +
(gdb)
123
cls_destruct_array_count + 2)
*
(gdb)
122             (void
(**)(opal_object_t*))malloc((cls_construct_array_count +
(gdb)
121         cls->cls_construct_array =
(gdb)
125         if (NULL == cls->cls_construct_array) {
(gdb)
130             cls->cls_construct_array + cls_construct_array_count +
1;
(gdb)
129         cls->cls_destruct_array =
(gdb)
136         cls_construct_array = cls->cls_construct_array +
cls_construct_array_count;
(gdb)
137         cls_destruct_array  = cls->cls_destruct_array;
(gdb)
139         c = cls;
(gdb)
140         *cls_construct_array = NULL;  /* end marker for the
constructors */
(gdb)
141         for (i = 0; i < cls->cls_depth; i++) {
(gdb)
142             if( NULL != c->cls_construct ) {
(gdb)
143                 --cls_construct_array;
(gdb)
144                 *cls_construct_array = c->cls_construct;
(gdb)
146             if( NULL != c->cls_destruct ) {
(gdb)
147                 *cls_destruct_array = c->cls_destruct;
(gdb)
148                 cls_destruct_array++;
(gdb)
150             c = c->cls_parent;
(gdb)
141         for (i = 0; i < cls->cls_depth; i++) {
(gdb)
142             if( NULL != c->cls_construct ) {
(gdb)
146             if( NULL != c->cls_destruct ) {
(gdb)
150             c = c->cls_parent;
(gdb)
141         for (i = 0; i < cls->cls_depth; i++) {
(gdb)
152         *cls_destruct_array = NULL;  /* end marker for the
destructors */
(gdb)
154         cls->cls_initialized = 1;
(gdb)
155         save_class(cls);
(gdb)
save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:188
188         if (num_classes >= max_classes) {
(gdb)
189             expand_array();
(gdb)
expand_array () at
../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:201
201         max_classes += increment;
(gdb)
202         classes = (void**)realloc(classes, sizeof(opal_class_t*) *
max_classes);
(gdb)
203         if (NULL == classes) {
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
208             classes[i] = NULL;
(gdb)
207         for (i = num_classes; i < max_classes; ++i) {
(gdb)
210     }
(gdb)
save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:192
192         classes[num_classes] = cls->cls_construct_array;
(gdb)
193         ++num_classes;
(gdb)
194     }
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:159
159         opal_atomic_unlock(&class_lock);
(gdb)
opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:409
409        opal_atomic_wmb();
(gdb)
opal_atomic_wmb () at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:69
69          MEMBAR("#StoreStore");
(gdb)
70      }
(gdb)
opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
     at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:410
410        lock->u.lock=OPAL_ATOMIC_UNLOCKED;
(gdb)
411     }
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
     at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:160
160     }
(gdb)
opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:475
475         if (NULL != object) {
(gdb)
476             object->obj_class = cls;
(gdb)
477             object->obj_reference_count = 1;
(gdb)
478             opal_obj_run_constructors(object);
(gdb)
opal_obj_run_constructors (object=0x1001bfcf0)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:420
420         assert(NULL != object->obj_class);
(gdb)
422         cls_construct = object->obj_class->cls_construct_array;
(gdb)
423         while( NULL != *cls_construct ) {
(gdb)
424             (*cls_construct)(object);
(gdb)
orte_state_caddy_construct (caddy=0x1001bfcf0)
     at

../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_frame.c:84
84          memset(&caddy->ev, 0, sizeof(opal_event_t));
(gdb)
85          caddy->jdata = NULL;
(gdb)
86      }
(gdb)
opal_obj_run_constructors (object=0x1001bfcf0)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:425
425             cls_construct++;
(gdb)
423         while( NULL != *cls_construct ) {
(gdb)
427     }
(gdb)
opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:480
480         return object;
(gdb)
481     }
(gdb)
opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>,
     file=0xffffffff7f034c08

"../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c",
line=62) at
../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:250
250         object->obj_magic_id = OPAL_OBJ_MAGIC_ID;
(gdb)
251         object->cls_init_file_name = file;
(gdb)
252         object->cls_init_lineno = line;
(gdb)
253         return object;
(gdb)
254     }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
     at

../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:63
63                  if (NULL != jdata) {
(gdb)
64                      caddy->jdata = jdata;
(gdb)
65                      caddy->job_state = state;
(gdb)
66                      OBJ_RETAIN(jdata);
(gdb)
opal_obj_update (inc=1, object=0x100125250)
     at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:497
497         return opal_atomic_add_32(&(object->obj_reference_count),
inc);
(gdb)
opal_atomic_add_32 (addr=0x100125260, delta=1)
     at

../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:63
63            oldval = *addr;
(gdb)
64         } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval +
delta));
(gdb)
opal_atomic_cmpset_32 (addr=0x100125260, oldval=1, newval=2)
     at

../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93
93         int32_t ret = newval;
(gdb)
95         __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0"
(gdb)
98         return (ret == oldval);
(gdb)
99      }
(gdb)
opal_atomic_add_32 (addr=0x100125260, delta=1)
     at

../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:65
65         return (oldval + delta);
(gdb)
66      }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
     at

../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:66
66                      OBJ_RETAIN(jdata);
(gdb)
68                  opal_event_set(orte_event_base, &caddy->ev, -1,
OPAL_EV_WRITE, s->cbfunc, caddy);
(gdb)
69                  opal_event_set_priority(&caddy->ev, s->priority);
(gdb)
70                  opal_event_active(&caddy->ev, OPAL_EV_WRITE, 1);
(gdb)
71                  return;
(gdb)
105     }
(gdb)
rsh_launch (jdata=0x100125250)
     at

../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:883
883         return ORTE_SUCCESS;
(gdb)
884     }
(gdb)
orterun (argc=5, argv=0xffffffff7fffe0d8)
     at
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
1084        while (orte_event_base_active) {
(gdb)
1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
1084        while (orte_event_base_active) {
(gdb)
1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
1084        while (orte_event_base_active) {
(gdb)
1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13080, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build
1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode
solaris-sparc
compressed oops)
# Problematic frame:
# 1084      while (orte_event_base_active) {
(gdb)
1085            opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
C  [libc.so.1+0x3c7f0]  strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable
core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
#
/home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node tyr exited on
signal 6
(Abort).

--------------------------------------------------------------------------
1084        while (orte_event_base_active) {
(gdb)
1089        orte_odls.kill_local_procs(NULL);
(gdb)


Thank you very much for any help in advance.

Kind regards

Siegmar
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/10/25559.php
