Good morning
I am running into a tuning problem while playing with the orterun command,
trying to make it use a TCP port within a specific range.
Part of the problem may be that I am not very familiar with the
architecture of the software, and I sometimes struggle with the documentation.
Here is what I am trying to do (the problem has been reduced here to
launching a single task on *one* remote node):
orterun --launch-agent /home/boubliki/openmpi/bin/orted \
    -mca btl tcp \
    --mca btl_tcp_port_min_v4 6706 --mca btl_tcp_port_range_v4 10 \
    --mca oob_tcp_static_ipv4_ports 6705 \
    -host node2:1 -np 1 /path/to/some/program arg1 .. argn
These MCA options are mentioned here and there in various mailing lists
and archives on the net. The Open MPI version is 4.0.5.
I tried different combinations, like
- only --mca btl_tcp_port_min_v4 6706 --mca btl_tcp_port_range_v4 10
  (--report-uri then shows a randomly picked TCP port number),
- or adding --mca oob_tcp_static_ipv4_ports 6705 (--report-uri then
  reports the TCP port I specified, and everything crashes),
- or many others,
but the result ends up being:
[node2:4050181] *** Process received signal ***
[node2:4050181] Signal: Segmentation fault (11)
[node2:4050181] Signal code: Address not mapped (1)
[node2:4050181] Failing at address: (nil)
[node2:4050181] [ 0] /lib64/libpthread.so.0(+0x12dd0)[0x7fdaf95a9dd0]
[node2:4050181] *** End of error message ***
bash: line 1: 4050181 Segmentation fault (core dumped)
/home/boubliki/openmpi/bin/orted -mca ess "env" -mca ess_base_jobid
"1254293504" -mca ess_base_vpid 1 -mca ess_base_num_procs "2" -mca
orte_node_regex "node[1:1,2]@0(2)" -mca btl "tcp" --mca
btl_tcp_port_min_v4 "6706" --mca btl_tcp_port_range_v4 "10" --mca
oob_tcp_static_ipv4_ports "6705" -mca plm "rsh" --tree-spawn -mca
routed "radix" -mca orte_parent_uri
"1254293504.0;tcp://192.168.xxx.xxx:6705" -mca orte_launch_agent
"/home/boubliki/openmpi/bin/orted" -mca pmix "^s1,s2,cray,isolated"
I tried on different machines and also with different compilers (gcc
10.2 and Intel 19u1). Version 4.1.0rc5 did not improve things, and
neither did forcing no optimization with -O0.
I am not familiar with debugging such software, but I was able to add a
delay (sleep()) somewhere, attach to the orted process on the [single]
remote node, and reach line 572 with gdb:
boubliki@node1: ~/openmpi/src/openmpi-4.0.5> cat -n
orte/mca/ess/base/ess_base_std_orted.c | sed -n -r -e '562,583p'
   562         if (orte_static_ports || orte_fwd_mpirun_port) {
   563             if (NULL == orte_node_regex) {
   564                 /* we didn't get the node info */
   565                 error = "cannot construct daemon map for static ports - no node map info";
   566                 goto error;
   567             }
   568             /* extract the node info from the environment and
   569              * build a nidmap from it - this will update the
   570              * routing plan as well
   571              */
   572             if (ORTE_SUCCESS != (ret = orte_regx.build_daemon_nidmap())) {
   573                 ORTE_ERROR_LOG(ret);
   574                 error = "construct daemon map from static ports";
   575                 goto error;
   576             }
   577             /* be sure to update the routing tree so the initial "phone home"
   578              * to mpirun goes through the tree if static ports were enabled
   579              */
   580             orte_routed.update_routing_plan(NULL);
   581             /* routing can be enabled */
   582             orte_routed_base.routing_enabled = true;
   583         }
boubliki@node1: ~/openmpi/src/openmpi-4.0.5>
The debugger led me to print the orte_regx element: its
build_daemon_nidmap member holds a NULL value, while line 572 tries
precisely to call through that member.
(gdb)
Thread 1 "orted" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1  0x00007f76ae3fa585 in orte_ess_base_orted_setup () at base/ess_base_std_orted.c:572
#2 0x00007f76ae2662b4 in rte_init () at ess_env_module.c:149
#3 0x00007f76ae432645 in orte_init
(pargc=pargc@entry=0x7ffe1c87a81c, pargv=pargv@entry=0x7ffe1c87a810,
flags=flags@entry=2) at runtime/orte_init.c:271
#4 0x00007f76ae3e0bf0 in orte_daemon (argc=<optimized out>,
argv=<optimized out>) at orted/orted_main.c:362
#5 0x00007f76acc976a3 in __libc_start_main () from /lib64/libc.so.6
#6 0x000000000040111e in _start ()
(gdb) p orte_regx
$1 = {init = 0x0, nidmap_create = 0x7f76ab46c230 <nidmap_create>,
  nidmap_parse = 0x7f76ae4180b0 <orte_regx_base_nidmap_parse>,
  extract_node_names = 0x7f76ae41bd20 <orte_regx_base_extract_node_names>,
  encode_nodemap = 0x7f76ae418730 <orte_regx_base_encode_nodemap>,
  decode_daemon_nodemap = 0x7f76ae41a190 <orte_regx_base_decode_daemon_nodemap>,
  build_daemon_nidmap = 0x0,
  generate_ppn = 0x7f76ae41b0f0 <orte_regx_base_generate_ppn>,
  parse_ppn = 0x7f76ae41b760 <orte_regx_base_parse_ppn>, finalize = 0x0}
(gdb)
I suppose the orte_regx element is initialized somewhere, perhaps
through an inline function in [maybe] opal/class/opal_object.h, but I am
lost in the code (and probably in some concurrency/multi-threading
aspects), and in the end I cannot even tell whether I am using the MCA
options correctly or facing a bug in the core application.
478 static inline opal_object_t *opal_obj_new(opal_class_t * cls)
479 {
480     opal_object_t *object;
481     assert(cls->cls_sizeof >= sizeof(opal_object_t));
482
483 #if OPAL_WANT_MEMCHECKER
484     object = (opal_object_t *) calloc(1, cls->cls_sizeof);
485 #else
486     object = (opal_object_t *) malloc(cls->cls_sizeof);
487 #endif
488     if (opal_class_init_epoch != cls->cls_initialized) {
489         opal_class_initialize(cls);
490     }
491     if (NULL != object) {
492         object->obj_class = cls;
493         object->obj_reference_count = 1;
494         opal_obj_run_constructors(object);
495     }
496     return object;
497 }
Could you perhaps (first) correct my understanding of which MCA options
to use so that orted on the remote node connects back to the TCP port I
specify?
Or (worse) browse the code for a potential bug related to this
functionality?
Thank you
Vincent