Good morning

I am running into a problem while trying to use the orterun command to restrict Open MPI to TCP ports within a specific range. Part of the difficulty may be that I am not very familiar with the architecture of the software, and I sometimes struggle with the documentation.

Here is what I'm trying to do (the problem has been reduced here to launching a single task on *one* remote node):

   orterun --launch-agent /home/boubliki/openmpi/bin/orted -mca btl tcp
   --mca btl_tcp_port_min_v4 6706 --mca btl_tcp_port_range_v4 10 --mca
   oob_tcp_static_ipv4_ports 6705 -host node2:1 -np 1
   /path/to/some/program arg1 .. argn

Those MCA options are mentioned here and there in various mailing lists and archives on the net. The Open MPI version is 4.0.5.

I tried different combinations like

   only --mca btl_tcp_port_min_v4 6706 --mca btl_tcp_port_range_v4 10
   (then --report-uri shows a randomly picked TCP port number)
   or adding --mca oob_tcp_static_ipv4_ports 6705 (then --report-uri
   reports the TCP port I specified and everything crashes)
   or many others

but the result is:

   [node2:4050181] *** Process received signal ***
   [node2:4050181] Signal: Segmentation fault (11)
   [node2:4050181] Signal code: Address not mapped (1)
   [node2:4050181] Failing at address: (nil)
   [node2:4050181] [ 0] /lib64/libpthread.so.0(+0x12dd0)[0x7fdaf95a9dd0]
   [node2:4050181] *** End of error message ***
   bash: line 1: 4050181 Segmentation fault      (core dumped)
   /home/boubliki/openmpi/bin/orted -mca ess "env" -mca ess_base_jobid
   "1254293504" -mca ess_base_vpid 1 -mca ess_base_num_procs "2" -mca
   orte_node_regex "node[1:1,2]@0(2)" -mca btl "tcp" --mca
   btl_tcp_port_min_v4 "6706" --mca btl_tcp_port_range_v4 "10" --mca
   oob_tcp_static_ipv4_ports "6705" -mca plm "rsh" --tree-spawn -mca
   routed "radix" -mca orte_parent_uri
   "1254293504.0;tcp://192.168.xxx.xxx:6705" -mca orte_launch_agent
   "/home/boubliki/openmpi/bin/orted" -mca pmix "^s1,s2,cray,isolated"

I tried on different machines, and also with different compilers (gcc 10.2 and Intel 19u1). Version 4.1.0rc5 did not behave any better, and neither did disabling optimization with -O0.

I am not familiar with debugging this kind of software, but I was able to add a delay somewhere (sleep()) and catch the orted process on the [single] remote node with gdb, where it reaches line 572:

boubliki@node1: ~/openmpi/src/openmpi-4.0.5> cat -n orte/mca/ess/base/ess_base_std_orted.c | sed -n -r -e '562,583p'
   562      if (orte_static_ports || orte_fwd_mpirun_port) {
   563          if (NULL == orte_node_regex) {
   564              /* we didn't get the node info */
   565              error = "cannot construct daemon map for static ports - no node map info";
   566              goto error;
   567          }
   568          /* extract the node info from the environment and
   569           * build a nidmap from it - this will update the
   570           * routing plan as well
   571           */
   572          if (ORTE_SUCCESS != (ret = orte_regx.build_daemon_nidmap())) {
   573              ORTE_ERROR_LOG(ret);
   574              error = "construct daemon map from static ports";
   575              goto error;
   576          }
   577          /* be sure to update the routing tree so the initial "phone home"
   578           * to mpirun goes through the tree if static ports were enabled
   579           */
   580          orte_routed.update_routing_plan(NULL);
   581          /* routing can be enabled */
   582          orte_routed_base.routing_enabled = true;
   583      }
boubliki@node1: ~/openmpi/src/openmpi-4.0.5>

The debugger let me print the structure called orte_regx, which shows that its build_daemon_nidmap function pointer is NULL, while line 572 is precisely where that function gets called.

   (gdb)
   Thread 1 "orted" received signal SIGSEGV, Segmentation fault.
   0x0000000000000000 in ?? ()
   (gdb) bt
   #0  0x0000000000000000 in ?? ()
   *#1  0x00007f76ae3fa585 in orte_ess_base_orted_setup () at
   base/ess_base_std_orted.c:572*
   #2  0x00007f76ae2662b4 in rte_init () at ess_env_module.c:149
   #3  0x00007f76ae432645 in orte_init
   (pargc=pargc@entry=0x7ffe1c87a81c, pargv=pargv@entry=0x7ffe1c87a810,
   flags=flags@entry=2) at runtime/orte_init.c:271
   #4  0x00007f76ae3e0bf0 in orte_daemon (argc=<optimized out>,
   argv=<optimized out>) at orted/orted_main.c:362
   #5  0x00007f76acc976a3 in __libc_start_main () from /lib64/libc.so.6
   #6  0x000000000040111e in _start ()
   (gdb) *p orte_regx*
   $1 = {init = 0x0, nidmap_create = 0x7f76ab46c230 <nidmap_create>,
   nidmap_parse = 0x7f76ae4180b0 <orte_regx_base_nidmap_parse>,
      extract_node_names = 0x7f76ae41bd20
   <orte_regx_base_extract_node_names>, encode_nodemap = 0x7f76ae418730
   <orte_regx_base_encode_nodemap>,
      decode_daemon_nodemap = 0x7f76ae41a190
   <orte_regx_base_decode_daemon_nodemap>, *build_daemon_nidmap = 0x0*,
      generate_ppn = 0x7f76ae41b0f0 <orte_regx_base_generate_ppn>,
   parse_ppn = 0x7f76ae41b760 <orte_regx_base_parse_ppn>, finalize = 0x0}
   (gdb)
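
To make the failure concrete, here is a minimal C sketch of what I believe is happening. It is not Open MPI code: demo_module_t, demo_call and the DEMO_* constants are names I made up for illustration. It only shows a table of function pointers with one entry left NULL, and a guarded call that returns an error instead of jumping to address 0 as frame #0 does above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a module struct such as orte_regx:
 * a table of function pointers, some of which may be left unset. */
typedef int (*demo_fn_t)(void);

typedef struct {
    demo_fn_t nidmap_create;        /* provided by the component */
    demo_fn_t build_daemon_nidmap;  /* left NULL, as in my gdb output */
} demo_module_t;

#define DEMO_SUCCESS        0
#define DEMO_ERR_NOT_AVAIL (-1)

static int demo_nidmap_create(void)
{
    return DEMO_SUCCESS;
}

/* Guarded call: report an error instead of jumping through a NULL pointer.
 * An unguarded fn() with fn == NULL is what produces a SIGSEGV at
 * address 0x0. */
static int demo_call(demo_fn_t fn, const char *name)
{
    if (NULL == fn) {
        fprintf(stderr, "module does not provide %s\n", name);
        return DEMO_ERR_NOT_AVAIL;
    }
    return fn();
}

/* One entry set, one entry deliberately left NULL. */
static demo_module_t demo_module = {
    .nidmap_create = demo_nidmap_create,
    .build_daemon_nidmap = NULL,
};
```

Calling demo_call(demo_module.build_daemon_nidmap, "build_daemon_nidmap") returns DEMO_ERR_NOT_AVAIL rather than crashing. Whether the real code should check for NULL this way, or whether the pointer should never be NULL in the first place, is exactly what I cannot tell.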

I suppose the orte_regx structure is initialized somewhere, maybe through an inline function in opal/class/opal_object.h, but I am lost in the code and probably in some concurrency/multi-threading aspects. In the end I cannot even tell whether I am using the MCA options correctly or hitting a bug in the core application.

    478 static inline opal_object_t *opal_obj_new(opal_class_t * cls)
    479 {
    480     opal_object_t *object;
    481     assert(cls->cls_sizeof >= sizeof(opal_object_t));
    482
    483 #if OPAL_WANT_MEMCHECKER
    484     object = (opal_object_t *) calloc(1, cls->cls_sizeof);
    485 #else
    486     object = (opal_object_t *) malloc(cls->cls_sizeof);
    487 #endif
    488     if (opal_class_init_epoch != cls->cls_initialized) {
    489         opal_class_initialize(cls);
    490     }
    491     if (NULL != object) {
    492         object->obj_class = cls;
    493         object->obj_reference_count = 1;
    494         opal_obj_run_constructors(object);
    495     }
    496     return object;
    497 }

Could you first tell me which MCA options I should use so that orted on the remote node connects back to the TCP port I specify? Or, failing that, look through the code for a potential bug related to this functionality?

Thank you

Vincent

