Hi Howard and Michael,

Thanks for your feedback. I did not want to write an overly long mail full of irrelevant information, so I will just show how the two different builds give different results. I'm using a small test case based on my large code, the same one used to show the memory leak with mpi_Alltoallv calls, but running only 2 iterations. It is a 2D case in which the data storage is moved from a distribution "along X axis" to a distribution "along Y axis" with mpi_Alltoallv and subarray types. Data initialization is based on the location in the array, which allows checking that the exchanges are correct.
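To make the checking idea concrete, here is a minimal sketch in plain Python, without MPI. It is not the attached test case. The encoding 1000*i + j is only a guess suggested by messages such as "found 1007 but expect 3007", and the block decomposition and all function names are hypothetical. It builds slabs split along X, simulates the all-to-all exchange into slabs split along Y, and reports any value that does not match its position.

```python
# Hypothetical illustration only; the real test uses MPI and subarray types.
NX, NY, NPROCS = 5, 7, 4


def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous blocks, as evenly as possible."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def expected(i, j):
    """Assumed encoding: each value stores its own global (i, j) position."""
    return 1000 * (i + 1) + (j + 1)


def redistribute(x_slabs, x_ranges, y_ranges):
    """Simulate the exchange: each rank hands every peer the part of its
    X slab that falls inside that peer's Y slab."""
    y_slabs = [dict() for _ in y_ranges]
    for src, slab in enumerate(x_slabs):
        for dst, js in enumerate(y_ranges):
            for i in x_ranges[src]:
                for j in js:
                    y_slabs[dst][(i, j)] = slab[(i, j)]
    return y_slabs


def check(y_slabs, y_ranges, nx):
    """Return (rank, found, expected) for every misplaced value."""
    errors = []
    for rank, js in enumerate(y_ranges):
        for i in range(nx):
            for j in js:
                found = y_slabs[rank].get((i, j))
                if found != expected(i, j):
                    errors.append((rank, found, expected(i, j)))
    return errors


x_ranges = block_ranges(NX, NPROCS)
y_ranges = block_ranges(NY, NPROCS)
# Each rank initially owns a few full rows (all j) of the global array.
x_slabs = [{(i, j): expected(i, j) for i in xr for j in range(NY)}
           for xr in x_ranges]
y_slabs = redistribute(x_slabs, x_ranges, y_ranges)
errors = check(y_slabs, y_ranges, NX)
for rank, found, want in errors:
    print(f"On {rank} found {found} but expect {want}")
print("errors:", len(errors))
```

Because every value encodes where it belongs, a wrong displacement or datatype in the exchange shows up as a mismatch at a precise location rather than as silently corrupted data.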
When the program runs correctly (on 4 processes in my test) it should print only the maximum RSS of the processes. When it fails, it prints the invalid locations. I've drastically reduced the size of the problem, to nx=5 and ny=7. Launching the non-working setup with more verbose output shows:

dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
[dahu138:115761] mca: base: components_register: registering framework mtl components
[dahu138:115763] mca: base: components_register: registering framework mtl components
[dahu138:115763] mca: base: components_register: found loaded component psm2
[dahu138:115763] mca: base: components_register: component psm2 register function successful
[dahu138:115763] mca: base: components_open: opening mtl components
[dahu138:115763] mca: base: components_open: found loaded component psm2
[dahu138:115761] mca: base: components_register: found loaded component psm2
[dahu138:115763] mca: base: components_open: component psm2 open function successful
[dahu138:115761] mca: base: components_register: component psm2 register function successful
[dahu138:115761] mca: base: components_open: opening mtl components
[dahu138:115761] mca: base: components_open: found loaded component psm2
[dahu138:115761] mca: base: components_open: component psm2 open function successful
[dahu138:115760] mca: base: components_register: registering framework mtl components
[dahu138:115760] mca: base: components_register: found loaded component psm2
[dahu138:115760] mca: base: components_register: component psm2 register function successful
[dahu138:115760] mca: base: components_open: opening mtl components
[dahu138:115760] mca: base: components_open: found loaded component psm2
[dahu138:115762] mca: base: components_register: registering framework mtl components
[dahu138:115762] mca: base: components_register: found loaded component psm2
[dahu138:115760] mca: base: components_open: component psm2 open function successful
[dahu138:115762] mca: base: components_register: component psm2 register function successful
[dahu138:115762] mca: base: components_open: opening mtl components
[dahu138:115762] mca: base: components_open: found loaded component psm2
[dahu138:115762] mca: base: components_open: component psm2 open function successful
[dahu138:115760] mca:base:select: Auto-selecting mtl components
[dahu138:115760] mca:base:select:( mtl) Querying component [psm2]
[dahu138:115760] mca:base:select:( mtl) Query of component [psm2] set priority to 40
[dahu138:115761] mca:base:select: Auto-selecting mtl components
[dahu138:115762] mca:base:select: Auto-selecting mtl components
[dahu138:115762] mca:base:select:( mtl) Querying component [psm2]
[dahu138:115762] mca:base:select:( mtl) Query of component [psm2] set priority to 40
[dahu138:115762] mca:base:select:( mtl) Selected component [psm2]
[dahu138:115762] select: initializing mtl component psm2
[dahu138:115761] mca:base:select:( mtl) Querying component [psm2]
[dahu138:115761] mca:base:select:( mtl) Query of component [psm2] set priority to 40
[dahu138:115761] mca:base:select:( mtl) Selected component [psm2]
[dahu138:115761] select: initializing mtl component psm2
[dahu138:115760] mca:base:select:( mtl) Selected component [psm2]
[dahu138:115760] select: initializing mtl component psm2
[dahu138:115763] mca:base:select: Auto-selecting mtl components
[dahu138:115763] mca:base:select:( mtl) Querying component [psm2]
[dahu138:115763] mca:base:select:( mtl) Query of component [psm2] set priority to 40
[dahu138:115763] mca:base:select:( mtl) Selected component [psm2]
[dahu138:115763] select: initializing mtl component psm2
[dahu138:115761] select: init returned success
[dahu138:115761] select: component psm2 selected
[dahu138:115762] select: init returned success
[dahu138:115762] select: component psm2 selected
[dahu138:115763] select: init returned success
[dahu138:115763] select: component psm2 selected
[dahu138:115760] select: init returned success
[dahu138:115760] select: component psm2 selected
On 1 found 1007 but expect 3007
On 2 found 1007 but expect 4007

With this setup the code also freezes at this problem size. Below is the same code run with my no-ib build of Open MPI on the same node:

dahu138 : mpirun -np 4 -mca mtl_base_verbose 99 ./test_layout_array
[dahu138:116723] mca: base: components_register: registering framework mtl components
[dahu138:116723] mca: base: components_open: opening mtl components
[dahu138:116724] mca: base: components_register: registering framework mtl components
[dahu138:116724] mca: base: components_open: opening mtl components
[dahu138:116726] mca: base: components_register: registering framework mtl components
[dahu138:116726] mca: base: components_open: opening mtl components
[dahu138:116725] mca: base: components_register: registering framework mtl components
[dahu138:116725] mca: base: components_open: opening mtl components
[INFO MEMORY] : processor 0 uses 9948 kb max of resident memory
[INFO MEMORY] : processor 0 uses 9948 kb max of resident memory

The test case is provided as an attachment. Since it runs correctly on many OS/Open MPI/hardware combinations, I do not think the problem lies in the test case itself, although that remains a possibility.

Patrick
test_layout_array.tar.gz
Description: application/gzip