Hello, I'm working as a research intern in a lab where we're studying virtualization, and I've been running several benchmarks from the Phoronix Test Suite (ASKAP, GPAW and Incompact3d) using OpenMPI 4.1.0.
To briefly explain my experiments: I'm running those benchmarks on several virtual machines using different topologies. In one experiment I compared these two:

- Topology1 : 96 vCPUs divided into 96 sockets containing 1 thread each
- Topology2 : 96 vCPUs divided into 48 sockets containing 2 threads each (usage of hyperthreading)

For the ASKAP benchmark:

- While using Topology2, the application creates 2306 processes to do its work.
- While using Topology1, the application creates 4612 processes to do its work.

The same thing happens when running the GPAW and Incompact3d benchmarks.

What I've been wondering (and looking for) is: does OpenMPI take the topology into account and reduce the number of processes created to execute its work, in order to avoid the usage of hyperthreading? Or is this something done by the application itself?

I was looking at the source code, trying to find how and when the information about the MPI_COMM_WORLD communicator is filled in, to see whether the 'num_procs' field depends on the topology, but I haven't had any luck so far.

Respectfully,
Chaloyard Lucas.