Hello,

I'm working on a sample program that connects two MPI jobs, each launched 
separately with mpirun, using MPI ports.

First, I use MPI_Open_port to obtain a port name and write it to a file:

  if (options.participant == A) { // A publishes the port
    if (options.commType == single and rank == 0)
      openPublishPort(options);

    if (options.commType == many)
      openPublishPort(options);
  }
  MPI_Barrier(MPI_COMM_WORLD);

participant is a command-line argument that selects the role: A acts as the 
server, B as the client.

void openPublishPort(Options options)
{
  using namespace boost::filesystem;
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  char p[MPI_MAX_PORT_NAME];
  MPI_Open_port(MPI_INFO_NULL, p);
  std::string portName(p);

  create_directory(options.publishDirectory);
  std::string filename;
  if (options.commType == many)
    filename = "A-" + std::to_string(rank) + ".address";
  if (options.commType == single)
    filename = "intercomm.address";

  auto path = options.publishDirectory / filename;
  DEBUG << "Writing address " << portName << " to " << path;
  std::ofstream ofs(path.string(), std::ofstream::out);
  ofs << portName;
}

As far as I can see, this works fine. Next, I try to connect:

  MPI_Comm icomm;
  std::string portName;
  if (options.participant == A) { // receives connections
    if (options.commType == single) {
      if (rank == 0)
        portName = readPort(options);
      INFO << "Accepting connection on " << portName;
      MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
      INFO << "Received connection";
    }
  }

  if (options.participant == B) { // connects to the intercomms
    if (options.commType == single) {
      if (rank == 0)
        portName = readPort(options);
      INFO << "Trying to connect to " << portName;
      MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
      INFO << "Connected";
    }
  }


options.commType == single means that I want a single communicator that 
contains all ranks of both participants, A and B.
readPort reads the port name from the file that was written before.
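
readPort itself is not shown above. A minimal sketch of such a write/read pair, assuming the file holds nothing but the port name on one line: the names writePortFile and readPortFile are hypothetical, and std::filesystem (C++17) stands in for boost::filesystem to keep the snippet self-contained. Writing to a temporary file and renaming it is an added precaution, so that a reader never sees a half-written port name.

```cpp
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: stores portName in <dir>/<filename>.
// The data goes to a temporary file first and is then renamed,
// so a concurrent reader never observes a partially written file.
void writePortFile(const fs::path &dir, const std::string &filename,
                   const std::string &portName)
{
  fs::create_directories(dir);
  const fs::path target = dir / filename;
  const fs::path tmp = dir / (filename + ".tmp");
  {
    std::ofstream ofs(tmp, std::ofstream::out | std::ofstream::trunc);
    if (!ofs)
      throw std::runtime_error("Cannot open " + tmp.string());
    ofs << portName;
  } // the stream is flushed and closed at the end of this scope
  fs::rename(tmp, target);
}

// Hypothetical helper: reads the first line of <dir>/<filename>
// and returns it as the port name.
std::string readPortFile(const fs::path &dir, const std::string &filename)
{
  const fs::path source = dir / filename;
  std::ifstream ifs(source);
  if (!ifs)
    throw std::runtime_error("Cannot open " + source.string());
  std::string portName;
  std::getline(ifs, portName);
  return portName;
}
```

With helpers like these, rank 0 of A would call writePortFile after MPI_Open_port, and rank 0 of each participant would call readPortFile before MPI_Comm_accept or MPI_Comm_connect.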

Now, when I launch A first and then B in another terminal, nothing happens 
until a timeout occurs.

% mpirun -n 1 ./mpiports --commType="single" --participant="A"
[2017-11-03 15:29:55.469891] [debug]   Writing address 3048013825.0:1069313090 to "./publish/intercomm.address"
[2017-11-03 15:29:55.470169] [debug]   Read address 3048013825.0:1069313090 from "./publish/intercomm.address"
[2017-11-03 15:29:55.470185] [info]    Accepting connection on 3048013825.0:1069313090
[asaru:16199] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
[...]

and on the other side:

% mpirun -n 1 ./mpiports --commType="single" --participant="B"
[2017-11-03 15:29:59.698921] [debug]   Read address 3048013825.0:1069313090 from "./publish/intercomm.address"
[2017-11-03 15:29:59.698947] [info]    Trying to connect to 3048013825.0:1069313090
[asaru:16238] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
[...]

The complete code, including the CMake build script, can be downloaded at:

https://www.dropbox.com/s/azo5ti4kjg12zjy/MPI_Ports.tar.gz?dl=0

Why is the connection not working?

Thanks a lot,
Florian

