The problem is that comm_accept isn't thread safe in the 1.6 series. We have a development branch that might solve it, but it is still under evaluation.
On May 14, 2013, at 11:15 AM, Damien Kick <dk...@shoretel.com> wrote:

> I've been playing with some code to become familiar with
> MPI_Comm_accept and MPI_Comm_connect in order to implement an MPI
> client/server. The code simply sends a single MPI_INT, the client
> process pid, to the server and then disconnects. It works for a few
> test runs, but on the 2nd or 3rd client connection the server seems to
> stop responding and the client spins at 100% CPU inside the call to
> MPI_Comm_accept. Am I doing something wrong in my code? Thanks in
> advance for any help. First, an example run ...
>
> In terminal #1, start the name service
>
> $ ompi-server -r ${PREFIX}/var/run/ompi-server/uri.txt
> $
>
> In terminal #2, start the server code
>
> $ mpirun -mca btl tcp,sm,self \
>     --ompi-server file:${PREFIX}/var/run/ompi-server/uri.txt mpi-server
> mpi-server pid 41556
> Opened port 2011758592.0;tcp://10.161.1.73:51113+2011758593.0;\
> tcp://10.161.1.73:51114:300
> MPI_Info_set("ompi_global_scope", "true")
> Published {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Waiting to accept a connection on {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Accepted a connection on {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"} \
> with rank 0 and size 1
> We detached from the client thread...
> Waiting for an int...
> Waiting to accept a connection on {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Received 41558
> Disconnecting intercomm
> Accepted a connection on {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"} \
> with rank 0 and size 1
> We detached from the client thread...
> Waiting to accept a connection on {"mpi-server-example", "Waiting for an
> int...
> 2011758592.0;tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Received 41560
> Disconnecting intercomm
> ^Cmpirun: killing job...
>
> In terminal #3, the client runs
>
> $ mpirun -mca btl tcp,sm,self \
>     --ompi-server file:${PREFIX}/var/run/ompi-server/uri.txt mpi-client
> mpi-client pid 41558
> Looking up port for service "mpi-server-example
> Found {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Connected to {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"} \
> with rank 0 and size 1
> Sending to {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Disconnecting intercomm
> $ mpirun -mca btl tcp,sm,self \
>     --ompi-server file:${PREFIX}/var/run/ompi-server/uri.txt mpi-client
> mpi-client pid 41560
> Looking up port for service "mpi-server-example
> Found {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Connected to {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"} \
> with rank 0 and size 1
> Sending to {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> Disconnecting intercomm
> $ mpirun -mca btl tcp,sm,self \
>     --ompi-server file:${PREFIX}/var/run/ompi-server/uri.txt mpi-client
> mpi-client pid 41562
> Looking up port for service "mpi-server-example
> Found {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> ^Cmpirun: killing job...
>
> $ mpirun -mca btl tcp,sm,self \
>     --ompi-server file:${PREFIX}/var/run/ompi-server/uri.txt mpi-client
> mpi-client pid 41564
> Looking up port for service "mpi-server-example
> Found {"mpi-server-example", "2011758592.0;\
> tcp://10.161.1.73:51113+2011758593.0;tcp://10.161.1.73:51114:300"}
> ^Cmpirun: killing job...
>
> The source code for the server is as follows
>
> $ cat src/mpi-server.cc
> #include "seed/mpi_info.hh"
> #include "seed/mpi_intercomm.hh"
> #include "seed/scope_exit.hh"
>
> #include <mpi.h>
>
> #include <array>
> #include <cstdlib>
> #include <iostream>
> #include <ostream>
> #include <string>
> #include <thread>
>
> #include <unistd.h> // getpid
>
> int main(int argc, char* argv[])
> {
>     std::clog << argv[0] << " pid " << getpid() << '\n';
>
>     shor::seed::Scope_exit finalize(
>         []() {
>             if (MPI::Is_initialized()) {
>                 MPI::Finalize();
>             }
>         });
>     const auto required = MPI_THREAD_MULTIPLE;
>     const auto provided = MPI::Init_thread(argc, argv, required);
>     if (provided < required) {
>         std::cerr << "Error: could not init with MPI_THREAD_MULTIPLE\n";
>         return EXIT_FAILURE;
>     }
>
>     typedef std::array<char, MPI_MAX_PORT_NAME> Port_name;
>     Port_name port_name;
>     MPI::Open_port(MPI_INFO_NULL, port_name.data());
>     shor::seed::Scope_exit close_port(
>         [port_name]() {
>             std::clog << "Closing port " << port_name.data() << '\n';
>             MPI::Close_port(port_name.data());
>         });
>     std::clog << "Opened port " << port_name.data() << '\n';
>
>     const std::string service_name = "mpi-server-example";
>     const shor::seed::Mpi_info pub_info({{"ompi_global_scope", "true"}});
>     MPI::Publish_name(service_name.c_str(), pub_info, port_name.data());
>     shor::seed::Scope_exit unpublish_name(
>         [port_name, service_name, &pub_info]() {
>             std::clog
>                 << "Unpublishing {\"" << service_name
>                 << "\", \"" << port_name.data() << "\"}\n";
>             MPI::Unpublish_name(
>                 service_name.c_str(), pub_info, port_name.data());
>         });
>     std::clog
>         << "Published {\"" << service_name << "\", \"" << port_name.data()
>         << "\"}\n";
>
>     while (true) {
>         std::clog
>             << "Waiting to accept a connection on {\"" << service_name
>             << "\", \"" << port_name.data() << "\"}\n";
>         shor::seed::Mpi_intercomm comm(
>             MPI::COMM_SELF.Accept(port_name.data(), MPI_INFO_NULL, 0));
>         std::clog
>             << "Accepted a connection on {\"" << service_name
>             << "\", \"" << port_name.data() << "\"} with rank "
>             << comm->Get_rank() << " and size " << comm->Get_size()
>             << '\n';
>         const auto lambda = [](shor::seed::Mpi_intercomm comm) {
>             int n;
>             std::clog << "Waiting for an int...\n";
>             comm->Recv(&n, 1, MPI_INT, MPI_ANY_SOURCE, 0);
>             std::clog << "Received " << n << '\n';
>         };
>         std::thread a_thread(lambda, std::move(comm));
>         a_thread.detach();
>         std::clog << "We detached from the client thread...\n";
>     }
> }
> $
>
> The client
>
> $ cat src/mpi-client.cc
> #include "seed/mpi_intercomm.hh"
> #include "seed/scope_exit.hh"
>
> #include <mpi.h>
>
> #include <array>
> #include <iostream>
> #include <ostream>
> #include <string>
>
> #include <unistd.h>
>
> int main(int argc, char* argv[])
> {
>     std::clog << argv[0] << " pid " << getpid() << '\n';
>
>     shor::seed::Scope_exit finalize(
>         []() {
>             if (MPI::Is_initialized()) {
>                 MPI::Finalize();
>             }
>         });
>     const auto required = MPI_THREAD_MULTIPLE;
>     const auto provided = MPI::Init_thread(argc, argv, required);
>     if (provided < required) {
>         std::cerr << "Error: could not init with MPI_THREAD_MULTIPLE\n";
>         return EXIT_FAILURE;
>     }
>
>     const std::string service_name = "mpi-server-example";
>     std::clog
>         << "Looking up port for service \"" << service_name << '\n';
>     std::array<char, MPI_MAX_PORT_NAME> port_name;
>     MPI::Lookup_name(
>         service_name.c_str(), MPI_INFO_NULL, port_name.data());
>     std::clog
>         << "Found {\"" << service_name << "\", \"" << port_name.data()
>         << "\"}\n";
>
>     shor::seed::Mpi_intercomm comm(
>         MPI::COMM_SELF.Connect(port_name.data(), MPI_INFO_NULL, 0));
>     std::clog
>         << "Connected to {\"" << service_name << "\", \""
>         << port_name.data() << "\"} with rank " << comm->Get_rank()
>         << " and size " << comm->Get_size() << '\n';
>
>     std::clog
>         << "Sending to {\"" << service_name << "\", \""
>         << port_name.data() << "\"}\n";
>     const int n = getpid();
>     comm->Send(&n, 1, MPI_INT, 0, 0);
> }
> $
>
> And a few helpers used by both
>
> $ cat include/seed/mpi_intercomm.hh
> #ifndef INCLUDE_SEED_MPI_INTERCOMM_HH
> #define INCLUDE_SEED_MPI_INTERCOMM_HH
>
> #include <mpi.h>
>
> #include <boost/optional.hpp>
>
> namespace shor {
> namespace seed {
>     class Mpi_intercomm {
>         boost::optional<MPI::Intercomm> impl_;
>
>     public:
>         explicit Mpi_intercomm(MPI::Intercomm impl);
>         Mpi_intercomm(const Mpi_intercomm& that) = delete;
>         Mpi_intercomm(Mpi_intercomm&& that);
>         ~Mpi_intercomm();
>
>         Mpi_intercomm& operator = (const Mpi_intercomm& that) = delete;
>         Mpi_intercomm& operator = (Mpi_intercomm&& that);
>
>         MPI::Intercomm* operator -> ();
>         const MPI::Intercomm* operator -> () const;
>
>         operator MPI::Intercomm() const;
>     };
>
> } // namespace seed
> } // namespace shor
>
> #include "seed/mpi_intercomm.cc.hh"
>
> #endif
> $ cat include/seed/mpi_intercomm.cc.hh
> #ifndef INCLUDE_SEED_MPI_INTERCOMM_CC_HH
> #define INCLUDE_SEED_MPI_INTERCOMM_CC_HH
>
> #include <mpi.h>
>
> inline MPI::Intercomm*
> shor::seed::Mpi_intercomm::operator -> ()
> {
>     return &(*impl_);
> }
>
> inline const MPI::Intercomm*
> shor::seed::Mpi_intercomm::operator -> () const
> {
>     return &(*impl_);
> }
>
> inline
> shor::seed::Mpi_intercomm::operator MPI::Intercomm() const
> {
>     return *impl_;
> }
>
> #endif
> $ cat src/mpi_intercomm.cc
> #include "seed/mpi_intercomm.hh"
>
> #include <iostream> // std::clog
>
> shor::seed::Mpi_intercomm::Mpi_intercomm(
>     MPI::Intercomm impl)
>     : impl_(impl)
> { }
>
> shor::seed::Mpi_intercomm::Mpi_intercomm(
>     Mpi_intercomm&& that)
>     : impl_(that.impl_)
> {
>     that.impl_ = boost::none;
> }
>
> shor::seed::Mpi_intercomm::~Mpi_intercomm()
> {
>     if (impl_
>         && (*impl_ != MPI::COMM_WORLD) && (*impl_ != MPI::COMM_SELF))
>     {
>         std::clog << "Disconnecting intercomm\n";
>         impl_->Disconnect();
>         impl_ = boost::none;
>     }
> }
>
> shor::seed::Mpi_intercomm&
> shor::seed::Mpi_intercomm::operator = (
>     Mpi_intercomm&& that)
> {
>     impl_ = that.impl_;
>     that.impl_ = boost::none;
>     return *this;
> }
> $ cat include/seed/mpi_info.hh
> #ifndef INCLUDE_SEED_MPI_INFO_HH
> #define INCLUDE_SEED_MPI_INFO_HH
>
> #include <mpi.h>
>
> #include <initializer_list>
> #include <utility> // std::pair
>
> namespace shor {
> namespace seed {
>     class Mpi_info {
>         MPI::Info impl_;
>
>     public:
>         typedef std::pair<const char*, const char*> Key_value;
>         typedef std::initializer_list<Key_value> Init_list;
>
>         Mpi_info();
>         explicit Mpi_info(const Init_list& some_values);
>         Mpi_info(const Mpi_info& that) = delete;
>         Mpi_info(Mpi_info&&);
>         ~Mpi_info();
>
>         Mpi_info& operator = (const Mpi_info& that) = delete;
>         Mpi_info& operator = (Mpi_info&& that);
>
>         operator MPI::Info() const;
>     };
>
> } // namespace seed
> } // namespace shor
>
> #include "seed/mpi_info.cc.hh"
>
> #endif
> $ cat include/seed/mpi_info.cc.hh
> #ifndef INCLUDE_SEED_MPI_INFO_CC_HH
> #define INCLUDE_SEED_MPI_INFO_CC_HH
>
> #include "seed/mpi_info.hh"
>
> inline shor::seed::Mpi_info::operator MPI::Info() const
> {
>     return impl_;
> }
>
> #endif
> $ cat src/mpi_info.cc
> #include "seed/mpi_info.hh"
>
> #include <algorithm>
> #include <initializer_list>
> #include <iostream> // std::clog
>
> shor::seed::Mpi_info::Mpi_info()
>     : impl_(MPI::Info::Create())
> { }
>
> shor::seed::Mpi_info::Mpi_info(
>     const Init_list& some_values)
>     : impl_(MPI::Info::Create())
> {
>     std::for_each(
>         std::begin(some_values), std::end(some_values),
>         [this] (const Key_value& one_value) {
>             std::clog
>                 << "MPI_Info_set(\"" << std::get<0>(one_value)
>                 << "\", \"" << std::get<1>(one_value)
>                 << "\")\n";
>             impl_.Set(std::get<0>(one_value), std::get<1>(one_value));
>         });
> }
>
> shor::seed::Mpi_info::Mpi_info(Mpi_info&& that)
>     : impl_(that.impl_)
> {
>     that.impl_ = MPI::INFO_NULL; // keep the destructor from freeing twice
> }
>
> shor::seed::Mpi_info::~Mpi_info()
> {
>     if (impl_ != MPI::INFO_NULL) {
>         impl_.Free();
>     }
> }
>
> shor::seed::Mpi_info&
> shor::seed::Mpi_info::operator = (Mpi_info&& that)
> {
>     impl_ = that.impl_;
>     that.impl_ = MPI::INFO_NULL;
>     return *this;
> }
> $ cat include/seed/scope_exit.hh
> #ifndef INCLUDE_SEED_SCOPE_EXIT_HH
> #define INCLUDE_SEED_SCOPE_EXIT_HH
>
> #include <functional>
>
> namespace shor {
> namespace seed {
>     class Scope_exit {
>         std::function<void()> lambda_;
>
>     public:
>         Scope_exit(std::function<void()> lambda) : lambda_(lambda) { }
>         Scope_exit(const Scope_exit& that) = delete;
>         ~Scope_exit() { lambda_(); }
>
>         Scope_exit& operator = (const Scope_exit& that) = delete;
>     };
>
> } // namespace seed
> } // namespace shor
>
> #endif
> $
>
> And here is the output of ompi_info
>
> $ ompi_info
> Package: Open MPI dkick@Damien-Kicks-MacBook-Pro.local Distribution
> Open MPI: 1.6.4
> Open MPI SVN revision: r28081
> Open MPI release date: Feb 19, 2013
> Open RTE: 1.6.4
> Open RTE SVN revision: r28081
> Open RTE release date: Feb 19, 2013
> OPAL: 1.6.4
> OPAL SVN revision: r28081
> OPAL release date: Feb 19, 2013
> MPI API: 2.1
> Ident string: 1.6.4
> Prefix: ${PREFIX}
> Configured architecture: x86_64-apple-darwin12.3.0
> Configure host: Damien-Kicks-MacBook-Pro.local
> Configured by: dkick
> Configured on: Thu May 9 21:36:29 CDT 2013
> Configure host: Damien-Kicks-MacBook-Pro.local
> Built by: dkick
> Built on: Thu May 9 21:53:32 CDT 2013
> Built host: Damien-Kicks-MacBook-Pro.local
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (single underscore)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C compiler family name: GNU
> C compiler version: 4.8.0
> C++ compiler: g++ --std=c++0x
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /sw/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /sw/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: yes
> Thread support: posix (MPI_THREAD_MULTIPLE: yes, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI interface warnings: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol vis. support: yes
> Host topology support: yes
> MPI extensions: affinity example
> FT Checkpoint support: no (checkpoint thread: no)
> VampirTrace support: yes
> MPI_MAX_PROCESSOR_NAME: 256
> MPI_MAX_ERROR_STRING: 256
> MPI_MAX_OBJECT_NAME: 64
> MPI_MAX_INFO_KEY: 36
> MPI_MAX_INFO_VAL: 256
> MPI_MAX_PORT_NAME: 1024
> MPI_MAX_DATAREP_STRING: 128
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.4)
> MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.4)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.6.4)
> MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.4)
> MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.4)
> MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.4)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.4)
> MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
> MCA timer: darwin (MCA v2.0, API v2.0, Component v1.6.4)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.4)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.4)
> MCA sysinfo: darwin (MCA v2.0, API v2.0, Component v1.6.4)
> MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.4)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.4)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.4)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.4)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.4)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.4)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.4)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.4)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.6.4)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.4)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.4)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.6.4)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.4)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.4)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.4)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.4)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.4)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.6.4)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.4)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.6.4)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.4)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.4)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.4)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.4)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.4)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.4)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.4)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.4)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.4)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.4)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.4)
> MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.4)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.4)
> MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.4)
> MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.4)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.4)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.4)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.4)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.4)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.4)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.4)
> MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.4)
> MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.4)
> MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.4)
> $
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users