What you said makes total sense. I think I will start using MPI_Isend(). Thank you very much for your help!
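For the record, this is roughly the non-blocking pattern I have in mind (a minimal sketch only; the buffer size, datatype, and tag are placeholders):

    /* Minimal sketch: replace a blocking MPI_Send() with MPI_Isend() so the
     * sender can keep working while the message is in flight. */
    #include <mpi.h>

    void send_nonblocking(const double *buf, int count, int dest, MPI_Comm comm)
    {
        MPI_Request req;

        /* Starts the send and returns immediately, even if the receiver
         * has not posted a matching receive yet. */
        MPI_Isend(buf, count, MPI_DOUBLE, dest, /* tag */ 0, comm, &req);

        /* ... overlap useful computation here; buf must not be reused yet ... */

        /* Complete the send before touching the buffer again. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

Of course, MPI_Wait() can still block if the message falls into rendezvous mode and the receiver is far behind, so this hides the delay behind useful work rather than removing it.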
Michael

On Thu, Aug 11, 2016 at 6:36 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:

> Michael,
>
> In general terms, and assuming you are running all message sizes in PIO
> Eager Mode, the communication is going to be affected by the CPU load. In
> other words, the bigger the message, the more CPU cycles to copy the
> buffer. Additionally, I have to say I'm not very certain how MPI_Send()
> will behave under the hood with temporary buffering. I think a more
> predictable behavior would be seen with MPI_Ssend(). Now, if you really
> don't want to see the sender affected by the receiver load, you need to
> move to the non-blocking call MPI_Isend().
>
> _MAC
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Xiaolong Cui
> *Sent:* Thursday, August 11, 2016 2:13 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] runtime performance tuning for Intel OMA interconnect
>
> Sorry, forgot the attachments.
>
> On Thu, Aug 11, 2016 at 5:06 PM, Xiaolong Cui <sunshine...@gmail.com> wrote:
>
> Thanks! I tried it, but it didn't solve my problem. Maybe the reason is
> not eager/rndv.
>
> The reason why I want to always use eager mode is that I want to avoid a
> sender being slowed down by an unready receiver. I can prevent a sender
> from slowing down by always using eager mode on InfiniBand, just like your
> approach, but I cannot repeat this on OPA. Based on the experiments below,
> it seems to me that a sender will be delayed to some extent for reasons
> other than eager/rndv.
>
> I designed a simple test (see hello_world.c in the attachment) with one
> sender rank (r0) and one receiver rank (r1). r0 always runs at full speed,
> but r1 runs at full speed in one case and at half speed in the other. To
> run r1 at half speed, I colocate a third rank r2 with r1 (see rankfile).
> Then I compare the completion time at r0 to see if there is a slowdown
> when r1 is "unready to receive". The result is positive. But it is
> surprising that the delay varies significantly when I change the message
> length. This is different from my previous observation when eager/rndv is
> the cause.
>
> So my question is: do you know of other factors that cause a delay to an
> MPI_Send() when the receiver is not ready to receive?
>
> On Wed, Aug 10, 2016 at 11:48 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
>
> To remain in eager mode you need to increase the size of
> PSM2_MQ_RNDV_HFI_THRESH.
>
> PSM2_MQ_EAGER_SDMA_SZ is the threshold at which PSM changes from PIO
> (which uses the CPU) to the SDMA engines. This summary may help:
>
>   PIO Eager Mode:  0 bytes -> PSM2_MQ_EAGER_SDMA_SZ - 1
>   SDMA Eager Mode: PSM2_MQ_EAGER_SDMA_SZ -> PSM2_MQ_RNDV_HFI_THRESH - 1
>   RNDZ Expected:   PSM2_MQ_RNDV_HFI_THRESH -> largest supported value
>
> Regards,
>
> _MAC
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Xiaolong Cui
> *Sent:* Wednesday, August 10, 2016 7:19 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] runtime performance tuning for Intel OMA interconnect
>
> Hi Matias,
>
> Thanks a lot, that's very helpful!
>
> What I need indeed is to always use eager mode. But I didn't find any
> information about PSM2_MQ_EAGER_SDMA_SZ online. Would you please
> elaborate on "Just in case PSM2_MQ_EAGER_SDMA_SZ changes PIO to SDMA,
> always in eager mode."
>
> Thanks!
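As additional context, the simple test I described above (hello_world.c) is roughly along the lines of the sketch below. This is a hypothetical reconstruction for readers who don't have the attachment, not the actual file; the message length and iteration count are placeholders.

    /* Rough reconstruction of the sender-side timing test (not the actual
     * hello_world.c): rank 0 sends fixed-size messages to rank 1 and times
     * the MPI_Send() calls; any extra rank just burns CPU so that a
     * colocated receiver runs at reduced speed. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int count = 20000;   /* placeholder message length (doubles) */
        const int iters = 1000;    /* placeholder iteration count */
        int rank, i;
        double *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = calloc(count, sizeof(double));

        if (rank == 0) {           /* sender r0 */
            double t0 = MPI_Wtime();
            for (i = 0; i < iters; i++)
                MPI_Send(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            printf("sender completion time: %f s\n", MPI_Wtime() - t0);
        } else if (rank == 1) {    /* receiver r1 */
            for (i = 0; i < iters; i++)
                MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        } else {                   /* e.g. r2 colocated with r1: steal CPU cycles */
            volatile double x = 0.0;
            for (i = 0; i < 2000000000; i++)
                x += 1.0;
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

Running it once with r1 on its own core and once with r2 colocated on r1's core (via a rankfile) should reproduce the comparison described above.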
> Michael
>
> On Wed, Aug 10, 2016 at 3:59 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
>
> Hi Michael,
>
> When Open MPI runs on Omni-Path it will choose the PSM2 MTL by default, to
> use libpsm2.so. Strictly speaking, it is also able to run using the openib
> BTL. However, the performance is so significantly impacted that it is not
> only discouraged, but no tuning would make sense. Regarding the PSM2 MTL,
> it currently supports only two MCA parameters ("mtl_psm2_connect_timeout"
> and "mtl_psm2_priority"), which are not what you are looking for. Instead,
> you can set values directly in the PSM2 library with environment
> variables. Further info is in the Programmer's Guide:
>
> http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_PSM2_PG_H76473_v3_0.pdf
>
> More docs:
>
> https://www-ssl.intel.com/content/www/us/en/support/network-and-i-o/fabric-products/000016242.html?wapkw=psm2
>
> Now, for your parameters:
>
> btl = openib,vader,self -> ignore this one
> btl_openib_eager_limit = 160000 -> I don't clearly see the difference from
> the parameter below; however, they are set to the same value. Just in case
> PSM2_MQ_EAGER_SDMA_SZ changes PIO to SDMA, always in eager mode.
> btl_openib_rndv_eager_limit = 160000 -> PSM2_MQ_RNDV_HFI_THRESH
> btl_openib_max_send_size = 160000 -> does not apply to PSM2
> btl_openib_receive_queues = P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,160000,1024,512,512 -> does not apply to PSM2
>
> Thanks,
> Regards,
>
> _MAC
>
> BTW, should change the subject OMA -> OPA
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Xiaolong Cui
> *Sent:* Tuesday, August 09, 2016 2:22 PM
> *To:* users@lists.open-mpi.org
> *Subject:* [OMPI users] runtime performance tuning for Intel OMA interconnect
>
> I used to tune the performance of Open MPI on InfiniBand by changing the
> Open MPI MCA parameters for the openib component (see
> https://www.open-mpi.org/faq/?category=openfabrics). Now I have migrated
> to a new cluster that deploys Intel's Omni-Path interconnect, and my
> previous approach does not work any more. Does anyone know how to tune the
> performance for the Omni-Path interconnect (what Open MPI component to
> change)?
>
> The version of Open MPI is openmpi-1.10.2-hfi. I have included the output
> from ompi_info and the openib parameters that I used to change. Thanks!
>
> Sincerely,
> Michael
>
> --
> Xiaolong Cui (Michael)
> Department of Computer Science
> Dietrich School of Arts & Science
> University of Pittsburgh
> Pittsburgh, PA 15260
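For completeness, since on Omni-Path the tuning happens through PSM2 environment variables rather than openib MCA parameters, here is a minimal sketch of raising the rendezvous threshold from inside the program. The value is only a placeholder, and this assumes PSM2 picks the variable up when MPI_Init() initializes the library; exporting it in the launch environment instead (for example with mpirun's -x option) should work as well.

    /* Minimal sketch: keep messages up to ~1 MB on the eager path by raising
     * the PSM2 rendezvous threshold.  The value is a placeholder, and this
     * assumes the PSM2 library reads its environment during MPI_Init(). */
    #define _POSIX_C_SOURCE 200112L   /* for setenv() */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* Must be set before MPI_Init() so the PSM2 MTL sees it. */
        setenv("PSM2_MQ_RNDV_HFI_THRESH", "1048576", 1);

        MPI_Init(&argc, &argv);
        /* ... application code: sends below the threshold stay in eager mode ... */
        MPI_Finalize();
        return 0;
    }

Either way, messages below the threshold should stay on the eager path described in the summary further up the thread.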
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users