What you said makes total sense. I think I will start using MPI_Isend(). Thank you very much for your help!
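For the record, this is roughly the non-blocking pattern I have in mind (a minimal sketch only; the buffer size, datatype, and tag are placeholders):

    /* Minimal sketch: replace a blocking MPI_Send() with MPI_Isend() so the
     * sender can keep working while the message is in flight. */
    #include <mpi.h>

    void send_nonblocking(const double *buf, int count, int dest, MPI_Comm comm)
    {
        MPI_Request req;

        /* Starts the send and returns immediately, even if the receiver
         * has not posted a matching receive yet. */
        MPI_Isend(buf, count, MPI_DOUBLE, dest, /* tag */ 0, comm, &req);

        /* ... overlap useful computation here; buf must not be reused yet ... */

        /* Complete the send before touching the buffer again. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

Of course, MPI_Wait() can still block if the message falls into rendezvous mode and the receiver is far behind, so this hides the delay behind useful work rather than removing it.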
Michael

On Thu, Aug 11, 2016 at 6:36 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:

> Michael,
>
> In general terms, and assuming you are running all message sizes in PIO
> Eager Mode, the communication is going to be affected by the CPU load. In
> other words, the bigger the message, the more CPU cycles to copy the
> buffer. Additionally, I have to say I'm not very certain how MPI_Send()
> will behave under the hood with temporary buffering. I think a more
> predictable behavior would be seen with MPI_Ssend(). Now, if you really
> don't want to see the sender affected by the receiver load, you need to
> move to the non-blocking call MPI_Isend().
>
> _MAC
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Xiaolong Cui
> *Sent:* Thursday, August 11, 2016 2:13 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] runtime performance tuning for Intel OMA interconnect
>
> Sorry, forgot the attachments.
>
> On Thu, Aug 11, 2016 at 5:06 PM, Xiaolong Cui <sunshine...@gmail.com> wrote:
>
> Thanks! I tried it, but it didn't solve my problem. Maybe the reason is
> not eager/rndv.
>
> The reason why I want to always use eager mode is that I want to avoid a
> sender being slowed down by an unready receiver. I can prevent a sender
> from slowing down by always using eager mode on InfiniBand, just like your
> approach, but I cannot repeat this on OPA. Based on the experiments below,
> it seems to me that a sender will be delayed to some extent for reasons
> other than eager/rndv.
>
> I designed a simple test (see hello_world.c in the attachment) with one
> sender rank (r0) and one receiver rank (r1). r0 always runs at full speed,
> but r1 runs at full speed in one case and at half speed in the other. To
> run r1 at half speed, I colocate a third rank r2 with r1 (see rankfile).
> Then I compare the completion time at r0 to see if there is a slowdown
> when r1 is "unready to receive". The result is positive. But it is
> surprising that the delay varies significantly when I change the message
> length. This is different from my previous observation when eager/rndv is
> the cause.
>
> So my question is: do you know of other factors that cause a delay to an
> MPI_Send() when the receiver is not ready to receive?
>
> On Wed, Aug 10, 2016 at 11:48 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
>
> To remain in eager mode you need to increase the size of
> PSM2_MQ_RNDV_HFI_THRESH.
>
> PSM2_MQ_EAGER_SDMA_SZ is the threshold at which PSM changes from PIO
> (which uses the CPU) to the SDMA engines. This summary may help:
>
>   PIO Eager Mode:  0 bytes -> PSM2_MQ_EAGER_SDMA_SZ - 1
>   SDMA Eager Mode: PSM2_MQ_EAGER_SDMA_SZ -> PSM2_MQ_RNDV_HFI_THRESH - 1
>   RNDZ Expected:   PSM2_MQ_RNDV_HFI_THRESH -> largest supported value
>
> Regards,
>
> _MAC
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Xiaolong Cui
> *Sent:* Wednesday, August 10, 2016 7:19 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] runtime performance tuning for Intel OMA interconnect
>
> Hi Matias,
>
> Thanks a lot, that's very helpful!
>
> What I need indeed is to always use eager mode. But I didn't find any
> information about PSM2_MQ_EAGER_SDMA_SZ online. Would you please
> elaborate on "Just in case PSM2_MQ_EAGER_SDMA_SZ changes PIO to SDMA,
> always in eager mode."
>
> Thanks!
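As additional context, the simple test I described above (hello_world.c) is roughly along the lines of the sketch below. This is a hypothetical reconstruction for readers who don't have the attachment, not the actual file; the message length and iteration count are placeholders.

    /* Rough reconstruction of the sender-side timing test (not the actual
     * hello_world.c): rank 0 sends fixed-size messages to rank 1 and times
     * the MPI_Send() calls; any extra rank just burns CPU so that a
     * colocated receiver runs at reduced speed. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int count = 20000;   /* placeholder message length (doubles) */
        const int iters = 1000;    /* placeholder iteration count */
        int rank, i;
        double *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = calloc(count, sizeof(double));

        if (rank == 0) {           /* sender r0 */
            double t0 = MPI_Wtime();
            for (i = 0; i < iters; i++)
                MPI_Send(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            printf("sender completion time: %f s\n", MPI_Wtime() - t0);
        } else if (rank == 1) {    /* receiver r1 */
            for (i = 0; i < iters; i++)
                MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        } else {                   /* e.g. r2 colocated with r1: steal CPU cycles */
            volatile double x = 0.0;
            for (i = 0; i < 2000000000; i++)
                x += 1.0;
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

Running it once with r1 on its own core and once with r2 colocated on r1's core (via a rankfile) should reproduce the comparison described above.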
> Michael
>
> On Wed, Aug 10, 2016 at 3:59 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
>
> Hi Michael,
>
> When Open MPI runs on Omni-Path it will choose the PSM2 MTL by default, to
> use libpsm2.so. Strictly speaking, it is also able to run using the openib
> BTL. However, the performance is so significantly impacted that it is not
> only discouraged, but no tuning would make sense. Regarding the PSM2 MTL,
> it currently supports only two MCA parameters ("mtl_psm2_connect_timeout"
> and "mtl_psm2_priority"), which are not what you are looking for. Instead,
> you can set values directly in the PSM2 library with environment
> variables. Further info is in the Programmer's Guide:
>
> http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_PSM2_PG_H76473_v3_0.pdf
>
> More docs:
>
> https://www-ssl.intel.com/content/www/us/en/support/network-and-i-o/fabric-products/000016242.html?wapkw=psm2
>
> Now, for your parameters:
>
> btl = openib,vader,self -> ignore this one
> btl_openib_eager_limit = 160000 -> I don't clearly see the difference from
> the parameter below; however, they are set to the same value. Just in case
> PSM2_MQ_EAGER_SDMA_SZ changes PIO to SDMA, always in eager mode.
> btl_openib_rndv_eager_limit = 160000 -> PSM2_MQ_RNDV_HFI_THRESH
> btl_openib_max_send_size = 160000 -> does not apply to PSM2
> btl_openib_receive_queues = P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,160000,1024,512,512 -> does not apply to PSM2
>
> Thanks,
> Regards,
>
> _MAC
>
> BTW, should change the subject OMA -> OPA
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* Xiaolong Cui
> *Sent:* Tuesday, August 09, 2016 2:22 PM
> *To:* users@lists.open-mpi.org
> *Subject:* [OMPI users] runtime performance tuning for Intel OMA interconnect
>
> I used to tune the performance of Open MPI on InfiniBand by changing the
> Open MPI MCA parameters for the openib component (see
> https://www.open-mpi.org/faq/?category=openfabrics). Now I have migrated
> to a new cluster that deploys Intel's Omni-Path interconnect, and my
> previous approach does not work any more. Does anyone know how to tune the
> performance for the Omni-Path interconnect (what Open MPI component to
> change)?
>
> The version of Open MPI is openmpi-1.10.2-hfi. I have included the output
> from ompi_info and the openib parameters that I used to change. Thanks!
>
> Sincerely,
> Michael
>
> --
> Xiaolong Cui (Michael)
> Department of Computer Science
> Dietrich School of Arts & Science
> University of Pittsburgh
> Pittsburgh, PA 15260
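For completeness, since on Omni-Path the tuning happens through PSM2 environment variables rather than openib MCA parameters, here is a minimal sketch of raising the rendezvous threshold from inside the program. The value is only a placeholder, and this assumes PSM2 picks the variable up when MPI_Init() initializes the library; exporting it in the launch environment instead (for example with mpirun's -x option) should work as well.

    /* Minimal sketch: keep messages up to ~1 MB on the eager path by raising
     * the PSM2 rendezvous threshold.  The value is a placeholder, and this
     * assumes the PSM2 library reads its environment during MPI_Init(). */
    #define _POSIX_C_SOURCE 200112L   /* for setenv() */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* Must be set before MPI_Init() so the PSM2 MTL sees it. */
        setenv("PSM2_MQ_RNDV_HFI_THRESH", "1048576", 1);

        MPI_Init(&argc, &argv);
        /* ... application code: sends below the threshold stay in eager mode ... */
        MPI_Finalize();
        return 0;
    }

Either way, messages below the threshold should stay on the eager path described in the summary further up the thread.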
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users