Hi,
  Suppose I have the following code and I am using CUDA-aware MPI:
    cudaMalloc(&sbuf,sz);
    Kernel1<<<...,stream>>>(...,sbuf);
    MPI_Isend(sbuf,...);
    Kernel2<<<...,stream>>>();

  Do I need to call cudaStreamSynchronize(stream) before MPI_Isend() to make
sure the data in sbuf is ready to be sent?  If not, why not?
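
  To be concrete, the variant with the explicit synchronization would look
roughly like the sketch below. The kernel bodies, the buffer size n, the
launch configuration, the tag, and the two-rank layout are all placeholders
I made up for illustration; only the ordering of the calls matters. The
question is whether the marked cudaStreamSynchronize() line is required.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical kernel: fills the send buffer on the device.
__global__ void Kernel1(int n, double *sbuf) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) sbuf[i] = (double)i;
}

// Hypothetical kernel: unrelated work that does not touch sbuf.
__global__ void Kernel2() {}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;              // assumed buffer length
    const size_t sz = n * sizeof(double);
    double *buf = NULL;                 // device pointer, passed directly to MPI
    cudaMalloc(&buf, sz);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    if (rank == 0 && size > 1) {
        MPI_Request req;
        Kernel1<<<(n + 255) / 256, 256, 0, stream>>>(n, buf);
        cudaStreamSynchronize(stream);  // <-- the call in question
        MPI_Isend(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        Kernel2<<<1, 1, 0, stream>>>(); // queued while the send is in flight
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaStreamSynchronize(stream);      // drain remaining work before cleanup
    cudaStreamDestroy(stream);
    cudaFree(buf);
    MPI_Finalize();
    return 0;
}
```

  This assumes an MPI build with CUDA support and at least two ranks, e.g.
compiled with nvcc against the MPI headers and libraries and launched with
mpirun -n 2.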

  Thank you.

--Junchao Zhang
