Hi,

Suppose I have the following code and I am using CUDA-aware MPI:

    cudaMalloc(&sbuf, sz);
    Kernel1<<<..., stream>>>(..., sbuf);
    MPI_Isend(sbuf, ...);
    Kernel2<<<..., stream>>>();

Do I need to call cudaStreamSynchronize(stream) before MPI_Isend() to make sure the data in sbuf is ready to send? If not, why not?

Thank you.
--Junchao Zhang