Hi Charles :)

The BLAS kernels for CUDA and OpenCL are actually entirely different. The OpenCL kernels rely on a code generator and have been auto-tuned. As far as I know, the CUDA kernels have not been auto-tuned and don't rely on the same generation engine as the OpenCL ones. For BLAS1-2 the difference should not be very significant, but for GEMM it's entirely possible to observe a huge difference.
Philippe

2015-07-31 12:04 GMT-07:00 Charles Determan <cdeterma...@gmail.com>:

> Greetings,
>
> Brief background, I am developing a series of R packages to bring ViennaCL
> to the R community. I have had success with the development of my gpuR
> package (https://github.com/cdeterman/gpuR) which relies on the OpenCL
> backend of ViennaCL (which is housed in the package RViennaCL). I am
> hoping to submit to CRAN in the coming weeks now that the latest stable
> ViennaCL version has just been released.
>
> Naturally, I wanted a companion package for a CUDA backend. This is now
> the gpuRcuda package (https://github.com/cdeterman/gpuRcuda). This has
> appeared to work successfully as most of the code is the same. However, my
> initial benchmarks are showing very dismal performance with the CUDA
> backend.
>
> I was wondering if someone from this list would be willing to have a look
> at my code to see why the CUDA code would be so much worse. I had thought,
> given that I am working with an NVIDIA card (GeForce GTX 970), CUDA would
> provide improved speed, but the benchmarks are showing performance at least
> 5-fold slower than the CPU-based R multiplication. Even the 'float' type
> matrix multiplication is slower than R (which only has double type support!).
>
> The sgemm CUDA file is
> (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_sgemm.cu) and
> the associated C++ file is
> (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_cudaMatrix_gemm.cpp).
>
> Other note, I have tried making the two packages completely independent
> and the performance is still very poor with CUDA.
>
> I really appreciate any help others could provide troubleshooting this. I
> have truly run out of ideas as to why the code has such poor performance.
>
> Regards,
> Charles
------------------------------------------------------------------------------
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel