Hi Charles :)

The BLAS kernels for CUDA and OpenCL are entirely different, actually.
The OpenCL kernels are produced by a code generator and have been
auto-tuned. As far as I know, the CUDA kernels have not been auto-tuned and
do not rely on the same generation engine as the OpenCL ones. For BLAS1-2
the difference should not be very significant, but for GEMM it is entirely
possible to observe a huge difference.
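To make "auto-tuned" concrete, here is a toy CPU analogue in C++. It is not ViennaCL's generator or its API: the names blocked_sgemm, gflops_for_tile and autotune_tile, and the candidate tile sizes, are all hypothetical. The idea is that a kernel exposes tunable parameters (here only the tile size), each candidate is timed on the target hardware, and the fastest one is kept. A real GPU kernel generator would search a much larger parameter space. The timing is reported in GFLOP/s using the standard 2*n^3 flop count for an n x n GEMM, which makes numbers comparable across matrix sizes and backends.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical toy kernel: C = A * B for row-major n x n float matrices,
// computed in tile x tile blocks. `tile` is the only tunable parameter here.
void blocked_sgemm(const std::vector<float>& A, const std::vector<float>& B,
                   std::vector<float>& C, std::size_t n, std::size_t tile) {
  assert(tile > 0 && A.size() == n * n && B.size() == n * n && C.size() == n * n);
  std::fill(C.begin(), C.end(), 0.0f);
  for (std::size_t ii = 0; ii < n; ii += tile)
    for (std::size_t kk = 0; kk < n; kk += tile)
      for (std::size_t jj = 0; jj < n; jj += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        const std::size_t k_end = std::min(kk + tile, n);
        const std::size_t j_end = std::min(jj + tile, n);
        for (std::size_t i = ii; i < i_end; ++i)
          for (std::size_t k = kk; k < k_end; ++k) {
            const float a = A[i * n + k];
            for (std::size_t j = jj; j < j_end; ++j)
              C[i * n + j] += a * B[k * n + j];
          }
      }
}

// Time one configuration and convert to GFLOP/s (an n x n GEMM does 2*n^3 flops).
double gflops_for_tile(std::size_t n, std::size_t tile) {
  std::vector<float> A(n * n, 1.0f), B(n * n, 1.0f), C(n * n);
  const auto t0 = std::chrono::steady_clock::now();
  blocked_sgemm(A, B, C, n, tile);
  const auto t1 = std::chrono::steady_clock::now();
  // Guard against a zero reading from a very fast run on a coarse clock.
  const double secs = std::max(std::chrono::duration<double>(t1 - t0).count(), 1e-9);
  const double dn = static_cast<double>(n);
  return 2.0 * dn * dn * dn / secs / 1e9;
}

// Exhaustive search: benchmark every candidate and keep the fastest one.
std::size_t autotune_tile(std::size_t n, const std::vector<std::size_t>& candidates) {
  assert(!candidates.empty());
  std::size_t best_tile = candidates.front();
  double best_gflops = -1.0;
  for (std::size_t tile : candidates) {
    const double g = gflops_for_tile(n, tile);
    std::printf("tile %3zu: %8.2f GFLOP/s\n", tile, g);
    if (g > best_gflops) {
      best_gflops = g;
      best_tile = tile;
    }
  }
  return best_tile;
}
```

For example, autotune_tile(512, {16, 32, 64, 128}) prints one line per candidate and returns whichever tile size was fastest on the machine it ran on. The winner generally differs between devices, which is why tuning is done per device rather than once.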

Philippe

2015-07-31 12:04 GMT-07:00 Charles Determan <cdeterma...@gmail.com>:

> Greetings,
>
> Brief background: I am developing a series of R packages to bring ViennaCL
> to the R community.  I have had success with the development of my gpuR
> package (https://github.com/cdeterman/gpuR) which relies on the OpenCL
> backend of ViennaCL (which is housed in the package RViennaCL).  I am
> hoping to submit to CRAN in the coming weeks now that the latest stable
> ViennaCL version has just been released.
>
> Naturally, I wanted a companion package for a CUDA backend.  This is now
> the gpuRcuda package (https://github.com/cdeterman/gpuRcuda).  It appears
> to work correctly, as most of the code is shared.  However, my initial
> benchmarks are showing very dismal performance with the CUDA backend.
>
> I was wondering if someone from this list would be willing to have a look
> at my code to see why the CUDA code performs so much worse.  I had thought,
> given that I am working with an NVIDIA card (GeForce GTX 970), that CUDA
> would provide improved speed, but the benchmarks show performance at least
> 5-fold slower than CPU-based R matrix multiplication.  Even 'float' matrix
> multiplication is slower than R (which only supports the double type!).
>
> The sgemm CUDA file is
> https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_sgemm.cu
> and the associated C++ file is
> https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_cudaMatrix_gemm.cpp
>
> One other note: I have tried making the two packages completely
> independent, and the performance with CUDA is still very poor.
>
> I really appreciate any help others could provide troubleshooting this.  I
> have truly run out of ideas as to why the code has such poor performance.
>
> Regards,
> Charles
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> ViennaCL-devel mailing list
> ViennaCL-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/viennacl-devel
>
>