Greetings,

A brief bit of background: I am developing a series of R packages to bring
ViennaCL to the R community.  I have had success with my gpuR package
(https://github.com/cdeterman/gpuR), which relies on the OpenCL backend of
ViennaCL (housed in the RViennaCL package).  I am hoping to submit it to
CRAN in the coming weeks, now that the latest stable ViennaCL version has
just been released.

Naturally, I wanted a companion package with a CUDA backend.  This is now
the gpuRcuda package (https://github.com/cdeterman/gpuRcuda).  It appears
to work correctly, since most of the code is shared with gpuR.  However, my
initial benchmarks show very poor performance with the CUDA backend.

I was wondering if someone on this list would be willing to have a look at
my code to see why the CUDA version is so much slower.  I had assumed that,
given I am working with an NVIDIA card (a GeForce GTX 970), CUDA would
provide improved speed, but the benchmarks show performance at least 5-fold
slower than CPU-based matrix multiplication in R.  Even the 'float' type
matrix multiplication is slower than R (which only supports the double
type!).

The sgemm CUDA file is
https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_sgemm.cu
and the associated C++ file is
https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_cudaMatrix_gemm.cpp
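
In case it helps to narrow things down, below is a minimal standalone
sketch (not code from the package; the matrix size, constant fill, and
compile line are assumptions on my part) that times ViennaCL's CUDA prod()
entirely outside of R.  If this runs quickly on the GTX 970 while the
package remains slow, the overhead is presumably in the R/Rcpp layer (for
example, host-device copies on every call) rather than in ViennaCL's CUDA
gemm itself.

  // standalone_gemm.cu -- a minimal sanity check, not part of gpuRcuda.
  // Compile with something like:
  //   nvcc -O3 -I/path/to/ViennaCL standalone_gemm.cu -o standalone_gemm
  #define VIENNACL_WITH_CUDA

  #include <chrono>
  #include <iostream>

  #include "viennacl/matrix.hpp"
  #include "viennacl/linalg/prod.hpp"

  int main()
  {
    const std::size_t N = 2048;   // matrix dimension (arbitrary choice)

    // Fill directly on the device so no host<->device copies are timed.
    viennacl::matrix<float> A(N, N), B(N, N), C(N, N);
    A = viennacl::scalar_matrix<float>(N, N, 1.0f);
    B = viennacl::scalar_matrix<float>(N, N, 2.0f);

    // Warm-up multiplication so one-time setup cost is not measured.
    C = viennacl::linalg::prod(A, B);
    viennacl::backend::finish();

    auto t0 = std::chrono::high_resolution_clock::now();
    C = viennacl::linalg::prod(A, B);
    viennacl::backend::finish();  // wait for the GPU to finish
    auto t1 = std::chrono::high_resolution_clock::now();

    std::cout << "sgemm " << N << "x" << N << ": "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s" << std::endl;
    return 0;
  }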

One other note: I have also tried making the two packages completely
independent, and the CUDA performance is still very poor.

I would really appreciate any help in troubleshooting this.  I have
genuinely run out of ideas as to why the code performs so poorly.

Regards,
Charles