
I've recently been getting significant performance improvements out of the
kernel generator, which should bring ViennaCL 1.6 very close (>95% of the
performance) to cuBLAS (on NVIDIA hardware) and clAmdBlas (on AMD hardware)
for BLAS1, dense BLAS2, and dense BLAS3. There are a lot of drawbacks to
maintaining a BLAS-linking functionality, and it may not be worth it if we
can improve the performance of ViennaCL to such an extent. I'd be in favor
of dropping this idea from the ViennaCL 1.6 roadmap. This would also free
up some time for us to focus on distributed matrices/vectors.

What do you think about it?

ViennaCL-devel mailing list
