Hey!

So it seems like most of the features for ViennaCL 1.6 are ready. My merge from a few days ago (finally) fully integrates device-specific kernels for BLAS1, BLAS2 and BLAS3. The reduction API is still missing, but I think the priority should be to polish the code and to make sure ViennaCL remains stable after the migration to device-specific kernels. Specifically, I think we should spend the next few weeks cleaning up the code base. Here are a few points that have caught my eye:
- I've rewritten the GEMM test from scratch. It now calls ublas::prod 27 times instead of 2500, and thanks to a few macros the file is substantially smaller (~250 lines vs. ~850). The test now completes about 15 times faster with the single-threaded ViennaCL implementation, and in the blink of an eye (~10 seconds) when OpenCL is used. Hurray!
  More importantly, this new version allowed me to spot the bug responsible for the failure of libviennacl-blas3 in tonight's dashboard. The culprit was that blas3_prod-test chose slice1==slice2, while the slices were broken for the case where C is row-major and sliced... This is frightening, because other similar glitches may be hidden here and there in the test suite. For example, the matrix_vector test passes, but the libviennacl-blas2 test fails, probably due to some stride issue with row-major matrices. Things get more complicated now that the col-major kernels are also used for the row-major cases. Anyhow, I think we should make sure that no such glitch remains in the test suite before shipping ViennaCL 1.6 (i.e., all matrix_slices/ranges should use different offsets/strides in each direction).
- I really think we should rewrite the benchmarks for the 1.6 release, all the more since they would showcase the substantial performance improvements this release brings. I can start writing a condensed benchmark covering copy, axpy, dot, gemv and gemm. It would be cool to also include sparse, solver and QR in that routine, but I won't have the time to carry this out; I'm moving to the United States in one week :-p
- I've noticed some unsafe/faulty legacy code dating back to when the layout was made a runtime parameter:
  * nmf only implements matrix<T>, but in principle matrix_base<T> should work (since no custom kernel is called, I believe).
  * There was a faulty row_major(is_row_major<BaseType>::value) in matrix_range and matrix_stride. This caused matrix_range<matrix_base<T> > to be column-major no matter what.
  More generally, there are a couple of places using the static is_row_major<> or alignment<> traits. I thought it could be a good idea to delete these traits, to be sure that no such faulty code remains anywhere else. Am I overlooking any side effects?
- We should definitely have a discussion on matrix padding, which, as far as I know, is no longer required anywhere in ViennaCL. I am in favor of making size()==internal_size() by default. That's not the point of this e-mail, but we should discuss what to do with it!
- Finally, there is a performance regression for GEMM with slices, because my fallback is too extreme (one element computed per work-unit). I'm on it, so don't worry if you get something like 3 GFLOP/s on slices in the current blas3 benchmark.

Okay, that's pretty much everything I'm worried about, I think!

Philippe
------------------------------------------------------------------------------
_______________________________________________ ViennaCL-devel mailing list ViennaCL-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/viennacl-devel