Hey!

So it seems like most of the features are ready for ViennaCL 1.6. My merge
from a few days ago (finally) fully integrated the use of device-specific
kernels for BLAS1, BLAS2 and BLAS3. The reduction API is still missing,
but I think that the priority should be to polish the code and to
ensure ViennaCL is still stable despite the migration to device-specific
kernels. Specifically, I think that we should spend the next few weeks
cleaning the code base. Here are a few points that have caught my eye:

- I've rewritten the GEMM test from scratch. It now calls ublas::prod 27
times instead of 2500, and thanks to a few macros the file is
substantially smaller (~250 lines vs ~850). The test now completes about
15 times faster with the single-threaded ViennaCL implementation, and in
the blink of an eye (~10 seconds) when OpenCL is used. Hurray!
More importantly, this new version allowed me to spot the bug that was
responsible for the failure of libviennacl-blas3 in tonight's dash. The
culprit was blas3_prod-test choosing slice1==slice2, while the slices were
bugged for C=row-major+sliced... This is frightening because other
similar glitches may be hidden here and there in the test suite. For
example, the matrix_vector test passes but the libviennacl-blas2 test
fails, probably due to some stride issues for row-major matrices. Things
get more complicated now that the col-major kernels are used for the
row-major cases. Anyhow, I think that before shipping ViennaCL 1.6 we
should make sure that no such glitch remains in the test suite (i.e., all
matrix_slices/ranges use different offsets/strides in each direction).
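
To illustrate why identical slices can mask this kind of bug, here is a
minimal, self-contained sketch. It does not use ViennaCL: Slice,
index_ok, index_swapped and bug_is_hidden are hypothetical names, and the
"swapped slices" bug is only an assumed example of an indexing glitch,
not necessarily the one that was in the kernels.

```cpp
#include <cstddef>

// Hypothetical stand-in for a slice: start offset, stride, element count.
struct Slice { std::size_t start, stride, size; };

// Linear index of element (i, j) of a sliced view into a row-major
// matrix whose rows hold `ld` elements in memory.
std::size_t index_ok(Slice r, Slice c, std::size_t ld,
                     std::size_t i, std::size_t j)
{
  return (r.start + i * r.stride) * ld + (c.start + j * c.stride);
}

// Deliberately buggy variant: row and column slices are swapped.
std::size_t index_swapped(Slice r, Slice c, std::size_t ld,
                          std::size_t i, std::size_t j)
{
  return (c.start + i * c.stride) * ld + (r.start + j * r.stride);
}

// Returns true if the buggy indexing cannot be told apart from the
// correct one on this view, i.e. a test using it would still pass.
bool bug_is_hidden(Slice r, Slice c, std::size_t ld)
{
  for (std::size_t i = 0; i < r.size; ++i)
    for (std::size_t j = 0; j < c.size; ++j)
      if (index_ok(r, c, ld, i, j) != index_swapped(r, c, ld, i, j))
        return false;
  return true;
}
```

With r and c both equal to {1, 2, 4}, bug_is_hidden returns true, so the
test would pass despite the bug. With r = {1, 2, 4} and c = {3, 3, 4} it
returns false, which is why distinct offsets and strides per direction
make the tests more trustworthy.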

- I really think that we should rewrite the benchmarks for the 1.6 release,
all the more so because they would showcase the substantial performance
improvement that this release will bring. I can start writing a condensed
benchmark covering copy, axpy, dot, gemv and gemm. I think it would be cool
to have sparse, solver and qr included in that routine as well. I won't
have the time to carry this out; I'm moving to the United States in
1 week :-p
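
As a rough idea of what one entry of such a condensed benchmark could
look like, here is a host-only sketch for axpy. Everything in it is an
assumption for illustration: best_time, axpy and axpy_bytes are
hypothetical helpers, not existing ViennaCL functions, and the operation
runs on std::vector rather than on a ViennaCL vector.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `op` `reps` times and returns the fastest wall-clock time in
// seconds (returns -1 if reps <= 0).
template<typename Op>
double best_time(Op op, int reps)
{
  double best = -1.0;
  for (int r = 0; r < reps; ++r)
  {
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    op();
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    if (best < 0.0 || dt.count() < best)
      best = dt.count();
  }
  return best;
}

// Reference axpy on the host: y <- alpha * x + y.
void axpy(float alpha, std::vector<float> const & x, std::vector<float> & y)
{
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] += alpha * x[i];
}

// Bytes moved by one axpy call: read x, read y, write y.
double axpy_bytes(std::size_t n)
{
  return 3.0 * static_cast<double>(n) * sizeof(float);
}
```

The bandwidth would then be axpy_bytes(n) / best_time(...). For an
asynchronous backend such as OpenCL, the timed callable would also have
to wait for the device to finish, otherwise only the enqueue time is
measured.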

- I've noticed some unsafe/faulty legacy code dating back to when the
layout was made a runtime parameter:
* nmf only implements matrix<T>, but in principle matrix_base<T> should
work (since no custom kernel is called, I believe)
* There was a faulty row_major(is_row_major<BaseType>::value) in
matrix_range and matrix_stride. This caused matrix_range<matrix_base<T> >
to be column-major no matter what. More generally, there are a couple of
places using the static is_row_major<> or alignment<> traits. I thought
that it could be a good idea to delete these traits to be sure that there
can be no such faulty code anywhere else. Am I overlooking any side effect?
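
To make the problem with the static trait concrete, here is a simplified
sketch of the failure mode. The classes below are stripped-down,
hypothetical stand-ins that only mimic the names used above; they are not
the actual ViennaCL definitions, and the assumption that the primary
is_row_major<> template yields false is mine.

```cpp
#include <cstddef>

struct row_major_tag {};
struct column_major_tag {};

template<typename Layout>
struct tag_is_row_major { static const bool value = false; };
template<>
struct tag_is_row_major<row_major_tag> { static const bool value = true; };

// Base class: the layout is a runtime property of the object.
template<typename T>
class matrix_base
{
public:
  matrix_base(std::size_t rows, std::size_t cols, bool row_major)
    : rows_(rows), cols_(cols), row_major_(row_major) {}
  std::size_t size1() const { return rows_; }
  std::size_t size2() const { return cols_; }
  bool row_major() const { return row_major_; }
private:
  std::size_t rows_, cols_;
  bool row_major_;
};

// Derived class: the layout is additionally encoded in the type.
template<typename T, typename Layout>
class matrix : public matrix_base<T>
{
public:
  matrix(std::size_t rows, std::size_t cols)
    : matrix_base<T>(rows, cols, tag_is_row_major<Layout>::value) {}
};

// Static trait: it only sees the type, so for matrix_base<T> it falls
// back to the primary template (assumed here to mean column-major).
template<typename M>
struct is_row_major { static const bool value = false; };
template<typename T>
struct is_row_major< matrix<T, row_major_tag> > { static const bool value = true; };

// Faulty: a proxy that derives its layout from the static trait.
template<typename M>
bool range_layout_faulty(M const &) { return is_row_major<M>::value; }

// Fixed: the proxy asks the object for its runtime layout.
template<typename M>
bool range_layout_fixed(M const & m) { return m.row_major(); }
```

For a row-major matrix viewed through a matrix_base<float> reference,
range_layout_faulty reports column-major while range_layout_fixed
reports row-major. Deleting the trait would turn every remaining use of
this pattern into a compile error instead of a silent wrong layout.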

- We should definitely have a discussion on matrix padding, which is no
longer required anywhere in ViennaCL, as far as I know. I am in favor of
making size()==internal_size() by default. That's not the point of the
e-mail, but we should have a discussion on what we should do with it!

- Finally, there is a performance regression for GEMM with slices, due to
my fallback being too extreme (one element computed per work-unit). I'm on
it, so don't worry if you get something like 3 GFLOP/s on slices in the
current blas3 benchmark.

Okay, that's pretty much everything I'm worried about, I think!

Philippe
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel