Hi,

The integration of the generator is going slowly, but "surely". All the
OpenCL vector kernels (including plane rotations, multi-inner_prod, etc.)
should be device-specific in a few days. This will be a big leap forward
in terms of maintainability, peak performance and performance portability.
However, I feel like this is going to get much more complicated for BLAS3.
Here's why.

Ranges do not work well with zero-padding. A kernel that relies on padding
may operate on out-of-bound elements that are not zero (they belong to the
underlying object, not to the range), and thus give an incorrect result!
Of course, for performance it is mandatory that each work-group processes
whole blocks without any costly size checking. I think that we should drop
zero-padding, because it forces us to treat vectors and ranges differently,
which we shouldn't have to do, since ranges are vectors, too. I see two ways
of handling GEMM without zero-padding, each with advantages and drawbacks.
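To make the padding problem concrete, here is a minimal CPU-side sketch in C++ (not OpenCL). The function `padded_sum` and the constant `BLOCK` are hypothetical stand-ins for a reduction kernel written for zero-padded storage: it always processes whole blocks and never compares the index against the logical size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block size standing in for the work-group tile width.
constexpr std::size_t BLOCK = 4;

std::size_t round_up(std::size_t n, std::size_t b) { return (n + b - 1) / b * b; }

// Emulates a reduction kernel written for zero-padded storage: it always sums
// whole blocks and never checks the index against the logical size.
double padded_sum(const std::vector<double>& buf, std::size_t start, std::size_t size) {
  const std::size_t padded = round_up(size, BLOCK);
  assert(start + padded <= buf.size()); // storage must physically extend to the padded size
  double acc = 0.0;
  for (std::size_t i = 0; i < padded; ++i) acc += buf[start + i];
  return acc;
}
```

For a 6-element vector stored as `{1, 1, 1, 1, 1, 1, 0, 0}` the result is 6, as expected. For the range [0, 6) of a 10-element vector of ones it is 8, because the two "padding" elements belong to the parent vector and are not zero.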

(1) Have an optimized kernel for ideal cases (sizes are proper multiples of
64/128/Friday/whatever; stride{A,B,C} can be incorporated into LDA/LDB;
start{A,B,C} are multiples of the vector length used in the kernel), and a
fallback kernel for all the other cases. This fallback is super-safe
(size checking, vector_length=1, ...).

(2) The same as above, but the optimized kernel is always used to perform
the largest sub-matrix multiplication it can handle (rounding each size down
to the previous multiple), and the fallback is only used to finish the job.
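The two options can be sketched as a CPU emulation in C++, assuming row-major storage, C initialized to zero, and a hypothetical `TILE` of 4 in place of 64/128. `gemm_tiled` plays the role of the optimized kernel (no bounds checks, sizes must be tile multiples), `gemm_checked` is the safe fallback on an arbitrary sub-block, and `gemm_option1`/`gemm_option2` are hypothetical names for the two dispatch strategies. The start/stride conditions of (1) are deliberately left out of the predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<double>; // row-major; dimensions are passed separately

constexpr std::size_t TILE = 4; // hypothetical tile size (64/128/... in practice)

// "Optimized" path: C[0:M0, 0:N0] += A[0:M0, 0:K0] * B[0:K0, 0:N0].
// Assumes tile multiples and does no bounds checking inside a tile.
void gemm_tiled(const Mat& A, const Mat& B, Mat& C, std::size_t K, std::size_t N,
                std::size_t M0, std::size_t N0, std::size_t K0) {
  assert(M0 % TILE == 0 && N0 % TILE == 0 && K0 % TILE == 0);
  for (std::size_t it = 0; it < M0; it += TILE)
    for (std::size_t jt = 0; jt < N0; jt += TILE)
      for (std::size_t kt = 0; kt < K0; kt += TILE)
        for (std::size_t i = it; i < it + TILE; ++i)
          for (std::size_t j = jt; j < jt + TILE; ++j)
            for (std::size_t k = kt; k < kt + TILE; ++k)
              C[i * N + j] += A[i * K + k] * B[k * N + j];
}

// Safe fallback: any sub-block [i0,i1) x [j0,j1), partial sum over [k0,k1).
void gemm_checked(const Mat& A, const Mat& B, Mat& C, std::size_t K, std::size_t N,
                  std::size_t i0, std::size_t i1, std::size_t j0, std::size_t j1,
                  std::size_t k0, std::size_t k1) {
  for (std::size_t i = i0; i < i1; ++i)
    for (std::size_t j = j0; j < j1; ++j)
      for (std::size_t k = k0; k < k1; ++k)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
}

// Option (1): all or nothing. Start/stride alignment checks omitted here.
void gemm_option1(const Mat& A, const Mat& B, Mat& C,
                  std::size_t M, std::size_t N, std::size_t K) {
  if (M % TILE == 0 && N % TILE == 0 && K % TILE == 0)
    gemm_tiled(A, B, C, K, N, M, N, K);
  else
    gemm_checked(A, B, C, K, N, 0, M, 0, N, 0, K);
}

// Option (2): optimized kernel on the largest tile-multiple block, fallback for the borders.
void gemm_option2(const Mat& A, const Mat& B, Mat& C,
                  std::size_t M, std::size_t N, std::size_t K) {
  const std::size_t M0 = M / TILE * TILE, N0 = N / TILE * TILE, K0 = K / TILE * TILE;
  gemm_tiled(A, B, C, K, N, M0, N0, K0);            // bulk of the work
  gemm_checked(A, B, C, K, N, 0, M0, 0, N0, K0, K); // leftover K slice of that block
  gemm_checked(A, B, C, K, N, 0, M, N0, N, 0, K);   // right strip, all rows
  gemm_checked(A, B, C, K, N, M0, M, 0, N0, 0, K);  // bottom strip, left of N0
}
```

The three fallback calls in option (2) cover disjoint parts of C, so every entry receives exactly the full sum over K. For a 10x7 result with K = 9, the tiled call handles the top-left 8x4 block with 8 of the 9 terms of each sum.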

I would go for (2): (1) is simpler to implement, but disastrous for large
matrices with odd sizes, which would run entirely on the slow fallback.
However, for small matrices (2) will have a large overhead, and it may be
significantly worse than zero-padding in some corner cases (consider a
60x100000 matrix: zero-padding makes it 64x100032, whereas with (2) the 60
rows round down to 0, so the whole job goes to the crappy fallback kernel...).
Do you have any idea how typical BLAS implementations handle this issue with
the offset? (Strides are rare enough that a slow but safe kernel is
acceptable for them, I believe.)
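The corner case above can be quantified with a back-of-the-envelope sketch in C++. These helpers are hypothetical and count output elements only, ignoring the K dimension and padding of the other operands, so they are a rough proxy for work rather than a performance model.

```cpp
#include <cassert>
#include <cstddef>

std::size_t round_up(std::size_t n, std::size_t b) { return (n + b - 1) / b * b; }
std::size_t round_down(std::size_t n, std::size_t b) { return n / b * b; }

// Padded element count relative to the logical one (zero-padding approach).
double padding_overhead(std::size_t m, std::size_t n, std::size_t tile) {
  assert(m > 0 && n > 0 && tile > 0);
  return double(round_up(m, tile) * round_up(n, tile)) / double(m * n);
}

// Share of the output computed by the optimized kernel under option (1):
// all or nothing, depending on whether both sizes are tile multiples.
double fast_share_option1(std::size_t m, std::size_t n, std::size_t tile) {
  assert(tile > 0);
  return (m % tile == 0 && n % tile == 0) ? 1.0 : 0.0;
}

// Share of the output computed by the optimized kernel under option (2).
double fast_share_option2(std::size_t m, std::size_t n, std::size_t tile) {
  assert(m > 0 && n > 0 && tile > 0);
  return double(round_down(m, tile) * round_down(n, tile)) / double(m * n);
}
```

With 64-wide tiles, a 60x100000 matrix padded to 64x100032 processes about 6.7% extra elements, while option (2) sends all of it to the fallback. For a 1000x1000 matrix the picture reverses: option (1) uses only the fallback, whereas option (2) gives the 960x960 block, about 92% of the output, to the optimized kernel.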

Philippe
_______________________________________________
ViennaCL-devel mailing list
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
