Hey,

The GEMM kernel(s) are getting pretty tricky, with quite a few fallbacks involved. This gets hard to test, so I thought it would be a good idea to discuss it. Basically, here is how it works:
A = [A1 A2; A3 A4]
B = [B1 B2; B3 B4]
C = [C1 C2; C3 C4]

where each block is divided according to the corresponding block size of the template. For example, the size of A1 is the closest multiple of the size tuple (ML, KL) that fits inside A, where ML is the number of rows computed by each work group, and KL is the "width step" for computing the inner products (if the kernel uses local memory, each work group loads successive blocks of size ML*KL).

A few kernels are enqueued so that:

  C1  = A1*B1   [optimized kernel]
  C1 += A2*B3   [fallback] if needed
  C2  = A1*B2   [fallback] if needed
  C2 += A2*B4   [fallback] if needed
  etc.

Basically, one optimized kernel does the bulk of the work, and the other ones do the "clean-up".

This works well for full matrices and ranges. When slices are involved, things get more complicated. If the stride is on the non-leading dimension (stride2 for column-major matrices), then it can be incorporated into the optimized kernel (by adding ld *= stride2 at the beginning of the kernel). However, if stride1 > 1, then we need to use the fallback kernel. This is a reasonable thing to do: in most applications I know of, only one stride is used at a time (we want a subset of the rows/columns of a given matrix).

However, this becomes really messy to test! Basically, I think that, to have an exhaustive enough test suite, we should go for:

- Matrices of complicated arbitrary sizes (143, 284, 395). It is important to space them by more than 128, to be sure that A1, B1 and C1 are not square.
- Ranges of similar complicated sizes.
- "Optimized" ranges: (128, 256, 384) for example.
- Matrix row-wise slices, matrix column-wise slices, and matrix slices in both directions.

I am ready to rewrite the GEMM tests accordingly, but any thoughts on the procedure would be appreciated!

Philippe
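
For reference, the block decomposition described above can be sketched in a few lines. This is only an illustration in plain Python, not ViennaCL code: the function names are hypothetical, NL (the column block size) is an assumed third parameter alongside ML and KL, and "closest multiple" is taken to mean the largest multiple that fits. The sketch records which launches would go to the optimized kernel and which to the fallback, and the result can be compared against a naive product.

```python
def split(n, block):
    """Return (bulk, rest), where bulk is the largest multiple of block <= n."""
    bulk = (n // block) * block
    return bulk, n - bulk


def reference_gemm(A, B):
    """Naive triple-loop product, used only as a correctness oracle."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]


def blocked_gemm(A, B, ML, KL, NL):
    """Compute C = A*B as one 'optimized' launch plus 'fallback' clean-up launches.

    NL is an assumed column block size (the text above only names ML and KL).
    Returns (C, launches); each launch is (kind, rows, inner, cols, accumulate)
    with half-open index ranges.
    """
    M, K, N = len(A), len(B), len(B[0])
    m1, _ = split(M, ML)
    k1, _ = split(K, KL)
    n1, _ = split(N, NL)
    row_parts = [(0, m1), (m1, M)]    # rows of [C1 C2] and [C3 C4]
    inner_parts = [(0, k1), (k1, K)]  # inner-product ranges
    col_parts = [(0, n1), (n1, N)]    # columns of [C1; C3] and [C2; C4]

    C = [[0] * N for _ in range(M)]
    launches = []
    for rows in row_parts:
        for cols in col_parts:
            if rows[0] == rows[1] or cols[0] == cols[1]:
                continue  # empty block of C: no launch needed
            accumulate = False  # first launch on a block is '=', later ones '+='
            for inner in inner_parts:
                if inner[0] == inner[1]:
                    continue  # empty inner range: no launch needed
                bulk = rows == (0, m1) and cols == (0, n1) and inner == (0, k1)
                kind = "optimized" if bulk else "fallback"
                for i in range(*rows):
                    for j in range(*cols):
                        s = C[i][j] if accumulate else 0
                        for k in range(*inner):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
                launches.append((kind, rows, inner, cols, accumulate))
                accumulate = True
    return C, launches
```

With sizes that are not multiples of the block sizes, all eight launches appear (one optimized, seven fallback); with sizes that are exact multiples, only the optimized launch remains. A test suite along the lines proposed above would want to cover both situations.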
------------------------------------------------------------------------------
_______________________________________________ ViennaCL-devel mailing list ViennaCL-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/viennacl-devel