Re: [ViennaCL-devel] Implementation of multi_inner_prod

2014-06-27 Thread Philippe Tillet
Ok, thanks! This sounds reasonable indeed.
Philippe
2014-06-26 23:51 GMT+02:00 Karl Rupp:
> Hi,
>
> the cases 5, 6, and 7 are handled by running a kernel for four vectors,
> then subtracting 4 and running a dedicated kernel on the remaining 1, 2,
> or 3 vectors. This could also be handled by a gene

Re: [ViennaCL-devel] Implementation of multi_inner_prod

2014-06-26 Thread Karl Rupp
Hi, the cases 5, 6, and 7 are handled by running a kernel for four vectors, then subtracting 4 and running a dedicated kernel on the remaining 1, 2, or 3 vectors. This could also be handled by a generated kernel, yes, but I haven't implemented this for two reasons: 1. fewer kernels to compile 2.
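To make the chunking described above concrete, here is a minimal sketch of how the launch sizes could be derived. It is not the ViennaCL implementation: `plan_launches` and `max_chunk` are hypothetical names, and the sketch assumes that the largest fixed-size kernel handles four vectors, as the quoted switch statement suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the ViennaCL API): returns how many
// vectors each kernel launch would process for `num_vectors` vectors.
// Assumption: the largest fixed-size kernel handles `max_chunk` (4) vectors.
std::vector<std::size_t> plan_launches(std::size_t num_vectors,
                                       std::size_t max_chunk = 4)
{
  assert(max_chunk > 0);
  std::vector<std::size_t> launches;
  std::size_t current_index = 0;
  while (current_index < num_vectors)
  {
    // Mirrors the 'size - current_index' expression in the switch.
    std::size_t remaining = num_vectors - current_index;
    // Full-size kernel while enough vectors remain, else a dedicated
    // kernel for the leftover 1, 2, or 3 vectors.
    std::size_t chunk = (remaining >= max_chunk) ? max_chunk : remaining;
    launches.push_back(chunk);
    current_index += chunk;
  }
  return launches;
}
```

Under these assumptions, `plan_launches(7)` yields two launches, one over four vectors and one over the remaining three, which matches the handling of cases 5, 6, and 7 described in the message.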

Re: [ViennaCL-devel] Implementation of multi_inner_prod

2014-06-26 Thread Philippe Tillet
I'll add something. I assume that multiple kernels are launched thanks to current_index. Wouldn't it be better to launch a single kernel? I think that a lot of users would prefer better performance in exchange for a perhaps slightly longer JIT overhead (since we'll provide a caching mechanism). Phi

[ViennaCL-devel] Implementation of multi_inner_prod

2014-06-26 Thread Philippe Tillet
Hello! I noticed this in the implementation of multi_inner_prod:
    switch (vec_tuple.const_size() - current_index) {
      case 7:
      case 6:
      case 5:
      case 4:
        //do stuff
However, there is a test for 5,6,7 so I assume that these h