Hi,

> The nasty bug on strided GEMV got solved.
thanks! :-)

> I'm available on Wednesday for the code uniformization session. We
> should be on IRC at the same time, though, in case we face a
> situation we had not discussed.

Ok, when do we start?

> I have a couple of questions regarding a standardized way of naming
> the numeric type of a matrix/vector. Sometimes it's NumericT,
> sometimes it's T, sometimes it's TYPE... What about NumericType
> everywhere? Anyway, some similar questions could arise, so it's
> probably better to be able to chat in real time while making the
> code style uniform.

The answer can already be found here:
http://viennastar.iue.tuwien.ac.at/wiki/doku.php?id=codingstyle ;-)

> We must also remember to sort out
> https://github.com/viennacl/viennacl-dev/issues/71
> https://github.com/viennacl/viennacl-dev/issues/77
> https://github.com/viennacl/viennacl-dev/issues/66
> https://github.com/viennacl/viennacl-dev/issues/2

Sure. What is special about them in comparison to the other issues
scheduled for 1.6.0?

Best regards,
Karli


> > 2014-08-17 19:36 GMT+02:00 Philippe Tillet <phil.til...@gmail.com>:
> >
> > > So the dense benchmark suite got refurbished here:
> > > https://github.com/viennacl/viennacl-dev/commit/73f46e36cfa4104628f831195e4da25a62f9ef66
> > > The same template using macros can be used for any benchmark.
> > > It's pretty concise and maintainable!
> > >
> > > Philippe
> > >
> > > 2014-08-17 13:50 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:
> > >
> > > > Hi,
> > > >
> > > > > * nmf only implements matrix<T>, but in principle
> > > > > matrix_base<T> should work (since no custom kernel is
> > > > > called, I believe)
> > > >
> > > > NMF uses a custom kernel and thus only works with OpenCL. A
> > > > generalization to matrix_base should be straightforward, yes.
> > > > I should be able to do it for the release.
> > > The kernel it uses is:
> > >
> > > template <typename StringType>
> > > void generate_nmf_el_wise_mul_div(StringType & source,
> > >                                   std::string const & numeric_string)
> > > {
> > >   source.append("__kernel void el_wise_mul_div( \n");
> > >   source.append("  __global "); source.append(numeric_string); source.append(" * matrix1, \n");
> > >   source.append("  __global const "); source.append(numeric_string); source.append(" * matrix2, \n");
> > >   source.append("  __global const "); source.append(numeric_string); source.append(" * matrix3, \n");
> > >   source.append("  unsigned int size) \n");
> > >   source.append("{ \n");
> > >   source.append("  for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0)) \n");
> > >   source.append("  { \n");
> > >   source.append("    "); source.append(numeric_string); source.append(" val = matrix1[i] * matrix2[i]; \n");
> > >   source.append("    "); source.append(numeric_string); source.append(" divisor = matrix3[i]; \n");
> > >   source.append("    matrix1[i] = (divisor > ("); source.append(numeric_string); source.append(")0.00001) ? (val / divisor) : ("); source.append(numeric_string); source.append(")0; \n");
> > >   source.append("  } \n");
> > >   source.append("} \n");
> > > }
> > >
> > > So, the layout of the matrix shouldn't matter, indeed. It would
> > > be pretty easy to have this kernel generated by the generator,
> > > too, as it can be represented by the expression tree:
> > >
> > >   matrix1 = select(matrix3 > 0.00001,
> > >                    element_div(element_prod(matrix1, matrix2), matrix3),
> > >                    cast<T>(0))
> > >
> > > However, we're running out of time, so I wouldn't port it. But
> > > we have to keep in mind that this would be a trivial thing to do.
> >
> > The same student who ported the FFT-code to multiple backends will
> > take care of porting NMF to multiple backends. He's pretty quick
> > already, so it should be done by the release.
> >
> > However, I'd refrain from integrating this into the generator for
> > now because it is totally non-critical in terms of overall
> > performance.
> > We can port that under perfect control within the OpenCL backend
> > later when we have more confidence in the stability of the
> > generator (no pun intended).
>
> > > > > - We should definitely have a discussion on matrix padding,
> > > > > which is no longer required anywhere in ViennaCL, as far as
> > > > > I know. I am in favor of making size()==internal_size() by
> > > > > default. That's not the point of the e-mail, but we should
> > > > > have a discussion on what we should do with it!
> > > >
> > > > Getting rid of the padding would certainly remove the traps of
> > > > using fast_copy() on a matrix. Other than that, I don't think
> > > > it has a substantial influence on the code, because
> > > > internal_size() is still needed for dealing with ranges.
> > > >
> > > > There may be an influence on certain bandwidth-limited
> > > > operations, though, as for example a matrix addition may lead
> > > > to bank conflicts (or channel conflicts, whatever...) when
> > > > accessing GPU RAM for certain matrix sizes. Before making a
> > > > decision on the padding issue, we should run some benchmarks
> > > > to see whether there is an impact.
> > >
> > > Well, one thing I'm sure of is that we should give the
> > > possibility to use no padding if needed (for memory
> > > constraints), or (probably even better) to choose the padding
> > > size.
> >
> > Apparently it is not an easy choice for us to pick the default
> > because of the many things to consider. Thus, making this
> > user-customizable is most likely the way to go, so that we only
> > have to worry about choosing the 'best' default layout :-)
>
> > > I completely agree that removing padding will have a harmful
> > > influence for ldsize=some_weird_number.
> >
> > which makes things so complicated... ;-)
>
> > > However, we certainly don't need to pad both size1 and size2.
> > > Padding size2() for row-major matrices, and size1() for
> > > column-major matrices, will not cause any performance
> > > regression.
> >
> > Indeed.
>
> > > > There are a couple of more things for the release to be
> > > > completed.
> > > > They are essentially all listed in the issue tracker and have
> > > > the 1.6.0 milestone assigned to them, except for the
> > > > unification of coding style. When are you available for
> > > > tackling that together? I'm available after Monday.
> > >
> > > I'm available from today to Friday. I'll be unavailable for
> > > quite some time afterwards for any significant work. I will
> > > still be available for critical work such as fixing correctness
> > > issues in the generated code, but overall I'll be busy designing
> > > my PhD course/research plans. What I plan to do before leaving:
> > > - Fix the GEMM performance regression of the fallback kernel
> > > - Refurbish the benchmark code for dense operations
> > > - Rewrite the matrix-vector tests
> > >
> > > Not much more. This is my last week in France, so I want to
> > > spend some time with my family. I've also been having a really
> > > hard time lately when adding support for vector types, ranges
> > > and strides inside the generated kernels, so I feel like taking
> > > a short break before my PhD begins...
> >
> > Sure, make sure you get to the US sufficiently relaxed; who knows
> > when you'll have the next opportunity to relax again ;-)
> > Let's schedule the coding style unification for Wednesday? We
> > should be done within a few hours, I guess.
> >
> > Best regards,
> > Karli

------------------------------------------------------------------------------
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel