So, the dense benchmark suite got refurbished here:

https://github.com/viennacl/viennacl-dev/commit/73f46e36cfa4104628f831195e4da25a62f9ef66

The same macro-based template can be used for any benchmark. It's pretty concise and maintainable!
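Roughly, the pattern looks like this (just an illustrative sketch, not the actual code from the commit; RUN_BENCHMARK and time_op are made-up names):

  #include <viennacl/matrix.hpp>
  #include <viennacl/linalg/prod.hpp>

  #include <chrono>
  #include <cstddef>
  #include <iostream>

  // Times a callable: one warm-up run (also triggers kernel compilation),
  // then 'reps' timed runs with a device sync before starting and after finishing.
  template <typename OpT>
  double time_op(OpT op, std::size_t reps)
  {
    op();
    viennacl::backend::finish();
    auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < reps; ++r)
      op();
    viennacl::backend::finish();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count() / double(reps);
  }

  // One macro invocation per benchmarked operation; adding a benchmark is a one-liner.
  #define RUN_BENCHMARK(LABEL, EXPR) \
    std::cout << LABEL << ": " << time_op([&]() { EXPR; }, 10) << " sec/run" << std::endl;

  int main()
  {
    std::size_t N = 2048;
    viennacl::matrix<float> A(N, N), B(N, N), C(N, N);

    RUN_BENCHMARK("C = A + B", C = A + B)
    RUN_BENCHMARK("C = A * B", C = viennacl::linalg::prod(A, B))
  }

The warm-up run plus the two finish() calls are there so that kernel compilation and already-queued work don't pollute the timings.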
Philippe

2014-08-17 13:50 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:

> Hi,
>
>> * nmf only implements matrix<T>, but in principle matrix_base<T> should
>>   work (since no custom kernel is called, I believe)
>>
>> NMF uses a custom kernel and thus only works with OpenCL. A
>> generalization to matrix_base should be straight-forward, yes. I
>> should be able to do it for the release.
>>
>> The kernel it uses is:
>>
>>   template <typename StringType>
>>   void generate_nmf_el_wise_mul_div(StringType & source,
>>                                     std::string const & numeric_string)
>>   {
>>     source.append("__kernel void el_wise_mul_div( \n");
>>     source.append(" __global "); source.append(numeric_string); source.append(" * matrix1, \n");
>>     source.append(" __global const "); source.append(numeric_string); source.append(" * matrix2, \n");
>>     source.append(" __global const "); source.append(numeric_string); source.append(" * matrix3, \n");
>>     source.append(" unsigned int size) \n");
>>     source.append("{ \n");
>>     source.append(" for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0)) \n");
>>     source.append(" { \n");
>>     source.append(" "); source.append(numeric_string); source.append(" val = matrix1[i] * matrix2[i]; \n");
>>     source.append(" "); source.append(numeric_string); source.append(" divisor = matrix3[i]; \n");
>>     source.append(" matrix1[i] = (divisor > ("); source.append(numeric_string); source.append(")0.00001) ? (val / divisor) : ("); source.append(numeric_string); source.append(")0; \n");
>>     source.append(" } \n");
>>     source.append("} \n");
>>   }
>>
>> So, the layout of the matrix shouldn't matter, indeed. It would be
>> pretty easy to have this kernel generated by the generator, too, as this
>> can be represented by the expression tree:
>>   matrix1 = select(matrix3 > 0.00001, element_div(element_prod(matrix1, matrix2), matrix3), cast<T>(0)).
>> However, we're running out of time so I wouldn't port it. But we have to
>> keep in mind that this would be a trivial thing to do.
>
> The same student who ported the FFT-code to multiple backends will take
> care of porting NMF to multiple backends. He's pretty quick already, so it
> should be done by the release.
>
> However, I'd refrain from integrating this into the generator for now
> because it is totally non-critical in terms of overall performance. We can
> port that under perfect control within the OpenCL backend later when we
> have more confidence in the stability of the generator (no pun intended).
>
>> - We should definitely have a discussion on matrix padding, which is no
>>   longer required anywhere in ViennaCL, as far as I know. I am in favor of
>>   making size()==internal_size() by default. That's not the point of the
>>   e-mail, but we should have a discussion on what we should do with it!
>>
>> Getting rid of the padding would certainly remove the traps of using
>> fast_copy() on a matrix. Other than that, I don't think it has a
>> substantial influence on the code because internal_size() is still
>> needed for dealing with ranges.
>>
>> There may be an influence on certain bandwidth-limited operations,
>> though, as for example a matrix addition may lead to bank conflicts
>> (or channel conflicts, whatever...) when accessing GPU RAM for
>> certain matrix sizes. Before making a decision on the padding issue,
>> we should run some benchmarks to see whether there is an impact.
>> Well, one thing I'm sure of is that we should give the possibility to
>> use no padding if needed (for memory constraints), or (probably even
>> better) to choose the padding size.
>
> Apparently it is not an easy choice for us to pick the default because of
> the many things to consider. Thus, making this user-customizable is most
> likely the way to go, so that we only have to worry about choosing the
> 'best' default layout :-)
>
>> I completely agree that removing padding will have a harmful influence
>> for ldsize=some_weird_number.
>
> which makes things so complicated... ;-)
>
>> However, we certainly don't need to pad both size1 and size2. Padding
>> size2() for row-major matrices, and size1() for column-major matrices,
>> will not cause any performance regression.
>
> Indeed.
>
>> There are a couple more things for the release to be completed.
>> They are essentially all listed in the issue tracker and have the
>> 1.6.0 milestone assigned to it, except for the unification of coding
>> style. When are you available for tackling that together? I'm
>> available after Monday.
>>
>> I'm available from today to Friday. I'll be unavailable for quite some
>> time afterwards for any significant work. I will still be available for
>> critical work such as fixing correctness issues in the generated code,
>> but overall I'll be busy designing my PhD course/research plans. What I
>> plan to do before leaving:
>>   - Fix the GEMM performance regression of the fallback kernel
>>   - Refurbish the benchmark code for dense operations
>>   - Rewrite the matrix-vector tests
>>
>> Not much more. This is my last week in France, so I want to spend some
>> time with my family. I've also been having a really hard time lately
>> when adding support for vector types, ranges and strides inside the
>> generated kernels, so I feel like taking a short break before my PhD
>> begins...
>
> Sure, make sure you get to the US sufficiently relaxed, who knows when
> you'll have the next opportunity to relax again ;-) Let's schedule the
> coding style unification for Wednesday? We should be done within a few
> hours, I guess.
>
> Best regards,
> Karli
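A quick note for anyone skimming the quoted thread: with numeric_string set to "float", the string pieces in generate_nmf_el_wise_mul_div above simply concatenate (modulo whitespace) into the following kernel, i.e. an element-wise matrix1 = matrix1 * matrix2 / matrix3 guarded against tiny divisors:

  __kernel void el_wise_mul_div(__global float * matrix1,
                                __global const float * matrix2,
                                __global const float * matrix3,
                                unsigned int size)
  {
    for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0))
    {
      float val     = matrix1[i] * matrix2[i];
      float divisor = matrix3[i];
      matrix1[i] = (divisor > (float)0.00001) ? (val / divisor) : (float)0;
    }
  }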
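And on the fast_copy()/padding trap mentioned above, a minimal sketch of the indexing issue (the 100x100 size and the element values are arbitrary; only size2(), internal_size1() and internal_size2() from the discussion are used). With the default row-major layout, element (i, j) of the raw buffer lives at i * internal_size2() + j, not i * size2() + j:

  // Illustrates the padded-buffer indexing that fast_copy() users must get right.
  #include <viennacl/matrix.hpp>

  #include <cstddef>
  #include <iostream>
  #include <vector>

  int main()
  {
    viennacl::matrix<float> A(100, 100);   // logical size: 100 x 100 (row-major by default)

    // The raw buffer spans the padded dimensions, not the logical ones:
    std::vector<float> host(A.internal_size1() * A.internal_size2(), 0.0f);

    std::size_t i = 3, j = 7;
    host[i * A.internal_size2() + j] = 42.0f;   // correct: uses the padded row length
    // host[i * A.size2() + j] = 42.0f;         // wrong whenever internal_size2() != size2()

    std::cout << "size2() = " << A.size2()
              << ", internal_size2() = " << A.internal_size2() << std::endl;
  }

If size() == internal_size() became the default, the two index expressions would coincide and this particular trap would disappear.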