Hi,

> * nmf only implements matrix<T>, but in principle matrix_base<T> should
> work (since no custom kernel is called, I believe)
>
> NMF uses a custom kernel and thus only works with OpenCL. A
> generalization to matrix_base should be straight-forward, yes. I
> should be able to do it for the release.
>
> The kernel it uses is:
>
> template <typename StringType>
> void generate_nmf_el_wise_mul_div(StringType & source,
>                                   std::string const & numeric_string)
> {
>   source.append("__kernel void el_wise_mul_div( \n");
>   source.append("  __global "); source.append(numeric_string); source.append(" * matrix1, \n");
>   source.append("  __global const "); source.append(numeric_string); source.append(" * matrix2, \n");
>   source.append("  __global const "); source.append(numeric_string); source.append(" * matrix3, \n");
>   source.append("  unsigned int size) \n");
>   source.append("{ \n");
>   source.append("  for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0)) \n");
>   source.append("  { \n");
>   source.append("    "); source.append(numeric_string); source.append(" val = matrix1[i] * matrix2[i]; \n");
>   source.append("    "); source.append(numeric_string); source.append(" divisor = matrix3[i]; \n");
>   source.append("    matrix1[i] = (divisor > ("); source.append(numeric_string); source.append(")0.00001) ? (val / divisor) : ("); source.append(numeric_string); source.append(")0; \n");
>   source.append("  } \n");
>   source.append("} \n");
> }
>
> So, the layout of the matrix shouldn't matter, indeed. It would be
> pretty easy to have this kernel generated by the generator, too, as it
> can be represented by the expression tree
> matrix1 = select(matrix3 > 0.00001, element_div(element_prod(matrix1,
> matrix2), matrix3), cast<T>(0)).
> However, we're running out of time, so I wouldn't port it. But we have to
> keep in mind that this would be a trivial thing to do.
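For reference, concatenating those strings for numeric_string == "float"
yields a kernel roughly along these lines (reconstructed here just for
illustration, whitespace approximate, comments added by me):

  __kernel void el_wise_mul_div(
    __global float * matrix1,
    __global const float * matrix2,
    __global const float * matrix3,
    unsigned int size)
  {
    for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0))
    {
      float val = matrix1[i] * matrix2[i];   // numerator of the element-wise update
      float divisor = matrix3[i];
      // guard the element-wise division against (near-)zero denominators
      matrix1[i] = (divisor > (float)0.00001) ? (val / divisor) : (float)0;
    }
  }

The grid-stride loop makes the kernel independent of the launch
configuration, and since it only touches the buffers element-wise,
neither row- nor column-major layout makes a difference here.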
The same student who ported the FFT code to multiple backends will take
care of porting NMF to multiple backends. He is already pretty quick, so
it should be done by the release. However, I'd refrain from integrating
this into the generator for now, because it is totally non-critical in
terms of overall performance. We can port it under perfect control within
the OpenCL backend later, when we have more confidence in the stability
of the generator (no pun intended).

> - We should definitely have a discussion on matrix padding, which is no
> longer required anywhere in ViennaCL, as far as I know. I am in favor of
> making size()==internal_size() by default. That's not the point of the
> e-mail, but we should have a discussion on what we should do with it!
>
> Getting rid of the padding would certainly remove the traps of using
> fast_copy() on a matrix. Other than that, I don't think it has a
> substantial influence on the code, because internal_size() is still
> needed for dealing with ranges.
>
> There may be an influence on certain bandwidth-limited operations,
> though: for example, a matrix addition may lead to bank conflicts
> (or channel conflicts, whatever...) when accessing GPU RAM for
> certain matrix sizes. Before making a decision on the padding issue,
> we should run some benchmarks to see whether there is an impact.
>
> Well, one thing I'm sure of is that we should give the possibility to
> use no padding if needed (for memory constraints), or (probably even
> better) to choose the padding size.

Picking a default is apparently not an easy choice for us, because there
are many things to consider. Thus, making this user-customizable is most
likely the way to go, so that we only have to worry about choosing the
'best' default layout :-)

> I completely agree that removing padding will have a harmful influence
> for ldsize=some_weird_number.

...which makes things so complicated ;-)

> However, we certainly don't need to pad both size1 and size2. Padding
> size2() for row-major matrices, and size1() for column-major matrices,
> will not cause any performance regression.

Indeed.

> There are a couple more things to be completed for the release. They
> are essentially all listed in the issue tracker and have the 1.6.0
> milestone assigned to them, except for the unification of the coding
> style. When are you available for tackling that together? I'm
> available after Monday.
>
> I'm available from today to Friday. I'll be unavailable for quite some
> time afterwards for any significant work. I will still be available for
> critical work such as fixing correctness issues in the generated code,
> but overall I'll be busy designing my PhD course/research plans. What I
> plan to do before leaving:
> - Fix the GEMM performance regression of the fallback kernel
> - Refurbish the benchmark code for dense operations
> - Rewrite the matrix-vector tests
>
> Not much more. This is my last week in France, so I want to spend some
> time with my family. I've also been having a really hard time lately
> adding support for vector types, ranges and strides inside the
> generated kernels, so I feel like taking a short break before my PhD
> begins...

Sure, make sure you get to the US sufficiently relaxed; who knows when
you'll have the next opportunity to relax again ;-) Shall we schedule the
coding style unification for Wednesday? We should be done within a few
hours, I guess.
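One more note on the padding discussion above, just to make the row-major
case concrete. A minimal sketch (toy code, not the actual ViennaCL
implementation; the names and the padding granularity are made up for
illustration):

  #include <cstddef>
  #include <vector>

  // Row-major matrix in which only the number of columns is padded
  // (internal_size2 >= size2); the row count needs no padding, because
  // each row is already a contiguous block.
  template <typename T>
  struct padded_matrix_rowmajor
  {
    std::size_t size1, size2;       // logical dimensions
    std::size_t internal_size2;     // padded leading dimension
    std::vector<T> data;            // size1 * internal_size2 entries

    padded_matrix_rowmajor(std::size_t rows, std::size_t cols, std::size_t pad)
      : size1(rows), size2(cols),
        internal_size2(((cols + pad - 1) / pad) * pad),  // round cols up to a multiple of 'pad'
        data(rows * internal_size2) {}

    T & operator()(std::size_t i, std::size_t j)
    {
      return data[i * internal_size2 + j];  // only the padded column count enters the index
    }
  };

Only the leading dimension shows up in the index computation, which is why
it suffices to pad size2() for row-major (and size1() for column-major)
matrices; with pad == 1 this degenerates to size() == internal_size(),
i.e. exactly the layout that would remove the fast_copy() trap mentioned
above.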
Best regards,
Karli

------------------------------------------------------------------------------
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel