Some additional information: now that the test passes, I can safely compare
the JIT compilation times.
Without the optimized kernels: 1.6 seconds
With the optimized kernels: 6.5 seconds
I fear that this might become intractable as further operations are added. I
should probably find a way to disable the optimized kernels if the
VIENNACL_CACHE_PATH environment variable is not set...
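A minimal sketch of that switch, assuming the optimized kernels are only worth building when their compiled binaries can presumably be reused from the cache directory. The helpers `use_optimized_kernels` and `use_optimized_kernels_from_env` are hypothetical and not part of ViennaCL; the only real call is `std::getenv`.

```cpp
#include <cstdlib>  // std::getenv

// Hypothetical policy helper (not part of ViennaCL): returns true only when
// a non-empty kernel cache path is given. Written as constexpr so that the
// policy itself can be checked at compile time.
constexpr bool use_optimized_kernels(const char* cache_path)
{
  return cache_path != nullptr && cache_path[0] != '\0';
}

// Runtime entry point: consult the VIENNACL_CACHE_PATH environment variable.
// If it is unset or empty, only the small set of generic kernels would be built.
inline bool use_optimized_kernels_from_env()
{
  return use_optimized_kernels(std::getenv("VIENNACL_CACHE_PATH"));
}
```

With this policy, users who never set the cache path only pay the shorter compilation time; users who do set it pay the longer one once, assuming the cache works as intended.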

Philippe


2014-05-28 1:15 GMT+02:00 Philippe Tillet <phil.til...@gmail.com>:

> Hello,
>
> The integration of the kernel generator has been a nightmare! Anyway, I've
> realized that thousands of kernels per scalar type are required to
> obtain optimal performance. Why so many?
> - Every combination of flip_a, reciprocal_a, flip_b and reciprocal_b
> requires its own kernel.
> - The generator treats x = a*y + b*z, x = a*y + b*x, x = a*x + b*y, etc.
> differently.
> - Each avbv requires 2 kernels, because we need a fallback when the
> offset is not a multiple of simd_width. There are some tricks on AMD
> implementations to avoid this, but I know of no portable one.
>
> As you might have guessed, this makes me uncomfortable and upset.
> On the one hand, it cannot hurt performance to have a specific
> implementation for operations such as x = a*x + b*y, x = a*x + b*x, etc. On
> the other hand, I seriously wonder whether the practical gain would be
> noticeable, and what overhead it would induce.
>
> Note, however, that a kernel x = a*x + a*x is at least as efficient as x =
> (2*a)*x (and more efficient if a is a device scalar).
>
> I need your advice here. Should I add an option to force the generator to
> treat each vector as a different object (so that x = a*x + b*z would use
> the kernel x = a*y + b*z with y<-x), or should I leave it as-is,
> considering that we might get higher throughput at the price of more
> latency (i.e., longer JIT compilation)? Has anyone ever had a bad
> experience with very large programs?
>
> Philippe
>
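To get a feeling for how quickly the specializations listed in the quoted message multiply, here is an illustrative count for a single x = a*y + b*z style (avbv) operation. It rests on assumptions, not on the generator's actual code: the four flags are treated as independent booleans, the aliasing patterns are the five ways the vectors x, y and z can coincide (the set partitions of {x, y, z}), and each variant gets one vectorized kernel plus one unaligned fallback. The names `aliasing_patterns`, `avbv_kernel_count` and `can_use_vectorized_kernel` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration only: not the kernel generator's actual code.

// flip_a, reciprocal_a, flip_b, reciprocal_b treated as independent
// booleans: 2^4 = 16 combinations.
constexpr std::size_t num_scalar_flag_combinations = std::size_t(1) << 4;

// One vectorized kernel plus one fallback for unaligned offsets.
constexpr std::size_t num_alignment_variants = 2;

// The five ways x, y and z can coincide (set partitions of {x, y, z}).
const std::vector<std::string> aliasing_patterns = {
  "x = a*y + b*z",  // all distinct
  "x = a*x + b*y",  // first operand is x
  "x = a*y + b*x",  // second operand is x
  "x = a*y + b*y",  // both operands are the same vector, distinct from x
  "x = a*x + b*x",  // every operand is x
};

// Number of kernels for one avbv-style operation and one scalar type.
inline std::size_t avbv_kernel_count()
{
  return num_scalar_flag_combinations * aliasing_patterns.size()
       * num_alignment_variants;
}

// Runtime dispatch: the vectorized kernel is only valid when the offset is
// a multiple of the SIMD width; otherwise the scalar fallback is used.
inline bool can_use_vectorized_kernel(std::size_t offset, std::size_t simd_width)
{
  assert(simd_width > 0 && "SIMD width must be positive");
  return offset % simd_width == 0;
}
```

Under these assumptions a single avbv operation already needs 16 * 5 * 2 = 160 kernels per scalar type; summed over all operations, the total grows quickly.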
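The option asked about at the end of the quoted message can be sketched as a choice of kernel cache key. In this hypothetical sketch (not the generator's API), each operand of `lhs = a*rhs0 + b*rhs1` is identified by an integer id standing in for its buffer handle. With `alias_aware == true`, operands that share an id share a placeholder, so x = a*x + b*z gets its own x = a*x + b*y kernel; with `alias_aware == false`, every position gets its own placeholder and all such statements reuse the generic x = a*y + b*z kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch (not the generator's API): build the cache key of the
// kernel for  lhs = a*rhs0 + b*rhs1, where each operand is identified by an
// integer id (standing in for its buffer handle).
//  - alias_aware == true : operands with the same id share a placeholder,
//    so each aliasing pattern gets its own kernel.
//  - alias_aware == false: each position gets its own placeholder, so every
//    statement reuses the generic "x = a*y + b*z" kernel.
inline std::string kernel_key(int lhs, int rhs0, int rhs1, bool alias_aware)
{
  const std::vector<int> ids = {lhs, rhs0, rhs1};
  const char names[] = {'x', 'y', 'z'};
  std::vector<int> seen;  // distinct ids, in order of first appearance
  std::string slots;      // one placeholder letter per operand position

  for (std::size_t i = 0; i < ids.size(); ++i)
  {
    std::size_t slot = i;  // default: position-based placeholder
    if (alias_aware)
    {
      slot = seen.size();  // new placeholder unless the id was seen before
      for (std::size_t j = 0; j < seen.size(); ++j)
        if (seen[j] == ids[i]) { slot = j; break; }
      if (slot == seen.size())
        seen.push_back(ids[i]);
    }
    slots += names[slot];
  }

  assert(slots.size() == 3);
  return std::string(1, slots[0]) + " = a*" + slots[1] + " + b*" + slots[2];
}
```

Ignoring aliasing collapses all aliasing patterns into one kernel per flag and alignment combination. For purely elementwise operations such as avbv this should be safe, provided the generic kernel does not declare its buffer arguments `restrict`; the likely cost is redundant loads when the same vector appears twice.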
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
