Hey,

> I am a bit confused, is there any reason for using "reciprocal" and
> "flip_sign", instead of just changing the scalar accordingly?
Yes (with a drawback I'll discuss at the end). Consider the family of operations

  x = +- y OP1 a +- z OP2 b

where x, y, and z are vectors, OP1 and OP2 are either multiplication or division, and a, b are host scalars. With two signs and two operations for each of the two terms, these are 2^4 = 16 different kernels when coded explicitly. Hence, if you put all of these into separate OpenCL kernels, you'll get fairly long compilation times. Note, however, that you cannot simply adjust the scalars if a and b stem from device scalars, because then computing -a or 1/a on the host side would require additional buffer allocations and kernel launches -> way too slow.

For floating point operations, one can reduce the number of kernels a lot when (+- OP1 a) and (+- OP2 b) are computed once in a preprocessing step. Then only the kernel x = y * a' + z * b' is needed, cutting the number of OpenCL kernels from 16 to 1. Since (-a) and (1/a) cannot be computed outside the kernel if a is a GPU scalar, this preprocessing is always done inside the OpenCL kernel for the sake of uniformity. I think we can apply even more cleverness here if we delegate all the work to a suitable implementation function.

And now for the drawback: when using integers, n/m is no longer the same as n * (1/m). Even worse, for unsigned integers it is also no longer possible to replace n - m by n + (-m). Thus, we certainly have to bite the bullet and generate kernels for all 16 combinations when using unsigned integers. However, I'm reluctant to generate all 16 combinations for floating point arguments if this is not needed...

Best regards,
Karli
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel