Hi Karl, the kernel is below. Regarding your second point, I would
love to process all columns in one kernel, but I want to avoid allocating
another entire matrix of the same size. Instead, I am trying to allocate
only a vector whose size is the number of rows, which can then be assigned to
t
Hi Charles,
can you please send us the kernel? Maybe there's something wrong with
the thread assignment there.
Also, rather than looping from 0 to P-1, it would make much more sense
to process all columns in parallel in a single kernel.
Best regards,
Karli
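
A rough sketch of the single-kernel idea discussed above, not the actual kernel from this thread (which is not shown here). It assumes the per-column work is element-wise, so each work-item can update its own (row, column) entry in place and neither a second matrix nor a temporary vector is needed. The code is plain C that emulates a 2D index space on the host. The names `update`, `work_item`, `process_all_columns` and `col_factor` are hypothetical, and column-major storage is an assumption.

```c
#include <stddef.h>

/* Hypothetical per-element operation; a stand-in for whatever the
   real kernel computes for one matrix entry. */
static double update(double x, double col_factor)
{
    return x * col_factor;
}

/* What a single work-item would do for entry (row, col) of a
   column-major matrix A with n_rows rows. It reads and writes only
   its own entry, so the update can be done in place. */
static void work_item(double *A, size_t n_rows, size_t row, size_t col,
                      const double *col_factor)
{
    size_t idx = col * n_rows + row;
    A[idx] = update(A[idx], col_factor[col]);
}

/* Host-side emulation of ONE launch over an n_rows x n_cols index
   space. On a device the two loops disappear: the indices would come
   from the work-item ids in dimensions 0 and 1 instead. */
void process_all_columns(double *A, size_t n_rows, size_t n_cols,
                         const double *col_factor)
{
    for (size_t col = 0; col < n_cols; ++col)
        for (size_t row = 0; row < n_rows; ++row)
            work_item(A, n_rows, row, col, col_factor);
}
```

If the result for one entry depends on other entries of the same column, the in-place update is no longer safe as written, and some scratch storage or a reduction step would be needed.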
On 12/14/2016 06:01 PM, Charles Det