Hey hey hey,

> Convergence depends on what is inside generate_execute() ;-) How is
> the problem with alpha and beta residing on the GPU addressed? How
> will the batch-compilation look like? The important point is that
> for the default axpy kernels we really don't want to go through the
> jit-compiler for each of them individually.

;)
In this case, generate_execute() will just trigger the compilation - on
the first call only - of the kernel for

x = cpu_alpha*y + cpu_beta*z;

__kernel void axpy(unsigned int N, __global float4* x,
                   __global const float4* y, __global const float4* z,
                   float alpha, float beta)
{
  for (unsigned int i = get_global_id(0); i < N; i += get_global_size(0))
    x[i] = alpha*y[i] + beta*z[i];
}
I'm afraid this is not suitable then. A simple conjugate gradient
solver would then go through ~10 OpenCL compilations, making the first
run awfully slow. With the AMD and Intel SDKs, which to my knowledge
still do not cache compiled kernels, this large overhead would be
visible every time a process is started.

Best regards,
Karli

_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel