Hi,

Oh, I understand it better now. I am not entirely convinced, though ;)
From my experience, the overhead of the jit launch is negligible compared
to the compilation of one kernel. I'm not sure whether compiling two
kernels in the same program or in two different programs makes a big
difference. Plus, ideally, in the case of linear solvers, the generator
could be used to generate fused kernels, provided that the scheduler is
fully operational. I fear that any solution to the aforementioned problem
would destroy this precious ability... Ideally, once we enable it, the
generate_execute() mentioned above would just be replaced by generate() (or
enqueue_for_generation, which is more explicit).
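
To make the enqueue_for_generation idea concrete, here is a minimal sketch
of a queue that only records kernel sources and builds a single compilation
unit when flushed. All names (generation_queue, flush, programs_built) are
hypothetical and are not part of the ViennaCL generator; the actual OpenCL
build step is only indicated in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: none of these names are ViennaCL API.
class generation_queue {
public:
  // Record a kernel source; nothing is compiled at this point.
  void enqueue_for_generation(std::string name, std::string source) {
    assert(!name.empty());
    pending_.emplace_back(std::move(name), std::move(source));
  }

  // Concatenate all pending kernels into one compilation unit.
  // A real implementation would pass the result to
  // clCreateProgramWithSource / clBuildProgram exactly once here.
  std::string flush() {
    std::string unit;
    for (const auto& kernel : pending_) {
      unit += "// kernel: " + kernel.first + "\n" + kernel.second + "\n";
    }
    if (!pending_.empty()) ++programs_built_;
    pending_.clear();
    return unit;
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t programs_built() const { return programs_built_; }

private:
  std::vector<std::pair<std::string, std::string>> pending_;
  std::size_t programs_built_ = 0;
};
```

With something like this, several enqueued vector operations would end up in
one OpenCL program, and a scheduler would still be free to fuse them before
the flush.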

That aside, I'm not sure we should give that much importance to
jit-compilation overhead, since the binaries can be cached. If I remember
correctly, Denis Demidov implemented such a caching mechanism for VexCL. What
if we replace "distributed vector/matrix" with "optional automatic kernel
caching mechanism" for ViennaCL 1.6.0 (we only have a limited amount of
time :P)? The drawback is that the filesystem library would have to be
dynamically linked, but after all OpenCL itself also has to be
dynamically linked.
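
Roughly, the caching mechanism I have in mind could look like the sketch
below. It is only an illustration under my own assumptions, not the VexCL
implementation: the class kernel_cache and its members are hypothetical, the
"compiler" is a plain callback standing in for the OpenCL build, and plain
fstream is used instead of a filesystem library to keep the example
self-contained.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical sketch: not the VexCL or ViennaCL implementation.
class kernel_cache {
public:
  using compiler = std::function<std::string(const std::string&)>;

  // 'dir' must already exist; 'compile' stands in for the OpenCL build step.
  kernel_cache(std::string dir, compiler compile)
    : dir_(std::move(dir)), compile_(std::move(compile)) {
    assert(compile_);
  }

  // File name derived from a hash of the source. A real cache would also
  // key on device name and build options, and would use a hash that is
  // stable across standard library implementations.
  std::string path_for(const std::string& source) const {
    return dir_ + "/" + std::to_string(std::hash<std::string>{}(source)) + ".bin";
  }

  // Return the cached binary if present, otherwise compile and store it.
  std::string get(const std::string& source) {
    const std::string path = path_for(source);
    std::ifstream in(path, std::ios::binary);
    if (in) {
      return std::string(std::istreambuf_iterator<char>(in),
                         std::istreambuf_iterator<char>());
    }
    std::string binary = compile_(source);
    std::ofstream out(path, std::ios::binary);
    out << binary;  // if the write fails, the next call simply recompiles
    return binary;
  }

  // Drop the cached binary for this source, if any.
  void evict(const std::string& source) const {
    std::remove(path_for(source).c_str());
  }

private:
  std::string dir_;
  compiler compile_;
};
```

The second request for the same source is then served from disk, so the
jit-compiler is only paid for once per kernel source across runs.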

Best regards,
Philippe

2014/1/25 Karl Rupp <r...@iue.tuwien.ac.at>

> Hi Philippe,
>
>
>
>> I don't understand why this would go through more than one compilation...
>> This kernel is compiled only once, the value of flip_sign and reciprocal
>> only changes the dynamic value of the argument, not the source code.
>>
>> This would eventually result in:
>>
>> if(alpha_reciprocal)
>>     kernel(N,x,y,z,1/alpha,beta)
>>
>> Am I missing something?
>>
>
> I think so ;-) It's not about a single kernel, it's about the compilation
> unit (i.e. OpenCL program). For conjugate gradients we roughly have the
> following vector operations (random variable names)
>
> x = y;
> x += alpha y;
> x = z + alpha z;
> x = y - alpha z;
> x = inner_prod(y,z);
>
> BiCGStab and GMRES add a few more of them. If we use the generator as-is
> now, then each of the operations creates a separate OpenCL program the
> first time it is encountered and we pay the jit-compiler launch overhead
> multiple times. With the current non-generator model, all vector kernels
> are in the same OpenCL program and we pay the jit-overhead only once. I'd
> like to stick with the current model of having just one OpenCL program for
> all the basic kernels, but get the target-optimized sources from the
> generator.
>
> Sorry if I wasn't clear enough in my earlier mails.
>
> Best regards,
> Karli
>
>
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel