This patch implements a context and program cache for OpenCL. The OpenCL renderer can be very slow to start up on some implementations. For example, Intel CPU OpenCL takes 9 seconds to start up on a Core i7 990X Extreme Edition, even when loading a cached program binary. This cost is paid every time the user renders. With this patch, it is paid only on the first render in a process, and later renders start almost instantly.
The implementation maintains a single process-wide, thread-safe object that contains one map for contexts and one for programs. Since a program belongs to a context, the two need to be maintained together. The cache itself is lazily instantiated: no instance is constructed until the first time it is accessed.

OpenCL objects are reference counted, and the cache takes advantage of that by using "retain" calls. This allows the OpenCLDevice implementation to just go ahead and release the object when it is finished with it. Each time an object is fetched from the cache, it is assumed that the caller will release it (as usual) when finished with it, so it needs to be retained on every fetch.

A race condition is possible where two threads do a lookup and neither finds the object. Both threads will then proceed with compilation, and both will try to insert into the cache. Access to the cache data is protected by a mutex, so the race loser's map insert fails and no retain call is issued for its object. When the loser later releases that object, the object is actually destroyed.

Besides the render startup time fix, this patch also changes the clFinish issued after enqueuing the kernel to a clFlush. clFinish is completely unnecessary there, but clFlush ensures that the device begins working on the kernel as soon as possible, rather than waiting until the next blocking copy. We definitely don't want the compute hardware to be idle when enqueue_kernel returns, which is what happens without this change. We currently block at the next memory copy, which already maintains full coherency.

I also partially changed the implementation of mem_alloc. This is part of a change I have in progress that uses unified memory (zero-copy) on capable OpenCL devices.
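The retain-on-fetch and retain-on-insert scheme described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: FakeObject, fake_retain, fake_release, ObjectCache, get and store are hypothetical names, and FakeObject stands in for a real cl_context or cl_program so that the example builds without an OpenCL SDK. In real code the retain and release calls would be clRetainContext / clReleaseContext or clRetainProgram / clReleaseProgram.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a reference-counted OpenCL handle.
struct FakeObject {
    std::atomic<int> refs{1};  // creation hands back one reference
};

// Hypothetical stand-ins for clRetain* / clRelease*.
inline void fake_retain(FakeObject *obj) { ++obj->refs; }
inline void fake_release(FakeObject *obj) {
    assert(obj->refs > 0);
    if (--obj->refs == 0) {
        delete obj;  // last reference gone: the object is really destroyed
    }
}

class ObjectCache {
public:
    // Lazily constructed on first access; initialization of a
    // function-local static is thread-safe since C++11.
    static ObjectCache &global() {
        static ObjectCache instance;
        return instance;
    }

    // Lookup. On a hit the object is retained on behalf of the caller,
    // who releases it as usual when done. Returns nullptr on a miss.
    FakeObject *get(const std::string &key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = objects_.find(key);
        if (it == objects_.end()) {
            return nullptr;
        }
        fake_retain(it->second);
        return it->second;
    }

    // Insert. The cache retains the object only if the insert succeeds.
    // A thread that lost the race gets false and no retain is issued,
    // so its own release later destroys its object.
    bool store(const std::string &key, FakeObject *obj) {
        std::lock_guard<std::mutex> lock(mutex_);
        bool inserted = objects_.emplace(key, obj).second;
        if (inserted) {
            fake_retain(obj);
        }
        return inserted;
    }

private:
    ObjectCache() = default;
    std::mutex mutex_;
    std::map<std::string, FakeObject *> objects_;
};
```

A caller would first try ObjectCache::global().get(key), fall back to creating and compiling on a miss, call store(key, obj), and then release its own reference when finished, regardless of whether the store succeeded. The string key is an assumption made for brevity; the actual patch may key its maps differently.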