See the following two threads:
http://lists.webkit.org/pipermail/webkit-dev/2012-August/021820.html
http://lists.webkit.org/pipermail/webkit-dev/2012-August/021734.html
In particular, https://bugs.webkit.org/show_bug.cgi?id=90375 appears to have a work-in-progress patch.

- R. Niwa

On Thu, Nov 22, 2012 at 6:31 PM, Zhang Peixuan <[email protected]> wrote:

> Hello,
>
> We are writing a GPU-accelerated version of libjpeg-turbo, and our goal is to use it in Chrome or other browsers that use WebKit. We have now written a beta version that can use the GPU to decode JPEG files in Chromium.
>
> We still have a lot of work to do. I have also learned from the Chromium community that there is an effort underway in WebKit to generalize the concept of parallel/asynchronous image decoding. So I would like to know whether we could contribute code.
>
> Thanks a lot.
>
> Peixuan Zhang
> 20121123
>
> The following is the email that I sent to the Chromium community.
>
> ===================================
>
> Hello,
>
> I'm a programmer, and my team and I are writing a GPU-accelerated version of libjpeg-turbo, mainly using OpenCL. Our goal is to use it in Chrome. We have now written a beta version that can use the GPU to decode JPEG files in Chromium.
>
> However, because we need to load some additional .dll files and APIs (e.g. we must load OpenCL.dll), this version must run with the --no-sandbox flag.
>
> We don't know how to run it without --no-sandbox, so I would really like to know how to load additional .dll files and access some registry information from inside the sandbox. Is there a way to do that?
>
> In addition, we need to do some initialization before using OpenCL, and because Chrome is a multi-process application, that initialization has to be repeated in each process, which increases the time cost. We have put forward several ideas, and my colleague Peng Xiao has discussed them with this community, but after some discussion we concluded that those ideas may not be suitable.
>
> Therefore, we have proposed another solution: if we use a separate process for the JPEG decoder, we won't need to do the initialization work multiple times. I think it would work much like the "--type=gpu-process" process; we could decode images in a single process.
>
> We understand that Chrome runs the JPEG decoder in the sandbox for security reasons, so we don't know whether running all JPEG decoding in one process would bring security risks, or other problems. Because this part of the work is still at the conceptual stage, we do not know whether it is worthwhile to go ahead.
>
> Yours sincerely.
>
> =============================================================
>
> 1. Do you have timing information about how JPEG decoding is a bottleneck at the moment? What percentage of rendering time is spent in JPEG decoding?
>
> According to the libjpeg-turbo-OpenCL work we have already completed, the performance is somewhat better than the original version. Of course, we only tested the standalone libjpeg-turbo, and there may be some differences inside Chrome.
>
> We tested on an AMD A10-4600M at 2.3GHz; on this platform the OpenCL version is 20~70% faster than before (the gain depends on image size and sampling ratio). In some cases it is even 8% faster than an Intel i7-3520M at 2.9GHz.
>
> Of course, in many cases the JPEG codec is not the most time-consuming thing in a browser, but with the growing popularity of HTML5, image decoding will account for a larger and larger share of the time; e.g. there are many JPEG textures in WebGL.
>
> 2. Do you plan to use OpenCL for things other than JPEG decoding?
>
> Yes, we do have further plans to use OpenCL to accelerate other features in Chrome; what we are doing at the moment includes at least JPEG and FFmpeg, and in the future we may do more work on images and video.
>
> 3. Do you have an idea about the latency introduced by doing that, plus the kernel overhead, compared to a completely user-mode solution? There are several context switches introduced that would add a constant time to decoding an image, which severely affects smaller images. Is it worth sending 500 bytes of data to the GPU to be decoded? I don't think so.
>
> Yes, we have some ideas for reducing the transfer time between the CPU and GPU, and we are also trying to reduce the kernel overhead. Some of these ideas have matured, but we are waiting for them to be open-sourced.
>
> 4. The sandbox bypass is a non-starter. Adding yet another process is a non-starter too. Having a new JPEG decoder significantly increases the attack surface, so just from a security perspective I'm not sure it's worth it.
>
> That is very important; if it would bring a high security risk, the value of the work is low.
>
> 5. Do you have an idea how to make the runtime trade-off between doing a software-only decode and offloading to the GPU? What if the user's GPU is already saturated but the CPU is idle? At the extreme end, let's assume a dual 8-core Xeon with a low-end Intel integrated graphics card and two 30" monitors plugged in.
>
> OK, we are concerned about different things. I think that on an AMD Trinity APU there may not be such problems; for Intel, I think I need to do some additional research.
>
> In addition to what Marc-Antoine said, note that there is also an effort underway in WebKit to generalize the concept of parallel/asynchronous image decoding. You probably want to sync up with that effort to see what overlaps.
>
> Thanks a lot, I will send an email to the WebKit community for more information.
>
> _______________________________________________
> webkit-dev mailing list
> [email protected]
> http://lists.webkit.org/mailman/listinfo/webkit-dev
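
For reference, the per-process initialization cost discussed in the quoted message is essentially the standard OpenCL bootstrapping sequence; it also requires the OpenCL runtime (OpenCL.dll on Windows, as the message notes) to be loadable, which is where the sandbox gets in the way. Below is only a minimal sketch in C, assuming an OpenCL 1.x runtime and a single GPU device; the helper name setup_opencl and its structure are illustrative, not code from the project discussed here.

    #include <CL/cl.h>
    #include <stddef.h>

    /* One-time setup that every Chrome process would otherwise repeat:
     * platform/device discovery, context and command-queue creation.
     * A real decoder would also build its kernels here with
     * clCreateProgramWithSource / clBuildProgram, which is typically the
     * most expensive part of the startup cost being discussed. */
    static cl_command_queue setup_opencl(cl_context *ctx_out)
    {
        cl_platform_id platform;
        cl_device_id device;
        cl_int err;

        if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
            return NULL;
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS)
            return NULL;

        *ctx_out = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        if (err != CL_SUCCESS)
            return NULL;

        cl_command_queue queue = clCreateCommandQueue(*ctx_out, device, 0, &err);
        if (err != CL_SUCCESS) {
            clReleaseContext(*ctx_out);
            return NULL;
        }
        return queue;
    }

Doing this once in a shared decode process, rather than once per renderer, is the amortization the separate-process idea in the quoted message is aiming for.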
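
On questions 3 and 5 above (whether it is worth sending very small images to the GPU, and when to prefer the idle CPU), one plausible runtime policy is a simple size threshold tuned by measurement, since the host-to-device transfer and kernel-launch overhead is roughly constant per image. This is only an illustrative sketch under that assumption, not a heuristic proposed in the thread; the function name and threshold value are hypothetical.

    #include <stddef.h>

    /* Hypothetical heuristic (not from the thread): only offload a JPEG to
     * the GPU when the decoded size is large enough that the expected
     * speedup outweighs the fixed host<->device transfer and kernel-launch
     * overhead. The threshold is a placeholder that would need tuning per
     * device; tiny images (the "500 bytes" case) stay on the CPU. */
    #define GPU_DECODE_MIN_PIXELS (256u * 1024u)   /* roughly a 512x512 image */

    static int should_decode_on_gpu(size_t width, size_t height, int gpu_idle)
    {
        if (!gpu_idle)                    /* e.g. the GPU is already saturated */
            return 0;
        return width * height >= GPU_DECODE_MIN_PIXELS;
    }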
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev

