Thanks a lot, and we will check it soon.

2012/11/23 Dong Seong Hwang <[email protected]>
> Looks promising to me. However, as you mentioned, using the GPU adds a
> constant-time overhead. You have to handle the trade-off carefully,
> because decoding a small JPEG is cheap.
>
> You can work in parallel with Bug 90375, as Niwa mentioned. Bug 90375
> tried to change the WebKit architecture to decode off the main thread,
> while your work optimizes just the decoding operations, so the two
> approaches can coexist. In addition, Bug 90375 was postponed because it
> causes flashing images. We think the solution will be general deferred
> image decoding in WebKit.
>
> Alpha on the Chromium team is working on deferred image decoding in
> Chromium:
> https://bugs.webkit.org/show_bug.cgi?id=94240
>
> You should take a look. I think Alpha will decode images off the main
> thread, so you should make the OpenCL JPEG decoder usable from multiple
> threads.
>
> - Dongsung Huang
>
>
> 2012/11/23 Ryosuke Niwa <[email protected]>
>
>> See the following two threads:
>> http://lists.webkit.org/pipermail/webkit-dev/2012-August/021820.html
>> http://lists.webkit.org/pipermail/webkit-dev/2012-August/021734.html
>>
>> In particular, https://bugs.webkit.org/show_bug.cgi?id=90375 appears to
>> have a work-in-progress patch.
>>
>> - R. Niwa
>>
>> On Thu, Nov 22, 2012 at 6:31 PM, Zhang Peixuan <[email protected]>
>> wrote:
>>
>>> Hello,
>>> We are writing a GPU-accelerated version of libjpeg-turbo, and our
>>> goal is to use it in Chrome or other browsers that use WebKit. We have
>>> now written a beta version that can use the GPU to decode JPEG files
>>> in Chromium.
>>> We still have a lot of work to do. I have learned from the Chromium
>>> community that there is also an effort underway in WebKit to
>>> generalize the concept of parallel/asynchronous image decoding.
>>> So I want to know whether we could contribute code.
>>>
>>> Thanks a lot.
>>>
>>> Peixuan Zhang
>>> 20121123
>>>
>>> The following is the email that I sent to the Chromium community.
>>>
>>> ===================================
>>>
>>> Hello,
>>>
>>> I'm a programmer, and my team and I are writing a GPU-accelerated
>>> version of libjpeg-turbo, mainly using OpenCL. Our goal is to use it in
>>> Chrome. We have now written a beta version that can use the GPU to
>>> decode JPEG files in Chromium.
>>>
>>> However, because we need to load some additional .dll files and APIs
>>> (e.g. we must load OpenCL.dll), this version must run with the
>>> --no-sandbox flag.
>>>
>>> We don't know how to run it without --no-sandbox, so I really want to
>>> know how to load additional .dll files and read some registry
>>> information from inside the sandbox. Is there some way to do it?
>>>
>>> In addition, we need to do some initialization before using OpenCL,
>>> and because Chrome is a multi-process application, the initialization
>>> has to be done in each process, which increases the total time. We
>>> have put forward several ideas, and my colleague Peng Xiao has
>>> discussed them with you in this community, but after some discussion
>>> we concluded that these ideas may not be suitable.
>>>
>>> Therefore, we have proposed another solution: if we use a separate
>>> process for JPEG decoding, we won't need to repeat the initialization
>>> work. I think it would be similar to the "--type=gpu-process" process.
>>> We could decode images in a single process.
>>>
>>> We learned that Chrome runs the JPEG decoder in the sandbox, perhaps
>>> for security reasons, so we don't know whether running all JPEG
>>> decoding in one process would bring security risks or other problems.
>>> Because this step of the work is still at the conceptual stage, we do
>>> not know whether it is worthwhile to go ahead.
>>>
>>> Yours sincerely.
>>> =============================================================
>>>
>>> 1. Do you have timing information showing that jpeg decoding is a
>>> bottleneck at the moment? What percentage of rendering time is spent
>>> in jpeg decoding?
>>>
>>> Based on the libjpeg-turbo-OpenCL version we have already completed,
>>> the performance is somewhat better than the original version. Of
>>> course, we have only tested libjpeg-turbo standalone, and the results
>>> may differ in Chrome.
>>>
>>> We tested on an AMD A10-4600M (2.3GHz). On this platform, the OpenCL
>>> version is 20~70% faster than before (depending on image size and
>>> sampling ratio), and in some cases it is even 8% faster than on an
>>> Intel i7-3520M (2.9GHz).
>>>
>>> Of course, in many cases JPEG decoding is not the most time-consuming
>>> part of a browser's work, but with the popularity of HTML5, image
>>> decoding will account for a growing share. For example, WebGL content
>>> uses many JPEG textures.
>>>
>>> 2. Do you plan to use OpenCL for other things than jpeg decoding?
>>>
>>> Yes, we plan to use OpenCL to accelerate more features in Chrome. Our
>>> current work includes at least JPEG and FFmpeg, and in the future we
>>> may do more work on images and video.
>>>
>>> 3. Do you have an idea about the latency introduced by doing that,
>>> plus the kernel overhead, compared to a completely user-mode solution?
>>> There are several context switches introduced that would add a constant
>>> time to decoding an image, which severely affects smaller images. Is it
>>> worth sending 500 bytes of data to the GPU to be decoded? I don't think
>>> so.
>>>
>>> Yes, we have some ideas for reducing the transfer time between the CPU
>>> and GPU, and we are also trying to reduce the kernel overhead. Some of
>>> these ideas have matured, but we are waiting for them to be released
>>> as open source.
>>>
>>> 4. The sandbox bypass is a non-starter. Adding yet-another-process is
>>> a non-starter too. Having a new jpeg decoder significantly increases the
>>> attack surface, so just from a security perspective, I'm not sure it's
>>> worth it.
>>>
>>> It's very important: if it brings a high security risk, the value of
>>> the work is low.
>>>
>>> 5. Do you have an idea how to do the runtime trade-off when it's worth
>>> doing software-only decoding versus offloading to the GPU? What if the
>>> user has their GPU already saturated but their CPU idle? At the extreme
>>> end, let's assume a dual 8-core Xeon with low-end Intel integrated
>>> graphics and two 30" monitors plugged in.
>>>
>>> OK, we had been focusing on different concerns. I think there may not
>>> be such problems on an AMD Trinity APU; for Intel, I need to do some
>>> additional research.
>>>
>>> In addition to what Marc-Antoine said, note that there is also an
>>> effort underway in WebKit to generalize the concept of
>>> parallel/asynchronous image decoding. You probably want to sync up with
>>> that effort to see what overlaps.
>>>
>>> Thanks a lot, I will send an email to the WebKit community for more
>>> information.
>>>
>>> _______________________________________________
>>> webkit-dev mailing list
>>> [email protected]
>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
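The trade-off raised in this thread (a fixed GPU overhead that dominates for small JPEGs, a GPU that may already be saturated, and a decoder that must be usable from several threads) could be expressed as a per-image dispatch decision. Below is a minimal C++ sketch under stated assumptions: the linear cost model, its constants, the `gpu_busy` signal and every name (`CostModel`, `GpuContext`, `ChooseDecodePath`, `InitOpenClPlaceholder`) are hypothetical and are not part of libjpeg-turbo, Chromium or WebKit. The OpenCL setup itself is stubbed out; only the shared one-time initialization and the CPU/GPU choice are shown.

```cpp
// Minimal sketch (hypothetical names throughout): decide per image whether
// to decode on the CPU or offload to an OpenCL device, assuming a linear
// cost model with a fixed GPU overhead.
#include <cassert>
#include <cstddef>
#include <mutex>

enum class DecodePath { kCpu, kGpu };

// Assumed linear cost model; the constants would have to be measured on
// each machine. They are not figures from this thread.
struct CostModel {
  double cpu_ns_per_pixel;       // software (libjpeg-turbo) decode cost
  double gpu_ns_per_pixel;       // OpenCL decode cost once data is on the GPU
  double gpu_fixed_overhead_ns;  // transfers, context switches, kernel launch
};

// Process-wide OpenCL state shared by all decoder threads. std::call_once
// runs the setup exactly once, even if several threads race to decode.
class GpuContext {
 public:
  static GpuContext& Get() {
    static GpuContext instance;  // thread-safe static init since C++11
    return instance;
  }

  bool EnsureInitialized() {
    std::call_once(once_, [this] { available_ = InitOpenClPlaceholder(); });
    return available_;
  }

 private:
  GpuContext() = default;

  // Placeholder: a real decoder would load the OpenCL runtime, pick a
  // device and build its kernels here. This stub simply reports success.
  static bool InitOpenClPlaceholder() { return true; }

  std::once_flag once_;
  bool available_ = false;
};

// Returns the cheaper path under the model, falling back to the CPU when
// the GPU is unavailable or the caller reports it as already busy.
DecodePath ChooseDecodePath(std::size_t width, std::size_t height,
                            bool gpu_busy, const CostModel& m) {
  assert(m.cpu_ns_per_pixel >= 0.0 && m.gpu_ns_per_pixel >= 0.0 &&
         m.gpu_fixed_overhead_ns >= 0.0);
  if (gpu_busy || !GpuContext::Get().EnsureInitialized()) {
    return DecodePath::kCpu;
  }
  const double pixels =
      static_cast<double>(width) * static_cast<double>(height);
  const double cpu_cost = pixels * m.cpu_ns_per_pixel;
  const double gpu_cost = m.gpu_fixed_overhead_ns + pixels * m.gpu_ns_per_pixel;
  return gpu_cost < cpu_cost ? DecodePath::kGpu : DecodePath::kCpu;
}
```

With illustrative constants of 4 ns and 1 ns per pixel and a 2 ms (2,000,000 ns) fixed overhead, the break-even point is 2,000,000 / (4 - 1), or roughly 667,000 pixels (about 816x816). So `ChooseDecodePath(64, 64, false, model)` returns `DecodePath::kCpu`, while a 2048x2048 image returns `DecodePath::kGpu`. Keeping the decision in one function also leaves a single place to plug in a real GPU load signal for the saturated-GPU case raised in question 5.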

