On Friday 23 October 2009, Koen Kooi wrote:
> > I'm not sure about pixman_gc_t since most of the needed operations are just
> > simple copies. What about starting with just introducing a variant
> > of 'pixman_blt' which is overlapping aware?
> >
> > I created a work-in-progress branch with 'pixman_blt' function (generic C
> > implementation for now) extended to support overlapped source/destination
> > case. A simple test program is also included:
> > http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt

First, this branch is outdated. There is a new branch with the final code :)
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt-v2
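
To illustrate what "overlapping aware" means for a blt, here is a minimal
sketch in generic C. It is not the code from the branch above: the name
overlap_blt32 and its parameters are hypothetical, and it assumes 32bpp
pixels, a stride given in pixels, and source and destination rectangles
that both lie inside the same buffer. The idea is to choose the row order
from the direction of the move, and let memmove deal with overlap inside
a single row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a pixman function): copy a width x height
 * rectangle of 32bpp pixels inside one buffer. The source and destination
 * rectangles are allowed to overlap. 'stride' is the row pitch in pixels.
 */
static void
overlap_blt32 (uint32_t *bits, int stride,
               int src_x, int src_y,
               int dst_x, int dst_y,
               int width, int height)
{
    int y;

    /* Rows must not spill into each other */
    assert (width >= 0 && width <= stride);

    if (dst_y > src_y)
    {
        /* Moving down: go bottom-up so that source rows are read
         * before they get overwritten */
        for (y = height - 1; y >= 0; y--)
        {
            memmove (bits + (dst_y + y) * stride + dst_x,
                     bits + (src_y + y) * stride + src_x,
                     width * sizeof (uint32_t));
        }
    }
    else
    {
        /* Moving up or sideways: go top-down; memmove handles
         * the overlap within the same row */
        for (y = 0; y < height; y++)
        {
            memmove (bits + (dst_y + y) * stride + dst_x,
                     bits + (src_y + y) * stride + src_x,
                     width * sizeof (uint32_t));
        }
    }
}
```

Scrolling a 4x4 buffer down by one row would then be
overlap_blt32 (buf, 4, 0, 0, 0, 1, 4, 3). A SIMD-optimized variant would
keep the same direction logic and only replace the per-row copy.
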
> Would using said branch give me 'magically' a performance boost (e.g.
> not make firefox unusably slow as it is now on an 600MHz cortex a8) or
> would I need to patch other libs (e.g. xrender) as well?

Not really, it's just a small extension of pixman functionality. Currently
the handling of overlapped blt operations (for software rendering) is done
in xorg-server. As it is the responsibility of pixman to provide CPU-specific
SIMD optimizations (NEON for ARM Cortex-A8), it would be quite natural to
move this work to pixman. So the next steps are to add NEON optimizations
to pixman_blt and make sure that xserver takes advantage of these
optimizations for the overlapped blit too.

As for improving scrolling performance (and assuming a standard fbdev
driver), the most important thing is to improve framebuffer memory
performance. Right now framebuffer memory is mapped as noncached
writecombine on OMAP3. Enabling write-through cache for it (with a simple
kernel patch) improves the performance of scrolling and moving windows by
a factor of 4-5 (unless a shadow framebuffer is used, which is also not
good for performance).

This works fine if nothing but the CPU can modify framebuffer memory. But
if the GPU or DSP can also access framebuffer memory, or a compositing
manager is used, everything gets more complicated. Cache invalidate
operations will have to be inserted in appropriate places in order to
ensure memory coherency and a uniform view of its contents from all the
units. If the default write-back cache is used instead of write-through,
cache flush operations are needed too.

Unpatched firefox is also quite slow for another reason: it always tries
to work with 32bpp data internally, no matter what color depth is used for
the desktop.

-- 
Best regards,
Siarhei Siamashka