On 26-10-09 01:57, Siarhei Siamashka wrote:
> On Friday 23 October 2009, Koen Kooi wrote:
>>> I'm not sure about pixman_gc_t since most of the needed operations are just
>>> simple copies. What about starting with just introducing a variant
>>> of 'pixman_blt' which is overlapping aware?
>>>
>>> I created a work-in-progress branch with 'pixman_blt' function (generic C
>>> implementation for now) extended to support overlapped source/destination
>>> case. A simple test program is also included:
>>> http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt
>
> First, this branch is outdated. There is a new branch with the final code :)
> http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt-v2
>
>> Would using said branch give me 'magically' a performance boost (e.g.
>> not make firefox unusably slow as it is now on a 600MHz cortex a8) or
>> would I need to patch other libs (e.g. xrender) as well?
>
> Not really, it's just a small extension of pixman functionality. Currently
> the handling of overlapped blt operation (for software rendering) is done
> in xorg-server. As it is the responsibility of pixman to provide CPU-specific
> SIMD optimizations (NEON for ARM Cortex-A8), it would be quite natural to
> move this work to pixman. So the next steps are to add NEON optimizations
> to pixman_blt and make sure that xserver takes advantage of these
> optimizations for the overlapped blit too.
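To make concrete what an "overlapping aware" blit has to do, here is a minimal sketch in C. It is not the code from the overlapped-blt branch and not pixman's API: the name `blt_overlap_aware`, its parameters, and the single 32bpp buffer with a stride counted in pixels are all hypothetical assumptions. The idea is memmove applied in two dimensions: when the destination lies below the source, rows are copied bottom-up so that source rows are not overwritten before they are read; otherwise rows are copied top-down, and memmove takes care of horizontal overlap within a row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (NOT pixman's real API): copy a width x height
 * rectangle of 32bpp pixels within one buffer, where source and
 * destination may overlap, as when scrolling a window.
 * stride is measured in pixels (uint32_t units), not bytes. */
static void blt_overlap_aware(uint32_t *bits, int stride,
                              int src_x, int src_y,
                              int dst_x, int dst_y,
                              int width, int height)
{
    assert(width >= 0 && height >= 0);
    size_t row_bytes = (size_t)width * sizeof(uint32_t);

    if (dst_y > src_y) {
        /* Destination is below the source: copy the bottom row first,
         * so rows that still have to be read are not clobbered. */
        for (int y = height - 1; y >= 0; y--)
            memmove(bits + (size_t)(dst_y + y) * stride + dst_x,
                    bits + (size_t)(src_y + y) * stride + src_x,
                    row_bytes);
    } else {
        /* Destination is above the source or on the same rows:
         * top-down order is safe; memmove handles the case where
         * source and destination overlap inside a single row. */
        for (int y = 0; y < height; y++)
            memmove(bits + (size_t)(dst_y + y) * stride + dst_x,
                    bits + (size_t)(src_y + y) * stride + src_x,
                    row_bytes);
    }
}
```

For example, on a 4x4 buffer holding 0..15, `blt_overlap_aware(buf, 4, 0, 0, 0, 1, 4, 3)` scrolls the top three rows down by one; a naive top-down loop would instead replicate row 0 into every row, which is exactly the bug the overlap handling avoids.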
So if we:

1) merge your branch into pixman master
2) move overlapped blit handling from xserver-xorg to pixman
3) add SIMD optimizations to pixman

that would give us better scrolling, right?

> As for improving scrolling performance (and assuming a standard fbdev driver),
> the most important thing is to improve framebuffer memory performance. Right
> now framebuffer memory is mapped as noncached writecombine on OMAP3. Enabling
> write-through cache for it (with a simple kernel patch) improves scrolling
> and moving windows performance by 4x-5x factor (unless shadow framebuffer is
> used, which is also not good for performance). This works fine if nothing
> but CPU can modify framebuffer memory. But if GPU or DSP can also access
> framebuffer memory or compositing manager is used, everything gets more
> complicated. Cache invalidate operations will have to be inserted in
> appropriate places in order to ensure memory coherency and uniform view
> of its content from all the units. If default write-back cache is used
> instead of write-through, cache flush operations are needed too.

I have no idea how the SGX or DSP handle the framebuffer, but I'm using both.

> Unpatched firefox is also quite slow for another reason - it tries to
> always work with 32bpp data internally, no matter what color depth is
> used for desktop.

I'm already using your patch for that :)

regards,

Koen
_______________________________________________
xorg-devel mailing list
xorg-devel@lists.x.org
http://lists.x.org/mailman/listinfo/xorg-devel