On Fri, 2016-09-30 at 22:43 -0700, Keith Packard wrote:

>        1                 2                 Operation
> ------------   -------------------------   -------------------------
>      10900.0        99900.0 (     9.165)   PutImage XY 10x10 square
>       1740.0         2160.0 (     1.241)   PutImage XY 100x100 square
>         83.2           90.4 (     1.087)   PutImage XY 500x500 square

90 ops per second is still shameful. No blit should take 11ms.

>      11800.0       351000.0 (    29.746)   PutImage XYBitmap 10x10 square
>       6280.0        81600.0 (    12.994)   PutImage XYBitmap 100x100 square
>        752.0        14300.0 (    19.016)   PutImage XYBitmap 500x500 square

Where does one get a copy of x11perf that has this test?

Also I assume this is measuring before the whole series vs after. Would
be nice to see the impact between 2/3 and 3/3 too.

> I'd like to know why the GPU expansion version has so much GL
> overhead; it "should" be faster for everything as it uploads a lot
> less data to the GPU.

On i965, at least, scissor updates are an appreciable amount of gl
driver time, so small ops get punished.

> I'd also like to try writing an XYPixmap that
> did all of the plane merging in the fragment shader. That "should" be
> faster for large images than merging on the CPU.

With a trivial planemask, sure. With an interesting planemask you
either need a bitwise write mask to the output fragment (and I'm not
sure glsl 1.30 gives you that) or you need one of the fb fetch
extensions.

- ajax
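
For reference, the planemask point can be sketched on the CPU side. This is only an illustration of the core X planemask rule for a plain copy, not code from glamor or from the series under review; merge_planes and planemask_is_trivial are hypothetical names made up for the example. The merge reads the destination whenever the mask is not all ones, which is why a shader version would need either a bitwise write mask or framebuffer fetch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Core X planemask rule for a copy: take the source bit in every plane
 * the mask selects, keep the existing destination bit everywhere else.
 * Note that dst is an input, not just an output.
 */
static uint32_t
merge_planes(uint32_t dst, uint32_t src, uint32_t planemask)
{
    return (dst & ~planemask) | (src & planemask);
}

/*
 * Hypothetical helper: true when every plane of a drawable of the given
 * depth is writable. In that case merge_planes() reduces to "dst = src"
 * for the visible bits and the destination never has to be read.
 */
static bool
planemask_is_trivial(uint32_t planemask, unsigned depth)
{
    assert(depth >= 1 && depth <= 32);
    uint32_t all = (depth == 32) ? UINT32_MAX : ((UINT32_C(1) << depth) - 1);
    return (planemask & all) == all;
}
```

With dst = 0x00ff00ff and src = 0x12345678, a full mask of 0xffffffff simply yields src, while a mask of 0x0000ff00 yields 0x00ff56ff: only the second byte comes from src and the rest depends on what was already in the destination.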