Hi,

using opensolaris build 113, i am running an ATI radeon card on a
Niagara system (ultrasparc sun4v). the server is Xorg 1.7.4 and
the video driver is xf86-video-ati.

the gnome desktop looks nice but when i try to play flash videos
(from youtube) the rendering is very slow and choppy.

using the same GPU and same radeon driver on a linux x86-64 box i
can play the videos very smoothly.

of course i have configured the two systems in the same
configuration:

-no acceleration (3D or 2D -- not avail on sparc anyway)
-using the shadow framebuffer.

the shadow framebuffer is a system memory buffer where the CPU
draws the image. it is then copied on a regular basis to the
GPU "real" framebuffer.
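
to make the idea concrete, here is a minimal sketch of that copy step.
it is not the Xorg code: copy_dirty_rect and its parameters are
hypothetical names, pixels are assumed to be 32 bits wide, and both
buffers are assumed to share the same stride. in the real server the
destination is the mapped GPU aperture, not ordinary memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Xorg): copy one damaged rectangle
 * from the shadow buffer in system memory to the framebuffer.
 * stride, x, y, w and h are all expressed in 32-bit pixels.
 */
static void
copy_dirty_rect(uint32_t *fb, const uint32_t *shadow, size_t stride,
                size_t x, size_t y, size_t w, size_t h)
{
    size_t row;

    for (row = y; row < y + h; row++) {
        /* one contiguous span per scanline of the rectangle */
        memcpy(fb + row * stride + x,
               shadow + row * stride + x,
               w * sizeof(uint32_t));
    }
}
```

only the damaged rectangles are copied, so the cost of each update
depends on how fast the CPU can push those spans to the GPU memory.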

one reason linux may be faster is that it enables write
combining for the GPU MMIO framebuffer range, as set up through
the MTRR registers.

the code in charge of that transfer is:

xorg-server-1.7.4/miext/shadow/shpacked.c::shadowUpdatePacked()

so i am thinking of trying to make the transfer faster


....
            while (width) {
                /* how much remains in this window */
                i = scrBase + winSize - scr;
                if (i <= 0 || scr < scrBase)
                {
                    winBase = (FbBits *) (*pBuf->window) (pScreen,
                                                          y,
                                                          scr * sizeof (FbBits),
                                                          SHADOW_WINDOW_WRITE,
                                                          &winSize,
                                                          pBuf->closure);
                    if(!winBase)
                        return;
                    scrBase = scr;
                    winSize /= sizeof (FbBits);
                    i = winSize;
                }
                win = winBase + (scr - scrBase);
                if (i > width)
                    i = width;
                width -= i;
                scr += i;
#define PickBit(a,i)    (((a) >> (i)) & 1)
                while (i--)
                    *win++ = *sha++;
            }
            shaLine += shaStride;
            y++;
        }
        pbox++;
    }
}
....



i wonder what sparc optimizations i could use to speed up this loop:

               while (i--)
                    *win++ = *sha++;
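
the obvious first step would be to hand the whole span to the libc
copy routine instead of storing one word at a time. a minimal sketch,
assuming FbBits is 32 bits wide (the typedef below is only there to
make the example self-contained) and using a hypothetical helper name,
copy_span. whether the sun4v libc memcpy behaves well on the GPU
mapping is an assumption i have not verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* assumption: 32-bit FbBits; in the server this type comes from fb.h */
typedef uint32_t FbBits;

/*
 * Hypothetical helper: copies the same words as
 *     while (i--) *win++ = *sha++;
 * but as a single bulk copy. Unlike the loop it does not advance
 * the pointers, so the caller has to do "sha += i" afterwards.
 */
static void
copy_span(FbBits *win, const FbBits *sha, int i)
{
    if (i > 0)
        memcpy(win, sha, (size_t)i * sizeof(FbBits));
}
```

in shadowUpdatePacked() the inner loop would then become
"copy_span(win, sha, i); sha += i;", since win is recomputed from
winBase on every pass.
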


is there anything faster than bcopy? anything that will have the
same effect as x86 write combining?

thx
-jfs
