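On Niagara 1/2 (UltraSPARC T1/T2) machines, shadow rendering of even a single
diagonal line causes heavy MMU thrashing, because there are not enough TLB
entries to map all the memory being touched: every pixel of a diagonal line
sits on a different scanline, so consecutive writes are a full stride apart.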
You could try forcing 64k MMU pages instead of the default 8k pages for
the heap (BSS/brk) at the very beginning of Xorg:
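/* Solaris only: needs <sys/types.h> and <sys/mman.h> */
struct memcntl_mha mha;
mha.mha_cmd = MHA_MAPSIZE_BSSBRK;  /* target the BSS and brk heap */
mha.mha_flags = 0;
mha.mha_pagesize = 64 * 1024;
/* addr/len are NULL/0 because the heap is selected by mha_cmd;
   the call is only advice, so its result is ignored */
(void)memcntl(NULL, 0, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0);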
The stack should be mapped with 64k pages, too:
/* reuses the mha declared above: declaring it a second time in the
   same function would not compile */
mha.mha_cmd = MHA_MAPSIZE_STACK;
mha.mha_flags = 0;
mha.mha_pagesize = 64 * 1024;
(void)memcntl(NULL, 0, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0);
You should check whether the shadow buffer is actually mapped with 64k
pages, and compare performance with 64k pages versus 8k pages. You should
see an improvement of at least 9-15%.
Olga
On Thu, Mar 25, 2010 at 5:18 PM, jf simon <jfs at themis.com> wrote:
> Hi,
>
> Using OpenSolaris build 113, I am running an ATI Radeon card on a
> Niagara system (UltraSPARC, sun4v). The server is Xorg 1.7.4 and
> the video driver is xf86-video-ati.
>
> The GNOME desktop looks nice, but when I try to play Flash videos
> (from YouTube) the rendering is very slow and choppy.
>
> Using the same GPU and the same radeon driver on a Linux x86-64 box, I
> can play the videos very smoothly.
>
> Of course, I have configured both systems the same way:
>
> - no acceleration (3D or 2D; not available on SPARC anyway)
> - using the shadow framebuffer
>
> The shadow framebuffer is a system-memory buffer where the CPU
> draws the image. It is then copied at regular intervals to the
> GPU's "real" framebuffer.
>
> One reason Linux could be faster is that it enables write
> combining for the GPU's MMIO framebuffer range, as set up through
> the MTRR registers.
>
> The code in charge of that transfer is:
>
> xorg-server-1.7.4/miext/shadow/shpacked.c::shadowUpdatePacked()
>
> So I am thinking of trying to make the transfer faster.
>
>
> ....
>             while (width) {
>                 /* how much remains in this window */
>                 i = scrBase + winSize - scr;
>                 if (i <= 0 || scr < scrBase)
>                 {
>                     winBase = (FbBits *) (*pBuf->window) (pScreen,
>                                                           y,
>                                                           scr * sizeof (FbBits),
>                                                           SHADOW_WINDOW_WRITE,
>                                                           &winSize,
>                                                           pBuf->closure);
>                     if(!winBase)
>                         return;
>                     scrBase = scr;
>                     winSize /= sizeof (FbBits);
>                     i = winSize;
>                 }
>                 win = winBase + (scr - scrBase);
>                 if (i > width)
>                     i = width;
>                 width -= i;
>                 scr += i;
> #define PickBit(a,i) (((a) >> (i)) & 1)
>                 while (i--)
>                     *win++ = *sha++;
>             }
>             shaLine += shaStride;
>             y++;
>         }
>         pbox++;
>     }
> }
> ....
>
>
>
> I wonder what SPARC optimizations I could use to speed up this loop:
>
> while (i--)
> *win++ = *sha++;
>
>
> Is there anything faster than bcopy? Anything that would have the
> same effect as x86 write combining?
>
> thx
> -jfs
>
> _______________________________________________
> xwin-discuss mailing list
> xwin-discuss at opensolaris.org
>
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / Solaris/BSD//C/C++ programmer \ |-..-'`
/\/\ /\/\
`--` `--`