I have completed the TigerVNC Viewer optimization project, and the
latest revision of TigerVNC (r4764) should, unless I am totally missing
something, perform on par with TurboVNC end-to-end (barring the use
of multi-threading in the latter, and assuming a sufficiently fast
client machine -- see below).  The project was not as difficult as
optimizing the server, since it was not necessary to make any protocol
changes, but it still required many hours of low-level analysis using
VNCBenchTools.  I also performed a full high-level benchmark run with
the Linux viewer (64-bit and 32-bit) and sanity checked the performance
of the OS X and Windows viewers.  All appear to produce the same frame
rate as TurboVNC (even a slightly higher one in a couple of cases) when
using the GLXspheres benchmark and VirtualGL.  At the low level,
decoder performance improved by 15-20% across the board relative to
the TigerVNC 1.1 baseline, measured against the set of 20 canonical RFB
session captures described in
http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf.  This
low-level improvement translated into an aggregate throughput
improvement of 15-20% at the high level as well, bringing the aggregate
performance up to TurboVNC's baseline.

The only remaining anomaly is that the TigerVNC Viewer, at least the
Linux version, still produces quite a bit more CPU usage on the client
machine than the TurboVNC Viewer.  This did not cause a slow-down on my
system, because the client CPU cores were still only about 65% engaged
(as opposed to 47% with TurboVNC).  However, theoretically, this
could cause TigerVNC to perform more slowly than TurboVNC on a
single-core machine or a slower multi-core machine.  Both the vncviewer
and Xorg processes use more of the CPU in TigerVNC than in TurboVNC.
The increased usage may be due to a difference in architecture.  One
thing I noticed at the low level was that a significant amount of CPU
time was being spent in the rectangle fill routines, because the
TigerVNC Viewer uses the CPU to fill solid regions of its back buffer.
TurboVNC, by contrast, "cheats" and does a sort of
pseudo-double-buffering, whereby it will wait until all of the
rectangles from a framebuffer update have been received, then it will
call XFillRectangle() and XShmPutImage() in rapid succession to draw the
solid and non-solid rectangles, respectively, then XSync() to flush
everything to the screen.  The sequence of X calls occurs so quickly
that it is perceived as double buffering, even though technically it
isn't, and since the XFillRectangle() calls can be offloaded to
hardware, they don't cause a significant load on the X server.  I have
optimized the rectangle fill routines in TigerVNC somewhat, but the X11
usage is still different, since TigerVNC will use XShmPutImage() to draw
everything to the screen, rather than using XFillRectangle() for the
solid regions and XShmPutImage() for everything else.  I'm not
suggesting that TigerVNC adopt TurboVNC's method, but I'm hoping to
perhaps spark ideas regarding how to improve the rectangle fill
performance in TigerVNC, if that is in fact the cause of the CPU usage
increase.

I wish I had a build to share with you, but the pre-release build is
currently broken due to the DPMS changes that were checked in earlier.
I'm looking into it.

DRC

_______________________________________________
Tigervnc-devel mailing list
Tigervnc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-devel
