I have completed the TigerVNC Viewer optimization project. The latest revision of TigerVNC (r4764) should, unless I am totally missing something, perform at parity with TurboVNC end-to-end (barring the use of multi-threading in the latter, and assuming a sufficiently fast client machine; see below.) The project was not as difficult as optimizing the server, since no protocol changes were necessary, but it still required many hours of low-level analysis using VNCBenchTools. I also performed a full high-level benchmark run with the Linux viewer (64-bit and 32-bit) and sanity-checked the performance of the OS X and Windows viewers. All appear to produce the same frame rate as TurboVNC (even slightly better in a couple of cases) when using the GLXspheres benchmark and VirtualGL.

At the low level, decoder performance improved by 15-20% across the board relative to the TigerVNC 1.1 baseline, measured against the set of 20 canonical RFB session captures described in http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf. This low-level improvement translated into an aggregate throughput improvement of 15-20% at the high level as well, bringing aggregate performance up to TurboVNC's baseline.
The only remaining anomaly is that the TigerVNC Viewer, at least the Linux version, still generates quite a bit more CPU usage on the client machine than the TurboVNC Viewer. This did not cause a slow-down on my system, because the client CPU cores were still only about 65% engaged (as opposed to 47% with TurboVNC.) However, theoretically, this could cause TigerVNC to perform more slowly than TurboVNC on a single-core machine or a slower multi-core machine. Both the vncviewer and Xorg processes use more of the CPU in TigerVNC than in TurboVNC. The increased usage may be due to a difference in architecture.

One thing I noticed at the low level was that a significant amount of CPU time was being spent in the rectangle fill routines, because the TigerVNC Viewer uses the CPU to fill solid regions of its back buffer. TurboVNC, by contrast, "cheats" and does a sort of pseudo-double-buffering: it waits until all of the rectangles from a framebuffer update have been received, then calls XFillRectangle() and XShmPutImage() in rapid succession to draw the solid and non-solid rectangles, respectively, then calls XSync() to flush everything to the screen. The sequence of X calls occurs so quickly that it is perceived as double buffering, even though technically it isn't, and since the XFillRectangle() calls can be offloaded to hardware, they don't cause a significant load on the X server.

I have optimized the rectangle fill routines in TigerVNC somewhat, but the X11 usage is still different, since TigerVNC uses XShmPutImage() to draw everything to the screen, rather than using XFillRectangle() for the solid regions and XShmPutImage() for everything else. I'm not suggesting that TigerVNC adopt TurboVNC's method, but I'm hoping to spark ideas regarding how to improve the rectangle fill performance in TigerVNC, if that is in fact the cause of the increased CPU usage.
I wish I had a build to share with you, but the pre-release build is broken right now, due to the DPMS changes that were checked in earlier. Looking into it.

DRC
_______________________________________________
Tigervnc-devel mailing list
Tigervnc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-devel