On Wed, 16 May 2018, Jean-Sébastien Pédron wrote:

Author: dumbbell
Date: Wed May 16 09:01:02 2018
New Revision: 333669
URL: https://svnweb.freebsd.org/changeset/base/333669

Log:
 teken, vt(4): New callbacks to lock the terminal once

 ... to process input, instead of inside each smaller operations such as
 appending a character or moving the cursor forward.
....
 The goal is to improve input processing speed of vt(4). As a benchmark,
 here is the time taken to write a text file of 360 000 lines (26 MiB) on
 `ttyv0`:

   * vt(4), unmodified:      1500 ms
   * vt(4), with this patch: 1200 ms
   * syscons(4):              700 ms

Syscons was pessimized by a factor of about 12 using related methods
(excessive layering, although not so much locking).  So the correct
comparison is with unpessimized syscons taking about 60 ms.

These times are just for writing to the history buffer and are very relevant
for normal operation.  Pessimizations by factors of 12 just annoy me.

 This is on a Haswell laptop with a GENERIC-NODEBUG kernel.

My times are on Haswell too, but on a 4.0GHz desktop with non-GENERIC
non-debug kernels.  My test does 65MB of (weird) text output consisting
of 650 lines of length 100000.  Again, this is not very representative.
The (trivial) benchmark sources have had almost no changes since I wrote
them to benchmark and optimize the Minix console driver 25-30
years ago.  (I didn't do much with syscons then, but made sure that
it was only slightly slower.)  Old tests did 80-column output and all
versions do 1 write(2) per line and it was convenient to scale up the
line length to avoid having too much of the time being for syscall
overhead.  The 65MB is scaled to take about 1 second on an AthlonXP
2.2GHz with the best version of syscons.

Approximate times on Haswell:

- 0.2 seconds with FreeBSD-5.2 syscons modified to recover some of my fixes
  from 1993.  I micro-optimized the inner loop in 1993.  The inner
  loop handles these 100000-character lines 80 characters at a time
  using as close as possible to *dst++ = *src++, with considerable
  overhead for attributes, checking for escape sequences, and for
  reducing to 80 columns.  According to the comment, this took 26
  cycles on i486's (probably DX2/66), but I optimized it to only 18.
  This optimized inner loop was turned into mostly nonsense before
  FreeBSD-5.2, using inlining in all the wrong places.  The loop was
  moved into an inline function (sc_term_gen_print()), but it calls a
  non-inline function (sc_vtb_putchar()).  This made it about 50%
  slower IIRC.

- 50% slower in pre-teken, pre-release versions of FreeBSD-8.
  FreeBSD-8 changed the upper layers of the tty driver.  The pessimization
  is to do quoting stuff per-char.  This made the i/o even slower than the
  old way.  The older way produced larger tinygrams by transferring between
  the layers only about 100 bytes per output call, and used inefficient clists.

- about 500% slower for teken, by calling from the sc layer to the teken
  layer for every char.  IIRC, there are more than 5 but less than 10
  function calls per char, so it is doing OK to be only 5 times slower.

- thus the time on Haswell was about 2.4 seconds for 65MB.  This is for
  text mode.  Only slightly slower for graphics mode.  Screen refresh should
  occur at most about 50 times in 2.4 seconds, so at most 100k of the output
  should actually reach the screen and that shouldn't take long on a 4GHz
  system!  (On a 30-year-old ET4000, the frame buffer speed was 5.9MB/sec so
  100k must take at least 1/30 seconds in text mode.  Any slower than that
  is bad.)

- I optimized this a little by avoiding 1 or 2 function calls (for attribute
  handling) per character, so syscons only takes about 2 seconds for 65MB
  in -current.  This is consistent with your 700 ms (26/65 * 2000 ms =
  800).

- I have syscons mostly fixed in local patches:

  - Method A: restore scterm-sc.c (don't use teken).  This was very easy, and
    fixes many other bugs much more easily than in my committed and local
    patches.  The excessive layering in syscons actually helps here -- the
    API for the layering of the terminal emulator has only small changes,
    so scterm-sc.c from FreeBSD-7 takes only about 10 lines of changes to
    drop back in.

    Also restore optimizations from 1993 as far as possible.  They moved to
    scterm-sc.c and were lost with teken.

    This fixes everything except the slow upper layers, so the speed is
    0.33 seconds for 65MB.

  - Method B: restore only sctermvar.h from FreeBSD-7.  Use only
    sc_term_gen_print() and its infrastructure from this.  This does
    essentially *dst++ = *src++ to the history buffer until it hits an
    escape sequence, and it must not be called while in an escape
    sequence.  Subvert the teken layering so that this can be used.
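
The fast path described in Method B can be sketched like this.  It is an
illustration only, not the code from sctermvar.h: the name
copy_until_escape(), the plain char destination and the choice of ESC
(0x1b) as the only stop byte are assumptions.  The real
sc_term_gen_print() has more to do (attributes and reducing to 80
columns, as noted above).

```c
#include <stddef.h>

#define	ESC	0x1b

/*
 * Hypothetical fast path: copy input into a history buffer with a
 * tight *dst++ = *src++ loop, stopping at the first ESC so that the
 * caller can hand the rest to the full terminal emulator.  Returns the
 * number of bytes consumed.  Must not be called while the emulator is
 * in the middle of an escape sequence.
 */
size_t
copy_until_escape(char *dst, size_t dstlen, const char *src, size_t srclen)
{
	const char *s = src;
	size_t n = srclen < dstlen ? srclen : dstlen;

	while (n != 0 && *s != ESC) {
		*dst++ = *s++;
		n--;
	}
	return ((size_t)(s - src));
}
```

A caller would loop: run the fast path, and when it stops short of the
input length, pass the remaining bytes to the slower per-character
emulator until it is back outside any escape sequence.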

 At the same time, the locking is changed in the vt_flush() function
 which is responsible to draw the text on screen. So instead of
 (indirectly) using VTBUF_LOCK() just to read and reset the dirty area
 of the internal buffer, the lock is held for about the entire function,
 including the drawing part.

 The change is mostly visible while content is scrolling fast: before,
 lines could appear garbled while scrolling because the internal buffer
 was accessed without locks (once the scrolling was finished, the output
 was correct). Now, the scrolling appears correct.

 In the end, the locking model is closer to what syscons(4) does.
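
The idea in the log, taking the lock once around a whole input buffer
instead of once per small operation, can be sketched in userland as
follows.  This is not the vt(4) or teken code: struct term, the function
names and the pthread mutex standing in for the kernel lock are all
assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical terminal state protected by one lock. */
struct term {
	pthread_mutex_t	lock;
	unsigned long	nlocks;		/* lock acquisitions, for comparison */
	char		buf[4096];
	size_t		len;
};

/* Append one character; the caller must hold t->lock. */
static void
term_putc_locked(struct term *t, char c)
{
	if (t->len < sizeof(t->buf))
		t->buf[t->len++] = c;
}

/* Old style: lock and unlock inside each small operation. */
void
term_input_per_char(struct term *t, const char *p, size_t n)
{
	while (n-- != 0) {
		pthread_mutex_lock(&t->lock);
		t->nlocks++;
		term_putc_locked(t, *p++);
		pthread_mutex_unlock(&t->lock);
	}
}

/* New style: lock once, process the whole input, unlock once. */
void
term_input_batched(struct term *t, const char *p, size_t n)
{
	pthread_mutex_lock(&t->lock);
	t->nlocks++;
	while (n-- != 0)
		term_putc_locked(t, *p++);
	pthread_mutex_unlock(&t->lock);
}
```

For n input bytes the first variant takes the lock n times and the second
takes it once, which is the saving the log describes.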

This is only very fast because the output never reaches the screen.
Frame buffers aren't much faster than 25 years ago.  Text mode tends to
be limited to a few MB/sec.  Old VGA graphics modes (not supported by
vt) are much slower since they require using PIO registers and there is
more to do.  Direct bitmapped modes might be no slower than text mode,
but again there is a lot of I/O to do.

Old syscons updates so fast that no scrolling is visible when all lines
are the same.  This also requires magic buffer sizes.  clists and the
transfer size of 100 gave suitable magic (e.g., 100 divides 80x25).  Now
tty buffer sizes are normally powers of 2, so the magic lining up rarely
occurs and this looks like artifacts in scrolling.
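
The lining-up arithmetic can be checked with a few lines of C.  The
helper name chunks_align() is made up for this illustration; the 80x25
screen and the transfer size of 100 come from the text above, and 128 is
just one example of a power-of-2 size.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: does a transfer size divide the number of
 * character cells on the screen evenly?  If it does, chunk boundaries
 * fall at the same position after every full screen of output.
 */
bool
chunks_align(size_t cols, size_t rows, size_t chunk)
{
	return (chunk != 0 && (cols * rows) % chunk == 0);
}
```

chunks_align(80, 25, 100) is true since 2000 = 20 * 100, while
chunks_align(80, 25, 128) is false since 2000 = 15 * 128 + 80.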

Bruce