Re: Unicode, SMS and year 2012

Richard Wordingham Sat, 28 Apr 2012 11:06:24 -0700

On Fri, 27 Apr 2012 11:21:05 -0700
"Doug Ewell" <[email protected]> wrote:


> SCSU works equally well, or almost so, with any text sample where the
> non-ASCII characters fit into a single block of 128 code points. For
> anything other than Latin-1 you need one byte of overhead, to switch
> to another window, and for many scripts you need two, to define a
> window and switch to it. But again, two bytes is not what's holding
> anyone up.

With an SCSU encoder that avoids Unicode mode and UQU whenever possible,
most alphabetic languages work fairly well.  However, covering the
half-blocks from A480 to ABFF would need 15 new window-offset codes.  If
I were being miserly, I wouldn't cover A500-A5FF.
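The count of 15 can be checked against the SCSU window-offset byte table in UTS #6. The sketch below is a minimal Python rendering of that table as I understand it; `window_offset` and `half_blocks` are hypothetical helper names, not part of any library. It confirms that no existing offset byte reaches A480-ABFF and that the reserved byte values leave room for the new codes.

```python
from typing import Optional

# Fixed offsets for the top seven offset bytes (per UTS #6, as I read it).
SPECIAL_OFFSETS = {
    0xF9: 0x00C0, 0xFA: 0x0250, 0xFB: 0x0370, 0xFC: 0x0530,
    0xFD: 0x3040, 0xFE: 0x30A0, 0xFF: 0xFF60,
}

def window_offset(code: int) -> Optional[int]:
    """Base code point selected by an SDn offset byte, or None if reserved."""
    if 0x01 <= code <= 0x67:
        return code * 0x80            # half-blocks U+0080 .. U+3380
    if 0x68 <= code <= 0xA7:
        return code * 0x80 + 0xAC00   # half-blocks U+E000 .. U+FF80
    return SPECIAL_OFFSETS.get(code)  # 0x00 and 0xA8-0xF8 are reserved

def half_blocks(first: int, last: int) -> list[int]:
    """Bases of the 128-code-point half-blocks touched by first..last."""
    return list(range(first & ~0x7F, last + 1, 0x80))

reachable = {window_offset(c) for c in range(0x100)} - {None}
uncovered = [b for b in half_blocks(0xA480, 0xABFF) if b not in reachable]
reserved = [c for c in range(0x100) if window_offset(c) is None]
print(len(uncovered), len(reserved))  # 15 82
```

The same helper gives `half_blocks(0xA500, 0xA63F) == [0xA500, 0xA580, 0xA600]`, i.e. the three half-blocks for Vai mentioned below.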

SCSU doesn't work well with large syllabaries, especially if they
include a lot of unused characters within the half-blocks used.  Inuit
suffers badly from this, but still achieves noticeable compression.  I
experimented with compressing Yi transposed to a covered range, and
found that it achieved something like 10% compression.  Yi suffers from
needing the 8 dynamic windows to be switched between 10 half-blocks
(with occasional excursions to an 11th).  If the Yi characters had
been arranged by tone first and initial consonant second, 2 of the
half-blocks would never have been used in my sample!
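Why switching among more half-blocks than there are dynamic windows hurts can be illustrated with a deliberately simplified cost model. This is a minimal sketch, not a real SCSU encoder: it assumes least-recently-used window replacement, the default window positions as I read them in UTS #6, and ignores quoting (SQn), static windows, control characters and Unicode mode. Like the transposed Yi experiment, it also assumes every half-block is definable. `scsu_cost_estimate` and `utf16_bytes` are hypothetical names.

```python
from collections import OrderedDict

# Default dynamic window bases for windows 0-7 (assumed from UTS #6).
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]

def scsu_cost_estimate(text: str) -> int:
    """Approximate byte count under a simplified single-byte-mode model.

    ASCII and characters in the active window cost 1 byte; switching to
    another defined window (SCn) adds 1 byte; redefining the least recently
    used window adds 2 bytes (SDn + offset), or 3 bytes (SDX + 2 bytes)
    for a window above U+FFFF.
    """
    # Window number -> base code point, least recently used first.
    windows = OrderedDict(enumerate(DEFAULT_WINDOWS))
    active = 0
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1  # ASCII passes through unchanged
            continue
        hit = next((n for n, base in windows.items() if base <= cp < base + 0x80), None)
        if hit is None:
            hit, _ = windows.popitem(last=False)  # evict the LRU window
            base = cp & ~0x7F                    # new 128-code-point window
            windows[hit] = base
            total += 3 if base > 0xFFFF else 2   # SDX or SDn + offset byte
        elif hit != active:
            total += 1                           # SCn to change window
        active = hit
        windows.move_to_end(hit)
        total += 1                               # the character byte itself
    return total

def utf16_bytes(text: str) -> int:
    """UTF-16 size: 2 bytes per BMP character, 4 per supplementary one."""
    return sum(4 if ord(ch) > 0xFFFF else 2 for ch in text)
```

Under this model, strictly cycling through nine half-blocks makes every character a miss at 3 bytes each, worse than UTF-16, while eight half-blocks settle at 2 bytes per character after the first pass. Real text has runs within a half-block, so it does better than either extreme.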

Vai (A500-A63F) fits in 3 half-blocks, and I would expect the non-Vai
characters in Vai text to fall in static windows.  Given how well Yi
performed, I expect Vai to benefit from SCSU.

Has anyone investigated the performance of SCSU with Cuneiform or
Egyptian Hieroglyphics?  It might achieve better than 50% compression!
A fair comparison for Egyptian Hieroglyphics depends on the mark-up
used, since Unicode on its own does not enable one to write reasonable
Middle Egyptian.
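A rough way to see why better than 50% is plausible for scripts encoded outside the BMP is the following arithmetic. It assumes a simplified cost model (1 byte per character in the active window, 1 byte per SCn window switch, 3 bytes per SDX window definition), that every character is a supplementary-plane sign, and that the baseline is UTF-16, where each such character costs 4 bytes. For n characters with s switches and d definitions:

```latex
B_{\mathrm{SCSU}} \approx n + s + 3d, \qquad B_{\mathrm{UTF\text{-}16}} = 4n

\text{saving} = 1 - \frac{n + s + 3d}{4n} > \tfrac{1}{2}
\quad\iff\quad s + 3d < n
```

With one window serving a long run (s = 0, d = 1) the saving tends to 75%. Spaces, punctuation and any mark-up between signs are ignored here and would shift the figures.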

Richard.
