On Fri, 27 Apr 2012 11:21:05 -0700 "Doug Ewell" <d...@ewellic.org> wrote:
> SCSU works equally well, or almost so, with any text sample where the
> non-ASCII characters fit into a single block of 128 code points. For
> anything other than Latin-1 you need one byte of overhead, to switch
> to another window, and for many scripts you need two, to define a
> window and switch to it. But again, two bytes is not what's holding
> anyone up.

With SCSU that avoids Unicode mode and UQU whenever possible, most alphabetic languages work fairly well. However, extra windows are needed to cover the half-blocks from A480 to ABFF, 15 new codes. If I were being miserly, I wouldn't cover A500-A5FF.

SCSU doesn't work well with large syllabaries, especially if they include a lot of unused characters within the half-blocks used. Inuit suffers badly from this, but still achieves noticeable compression. I experimented with compressing Yi transposed to a covered range, and found that it achieved something like 10% compression. Yi suffers from needing the 8 dynamic windows to be switched between 10 half-blocks (with occasional excursions to an 11th). If the Yi characters had been arranged by tone first and initial consonant second, 2 of the half-blocks would never have been used in my sample!

Vai A500-A63F fits in 3 half-blocks, and I would expect non-Vai characters in it to be in static blocks. Given how well Yi performed, I expect Vai to benefit from SCSU.

Has anyone investigated the performance of SCSU with Cuneiform or Egyptian Hieroglyphics? It might achieve better than 50% compression! A fair comparison of Egyptian Hieroglyphics depends on the mark-up used, for Unicode on its own does not enable one to write reasonable Middle Egyptian.

Richard.
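
For anyone wanting to repeat the half-block counts above on their own samples, here is a rough Python sketch. It is not an SCSU encoder. It only assumes that a window is a run of 128 code points aligned on a multiple of 0x80, and it ignores the static windows entirely. The function names and the DYNAMIC_WINDOWS constant are made up for illustration.

```python
from collections import Counter

HALF_BLOCK = 0x80        # 128 code points, the span of one SCSU window
DYNAMIC_WINDOWS = 8      # number of dynamic windows SCSU provides


def half_block_start(cp):
    """Return the first code point of the aligned half-block containing cp."""
    return cp & ~(HALF_BLOCK - 1)


def half_blocks_in_range(first, last):
    """List the aligned half-blocks touched by the range first..last inclusive."""
    return list(range(half_block_start(first), last + 1, HALF_BLOCK))


def half_blocks_used(text):
    """Count non-ASCII characters per half-block in a text sample."""
    return Counter(half_block_start(ord(ch)) for ch in text if ord(ch) >= 0x80)


def fits_in_dynamic_windows(text):
    """Simplifying assumption: every half-block used needs its own dynamic window."""
    return len(half_blocks_used(text)) <= DYNAMIC_WINDOWS


if __name__ == "__main__":
    print(len(half_blocks_in_range(0xA480, 0xABFF)))               # A480..ABFF
    print([hex(b) for b in half_blocks_in_range(0xA500, 0xA63F)])  # Vai
```

Under these assumptions the range A480 to ABFF comes out as 15 half-blocks and Vai A500-A63F as 3, matching the figures above. Running half_blocks_used over a real sample shows how many half-blocks compete for the 8 dynamic windows, and how unevenly the characters are spread across them.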