Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

>> I appreciate Philippe's support of SCSU, but I don't think *even I*
>> would recommend it as an internal storage format.  The effort to
>> encode and decode it, while by no means Herculean as often perceived,
>> is not trivial once you step outside Latin-1.
>
> I said: "for immutable strings", which means that these Strings are
> instantiated for the long term and for repeated reuse. In that sense, what
> is really significant is its decoding, not the effort to encode it
> (which is minimal for ISO-8859-1 encoded source texts, or Unicode
> UTF-encoded texts that only use characters from the first page).
>
> Decoding SCSU is very straightforward, even though it is stateful (at
> the internal character level). But for immutable strings, there's no
> need to handle various initial states, and the states associated with
> each component character of the string have no importance (strings
> being immutable, only the decoding of the string as a whole makes
> sense).

Here is a string, expressed as a sequence of bytes in SCSU:

05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E

See how long it takes you to decode this to Unicode code points.  (Do
not refer to UTN #14; that would be cheating. :-)

It may not be rocket science, but it is not trivial.
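
For anyone who wants to check an answer after trying by hand, here is a
minimal Python sketch of a decoder for the small part of SCSU that the
byte sequence above exercises. It is not a conforming decoder. It
assumes the static window offsets and default dynamic window positions
given in UTS #6, starts in single-byte mode with dynamic window 0
active, and handles only the SQn and SCn tags plus literal bytes;
anything else (SDn, SDX, SQU, SCU, Unicode mode) raises an error. The
name decode_scsu_subset is hypothetical, chosen for this illustration.

```python
# Static windows, used by SQn when the quoted byte is below 0x80.
STATIC_OFFSETS = [0x0000, 0x0080, 0x0100, 0x0300,
                  0x2000, 0x2080, 0x2100, 0x3000]

# Default positions of the eight dynamic windows (assumed per UTS #6).
# This sketch never redefines them, because SDn is not implemented.
DYNAMIC_DEFAULTS = [0x0080, 0x00C0, 0x0400, 0x0600,
                    0x0900, 0x3040, 0x30A0, 0xFF00]


def decode_scsu_subset(data: bytes) -> str:
    """Decode single-byte-mode SCSU that uses only SQn, SCn, and literals."""
    active = 0          # index of the active dynamic window
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if 0x01 <= b <= 0x08:
            # SQn: quote exactly one character from window n.
            if i >= len(data):
                raise ValueError("SQn tag at end of input")
            n = b - 0x01
            q = data[i]
            i += 1
            if q < 0x80:
                out.append(chr(STATIC_OFFSETS[n] + q))
            else:
                out.append(chr(DYNAMIC_DEFAULTS[n] + (q - 0x80)))
        elif 0x10 <= b <= 0x17:
            # SCn: make dynamic window n the active window.
            active = b - 0x10
        elif b >= 0x80:
            # High byte: offset into the active dynamic window.
            out.append(chr(DYNAMIC_DEFAULTS[active] + (b - 0x80)))
        elif b in (0x00, 0x09, 0x0A, 0x0D) or b >= 0x20:
            # NUL, TAB, LF, CR, and 0x20..0x7F pass through unchanged.
            out.append(chr(b))
        else:
            raise NotImplementedError(f"tag 0x{b:02X} is outside this sketch")
    return "".join(out)


sample = bytes.fromhex(
    "05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E"
)
text = decode_scsu_subset(sample)
print(" ".join(f"U+{ord(c):04X}" for c in text))
```

Even this subset needs two window tables and three distinct byte
classes, which is roughly the point: the work is modest, but it is not
a table lookup.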

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/


