Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

>> I appreciate Philippe's support of SCSU, but I don't think *even I*
>> would recommend it as an internal storage format. The effort to
>> encode and decode it, while by no means Herculean as often perceived,
>> is not trivial once you step outside Latin-1.
>
> I said: "for immutable strings", which means that these Strings are
> instantiated for the long term, and for multiple reuses. In that
> sense, what is really significant is its decoding, not the effort to
> encode it (which is minimal for ISO-8859-1 encoded source texts, or
> Unicode UTF-encoded texts that only use characters from the first
> page).
>
> Decoding SCSU is very straightforward, even if this is stateful (at
> the internal character level). But for immutable strings, there's no
> need to handle various initial states, and the states associated with
> each component character of the string have no importance (strings
> being immutable, only the decoding of the string as a whole makes
> sense).

Here is a string, expressed as a sequence of bytes in SCSU:

05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E

See how long it takes you to decode this to Unicode code points. (Do
not refer to UTN #14; that would be cheating. :-)

It may not be rocket science, but it is not trivial.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
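
For anyone who would rather check an answer mechanically than by hand, here is a minimal sketch in Python of a decoder for a small subset of SCSU single-byte mode. The function name decode_scsu_subset is made up for illustration. The tag values and window offsets are as I understand them from UTS #6, and everything beyond SQn, SCn and the default windows (SDn, SDX, SQU, Unicode mode) is deliberately rejected rather than handled, so this is not a conformant decoder.

```python
# Sketch only: a SUBSET of SCSU single-byte mode, based on my reading of UTS #6.
# Window offsets below are the default positions as I understand them.
STATIC_WINDOWS = [0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000]
DYNAMIC_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]


def decode_scsu_subset(data: bytes) -> str:
    """Decode SCSU bytes that use only literals, SQn and SCn (hypothetical helper)."""
    active = 0  # decoding starts with dynamic window 0 active
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if 0x01 <= b <= 0x08:
            # SQn: quote one character from window n without changing state.
            if i >= len(data):
                raise ValueError("truncated SQn sequence")
            q = data[i]
            i += 1
            n = b - 0x01
            if q < 0x80:
                out.append(chr(STATIC_WINDOWS[n] + q))
            else:
                out.append(chr(DYNAMIC_WINDOWS[n] + (q - 0x80)))
        elif 0x10 <= b <= 0x17:
            # SCn: make dynamic window n the active window (this is the state).
            active = b - 0x10
        elif b >= 0x80:
            # High byte: offset into the currently active dynamic window.
            out.append(chr(DYNAMIC_WINDOWS[active] + (b - 0x80)))
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            # ASCII and the pass-through control characters stand for themselves.
            out.append(chr(b))
        else:
            # SDn, SDX, SQU, SCU and the reserved tag are outside this sketch.
            raise NotImplementedError(f"tag 0x{b:02X} not handled in this sketch")
    return "".join(out)
```

The 22 bytes above can be fed to it with bytes.fromhex(...). Note that even this subset needs two window tables and a mode variable, and the tags it refuses to handle are where most of the remaining work lies.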