Doug Ewell wrote:
> Devanagari text encoded in SCSU occupies exactly 1 byte per
> character, plus an additional byte near the start of the
> file to set the current window (0x14 = SC4).
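A minimal Python sketch of the quoted claim, and of what happens when that single window byte is damaged. Python's standard library has no SCSU codec, so `scsu_encode` and `scsu_decode` below are hypothetical helpers covering only a subset of UTS #6: single-byte mode, the SC0 to SC7 tags, and the default dynamic-window offsets. They are not a complete SCSU implementation.

```python
# Hypothetical SCSU *subset*: single-byte mode only, default window offsets,
# SC0..SC7 tags. Not a complete UTS #6 implementation.
DEFAULT_WINDOWS = (0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00)
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D}  # NUL, TAB, LF, CR
SC0 = 0x10  # SCn = SC0 + n makes dynamic window n the active one


def is_direct(cp: int) -> bool:
    """Values that single-byte mode stores as themselves (ASCII and a few controls)."""
    return cp in PASS_THROUGH or 0x20 <= cp <= 0x7F


def scsu_encode(text: str, window: int) -> bytes:
    """Emit one SCn tag, then one byte per character (ASCII or inside window n)."""
    offset = DEFAULT_WINDOWS[window]
    out = bytearray([SC0 + window])
    for ch in text:
        cp = ord(ch)
        if is_direct(cp):
            out.append(cp)
        elif offset <= cp < offset + 0x80:
            out.append(0x80 + cp - offset)
        else:
            raise ValueError(f"U+{cp:04X} is outside window {window}")
    return bytes(out)


def scsu_decode(data: bytes) -> str:
    window = 0  # a fresh SCSU stream starts with window 0 active
    out = []
    for b in data:
        if is_direct(b):
            out.append(chr(b))
        elif SC0 <= b <= SC0 + 7:
            window = b - SC0  # tag byte: switches window, produces no character
        elif b >= 0x80:
            out.append(chr(DEFAULT_WINDOWS[window] + b - 0x80))
        else:
            raise ValueError(f"tag 0x{b:02X} is not handled by this subset")
    return "".join(out)


text = "नमस्ते"                       # six Devanagari code points
scsu = scsu_encode(text, window=4)   # starts with 0x14 = SC4 (window at U+0900)
utf8 = text.encode("utf-8")
print(len(text), len(scsu), len(utf8))  # 6 7 18

# Change one byte in each stream.
bad_scsu = bytes([SC0]) + scsu[1:]     # the SC4 tag becomes SC0
bad_utf8 = utf8[:1] + b"A" + utf8[2:]  # second byte of the first character
print(scsu_decode(bad_scsu))                       # every character lands in Latin-1
print(bad_utf8.decode("utf-8", errors="replace"))  # damage stays in the first character
```

In the corrupted SCSU stream all six characters are read through window 0 (U+0080 to U+00FF), while the UTF-8 decoder resynchronizes at the next lead byte and recovers the last five characters unchanged.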
The problem is what happens if that very byte gets corrupted for any reason...

If an octet is erroneously deleted from, changed in, or added to a UTF-8 stream, the damage stays confined to a single character. If the same thing happens to the window-setting byte of an SCSU stream (or of other similar "zany" formats), the whole rest of the stream turns into garbage.

What this means in practice for website developers is:

1) SCSU text can only be edited with a text editor that properly decodes the *whole* file on load and re-encodes it on save. UTF-8 text, on the other hand, can also be edited with an encoding-unaware editor, although non-ASCII text is unreadable in it.

2) SCSU text cannot be built by assembling binary pieces coming from external sources. For example, you cannot take an SCSU-encoded template file and fill in the blanks with customer data coming from an SCSU-encoded database: each piece of text you insert from the database can switch the current window, so the window information the rest of the file relies on is lost and the rest of the file turns into garbage. UTF-8, on the other hand, allows this, provided that the integrity of each multi-byte sequence is maintained.

3) An SCSU page can only be accepted by browsers and e-mail readers that are able to decode it. UTF-8, on the other hand, also works on old ASCII-based browsers, although non-ASCII text is of course not displayed properly.

_ Marco
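The template scenario in point 2 can be sketched with a hypothetical SCSU subset codec (single-byte mode, SC0 to SC7, default UTS #6 window offsets; redefined here so the example stands alone). The template text, the Cyrillic customer name, and filling the slot with a plain `bytes.replace` are illustrative assumptions; the database value is assumed to be stored as a standalone SCSU fragment that opens with its own tag, SC2 (Cyrillic window at U+0400).

```python
# Hypothetical SCSU subset codec, same rules as a minimal UTS #6 single-byte mode.
DEFAULT_WINDOWS = (0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00)
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D}
SC0 = 0x10


def is_direct(cp: int) -> bool:
    return cp in PASS_THROUGH or 0x20 <= cp <= 0x7F


def scsu_encode(text: str, window: int) -> bytes:
    offset = DEFAULT_WINDOWS[window]
    out = bytearray([SC0 + window])  # each fragment opens with its own SCn tag
    for ch in text:
        cp = ord(ch)
        if is_direct(cp):
            out.append(cp)
        elif offset <= cp < offset + 0x80:
            out.append(0x80 + cp - offset)
        else:
            raise ValueError(f"U+{cp:04X} is outside window {window}")
    return bytes(out)


def scsu_decode(data: bytes) -> str:
    window = 0
    out = []
    for b in data:
        if is_direct(b):
            out.append(chr(b))
        elif SC0 <= b <= SC0 + 7:
            window = b - SC0
        elif b >= 0x80:
            out.append(chr(DEFAULT_WINDOWS[window] + b - 0x80))
        else:
            raise ValueError(f"tag 0x{b:02X} is not handled by this subset")
    return "".join(out)


template = "नाम: {name} धन्यवाद"  # Devanagari template with an ASCII placeholder
name = "Иван"                     # hypothetical database value (Cyrillic)

# Fill the blank by splicing raw bytes, as a naive server-side script might.
scsu_page = scsu_encode(template, 4).replace(b"{name}", scsu_encode(name, 2))
utf8_page = template.encode("utf-8").replace(b"{name}", name.encode("utf-8"))

print(scsu_decode(scsu_page))     # text after the slot comes out as Cyrillic letters
print(utf8_page.decode("utf-8"))  # नाम: Иван धन्यवाद
```

The fragment's SC2 tag leaves window 2 active, so the Devanagari bytes after the slot are read as Cyrillic. A splicer could re-emit the template's window tag after each fragment, but only by tracking the window state at every insertion point, which in general means decoding the template.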