RE: Devanagari

Marco Cimarosti Mon, 21 Jan 2002 05:58:59 -0800

Doug Ewell wrote:
> Devanagari text encoded in SCSU occupies exactly 1 byte per
> character, plus an additional byte near the start of the
> file to set the current window (0x14 = SC4).


The problem is what happens if that very byte gets corrupted for any
reason...

If an octet is erroneously deleted from, changed in, or added to a UTF-8
stream, only a single character is corrupted. If the same thing happens to
the window-setting byte of an SCSU stream (or of any similar "zany" format),
the whole stream turns into garbage.
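
To make the difference concrete, here is a minimal Python sketch. It is not
a real SCSU codec: toy_scsu_decode and toy_scsu_encode_devanagari are
hypothetical helpers that handle only the SC0..SC7 window tags and the
default dynamic window offsets, and that treat every other byte below 0x80
as plain ASCII. The sketch drops the first byte of each stream and compares
the damage.

```python
# Toy illustration only: a tiny subset of SCSU single-byte mode.
# Default dynamic window offsets for windows 0..7.
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]


def toy_scsu_decode(data: bytes) -> str:
    """Decode the toy subset: SC0..SC7 tags, window bytes, and ASCII."""
    active = 0  # a stream starts with dynamic window 0 active
    out = []
    for b in data:
        if 0x10 <= b <= 0x17:      # SC0..SC7: select a dynamic window
            active = b - 0x10
        elif b >= 0x80:            # upper half maps into the active window
            out.append(chr(DEFAULT_WINDOWS[active] + (b - 0x80)))
        else:                      # simplification: everything else is ASCII
            out.append(chr(b))
    return "".join(out)


def toy_scsu_encode_devanagari(text: str) -> bytes:
    """Encode Devanagari plus printable ASCII as SC4 followed by one byte each."""
    out = bytearray([0x14])        # 0x14 = SC4, window 4 starts at U+0900
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp < 0x0980:
            out.append(0x80 + (cp - 0x0900))
        elif 0x20 <= cp < 0x80:
            out.append(cp)
        else:
            raise ValueError(f"outside the toy subset: U+{cp:04X}")
    return bytes(out)


text = "देवनागरी"
scsu = toy_scsu_encode_devanagari(text)
utf8 = text.encode("utf-8")

# Simulate the same accident on both streams: the first byte is lost.
damaged_scsu = toy_scsu_decode(scsu[1:])
damaged_utf8 = utf8[1:].decode("utf-8", errors="replace")

print(len(scsu), len(utf8))
print(repr(damaged_scsu))
print(repr(damaged_utf8))
```

Under these assumptions the UTF-8 copy loses only its first character (the
orphaned continuation bytes become U+FFFD), while every character of the
toy SCSU copy is decoded in window 0 and comes out in the U+0080..U+00FF
range.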

What this means in practice for website developers is:

1) SCSU text can only be edited with a text editor that properly decodes
the *whole* file on load and re-encodes it on save. On the other hand, UTF-8
text can also be edited using an encoding-unaware editor, although the
non-ASCII text will not be readable in it.

2) SCSU text cannot be built by assembling binary pieces coming from
external sources. E.g., you cannot take an SCSU-encoded template file and
fill in the blanks with customer data coming from an SCSU-encoded database:
each time you insert a piece of text coming from the database, you lose the
current window information, turning the rest of the file into garbage. On
the other hand, UTF-8 allows this, provided that the integrity of each
multi-byte sequence is maintained.

3) An SCSU page can only be accepted by browsers and e-mail readers that are
able to decode it. On the other hand, UTF-8 also works on old ASCII-based
browsers, although non-ASCII text is clearly not properly displayed.
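
Point 2 can be sketched in Python as well. Again this is only a toy under
stated assumptions: toy_encode and toy_decode are hypothetical helpers
covering just the SC0..SC7 tags, the default window offsets and printable
ASCII, and the template and customer strings are made up for the example.
The template is Devanagari (window 4), while the inserted name is Cyrillic
and therefore arrives with its own SC2 tag.

```python
# Toy illustration only: a tiny subset of SCSU single-byte mode.
WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]


def toy_decode(data: bytes) -> str:
    """Decode the toy subset: SC0..SC7 tags, window bytes, and ASCII."""
    active, out = 0, []
    for b in data:
        if 0x10 <= b <= 0x17:      # SC0..SC7: select a dynamic window
            active = b - 0x10
        elif b >= 0x80:            # upper half maps into the active window
            out.append(chr(WINDOWS[active] + (b - 0x80)))
        else:                      # simplification: everything else is ASCII
            out.append(chr(b))
    return "".join(out)


def toy_encode(text: str, window: int) -> bytes:
    """Encode text that fits one default window, prefixed by its SCn tag."""
    base = WINDOWS[window]
    out = bytearray([0x10 + window])
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x80:
            out.append(cp)
        elif base <= cp < base + 0x80:
            out.append(0x80 + (cp - base))
        else:
            raise ValueError(f"outside window {window}: U+{cp:04X}")
    return bytes(out)


# Hypothetical template halves and customer data.
head, tail = "नाम: ", " धन्यवाद"
name = "Иван"
expected = head + name + tail

# UTF-8: splicing byte strings on character boundaries just works.
utf8_result = (head.encode("utf-8") + name.encode("utf-8")
               + tail.encode("utf-8")).decode("utf-8")

# Toy SCSU: the template is one stream (a single SC4 tag at the start),
# and the database value is a separate stream that starts with SC2.
head_scsu = toy_encode(head, 4)
tail_scsu = toy_encode(tail, 4)[1:]   # continuation of the template, no tag
name_scsu = toy_encode(name, 2)
scsu_result = toy_decode(head_scsu + name_scsu + tail_scsu)

print(utf8_result == expected)
print(scsu_result == expected)
print(repr(scsu_result))
```

In this sketch the spliced UTF-8 stream round-trips, while the tail of the
toy SCSU template is decoded in the Cyrillic window left behind by the
inserted name. A splicing tool that understood SCSU could re-emit SC4 after
the insertion, but that is no longer plain binary assembly.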

_ Marco
