At 05:50 PM 10/31/01 -0800, Kenneth Whistler wrote:
>I have no quarrel with the claim that the SCSU scheme could be
>implemented directly on UTF-32 data. But as Unicode Technical Standard
>#6 is currently written, that is not how to do it conformantly.

Actually, no specific encoding form is required for the uncompressed data.
SCSU has always been a transformation from code point sequences to byte
sequences. As long as the same byte sequence represents the same code point
sequence, the implementation is conformant. (The encoder and decoder should
probably state very clearly which encoding form the encoder consumes and the
decoder emits.)
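A minimal sketch of that point, assuming Python codec names as the way an implementation declares its uncompressed encoding form (the helper name `code_points` is hypothetical). Once the input is decoded, a UTF-16 or UTF-32 source yields the same code point sequence, and that sequence is all an SCSU encoder needs to work from:

```python
def code_points(data: bytes, form: str) -> list[int]:
    """Decode `data` in the declared encoding form and return its code points.

    `form` is a Python codec name such as "utf-16-be" or "utf-32-be"
    (an assumption made for this illustration). An encoder that works on the
    returned list never sees which encoding form the input arrived in.
    """
    return [ord(ch) for ch in data.decode(form)]


text = "a\U00010400"  # U+0061 followed by a supplementary character
from16 = code_points(text.encode("utf-16-be"), "utf-16-be")
from32 = code_points(text.encode("utf-32-be"), "utf-32-be")
assert from16 == from32 == [0x61, 0x10400]
```

The surrogate pair in the UTF-16 input is combined by the decoder, so both paths agree on a single supplementary code point.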


>It seems to me that a rewrite of SCSU would be in order to explicitly
>allow and define UTF-32 implementations as well as UTF-16 implementations
>of SCSU.

What is needed is a rewrite of SCSU that makes explicit that in the SCSU
*compressed* data stream "unicode mode" is always UTF-16BE (instead of
"two byte unicode in the usual way", as the current text reads ;-)
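To illustrate what "unicode mode is always UTF-16BE" means for the compressed bytes, here is a deliberately naive sketch that never leaves Unicode mode. It assumes the tag values as I read them in UTS #6 (SCU = 0x0F in single-byte mode; in Unicode mode, lead bytes 0xE0..0xF2 are tags, and UQU = 0xF0 quotes the next code unit). The function name is hypothetical, and rejecting surrogate code points is a simplifying assumption. A real encoder would use windows and single-byte mode and compress far better:

```python
SCU = 0x0F  # single-byte mode tag: switch to Unicode mode
UQU = 0xF0  # Unicode mode tag: quote the next two bytes as one UTF-16BE code unit


def _utf16_units(cp: int) -> list[int]:
    """Split a code point into UTF-16 code units (surrogate pair if needed)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def encode_unicode_mode_only(code_points: list[int]) -> bytes:
    """Hypothetical SCSU encoder that switches to Unicode mode and stays there."""
    out = bytearray([SCU])
    for cp in code_points:
        # Assumption for this sketch: only Unicode scalar values are accepted.
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        for unit in _utf16_units(cp):
            # A code unit whose high byte collides with a Unicode-mode tag
            # byte (0xE0..0xF2) must be quoted with UQU.
            if 0xE0 <= unit >> 8 <= 0xF2:
                out.append(UQU)
            out += unit.to_bytes(2, "big")  # UTF-16BE, as in the compressed stream
    return bytes(out)
```

For example, `encode_unicode_mode_only([0x10400])` gives `b"\x0f\xd8\x01\xdc\x00"`: the SCU tag followed by the UTF-16BE surrogate pair, regardless of which encoding form the uncompressed text used.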

I have completed such a rewrite, with modest updates of the terminology,
so as not to actually require Unicode 3.0 or 3.1 as the base document. Since
SCSU formally uses Unicode 2.0.0 as its base version, I felt it would be
inappropriate to go overboard in making changes.

For that reason, I have introduced the term "supplementary code space"
as a definition in TR6 itself. This allows me to eliminate references
to "expansion space", which readers coming from 3.x can no longer follow,
without requiring a formal reference to 3.1 merely to get different words
for the same thing.

Another goal was to limit the places in which text was changed, since no
*technical* change of the specification is intended, and wholesale changes
would have obscured this fact.

I have added a short section on worst-case behavior as well.

The resulting draft is posted on http://www.unicode.org/~asmus/tr6-3.3d1.html
for input.

A./


