On Fri, 8 Nov 2024 at 19:05, suzuki toshiya <[email protected]> wrote:
> I understand your background is academic study of Japanese language, but
> is there any special reason to mention to JIS X 0213, during the discussion
> of general purpose encoding scheme of UTF-8?

It was an aside. (My academic background is in computer science; Japanese NLP is a diversion which I have followed in my retirement.)

The original question was about the source code for UTF-8, and the OP mentioned using Debian Linux. I wanted to point out that there was source code available for conversion of codes to UTF-8. I tossed in a representation of the conversion of 16-bit Unicode points into 3-byte UTF-8 sequences. (All the characters in JIS X 0208 and JIS X 0212 were incorporated in the initial Unicode version.) Markus Scherer added the representation of 21-bit Unicode in UTF-8, so I pointed out that relatively few kanji in the JIS standards have 21-bit codepoints.

> In Japan, many running systems keep the restriction of JIS X 0208,
> especially in public sectors.

Interesting comment. I guess you are aware that several of the changes and additions made in the 2010 revision of the 常用漢字 involved the use of kanji from outside JIS X 0208. Also, government bodies such as 文化庁 have been encouraging the use of Unicode-only kanji in lists such as the 表外漢字字体表.

[...]

> I think, the popularity of "21-bit Unicode codepoint" in Japanese text is
> highly dependent with the category of the text.

Absolutely. Despite some misguided grumbling in Japan about Unicode in its early days, it's what virtually everyone uses now, and no-one is really aware whether the codepoints are 16 or 21 bits.

Cheers

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/
