Not just Japanese: most kanji are usually double-width, and some abjads (think Arabic, for simplicity) and a few odd others also use a mix of single- and double-width characters. There are also a few that mix half-width and single-width characters, and some even have triple-width characters to contend with.

https://msdn.microsoft.com/en-us/library/cc194788.aspx

will give you a general idea of the basic formats for kana/kanji, but really all I can say is good luck with it all.
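
As a rough illustration of the width classes mentioned above, here is a minimal Python sketch that estimates how many terminal columns a string occupies. It is an approximation and not what toybox does: it assumes the East Asian Width property exposed by Python's unicodedata module is a good enough stand-in for wcwidth(), it treats ambiguous-width characters as single-width, and it knows nothing about anything wider than two columns. The names char_width and display_width are made up for the example.

```python
import unicodedata


def char_width(ch):
    """Estimate the number of terminal columns one code point occupies."""
    # Combining marks are drawn on top of the previous character.
    if unicodedata.combining(ch):
        return 0
    # "W" (wide) and "F" (fullwidth) normally take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including halfwidth kana, is assumed to take one.
    return 1


def display_width(text):
    """Sum the estimated column widths of every code point in text."""
    return sum(char_width(ch) for ch in text)
```

With these assumptions, display_width("abc") is 3, display_width("日本語") is 6, and halfwidth katakana such as "ｱ" counts as 1.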

Date: Wed, 25 Oct 2017 18:42:14 -0500
From: Rob Landley <r...@landley.net>
To: toybox@lists.landley.net
Subject: [Toybox] utf8 display question.
Message-ID: <47ce57b4-486b-2920-5358-92ee955f4...@landley.net>
Content-Type: text/plain; charset=utf-8

I'm adding cut -C to do column-based selection. What should it do about
the middle of double width characters? Right now I'm having it round
down, so since Japanese text is double width in monospaced fonts:

$ cat tests/files/utf8/japan.txt && echo
?????????????????????????
$ ./cut -C 5-11 tests/files/utf8/japan.txt
???

I.E. 5 skips the first 2 (which starts at column 4, the next display
point _below_ 5), and then it continues to stop before the ending
column. (So 5-11 is the same as 5-10, and 5-12 shows 4 characters
because the 4th character includes column 12).
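
One way to read the round-down rule described above is "keep a character when its last display column falls inside the requested range". The Python sketch below implements that reading only; it is not the toybox code. It assumes 1-based inclusive columns (as with cut -c), approximates widths with the East Asian Width property, and uses the hypothetical names char_width and cut_columns.

```python
import unicodedata


def char_width(ch):
    """Estimate terminal columns for one code point (0, 1 or 2)."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def cut_columns(text, start, end):
    """Select display columns start..end (1-based, inclusive).

    A character is kept when its last column lies in [start, end], so a
    start in the middle of a wide character rounds down to include it,
    while an end in the middle of a wide character stops before it.
    """
    out = []
    col = 0       # last column consumed so far
    keep = False  # whether the most recent spacing character was kept
    for ch in text:
        w = char_width(ch)
        if w:
            col += w
            keep = start <= col <= end
        # Zero-width marks follow the character they attach to.
        if keep:
            out.append(ch)
    return "".join(out)
```

For the ten double-width characters "あいうえおかきくけこ", cut_columns(s, 5, 11) and cut_columns(s, 5, 10) both give "うえお", cut_columns(s, 5, 12) gives "うえおか", and a start of 6 still includes "う" because it rounds down.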

This is consistent, but I'm not sure if it's right...? Should the first
one round up instead? (Since it's an exclusion range, should the start
fail forward and the end fail backwards?)
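
The alternative raised here, where the start fails forward and the end fails backward, amounts to keeping only characters that lie entirely inside the range. A minimal Python sketch of that variant follows, under the same assumptions as any such approximation: 1-based inclusive columns, widths guessed from the East Asian Width property, and hypothetical names (char_width, cut_columns_strict). It is meant for comparing the two behaviours, not as a recommendation of either.

```python
import unicodedata


def char_width(ch):
    """Estimate terminal columns for one code point (0, 1 or 2)."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def cut_columns_strict(text, start, end):
    """Keep only characters whose columns all lie within start..end."""
    out = []
    col = 0       # last column consumed so far
    keep = False  # whether the most recent spacing character was kept
    for ch in text:
        w = char_width(ch)
        if w:
            first = col + 1  # first column this character occupies
            col += w         # last column this character occupies
            keep = start <= first and col <= end
        # Zero-width marks follow the character they attach to.
        if keep:
            out.append(ch)
    return "".join(out)
```

With "あいうえお", cut_columns_strict(s, 5, 10) gives "うえお", but cut_columns_strict(s, 6, 10) gives only "えお", since column 6 is the second half of "う".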

Dunno...

Rob

_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
