Richard Hipp wrote:
> On 2/26/15, Adam Podstawczyński <adam at podstawczynski.com> wrote:
>> Also, to provide more input, I have now noticed that even if the column
>> width is wider than the offending string, this issue still creates problems
>> - while nothing gets truncated, the position of the next column is
>> miscalculated, causing misalignment:
>
> Proposed fix:  
> https://www.sqlite.org/src/ci?name=b1a9e2916f5b4adef91c34563f71b98e79a10c12

That code correctly computes the number of characters.

However, in the Unicode world, the number of characters is not the same
as the number of columns, not even with so-called fixed-width fonts.

<http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c> says:
| In fixed-width output devices, Latin characters all occupy a single
| "cell" position of equal width, whereas ideographic CJK characters
| occupy two such cells.
| [...]
| The following two functions define the column width of an ISO 10646
| character as follows:
|
|    - The null character (U+0000) has a column width of 0.
|
|    - Other C0/C1 control characters and DEL will lead to a return
|      value of -1.
|
|    - Non-spacing and enclosing combining characters (general
|      category code Mn or Me in the Unicode database) have a
|      column width of 0.
|
|    - SOFT HYPHEN (U+00AD) has a column width of 1.
|
|    - Other format characters (general category code Cf in the Unicode
|      database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.
|
|    - Hangul Jamo medial vowels and final consonants (U+1160-U+11FF)
|      have a column width of 0.
|
|    - Spacing characters in the East Asian Wide (W) or East Asian
|      Full-width (F) category as defined in Unicode Technical
|      Report #11 have a column width of 2.
|
|    - All remaining characters (including all printable
|      ISO 8859-1 and WGL4 characters, Unicode control characters,
|      etc.) have a column width of 1.
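
To illustrate the rules quoted above, here is a minimal sketch in Python.
It is not the SQLite shell code and not a port of wcwidth.c: it assumes
that Python's unicodedata module is an acceptable stand-in for the tables
in wcwidth.c, so results may differ for characters added in newer Unicode
versions. The names char_width, display_width and pad_to_columns are
made up for this example.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Approximate column width of one code point, per the quoted rules."""
    cp = ord(ch)
    if cp == 0:
        return 0                      # null character
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1                     # C0/C1 control characters and DEL
    if cp == 0x00AD:
        return 1                      # SOFT HYPHEN is the Cf exception
    if cp == 0x200B or 0x1160 <= cp <= 0x11FF:
        return 0                      # ZERO WIDTH SPACE, Hangul Jamo medial/final
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0                      # combining and format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # East Asian Wide / Full-width
    return 1                          # everything else

def display_width(s: str) -> int:
    """Sum of column widths, or -1 if s contains a control character."""
    total = 0
    for ch in s:
        w = char_width(ch)
        if w < 0:
            return -1
        total += w
    return total

def pad_to_columns(s: str, width: int) -> str:
    """Pad with spaces so that s occupies at least `width` columns."""
    return s + " " * max(0, width - display_width(s))
```

With this, a string's byte count, character count and column count can
all differ: the name "Podstawczyński" is 15 bytes in UTF-8, 14
characters and 14 columns, while two CJK ideographs are 2 characters
but 4 columns. Padding by column count rather than by character count
is what keeps the next column aligned.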


Regards,
Clemens
