Re: [sqlite] [Bug] Non-ASCII character is not counted in calculating column width

Yuriy M. Kaminskiy Fri, 02 Jun 2017 02:47:39 -0700

Jacob Pratt <[email protected]> writes:

> Using .width x along with .mode columns, any non-ASCII character isn't
> counted, causing the column to shrink by one.
>
> I *think* my analysis is correct, but it also might be counted multiple
> times by taking a naïve approach and just counting the number of bytes
> (UTF-8 has multi-byte characters).


Yep, it uses bytes:
#if defined(_WIN32) || defined(WIN32)
...
#elif !defined(utf8_printf)
# define utf8_printf fprintf
#endif
...
static int shell_callback(
...
    case MODE_Column: {
...
        if( w<0 ){
          utf8_printf(p->out,"%*.*s%s",-w,-w,
              azArg[i] ? azArg[i] : p->nullValue,
              i==nArg-1 ? rowSep : "  ");
        }else{
          utf8_printf(p->out,"%-*.*s%s",w,w,
              azArg[i] ? azArg[i] : p->nullValue,
              i==nArg-1 ? rowSep : "  ");
        }

And fprintf counts bytes, not characters.
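
To make the effect concrete, here is a minimal sketch (not code from the shell; `format_column` and `utf8_codepoints` are hypothetical helper names) that applies the same "%-*.*s" format and then counts how many characters actually end up in the padded field:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format s into buf the same way the column-mode code above does.
 * Both the field width and the precision are measured in bytes.
 * Returns what snprintf returns (bytes that would have been written). */
int format_column(char *buf, size_t bufsz, const char *s, int w)
{
    assert(buf != NULL && s != NULL && w >= 0);
    return snprintf(buf, bufsz, "%-*.*s", w, w, s);
}

/* Count UTF-8 code points: every byte that is not a continuation
 * byte (10xxxxxx) starts a new code point. Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

With a width of 8, "naïve" (6 bytes, 5 characters) is padded to 8 bytes, which is only 7 characters, so the column comes out one position short.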

I'd also like to note that, aside from multi-byte characters that must be
accounted for, there is the opposite problem: double-width characters. E.g.
U+3042 HIRAGANA LETTER A
takes 3 bytes in UTF-8 (e3 81 82), but *two* column positions.
See man wcwidth wcswidth.
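
As a rough illustration of counting column positions rather than bytes, here is a sketch under loud assumptions. On POSIX systems wcwidth()/wcswidth() are the proper tools, but they depend on the active locale, so this sketch substitutes a small hard-coded range table that is deliberately incomplete (it ignores combining marks and many wide ranges). The names `utf8_decode`, `approx_width` and `display_width` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp and return the bytes consumed.
 * Malformed input is consumed one byte at a time as U+FFFD. */
size_t utf8_decode(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    assert(s != NULL && cp != NULL);
    if (p[0] < 0x80) { *cp = p[0]; return 1; }
    if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80
                              && (p[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(p[0] & 0x0F) << 12)
            | ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        return 3;
    }
    if ((p[0] & 0xF8) == 0xF0 && (p[1] & 0xC0) == 0x80
                              && (p[2] & 0xC0) == 0x80
                              && (p[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(p[0] & 0x07) << 18)
            | ((uint32_t)(p[1] & 0x3F) << 12)
            | ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Very rough width estimate: 2 for a few well-known wide ranges, else 1.
 * ASSUMPTION: this table is illustrative only, not a wcwidth() substitute. */
int approx_width(uint32_t cp)
{
    if ((cp >= 0x3040 && cp <= 0x30FF) ||   /* Hiragana, Katakana */
        (cp >= 0x4E00 && cp <= 0x9FFF) ||   /* CJK Unified Ideographs */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul Syllables */
        (cp >= 0xFF01 && cp <= 0xFF60))     /* Fullwidth forms */
        return 2;
    return 1;
}

/* Sum the estimated column positions of a NUL-terminated UTF-8 string. */
int display_width(const char *s)
{
    int cols = 0;
    assert(s != NULL);
    while (*s != '\0') {
        uint32_t cp;
        s += utf8_decode(s, &cp);
        cols += approx_width(cp);
    }
    return cols;
}
```

Under these assumptions, HIRAGANA LETTER A ("\xe3\x81\x82") is 3 bytes but 2 columns, while "naïve" is 6 bytes but 5 columns.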

BTW, truncating UTF-8 in the middle of a character, as fprintf would do,
can produce confusing mojibake.
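
One possible way to avoid the split, sketched here with a hypothetical helper `utf8_safe_prefix` (it assumes the input is valid UTF-8), is to back the cut point up to the start of the sequence before passing it as the precision:

```c
#include <assert.h>
#include <string.h>

/* Return the largest byte count <= max_bytes that does not cut a UTF-8
 * sequence in half. Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_safe_prefix(const char *s, size_t max_bytes)
{
    size_t len, n;
    assert(s != NULL);
    len = strlen(s);
    n = len < max_bytes ? len : max_bytes;
    /* If the first excluded byte is a continuation byte (10xxxxxx), the cut
     * fell inside a sequence: step back until it sits on a lead byte. */
    while (n > 0 && n < len && ((unsigned char)s[n] & 0xC0) == 0x80)
        n--;
    return n;
}
```

The result can then be used as the precision, e.g. fprintf(out, "%.*s", (int)utf8_safe_prefix(s, w), s). Cutting "naïve" at 3 bytes would split the "ï"; the helper returns 2 instead, so only "na" is printed.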

Whether or not it gets fixed (it is definitely annoying to implement), the
current deficiencies should at least be documented.

cat >>sqlite3.1
.SH KNOWN BUGS
.B .mode column
does not handle either multibyte or double-width characters, patches welcomed.

_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
