Actually, I think the current crunch_str() prints trailing zero-width
combining chars just fine? Since when width==columns it's still >= 0.

.................................................
  for (end = start = *str; *end; columns += col, end += bytes) {
    wchar_t wc;

    if ((bytes = utf8towc(&wc, end, 4))>0 && (col = wcwidth(wc))>=0) {
      if (!escmore || wc>255 || !strchr(escmore, wc)) {
        if (width-columns<col) break;  /* <-- col is 0 for U+0300-U+036F */
        if (out) fwrite(end, bytes, 1, out);

        continue;
      }
    }
......................

And yeah, UTF-8 is good because it was originally written on a napkin at
a dinner table by Ken Thompson and Rob Pike. Unicode, on the other
hand... not written on a napkin.

-Jarno

On Thu, Sep 19, 2019 at 7:39 PM Jarno Mäkipää <jmaki...@gmail.com> wrote:
>
> Yeah, combining chars follow the main glyph. My draw_str_until had an
> extra for loop just to check whether there are 0-width chars after we
> are at the correct width, and I only pushed data to stdout when I was
> sure about the length.
>
> But the interface in crunch_str is better, since it has support for
> rendering special chars with a custom function.
>
> Now there is a bit of incorrect rendering when stepping around
> tests/files/test1.txt, so I need to patch this up. Perhaps I'll try to
> make crunch_nstr() work correctly...
>
> -Jarno
>
> On Thu, Sep 19, 2019 at 5:34 PM Rob Landley <r...@landley.net> wrote:
> >
> > On 9/15/19 8:05 AM, Jarno Mäkipää wrote:
> > > Replaced: draw_str_until with lib/crunch_str() where possible
> > >
> > > Removed: Unused char draw functions.
> > >
> > > Implemented: crunch_nstr() which is crunch_str with an additional
> > > check for byte length; this can be used to draw substrings or
> > > non-null-terminated strings. (This can be moved to lib/ if it's
> > > useful for others)
> >
> > Applied, but I note when I wrote crunch_str() I assumed that unicode
> > was sane, which was wrong.
> >
> > UTF-8 is very well done.
> > Unicode combining characters are as stupid as it's possible to be:
> > they TRAIL the printing character, meaning that you have a base
> > character that gets displayed, and then you redraw over it repeatedly
> > as you get each new modifier attaching to the _previous_ character
> > you already drew, and then you can't tell you've gone past your
> > length allocation until you parse the first character you _can't_
> > display in that space, which you then need to unget.
> >
> > I thought combining characters were stored up and then applied to the
> > _next_ character (which would have been the sane thing to do), and
> > the measuring logic works based on that assumption. So it probably
> > won't display combining characters on the last UTF8 character because
> > the unicode committee is too dumb to live.
> >
> > Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
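
For anyone who wants to poke at the width check quoted at the top of this message without building toybox, here is a minimal standalone sketch. It is not the toybox code: crunch_demo(), decode() and cp_width() are hypothetical names made up for this example, cp_width() is only a stand-in for wcwidth() that treats U+0300-U+036F as zero width and everything else as one column, and the escape handling and output are left out. The only thing it keeps is the "width-columns<col" test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for wcwidth(): 0 columns for the combining
 * diacritics U+0300..U+036F, 1 column for everything else. */
static int cp_width(unsigned cp)
{
  return (cp >= 0x300 && cp <= 0x36f) ? 0 : 1;
}

/* Tiny UTF-8 decoder for 1 to 3 byte sequences, no validation (enough
 * for this demo). Returns bytes consumed, or 0 if it can't decode. */
static int decode(const char *s, unsigned *cp)
{
  unsigned char c = *s;

  if (c < 0x80) {
    *cp = c;
    return 1;
  }
  if ((c & 0xe0) == 0xc0 && s[1]) {
    *cp = ((c & 0x1f) << 6) | (s[1] & 0x3f);
    return 2;
  }
  if ((c & 0xf0) == 0xe0 && s[1] && s[2]) {
    *cp = ((c & 0x0f) << 12) | ((s[1] & 0x3f) << 6) | (s[2] & 0x3f);
    return 3;
  }
  return 0;
}

/* Return how many bytes of s fit into `width` columns, using the same
 * break condition as the quoted loop. */
size_t crunch_demo(const char *s, int width)
{
  const char *end = s;
  int columns = 0, col, bytes;
  unsigned cp;

  assert(s && width >= 0);
  while (*end && (bytes = decode(end, &cp)) > 0) {
    col = cp_width(cp);
    /* When width == columns and col == 0 this is 0 < 0, which is
     * false, so a trailing zero-width char is still accepted. */
    if (width - columns < col) break;
    columns += col;
    end += bytes;
  }

  return end - s;
}
```

With width 1, crunch_demo("e\xcc\x81x", 1) consumes 3 bytes: the 'e' plus the two-byte U+0301 that follows it, and it stops at the 'x'. With width 0 it consumes nothing, because the 'e' itself does not fit.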