Re: less(1) UTF-8 cleanup: do_append()

Todd C. Miller Tue, 12 Mar 2019 08:15:24 -0700

On Tue, 12 Mar 2019 15:55:59 +0100, Ingo Schwarze wrote:

> Four bad function calls have to be replaced here:
>
>  1. The call to the bad function get_wchar() is used to find the
>     character already present at the current position.  Replacing
>     it with mbtowc(3) also eliminates LWCHAR.  In case of failure -
>     which may not be likely since, at least usually, the linebuf[]
>     will contain valid UTF-8 - setting prev_ch to NUL makes sure
>     that whatever is already there will simply be replaced.
>     I think linebuf[curr] cannot be NUL at this point because only
>     backc() sets overstrike, and just having backed up, *something*
>     will be there.  But even if linebuf[curr] *is* NUL and hence
>     mbtowc(3) returns 0, the new code should do the right thing and
>     simply append.
>
>  2. The calls to the bad functions is_composing_char() and
>     is_combining_char() have to be replaced with wcwidth(3).
>     That also eliminates the second call to get_wchar().
>     If wcwidth(3) fails (i.e. ch is not printable), we simply have
>     to treat it like a width 1 character.
>
>  3. The call to the bad function control_char() has to be eliminated.
>     Start by considering two cases separately.  In utf_mode, we
>     have is_ascii_char(ch), i.e. ch <= 0x7f.  In that case,
>     control_char() is just iscntrl(3), which is identical to
>     !isprint(3).  In !utf_mode, we know that do_append() only gets
>     called with ch <= 0xff, and control_char() is iscntrl(3) ||
>     !isprint(3).  In that expression, the iscntrl(3) is
>     obviously redundant.  So we can simply use !isprint(3) for both
>     cases, which is also a logical way of expressing the condition
>     because this "else if" clause is supposed to handle non-printable
>     single-byte characters.
>
>  4. The call to the bad function is_ubin_char() intends to handle
>     non-printable Unicode characters, so the right function to use
>     is simply !iswprint(3) from <wctype.h>.
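
For readers following along, the four replacements described above can
be sketched roughly as below.  This is only an illustration, not the
actual diff: the helper names (prev_char_at, char_width,
is_nonprint_byte, is_nonprint_wide) are made up here and do not exist
in less(1).  It also assumes the caller has already set LC_CTYPE with
setlocale(3), since mbtowc(3) only decodes UTF-8 in a UTF-8 locale.

```c
#define _XOPEN_SOURCE 700	/* needed for wcwidth(3) on some systems */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * 1. Decode the character already present at buf.  On failure, return
 *    NUL so that the caller simply replaces whatever is there.  If
 *    buf[0] is NUL, mbtowc(3) returns 0 and stores L'\0', so the
 *    result is NUL in that case, too.
 */
static wchar_t
prev_char_at(const char *buf, size_t len)
{
	wchar_t wc;

	assert(buf != NULL);
	if (mbtowc(&wc, buf, len) == -1) {
		(void)mbtowc(NULL, NULL, 0);	/* reset conversion state */
		return L'\0';
	}
	return wc;
}

/* 2. Column width; if wcwidth(3) fails, treat it like width 1. */
static int
char_width(wchar_t wc)
{
	int w = wcwidth(wc);

	return w == -1 ? 1 : w;
}

/* 3. Non-printable single-byte character (ch <= 0xff). */
static int
is_nonprint_byte(unsigned char ch)
{
	return !isprint(ch);
}

/* 4. Non-printable wide character. */
static int
is_nonprint_wide(wchar_t wc)
{
	return !iswprint((wint_t)wc);
}
```

A width of 0 from char_width() is what marks a combining character in
this scheme, which is how the wcwidth(3) call can stand in for the
old is_composing_char() and is_combining_char() checks.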


This looks like an improvement to me.  OK millert@

 - todd
