On Thu, Mar 14, 2019 at 04:04:20PM +0100, Ingo Schwarze wrote:
> Hi,
>
> the following is a very simple patch to completely clean up the
> file less/search.c with respect to UTF-8 handling. It also fixes
> an outright bug: Searching for uppercase UTF-8 characters currently
> doesn't work because passing a Unicode codepoint (in this case, the
> "ch" retrieved with step_char()) to isupper(3) is just totally
> wrong.
>
> The new loop is fairly standard. Invalid bytes are simply skipped.
>
> OK?
> Ingo
>
Yes, OK.
> Index: search.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/less/search.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 search.c
> --- search.c 2 Aug 2017 19:35:57 -0000 1.19
> +++ search.c 14 Mar 2019 13:48:59 -0000
> @@ -75,12 +75,14 @@ static struct pattern_info filter_info;
> static int
> is_ucase(char *str)
> {
> - char *str_end = str + strlen(str);
> - LWCHAR ch;
> + wchar_t ch;
> + int len;
>
> - while (str < str_end) {
> - ch = step_char(&str, +1, str_end);
> - if (isupper(ch))
> + for (; *str != '\0'; str += len) {
> + if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
> + mbtowc(NULL, NULL, MB_CUR_MAX);
> + len = 1;
> + } else if (iswupper(ch))
> return (1);
> }
> return (0);
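
For anyone reading along, here is a minimal standalone sketch of the loop in
the patch, not the committed code. The name has_upper() is hypothetical, and
the sketch assumes the program has already selected its locale with
setlocale(3); without a UTF-8 locale only ASCII letters are recognized.
The old code was wrong because isupper(3) is only defined for arguments
representable as unsigned char (or EOF), so handing it a full Unicode
codepoint is undefined. Decoding with mbtowc(3) and testing with iswupper(3)
avoids that.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Return 1 if str contains at least one character that is uppercase
 * in the current locale, 0 otherwise.
 * has_upper() is a hypothetical name used for illustration only.
 */
int
has_upper(const char *str)
{
	wchar_t ch;
	int len;

	for (; *str != '\0'; str += len) {
		if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
			/* Invalid byte: reset the conversion state, skip it. */
			(void)mbtowc(NULL, NULL, MB_CUR_MAX);
			len = 1;
		} else if (iswupper(ch))
			return 1;
	}
	return 0;
}
```

The loop condition stops at the terminating NUL, so mbtowc(3) never sees it,
and an invalid byte advances the pointer by exactly one so the scan always
makes progress.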