On Fri, Mar 08, 2019 at 02:39:32PM +0100, Ingo Schwarze wrote:
> Hi,
>
> Lauri Tirkkonen wrote on Fri, Mar 08, 2019 at 12:43:11PM +0200:
>
> > I feel like xterm should just use the system wcwidth() to avoid these
> > mismatches, so rudimentary diff to do that below.
>
> Absolutely, i strongly agree with that sentiment.
>
> Having a local character width table in an application program is
> just wrong. Archaic operating systems like Oracle Solaris 11 that
> have broken wcwidth(3) implementations should either fix their C
> library or go to hell.
>
> I also agree with the details of the diff. It is minimal in so far
> as it changes only one file, keeping the annoyance minimal when merging
> new xterm releases. The function systemWcwidthOk() is horrific
> enough that deleting it outright makes sense, to make sure that it
> can never get called by accident.
>
> Note that the command line options -cjk_width and -mk_width and the
> related resources remain broken. But i'd say people using them get
> what they deserve. The disruption to updates that would be caused
> by excising them from multiple files is probably better avoided.
>
> Unchanged diff left in place below for easy reference.
>
> OK to commit?
hi,
I would prefer a diff that just add a &&!defined(__OpenBSD__) to the
condition before the definition of systemWcwidthOk(). This will cause
less risk of conflicts in future updates and clearly show the
intention.
but either way ok matthieu@
> Ingo
>
>
> diff --git a/app/xterm/util.c b/app/xterm/util.c
> index a3de4a005..9b6443683 100644
> --- a/app/xterm/util.c
> +++ b/app/xterm/util.c
> @@ -4980,61 +4980,6 @@ decode_keyboard_type(XtermWidget xw, XTERM_RESOURCE *
> rp)
> }
>
> #if OPT_WIDE_CHARS
> -#if defined(HAVE_WCHAR_H) && defined(HAVE_WCWIDTH)
> -/*
> - * If xterm is running in a UTF-8 locale, it is still possible to encounter
> - * old runtime configurations which yield incomplete or inaccurate data.
> - */
> -static Bool
> -systemWcwidthOk(int samplesize, int samplepass)
> -{
> - wchar_t n;
> - int oops = 0;
> -
> - for (n = 21; n <= 25; ++n) {
> - wchar_t code = (wchar_t) dec2ucs(NULL, (unsigned) n);
> - int system_code = wcwidth(code);
> - int intern_code = mk_wcwidth(code);
> -
> - /*
> - * Solaris 10 wcwidth() returns "2" for all of the line-drawing (page
> - * 0x2500) and most of the geometric shapes (a few are excluded, just
> - * to make it more difficult to use). Do a sanity check to avoid using
> - * it.
> - */
> - if ((system_code < 0 && intern_code >= 1)
> - || (system_code >= 0 && intern_code != system_code)) {
> - TRACE(("systemWcwidthOk: broken system line-drawing wcwidth\n"));
> - oops += (samplepass + 1);
> - break;
> - }
> - }
> -
> - for (n = 0; n < (wchar_t) samplesize; ++n) {
> - int system_code = wcwidth(n);
> - int intern_code = mk_wcwidth(n);
> -
> - /*
> - * When this check was originally implemented, there were few if any
> - * libraries with full Unicode coverage. Time passes, and it is
> - * possible to make a full comparison of the BMP. There are some
> - * differences: mk_wcwidth() marks some codes as combining and some
> - * as single-width, differing from GNU libc.
> - */
> - if ((system_code < 0 && intern_code >= 1)
> - || (system_code >= 0 && intern_code != system_code)) {
> - TRACE((".. width(U+%04X) = %d, expected %d\n",
> - (unsigned) n, system_code, intern_code));
> - if (++oops > samplepass)
> - break;
> - }
> - }
> - TRACE(("systemWcwidthOk: %d/%d mismatches, allowed %d\n",
> - oops, (int) n, samplepass));
> - return (oops <= samplepass);
> -}
> -#endif /* HAVE_WCWIDTH */
> -
> void
> decode_wcwidth(XtermWidget xw)
> {
> @@ -5045,8 +4990,7 @@ decode_wcwidth(XtermWidget xw)
> switch (mode) {
> default:
> #if defined(HAVE_WCHAR_H) && defined(HAVE_WCWIDTH)
> - if (xtermEnvUTF8() &&
> - systemWcwidthOk(xw->misc.mk_samplesize, xw->misc.mk_samplepass)) {
> + if (xtermEnvUTF8()) {
> my_wcwidth = wcwidth;
> TRACE(("using system wcwidth() function\n"));
> break;
--
Matthieu Herrb