Re: libedit wide character functions

Stefan Sperling Thu, 07 Jul 2011 03:25:42 -0700

On Thu, Jul 07, 2011 at 09:50:42AM +0100, Nicholas Marriott wrote:
> Let's turn 'em on.
> 
> Not much supports them but we might as well have them.
> 
> ok?
> 
> 
> Index: Makefile
> ===================================================================
> RCS file: /cvs/src/lib/libedit/Makefile,v
> retrieving revision 1.10
> diff -u -p -r1.10 Makefile
> --- Makefile  30 Jun 2010 00:05:35 -0000      1.10
> +++ Makefile  7 Jul 2011 08:49:16 -0000
> @@ -6,6 +6,8 @@ LIB=  edit
>  WANTLINT=
>  USE_SHLIBDIR=        yes
>  
> +WIDECHAR=    yes
> +
>  OSRCS=       chared.c common.c el.c emacs.c fcns.c filecomplete.c help.c \
>       hist.c key.c map.c chartype.c \
>       parse.c prompt.c read.c refresh.c search.c sig.c term.c tty.c vi.c
> Index: chartype.h
> ===================================================================
> RCS file: /cvs/src/lib/libedit/chartype.h,v
> retrieving revision 1.3
> diff -u -p -r1.3 chartype.h
> --- chartype.h        7 Jul 2011 05:40:42 -0000       1.3
> +++ chartype.h        7 Jul 2011 08:49:16 -0000
> @@ -45,7 +45,7 @@
>   * supports non-BMP code points without requiring UTF-16, but nothing
>   * seems to actually advertise this properly, despite Unicode 3.1 having
>   * been around since 2001... */
> -#ifndef __NetBSD__
> +#if !defined(__NetBSD__) && !defined(__OpenBSD__)
>  #ifndef __STDC_ISO_10646__


Ideally we'd define __STDC_ISO_10646__, but we cannot do so right now.
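For reference, __STDC_ISO_10646__ is not something an application
defines itself. The C implementation predefines it (C99 and C11 list it
among the environment macros) as a yyyymmL constant when every wchar_t
value is the ISO 10646 code point of the character it holds. The
chartype.h check quoted above tests for it. A minimal sketch of how code
can query that promise; the function name iso10646_promise() is
hypothetical:

```c
#include <wchar.h>

/*
 * Hypothetical helper: returns the implementation's __STDC_ISO_10646__
 * value (a yyyymmL constant) if it promises that wchar_t values are
 * ISO 10646 code points, or 0 if it makes no such promise.
 */
long
iso10646_promise(void)
{
#ifdef __STDC_ISO_10646__
	return (long)__STDC_ISO_10646__;
#else
	return 0L;
#endif
}
```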

With ASCII, latin1, and UTF-8, a wchar_t value is indeed a Unicode code
point.

But we have encodings where this does not hold:
CP1251, ARMSCII-8, ISO8859-{2,3,5,7,13}, and KOI8. For those, we copy a
byte into the wchar_t object (which is really an int) without first
translating the byte value to a Unicode code point.
So characters in the range 0x80-0xff are stored in wchar_t as values
that generally do not match their Unicode code points.
If libedit's wchar APIs really depend on wchar_t values being Unicode
code points, they won't work correctly in these locales.
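To make the mismatch concrete, take ISO8859-5, one of the encodings
listed above. Byte 0xB0 there is CYRILLIC CAPITAL LETTER A (U+0410), but
copying the byte unchanged yields the wchar_t value 0xB0, which code
assuming Unicode would read as U+00B0 DEGREE SIGN. A minimal sketch
contrasting the two; both function names are hypothetical and only model
the behaviour described here, they are not taken from citrus_none.c:

```c
#include <wchar.h>

/* Hypothetical model of the current behaviour: the byte is stored as-is. */
wchar_t
naive_byte_to_wchar(unsigned char c)
{
	return (wchar_t)c;
}

/*
 * Hypothetical: Unicode code point for an ISO8859-5 Cyrillic capital
 * letter. Bytes 0xB0..0xCF map to U+0410..U+042F, an offset of 0x360.
 * Returns (wchar_t)-1 for bytes outside that range.
 */
wchar_t
iso8859_5_capital_to_ucs(unsigned char c)
{
	if (c < 0xB0 || c > 0xCF)
		return (wchar_t)-1;
	return (wchar_t)(c + 0x360);
}
```

Any wchar_t-level logic that assumes Unicode (character classes, display
width) would then be reasoning about the wrong character.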

These are all single-byte encodings, so of course there is no need to
use wchar_t APIs for them. But applications might expect the system to
hide this distinction so they can just use wchar_t for everything.

The lazy way to fix this would be to nuke the old locales and just
support ASCII, latin1, and UTF-8 :)

The proper way to fix this is to add conversion between the various
character sets and Unicode code points to mbrtowc() and wcrtomb()
in citrus_none.c. This wouldn't need anything special like iconv;
a couple of statically defined translation tables should suffice.
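A rough sketch of the translation-table idea for ISO8859-5, assuming a
table is selected per locale: bytes below 0xA0 map to the same code
point, and the upper 96 bytes go through a static table. The names
iso8859_5_high, iso8859_5_to_ucs() and ucs_to_iso8859_5() are
hypothetical and not part of citrus_none.c. A translating mbrtowc()
would store the result of the first function, and wcrtomb() would use
the second and fail with EILSEQ when it returns -1.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical table: ISO8859-5 bytes 0xA0..0xFF as Unicode code points.
 * Bytes 0x00..0x9F are identical to U+0000..U+009F and need no table.
 */
static const unsigned short iso8859_5_high[96] = {
	0x00A0, 0x0401, 0x0402, 0x0403, 0x0404, 0x0405, 0x0406, 0x0407, /* A0 */
	0x0408, 0x0409, 0x040A, 0x040B, 0x040C, 0x00AD, 0x040E, 0x040F, /* A8 */
	0x0410, 0x0411, 0x0412, 0x0413, 0x0414, 0x0415, 0x0416, 0x0417, /* B0 */
	0x0418, 0x0419, 0x041A, 0x041B, 0x041C, 0x041D, 0x041E, 0x041F, /* B8 */
	0x0420, 0x0421, 0x0422, 0x0423, 0x0424, 0x0425, 0x0426, 0x0427, /* C0 */
	0x0428, 0x0429, 0x042A, 0x042B, 0x042C, 0x042D, 0x042E, 0x042F, /* C8 */
	0x0430, 0x0431, 0x0432, 0x0433, 0x0434, 0x0435, 0x0436, 0x0437, /* D0 */
	0x0438, 0x0439, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E, 0x043F, /* D8 */
	0x0440, 0x0441, 0x0442, 0x0443, 0x0444, 0x0445, 0x0446, 0x0447, /* E0 */
	0x0448, 0x0449, 0x044A, 0x044B, 0x044C, 0x044D, 0x044E, 0x044F, /* E8 */
	0x2116, 0x0451, 0x0452, 0x0453, 0x0454, 0x0455, 0x0456, 0x0457, /* F0 */
	0x0458, 0x0459, 0x045A, 0x045B, 0x045C, 0x00A7, 0x045E, 0x045F, /* F8 */
};

/* Byte -> code point: what a translating mbrtowc() would store. */
wchar_t
iso8859_5_to_ucs(unsigned char c)
{
	if (c < 0xA0)
		return (wchar_t)c;
	return (wchar_t)iso8859_5_high[c - 0xA0];
}

/*
 * Code point -> byte: what a translating wcrtomb() would emit.
 * Returns -1 if the character has no ISO8859-5 encoding.
 */
int
ucs_to_iso8859_5(wchar_t wc)
{
	size_t i;

	/* The cast also rejects negative values if wchar_t is signed. */
	if ((unsigned long)wc < 0xA0)
		return (int)wc;
	for (i = 0; i < sizeof(iso8859_5_high) / sizeof(iso8859_5_high[0]); i++)
		if ((wchar_t)iso8859_5_high[i] == wc)
			return (int)(0xA0 + i);
	return -1;
}
```

The reverse lookup is a linear scan only to keep the sketch short; a
real implementation might use a second table or a sorted array with a
binary search instead.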

But none of this is libedit's fault, and other applications can run
into similar problems. I don't object to enabling wide characters
in libedit now.
