On Thu, Jul 07, 2011 at 09:50:42AM +0100, Nicholas Marriott wrote:
> Let's turn 'em on.
>
> Not much supports them but we might as well have them.
>
> ok?
>
>
> Index: Makefile
> ===================================================================
> RCS file: /cvs/src/lib/libedit/Makefile,v
> retrieving revision 1.10
> diff -u -p -r1.10 Makefile
> --- Makefile 30 Jun 2010 00:05:35 -0000 1.10
> +++ Makefile 7 Jul 2011 08:49:16 -0000
> @@ -6,6 +6,8 @@ LIB= edit
> WANTLINT=
> USE_SHLIBDIR= yes
>
> +WIDECHAR= yes
> +
> OSRCS= chared.c common.c el.c emacs.c fcns.c filecomplete.c help.c \
> hist.c key.c map.c chartype.c \
> parse.c prompt.c read.c refresh.c search.c sig.c term.c tty.c vi.c
> Index: chartype.h
> ===================================================================
> RCS file: /cvs/src/lib/libedit/chartype.h,v
> retrieving revision 1.3
> diff -u -p -r1.3 chartype.h
> --- chartype.h 7 Jul 2011 05:40:42 -0000 1.3
> +++ chartype.h 7 Jul 2011 08:49:16 -0000
> @@ -45,7 +45,7 @@
> * supports non-BMP code points without requiring UTF-16, but nothing
> * seems to actually advertise this properly, despite Unicode 3.1 having
> * been around since 2001... */
> -#ifndef __NetBSD__
> +#if !defined(__NetBSD__) && !defined(__OpenBSD__)
> #ifndef __STDC_ISO_10646__

Ideally we'd define __STDC_ISO_10646__, but we cannot do that right now.

With ASCII, Latin-1, and UTF-8, a wchar_t value is indeed a Unicode code
point. But we have encodings where this does not hold: CP1251, ARMSCII-8,
ISO8859-{2,3,5,7,13}, KOI8. For those, we copy a byte into the wchar_t
object (which is really an int) without first translating the byte value
to a Unicode code point. So characters end up encoded in wchar_t as values
from the range 0x80-0xff that do not match their Unicode code points.

If libedit's wchar APIs really depend on wchar_t holding Unicode code
points, they won't work correctly in these locales.
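
To make the mismatch concrete, here is a minimal sketch. The function
name raw_byte_to_wchar() is made up for illustration and is not the
actual citrus_none.c code; it only mimics the "copy the byte as is"
behaviour described above, using KOI8-R as the example encoding.

```c
#include <wchar.h>

/*
 * Hypothetical stand-in for the single-byte conversion path:
 * the byte is widened to wchar_t without any translation.
 */
static wchar_t
raw_byte_to_wchar(unsigned char b)
{
	return (wchar_t)b;
}

/* KOI8-R encodes CYRILLIC SMALL LETTER A as the byte 0xC1 ... */
#define KOI8R_SMALL_A		0xC1
/* ... but its Unicode code point is U+0430. */
#define UCS_CYRILLIC_SMALL_A	0x0430
```

A consumer that treats the result as Unicode would read 0xC1 as U+00C1
(LATIN CAPITAL LETTER A WITH ACUTE) rather than U+0430.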

These are all single-byte encodings, so of course there is no need to
use wchar_t APIs for them. But applications might expect the system to
hide this distinction so they can just use wchar_t for everything.

The lazy way to fix this would be to nuke the old locales and just
support ASCII, Latin-1, and UTF-8 :)

The proper way to fix this is to add conversion between the various
character sets and Unicode code points within mbrtowc() and wcrtomb()
in citrus_none.c. This wouldn't need anything special like iconv.
A couple of statically defined translation tables should suffice.
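
A rough sketch of what such a table could look like, using ISO 8859-5
as the example. The names iso8859_5_high, iso8859_5_to_ucs() and
ucs_to_iso8859_5() are hypothetical and do not exist in citrus_none.c;
how this would be wired into mbrtowc() and wcrtomb() is left open here.

```c
#include <stddef.h>
#include <stdint.h>

/* ISO 8859-5 bytes 0xA0-0xFF mapped to Unicode code points. */
static const uint16_t iso8859_5_high[96] = {
	/* 0xA0 */ 0x00A0, 0x0401, 0x0402, 0x0403, 0x0404, 0x0405, 0x0406, 0x0407,
	/* 0xA8 */ 0x0408, 0x0409, 0x040A, 0x040B, 0x040C, 0x00AD, 0x040E, 0x040F,
	/* 0xB0 */ 0x0410, 0x0411, 0x0412, 0x0413, 0x0414, 0x0415, 0x0416, 0x0417,
	/* 0xB8 */ 0x0418, 0x0419, 0x041A, 0x041B, 0x041C, 0x041D, 0x041E, 0x041F,
	/* 0xC0 */ 0x0420, 0x0421, 0x0422, 0x0423, 0x0424, 0x0425, 0x0426, 0x0427,
	/* 0xC8 */ 0x0428, 0x0429, 0x042A, 0x042B, 0x042C, 0x042D, 0x042E, 0x042F,
	/* 0xD0 */ 0x0430, 0x0431, 0x0432, 0x0433, 0x0434, 0x0435, 0x0436, 0x0437,
	/* 0xD8 */ 0x0438, 0x0439, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E, 0x043F,
	/* 0xE0 */ 0x0440, 0x0441, 0x0442, 0x0443, 0x0444, 0x0445, 0x0446, 0x0447,
	/* 0xE8 */ 0x0448, 0x0449, 0x044A, 0x044B, 0x044C, 0x044D, 0x044E, 0x044F,
	/* 0xF0 */ 0x2116, 0x0451, 0x0452, 0x0453, 0x0454, 0x0455, 0x0456, 0x0457,
	/* 0xF8 */ 0x0458, 0x0459, 0x045A, 0x045B, 0x045C, 0x00A7, 0x045E, 0x045F,
};

/* Byte to code point: the direction mbrtowc() would need. */
static uint32_t
iso8859_5_to_ucs(unsigned char b)
{
	if (b < 0xA0)
		return b;	/* ASCII and C1 controls map to themselves */
	return iso8859_5_high[b - 0xA0];
}

/*
 * Code point to byte: the direction wcrtomb() would need.
 * Returns 1 and stores the byte in *out, or returns 0 if the
 * code point has no ISO 8859-5 representation.
 */
static int
ucs_to_iso8859_5(uint32_t ucs, unsigned char *out)
{
	size_t i;

	if (ucs < 0xA0) {
		*out = (unsigned char)ucs;
		return 1;
	}
	/* A linear scan is fine for a 96-entry table. */
	for (i = 0; i < 96; i++) {
		if (iso8859_5_high[i] == ucs) {
			*out = (unsigned char)(0xA0 + i);
			return 1;
		}
	}
	return 0;
}
```

With this, the byte 0xD0 becomes U+0430 instead of the raw value 0xD0,
and the reverse lookup rejects code points the locale cannot represent.
The other single-byte locales would each need a table of their own.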

But none of this is libedit's fault, and other applications can run
into similar problems. I don't object to enabling wide characters
in libedit now.