On Thu, Jul 07, 2011 at 09:50:42AM +0100, Nicholas Marriott wrote:
> Let's turn 'em on.
>
> Not much supports them but we might as well have them.
>
> ok?
>
>
> Index: Makefile
> ===================================================================
> RCS file: /cvs/src/lib/libedit/Makefile,v
> retrieving revision 1.10
> diff -u -p -r1.10 Makefile
> --- Makefile 30 Jun 2010 00:05:35 -0000 1.10
> +++ Makefile 7 Jul 2011 08:49:16 -0000
> @@ -6,6 +6,8 @@ LIB= edit
>  WANTLINT=
>  USE_SHLIBDIR= yes
>
> +WIDECHAR= yes
> +
>  OSRCS= chared.c common.c el.c emacs.c fcns.c filecomplete.c help.c \
>   hist.c key.c map.c chartype.c \
>   parse.c prompt.c read.c refresh.c search.c sig.c term.c tty.c vi.c
> Index: chartype.h
> ===================================================================
> RCS file: /cvs/src/lib/libedit/chartype.h,v
> retrieving revision 1.3
> diff -u -p -r1.3 chartype.h
> --- chartype.h 7 Jul 2011 05:40:42 -0000 1.3
> +++ chartype.h 7 Jul 2011 08:49:16 -0000
> @@ -45,7 +45,7 @@
>   * supports non-BMP code points without requiring UTF-16, but nothing
>   * seems to actually advertise this properly, despite Unicode 3.1 having
>   * been around since 2001... */
> -#ifndef __NetBSD__
> +#if !defined(__NetBSD__) && !defined(__OpenBSD__)
>  #ifndef __STDC_ISO_10646__

Ideally we'd define __STDC_ISO_10646__. But we cannot right now.

With ASCII, latin1, and UTF-8, a wchar_t value is indeed a Unicode code
point. But we have encodings where this will not hold: CP1251, ARMSCII-8,
ISO8859-{2,3,5,7,13}, KOI8. For those, we copy a byte into the wchar_t
object (which is really an int) without translating the byte value to a
Unicode code point first. So we encode characters in wchar_t using values
from the range 0x80-0xff that do not match the Unicode code points for
those characters.

If libedit's wchar APIs really depend on wchar_t being Unicode code points,
they won't work correctly in these locales. These are all single-byte
encodings, so of course there is no need to use wchar_t APIs for them. But
applications might expect the system to hide this distinction so they can
just use wchar_t for everything.

The lazy way to fix this would be to nuke the old locales and just support
ASCII, latin1, and UTF-8 :)

The proper way to fix this is to add conversion from/to the various
character sets into Unicode code points within mbrtowc() and wcrtomb() in
citrus_none.c. This wouldn't need anything special like iconv. A couple of
statically defined translation tables should suffice.

But none of this is libedit's fault, and other applications can run into
similar problems. I don't object to enabling wide characters in libedit
now.
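
To illustrate the translation-table idea, here is a rough sketch for one of the affected encodings, ISO8859-5. This is not code from citrus_none.c; the table name and the helpers sb_to_ucs() and ucs_to_sb() are made up for this example, and a real fix would have to hook something like this into mbrtowc() and wcrtomb() for each charset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical table: ISO8859-5 bytes 0x80-0xff -> Unicode code points.
 * Bytes below 0x80 are plain ASCII and need no table.
 */
static const uint16_t iso8859_5_to_ucs[128] = {
	0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087,
	0x0088, 0x0089, 0x008a, 0x008b, 0x008c, 0x008d, 0x008e, 0x008f,
	0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097,
	0x0098, 0x0099, 0x009a, 0x009b, 0x009c, 0x009d, 0x009e, 0x009f,
	0x00a0, 0x0401, 0x0402, 0x0403, 0x0404, 0x0405, 0x0406, 0x0407,
	0x0408, 0x0409, 0x040a, 0x040b, 0x040c, 0x00ad, 0x040e, 0x040f,
	0x0410, 0x0411, 0x0412, 0x0413, 0x0414, 0x0415, 0x0416, 0x0417,
	0x0418, 0x0419, 0x041a, 0x041b, 0x041c, 0x041d, 0x041e, 0x041f,
	0x0420, 0x0421, 0x0422, 0x0423, 0x0424, 0x0425, 0x0426, 0x0427,
	0x0428, 0x0429, 0x042a, 0x042b, 0x042c, 0x042d, 0x042e, 0x042f,
	0x0430, 0x0431, 0x0432, 0x0433, 0x0434, 0x0435, 0x0436, 0x0437,
	0x0438, 0x0439, 0x043a, 0x043b, 0x043c, 0x043d, 0x043e, 0x043f,
	0x0440, 0x0441, 0x0442, 0x0443, 0x0444, 0x0445, 0x0446, 0x0447,
	0x0448, 0x0449, 0x044a, 0x044b, 0x044c, 0x044d, 0x044e, 0x044f,
	0x2116, 0x0451, 0x0452, 0x0453, 0x0454, 0x0455, 0x0456, 0x0457,
	0x0458, 0x0459, 0x045a, 0x045b, 0x045c, 0x00a7, 0x045e, 0x045f
};

/* Hypothetical helper: the byte -> code point step mbrtowc() would need. */
static uint32_t
sb_to_ucs(const uint16_t *table, unsigned char byte)
{
	assert(table != NULL);
	return (byte < 0x80) ? byte : table[byte - 0x80];
}

/*
 * Hypothetical helper: the code point -> byte step wcrtomb() would need.
 * Returns -1 if the code point does not exist in the charset (where a
 * real wcrtomb() would fail with EILSEQ). A linear scan is good enough
 * for 128 entries.
 */
static int
ucs_to_sb(const uint16_t *table, uint32_t ucs)
{
	size_t i;

	assert(table != NULL);
	if (ucs < 0x80)
		return (int)ucs;
	for (i = 0; i < 128; i++)
		if (table[i] == ucs)
			return (int)(i + 0x80);
	return -1;
}
```

With a table like this, byte 0xb0 (CYRILLIC CAPITAL LETTER A) would end up in wchar_t as U+0410 instead of the raw value 0xb0, which is what code relying on __STDC_ISO_10646__ semantics expects.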