* Stefan Sperling <s...@stsp.name> [110104 23:12]:
> On Tue, Jan 04, 2011 at 09:14:51PM +0300, Alexander Polakov wrote:
> > Hi,
> > 
> > I wonder if there are any plans to add multibyte support to ls(1)?
> > Or maybe there's a reason why it's not a great idea (which I am not
> > aware of)?
> > Anyway, here's a patch I have. It's based on DragonFlyBSD's ls.
> > 
> 
> Any locale stuff added to applications that are used on the ramdisk
> (bsd.rd) must be inside #ifndef SMALL.
> The ls binary is linked statically, so we need to prevent it from wasting
> space by pulling citrus stuff onto the ramdisk.

Sure.
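
A minimal sketch of what the #ifndef SMALL rule could look like in practice. It assumes a hypothetical helper name_columns() that ls might use to measure a file name; it is not code from the patch or from DragonFly. The SMALL branch only counts bytes and pulls in no locale code, while the full build decodes the name with mbrtowc() and measures each character with wcwidth().

```c
#define _XOPEN_SOURCE 700	/* declares wcwidth() on glibc; harmless elsewhere */
#include <string.h>
#ifndef SMALL
#include <wchar.h>
#endif

/*
 * Hypothetical helper: number of terminal columns a file name occupies.
 * SMALL (ramdisk) builds link no locale code and count bytes; full
 * builds decode the name according to the current LC_CTYPE.
 * Invalid bytes and non-printable characters are counted as one column
 * each, assuming they are printed as '?' (as ls -q does).
 */
size_t
name_columns(const char *name)
{
#ifdef SMALL
	return strlen(name);
#else
	mbstate_t st;
	size_t left = strlen(name), cols = 0;
	wchar_t wc;
	int w;

	memset(&st, 0, sizeof(st));
	while (left > 0) {
		size_t n = mbrtowc(&wc, name, left, &st);

		if (n == (size_t)-1 || n == (size_t)-2) {
			/* Bad or truncated sequence: one '?' per byte. */
			memset(&st, 0, sizeof(st));
			n = 1;
			w = 1;
		} else {
			w = wcwidth(wc);
			if (w < 0)
				w = 1;	/* non-printable, shown as '?' */
		}
		cols += (size_t)w;
		name += n;
		left -= n;
	}
	return cols;
#endif
}
```

Building with -DSMALL drops the <wchar.h> dependency entirely, which is what keeps the citrus code off the ramdisk in a statically linked binary.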
 
> More importantly, there is an alleged bug in our wcwidth() implementation.
> I haven't had time to investigate, but it has been pointed out on separate
> occasions, by Jordi Beltran Creix and by n...@.
> Test program (from Jordi):
> 
>   #include <locale.h>
>   #include <stdio.h>
>   #include <wchar.h>
>   
>   int
>   main(void)
>   {
>       setlocale(LC_ALL, "");
>       printf("%d %d %d %d\n", wcwidth(0x53DA), wcwidth('A'),
>           wcwidth(0x200B), wcwidth(0x1F));
>       return 0;
>   }
>   
> Output is 2, 1, 1, 0 but should be 2, 1, 0, -1 (according to Jordi).
> 
> We should make sure that wcwidth() is working properly before changing
> applications to use it. We also need a wcwidth() man page.
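
The expected values follow from the POSIX wcwidth() contract: 0 for the null wide character, -1 for a character that is not printable (U+001F is a C0 control), 0 for characters that occupy no column such as U+200B, and 2 for East Asian wide characters such as U+53DA. The sketch below is a deliberately tiny, hypothetical reference (ref_wcwidth() is not a libc function) that covers only the ranges these test values touch; it is meant for checking expected output, not as an implementation.

```c
#include <wchar.h>

/*
 * Hypothetical, heavily simplified reference for the values in the
 * test program above.  NOT a complete wcwidth(): only the ranges the
 * test touches are listed.
 */
int
ref_wcwidth(wchar_t c)
{
	if (c == 0)
		return 0;	/* POSIX: null wide character */
	if (c < 0x20 || (c >= 0x7f && c < 0xa0))
		return -1;	/* C0/C1 controls and DEL: not printable */
	if (c >= 0x200b && c <= 0x200d)
		return 0;	/* ZWSP, ZWNJ, ZWJ (category Cf) */
	if (c >= 0x4e00 && c <= 0x9fff)
		return 2;	/* CJK Unified Ideographs: wide */
	return 1;
}
```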

I think there are two separate bugs, and I have a fix for each
(neither one tested).

1) wcwidth(0x200B)
This is from http://unicode.org/Public/UNIDATA/; all three characters are
in general category Cf (format), so they should have width 0:

200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;
200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;
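
The Cf field is the useful part of those records. A common heuristic (the one used in Markus Kuhn's public-domain wcwidth.c) gives format characters (Cf) and nonspacing or enclosing marks (Mn, Me) width 0, with SOFT HYPHEN (U+00AD) as an exception, and treats control characters other than NUL as non-printable. A sketch of applying that rule to a single UnicodeData.txt record, assuming a hypothetical helper ucd_guess_width(); it ignores East Asian Width, so it never returns 2.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse one "code;name;category;..." record and
 * guess a column width from the general category alone.
 * Returns -2 for a malformed record.
 */
int
ucd_guess_width(const char *line, unsigned long *cp)
{
	const char *cat;
	char *end;

	*cp = strtoul(line, &end, 16);
	if (end == line || *end != ';')
		return -2;
	cat = strchr(end + 1, ';');	/* skip the character name field */
	if (cat == NULL || strlen(cat + 1) < 2)
		return -2;
	cat++;
	if (strncmp(cat, "Cc", 2) == 0)		/* control characters */
		return *cp == 0 ? 0 : -1;
	if (strncmp(cat, "Cf", 2) == 0 && *cp != 0xad)	/* format chars */
		return 0;
	if (strncmp(cat, "Mn", 2) == 0 || strncmp(cat, "Me", 2) == 0)
		return 0;	/* combining marks */
	return 1;		/* wide (2) characters are not detected here */
}
```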

--- share/locale/ctype/en_US.UTF-8.src.orig     Tue Jan  4 22:49:22 2011
+++ share/locale/ctype/en_US.UTF-8.src  Tue Jan  4 22:50:55 2011
@@ -1672,7 +1672,8 @@
 BLANK     0x2000 - 0x200b  0x202f  0x205f
 PRINT     0x2000 - 0x200b  0x2010 - 0x2029  0x202f - 0x2052  0x2057
 PRINT     0x205f
-SWIDTH1   0x2000 - 0x200b  0x2010 - 0x2029  0x202f - 0x2052  0x2057
+SWIDTH1   0x2000 - 0x200a  0x2010 - 0x2029  0x202f - 0x2052  0x2057
+SWIDTH0   0x200b - 0x200d
 SWIDTH1   0x205f
 

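Because a code point can end up on more than one SWIDTHn line, it may be worth checking a hand-edited table for overlaps before rebuilding the locale. Which width wins for a doubly listed code point depends on how the locale compiler merges the declarations, and this sketch does not model that. The width_range type and both functions are hypothetical; the table must be sorted by its lower bound.

```c
#include <stddef.h>

/* One SWIDTHn declaration: code points lo..hi (inclusive) have width w. */
struct width_range {
	unsigned long lo, hi;
	int w;
};

/*
 * Return 1 if any two ranges share a code point.  The table must be
 * sorted by lo; then checking neighbours is enough to find an overlap.
 */
int
ranges_overlap(const struct width_range *r, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (r[i].lo <= r[i - 1].hi)
			return 1;
	return 0;
}

/* Binary search a sorted, non-overlapping table; -1 if cp is not listed. */
int
range_width(const struct width_range *r, size_t n, unsigned long cp)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (cp < r[mid].lo)
			hi = mid;
		else if (cp > r[mid].hi)
			lo = mid + 1;
		else
			return r[mid].w;
	}
	return -1;
}
```

For example, an SWIDTH1 range of 0x2000 - 0x200c together with SWIDTH0 0x200b - 0x200d is reported as overlapping, while 0x2000 - 0x200a with the same SWIDTH0 line is not.
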
2) wcwidth(0x1f)

DragonFly's man page for wcwidth(3) says that the function returns -1 if
the character is not printable (POSIX says the same, and also requires 0
for the null wide character). _RUNETYPE_R is the flag to check.

--- lib/libc/locale/iswctype.c.orig     Tue Jan  4 23:12:23 2011
+++ lib/libc/locale/iswctype.c  Tue Jan  4 23:02:37 2011
@@ -170,7 +170,13 @@
 int
 wcwidth(wchar_t c)
 {
-        return (((unsigned)__runetype_w(c) & _CTYPE_SWM) >> _CTYPE_SWS);
+       _RuneType r;
+       if (c == L'\0')
+               return 0;
+       r = __runetype_w(c);
+       if (r & _RUNETYPE_R)
+               return (((unsigned)r & _CTYPE_SWM) >> _CTYPE_SWS);
+       return -1;
 }
 
 wctrans_t

Again, I don't have hardware at hand to build libc, so this is completely
untested.
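
Since the libc change is untested, the decoding logic can at least be modelled outside libc to check the arithmetic. This sketch assumes a made-up bit layout: EX_PRINT, EX_SWM and EX_SWS are hypothetical stand-ins for the real runetype.h constants, whose values may differ. It follows the POSIX rules: 0 for the null wide character, -1 if the printable bit is clear, otherwise the 2-bit width field.

```c
#include <stdint.h>
#include <wchar.h>

/*
 * Hypothetical bit layout for illustration only; the real constants
 * live in libc's runetype.h and may have different values.
 */
#define EX_PRINT	0x00040000u	/* "printable" class bit */
#define EX_SWM		0xc0000000u	/* screen-width mask (2 bits) */
#define EX_SWS		30		/* shift turning the field into 0..3 */

/* Decode a column width from a rune-type word r for character c. */
int
width_from_runetype(wchar_t c, uint32_t r)
{
	if (c == L'\0')
		return 0;		/* null wide character */
	if ((r & EX_PRINT) == 0)
		return -1;		/* not printable */
	return (int)((r & EX_SWM) >> EX_SWS);
}
```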
