> Currently it is not possible to use Unicode codepoints > 0xFF on the console,
> because our UTF-8 decoding logic is badly broken.
>
> The code in question is in wsemul_subr.c, wsemul_getchar().
>
> The problem is that we calculate the number of bytes in a multi-byte
> sequence by just looking at the high bits in turn:
>
> 	if (frag & 0x20) {
> 		frag &= ~0x20;
> 		mbleft++;
> 	}
> 	if (frag & 0x10) {
> 		frag &= ~0x10;
> 		mbleft++;
> 	}
> 	if (frag & 0x08) {
> 		frag &= ~0x08;
> 		mbleft++;
> 	}
> 	if (frag & 0x04) {
> 		frag &= ~0x04;
> 		mbleft++;
> 	}
>
> This is wrong, for several reasons.
Doh! Thanks for noticing this. I have replaced that code with something
much saner now.
Miod