On Fri, 19 May 2017 15:17:55 +0200, Anton Lindqvist
<anton.lindqv...@gmail.com> wrote:
On Fri, May 19, 2017 at 09:33:33AM -0300, Lucas Gabriel Vuotto wrote:
On 19/05/17 03:42, Anton Lindqvist wrote:
>
> +static int
> +u8len(unsigned char c)
> +{
> + switch (c & 0xF0) {
> + case 0xF0:
> + return 4;
> + case 0xE0:
> + return 3;
> + case 0xC0:
> + return 2;
> + default:
> + return 1;
> + }
> +}
> +
This is wrong: most code points in the range U+0080-U+07FF (those from
U+0400 upward) would be reported as 1 byte long instead of 2.
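To make the failure concrete, here is a small C check that encodes every code point in U+0080-U+07FF as two UTF-8 bytes and feeds the lead byte to the u8len() from the patch. encode2() and count_misreported() are hypothetical helpers written only for this illustration. The point is that U+0400 encodes as D0 80, and 0xD0 & 0xF0 falls through to the default case.

```c
#include <assert.h>

/* u8len() as posted in the patch, copied unchanged. */
static int
u8len(unsigned char c)
{
	switch (c & 0xF0) {
	case 0xF0:
		return 4;
	case 0xE0:
		return 3;
	case 0xC0:
		return 2;
	default:
		return 1;
	}
}

/*
 * Hypothetical helper: encode a code point in U+0080-U+07FF as a
 * two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
 */
static void
encode2(unsigned int cp, unsigned char out[2])
{
	assert(cp >= 0x80 && cp <= 0x7FF);
	out[0] = (unsigned char)(0xC0 | (cp >> 6));
	out[1] = (unsigned char)(0x80 | (cp & 0x3F));
}

/*
 * Hypothetical helper: count the two-byte code points whose lead byte
 * u8len() does not report as length 2.
 */
static unsigned int
count_misreported(void)
{
	unsigned char buf[2];
	unsigned int cp, bad = 0;

	for (cp = 0x80; cp <= 0x7FF; cp++) {
		encode2(cp, buf);
		if (u8len(buf[0]) != 2)
			bad++;
	}
	return bad;
}
```

Of the 1920 code points in that range, the 1024 from U+0400 to U+07FF (lead bytes 0xD0-0xDF) come out as length 1; U+0080-U+03FF (lead bytes 0xC2-0xCF) are handled correctly.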
Thanks for the heads-up. Maybe a more reliable solution would be to call
mbtowc(3) repeatedly as new input arrives until it returns successfully,
assuming the first byte read is a UTF-8 start byte.
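A rough sketch of the mbtowc(3) idea in C, assuming a UTF-8 LC_CTYPE locale is available. struct u8buf, u8feed() and init_utf8_locale() are hypothetical names, and the locale names tried are guesses that vary by system. One catch: mbtowc() reports both incomplete and invalid input as -1, so the sketch can only give up after MB_CUR_MAX bytes, and it resets mbtowc()'s internal state before each retry because a failed call may leave partial input there. mbrtowc(3) distinguishes the two cases (it returns (size_t)-2 for an incomplete sequence) if that matters.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer of the bytes read so far for one character. */
struct u8buf {
	char	buf[MB_LEN_MAX];
	size_t	len;
};

/*
 * Select a UTF-8 LC_CTYPE, since mbtowc() follows the current locale.
 * The locale names are assumptions; returns 1 if one was accepted.
 */
static int
init_utf8_locale(void)
{
	return setlocale(LC_CTYPE, "C.UTF-8") != NULL ||
	    setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/*
 * Hypothetical incremental decoder: append one byte and retry mbtowc()
 * on everything buffered so far.  Returns 1 and stores the character
 * in *wc on success, 0 if more bytes seem to be needed, and -1 once
 * MB_CUR_MAX bytes are buffered without forming a character.
 */
static int
u8feed(struct u8buf *ub, unsigned char c, wchar_t *wc)
{
	int n;

	assert(ub->len < sizeof(ub->buf));
	ub->buf[ub->len++] = (char)c;

	/* Reset mbtowc()'s internal state left over from a failed call. */
	(void)mbtowc(NULL, NULL, 0);
	n = mbtowc(wc, ub->buf, ub->len);
	if (n >= 0) {
		/* n == 0 means the byte was NUL, still a valid character. */
		ub->len = 0;
		return 1;
	}
	/* -1 covers both "incomplete" and "invalid"; wait up to MB_CUR_MAX. */
	if (ub->len >= (size_t)MB_CUR_MAX) {
		ub->len = 0;
		return -1;
	}
	return 0;
}
```

Feeding D0 then 80 returns 0 and then 1, leaving U+0400 in *wc.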
Not needed. Only case 0xD0 is missing.
	case 0xC0:
	case 0xD0:
		return 2;
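For reference, a sketch of the patch's function with that one-line fix folded in. A two-byte lead byte has the form 110xxxxx (0xC0-0xDF), so its upper nibble is either 0xC or 0xD and covering both cases is enough. Like the original, it does not validate input: continuation bytes (0x80-0xBF) still map to 1 and 0xF8-0xFF still map to 4.

```c
/*
 * Length of a UTF-8 sequence given its lead byte, with case 0xD0 added
 * so that lead bytes 0xD0-0xDF (U+0400-U+07FF) are reported as 2.
 * No validation: invalid lead bytes are not rejected.
 */
static int
u8len(unsigned char c)
{
	switch (c & 0xF0) {
	case 0xF0:
		return 4;
	case 0xE0:
		return 3;
	case 0xC0:
	case 0xD0:
		return 2;
	default:
		return 1;
	}
}
```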