Hi Stefan,
Stefan Sperling wrote on Fri, Feb 05, 2016 at 04:06:50PM +0100:
> On Fri, Feb 05, 2016 at 03:53:44PM +0100, Ingo Schwarze wrote:
>> Index: mbtowc.c
>> ===================================================================
>> RCS file: /cvs/src/lib/libc/locale/mbtowc.c,v
>> retrieving revision 1.2
>> diff -u -p -r1.2 mbtowc.c
>> --- mbtowc.c 5 Dec 2012 23:20:00 -0000 1.2
>> +++ mbtowc.c 5 Feb 2016 13:26:17 -0000
>> @@ -44,7 +44,14 @@ mbtowc(wchar_t * __restrict pwc, const c
>> return (0);
>> }
>> rval = mbrtowc(pwc, s, n, &mbs);
>>
>> - if (rval == (size_t)-1 || rval == (size_t)-2)
>> - return (-1);
>> - return ((int)rval);
>> +
>> + switch (rval) {
>> + case (size_t)-2:
>> + errno = EILSEQ;
> Doesn't mbrtowc(3) already set errno to EILSEQ when it returns (size_t)-2?
No, it doesn't.
> At least our man page says it would:
>
> (size_t)-2 s points to an incomplete byte sequence of length n which
> has been consumed and contains part of a valid multibyte
> character. mbrtowc() sets errno to EILSEQ. The character
That's clearly a documentation bug. Patch below.
Yes, our multibyte code and documentation are still full of bugs,
to a degree quite unusual for OpenBSD. That's why we are working
on it. :-/
> Though citrus_utf8.c doesn't seem to do this... hmmm.
> Perhaps you should fix mbrtowc() instead?
No. POSIX is quite clear that mbrtowc(3) must not set errno when
returning (size_t)-2, and that returning (size_t)-2 does not indicate
an error:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/mbrtowc.html
Also, setting errno for incomplete characters in mbrtowc(3) wouldn't
make sense because the typical idiom using mbrtowc(3) is to read
byte by byte and try mbrtowc(3) after each byte until it succeeds.
So the typical use case would always clobber errno, which would be
bad. Even internally in our libc, mbrtowc(3) is used like that
in a few places, for example in vfscanf(3).
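
To make that idiom concrete, here is a minimal sketch of the
byte-by-byte loop. It is an illustration only, not code taken from
libc; the helper name decode_bytewise() and its return convention
are made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of libc: decode one character from
 * buf by handing mbrtowc(3) a single byte at a time.
 * Returns the number of bytes consumed, 0 if the input ran out in
 * the middle of a character, or (size_t)-1 on an invalid sequence.
 */
size_t
decode_bytewise(wchar_t *pwc, const char *buf, size_t len)
{
	mbstate_t mbs;
	size_t i, rval;

	memset(&mbs, 0, sizeof(mbs));
	for (i = 0; i < len; i++) {
		rval = mbrtowc(pwc, buf + i, 1, &mbs);
		if (rval == (size_t)-1)
			return ((size_t)-1);	/* invalid input, errno is EILSEQ */
		if (rval != (size_t)-2)
			return (i + 1);		/* character is complete */
		/* (size_t)-2: incomplete so far, not an error; feed one more byte */
	}
	return (0);
}
```

For an n-byte character, the loop sees (size_t)-2 on each of the
first n-1 calls. If each of those calls set errno, the caller would
find errno changed even though decoding succeeded.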
Yours,
Ingo
Index: mbrtowc.3
===================================================================
RCS file: /cvs/src/lib/libc/locale/mbrtowc.3,v
retrieving revision 1.4
diff -u -p -r1.4 mbrtowc.3
--- mbrtowc.3 5 Jun 2013 03:39:22 -0000 1.4
+++ mbrtowc.3 5 Feb 2016 15:44:33 -0000
@@ -210,10 +210,6 @@ truncated input.
points to an incomplete byte sequence of length
.Fa n
which has been consumed and contains part of a valid multibyte character.
-.Fn mbrtowc
-sets
-.Va errno
-to EILSEQ.
The character may be completed by calling
.Fn mbrtowc
again with
@@ -230,7 +226,7 @@ function may cause an error in the follo
.Bl -tag -width Er
.It Bq Er EILSEQ
.Fa s
-points to an invalid or incomplete multibyte character.
+points to an invalid multibyte character.
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized mbstate_t object.