Module Name:    src
Committed By:   kre
Date:           Wed Aug  7 15:40:03 UTC 2024

Modified Files:
        src/usr.bin/printf: printf.c

Log Message:
Correctly handle extracting wide chars from empty strings.

Fix a bug (one that would probably rarely have been seen) that I
introduced yesterday.

It turns out that mbtowc() needs the terminating \0 to be included in the
length arg passed to it, or it errors (EILSEQ) on a zero length, instead
of doing the sane thing and treating that the same as "\0" (which is
treated as being length 1).  So, increase the length passed to mbtowc()
by 1.  That makes no difference in the typical case: the length is only
an upper limit on the number of bytes to examine, and mbtowc() stops
after it has converted 1 character, so for inputs other than "" nothing
that matters changes.
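
As a rough illustration of the idea (a minimal sketch, not the actual
code in printf.c; the helper name first_wchar() is hypothetical), the
length limit passed to mbtowc() simply counts the terminating \0 as
well:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only: convert the first
 * multibyte character of s into *wc.  The return value is whatever
 * mbtowc() returns: 0 if s is the empty string, the number of bytes
 * in the character otherwise, or -1 on an invalid sequence.
 */
int
first_wchar(const char *s, wchar_t *wc)
{
	/* Reset any shift state left over from an earlier call. */
	(void)mbtowc(NULL, NULL, 0);

	/*
	 * Include the terminating '\0' in the limit, so that "" is
	 * passed with n == 1 rather than n == 0.  For non-empty input
	 * the limit is just an upper bound; mbtowc() stops after one
	 * character.
	 */
	return mbtowc(wc, s, strlen(s) + 1);
}
```

With this, first_wchar("", &wc) takes the ordinary "s points to the
null byte" path and returns 0, rather than depending on how the
implementation treats n == 0.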

The rest of this you can skip if you like, not directly related to
this change...

Note: it is not clear to me what is correct here, POSIX looks to be
ambiguous, or strange anyway; in the RETURN VALUE section it says:

   If s is not a null pointer, mbtowc() shall either return 0 (if s points
   to the null byte), or return the number of bytes [...]

Further for the error possibilities it says:

[EILSEQ]  An invalid character sequence is detected. In the POSIX locale
          an [EILSEQ] error cannot occur since all byte values are valid
          characters.

On the other hand our mbtowc(3) says:

     There are special cases:

     n == 0     In this case, the first n bytes of the array pointed to by
                s never form a complete character.  Thus, the mbtowc()
                always fails.

Since EILSEQ is the only defined error for mbtowc() in POSIX, and
cannot happen (according to it) in the POSIX locale, that "always fails"
in our manual page looks dubious.

What actually happens in our mbtowc() in the POSIX locale is that, if
passed n == 0 (and *s == '\0'), mbtowc() returns 0 (that's good) but
also sets errno to EILSEQ (not so good, though this is not one of
the functions guaranteed not to alter errno when it doesn't fail).

In other locales it returns -1 (with errno == EILSEQ) when n == 0.
(Well, in some other locales anyway, I didn't go and test all of them).
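
The observations above could be reproduced with a small probe along
these lines (a sketch only; probe_zero_length() is a hypothetical name,
and what it reports depends on the libc and locale in use):

```c
#include <errno.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical probe, for illustration only: call mbtowc() on an
 * empty string with n == 0, and hand back both its return value and
 * the value errno has afterwards (via *errp).
 */
int
probe_zero_length(int *errp)
{
	wchar_t wc = L'?';
	int r;

	/* Reset any shift state left over from an earlier call. */
	(void)mbtowc(NULL, NULL, 0);

	errno = 0;
	r = mbtowc(&wc, "", 0);	/* n == 0: no bytes may be examined */
	*errp = errno;

	return r;
}
```

Called as is, the probe runs in the "C" (POSIX) locale.  Calling
setlocale(LC_CTYPE, "") first (from <locale.h>) would exercise whatever
locale the environment selects instead.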

Where POSIX gets weird is that earlier it says:

    At most n bytes of the array pointed to by s shall be examined.

If n == 0, then no bytes can be examined.  In that case mbtowc()
cannot test whether s points to the null byte, even in the POSIX locale.

So it is unclear (to me) what should be returned in that case.


To generate a diff of this commit:
cvs rdiff -u -r1.57 -r1.58 src/usr.bin/printf/printf.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
