Module Name:    src
Committed By:   riastradh
Date:           Fri Aug 16 23:11:03 UTC 2024

Modified Files:
        src/lib/libc/locale: mbrtoc16.3 mbrtoc32.3

Log Message:
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose.

Still maybe not great but at least there's less jargon in most of the
text, without really losing any content.

PR lib/52374: <uchar.h> missing


To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/lib/libc/locale/mbrtoc16.3 \
    src/lib/libc/locale/mbrtoc32.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/libc/locale/mbrtoc16.3
diff -u src/lib/libc/locale/mbrtoc16.3:1.5 src/lib/libc/locale/mbrtoc16.3:1.6
--- src/lib/libc/locale/mbrtoc16.3:1.5	Fri Aug 16 13:37:43 2024
+++ src/lib/libc/locale/mbrtoc16.3	Fri Aug 16 23:11:02 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: mbrtoc16.3,v 1.5 2024/08/16 13:37:43 riastradh Exp $
+.\"	$NetBSD: mbrtoc16.3,v 1.6 2024/08/16 23:11:02 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm mbrtoc16
-.Nd Restartable multibyte to UTF-16 code unit conversion
+.Nd Restartable multibyte to UTF-16 conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -50,20 +50,37 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to decode a multibyte character sequence at
-.Fa s
-of up to
+decodes multibyte characters in the current locale and converts them to
+UTF-16, keeping state so it can restart after incremental progress.
+.Pp
+Each call to
+.Nm :
+.Bl -enum -compact
+.It
+examines up to
 .Fa n
-bytes in the current locale, and yield the content as UTF-16 code
-units via the output parameter
-.Fa pc16 .
-.Fa pc16
-may be null, in which case no output is stored.
+bytes starting at
+.Fa s ,
+.It
+yields a UTF-16 code unit if available by storing it at
+.Li * Ns Fa pc16 ,
+.It
+saves state at
+.Fa ps ,
+and
+.It
+returns either the number of bytes consumed if any or a special return
+value.
+.El
+.Pp
+Specifically:
 .Bl -bullet
 .It
 If the multibyte sequence at
 .Fa s
-is invalid or an error occurs in decoding,
+is invalid after any previous input saved at
+.Fa ps ,
+or if an error occurs in decoding,
 .Nm
 returns
 .Li (size_t)-1
@@ -75,7 +92,7 @@ If the multibyte sequence at
 .Fa s
 is still incomplete after
 .Fa n
-bytes, including any previously processed input saved in
+bytes, including any previous input saved in
 .Fa ps ,
 .Nm
 saves its state in
@@ -85,53 +102,33 @@ after all the input so far and returns
 .It
 If
 .Nm
-finds the null scalar value at
-.Fa s ,
-then it stores zero at
+had previously decoded a multibyte character but has not yet yielded
+all the code units of its UTF-16 encoding, it stores the next UTF-16
+code unit at
 .Li * Ns Fa pc16
-and returns zero.
+and returns
+.Li "(size_t)-3" .
 .It
 If
 .Nm
-finds a nonnull scalar value in the Basic Multilingual Plane (BMP),
-i.e., a 16-bit scalar value, then it stores the scalar value at
-.Li * Ns Fa pc16 ,
-and returns the number of bytes it read from the input.
+decodes the null multibyte character, then it stores zero at
+.Li * Ns Fa pc16
+and returns zero.
 .It
-If
+Otherwise,
 .Nm
-finds a scalar value outside the BMP, then it:
-.Bl -dash -compact
-.It
-stores the scalar value's high surrogate code point at
-.Li * Ns Fa pc16 ;
-.It
-stores conversion state in
-.Fa ps
-to remember the rest of the pending scalar value; and
-.It
-returns the number of bytes it read from the input.
+decodes a single multibyte character, stores the first (and possibly
+only) code unit in its UTF-16 encoding at
+.Li * Ns Fa pc16 ,
+and returns the number of bytes consumed to decode the first multibyte
+character.
 .El
-.It
+.Pp
 If
-.Nm
-had previously found a scalar value outside the BMP, then, instead of
-any of the above options, it:
-.Bl -dash -compact
-.It
-stores the scalar value's low surrogate code point at
-.Li * Ns Fa pc16 ;
-.It
-consumes rest of the pending scalar value from the conversion state
-.Fa ps ;
-and
-.It
-returns
-.Li (size_t)-3
-to indicate that no bytes were consumed but a code unit was yielded
-nevertheless.
-.El
-.El
+.Fa pc16
+is a null pointer, nothing is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
 .Pp
 If
 .Fa s
@@ -174,6 +171,15 @@ and
 which is initialized at program startup to the initial conversion
 state.
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+.Sh IMPLEMENTATION NOTES
+On well-formed input, the
+.Nm
+function yields either a Unicode scalar value in the Basic Multilingual
+Plane (BMP), i.e., a 16-bit Unicode code point that is not a surrogate
+code point, or, over two successive calls, yields the high and low
+surrogate code points (in that order) of a Unicode scalar value outside
+the BMP.
+.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh RETURN VALUES
 The
 .Nm
@@ -197,26 +203,21 @@ if
 consumed
 .Ar i
 bytes of input to decode the next multibyte character, yielding a
-(nonnull) UTF-16 code unit, either a Unicode scalar value in the BMP or
-a high surrogate code point.
+UTF-16 code unit.
 .It Li (size_t)-3
 .Bq continuation
 if
 .Nm
-consumed no bytes of input but yielded a (nonnull) UTF-16 code unit, a
-low surrogate code point, because the previous call to
-.Nm
-with
-.Fa ps
-had yielded a high surrogate code point for a Unicode scalar value
-outside the BMP.
+consumed no new bytes of input but yielded a UTF-16 code unit that was
+pending from previous input.
 .It Li (size_t)-2
 .Bq incomplete
 if
 .Nm
-found an incomplete multibyte character after all
+found only an incomplete multibyte sequence after all
 .Fa n
-bytes of input, and saved its state to restart in the next call with
+bytes of input and any previous input, and saved its state to restart
+in the next call with
 .Fa ps .
 .It Li (size_t)-1
 .Bq error
@@ -262,7 +263,8 @@ while (n) {
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-The multibyte sequence cannot be decoded as a Unicode scalar value.
+The multibyte sequence cannot be decoded in the current locale as a
+Unicode scalar value.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El
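
The rewritten DESCRIPTION comes down to a dispatch on the return value of mbrtoc16: a byte count means input was consumed, (size_t)-3 means a pending code unit (the low surrogate of a character outside the BMP) was yielded without consuming input, 0 means the null character, and (size_t)-2 or (size_t)-1 mean incomplete or invalid input. Below is a minimal caller sketch in portable C11 against <uchar.h>. It assumes a hypothetical helper named to_utf16, which is not part of libc. It also assumes the input is a null-terminated string, so that after a non-BMP character the next call still has the terminating null available and collects the pending low surrogate before the null is decoded.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper (not a libc function): convert the
 * null-terminated multibyte string s, in the current locale, to
 * UTF-16, storing at most outlen code units in out.  Returns the
 * number of code units stored, or -1 if the input is invalid or
 * truncated or if out is too small.
 */
long
to_utf16(char16_t *out, size_t outlen, const char *s)
{
	mbstate_t mbs;
	size_t n = strlen(s) + 1;	/* include the terminating null */
	size_t nout = 0;

	memset(&mbs, 0, sizeof(mbs));	/* initial conversion state */
	for (;;) {
		char16_t c16;
		size_t len = mbrtoc16(&c16, s, n, &mbs);

		if (len == 0)		/* decoded the null character */
			return (long)nout;
		if (len == (size_t)-1 || len == (size_t)-2)
			return -1;	/* invalid or truncated input */
		if (nout == outlen)
			return -1;	/* output buffer too small */
		out[nout++] = c16;
		if (len != (size_t)-3) {	/* -3 consumed no new input */
			s += len;
			n -= len;
		}
	}
}
```

Because (size_t)-3 consumes no input, s and n stay unchanged on that branch, and the next call resumes at the same byte with the state saved in mbs.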
Index: src/lib/libc/locale/mbrtoc32.3
diff -u src/lib/libc/locale/mbrtoc32.3:1.5 src/lib/libc/locale/mbrtoc32.3:1.6
--- src/lib/libc/locale/mbrtoc32.3:1.5	Fri Aug 16 13:37:43 2024
+++ src/lib/libc/locale/mbrtoc32.3	Fri Aug 16 23:11:03 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: mbrtoc32.3,v 1.5 2024/08/16 13:37:43 riastradh Exp $
+.\"	$NetBSD: mbrtoc32.3,v 1.6 2024/08/16 23:11:03 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm mbrtoc32
-.Nd Restartable multibyte to UTF-32 code unit conversion
+.Nd Restartable multibyte to UTF-32 conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -50,20 +50,39 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to decode a multibyte character sequence at
-.Fa s
-of up to
+decodes multibyte characters in the current locale and converts them to
+Unicode scalar values (i.e., to UTF-32), keeping state so it can
+restart after incremental progress.
+.Pp
+Each call to
+.Nm :
+.Bl -enum -compact
+.It
+examines up to
 .Fa n
-bytes in the current locale, and yield the content as UTF-32 code
-units, i.e., Unicode scalar values, via the output parameter
-.Fa pc32 .
-.Fa pc32
-may be null, in which case no output is stored.
+bytes starting at
+.Fa s ,
+.It
+yields a Unicode scalar value (i.e., a UTF-32 code unit) if available
+by storing it at
+.Li * Ns Fa pc32 ,
+.It
+saves state at
+.Fa ps ,
+and
+.It
+returns either the number of bytes consumed if any or a special return
+value.
+.El
+.Pp
+Specifically:
 .Bl -bullet
 .It
 If the multibyte sequence at
 .Fa s
-is invalid or an error occurs in decoding,
+is invalid after any previous input saved at
+.Fa ps ,
+or if an error occurs in decoding,
 .Nm
 returns
 .Li (size_t)-1
@@ -75,7 +94,7 @@ If the multibyte sequence at
 .Fa s
 is still incomplete after
 .Fa n
-bytes, including any previously processed input saved in
+bytes, including any previous input saved in
 .Fa ps ,
 .Nm
 saves its state in
@@ -85,20 +104,26 @@ after all the input so far and returns
 .It
 If
 .Nm
-finds the null scalar value at
-.Fa s ,
-then it stores zero at
+decodes the null multibyte character, then it stores zero at
 .Li * Ns Fa pc32
 and returns zero.
 .It
-If
+Otherwise,
 .Nm
-finds a nonnull scalar value, then it stores the scalar value at
+decodes a single multibyte character, stores its Unicode scalar value
+at
 .Li * Ns Fa pc32 ,
-and returns the number of bytes it read from the input.
+and returns the number of bytes consumed to decode the first multibyte
+character.
 .El
 .Pp
 If
+.Fa pc32
+is a null pointer, nothing is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
+.Pp
+If
 .Fa s
 is a null pointer, the
 .Nm
@@ -162,14 +187,15 @@ if
 consumed
 .Ar i
 bytes of input to decode the next multibyte character, yielding a
-(nonnull) Unicode scalar value.
+Unicode scalar value.
 .It Li (size_t)-2
 .Bq incomplete
 if
 .Nm
-found an incomplete multibyte character after all
+found only an incomplete multibyte sequence after all
 .Fa n
-bytes of input, and saved its state to restart in the next call with
+bytes of input and any previous input, and saved its state to restart
+in the next call with
 .Fa ps .
 .It Li (size_t)-1
 .Bq error
@@ -211,10 +237,8 @@ while (n) {
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-A surrogate code point was passed.
-.It Bq Er EILSEQ
-The Unicode scalar value requested cannot be encoded as a multibyte
-sequence in the current locale.
+The multibyte sequence cannot be decoded in the current locale as a
+Unicode scalar value.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El
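
The mbrtoc32 page now states that passing a null pc32 only suppresses the store, while the effects on ps and the return value are unchanged. One use for this is counting decoded values without keeping them. The sketch below does that with a hypothetical helper named count_scalars, which is not part of libc. The page lists no (size_t)-3 return for mbrtoc32, but C11 permits one, so the sketch handles it defensively by counting the yielded value without advancing.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper (not a libc function): count the Unicode scalar
 * values in the first n bytes of s in the current locale, stopping at
 * a null character.  Returns -1 on invalid or truncated input.
 */
long
count_scalars(const char *s, size_t n)
{
	mbstate_t mbs;
	long count = 0;

	memset(&mbs, 0, sizeof(mbs));	/* initial conversion state */
	while (n) {
		/* Null pc32: nothing is stored, return value unaffected. */
		size_t len = mbrtoc32(NULL, s, n, &mbs);

		if (len == 0)		/* null character: stop */
			return count;
		if (len == (size_t)-1 || len == (size_t)-2)
			return -1;	/* invalid or incomplete input */
		count++;
		if (len != (size_t)-3) {	/* defensive: -3 consumes nothing */
			s += len;
			n -= len;
		}
	}
	return count;
}
```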
