Module Name:    src
Committed By:   riastradh
Date:           Fri Aug 16 23:11:03 UTC 2024

Modified Files:
        src/lib/libc/locale: mbrtoc16.3 mbrtoc32.3

Log Message:
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose.

Still maybe not great but at least there's less jargon in most of the
text, without really losing any content.

PR lib/52374: <uchar.h> missing


To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/lib/libc/locale/mbrtoc16.3 \
    src/lib/libc/locale/mbrtoc32.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/libc/locale/mbrtoc16.3
diff -u src/lib/libc/locale/mbrtoc16.3:1.5 src/lib/libc/locale/mbrtoc16.3:1.6
--- src/lib/libc/locale/mbrtoc16.3:1.5	Fri Aug 16 13:37:43 2024
+++ src/lib/libc/locale/mbrtoc16.3	Fri Aug 16 23:11:02 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: mbrtoc16.3,v 1.5 2024/08/16 13:37:43 riastradh Exp $
+.\"	$NetBSD: mbrtoc16.3,v 1.6 2024/08/16 23:11:02 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm mbrtoc16
-.Nd Restartable multibyte to UTF-16 code unit conversion
+.Nd Restartable multibyte to UTF-16 conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -50,20 +50,37 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to decode a multibyte character sequence at
-.Fa s
-of up to
+decodes multibyte characters in the current locale and converts them to
+UTF-16, keeping state so it can restart after incremental progress.
+.Pp
+Each call to
+.Nm :
+.Bl -enum -compact
+.It
+examines up to
 .Fa n
-bytes in the current locale, and yield the content as UTF-16 code
-units via the output parameter
-.Fa pc16 .
-.Fa pc16
-may be null, in which case no output is stored.
+bytes starting at
+.Fa s ,
+.It
+yields a UTF-16 code unit if available by storing it at
+.Li * Ns Fa pc16 ,
+.It
+saves state at
+.Fa ps ,
+and
+.It
+returns either the number of bytes consumed if any or a special return
+value.
+.El
+.Pp
+Specifically:
 .Bl -bullet
 .It
 If the multibyte sequence at
 .Fa s
-is invalid or an error occurs in decoding,
+is invalid after any previous input saved at
+.Fa ps ,
+or if an error occurs in decoding,
 .Nm
 returns
 .Li (size_t)-1
@@ -75,7 +92,7 @@ If the multibyte sequence at
 .Fa s
 is still incomplete after
 .Fa n
-bytes, including any previously processed input saved in
+bytes, including any previous input saved in
 .Fa ps ,
 .Nm
 saves its state in
@@ -85,53 +102,33 @@ after all the input so far and returns
 .It
 If
 .Nm
-finds the null scalar value at
-.Fa s ,
-then it stores zero at
+had previously decoded a multibyte character but has not yet yielded
+all the code units of its UTF-16 encoding, it stores the next UTF-16
+code unit at
 .Li * Ns Fa pc16
-and returns zero.
+and returns
+.Li "(size_t)-3" .
 .It
 If
 .Nm
-finds a nonnull scalar value in the Basic Multilingual Plane (BMP),
-i.e., a 16-bit scalar value, then it stores the scalar value at
-.Li * Ns Fa pc16 ,
-and returns the number of bytes it read from the input.
+decodes the null multibyte character, then it stores zero at
+.Li * Ns Fa pc16
+and returns zero.
 .It
-If
+Otherwise,
 .Nm
-finds a scalar value outside the BMP, then it:
-.Bl -dash -compact
-.It
-stores the scalar value's high surrogate code point at
-.Li * Ns Fa pc16 ;
-.It
-stores conversion state in
-.Fa ps
-to remember the rest of the pending scalar value; and
-.It
-returns the number of bytes it read from the input.
+decodes a single multibyte character, stores the first (and possibly
+only) code unit in its UTF-16 encoding at
+.Li * Ns Fa pc16 ,
+and returns the number of bytes consumed to decode the first multibyte
+character.
 .El
-.It
+.Pp
 If
-.Nm
-had previously found a scalar value outside the BMP, then, instead of
-any of the above options, it:
-.Bl -dash -compact
-.It
-stores the scalar value's low surrogate code point at
-.Li * Ns Fa pc16 ;
-.It
-consumes rest of the pending scalar value from the conversion state
-.Fa ps ;
-and
-.It
-returns
-.Li (size_t)-3
-to indicate that no bytes were consumed but a code unit was yielded
-nevertheless.
-.El
-.El
+.Fa pc16
+is a null pointer, nothing is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
 .Pp
 If
 .Fa s
@@ -174,6 +171,15 @@ and
 which is initialized at program startup to the initial conversion
 state.
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+.Sh IMPLEMENTATION NOTES
+On well-formed input, the
+.Nm
+function yields either a Unicode scalar value in the Basic Multilingual
+Plane (BMP), i.e., a 16-bit Unicode code point that is not a surrogate
+code point, or, over two successive calls, yields the high and low
+surrogate code points (in that order) of a Unicode scalar value outside
+the BMP.
+.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh RETURN VALUES
 The
 .Nm
@@ -197,26 +203,21 @@ if
 consumed
 .Ar i
 bytes of input to decode the next multibyte character, yielding a
-(nonnull) UTF-16 code unit, either a Unicode scalar value in the BMP or
-a high surrogate code point.
+UTF-16 code unit.
 .It Li (size_t)-3
 .Bq continuation
 if
 .Nm
-consumed no bytes of input but yielded a (nonnull) UTF-16 code unit, a
-low surrogate code point, because the previous call to
-.Nm
-with
-.Fa ps
-had yielded a high surrogate code point for a Unicode scalar value
-outside the BMP.
+consumed no new bytes of input but yielded a UTF-16 code unit that was
+pending from previous input.
 .It Li (size_t)-2
 .Bq incomplete
 if
 .Nm
-found an incomplete multibyte character after all
+found only an incomplete multibyte sequence after all
 .Fa n
-bytes of input, and saved its state to restart in the next call with
+bytes of input and any previous input, and saved its state to restart
+in the next call with
 .Fa ps .
 .It Li (size_t)-1
 .Bq error
@@ -262,7 +263,8 @@ while (n) {
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-The multibyte sequence cannot be decoded as a Unicode scalar value.
+The multibyte sequence cannot be decoded in the current locale as a
+Unicode scalar value.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El

Index: src/lib/libc/locale/mbrtoc32.3
diff -u src/lib/libc/locale/mbrtoc32.3:1.5 src/lib/libc/locale/mbrtoc32.3:1.6
--- src/lib/libc/locale/mbrtoc32.3:1.5	Fri Aug 16 13:37:43 2024
+++ src/lib/libc/locale/mbrtoc32.3	Fri Aug 16 23:11:03 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: mbrtoc32.3,v 1.5 2024/08/16 13:37:43 riastradh Exp $
+.\"	$NetBSD: mbrtoc32.3,v 1.6 2024/08/16 23:11:03 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm mbrtoc32
-.Nd Restartable multibyte to UTF-32 code unit conversion
+.Nd Restartable multibyte to UTF-32 conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -50,20 +50,39 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to decode a multibyte character sequence at
-.Fa s
-of up to
+decodes multibyte characters in the current locale and converts them to
+Unicode scalar values (i.e., to UTF-32), keeping state so it can
+restart after incremental progress.
+.Pp
+Each call to
+.Nm :
+.Bl -enum -compact
+.It
+examines up to
 .Fa n
-bytes in the current locale, and yield the content as UTF-32 code
-units, i.e., Unicode scalar values, via the output parameter
-.Fa pc32 .
-.Fa pc32
-may be null, in which case no output is stored.
+bytes starting at
+.Fa s ,
+.It
+yields a Unicode scalar value (i.e., a UTF-32 code unit) if available
+by storing it at
+.Li * Ns Fa pc32 ,
+.It
+saves state at
+.Fa ps ,
+and
+.It
+returns either the number of bytes consumed if any or a special return
+value.
+.El
+.Pp
+Specifically:
 .Bl -bullet
 .It
 If the multibyte sequence at
 .Fa s
-is invalid or an error occurs in decoding,
+is invalid after any previous input saved at
+.Fa ps ,
+or if an error occurs in decoding,
 .Nm
 returns
 .Li (size_t)-1
@@ -75,7 +94,7 @@ If the multibyte sequence at
 .Fa s
 is still incomplete after
 .Fa n
-bytes, including any previously processed input saved in
+bytes, including any previous input saved in
 .Fa ps ,
 .Nm
 saves its state in
@@ -85,20 +104,26 @@ after all the input so far and returns
 .It
 If
 .Nm
-finds the null scalar value at
-.Fa s ,
-then it stores zero at
+decodes the null multibyte character, then it stores zero at
 .Li * Ns Fa pc32
 and returns zero.
 .It
-If
+Otherwise,
 .Nm
-finds a nonnull scalar value, then it stores the scalar value at
+decodes a single multibyte character, stores its Unicode scalar value
+at
 .Li * Ns Fa pc32 ,
-and returns the number of bytes it read from the input.
+and returns the number of bytes consumed to decode the first multibyte
+character.
 .El
 .Pp
 If
+.Fa pc32
+is a null pointer, nothing is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
+.Pp
+If
 .Fa s
 is a null pointer, the
 .Nm
@@ -162,14 +187,15 @@ if
 consumed
 .Ar i
 bytes of input to decode the next multibyte character, yielding a
-(nonnull) Unicode scalar value.
+Unicode scalar value.
 .It Li (size_t)-2
 .Bq incomplete
 if
 .Nm
-found an incomplete multibyte character after all
+found only an incomplete multibyte sequence after all
 .Fa n
-bytes of input, and saved its state to restart in the next call with
+bytes of input and any previous input, and saved its state to restart
+in the next call with
 .Fa ps .
 .It Li (size_t)-1
 .Bq error
@@ -211,10 +237,8 @@ while (n) {
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-A surrogate code point was passed.
-.It Bq Er EILSEQ
-The Unicode scalar value requested cannot be encoded as a multibyte
-sequence in the current locale.
+The multibyte sequence cannot be decoded in the current locale as a
+Unicode scalar value.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El
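
For readers following the revised return-value text, here is a minimal
sketch of a caller's loop around mbrtoc16(3). It is not taken from the
commit or from the manual page's EXAMPLES section. The helper name
decode_utf16, its signature, and its policies (stop at the null
multibyte character, treat an incomplete trailing sequence as an error)
are assumptions made for illustration. It relies on the behaviour
described under IMPLEMENTATION NOTES: a high surrogate is followed, on
the next call, by a low surrogate reported with (size_t)-3.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * decode_utf16 is a hypothetical helper, not part of libc.  It converts
 * the n bytes at s (non-null) to UTF-16 in the current locale, storing
 * at most outlen code units in out.  It returns the number of code
 * units stored, or (size_t)-1 on a decoding error, on input that ends
 * mid-character, or if out fills up between the two surrogates.
 */
size_t
decode_utf16(const char *s, size_t n, char16_t *out, size_t outlen)
{
	mbstate_t ps;
	size_t nout = 0;
	int pending = 0;	/* is a low surrogate still owed to us? */

	memset(&ps, 0, sizeof(ps));	/* initial conversion state */
	while ((n > 0 || pending) && nout < outlen) {
		char16_t c16;
		size_t len = mbrtoc16(&c16, s, n, &ps);

		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;	/* error or incomplete */
		if (len == 0)
			break;			/* null character: stop */
		if (len == (size_t)-3) {
			pending = 0;		/* no new bytes consumed */
		} else {
			s += len;		/* len bytes consumed */
			n -= len;
			/* A high surrogate means one more call is due. */
			pending = (c16 >= 0xD800 && c16 <= 0xDBFF);
		}
		out[nout++] = c16;
	}
	return pending ? (size_t)-1 : nout;
}
```

A caller that needs non-ASCII input would first select a suitable
locale, for example with setlocale(LC_CTYPE, ""); which locales are
available is system-dependent. Tracking the pending surrogate keeps the
loop running after the last input byte, so the low surrogate of a final
non-BMP character is not dropped.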