Module Name:    src
Committed By:   riastradh
Date:           Sat Aug 17 00:29:21 UTC 2024

Modified Files:
        src/lib/libc/locale: c16rtomb.3 c32rtomb.3

Log Message:
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass.

Limit the jargon around surrogates.

PR lib/52374: <uchar.h> missing

To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/lib/libc/locale/c16rtomb.3 \
    src/lib/libc/locale/c32rtomb.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/libc/locale/c16rtomb.3
diff -u src/lib/libc/locale/c16rtomb.3:1.5 src/lib/libc/locale/c16rtomb.3:1.6
--- src/lib/libc/locale/c16rtomb.3:1.5 Fri Aug 16 19:39:51 2024
+++ src/lib/libc/locale/c16rtomb.3 Sat Aug 17 00:29:21 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: c16rtomb.3,v 1.5 2024/08/16 19:39:51 riastradh Exp $
+.\" $NetBSD: c16rtomb.3,v 1.6 2024/08/17 00:29:21 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm c16rtomb
-.Nd Restartable UTF-16 code unit to multibyte conversion
+.Nd Restartable UTF-16 to multibyte conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -49,49 +49,52 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to encode Unicode input as a multibyte character
-sequence output at
-.Fa s
-in the current locale, writing anywhere between zero and
-.Dv MB_CUR_MAX
-bytes, inclusive, to
-.Fa s ,
-depending on the inputs and conversion state
-.Fa ps .
-.Pp
-The input
-.Fa c16
-is a UTF-16 code unit, which can be either:
-.Bl -bullet
-.It
-a Unicode scalar value in the Basic Multilingual Plane (BMP), that is,
-a 16-bit code unit outside the interval [0xd800,0xdfff]; or,
-.It
-over the course of two consecutive calls to
-.Nm ,
-the high and low surrogate code points of a Unicode scalar value
-outside the BMP.
-.El
+function decodes UTF-16 and converts it to multibyte characters in the
+current locale, keeping state so it can restart after incremental
+progress.
 .Pp
-If a low surrogate code point, that is, a value of
-.Fa c16
-in [0xdc00,0xdfff], is passed to
+Each call to
 .Nm
-without the preceding call to it with the same
+updates the conversion state
 .Fa ps
-having been passed a high surrogate code point, that is, a value of
+with a UTF-16 code unit
+.Fa c16 ,
+writes up to
+.Dv MB_CUR_MAX
+bytes to
+.Fa s
+(possibly none), and returns either the number of bytes written to
+.Fa s
+or
+.Li (size_t)-1
+to denote error.
+.Pp
+Over successive calls to
+.Nm
+with the same state
+.Fa ps ,
+the sequence of
 .Fa c16
-in [0xd800,0xdbff], or if a high surrogate was passed in the previous
-call and anything other than a low surrogate is passed, then
+values must be a well-formed UTF-16 code unit sequence.
+If
+.Fa c16 ,
+when appended to the sequence of code units passed in previous calls,
+does not form a well-formed UTF-16 code unit sequence, then
 .Nm
-will return
+returns
 .Li (size_t)-1
-to denote failure with
+with
 .Xr errno 2
 set to
 .Er EILSEQ .
 .Pp
 If
+.Fa s
+is a null pointer, no output is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
+.Pp
+If
 .Fa ps
 is a null pointer,
 .Nm
@@ -148,9 +151,9 @@ printf("%s\en", buf);
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-A surrogate code point was passed as
+The
 .Fa c16
-when it is inappropriate.
+input sequence does not encode a Unicode scalar value in UTF-16.
 .It Bq Er EILSEQ
 The Unicode scalar value requested cannot be encoded as a multibyte
 sequence in the current locale.

Index: src/lib/libc/locale/c32rtomb.3
diff -u src/lib/libc/locale/c32rtomb.3:1.5 src/lib/libc/locale/c32rtomb.3:1.6
--- src/lib/libc/locale/c32rtomb.3:1.5 Fri Aug 16 19:39:51 2024
+++ src/lib/libc/locale/c32rtomb.3 Sat Aug 17 00:29:21 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: c32rtomb.3,v 1.5 2024/08/16 19:39:51 riastradh Exp $
+.\" $NetBSD: c32rtomb.3,v 1.6 2024/08/17 00:29:21 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm c32rtomb
-.Nd Restartable UTF-32 code unit to multibyte conversion
+.Nd Restartable UTF-32 to multibyte conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -49,30 +49,37 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to encode Unicode input as a multibyte character
-sequence output at
-.Fa s
-in the current locale, writing anywhere between zero and
+function converts Unicode scalar values to multibyte characters in the
+current locale, keeping state so it can restart after incremental
+progress.
+.Pp
+Each call to
+.Nm
+updates the conversion state
+.Fa ps
+with a UTF-32 code unit
+.Fa c32 ,
+writes up to
 .Dv MB_CUR_MAX
-bytes, inclusive, to
+bytes to
 .Fa s ,
-depending on the inputs and conversion state
-.Fa ps .
+and returns either the number of bytes written to
+.Fa s
+or
+.Li (size_t)-1
+to denote error.
 .Pp
 The input
 .Fa c32
-is a UTF-32 code unit, which represents a single Unicode scalar value,
-i.e., a Unicode code point that is not in the interval [0xd800,0xdfff]
-of surrogate code points.
+is a UTF-32 code unit, representing represents a Unicode scalar value,
+i.e., a Unicode code point that is not a surrogate code point \(em in
+other words, an integer either in [0,0xd7ff] or in [0xe000,0x10ffff].
 .Pp
-If a surrogate code point is passed,
-.Nm
-will return
-.Li (size_t)-1
-to denote failure with
-.Xr errno 2
-set to
-.Er EILSEQ .
+If
+.Fa s
+is a null pointer, no output is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
 .Pp
 If
 .Fa ps
@@ -131,8 +138,10 @@ printf("%s\en", buf);
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-A surrogate code point was passed as
-.Fa c32 .
+.Fa c32
+is not a Unicode scalar value, i.e., it is a surrogate code point in
+the interval [0xd800,0xdfff] or it lies outside the Unicode codespace
+[0,0x10ffff] altogether.
 .It Bq Er EILSEQ
 The Unicode scalar value requested cannot be encoded as a multibyte
 sequence in the current locale.
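
Illustration (not part of the commit): the input rules that the revised
pages state in prose can be sketched in a few lines of C. This is a
minimal sketch, not the libc implementation. The names
is_unicode_scalar, utf16_state, and utf16_feed are hypothetical and
exist only for this example. It uses <stdint.h> integer types instead
of char16_t and char32_t so that it does not depend on <uchar.h>, and
it assumes that the tracker resets itself after an ill-formed sequence,
which the man pages do not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true iff c32 is a Unicode scalar value, i.e. an
 * integer in [0,0xd7ff] or in [0xe000,0x10ffff].
 */
bool
is_unicode_scalar(uint_least32_t c32)
{

	return c32 <= 0xd7ff || (c32 >= 0xe000 && c32 <= 0x10ffff);
}

/* Hypothetical tracker state: a pending high surrogate, or 0 if none. */
struct utf16_state {
	uint_least16_t pending_high;
};

/*
 * Feed one UTF-16 code unit to the tracker.
 *
 * Returns 1 and stores a scalar value in *out when c16 completes one,
 * 0 when c16 is a high surrogate and more input is needed, and -1 when
 * c16 makes the sequence ill-formed (the case the pages map to EILSEQ).
 * Assumption of this sketch: the state is reset after an error.
 */
int
utf16_feed(struct utf16_state *st, uint_least16_t c16, uint_least32_t *out)
{
	bool is_high = c16 >= 0xd800 && c16 <= 0xdbff;
	bool is_low = c16 >= 0xdc00 && c16 <= 0xdfff;

	assert(st != NULL);
	assert(out != NULL);

	if (st->pending_high != 0) {
		uint_least32_t high = st->pending_high;

		st->pending_high = 0;
		if (!is_low)
			return -1;	/* high surrogate not followed by low */
		*out = 0x10000 +
		    (((high - 0xd800) << 10) | (uint_least32_t)(c16 - 0xdc00));
		return 1;
	}
	if (is_low)
		return -1;		/* low surrogate with no high before it */
	if (is_high) {
		st->pending_high = c16;	/* wait for the low surrogate */
		return 0;
	}
	*out = c16;			/* BMP scalar value */
	return 1;
}
```

Feeding 0xd83d and then 0xde00 to a zero-initialized utf16_state yields
the scalar value 0x1f600 on the second call; feeding 0xdc00 on its own
is rejected.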