Module Name:    src
Committed By:   riastradh
Date:           Sat Aug 17 00:29:21 UTC 2024

Modified Files:
        src/lib/libc/locale: c16rtomb.3 c32rtomb.3

Log Message:
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass.

Limit the jargon around surrogates.

PR lib/52374: <uchar.h> missing


To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/lib/libc/locale/c16rtomb.3 \
    src/lib/libc/locale/c32rtomb.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/libc/locale/c16rtomb.3
diff -u src/lib/libc/locale/c16rtomb.3:1.5 src/lib/libc/locale/c16rtomb.3:1.6
--- src/lib/libc/locale/c16rtomb.3:1.5	Fri Aug 16 19:39:51 2024
+++ src/lib/libc/locale/c16rtomb.3	Sat Aug 17 00:29:21 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: c16rtomb.3,v 1.5 2024/08/16 19:39:51 riastradh Exp $
+.\"	$NetBSD: c16rtomb.3,v 1.6 2024/08/17 00:29:21 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm c16rtomb
-.Nd Restartable UTF-16 code unit to multibyte conversion
+.Nd Restartable UTF-16 to multibyte conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -49,49 +49,52 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to encode Unicode input as a multibyte character
-sequence output at
-.Fa s
-in the current locale, writing anywhere between zero and
-.Dv MB_CUR_MAX
-bytes, inclusive, to
-.Fa s ,
-depending on the inputs and conversion state
-.Fa ps .
-.Pp
-The input
-.Fa c16
-is a UTF-16 code unit, which can be either:
-.Bl -bullet
-.It
-a Unicode scalar value in the Basic Multilingual Plane (BMP), that is,
-a 16-bit code unit outside the interval [0xd800,0xdfff]; or,
-.It
-over the course of two consecutive calls to
-.Nm ,
-the high and low surrogate code points of a Unicode scalar value
-outside the BMP.
-.El
+function decodes UTF-16 and converts it to multibyte characters in the
+current locale, keeping state so it can restart after incremental
+progress.
 .Pp
-If a low surrogate code point, that is, a value of
-.Fa c16
-in [0xdc00,0xdfff], is passed to
+Each call to
 .Nm
-without the preceding call to it with the same
+updates the conversion state
 .Fa ps
-having been passed a high surrogate code point, that is, a value of
+with a UTF-16 code unit
+.Fa c16 ,
+writes up to
+.Dv MB_CUR_MAX
+bytes to
+.Fa s
+(possibly none), and returns either the number of bytes written to
+.Fa s
+or
+.Li (size_t)-1
+to denote error.
+.Pp
+Over successive calls to
+.Nm
+with the same state
+.Fa ps ,
+the sequence of
 .Fa c16
-in [0xd800,0xdbff], or if a high surrogate was passed in the previous
-call and anything other than a low surrogate is passed, then
+values must be a well-formed UTF-16 code unit sequence.
+If
+.Fa c16 ,
+when appended to the sequence of code units passed in previous calls,
+does not form a well-formed UTF-16 code unit sequence, then
 .Nm
-will return
+returns
 .Li (size_t)-1
-to denote failure with
+with
 .Xr errno 2
 set to
 .Er EILSEQ .
 .Pp
 If
+.Fa s
+is a null pointer, no output is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
+.Pp
+If
 .Fa ps
 is a null pointer,
 .Nm
@@ -148,9 +151,9 @@ printf("%s\en", buf);
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-A surrogate code point was passed as
+The
 .Fa c16
-when it is inappropriate.
+input sequence does not encode a Unicode scalar value in UTF-16.
 .It Bq Er EILSEQ
 The Unicode scalar value requested cannot be encoded as a multibyte
 sequence in the current locale.
Index: src/lib/libc/locale/c32rtomb.3
diff -u src/lib/libc/locale/c32rtomb.3:1.5 src/lib/libc/locale/c32rtomb.3:1.6
--- src/lib/libc/locale/c32rtomb.3:1.5	Fri Aug 16 19:39:51 2024
+++ src/lib/libc/locale/c32rtomb.3	Sat Aug 17 00:29:21 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: c32rtomb.3,v 1.5 2024/08/16 19:39:51 riastradh Exp $
+.\"	$NetBSD: c32rtomb.3,v 1.6 2024/08/17 00:29:21 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm c32rtomb
-.Nd Restartable UTF-32 code unit to multibyte conversion
+.Nd Restartable UTF-32 to multibyte conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -49,30 +49,37 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to encode Unicode input as a multibyte character
-sequence output at
-.Fa s
-in the current locale, writing anywhere between zero and
+function converts Unicode scalar values to multibyte characters in the
+current locale, keeping state so it can restart after incremental
+progress.
+.Pp
+Each call to
+.Nm
+updates the conversion state
+.Fa ps
+with a UTF-32 code unit
+.Fa c32 ,
+writes up to
 .Dv MB_CUR_MAX
-bytes, inclusive, to
+bytes to
 .Fa s ,
-depending on the inputs and conversion state
-.Fa ps .
+and returns either the number of bytes written to
+.Fa s
+or
+.Li (size_t)-1
+to denote error.
 .Pp
 The input
 .Fa c32
-is a UTF-32 code unit, which represents a single Unicode scalar value,
-i.e., a Unicode code point that is not in the interval [0xd800,0xdfff]
-of surrogate code points.
+is a UTF-32 code unit representing a Unicode scalar value,
+i.e., a Unicode code point that is not a surrogate code point \(em in
+other words, an integer either in [0,0xd7ff] or in [0xe000,0x10ffff].
 .Pp
-If a surrogate code point is passed,
-.Nm
-will return
-.Li (size_t)-1
-to denote failure with
-.Xr errno 2
-set to
-.Er EILSEQ .
+If
+.Fa s
+is a null pointer, no output is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
 .Pp
 If
 .Fa ps
@@ -131,8 +138,10 @@ printf("%s\en", buf);
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-A surrogate code point was passed as
-.Fa c32 .
+.Fa c32
+is not a Unicode scalar value, i.e., it is a surrogate code point in
+the interval [0xd800,0xdfff] or it lies outside the Unicode codespace
+[0,0x10ffff] altogether.
 .It Bq Er EILSEQ
 The Unicode scalar value requested cannot be encoded as a multibyte
 sequence in the current locale.
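
As a rough illustration of the per-call behaviour the revised pages
describe (update ps, write up to MB_CUR_MAX bytes, return the byte count
or (size_t)-1), here is a minimal sketch of driving both functions in a
loop.  The helper names utf16_to_mb and utf32_to_mb, the scratch buffer,
and the out-of-space convention are assumptions made for illustration;
they are not part of libc or of the man pages in this commit.

```c
#include <limits.h>	/* MB_LEN_MAX */
#include <stddef.h>	/* size_t */
#include <string.h>	/* memcpy, memset */
#include <uchar.h>	/* char16_t, char32_t, c16rtomb, c32rtomb */
#include <wchar.h>	/* mbstate_t */

/*
 * Hypothetical helper: convert n UTF-16 code units to multibyte
 * characters in the current locale.  Returns the number of bytes
 * stored in out, or (size_t)-1 if c16rtomb fails (errno is EILSEQ)
 * or if out is too small (a convention of this sketch only).
 * A trailing unpaired high surrogate is not detected here.
 */
size_t
utf16_to_mb(const char16_t *in, size_t n, char *out, size_t outsz)
{
	mbstate_t st;
	char tmp[MB_LEN_MAX];
	size_t i, len, total = 0;

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	for (i = 0; i < n; i++) {
		/* A high surrogate may yield 0 bytes; the pair completes
		 * on the next call with the same state. */
		len = c16rtomb(tmp, in[i], &st);
		if (len == (size_t)-1)
			return (size_t)-1;
		if (len > outsz - total)
			return (size_t)-1;
		memcpy(out + total, tmp, len);
		total += len;
	}
	return total;
}

/*
 * Hypothetical helper: the same loop for UTF-32 input.  Each c32
 * value must be a Unicode scalar value, so a surrogate fails.
 */
size_t
utf32_to_mb(const char32_t *in, size_t n, char *out, size_t outsz)
{
	mbstate_t st;
	char tmp[MB_LEN_MAX];
	size_t i, len, total = 0;

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	for (i = 0; i < n; i++) {
		len = c32rtomb(tmp, in[i], &st);
		if (len == (size_t)-1)
			return (size_t)-1;
		if (len > outsz - total)
			return (size_t)-1;
		memcpy(out + total, tmp, len);
		total += len;
	}
	return total;
}
```

A caller would normally call setlocale(LC_CTYPE, "") first so that
non-ASCII input can be encoded; in the default "C" locale, anything
outside ASCII is likely to fail with EILSEQ.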
