Module Name: src
Committed By: riastradh
Date: Sat Aug 17 00:29:21 UTC 2024
Modified Files:
src/lib/libc/locale: c16rtomb.3 c32rtomb.3
Log Message:
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass.
Limit the jargon around surrogates.
PR lib/52374: <uchar.h> missing
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/lib/libc/locale/c16rtomb.3 \
src/lib/libc/locale/c32rtomb.3
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Modified files:
Index: src/lib/libc/locale/c16rtomb.3
diff -u src/lib/libc/locale/c16rtomb.3:1.5 src/lib/libc/locale/c16rtomb.3:1.6
--- src/lib/libc/locale/c16rtomb.3:1.5 Fri Aug 16 19:39:51 2024
+++ src/lib/libc/locale/c16rtomb.3 Sat Aug 17 00:29:21 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: c16rtomb.3,v 1.5 2024/08/16 19:39:51 riastradh Exp $
+.\" $NetBSD: c16rtomb.3,v 1.6 2024/08/17 00:29:21 riastradh Exp $
.\"
.\" Copyright (c) 2024 The NetBSD Foundation, Inc.
.\" All rights reserved.
@@ -30,7 +30,7 @@
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh NAME
.Nm c16rtomb
-.Nd Restartable UTF-16 code unit to multibyte conversion
+.Nd Restartable UTF-16 to multibyte conversion
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh LIBRARY
.Lb libc
@@ -49,49 +49,52 @@
.Sh DESCRIPTION
The
.Nm
-function attempts to encode Unicode input as a multibyte character
-sequence output at
-.Fa s
-in the current locale, writing anywhere between zero and
-.Dv MB_CUR_MAX
-bytes, inclusive, to
-.Fa s ,
-depending on the inputs and conversion state
-.Fa ps .
-.Pp
-The input
-.Fa c16
-is a UTF-16 code unit, which can be either:
-.Bl -bullet
-.It
-a Unicode scalar value in the Basic Multilingual Plane (BMP), that is,
-a 16-bit code unit outside the interval [0xd800,0xdfff]; or,
-.It
-over the course of two consecutive calls to
-.Nm ,
-the high and low surrogate code points of a Unicode scalar value
-outside the BMP.
-.El
+function decodes UTF-16 and converts it to multibyte characters in the
+current locale, keeping state so it can restart after incremental
+progress.
.Pp
-If a low surrogate code point, that is, a value of
-.Fa c16
-in [0xdc00,0xdfff], is passed to
+Each call to
.Nm
-without the preceding call to it with the same
+updates the conversion state
.Fa ps
-having been passed a high surrogate code point, that is, a value of
+with a UTF-16 code unit
+.Fa c16 ,
+writes up to
+.Dv MB_CUR_MAX
+bytes to
+.Fa s
+(possibly none), and returns either the number of bytes written to
+.Fa s
+or
+.Li (size_t)-1
+to denote error.
+.Pp
+Over successive calls to
+.Nm
+with the same state
+.Fa ps ,
+the sequence of
.Fa c16
-in [0xd800,0xdbff], or if a high surrogate was passed in the previous
-call and anything other than a low surrogate is passed, then
+values must be a well-formed UTF-16 code unit sequence.
+If
+.Fa c16 ,
+when appended to the sequence of code units passed in previous calls,
+does not form a well-formed UTF-16 code unit sequence, then
.Nm
-will return
+returns
.Li (size_t)-1
-to denote failure with
+with
.Xr errno 2
set to
.Er EILSEQ .
.Pp
If
+.Fa s
+is a null pointer, no output is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
+.Pp
+If
.Fa ps
is a null pointer,
.Nm
@@ -148,9 +151,9 @@ printf("%s\en", buf);
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
-A surrogate code point was passed as
+The
.Fa c16
-when it is inappropriate.
+input sequence does not encode a Unicode scalar value in UTF-16.
.It Bq Er EILSEQ
The Unicode scalar value requested cannot be encoded as a multibyte
sequence in the current locale.
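
Editorial illustration (not part of the commit): the revised c16rtomb.3
text requires that the c16 values passed over successive calls form a
well-formed UTF-16 code unit sequence. The following is a minimal sketch
of that rule as a standalone check. It is not the libc implementation;
the helper names (is_high_surrogate, is_low_surrogate,
utf16_first_ill_formed) are hypothetical, and uint16_t stands in for
char16_t so that the block does not depend on <uchar.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High (leading) surrogate code units lie in [0xd800,0xdbff]. */
static bool
is_high_surrogate(uint16_t u)
{
	return u >= 0xd800 && u <= 0xdbff;
}

/* Low (trailing) surrogate code units lie in [0xdc00,0xdfff]. */
static bool
is_low_surrogate(uint16_t u)
{
	return u >= 0xdc00 && u <= 0xdfff;
}

/*
 * Hypothetical helper: return the index of the first code unit that
 * makes the sequence ill-formed, or n if no code unit does.  A high
 * surrogate at the very end is not flagged, because the matching low
 * surrogate could still arrive in a later call.
 */
static size_t
utf16_first_ill_formed(const uint16_t *u, size_t n)
{
	bool pending_high = false;	/* previous unit was a high surrogate */
	size_t i;

	for (i = 0; i < n; i++) {
		if (pending_high) {
			/* Only a low surrogate may follow a high one. */
			if (!is_low_surrogate(u[i]))
				return i;
			pending_high = false;
		} else if (is_low_surrogate(u[i])) {
			/* Low surrogate with no preceding high surrogate. */
			return i;
		} else if (is_high_surrogate(u[i])) {
			pending_high = true;
		}
	}
	return n;
}
```

Under this sketch, the index returned corresponds to the call at which
the man page says the function returns (size_t)-1 with errno set to
EILSEQ. The pending_high flag plays the role that the conversion state
ps plays across calls.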
Index: src/lib/libc/locale/c32rtomb.3
diff -u src/lib/libc/locale/c32rtomb.3:1.5 src/lib/libc/locale/c32rtomb.3:1.6
--- src/lib/libc/locale/c32rtomb.3:1.5 Fri Aug 16 19:39:51 2024
+++ src/lib/libc/locale/c32rtomb.3 Sat Aug 17 00:29:21 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: c32rtomb.3,v 1.5 2024/08/16 19:39:51 riastradh Exp $
+.\" $NetBSD: c32rtomb.3,v 1.6 2024/08/17 00:29:21 riastradh Exp $
.\"
.\" Copyright (c) 2024 The NetBSD Foundation, Inc.
.\" All rights reserved.
@@ -30,7 +30,7 @@
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh NAME
.Nm c32rtomb
-.Nd Restartable UTF-32 code unit to multibyte conversion
+.Nd Restartable UTF-32 to multibyte conversion
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh LIBRARY
.Lb libc
@@ -49,30 +49,37 @@
.Sh DESCRIPTION
The
.Nm
-function attempts to encode Unicode input as a multibyte character
-sequence output at
-.Fa s
-in the current locale, writing anywhere between zero and
+function converts Unicode scalar values to multibyte characters in the
+current locale, keeping state so it can restart after incremental
+progress.
+.Pp
+Each call to
+.Nm
+updates the conversion state
+.Fa ps
+with a UTF-32 code unit
+.Fa c32 ,
+writes up to
.Dv MB_CUR_MAX
-bytes, inclusive, to
+bytes to
.Fa s ,
-depending on the inputs and conversion state
-.Fa ps .
+and returns either the number of bytes written to
+.Fa s
+or
+.Li (size_t)-1
+to denote error.
.Pp
The input
.Fa c32
-is a UTF-32 code unit, which represents a single Unicode scalar value,
-i.e., a Unicode code point that is not in the interval [0xd800,0xdfff]
-of surrogate code points.
+is a UTF-32 code unit, representing a Unicode scalar value,
+i.e., a Unicode code point that is not a surrogate code point \(em in
+other words, an integer either in [0,0xd7ff] or in [0xe000,0x10ffff].
.Pp
-If a surrogate code point is passed,
-.Nm
-will return
-.Li (size_t)-1
-to denote failure with
-.Xr errno 2
-set to
-.Er EILSEQ .
+If
+.Fa s
+is a null pointer, no output is stored, but the effects on
+.Fa ps
+and the return value are unchanged.
.Pp
If
.Fa ps
@@ -131,8 +138,10 @@ printf("%s\en", buf);
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
-A surrogate code point was passed as
-.Fa c32 .
+.Fa c32
+is not a Unicode scalar value, i.e., it is a surrogate code point in
+the interval [0xd800,0xdfff] or it lies outside the Unicode codespace
+[0,0x10ffff] altogether.
.It Bq Er EILSEQ
The Unicode scalar value requested cannot be encoded as a multibyte
sequence in the current locale.
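
Editorial illustration (not part of the commit): the revised c32rtomb.3
text defines a Unicode scalar value as an integer in [0,0xd7ff] or in
[0xe000,0x10ffff]. The following is a minimal sketch of that range test.
The function name is_unicode_scalar_value is hypothetical, and uint32_t
stands in for char32_t so that the block does not depend on <uchar.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if c is a Unicode scalar value, i.e., a
 * code point in [0,0x10ffff] that is not a surrogate code point in
 * [0xd800,0xdfff].
 */
static bool
is_unicode_scalar_value(uint32_t c)
{
	return c <= 0xd7ff || (c >= 0xe000 && c <= 0x10ffff);
}
```

Inputs for which this returns false fall under the first EILSEQ case
listed above. Inputs for which it returns true can still fail with the
second EILSEQ case if the current locale cannot encode them.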