Module Name: src
Committed By: riastradh
Date: Tue Aug 20 20:04:45 UTC 2024
Modified Files:
src/lib/libc/locale: c16rtomb.3 c32rtomb.3 c8rtomb.3
Log Message:
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 src/lib/libc/locale/c16rtomb.3 \
src/lib/libc/locale/c32rtomb.3
cvs rdiff -u -r1.7 -r1.8 src/lib/libc/locale/c8rtomb.3
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Modified files:
Index: src/lib/libc/locale/c16rtomb.3
diff -u src/lib/libc/locale/c16rtomb.3:1.9 src/lib/libc/locale/c16rtomb.3:1.10
--- src/lib/libc/locale/c16rtomb.3:1.9 Tue Aug 20 17:14:05 2024
+++ src/lib/libc/locale/c16rtomb.3 Tue Aug 20 20:04:45 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: c16rtomb.3,v 1.9 2024/08/20 17:14:05 riastradh Exp $
+.\" $NetBSD: c16rtomb.3,v 1.10 2024/08/20 20:04:45 riastradh Exp $
.\"
.\" Copyright (c) 2024 The NetBSD Foundation, Inc.
.\" All rights reserved.
@@ -50,8 +50,8 @@
The
.Nm
function decodes UTF-16 and converts it to multibyte characters in the
-current locale, keeping state so it can restart after incremental
-progress.
+current locale, keeping state to remember incremental progress if
+restarted.
.Pp
Each call to
.Nm
@@ -69,27 +69,6 @@ or
.Li (size_t)-1
to denote error.
.Pp
-Over successive calls to
-.Nm
-with the same state
-.Fa ps ,
-the sequence of
-.Fa c16
-values must be a well-formed UTF-16 code unit sequence, or an
-incomplete UTF-16 code unit sequence followed by null.
-If
-.Fa c16 ,
-when appended to the sequence of code units passed in previous calls,
-is not null and does not form a well-formed UTF-16 code unit sequence,
-then
-.Nm
-returns
-.Li (size_t)-1
-with
-.Xr errno 2
-set to
-.Er EILSEQ .
-.Pp
If
.Fa s
is a null pointer, no output is stored, but the effects on
@@ -98,12 +77,12 @@ and the return value are unchanged.
.Pp
If
.Fa c16
-is null,
+is zero,
.Nm
discards any pending incomplete UTF-16 code unit sequence in
.Fa ps ,
outputs a (possibly empty) shift sequence to restore the initial state
-followed by a null byte, and resets
+followed by a NUL byte, and resets
.Fa ps
to the initial conversion state.
.Pp
@@ -117,13 +96,8 @@ object with static storage duration, dis
.Vt mbstate_t
objects
.Po
-including those used by
-.Xr mbrtoc8 3 ,
-.Xr mbrtoc16 3 ,
-.Xr mbrtoc32 3 ,
-.Xr c8rtomb 3 ,
-and
-.Xr c32rtomb 3
+including those used by other functions such as
+.Xr mbrtoc16 3
.Pc ,
which is initialized at program startup to the initial conversion
state.
@@ -173,12 +147,12 @@ which is a constant upper bound on the l
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
-The
.Fa c16
-input sequence does not encode a Unicode scalar value in UTF-16.
+is invalid as the next code unit in the conversion state
+.Fa ps .
.It Bq Er EILSEQ
-The Unicode scalar value requested cannot be encoded as a multibyte
-sequence in the current locale.
+The input cannot be encoded as a multibyte sequence in the current
+locale.
.It Bq Er EIO
An error occurred in loading the locale's character conversions.
.El
@@ -220,12 +194,13 @@ function first appeared in
.Nx 11.0 .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh BUGS
-The standard requires that a null code unit unconditionally reset the
-conversion state and output null:
+The standard requires that passing zero as
+.Fa c16
+unconditionally reset the conversion state and output a NUL byte:
.Bd -filled -offset indent
If
-.Fa c8
-is a null character, a null byte is stored, preceded by any shift
+.Fa c16
+is a null wide character, a null byte is stored, preceded by any shift
sequence needed to restore the initial shift state; the resulting state
described is the initial conversion state.
.Ed
@@ -233,7 +208,7 @@ described is the initial conversion stat
However, some implementations such as
.Fx 14.0 ,
.Ox 7.4 ,
-and glibc 2.36 ignore this clause and, if the null was preceded by an
+and glibc 2.36 ignore this clause and, if the zero was preceded by an
incomplete UTF-16 code unit sequence, fail with
.Er EILSEQ
instead.
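The c16rtomb(3) text above describes a small state machine:
- a high surrogate is remembered in ps;
- the matching low surrogate completes a character;
- a zero code unit discards anything pending and yields a NUL byte;
- any other code unit that cannot continue the sequence fails with EILSEQ.

The following C sketch models that behaviour. It is hypothetical and is not the NetBSD implementation. It assumes the current locale is UTF-8, uses uint16_t in place of char16_t, and requires s to be non-null. The names sketch_c16rtomb, struct u16_state and utf8_encode are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: a pending high surrogate, or 0 if none. */
struct u16_state {
	uint16_t pending;
};

/* Store scalar value u (assumed valid) as UTF-8 in s; return the length. */
static size_t
utf8_encode(char *s, uint32_t u)
{
	if (u < 0x80) {
		s[0] = (char)u;
		return 1;
	} else if (u < 0x800) {
		s[0] = (char)(0xc0 | (u >> 6));
		s[1] = (char)(0x80 | (u & 0x3f));
		return 2;
	} else if (u < 0x10000) {
		s[0] = (char)(0xe0 | (u >> 12));
		s[1] = (char)(0x80 | ((u >> 6) & 0x3f));
		s[2] = (char)(0x80 | (u & 0x3f));
		return 3;
	} else {
		s[0] = (char)(0xf0 | (u >> 18));
		s[1] = (char)(0x80 | ((u >> 12) & 0x3f));
		s[2] = (char)(0x80 | ((u >> 6) & 0x3f));
		s[3] = (char)(0x80 | (u & 0x3f));
		return 4;
	}
}

/* Hypothetical c16rtomb-like conversion, assuming a UTF-8 locale. */
static size_t
sketch_c16rtomb(char *s, uint16_t c16, struct u16_state *ps)
{
	assert(s != NULL);

	if (c16 == 0) {
		/* Zero: discard any pending high surrogate, store NUL, reset. */
		ps->pending = 0;
		s[0] = '\0';
		return 1;
	}
	if (ps->pending != 0) {
		uint32_t u;

		if (c16 < 0xdc00 || c16 > 0xdfff) {
			errno = EILSEQ;	/* high surrogate not followed by low */
			return (size_t)-1;
		}
		u = 0x10000 + (((uint32_t)ps->pending - 0xd800) << 10) +
		    ((uint32_t)c16 - 0xdc00);
		ps->pending = 0;
		return utf8_encode(s, u);
	}
	if (c16 >= 0xd800 && c16 <= 0xdbff) {
		ps->pending = c16;	/* remember progress; nothing stored yet */
		return 0;
	}
	if (c16 >= 0xdc00 && c16 <= 0xdfff) {
		errno = EILSEQ;		/* low surrogate with no high surrogate */
		return (size_t)-1;
	}
	return utf8_encode(s, c16);
}
```

For example, feeding 0xd83d and then 0xde00 returns 0 and then 4, storing the UTF-8 bytes f0 9f 98 80 for U+1F600. Feeding 0xd83d followed by 0 returns 1 and stores only a NUL byte, which is the standard behaviour described in BUGS.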
Index: src/lib/libc/locale/c32rtomb.3
diff -u src/lib/libc/locale/c32rtomb.3:1.9 src/lib/libc/locale/c32rtomb.3:1.10
--- src/lib/libc/locale/c32rtomb.3:1.9 Tue Aug 20 17:14:05 2024
+++ src/lib/libc/locale/c32rtomb.3 Tue Aug 20 20:04:45 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: c32rtomb.3,v 1.9 2024/08/20 17:14:05 riastradh Exp $
+.\" $NetBSD: c32rtomb.3,v 1.10 2024/08/20 20:04:45 riastradh Exp $
.\"
.\" Copyright (c) 2024 The NetBSD Foundation, Inc.
.\" All rights reserved.
@@ -50,8 +50,8 @@
The
.Nm
function converts Unicode scalar values to multibyte characters in the
-current locale, keeping state so it can restart after incremental
-progress.
+current locale, keeping state to remember incremental progress if
+restarted.
.Pp
Each call to
.Nm
@@ -71,9 +71,11 @@ to denote error.
.Pp
The input
.Fa c32
-is a UTF-32 code unit, representing represents a Unicode scalar value,
-i.e., a Unicode code point that is not a surrogate code point \(em in
-other words, an integer either in [0,0xd7ff] or in [0xe000,0x10ffff].
+is a UTF-32 code unit, representing a Unicode scalar value, i.e., a
+Unicode code point that is not a surrogate code point \(em in other
+words,
+.Fa c32
+is an integer either in [0,0xd7ff] or in [0xe000,0x10ffff].
.Pp
If
.Fa s
@@ -83,10 +85,10 @@ and the return value are unchanged.
.Pp
If
.Fa c32
-is null,
+is zero,
.Nm
outputs a (possibly empty) shift sequence to restore the initial state
-followed by a null byte and resets
+followed by a NUL byte and resets
.Fa ps
to the initial conversion state.
.Pp
@@ -100,13 +102,8 @@ object with static storage duration, dis
.Vt mbstate_t
objects
.Po
-including those used by
-.Xr mbrtoc8 3 ,
-.Xr mbrtoc16 3 ,
-.Xr mbrtoc32 3 ,
-.Xr c8rtomb 3 ,
-and
-.Xr c16rtomb 3
+including those used by other functions such as
+.Xr mbrtoc32 3
.Pc ,
which is initialized at program startup to the initial conversion
state.
@@ -147,6 +144,7 @@ if (len == (size_t)-1)
assert(len <= sizeof(buf) - (s - buf));
printf("%s\en", buf);
.Ed
+.Pp
To avoid a variable-length array, this code uses
.Dv MB_LEN_MAX ,
which is a constant upper bound on the locale-dependent
@@ -160,8 +158,8 @@ is not a Unicode scalar value, i.e., it
the interval [0xd800,0xdfff] or it lies outside the Unicode codespace
[0,0x10ffff] altogether.
.It Bq Er EILSEQ
-The Unicode scalar value requested cannot be encoded as a multibyte
-sequence in the current locale.
+The input cannot be encoded as a multibyte sequence in the current
+locale.
.It Bq Er EIO
An error occurred in loading the locale's character conversions.
.El
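The c32rtomb(3) page defines a valid input as an integer in [0,0xd7ff] or in [0xe000,0x10ffff]. The following minimal C sketch combines that check with the conversion a UTF-8 locale would perform. It is an illustration only. The UTF-8 locale, the use of uint32_t for char32_t, and the names is_unicode_scalar_value and sketch_c32rtomb are assumptions. The ps argument is omitted because UTF-8 output needs no shift state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True iff c32 is a Unicode scalar value: [0,0xd7ff] or [0xe000,0x10ffff]. */
static bool
is_unicode_scalar_value(uint32_t c32)
{
	return c32 <= 0xd7ff || (c32 >= 0xe000 && c32 <= 0x10ffff);
}

/*
 * Hypothetical c32rtomb-like conversion, assuming a UTF-8 locale.
 * s must be non-null with room for at least 4 bytes.
 */
static size_t
sketch_c32rtomb(char *s, uint32_t c32)
{
	assert(s != NULL);

	if (!is_unicode_scalar_value(c32)) {
		errno = EILSEQ;	/* surrogate, or outside [0,0x10ffff] */
		return (size_t)-1;
	}
	if (c32 < 0x80) {	/* includes zero, which stores a NUL byte */
		s[0] = (char)c32;
		return 1;
	}
	if (c32 < 0x800) {
		s[0] = (char)(0xc0 | (c32 >> 6));
		s[1] = (char)(0x80 | (c32 & 0x3f));
		return 2;
	}
	if (c32 < 0x10000) {
		s[0] = (char)(0xe0 | (c32 >> 12));
		s[1] = (char)(0x80 | ((c32 >> 6) & 0x3f));
		s[2] = (char)(0x80 | (c32 & 0x3f));
		return 3;
	}
	s[0] = (char)(0xf0 | (c32 >> 18));
	s[1] = (char)(0x80 | ((c32 >> 12) & 0x3f));
	s[2] = (char)(0x80 | ((c32 >> 6) & 0x3f));
	s[3] = (char)(0x80 | (c32 & 0x3f));
	return 4;
}
```

With this sketch, 0xe9 yields the two bytes c3 a9 and 0x10ffff yields f4 8f bf bf. Both 0xd800 (a surrogate) and 0x110000 (outside the codespace) fail with EILSEQ, matching the first ERRORS entry.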
Index: src/lib/libc/locale/c8rtomb.3
diff -u src/lib/libc/locale/c8rtomb.3:1.7 src/lib/libc/locale/c8rtomb.3:1.8
--- src/lib/libc/locale/c8rtomb.3:1.7 Tue Aug 20 17:14:05 2024
+++ src/lib/libc/locale/c8rtomb.3 Tue Aug 20 20:04:45 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: c8rtomb.3,v 1.7 2024/08/20 17:14:05 riastradh Exp $
+.\" $NetBSD: c8rtomb.3,v 1.8 2024/08/20 20:04:45 riastradh Exp $
.\"
.\" Copyright (c) 2024 The NetBSD Foundation, Inc.
.\" All rights reserved.
@@ -50,8 +50,8 @@
The
.Nm
function decodes UTF-8 and converts it to multibyte characters in the
-current locale, keeping state so it can restart after incremental
-progress.
+current locale, keeping state to remember incremental progress if
+restarted.
.Pp
Each call to
.Nm
@@ -61,35 +61,14 @@ with a UTF-8 code unit
.Fa c8 ,
writes up to
.Dv MB_CUR_MAX
-bytes to
-.Fa s
-(possibly none), and returns either the number of bytes written to
+bytes (possibly none) to
+.Fa s ,
+and returns either the number of bytes written to
.Fa s
or
.Li (size_t)-1
to denote error.
.Pp
-Over successive calls to
-.Nm
-with the same state
-.Fa ps ,
-the sequence of
-.Fa c8
-values must be a well-formed UTF-8 code unit sequence, or an
-incomplete UTF-8 code unit sequence followed by null.
-If
-.Fa c8 ,
-when appended to the sequence of code units passed in previous calls,
-is not null and does not form a well-formed UTF-8 code unit sequence,
-then
-.Nm
-returns
-.Li (size_t)-1
-with
-.Xr errno 2
-set to
-.Er EILSEQ .
-.Pp
If
.Fa s
is a null pointer, no output is stored, but the effects on
@@ -98,12 +77,12 @@ and the return value are unchanged.
.Pp
If
.Fa c8
-is null,
+is zero,
.Nm
discards any pending incomplete UTF-8 code unit sequence in
.Fa ps ,
outputs a (possibly empty) shift sequence to restore the initial state
-followed by a null byte, and resets
+followed by a NUL byte, and resets
.Fa ps
to the initial conversion state.
.Pp
@@ -117,13 +96,8 @@ object with static storage duration, dis
.Vt mbstate_t
objects
.Po
-including those used by
-.Xr mbrtoc8 3 ,
-.Xr mbrtoc16 3 ,
-.Xr mbrtoc32 3 ,
-.Xr c16rtomb 3 ,
-and
-.Xr c32rtomb 3
+including those used by other functions such as
+.Xr mbrtoc8 3
.Pc ,
which is initialized at program startup to the initial conversion
state.
@@ -173,12 +147,12 @@ which is a constant upper bound on the l
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
-The
.Fa c8
-input sequence does not encode a Unicode scalar value in UTF-8.
+is invalid as the next code unit in the conversion state
+.Fa ps .
.It Bq Er EILSEQ
-The Unicode scalar value cannot be encoded as a multibyte sequence in
-the current locale.
+The input cannot be encoded as a multibyte sequence in the current
+locale.
.It Bq Er EIO
An error occurred in loading the locale's character conversions.
.El
@@ -220,8 +194,9 @@ function first appeared in
.Nx 11.0 .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh CAVEATS
-The standard requires that a null code unit unconditionally reset the
-conversion state and output null:
+The standard requires that passing zero as
+.Fa c8
+unconditionally reset the conversion state and output a NUL byte:
.Bd -filled -offset indent
If
.Fa c8
@@ -231,7 +206,7 @@ described is the initial conversion stat
.Ed
.Pp
However, some implementations such as glibc 2.36 ignore this clause
-and, if the null was preceded by an incomplete UTF-8 code unit
+and, if the zero was preceded by a nonempty incomplete UTF-8 code unit
sequence, fail with
.Er EILSEQ
instead.
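For c8rtomb(3), the conversion state must remember the UTF-8 code units seen so far until a complete, well-formed sequence has arrived. The sketch below assumes a UTF-8 locale, so a completed sequence is copied to s unchanged. It checks the second byte against the well-formed byte ranges in the Unicode Standard (Table 3-7), which rejects overlong forms, surrogates, and values above U+10FFFF.

This is hypothetical code, not the NetBSD implementation. The names sketch_c8rtomb and struct u8_state are invented, unsigned char stands in for char8_t, and s must be non-null. On error, the sketch leaves ps unchanged, which a real implementation need not do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical conversion state for an incomplete UTF-8 sequence. */
struct u8_state {
	unsigned char buf[4];	/* code units seen so far */
	unsigned len;		/* number of code units stored in buf */
	unsigned need;		/* total length of current sequence, 0 if none */
};

/* Hypothetical c8rtomb-like conversion, assuming a UTF-8 locale. */
static size_t
sketch_c8rtomb(char *s, unsigned char c8, struct u8_state *ps)
{
	unsigned char lo = 0x80, hi = 0xbf;
	size_t n;

	assert(s != NULL);

	if (c8 == 0) {
		/* Zero: drop any incomplete sequence, store NUL, reset. */
		ps->len = ps->need = 0;
		s[0] = '\0';
		return 1;
	}
	if (ps->need == 0) {
		/* Expecting a lead byte. */
		if (c8 < 0x80) {
			s[0] = (char)c8;
			return 1;
		} else if (c8 >= 0xc2 && c8 <= 0xdf) {
			ps->need = 2;
		} else if (c8 >= 0xe0 && c8 <= 0xef) {
			ps->need = 3;
		} else if (c8 >= 0xf0 && c8 <= 0xf4) {
			ps->need = 4;
		} else {
			errno = EILSEQ;	/* stray continuation or invalid lead */
			return (size_t)-1;
		}
		ps->buf[0] = c8;
		ps->len = 1;
		return 0;	/* incomplete: nothing stored yet */
	}
	/* The allowed range of the second byte depends on the lead byte. */
	if (ps->len == 1) {
		switch (ps->buf[0]) {
		case 0xe0: lo = 0xa0; break;	/* no overlong 3-byte forms */
		case 0xed: hi = 0x9f; break;	/* no surrogates */
		case 0xf0: lo = 0x90; break;	/* no overlong 4-byte forms */
		case 0xf4: hi = 0x8f; break;	/* nothing above U+10FFFF */
		}
	}
	if (c8 < lo || c8 > hi) {
		errno = EILSEQ;
		return (size_t)-1;
	}
	ps->buf[ps->len++] = c8;
	if (ps->len < ps->need)
		return 0;
	/* Complete: in a UTF-8 locale the output is the input bytes. */
	n = ps->len;
	memcpy(s, ps->buf, n);
	ps->len = ps->need = 0;
	return n;
}
```

Passing 0 after an incomplete sequence such as the single byte e2 discards the sequence and stores a NUL byte. This is the standard behaviour that the CAVEATS section contrasts with glibc 2.36.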