Module Name:    src
Committed By:   martin
Date:           Sun Jan 14 15:15:00 UTC 2024

Modified Files:
        src/usr.bin/mklocale [netbsd-10]: mklocale.1 yacc.y

Log Message:
Pull up following revision(s) (requested by rin in ticket #538):

        usr.bin/mklocale/yacc.y: revision 1.35
        usr.bin/mklocale/yacc.y: revision 1.36
        usr.bin/mklocale/mklocale.1: revision 1.18
        usr.bin/mklocale/mklocale.1: revision 1.19

mklocale: XXX: Neglect TODIGIT at the moment
PR lib/57798

It was implemented with an assumption that all digit characters
can be mapped to numerical values <= 255.
This is no longer true for Unicode, and results in, e.g., wrong
return values of wcwidth(3) for U+5146 or U+16B60.

As a workaround, neglect TODIGIT for now, as done for OpenBSD:
https://github.com/OpenBSD/src/commit/4efe9bdeb34
XXX

At least netbsd-10 should be fixed, but it requires some tests.

mklocale(1): Add range check for TODIGIT, rather than disabling it
PR lib/57798

The digit value specified by TODIGIT is stored in the lowest 8 bits of
_RuneType, see lib/libc/locale/runetype_file.h:
https://nxr.netbsd.org/xref/src/lib/libc/locale/runetype_file.h#56

The symptom reported in the PR is due to missing range check for
this value; values of 256 and above were mistakenly treated as
other flag bits in _RuneType.

For example, U+5146 has numerical value 1,000,000,000,000 ==
0xe8d4a51000, in which __BITS(30, 31) == _RUNETYPE_SW3 are turned on.

This is why wcwidth(3) returned 3 for this character.

This apparently affected not only character width, but also other
attributes stored in _RuneType.
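
The bit arithmetic described above can be sketched in a few lines of C.
This is only an illustration, not the mklocale code: the macro and
function names are hypothetical, the layout (digit value in the lowest
8 bits, screen width in bits 30..31) is taken from this log message, and
the digit is passed as a 64-bit value for clarity.

```c
#include <stdint.h>

/*
 * Assumed layout, taken from the log message: the digit value lives in
 * the lowest 8 bits and bits 30..31 hold the screen width.
 * All names here are hypothetical, not those of runetype_file.h.
 */
#define DIGIT_MAX	UINT32_C(0x000000ff)
#define WIDTH_MASK	UINT32_C(0xc0000000)
#define WIDTH_SHIFT	30

/* Without a range check: the value is OR'ed in, truncated to 32 bits. */
static uint32_t
store_digit_unchecked(uint32_t type, uint64_t digit)
{
	return type | (uint32_t)digit;
}

/* With a range check: values outside 1..255 are silently ignored. */
static uint32_t
store_digit_checked(uint32_t type, uint64_t digit)
{
	if (digit > 0 && digit <= DIGIT_MAX)
		return type | (uint32_t)digit;
	return type;
}

/* Extract the screen width field from bits 30..31. */
static unsigned
width_of(uint32_t type)
{
	return (unsigned)((type & WIDTH_MASK) >> WIDTH_SHIFT);
}
```

With these helpers, storing 1000000000000 without the check leaves
0xd4a51000 in the 32-bit word, so width_of() yields 3, while the checked
variant leaves the word untouched.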

IIUC, digit value attributes in _RuneType have never been utilized
until now, but preserve them if the digit value fits within (0, 256),
i.e., 1 through 255. This should be safer for pulling this up into
netbsd-10. Also, these attributes may be useful to implement some I18N
features as suggested by uwe@ in the PR.

netbsd-[98] are not affected, as they use the old UTF-8 ctype definitions.


To generate a diff of this commit:
cvs rdiff -u -r1.17 -r1.17.16.1 src/usr.bin/mklocale/mklocale.1
cvs rdiff -u -r1.34 -r1.34.8.1 src/usr.bin/mklocale/yacc.y

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/usr.bin/mklocale/mklocale.1
diff -u src/usr.bin/mklocale/mklocale.1:1.17 src/usr.bin/mklocale/mklocale.1:1.17.16.1
--- src/usr.bin/mklocale/mklocale.1:1.17	Mon Jul  3 21:34:20 2017
+++ src/usr.bin/mklocale/mklocale.1	Sun Jan 14 15:15:00 2024
@@ -1,4 +1,4 @@
-.\" $NetBSD: mklocale.1,v 1.17 2017/07/03 21:34:20 wiz Exp $
+.\" $NetBSD: mklocale.1,v 1.17.16.1 2024/01/14 15:15:00 martin Exp $
 .\" FreeBSD: src/usr.bin/mklocale/mklocale.1,v 1.6 1999/09/20 09:15:21 phantom Exp
 .\"
 .\" Copyright (c) 1993, 1994
@@ -33,7 +33,7 @@
 .\"
 .\"	@(#)mklocale.1	8.2 (Berkeley) 4/18/94
 .\"
-.Dd July 15, 2013
+.Dd January 5, 2024
 .Dt MKLOCALE 1
 .Os
 .Sh NAME
@@ -210,7 +210,11 @@ is the integer value represented by
 For example, the ASCII character
 .Sq 0
 would map to the decimal value 0.
-Only values up to 255 are allowed.
+On
+.Nx ,
+this information has never been used until now.
+Only values up to 255 are allowed, and mapping to 256 and above is
+silently ignored.
 .El
 .Pp
 The following keywords may appear multiple times and have the following

Index: src/usr.bin/mklocale/yacc.y
diff -u src/usr.bin/mklocale/yacc.y:1.34 src/usr.bin/mklocale/yacc.y:1.34.8.1
--- src/usr.bin/mklocale/yacc.y:1.34	Sun Oct 13 21:12:32 2019
+++ src/usr.bin/mklocale/yacc.y	Sun Jan 14 15:15:00 2024
@@ -1,4 +1,4 @@
-/*	$NetBSD: yacc.y,v 1.34 2019/10/13 21:12:32 christos Exp $	*/
+/*	$NetBSD: yacc.y,v 1.34.8.1 2024/01/14 15:15:00 martin Exp $	*/
 
 %{
 /*-
@@ -43,7 +43,7 @@
 static char sccsid[] = "@(#)yacc.y	8.1 (Berkeley) 6/6/93";
 static char rcsid[] = "$FreeBSD$";
 #else
-__RCSID("$NetBSD: yacc.y,v 1.34 2019/10/13 21:12:32 christos Exp $");
+__RCSID("$NetBSD: yacc.y,v 1.34.8.1 2024/01/14 15:15:00 martin Exp $");
 #endif
 #endif /* not lint */
 
@@ -390,11 +390,18 @@ set_digitmap(rune_map *map, rune_list *l
     while (list) {
 	rune_list *nlist = list->next;
 	for (i = list->min; i <= list->max; ++i) {
-	    if (list->map + (i - list->min)) {
+	    /*
+	     * XXX PR lib/57798
+	     * Currently, we support mapping up to 255. Attempts to map
+	     * 256 (== _RUNETYPE_A) and above are silently ignored.
+	     */
+	    _RuneType digit = list->map + (i - list->min);
+	    if (digit > 0 && digit <= 0xff) {
 		rune_list *tmp = (rune_list *)xmalloc(sizeof(rune_list));
+		memset(tmp, 0, sizeof(*tmp));
 		tmp->min = i;
 		tmp->max = i;
-		add_map(map, tmp, list->map + (i - list->min));
+		add_map(map, tmp, digit);
 	    }
 	}
 	free(list);
