Re: POSIX msgfmt and universal-character-name escape sequences

2022-06-28 Thread Bruno Haible via austin-group-l at The Open Group
Geoff Clare wrote: > In today's teleconference we discussed this and formulated the following > response... > > If a C17 source file contains calls to gettext family functions > that pass string literals containing \u sequences, xgettext will > write those string literals to the .po

Re: POSIX xgettext and the initial domain directive

2022-06-28 Thread Bruno Haible via austin-group-l at The Open Group
Geoff Clare wrote: > we struck out part about -d so that it reads: > > The first directive in each created dot-po file shall be a domain > directive giving the associated domain name, except that this > directive is optional in the default output file. > > This allows both the

Re: POSIX xgettext and the -s option

2022-06-28 Thread Bruno Haible via austin-group-l at The Open Group
Geoff Clare wrote: > > Suggestion: Remove the '-s' option from the standard. > > In today's teleconference we struck out the text relating to -s and > added to RATIONALE explaining why it is being omitted. Thank you! Bruno

Re: POSIX gettext(): lifetime of returned values

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
Geoff Clare wrote: > We believe that all of your comments have now been addressed. ... Once > you have reviewed this last change, we plan to clean up the document Thanks for the prompt. I have reviewed the specifications of msgfmt and xgettext, and sent 7 comments about them. Bruno

POSIX xgettext: -K option description

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Lines 1202..1211 In line 1164, the argument to the -K option is called 'pattern'. Issue: In lines 1202..1211 it is called 'keyword'. Suggestion: Use the same term 'pattern' here as well, instead of 'keyword'. Rationale: In the 1st, 3rd, and 4th case,

POSIX xgettext example

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 1293 Issue: The list of -K options is incomplete, as they don't handle the dgettext_l, dcgettext_l, dngettext_l, dcngettext_l function invocations. Suggestion: Add these options: -K gettext_l:1 -K dgettext_l:2 -K dcgettext_l:2 -K ngettext_l:1,2 -K

POSIX xgettext and the -s option

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Lines 1164, 1166, 1187, 1221-1222 Issue: The option '-s' has been found to be counter-productive in practice, and therefore has been deprecated in GNU gettext. See https://savannah.gnu.org/bugs/?61249 . Suggestion: Remove the '-s' option from the

POSIX xgettext and the initial domain directive

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 1183 "The first directive in each created dot-po file shall be a domain directive giving the associated domain name" GNU gettext currently does not do this. Solaris gettext does it. The msgfmt program allows the initial domain directive to be absent

POSIX msgfmt and newlines in strings

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 1067 "Unlike shell command language strings, double-quoted strings in dot-po files cannot contain a literal <newline> character." Issue: This sentence should be part of the specification of the dot-po file format. Suggestion: Move this sentence from the

POSIX msgfmt and escape sequences in msgid and msgid_plural strings

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 1031 "C-language escape sequences in message strings shall be processed as specified for character string literals in the ISO C standard ..." Issue: The way this is written, it is not possible to write, in a dot-po file: msgid "Program

POSIX msgfmt and universal-character-name escape sequences

2022-06-23 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 1031 "except that universal-character-name escape sequences need not be supported." Neither GNU msgfmt nor Solaris msgfmt treats universal-character-name escape sequences specially. If an msgstr contains e.g. "\\u20AC", the resulting string in the

Re: POSIX gettext(): lifetime of returned values

2022-06-22 Thread Bruno Haible via austin-group-l at The Open Group
Geoff Clare wrote: > > I hope this explains it: how gettext() can be implemented in a reasonable > > way, without limiting the use of uselocale(). > > In today's teleconference we changed the etherpad text to require that > uselocale() does not invalidate the returned string. That's great! Thank

Re: POSIX gettext(): changes to the .mo file

2022-05-26 Thread Bruno Haible via austin-group-l at The Open Group
Robert Elz wrote: > I would also guess that a side effect of the way it was described > is that changes to the on disc backing store (the .mo file, or > whatever) will not be detected while the application remains > running, and that aside from execing itself to restart clean > there is no way for

Re: POSIX gettext(): lifetime of returned values

2022-05-24 Thread Bruno Haible via austin-group-l at The Open Group
Thank you for the reply. Geoff Clare wrote: > > https://posix.rhansen.org/p/gettext_draft > > Line 357 > > ... > > If temporarily switching a thread's locale through uselocale() > > invalidates the gettext functions' results (even if only those from > > the same thread), it effectively disallows

Re: POSIX gettext(): behaviour if iconv() produces a replacement character

2022-05-24 Thread Bruno Haible via austin-group-l at The Open Group
Thank you for the reply. Geoff Clare wrote: > > https://posix.rhansen.org/p/gettext_draft > > Line 350 > > In today's call we made changes along the lines you suggest. Please > check the updated etherpad to see if they achieve what you wanted. The new text achieves what I wanted; thank you.

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-24 Thread Bruno Haible via austin-group-l at The Open Group
Thank you for the reply. Geoff Clare wrote: > > https://posix.rhansen.org/p/gettext_draft > > Line 573 > > In today's call we made changes along the lines you suggest. Please > check the updated etherpad to see if they achieve what you wanted. The change is good, from my POV. Thank you. Bruno

Re: POSIX gettext with option -s: handling of \c escape sequence

2022-05-24 Thread Bruno Haible via austin-group-l at The Open Group
Thanks for the reply. Geoff Clare wrote: > > This is NOT entirely how the gettext program from GNU gettext behaves. > > Namely, > > it also looks whether some of the strings contain a '\c' sequence, in order > > to > > emulate what BSD 'echo' does: > > > > $ gettext -s -e 'ab\c' | od -t c > >

Re: POSIX gettext(): multithread-safe or not?

2022-05-24 Thread Bruno Haible via austin-group-l at The Open Group
Thank you for the reply. https://posix.rhansen.org/p/gettext_draft Line 357 Geoff Clare wrote: > However, we have rearranged the wording in a way that > we hope makes it clearer it is a requirement on implementations. Thank you; it is clearer now. It would be even clearer if there was a

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
Steffen Nurpmeso wrote: > ... > | [.] "UTF-7"." > > That is overshoot. No. UTF-7 is invalid here because it produces output that is not NUL terminated. See: $ printf 'ab\0' | iconv -t UTF-7 | od -t c 000 a b + A A A - 007 strlen() on such a return value makes invalid

POSIX gettext: a typo

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 668 Typo: msgid_pural -> msgid_plural

POSIX gettext() and NLSPATH

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 130 "indicates that catopen( ) should look ..." What does the gettext family of functions do when NLSPATH is set to this value? Line 136 "indicates that the gettext family of functions ..." What does the catopen() function do when NLSPATH is set to this

POSIX and restrict

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Lines 163..230, 538..543 The 'restrict' keywords in these declarations are useless and - worse - forbid some valid, useful calls. For example, there is nothing wrong with dgettext("hello", "hello") which will attempt to search for a translation of

POSIX gettext(): choosing the domain name

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 50 "often named after the application that provides the collection" Issue: On my system, in /usr/share/locale/de/LC_MESSAGES/ there are 55 .mo files for libraries. Suggestion: Change "after the application" -> "after the application or library"

POSIX gettext(): behaviour if iconv() produces a replacement character

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 350 "If a significant proportion of the converted message string would consist of characters resulting from non-identical conversions ..." The term "significant proportion" is undefined. Suggestion: Change "If a significant proportion of the

POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 573 "The application shall ensure that the codeset argument, if non-empty, is a valid codeset name that can be used as the tocode argument of the iconv_open() function." This is not the only requirement. We also need the requirement that the NUL

POSIX gettext with option -s: handling of \c escape sequence

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Lines 699, 721 "if the -n option is not specified, a <newline> shall be written after the last message string" "(if -n is not also specified) append a <newline> to the output." This is NOT entirely how the gettext program from GNU gettext behaves. Namely, it also looks

POSIX msgfmt: effect of LC_CTYPE on PO file parsing

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 960 "Do we need to say this isn't used for message strings, only for parsing the .po file?" The .po file format has a mechanism for specifying the codeset of the PO file. See line 1009. Therefore LC_CTYPE is *not used* for the interpretation of the

POSIX gettext(): multithread-safe or not?

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 357 "The returned string shall not be ... invalidated by a subsequent call to a gettext family function." It is not clear whether this sentence is an assertion (regarding how the gettext() implementation behaves) or a requirement/restriction w.r.t.

POSIX gettext(): lifetime of returned values

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 357 "The returned string may be invalidated by ... a subsequent call to uselocale() in the same thread, except for calls that only query values." As explained in my mail from 2021-05-04 [1] uselocale() is a helper function to implement *_l

POSIX gettext(): Use of LANGUAGE in the POSIX locale

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Line 65 "The locale names in LANGUAGE shall take precedence over <...>" Issue: If this is true in all cases, then 1) programs such as 'diff' https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html - which are forced to produce a specific

POSIX gettext(): messages catalog lookup when LANGUAGE is set

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Lines 308..309 "o attempt to locate a suitable messages object..." o attempt to retrieve the string identified by msgid from the messages object" and line 342, 344 "the pathname used to locate the messages object shall be

POSIX gettext(): messages catalog lookup when LANGUAGE is not set

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Lines 335, 344 "For portable applications, only the LANGUAGE search supports searches across multiple locale names." "For the LANGUAGE search, ... if a locale name has the format language[_territory][.codeset][@modifier], additional searches of

Re: POSIX msgfmt and duplicate msgids

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
Eric Blake wrote: > In the msgfmt(1) utility, there is currently a difference between GNU > and Illumos implementations on detecting duplicate msgid strings, and > which command line switch(es) make detection of duplicates possible. > The question is whether GNU msgfmt would be willing to use the

Re: POSIX xgettext and dgettext() calls

2022-05-11 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_draft Lines 1173..1179 > on Solaris, the resulting .po file is called "foobar.po" and contains the > msgid "test". Confirmed; it's like this on OmniOS and OpenIndiana. > Running it on GNU, the resulting .po file is called "messages.po" and there > is no

Re: POSIX gettext() and uselocale()

2022-01-17 Thread Bruno Haible via austin-group-l at The Open Group
Geoff Clare wrote: > The current draft says: > > The returned string may be invalidated by a subsequent call to > bind_textdomain_codeset(), bindtextdomain(), setlocale(), or > textdomain() in the same process, or a subsequent call to > uselocale() in the same thread, except for

POSIX gettext() and uselocale()

2022-01-16 Thread Bruno Haible via austin-group-l at The Open Group
[First sent on 2021-05-03. Resending because it has not been handled.] https://posix.rhansen.org/p/gettext_draft says (line 358): "The returned string may be invalidated by a subsequent call to bind_textdomain_codeset(), bindtextdomain(), setlocale(), textdomain(), or uselocale()."

POSIX gettext() and the installation directories for .mo files

2022-01-16 Thread Bruno Haible via austin-group-l at The Open Group
[First sent on 2021-05-03. Resending because it has not been fully handled.] https://posix.rhansen.org/p/gettext_draft says (line 343..345) "For each locale name in LANGUAGE, or if LANGUAGE is not set or is empty, or no suitable messages object is found in processing LANGUAGE, the

Re: Question from Austin Group regarding standardization of msgfmt

2022-01-16 Thread Bruno Haible via austin-group-l at The Open Group
Hi, Eric Blake wrote: > The Austin Group (the standards body in charge of the POSIX document) > is trying to standardize the gettext(3) family of functions, as well > as command line tools such as gettext(1) and xgettext(1). You can > track the efforts here, and if you have comments, I'm happy

Re: Question regarding gettext behavior on iconv failure

2021-05-04 Thread Bruno Haible via austin-group-l at The Open Group
Carlos O'Donell wrote: > > 1 Empfaenger Chinese (??,???,??) ?? > > * For the second line of output, in the first three cases, iconv() > > did transliteration, and the result was always an ASCII string. > > (The quality of glibc's transliteration of Hanzi characters to > >

Re: Question regarding gettext behavior on iconv failure

2021-05-04 Thread Bruno Haible via austin-group-l at The Open Group
Eric Ackermann wrote: > please find attached another test case (a shortened version of the > example in the gettext proposal that Eric Blake linked). It uses the > same mail.po and mail-utf8.po files that you provided earlier. > When I compile and run it on Ubuntu 20.04 (Ubuntu GLIBC >

POSIX gettext() and the locale category

2021-05-03 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_split says (line 217): "All of the functions in the gettext family of functions, except dcgettext(), search for messages objects only in the LC_MESSAGES category." dcgettext_l, dcngettext, dcngettext_l also search in the specified category. Suggested

POSIX gettext() and chdir()

2021-05-03 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_split says (line 273): "The bindtextdomain() function shall not perform pathname resolution on dirname (that is done by the gettext family of functions)." This is indeed how GNU gettext and GNU libc behave. However, this is not optimal: 1) If the

POSIX gettext() and uselocale()

2021-05-03 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_split says (line 92): "The returned string may be invalidated by a subsequent call to bind_textdomain_codeset(), bindtextdomain(), setlocale(), textdomain(), or uselocale()." While in most programs setlocale(), textdomain(), bindtextdomain(),

POSIX gettext() and the installation directories for .mo files

2021-05-03 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_split says (line 77..79) "For each locale name in LANGUAGE, or if LANGUAGE is not set or is empty, or no suitable messages object is found in processing LANGUAGE, the pathname used to locate the messages object shall be

POSIX gettext() and the LANGUAGE environment variable

2021-05-03 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_split says (line 72) "For the LANGUAGE search, the value of the LANGUAGE environment variable shall be a list of one or more locale names separated by a colon (':') character." This is NOT how GNU gettext behaves. If POSIX standardizes it like this,

POSIX gettext() and iconv_open()

2021-05-03 Thread Bruno Haible via austin-group-l at The Open Group
https://posix.rhansen.org/p/gettext_split says (line 85) "The conversion shall be performed as if by a call to iconv() using a conversion descriptor returned by iconv_open(, )." This is NOT how GNU gettext behaves. If POSIX standardizes it like this, GNU libc and GNU gettext will have

Re: Question regarding gettext behavior on iconv failure

2021-05-03 Thread Bruno Haible via austin-group-l at The Open Group
Hi Eric, > The example in question set up several .po files and a specific > environment to test various pluralization/transcoding fallbacks, and > concludes with a snippet where a string with an encoding error in > ISO-8859-1 is output in spite of an iconv failure, rather than the > string

Re: SIGSTKSZ is now a run-time variable

2021-03-09 Thread Bruno Haible via austin-group-l at The Open Group
Eric Blake wrote: > I can open a defect against POSIX if we decide that is needed, but want > some consensus first on whether it is glibc's change that went too far, > or POSIX's requirements that are too restrictive for what glibc wants to do. Thanks for opening the discussion, Eric. Here are a

Re: Coordination on standardizing gettext() in future POSIX

2020-01-24 Thread Bruno Haible
Jörg Schilling wrote: > OK, then use: > > printf "$(gettext 'Hello World %d')\\n" $$ This is a good solution to the problem of gettext with sentences with embedded arguments. I withdraw my objection about "half of the necessary API". The other objection, that we should avoid copying the

Re: Coordination on standardizing gettext() in future POSIX

2020-01-22 Thread Bruno Haible
Jörg Schilling wrote: > > It is well-known that the escape sequence expansion in 'echo' was different > > in System V and BSD systems. You can assume that when Ulrich Drepper started > > out writing GNU gettext in 1995, he did NOT want to copy the System V > > behaviour > > of 'echo' into the

Re: Coordination on standardizing gettext() in future POSIX

2020-01-22 Thread Bruno Haible
Joerg Schilling wrote: > It is obvious that gettext(1) must expand escape sequences by default since > this is the documented default behavior for both Solaris gettext(1) and GNU > gettext(1) but in the default case, GNU gettext does not behave the way it is > documented. What you call the

Re: Coordination on standardizing gettext() in future POSIX

2020-01-21 Thread Bruno Haible
Hi Jörg, Regarding the gettext(1) program and whether it expands escape sequences by default: 1) [1] is ambiguous / self-contradictory. On one hand it says: This utility interprets C escape sequences such as \t for tab. Use \\ to print a backslash... Which sounds like they are expanded by

Re: Coordination on standardizing gettext() in future POSIX

2018-08-10 Thread Bruno Haible
Joerg Schilling wrote: > From looking at the current manuals available on Linux systems, I propose to > start with the manual pages from OpenSolaris since these manual pages seem to > be closer from being complete enough for the POSIX standard. The text from LI18NUX [1] and LSB [2] would be a

Re: Coordination on standardizing gettext() in future POSIX

2018-08-09 Thread Bruno Haible
Eric Blake wrote: > Jörg Schilling is interested in standardizing gettext() and friends in a > future version of POSIX (as a replacement to the hard-to-use catgets() > that is currently standardized). See > http://austingroupbugs.net/view.php?id=1122 Thanks for the heads-up. I added a couple