Geoff Clare wrote:
> In today's teleconference we discussed this and formulated the following
> response...
>
> If a C17 source file contains calls to gettext family functions
> that pass string literals containing \u sequences, xgettext will
> write those string literals to the .po
Geoff Clare wrote:
> we struck out the part about -d so that it reads:
>
> The first directive in each created dot-po file shall be a domain
> directive giving the associated domain name, except that this
> directive is optional in the default output file.
>
> This allows both the
Geoff Clare wrote:
> > Suggestion: Remove the '-s' option from the standard.
>
> In today's teleconference we struck out the text relating to -s and
> added text to RATIONALE explaining why it is being omitted.
Thank you!
Bruno
Geoff Clare wrote:
> We believe that all of your comments have now been addressed. ... Once
> you have reviewed this last change, we plan to clean up the document
Thanks for the prompt. I have reviewed the specifications of msgfmt and
xgettext, and sent 7 comments about them.
Bruno
https://posix.rhansen.org/p/gettext_draft
Lines 1202..1211
In line 1164, the argument to the -K option is called 'pattern'.
Issue: In lines 1202..1211 it is called 'keyword'.
Suggestion: Use the same term 'pattern' here as well, instead of 'keyword'.
Rationale: In the 1st, 3rd, and 4th case,
https://posix.rhansen.org/p/gettext_draft
Line 1293
Issue: The list of -K options is incomplete, as they don't handle the
dgettext_l, dcgettext_l, dngettext_l, dcngettext_l function invocations.
Suggestion: Add these options:
-K gettext_l:1 -K dgettext_l:2 -K dcgettext_l:2 -K ngettext_l:1,2 -K
https://posix.rhansen.org/p/gettext_draft
Lines 1164, 1166, 1187, 1221-1222
Issue: The option '-s' has been found to be counter-productive in practice,
and therefore has been deprecated in GNU gettext.
See https://savannah.gnu.org/bugs/?61249 .
Suggestion: Remove the '-s' option from the
https://posix.rhansen.org/p/gettext_draft
Line 1183
"The first directive in each created dot-po file shall be a domain directive
giving the associated domain name"
GNU gettext currently does not do this. Solaris gettext does it.
The msgfmt program allows the initial domain directive to be absent
https://posix.rhansen.org/p/gettext_draft
Line 1067
"Unlike shell command language strings, double-quoted strings in dot-po files
cannot contain a literal <newline> character."
Issue: This sentence should be part of the specification of the dot-po file
format.
Suggestion: Move this sentence from the
https://posix.rhansen.org/p/gettext_draft
Line 1031
"C-language escape sequences in message strings shall be processed as
specified for character string literals in the ISO C standard ..."
Issue: The way this is written, it is not possible to write, in a dot-po file:
msgid "Program
https://posix.rhansen.org/p/gettext_draft
Line 1031
"except that universal-character-name escape sequences need not be supported."
Neither GNU msgfmt nor Solaris msgfmt treats universal-character-name
escape sequences specially. If an msgstr contains e.g. "\\u20AC", the
resulting string in the
Geoff Clare wrote:
> > I hope this explains it: how gettext() can be implemented in a reasonable
> > way, without limiting the use of uselocale().
>
> In today's teleconference we changed the etherpad text to require that
> uselocale() does not invalidate the returned string.
That's great! Thank
Robert Elz wrote:
> I would also guess that a side effect of the way it was described
> is that changes to the on disc backing store (the .mo file, or
> whatever) will not be detected while the application remains
> running, and that aside from execing itself to restart clean
> there is no way for
Thank you for the reply.
Geoff Clare wrote:
> > https://posix.rhansen.org/p/gettext_draft
> > Line 357
> > ...
> > If temporarily switching a thread's locale through uselocale()
> > invalidates the gettext functions' results (even if only those from
> > the same thread), it effectively disallows
Thank you for the reply.
Geoff Clare wrote:
> > https://posix.rhansen.org/p/gettext_draft
> > Line 350
>
> In today's call we made changes along the lines you suggest. Please
> check the updated etherpad to see if they achieve what you wanted.
The new text achieves what I wanted; thank you.
Thank you for the reply.
Geoff Clare wrote:
> > https://posix.rhansen.org/p/gettext_draft
> > Line 573
>
> In today's call we made changes along the lines you suggest. Please
> check the updated etherpad to see if they achieve what you wanted.
The change is good, from my POV. Thank you.
Bruno
Thanks for the reply.
Geoff Clare wrote:
> > This is NOT entirely how the gettext program from GNU gettext behaves.
> > Namely, it also looks whether some of the strings contain a '\c' sequence,
> > in order to emulate what BSD 'echo' does:
> >
> > $ gettext -s -e 'ab\c' | od -t c
> >
Thank you for the reply.
https://posix.rhansen.org/p/gettext_draft
Line 357
Geoff Clare wrote:
> However, we have rearranged the wording in a way that
> we hope makes it clearer it is a requirement on implementations.
Thank you; it is clearer now. It would be even clearer if there was a
Steffen Nurpmeso wrote:
> ...
> | [.] "UTF-7"."
>
> That is overshoot.
No. UTF-7 is invalid here because it produces output that is not NUL
terminated. See:
$ printf 'ab\0' | iconv -t UTF-7 | od -t c
000 a b + A A A -
007
strlen() on such a return value makes invalid
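
A minimal C sketch of the point above, under stated assumptions: the seven bytes are copied from the od output in this message, and has_nul_byte is a hypothetical helper name, not part of any gettext or iconv API. It only checks that the UTF-7 form of 'ab\0' contains no zero byte, so strlen() on such a buffer would have nothing to stop at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The UTF-7 encoding of "ab\0", as shown by od above: a b + A A A - */
const unsigned char utf7_bytes[7] = { 'a', 'b', '+', 'A', 'A', 'A', '-' };

/* Hypothetical helper: returns 1 if buf[0..len) contains a NUL byte, else 0. */
int has_nul_byte(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return memchr(buf, '\0', len) != NULL;
}
```

Calling has_nul_byte(utf7_bytes, sizeof utf7_bytes) reports that no terminator is present, whereas the three bytes of the original "ab" string literal do contain one.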
https://posix.rhansen.org/p/gettext_draft
Line 668
Typo: msgid_pural -> msgid_plural
https://posix.rhansen.org/p/gettext_draft
Line 130
"indicates that catopen( ) should look ..."
What does the gettext family of functions do when NLSPATH is set to this
value?
Line 136
"indicates that the gettext family of functions ..."
What does the catopen() function do when NLSPATH is set to this
https://posix.rhansen.org/p/gettext_draft
Lines 163..230, 538..543
The 'restrict' keywords in these declarations are useless and - worse -
forbid some valid, useful calls. For example, there is nothing wrong
with
dgettext("hello", "hello")
which will attempt to search for a translation of
https://posix.rhansen.org/p/gettext_draft
Line 50
"often named after the application that provides the collection"
Issue: On my system, in /usr/share/locale/de/LC_MESSAGES/ there are
55 .mo files for libraries.
Suggestion: Change
"after the application"
->
"after the application or library"
https://posix.rhansen.org/p/gettext_draft
Line 350
"If a significant proportion of the converted message string would consist
of characters resulting from non-identical conversions ..."
The term "significant proportion" is undefined.
Suggestion: Change
"If a significant proportion of the
https://posix.rhansen.org/p/gettext_draft
Line 573
"The application shall ensure that the codeset argument, if non-empty, is a
valid codeset name that can be used as the tocode argument of the iconv_open()
function."
This is not the only requirement. We also need the requirement that the NUL
https://posix.rhansen.org/p/gettext_draft
Lines 699, 721
"if the -n option is not specified, a <newline> shall be written after the
last message string"
"(if -n is not also specified) append a <newline> to the output."
This is NOT entirely how the gettext program from GNU gettext behaves. Namely,
it also looks
https://posix.rhansen.org/p/gettext_draft
Line 960
"Do we need to say this isn't used for message strings, only for parsing
the .po file?"
The .po file format has a mechanism for specifying the codeset of the
PO file. See line 1009. Therefore LC_CTYPE is *not used* for the
interpretation of the
https://posix.rhansen.org/p/gettext_draft
Line 357
"The returned string shall not be ... invalidated by a subsequent call
to a gettext family function."
It is not clear whether this sentence is an assertion (regarding how the
gettext() implementation behaves) or a requirement/restriction w.r.t.
https://posix.rhansen.org/p/gettext_draft
Line 357
"The returned string may be invalidated by ... a subsequent call to
uselocale() in the same thread, except for calls that only query values."
As explained in my mail from 2021-05-04 [1]
uselocale() is a helper function to implement *_l
https://posix.rhansen.org/p/gettext_draft
Line 65
"The locale names in LANGUAGE shall take precedence over <...>"
Issue: If this is true in all cases, then
1) programs such as 'diff'
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
- which are forced to produce a specific
https://posix.rhansen.org/p/gettext_draft
Lines 308..309
"o attempt to locate a suitable messages object..."
o attempt to retrieve the string identified by msgid from the messages
object"
and line 342, 344
"the pathname used to locate the messages object shall be
https://posix.rhansen.org/p/gettext_draft
Lines 335, 344
"For portable applications, only the LANGUAGE search supports searches
across multiple locale names."
"For the LANGUAGE search, ... if a locale name has the format
language[_territory][.codeset][@modifier], additional searches of
Eric Blake wrote:
> In the msgfmt(1) utility, there is currently a difference between GNU
> and Illumos implementations on detecting duplicate msgid strings, and
> which command line switch(es) make detection of duplicates possible.
> The question is whether GNU msgfmt would be willing to use the
https://posix.rhansen.org/p/gettext_draft
Lines 1173..1179
> on Solaris, the resulting .po file is called "foobar.po" and contains the
> msgid "test".
Confirmed; it's like this on OmniOS and OpenIndiana.
> Running it on GNU, the resulting .po file is called "messages.po" and there
> is no
Geoff Clare wrote:
> The current draft says:
>
> The returned string may be invalidated by a subsequent call to
> bind_textdomain_codeset(), bindtextdomain(), setlocale(), or
> textdomain() in the same process, or a subsequent call to
> uselocale() in the same thread, except for
[First sent on 2021-05-03. Resending because it has not been handled.]
https://posix.rhansen.org/p/gettext_draft
says (line 358):
"The returned string may be invalidated by a subsequent call to
bind_textdomain_codeset(), bindtextdomain(), setlocale(),
textdomain(), or uselocale()."
[First sent on 2021-05-03. Resending because it has not been fully handled.]
https://posix.rhansen.org/p/gettext_draft
says (line 343..345)
"For each locale name in LANGUAGE, or if LANGUAGE is not set or is
empty, or no suitable messages object is found in processing LANGUAGE,
the
Hi,
Eric Blake wrote:
> The Austin Group (the standards body in charge of the POSIX document)
> is trying to standardize the gettext(3) family of functions, as well
> as command line tools such as gettext(1) and xgettext(1). You can
> track the efforts here, and if you have comments, I'm happy
Carlos O'Donell wrote:
> > 1 Empfaenger Chinese (??,???,??) ??
> > * For the second line of output, in the first three cases, iconv()
> > did transliteration, and the result was always an ASCII string.
> > (The quality of glibc's transliteration of Hanzi characters to
> >
Eric Ackermann wrote:
> please find attached another test case (a shortened version of the
> example in the gettext proposal that Eric Blake linked). It uses the
> same mail.po and mail-utf8.po files that you provided earlier.
> When I compile and run it on Ubuntu 20.04 (Ubuntu GLIBC
>
https://posix.rhansen.org/p/gettext_split
says (line 217):
"All of the functions in the gettext family of functions, except
dcgettext(), search for messages objects only in the LC_MESSAGES
category."
dcgettext_l, dcngettext, dcngettext_l also search in the specified
category.
Suggested
https://posix.rhansen.org/p/gettext_split
says (line 273):
"The bindtextdomain() function shall not perform pathname resolution
on dirname (that is done by the gettext family of functions)."
This is indeed how GNU gettext and GNU libc behave. However, this is
not optimal:
1) If the
https://posix.rhansen.org/p/gettext_split
says (line 92):
"The returned string may be invalidated by a subsequent call to
bind_textdomain_codeset(), bindtextdomain(), setlocale(),
textdomain(), or uselocale()."
While in most programs setlocale(), textdomain(), bindtextdomain(),
https://posix.rhansen.org/p/gettext_split
says (line 77..79)
"For each locale name in LANGUAGE, or if LANGUAGE is not set or is
empty, or no suitable messages object is found in processing LANGUAGE,
the pathname used to locate the messages object shall be
https://posix.rhansen.org/p/gettext_split
says (line 72)
"For the LANGUAGE search, the value of the LANGUAGE environment
variable shall be a list of one or more locale names separated
by a colon (':') character."
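
For illustration only, here is a small C sketch of the colon-separated list described by the quoted draft wording. The function name language_element is hypothetical, and the sketch says nothing about how any existing implementation treats LANGUAGE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the index-th colon-separated element of
   `language` into `out`. Returns 1 on success, 0 if there is no such
   element or it does not fit into `out`. */
int language_element(const char *language, size_t index,
                     char *out, size_t outsize)
{
    assert(language != NULL && out != NULL);
    const char *start = language;
    for (;;) {
        /* Length of the current element, up to the next ':' or the end. */
        size_t len = strcspn(start, ":");
        if (index == 0) {
            if (len + 1 > outsize)
                return 0;
            memcpy(out, start, len);
            out[len] = '\0';
            return 1;
        }
        if (start[len] == '\0')
            return 0;           /* ran out of elements */
        start += len + 1;       /* skip the element and its ':' */
        index--;
    }
}
```

With the value "de_AT:de:en", element 1 is "de", and asking for element 3 fails because the list has only three entries.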
This is NOT how GNU gettext behaves. If POSIX standardizes it like this,
https://posix.rhansen.org/p/gettext_split
says (line 85)
"The conversion shall be performed as if by a call to iconv() using a
conversion descriptor returned by iconv_open(,
)."
This is NOT how GNU gettext behaves. If POSIX standardizes it like this,
GNU libc and GNU gettext will have
Hi Eric,
> The example in question set up several .po files and a specific
> environment to test various pluralization/transcoding fallbacks, and
> concludes with a snippet where a string with an encoding error in
> ISO-8859-1 is output in spite of an iconv failure, rather than the
> string
Eric Blake wrote:
> I can open a defect against POSIX if we decide that is needed, but want
> some consensus first on whether it is glibc's change that went too far,
> or POSIX's requirements that are too restrictive for what glibc wants to do.
Thanks for opening the discussion, Eric.
Here are a
Jörg Schilling wrote:
> OK, then use:
>
> printf "$(gettext 'Hello World %d')\\n" $$
This is a good solution to the problem of gettext with sentences with
embedded arguments.
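
The same pattern at the C level can be sketched as follows. This is an assumption-laden illustration rather than anything proposed in the thread: format_greeting is a hypothetical helper, and the sketch assumes a libc that provides <libintl.h>. The whole sentence is looked up as one msgid, and the argument is substituted only afterwards.

```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: formats the (possibly translated) greeting with one
   integer argument into buf and returns what snprintf() returns. */
int format_greeting(char *buf, size_t size, int value)
{
    assert(buf != NULL && size > 0);
    /* gettext() returns the msgid itself when no translation is found. */
    return snprintf(buf, size, gettext("Hello World %d"), value);
}
```

When no message catalog is in effect, the untranslated msgid is used as the format string.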
I withdraw my objection about "half of the necessary API".
The other objection, that we should avoid copying the
Jörg Schilling wrote:
> > It is well-known that the escape sequence expansion in 'echo' was different
> > in System V and BSD systems. You can assume that when Ulrich Drepper started
> > out writing GNU gettext in 1995, he did NOT want to copy the System V
> > behaviour of 'echo' into the
Joerg Schilling wrote:
> It is obvious that gettext(1) must expand escape sequences by default since
> this is the documented default behavior for both Solaris gettext(1) and GNU
> gettext(1) but in the default case, GNU gettext does not behave the way it is
> documented.
What you call the
Hi Jörg,
Regarding the gettext(1) program and whether it expands escape sequences
by default:
1) [1] is ambiguous / self-contradictory.
On one hand it says:
This utility interprets C escape sequences such as \t for tab. Use \\ to
print a backslash...
Which sounds like they are expanded by
Joerg Schilling wrote:
> From looking at the current manuals available on Linux systems, I propose to
> start with the manual pages from OpenSolaris since these manual pages seem to
> be closer to being complete enough for the POSIX standard.
The text from LI18NUX [1] and LSB [2] would be a
Eric Blake wrote:
> Jörg Schilling is interested in standardizing gettext() and friends in a
> future version of POSIX (as a replacement to the hard-to-use catgets()
> that is currently standardized). See
> http://austingroupbugs.net/view.php?id=1122
Thanks for the heads-up. I added a couple