Re: [xz-devel] Translation platform for XZ ?

Lasse Collin Sun, 03 Mar 2019 11:47:21 -0800

Hello! I'm sorry for the delay.

On 2019-02-22 Mario Blättermann wrote:
> Am Donnerstag, 21. Februar 2019, 18:38:06 CET schrieb Lasse Collin:
> > On 2019-02-17 Mario Blättermann wrote:  
> > > It would be nice if xz would be integrated into a global
> > > translation platform.  
> > 
> > Benno Schulenberg asked me about this in 2016. I didn't want to
> > think about it at that moment and then it was forgotten. :-/ Let's
> > try again now.
> >   
> He's CCed from now on.

:-) The xz-devel list only allows subscribers to post, which you
probably already noticed. This is a bit inconvenient, but it keeps spam
away. I still get the rejected messages in my inbox.

> > I worry that it's not that simple. My experience is that I need to
> > look through the translations because most have had some errors in
> > aligning columns in --help and --list outputs. In some cases it has
> > taken several tries until a translator has gotten it correctly
> > done. 
> > 
> > There is debug/translation.bash to see the translations in action,
> > and there are instructions in README section 4. Multiple
> > translators having similar problems suggests that there's a problem
> > in my code or instructions, but I don't know how to improve.
> >  
> In some translations, the --help output is split into two gettext
> messages: the option itself and the corresponding description. This
> way, translators don't have to bother with indentations, tab widths
> and so on. But this behavior I haven't found very often.
> Unfortunately, I don't have any coding skills, that's why I won't be
> able to help you.

I think some GNU packages use "argp" for the --help output where the
messages are split as you described. argp can be convenient and I
understand why translators may like it too. On the other hand, raw
strings give translators more control over how --help is shown (e.g. they
can change the column where the description starts for all messages)
which might be useful in some (rare) cases.

argp is not in POSIX. argp is available in gnulib, so it isn't too hard
to add it into a package. The gnulib implementation is under LGPLv3+.
xz is public domain because LZMA SDK is; I didn't want to use a more
restrictive license than the original compression code does. Thus I
don't want to use argp in xz. (There is GNU getopt_long in xz but it's
not a big problem because a compatible enough version is available on
many OSes, including all BSDs.)

Obviously argp isn't the only way to split the --help messages. I
haven't searched for other ready-made solutions though because so far I
haven't had much interest in this.

A bit more off-topic, but I'm posting it here anyway in case someone
finds it interesting or even has the knowledge and energy to improve the
relevant argp code:

Splitting the strings in --help works perfectly only if the library is
sophisticated enough. Things are simple in US-ASCII, ISO-8859-*, and
such character sets, but nowadays UTF-8 is the most common. In UTF-8 a
single Unicode code point can use 1-4 bytes and each code point may use
0, 1, or 2 columns in a terminal. If these things aren't handled
properly, the --help output won't look perfect.

I tested GNU tar --help under de_DE (ISO-8859-1) and de_DE.UTF-8. I
think argp uses bytes to calculate string lengths and thus gets it
wrong under a UTF-8 locale:

      --group-map=DATEI      DATEI benutzen, um GIDs und Namen der Besitzer
                             abzubilden
      --mode=ÄNDERUNGEN     den (symbolischen) Modus ÄNDERUNGEN für
                             hinzugefügte Dateien erzwingen

"den (symbolischen)" is misaligned because the Ä in ÄNDERUNGEN is two
bytes and argp thinks it takes two columns of space, while in reality
those two bytes use only one column. With ISO-8859-1 locale the
alignment is correct.
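
A sketch of column-based padding, assuming the description should start
at column 29 as in the tar output above. pad_option() and
pad_option_bytes() are hypothetical helpers; the second one mimics
padding with a printf field width, which counts bytes:

```c
#include <stdio.h>
#include <string.h>

/* Display columns of a UTF-8 string, ASSUMING one column per code point. */
static size_t utf8_columns(const char *s)
{
    size_t cols = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
        if ((*p & 0xC0) != 0x80)  /* skip continuation bytes (10xxxxxx) */
            ++cols;

    return cols;
}

/* Copy `option` into `buf` followed by enough spaces that the next text
 * starts at display column `desc_col` (0-based); at least one space is
 * always added. Returns the number of bytes written, excluding the NUL,
 * or 0 if `buf` is too small. */
static size_t pad_option(char *buf, size_t size, const char *option,
                         size_t desc_col)
{
    size_t len = strlen(option);
    size_t cols = utf8_columns(option);
    size_t spaces = cols < desc_col ? desc_col - cols : 1;

    if (len + spaces + 1 > size)
        return 0;

    memcpy(buf, option, len);
    memset(buf + len, ' ', spaces);
    buf[len + spaces] = '\0';
    return len + spaces;
}

/* The naive way: a printf field width counts bytes, so each extra byte
 * of a multibyte character costs one column of padding. */
static void pad_option_bytes(char *buf, size_t size, const char *option,
                             int desc_col)
{
    snprintf(buf, size, "%-*s", desc_col, option);
}
```

For "      --mode=ÄNDERUNGEN" (24 bytes, 23 columns), pad_option() adds six
spaces and the description lands on column 29, while the byte-based
version adds only five, reproducing the misalignment shown above.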

The same problem causes line-wrapping to happen too early. ISO-8859-1
version first (converted to UTF-8 for email), then UTF-8:

      --xattrs-include=MASKE das Einschluss-Muster für xattr-Schlüssel angeben
      --xattrs-include=MASKE das Einschluss-Muster für xattr-Schlüssel
                             angeben

  -P, --absolute-names       führende »/«-Zeichen in den Dateinamen erhalten
  -P, --absolute-names       führende „/“-Zeichen in den Dateinamen
                             erhalten
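
Early wrapping follows from the same mistake. The sketch below is a
greedy word wrapper (hypothetical, not argp's actual algorithm) that
measures words either in bytes or in columns. It assumes descriptions
start at column 29, a right margin of 79 columns, and one column per
code point:

```c
#include <stddef.h>
#include <string.h>

/* Width of the word w[0..nbytes-1]: its byte count, or its column count
 * under the ASSUMPTION that each UTF-8 code point is one column wide. */
static size_t word_width(const char *w, size_t nbytes, int by_columns)
{
    if (!by_columns)
        return nbytes;

    size_t cols = 0;
    for (size_t i = 0; i < nbytes; ++i) {
        /* Skip continuation bytes (10xxxxxx). */
        if (((unsigned char)w[i] & 0xC0) != 0x80)
            ++cols;
    }

    return cols;
}

/* Greedy word wrap: return how many lines `text` needs when each line
 * starts at column `indent` and no line may be longer than `rmargin`
 * columns. Words are separated by spaces. */
static size_t wrapped_lines(const char *text, size_t indent, size_t rmargin,
                            int by_columns)
{
    size_t lines = 1;
    size_t col = indent;    /* column just after the last placed word */
    int line_empty = 1;     /* no word placed on the current line yet */
    const char *p = text;

    while (*p == ' ')
        ++p;

    while (*p != '\0') {
        const char *space = strchr(p, ' ');
        size_t nbytes = space != NULL ? (size_t)(space - p) : strlen(p);
        size_t w = word_width(p, nbytes, by_columns);

        if (!line_empty && col + 1 + w > rmargin) {
            /* The word doesn't fit: start a new, indented line. */
            ++lines;
            col = indent + w;
        } else {
            col += (line_empty ? 0 : 1) + w;
        }
        line_empty = 0;

        p += nbytes;
        while (*p == ' ')
            ++p;
    }

    return lines;
}
```

Under those assumptions, "das Einschluss-Muster für xattr-Schlüssel
angeben" ends at column 78 when measured in columns and fits on one
line, but measured in bytes it would reach column 80, so "angeben" is
moved to a second line, matching the output above.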

These aren't the translators' fault, but they still make the translated
program look slightly sloppy.

> > I wonder if a few experienced translators should look at this first so
> > that possible problems on my side can be fixed. It doesn't sound
> > great if I get 30 new translations and 25 need similar fixes and I
> > need to explain them to each translator separately.
> >   
> Once a new translation arrives (assuming the TP robot sends it to
> this list) I will have a look at it.

I'm not sure if I understood correctly. If you meant that the TP would
send the ready-made translations to xz-devel, I guess it's a problem
due to only subscribers being able to post to xz-devel. I had thought
the translations could be sent directly to me but now I'm unsure if
that is flexible enough.

Benno Schulenberg wrote:
> For the --help output, I wouldn't worry much about the alignment; it's
> much more important that the translation is clear and grammatically
> correct.

I agree that correct language is much more important than the
alignment. However, I think it's way easier to get the alignment right
than to make a good translation, so if the hard part is done, it would be
nice if the easy part gets done too. :-)

> For the --list output... I've looked at the Dutch output
> of xz-5.2.2 (that's installed on my machine) and it is... quite
> misaligned.  Not looking good.

Oh. :-( If so, it's my fault too, since I thought I had checked them
before committing.

> Maybe have a look at df in coreutils.  It used to have problems with
> alignment of the column headers too, but they changed things so that
> each column header is translated separately and they are aligned
> automatically.  Or maybe have a look at util-linux -- I think it has
> a mechanism/library to create properly aligned tables.

Thanks! I quickly looked at df and I see it has code that handles the
various issues in getting the alignment right. :-) I think I cannot use
that code in xz for license reasons, but on the other hand I don't need
such fancy features in xz either, I think. There already is some
multibyte-aware code in xz because some languages use fancy characters
for thousand separators, and those need to be handled correctly to get
the alignment right.

Splitting the strings for --list is much easier than for --help (without
an external library) as --list doesn't need word wrapping. Perhaps I
should check whether it is easy enough to change --list to separate strings,
or at least part of it. I suppose splitting even a few strings should
make translations easier and less error prone.
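
A rough idea of how split --list strings could be aligned, in the spirit
of what df does: each column header is a separate translatable string,
and the program computes each column's width from the widest of the
header and its values. column_width() and append_cell() are hypothetical
names, the one-column-per-code-point assumption still applies, and in
real code the headers would be passed through gettext:

```c
#include <stddef.h>
#include <string.h>

/* Display columns of a UTF-8 string, ASSUMING one column per code point. */
static size_t utf8_columns(const char *s)
{
    size_t cols = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
        if ((*p & 0xC0) != 0x80)  /* skip continuation bytes (10xxxxxx) */
            ++cols;

    return cols;
}

/* Width of a column: the widest of its (translated) header and values. */
static size_t column_width(const char *header, const char *const *values,
                           size_t nvalues)
{
    size_t width = utf8_columns(header);

    for (size_t i = 0; i < nvalues; ++i) {
        size_t cols = utf8_columns(values[i]);
        if (cols > width)
            width = cols;
    }

    return width;
}

/* Append `cell` to the NUL-terminated string in `buf`, preceded by `gap`
 * spaces and right-aligned in a field `width` columns wide.
 * Returns 0 on success or -1 if `buf` is too small. */
static int append_cell(char *buf, size_t size, const char *cell,
                       size_t width, size_t gap)
{
    size_t used = strlen(buf);
    size_t len = strlen(cell);
    size_t cols = utf8_columns(cell);
    size_t pad = gap + (cols < width ? width - cols : 0);

    if (used + pad + len + 1 > size)
        return -1;

    memset(buf + used, ' ', pad);
    memcpy(buf + used + pad, cell, len + 1);  /* copies the NUL too */
    return 0;
}
```

For example, with a hypothetical translated header "Ströme" (6 columns,
7 bytes) and the values "1" and "12", the column becomes 6 columns wide
and the value row is built as "      12" (two spaces of gap plus four of
padding), so it lines up with the header no matter how long the
translation is.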

> If that is too much work, then adding a translator instruction (hint)
> as a comment before the relevant string might help a bit.  Normally
> one then adds --add-comments=TRANSLATORS to the invocation of
> xgettext.

There already are TRANSLATORS-comments and they show up in xz.pot too.

> Op 01-03-19 om 21:18 schreef Mario Blättermann:
> > XZ could now be added to the TP.  
> 
> Okay for me.  But I need a direct request from Lasse,

From other emails I understood that the situation of the existing
translations and translators is now clear, so I can ask that XZ
Utils become part of the TP. I think version 5.2.4 is a decent
starting point:

    https://tukaani.org/xz/xz-5.2.4.tar.xz

5.2.5 will probably have no changes in translatable strings unless I
suddenly decide to split the strings in --list. I probably don't want
to rush that into a bug-fix release, though, because I fear regressions.

I plan to add this to README; I hope it's good enough:

    The translations are handled via the Translation Project. If you
    wish to help translating xz, please join the Translation Project:

        http://translationproject.org/html/translators.html

> plus whether he wants to receive a notification when a translator
> uploads an update, and if yes on which email address.

I would like an email to my personal address (not xz-devel) with a URL
to the translation.

> And whether he wants the translators to have signed a disclaimer
> (normally only required for GNU software,
> https://translationproject.org/html/whydisclaim.html).

The original strings are in the public domain and the translations
should be too (strictly speaking: as far as PD is legally possible). I
think it's enough if this is written in the .po files like it is in the
existing .po files. That is, I don't request any physical papers or
such things.

Thanks to everyone involved for helping with this!

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
