https://bugs.kde.org/show_bug.cgi?id=434995

            Bug ID: 434995
           Summary: wish: Check for simple string features and compare
                    original text and translation
           Product: lokalize
           Version: 20.12.3
          Platform: Archlinux Packages
                OS: Linux
            Status: REPORTED
          Severity: wishlist
          Priority: NOR
         Component: editor
          Assignee: sdepi...@gmail.com
          Reporter: war...@gmx.de
                CC: sha...@ukr.net
  Target Milestone: ---

Dear devs,

I just finished reviewing a large PO file that had many discrepancies between
the original text and the translation in how the strings ended. In particular,
the full stop was used inconsistently in many tooltip texts. It occurred to me
that this is rather easy to detect programmatically, so I am writing up a wish
for it.

Especially in larger files, with either many strings (large applications) or
long texts (think handbooks), checking such details is tedious work and is
therefore usually not done in one session. I can imagine it would be of great
assistance if there were an automated process that gathers some simple string
metrics and compares them between the two languages in a file.

Some metrics I can think of:
- the ending of the string (punctuation mark)
- number of sentences
- number of line breaks
- the number of placeholders ("%" + number)
- presence of plural forms
- presence and count of HTML tags (not necessarily entities such as &kde;, but
structural things like <param>, <span> or <strong>)
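
To illustrate, here is a minimal sketch in Python of how such metrics might be
collected and compared. It is not Lokalize code: the names string_metrics and
mismatches, the regular expressions and the set of ending characters are all
assumptions made for this example. Plural forms are left out because they are a
property of the whole entry rather than of a single string, and the sentence
count is deliberately naive.

```python
import re

# Assumed patterns for this sketch; real files may use other placeholder styles.
PLACEHOLDER_RE = re.compile(r"%\d+")                        # "%" + number
TAG_NAME_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")  # <span>, </strong>
SENTENCE_END_RE = re.compile(r"[.!?]+(?=\s|$)")             # naive sentence end
ENDINGS = ".!?:;,…"                                         # assumed punctuation set


def string_metrics(text):
    """Collect the simple metrics from the list above for one string."""
    last = text.rstrip()[-1:]  # empty string if the text is empty
    return {
        "ending": last if last and last in ENDINGS else "",
        "sentences": len(SENTENCE_END_RE.findall(text)),
        "line_breaks": text.count("\n"),
        "placeholders": sorted(PLACEHOLDER_RE.findall(text)),
        "tags": sorted(name.lower() for name in TAG_NAME_RE.findall(text)),
    }


def mismatches(source, translation):
    """Return the names of the metrics that differ between the two strings."""
    src = string_metrics(source)
    dst = string_metrics(translation)
    return [name for name in src if src[name] != dst[name]]
```

With these assumptions, mismatches("Open file %1.", "Datei %1 öffnen") reports
"ending" and "sentences", while a pair that differs only in its wording reports
nothing.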

Ideally, each metric could be toggled by the user, because not all of them
apply to every use case. For example, comparing the number of sentences is not
useful when translating long paragraphs of a handbook, because the text
sometimes needs to be restructured due to language peculiarities.

When a metric differs between a string’s original text and its translation, the
entry would be marked in some way, perhaps with a new status. A string that
would currently be shown as Finished but has a mismatch would instead be marked
as Caution (or a better-fitting term). For any translation status other than
Finished, I think the user has to look at the string anyway, so in such cases
there is no need to point them to the metrics.
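
As a rough sketch of that rule, again in Python and not based on any existing
Lokalize code: the function name displayed_status and its parameters are
hypothetical, and "Caution" is only the placeholder term suggested above. The
enabled set stands in for the per-metric toggles.

```python
def displayed_status(status, mismatched, enabled):
    """Return the status the editor would show for one entry.

    status     -- current translation status, e.g. "Finished"
    mismatched -- names of metrics that differ between source and translation
    enabled    -- names of metrics the user has switched on
    """
    relevant = set(mismatched) & set(enabled)
    # Only entries that would otherwise look done are flagged; everything
    # else has to be looked at by the translator anyway.
    if status == "Finished" and relevant:
        return "Caution"
    return status
```

A mismatch in a metric that the user has switched off therefore leaves a
Finished entry untouched.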

I also don’t think it’s necessary to show this in the project overview, first
and foremost because of the performance impact, but also because, as I said,
the metrics depend on the context. They are not really a quantitative measure
but rather a qualitative one, meant for a quick overview or a final sanity
check before committing the file. It would thus suffice to do the comparison
only in the editor tab of an individual file.

What do you think?
