Perhaps it would be a good idea to compare the translated text to the text that the user wants to save.
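[Editorial sketch: a comparison like the one suggested here could be done with a simple character-level similarity ratio. The helper names and the 0.95 cutoff below are illustrative assumptions, not an existing ContentTranslation API.]

```python
# Hypothetical sketch: flag a translation the user barely post-edited,
# by comparing the raw machine output with the text being saved.
# Uses Python's standard difflib; 0.95 mirrors the suggestion in this
# mail and is an assumption, not a real ContentTranslation setting.
from difflib import SequenceMatcher

def unedited_ratio(machine_text: str, user_text: str) -> float:
    """How similar the user's text is to the raw machine output (0..1)."""
    return SequenceMatcher(None, machine_text, user_text).ratio()

def looks_unedited(machine_text: str, user_text: str,
                   cutoff: float = 0.95) -> bool:
    """True if the user changed so little that the text looks unedited."""
    return unedited_ratio(machine_text, user_text) >= cutoff

mt = "Oppland er en fylket i Norge."     # raw machine output
fixed = "Oppland er et fylke i Norge."   # human-corrected version
print(looks_unedited(mt, mt))      # identical text -> True
print(looks_unedited(mt, fixed))   # small real corrections lower the ratio
```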
If they are more than 95% the same, that would mean that the user didn't
make the effort to correct the text.

Cheers,
Micru

On Wed, May 3, 2017 at 10:31 AM, Wojciech Pędzich <wpedz...@gmail.com> wrote:
> It does depend a lot on the engagement level of the human behind the
> keyboard. When I deal with machine-translated text, I simply wonder
> whether the person behind the keyboard took the effort to actually read
> the piece.
>
> Now, whether this would work if limited to namespaces outside "main" -
> I do not want to demonise the issue, but if the person submitting the
> text for machine translation does not read it, what will stop them from
> a quick ctrl+c / ctrl+v? Just asking.
>
> Wojciech
>
> On 2017-05-03 at 09:33, Yaroslav Blanter wrote:
>
>> Creating machine translations only in the draft space (or in the user
>> space in the projects which do not have a draft space) could help.
>>
>> Cheers
>> Yaroslav
>>
>> On Tue, May 2, 2017 at 10:16 PM, Pharos
>> <pharosofalexand...@gmail.com> wrote:
>>
>>> I think it all depends on the level of engagement of the human
>>> translator.
>>>
>>> When the tool is used in the right way, it is a fantastic tool.
>>>
>>> Maybe we can find better methods to nudge people toward taking their
>>> time and really working on their translations.
>>>
>>> Thanks,
>>> Pharos
>>>
>>> On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal
>>> <bodhisattwa.rg...@gmail.com> wrote:
>>>
>>>> Content translation with Yandex is also a problem on the Bengali
>>>> Wikipedia. Some users have developed a tendency to create
>>>> meaningless machine-translated articles with this extension to
>>>> increase their edit and article counts. This has increased the
>>>> workload of admins, who have to find and delete those articles.
>>>>
>>>> Yandex is not ready for many languages, and it is better to shut it
>>>> off. We don't need it in Bengali.
>>>>
>>>> Regards
>>>>
>>>> On May 3, 2017 12:17 AM, "John Erling Blad" <jeb...@gmail.com> wrote:
>>>>
>>>>> Actually, this _is_ about turning ContentTranslation off; that is
>>>>> what several users in the community want. They block people using
>>>>> the extension and delete the translated articles. Use of
>>>>> ContentTranslation has become a rather contentious case.
>>>>>
>>>>> Yandex as a general translation engine, used to be able to read
>>>>> some alien language, is quite good, but as an engine to produce
>>>>> written text it is not very good at all. In fact it often creates
>>>>> quite horrible Norwegian, even for closely related languages. One
>>>>> quite common problem is reordering of words into meaningless
>>>>> constructs; another is handling lexical gender in weird ways. The
>>>>> English article "a" is often translated as "en" in a prepositional
>>>>> phrase, and then the gender is added to the following word. That
>>>>> turns a translation of "Oppland is a county in…" into something
>>>>> like "Oppland er en fylket i…", when it should be "Oppland er et
>>>>> fylke i…".
>>>>>
>>>>> (I just checked, and it seems like Yandex messes up a lot less now
>>>>> than previously, but it is still pretty bad.)
>>>>>
>>>>> Apertium works because the languages are closely related; Yandex
>>>>> does not work because it is used between very different languages.
>>>>> People try Yandex, get disappointed, and falsely conclude that all
>>>>> machine translations are equally weird. They are not, but Yandex
>>>>> translations are.
>>>>>
>>>>> The numerical threshold does not work. The reason is simple: the
>>>>> number of fixes depends on which language constructs fail, and that
>>>>> is simply not a constant for small text fragments.
>>>>> Perhaps we could flag specific language constructs that are known
>>>>> to give a high percentage of failures, and require the translator
>>>>> to check those sentences. One such construct is a mismatch between
>>>>> the preposition and the gender of the following word in a
>>>>> prepositional phrase: if they do not agree, the sentence must be
>>>>> checked. It is not always wrong to write "en jenta" in Norwegian,
>>>>> but it is likely to be wrong.
>>>>>
>>>>> A language model could be a statistical model of the language
>>>>> itself, not of the translation into that language. We don't want a
>>>>> perfect language model, just one sufficient to mark weird
>>>>> constructs. A very simple solution could be to mark trigrams that
>>>>> do not already exist in the text base for the destination language
>>>>> as possible errors. It is not necessary to do a live check, but it
>>>>> should at least be done before the page can be saved.
>>>>>
>>>>> Note the difference between what Yandex does and what we want to
>>>>> achieve: Yandex translates a text between two different languages,
>>>>> without any clear purpose in mind. It is not too important whether
>>>>> there are weird constructs in the text, as long as it is usable in
>>>>> "some" context. We translate a text for the purpose of republishing
>>>>> it. The text should be usable and easily readable in that language.
>>>>>
>>>>> On Tue, May 2, 2017 at 7:07 PM, Amir E. Aharoni
>>>>> <amir.ahar...@mail.huji.ac.il> wrote:
>>>>>
>>>>>> 2017-05-02 18:20 GMT+03:00 John Erling Blad <jeb...@gmail.com>:
>>>>>>
>>>>>>> Brute force solution: turn ContentTranslation off. Really stupid
>>>>>>> solution.
>>>>>>
>>>>>> ...
>>>>>> Then I guess you don't mind that I'm changing the thread name :)
>>>>>>
>>>>>>> The next solution: turn the Yandex engine off. That would solve
>>>>>>> a part of the problem. Kind of a lousy solution, though.
>>>>>>>
>>>>>>> What about adding a language model that warns when the language
>>>>>>> constructs get too weird? It is like a "test" for the
>>>>>>> translation. CT is used for creating a translation, but the
>>>>>>> language model is used for verifying whether the translation is
>>>>>>> good enough. If it does not validate against the language model,
>>>>>>> it should simply not be published to the main namespace. It will
>>>>>>> still be possible to create a draft, but then the user is
>>>>>>> completely aware that the translation isn't good enough.
>>>>>>>
>>>>>>> Such a language model should be available as a test for any
>>>>>>> article, as it can be used as a quality measure for the article.
>>>>>>> It is really a quantitative measure of the well-spokenness of
>>>>>>> the article, but that isn't quite so intuitive.
>>>>>>
>>>>>> So, I'll allow myself to guess that you are talking about one
>>>>>> particular language, probably Norwegian.
>>>>>>
>>>>>> Several technical facts:
>>>>>>
>>>>>> 1. In the past there were several cases in which translators to
>>>>>> different languages reported common translation mistakes to me. I
>>>>>> passed them on to Yandex developers, with whom I communicate quite
>>>>>> regularly. They acknowledged receiving all of them. I am aware of
>>>>>> at least one such common mistake that was fixed; possibly there
>>>>>> were more.
>>>>>> If you can give me a list of such mistakes for Norwegian, I'll be
>>>>>> very happy to pass them on. I absolutely cannot promise that they
>>>>>> will be fixed upstream, but it's possible.
>>>>>>
>>>>>> 2. In Norwegian, Apertium is used for translating between the two
>>>>>> varieties of Norwegian itself (Bokmål and Nynorsk), and from other
>>>>>> Scandinavian languages. That's probably why it works so well: they
>>>>>> are similar in grammar, vocabulary, and narrative style. (I'll
>>>>>> pass that on to Apertium developers; I'm sure they'll be happy to
>>>>>> hear it.) Unfortunately, machine translation from English is not
>>>>>> available in Apertium. Apertium works best with very similar
>>>>>> languages, and English has two characteristics which are
>>>>>> unfortunate when combined: it is both the most popular source for
>>>>>> translation into almost all other languages (including Norwegian),
>>>>>> and it is not _very_ similar to any other language (except maybe
>>>>>> Scots). Machine translation from English into Norwegian is only
>>>>>> possible with Yandex at the moment. More engines may be added in
>>>>>> the future, but for now that's all we have. That's why disabling
>>>>>> Yandex completely would indeed be a lousy solution: a lot of
>>>>>> people say that without machine translation integration Content
>>>>>> Translation is useless. Not all users think like that, but many
>>>>>> do.
>>>>>>
>>>>>> 3. We can define a numerical threshold for the acceptable
>>>>>> percentage of machine translation post-editing. Currently it's
>>>>>> 75%.
>>>>>> It's a tad embarrassing that it's hard-coded at the moment, but
>>>>>> it can very easily be made into a per-language variable. If the
>>>>>> translator tries to publish a page in which less than that is
>>>>>> modified, a warning is shown.
>>>>>>
>>>>>> 4. I'm not sure what you mean by "language model". If it's any
>>>>>> kind of linguistic engine, then it's definitely not within the
>>>>>> resources that the Language team itself can currently dedicate.
>>>>>> However, if somebody who knows Norwegian and some programming
>>>>>> writes a script that analyzes common bad constructs in a Wikipedia
>>>>>> dump, that will be very useful. This would basically be an
>>>>>> upgraded version of suggestion #1 above. (In my spare time, as a
>>>>>> volunteer, I'm doing something comparable for Hebrew, although not
>>>>>> for translation but for improving how MediaWiki link trails work.)
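[Editorial sketch: the trigram check John proposes above, which is close to the dump-analysis script Amir describes in point 4, could look roughly like this. The corpus sentences, function names, and whitespace tokenization are toy assumptions, not real dump data or an existing tool.]

```python
# Hypothetical sketch: build word-trigram counts from existing
# target-language text (in practice, a Wikipedia dump) and flag
# trigrams in a new translation that the corpus has never seen.
from collections import Counter

def trigrams(text):
    """All consecutive word triples in a sentence, lowercased."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def build_model(corpus_sentences):
    """Count every trigram observed in the corpus."""
    model = Counter()
    for sentence in corpus_sentences:
        model.update(trigrams(sentence))
    return model

def suspicious_trigrams(model, sentence):
    """Trigrams never seen in the corpus: candidates for manual review."""
    return [t for t in trigrams(sentence) if model[t] == 0]

# Toy stand-in for a dump of well-formed Norwegian text.
corpus = ["oppland er et fylke i norge", "akershus er et fylke i norge"]
model = build_model(corpus)

print(suspicious_trigrams(model, "oppland er et fylke i norge"))   # -> []
print(suspicious_trigrams(model, "oppland er en fylket i norge"))  # flags the bad gender
```

As the thread notes, this need not run live; it would be enough to run it once before the page is saved, and a real version would need a much larger corpus to keep false positives down.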
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wikimedia-l mailing list, guidelines at:
>>>>>> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>>>>>> https://meta.wikimedia.org/wiki/Wikimedia-l
>>>>>> New messages to: Wikimedia-l@lists.wikimedia.org
>>>>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
--
Etiamsi omnes, ego non