On 13.11.2018 18:12, Mark Thomas wrote:
Removing the [OT] marker as I think this is very much on topic.

On 13/11/2018 15:33, André Warnier (tomcat) wrote:
Holy Smoke (Fumée Sacrée | Sagrado Humo | Heiliger Rauch)!
How many messages are in that code?

Currently there are 2747 unique terms.

Seems time to add some AI-translate add-on to the code.

That is supported but it has to be paid for. That was something I was
thinking about. I have 10k characters of free translation (POEditor uses
either Google translate or Microsoft Automatic Translation) with my
POEditor account. The Tomcat messages average ~67.5 characters per
message so those free credits should be able to translate just under 150
messages.

To put it another way, automatic translation of the 2000 untranslated
French messages would cost less than $10 USD.
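
As a rough check of the character counts above, here is a minimal Java sketch. The class and method names are made up for illustration. It only reproduces the character arithmetic (10k free characters, ~67.5 characters per message, 2000 untranslated French messages). The per-character price is not stated in this thread, so no cost is computed.

```java
// Back-of-the-envelope check of the character budget discussed above.
// TranslationBudget is a hypothetical helper, not part of Tomcat or POEditor.
final class TranslationBudget {

    // Whole messages that fit into a character budget at a given average length.
    static long messagesCovered(long budgetChars, double avgCharsPerMessage) {
        return (long) Math.floor(budgetChars / avgCharsPerMessage);
    }

    // Approximate number of characters needed to translate a number of messages.
    static long charsNeeded(long messages, double avgCharsPerMessage) {
        return Math.round(messages * avgCharsPerMessage);
    }

    public static void main(String[] args) {
        // 10k free characters at ~67.5 characters per message
        System.out.println(messagesCovered(10_000, 67.5)); // prints 148
        // 2000 untranslated French messages
        System.out.println(charsNeeded(2_000, 67.5));      // prints 135000
    }
}
```

So the free credits cover 148 messages ("just under 150"), and the untranslated French messages come to roughly 135k characters.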

Hmm. The Tomcat project has a little over GBP 800 in the bank to cover
the up-front costs of the next Tomcat conference.

Here is a thought. I try automatic translation of as many French
messages as I can with the 10k free characters. You review them (you can
filter by automatic translation and then mark them as proof read). If
you think the automatic translations are worthwhile, I get the PMC to
vote on spending some of that money on automatic translation. For
example, if we spent ~$55 we could do automatic translation for just
over 10 complete languages.

Are you up for that?


I was half-kidding, but what I was really thinking of was a Valve which would use some AI to translate the outgoing messages on the fly.

The vast majority of the messages I have seen so far (and attempted to translate into French) are error messages, which go either to the logs (the majority, I presume) or to the user as some kind of error response (for which the status code should be identifiable). A good number of the terms in them (50%?) are either untranslatable or should not be translated because they refer to class names and the like (so, "reserved words"). The rest looks like a limited vocabulary and "filler" words, such as "can", "cannot", "disallowed", "parameter", "directory", "request", "response", "committed", and so on.

The majority of the messages also look like they would make sense only to an audience of programmers, who are used to dealing with English-only programming languages (only Java in this case) and who (I believe) are not so picky about the finer points of style or syntax (well, at least not the ones I know) ;-).

Also, one really needs such a translation only when things go wrong or during development/testing, so it could be off by default and turned on only when needed, using some dynamic parameter (the Manager, anyone?).

All of that suggests to me that it may make sense, and should not be so difficult, to apply some automated (and optional) translation to them on the fly. And such a thing may save *a lot* of maintenance and contributed time over the years, don't you think?
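
To make the idea a bit more concrete, here is a minimal Java sketch of the translation step only. Everything in it is an assumption for illustration: the class name `MessageTranslator`, the two-word vocabulary, and the rule for spotting "reserved words" (any token containing a dot or an uppercase letter) are all hypothetical. It does not use the actual Valve API or any real translation service. A plain lookup table stands in for the "AI".

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: word-by-word translation that leaves "reserved words"
// (class names and the like) untouched and can be switched on and off.
final class MessageTranslator {

    private final Map<String, String> vocabulary;
    private volatile boolean enabled = false; // off by default

    MessageTranslator(Map<String, String> vocabulary) {
        this.vocabulary = Map.copyOf(vocabulary);
    }

    // Could be driven by some dynamic parameter at runtime.
    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    // Crude heuristic (an assumption): anything with a dot or an uppercase
    // letter is treated as a class name or similar and is never translated.
    static boolean isReserved(String token) {
        if (token.indexOf('.') >= 0) {
            return true;
        }
        return token.chars().anyMatch(Character::isUpperCase);
    }

    // Translates known lowercase words; everything else passes through as-is.
    // Words with punctuation attached will not match the vocabulary.
    String translate(String message) {
        if (!enabled) {
            return message;
        }
        StringJoiner out = new StringJoiner(" ");
        for (String token : message.split(" ")) {
            out.add(isReserved(token) ? token : vocabulary.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MessageTranslator t = new MessageTranslator(
                Map.of("directory", "dossier", "disallowed", "interdit"));
        t.setEnabled(true);
        System.out.println(
                t.translate("directory [webapps] disallowed for org.apache.catalina.Context"));
    }
}
```

With the translator enabled, the sample message comes out as "dossier [webapps] interdit for org.apache.catalina.Context": the class name survives and the untranslated "for" gives exactly the kind of Frenglish one would expect from such a scheme.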

Note that this is not in any way meant to denigrate the enthusiasm and literary talent of the people who have contributed so far. But let's face it: due to the very nature of the beast itself, to the length limit etc., most of what comes out looks like Denglish or Frenglish or Spanglish anyway (and has to, to be really helpful). So we might as well bite the bullet. Also, AI sounds hot again nowadays, and having the first Apache software which implements an automatic on-the-fly translator-assistant for messages should be a hit, no?

As a final marketing spiel, I would add that the inevitable initial vagaries of the AI assistant would probably add much enjoyment to the arduous task of debugging one's code. And if one can switch the language on the fly, it may even fulfill some educational purpose.



