I am wondering whether it would be a good idea for there to be a list of 
numbered preset sentences adopted as an international standard. If Google 
then chose to front-end Google Translate with precise translations of that 
list, made by professional linguists who are native speakers, the result 
would be a system that produces a precise translation for any sentence on 
the list and a machine translation for everything else.
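
For concreteness, a minimal sketch of the lookup-then-fallback scheme described 
above, in Python; PRESET_TRANSLATIONS, translate_mt() and the sample entries are 
hypothetical placeholders, not any real list or API:

    PRESET_TRANSLATIONS = {
        # (sentence number, target language) -> translation vetted by a
        # native-speaker linguist
        (1, "fr"): "Où se trouve la gare la plus proche ?",
        (2, "fr"): "J'ai besoin d'un médecin.",
    }

    def translate_mt(text, target_lang):
        """Placeholder for the machine translation backend."""
        raise NotImplementedError

    def translate(text, target_lang, sentence_number=None):
        """Use the vetted preset translation when the sentence is on the
        list; fall back to machine translation otherwise."""
        key = (sentence_number, target_lang)
        if sentence_number is not None and key in PRESET_TRANSLATIONS:
            return PRESET_TRANSLATIONS[key]
        return translate_mt(text, target_lang)
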
Phrase-based machine translation already goes much further: it pairs up far 
more sentences than would fit into any standard with a limited code inventory 
such as Unicode, and it pairs up phrases as well. The fact that translations 
are not precise is a problem of context and of natural language per se.

Maybe there could then just be two special Unicode characters, one to indicate 
that the number of a preset sentence is to follow and one to indicate that the 
number has finished.
That would belong in a higher-level protocol, not in Unicode.
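
To illustrate: a higher-level protocol can mark sentence numbers with an 
ordinary character sequence and an application-level convention, with no new 
code points at all. A minimal sketch with entirely made-up delimiters:

    import re

    # Made-up application-level delimiters; ordinary characters, no new
    # Unicode code points required.
    REF_START = "{{STD-SENT:"
    REF_END = "}}"

    def encode_ref(number):
        """Embed a preset-sentence number in running text."""
        return REF_START + str(number) + REF_END

    def decode_refs(text):
        """Extract all preset-sentence numbers from running text."""
        pattern = re.escape(REF_START) + r"(\d+)" + re.escape(REF_END)
        return [int(n) for n in re.findall(pattern, text)]

    # decode_refs("Please help. " + encode_ref(42))  ->  [42]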

If that were the case, there might well not be symbols for the sentences, 
yet messages could still be conveyed precisely, as envisaged in the 
simulations.
The sentences will be as precise as the scope of the sentence inventory 
allows. Enumerating sentences or phrasal fragments (I'm hesitant to talk of 
"phrases", which for me have constituent nature, but maybe that's just me) is 
unrealistic unless you are trying to cover only a /very/ limited domain. If 
all you encode is, say, requests for meals with the 100 most frequently 
wanted combinations of nutritional restrictions, your sentence inventory will 
encode those requests precisely; but as soon as you try to adjust your 
formulaic requests (you're willing to eat /any/ vegetarian, gluten-free meal 
at any time of day and on any day of the year? of /any/ size?), the sentences 
are of no use anymore.

This is really why an approach that enumerates large text chunks is 
unworkable. (I won't say "useless", but it is of limited use; "point-at-me" 
picture books and imprecise translations are likely to do a tolerable job 
already.) The number of sentences you need grows exponentially in the number 
of ingredient options you intend to vary over.

In any case, we are all left guessing about the intended coverage of whatever 
set of sentences you have in mind. From your previous writings I'm guessing 
(as implied earlier) that you mean something like "travel and emergency 
communication", but that is already a large domain. If you try to delimit the 
coverage and come up with a finite list of sentences, you will see that you 
end up with far too many. You would also need to think about how to make 
these sentences accessible (via number/ID? that would be difficult, or would 
require training for the user, unless the number of sentences is very small).

What if you only want the inventory of a travel phrasebook? For that, you 
have the travel phrasebook (hierarchically organized, not by number), and I 
have heard of limited-domain computers/apps for crisis situations (the 
details elude me at the moment).
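
To put a number on the combinatorial point: with just ten independent yes/no 
options, a precise inventory would already need 2^10 = 1024 preset sentences. 
A toy sketch (the option names are made up):

    from itertools import product

    # Hypothetical yes/no meal options; each independent choice doubles the
    # number of distinct requests that would have to be enumerated.
    options = ["vegetarian", "gluten-free", "halal", "kosher", "nut-free",
               "dairy-free", "low-sodium", "child-sized", "hot", "takeaway"]

    combinations = list(product((False, True), repeat=len(options)))
    print(len(options), "binary options ->", len(combinations), "requests")
    # 10 binary options -> 1024 requests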

Perhaps that is the way forward for some aspects of communication across the 
language barrier.
You would need to specify precisely which problems you are attempting to 
solve, what is wrong with the approaches presently available, and why/how 
your approach does a better job.

Stephan
