The proposed transliteration mechanism, although already quite flexible
thanks to its rule mechanism, suffers from one principal weakness: the
correct output sometimes depends on the morphology of the underlying word.

For example, in the Arabic ZDMG transcription, the sequence [Xuwwx]
(i.e. strong consonant - damma - waw + shadda - vowel) can be
transliterated in two ways: as [X 016B 0077 x] or as [X 0075 0077 0077 x],
depending on whether the first w represents the long vowel u or the
consonant w. That distinction cannot be made from the Arabic script alone.
To transcribe this correctly, the system needs detailed knowledge of
Arabic noun and verb paradigms, which is probably beyond the scope of
rule-based transliteration in the ICU framework.
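To make the ambiguity concrete, here is a minimal Python sketch (not ICU code; every name in it is hypothetical). A purely character-level rule sees the same letters and marks in both cases, so it has to default to one reading, here u + ww, because shadda normally doubles the consonant. The other reading, long u + w, can only be chosen with outside knowledge, represented here by a hypothetical lookup set of words standing in for real morphological analysis. The consonant table is a tiny illustrative subset, not a full ZDMG mapping, and the input is an artificial vocalised string rather than a real Arabic word.

```python
import unicodedata

# Arabic vowel signs and letters used by this sketch (Unicode code points).
FATHA, DAMMA, KASRA, SHADDA = "\u064E", "\u064F", "\u0650", "\u0651"
WAW = "\u0648"
SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i"}
# Hypothetical, deliberately tiny consonant table (not a full ZDMG mapping).
CONSONANTS = {"\u0642": "q", "\u0645": "m", WAW: "w"}


def split_clusters(word):
    """Group each base letter with the combining marks written on it."""
    clusters = []
    for ch in unicodedata.normalize("NFC", word):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].add(ch)  # mark order does not matter here
        else:
            clusters.append((ch, set()))
    return clusters


def transliterate(word, long_vowel_words=frozenset()):
    """Character-level rules, plus a lookup set standing in for morphology."""
    out = []
    for base, marks in split_clusters(word):
        cons = CONSONANTS.get(base, "?")
        if SHADDA in marks:
            cons *= 2  # default rule: shadda doubles the consonant ("ww")
        out.append(cons)
        out.extend(SHORT_VOWELS[m] for m in marks if m in SHORT_VOWELS)
    text = "".join(out)
    if word in long_vowel_words:
        # The script looks identical; only outside knowledge says the first
        # w is the long vowel, so "uww" becomes long u plus consonant w.
        text = text.replace("uww", "\u016Bw")
    return text


# An artificial vocalised string: qaf+damma, waw+shadda+fatha, mim.
word = "\u0642" + DAMMA + WAW + SHADDA + FATHA + "\u0645"
print(transliterate(word))          # quwwam
print(transliterate(word, {word}))  # qūwam
```

The lookup set is the crux: without it, the function has no way to tell the two readings apart, which is exactly the limitation described above for a purely rule-based transliterator.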

I admit that this is a highly specialized case, but I can imagine
similar cases arising in other language/script combinations as well.
Unless one designs an extremely complicated ruleset, automatic
transliteration will not achieve 100% accuracy (though I don't know
whether that is your goal). Such a ruleset, however, would go well
beyond the scope of character-based transliteration.

Greetings
 Philipp                            mailto:[EMAIL PROTECTED]
__________________________
With searching comes loss / And the presence of absence / The server, not found


