Bob Hallissy wrote:
> NB: One of the complexities you may run into, and which will limit your
> options, is that your encoding may store text in a different order than
> Unicode requires. If this is the case, TECkit can do the rearrangement for
> you but I'm not sure ICU will easily do that. Certainly the current
> standard for XML-based descriptions of encoding mappings as given in
> Unicode Technical Report 22 (see
> http://www.unicode.org/unicode/reports/tr22/ ) cannot express such
> mappings.

Someone pointed out to me recently that UTR#22 can in fact express Indic
visual-to-logical mappings, provided that one chooses the whole Indic
"syllable" as the mapping unit. E.g.:

        <a b="69 73 6B 27" u="0930 094D 0938 094D 0915 093F" c="र्स्कि" />
        <!-- matraI+halfSa+Ka+Repha = Ra+Virama+Sa+Virama+Ka+matraI -->

Of course, this requires very big tables, which could be avoided by using a
smarter mechanism. Moreover, it only works with well-formed sequences in an
anticipated set of languages, and fails with misspellings or new
orthographies.
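
To make the idea concrete, here is a rough Python sketch of how a converter
might apply such a syllable-level table, using greedy longest-match lookup.
Only the single table entry is taken from the example above; the function
name, the longest-match strategy and the use of U+FFFD for unmapped bytes are
assumptions made for illustration, not something UTR#22 prescribes.

```python
# Syllable-level table: each key is a whole visual-order byte sequence,
# each value the corresponding logical-order Unicode string.
# The one entry below is the <a b="..." u="..."/> example from this post;
# a real table would need one entry per anticipated syllable.
SYLLABLES = {
    bytes.fromhex("69736B27"): "\u0930\u094D\u0938\u094D\u0915\u093F",
}

# Longest key in the table, so the search never looks further than needed.
MAX_LEN = max(len(key) for key in SYLLABLES)


def visual_to_logical(data: bytes) -> str:
    """Convert bytes to Unicode by greedy longest-match table lookup.

    Hypothetical helper: bytes that match no table entry are replaced
    with U+FFFD, one replacement character per unmatched byte.
    """
    out = []
    i = 0
    while i < len(data):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(MAX_LEN, len(data) - i), 0, -1):
            chunk = data[i:i + n]
            if chunk in SYLLABLES:
                out.append(SYLLABLES[chunk])
                i += n
                break
        else:
            # No entry covers this position: an unanticipated sequence.
            out.append("\uFFFD")
            i += 1
    return "".join(out)
```

With this one-entry table, the bytes 69 73 6B 27 convert to the six-character
logical sequence, while a truncated input such as 69 73 matches nothing and
comes out as replacement characters. That is the failure mode for sequences
the table did not anticipate.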

_ Marco
