Hi Aditya,

--- Aditya Gokhale <[EMAIL PROTECTED]> wrote:
> I had few query regarding representation of Devanagari script in Unicode
> (Code page - 0x0900 - 0x097F). Devanagari is a writing script, is used in
> Hindi, Marathi and Sanskrit languages. I have following questions -
>
> In the same script code page, how do I use these two different Glyphs, to
> represent the same character ? Is there any way by which I can do it in
> an Open type font and Free type font implementation ?

Yes, it is certainly possible with an OpenType font. Please note that FreeType is not a font format; it is a rendering library used to rasterize different kinds of fonts, including TrueType and OpenType fonts.

In an OpenType font you can include all the glyphs with alternate shapes and then select one of them depending on the script and language. The application should specify the script and language tags when sending character codes to the OpenType rendering library/engine. All substitutions then take place according to the language and/or script selection. There should be a default script in the font. Similarly, each script has a default language, which is used as the fallback if the application does not specify which language to use for processing.

From the list of alternate glyphs, you may want to use the glyph for the default language as the entry in the cmap table. This default glyph can then be substituted by an alternate glyph depending on the language specified. You have to use the GSUB table and write a language-dependent lookup for the substitution.

> 2. Implementation Query -
> In an implementation where I need to send / process Hindi, Marathi
> and Sanskrit data, how do I differentiate between languages (Hindi,
> Marathi and Sanskrit). Say for example, I am writing a translation
> engine, and I want to translate a document having Hindi, Marathi and
> Sanskrit Text in it, how do I know from the code points between 0x0900
> and 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?

Unicode is not divided into code pages. Unlike some older encodings, the entire Unicode standard is a single code space. For readability and quick reference, the code charts are divided into blocks, which you might be interpreting as code pages.

> I would suggest that we should give different code pages for Marathi,
> Hindi and Sanskrit.
> May be current code page of Devanagari can be traded
> as Hindi and two new code pages for Marathi and Sanskrit be added. This
> could solve these issues. If there is any better way of solving this, any
> one suggest.

Unicode assigns code points to scripts, not to languages. In fact, it is not desirable to give separate code points to individual languages that share the same script. Also, Unicode encodes characters, which have an abstract meaning and properties; it does not encode glyphs. The glyph shapes shown in the Unicode charts are given just for convenience and do not necessarily represent the shapes to be used in a font. The shape of the glyph for a Unicode character may vary from one font to another. Since it is already possible to select the proper glyph(s) depending on the language selection, this scheme is suitable for all Indian languages.

> 3. Character codes for jna, shra, ksh -
>
> In Sanskrit and Marathi jna, shra and ksh are considered as separate
> characters and not ligatures. How do we take care of this ? Can I get
> over all views on the matter from the group ? In my opinion they should
> be given different code points in the specific language code page.
> Please find below the character glyphs -
>
> jna
> shra
> ksh

All of the above can be composed from the following consonant clusters:

jna  -> ja halant nya
shra -> sha halant ra
ksh  -> ka halant ssha

The point that the above sequences are considered characters in some Indian languages has merit. If there is demand from native speakers, a proposal can be submitted to Unicode; there is a predefined procedure for proposal submission. Once this has been discussed with the people concerned and agreed upon, these ligatures can be added to the Devanagari script itself, because the Devanagari script represents all three languages you mentioned, namely Sanskrit, Marathi, and Hindi. Meanwhile, you can write rules for composing them from the consonant clusters.
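
As a rough illustration of such rules, here is a minimal Python sketch that builds the three clusters from their code points. The table name CONJUNCTS and the helpers compose() and describe() are hypothetical names of my own, not part of any library; the only library call used is unicodedata.name from the Python standard library. I am also assuming the usual mapping of the romanized names above to Devanagari letters (nya = U+091E, ssha = U+0937).

```python
import unicodedata

# Devanagari code points used by the three clusters.
KA, JA, NYA = "\u0915", "\u091C", "\u091E"
RA, SHA, SSA = "\u0930", "\u0936", "\u0937"
HALANT = "\u094D"  # named DEVANAGARI SIGN VIRAMA in Unicode

# Hypothetical rule table: romanized name -> consonant cluster.
CONJUNCTS = {
    "jna": JA + HALANT + NYA,
    "shra": SHA + HALANT + RA,
    "ksh": KA + HALANT + SSA,
}


def compose(name):
    """Return the code point sequence for a named conjunct."""
    return CONJUNCTS[name]


def describe(text):
    """List the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    for name in CONJUNCTS:
        cluster = compose(name)
        codes = " ".join("U+%04X" % ord(ch) for ch in cluster)
        print(name, cluster, codes)
```

Note that this only produces the character sequence. Whether a cluster is displayed as a single conjunct shape still depends on the font and the rendering engine.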

Regards,
Keyur