On Wed, Dec 9, 2015 at 5:18 AM, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
>
> I suggest using HTML:
>
> ब<sup>क्ष</sup>
>
This will work only if end users always view the text in a web browser. It would help if the Unicode Standard itself supported generalised superscript and subscript text. I think the meaning of the text should be contained in the text itself rather than depending on external markup and viewers. That way the author of the content does not have to depend on which Unicode-compliant text viewer or editor the end user has: the text should retain its meaning in any of them.

Similarly, if the text has to be stored in a database without losing its meaning, then either it has to be saved with the markup of every available editor, or some special processing is needed to convert the stored markup into the markup of each text viewer and editor. Generalised Unicode support for superscript and subscript would solve all these problems.

The following use case shows where general Unicode support for superscript/subscript would help tremendously:

A math teacher (गणिताचे शिक्षक) in a Marathi (मराठी) language school is writing notes in her Unicode-compliant plain-text editor to explain mathematical terms to her students. The following excerpt from the notes explains terms such as exponent (घातांक) and base (पाया). An English translation is given below.

"जेव्हा एखाद्या संख्येचा स्वतःशीच अनेक वेळा गुणाकार होतो तेव्हा त्या गुणाकाराला थोडक्यात लिहिण्याच्या पद्धतीला घातांक असे म्हणतात. उदाहरणार्थ, ५ ही संख्या जर स्वतःशी ३ वेळा गुणली जात असेल, म्हणजे ५ x ५ x ५, तर त्याला घातांक पद्धतीत ५^३ असे लिहितात. ह्या घातांकीय रचनेला "५ चा ३ रा घात" असे म्हणतात. आपण अजून एक उदाहरण घेऊया, "२ चा १० वा घात", म्हणजे २ ही संख्या स्वतःशी १० वेळा गुणली गेली आहे. ह्याला आपण २^१० असे लिहितो. तर साधारणपणे, कुठलीही संख्या ब जेव्हा स्वतःशी क्ष वेळा गुणली जाते तेव्हा त्याला घातांक पद्धतीत ब^क्ष असे लिहितात, आणि त्या रचनेला "ब चा क्ष वा घात" असे म्हणतात.
इथे ब ह्या संख्येला पाया म्हणतात आणि क्ष ह्या संख्येला घात असे म्हणतात. तर थोडक्यात, घातांकीय रचनेला पाया^घात असे लिहितात."

English translation:

"An exponent is a shorthand notation for multiplying a number by itself a number of times. For example, if the number 5 is multiplied by itself 3 times, i.e. 5 x 5 x 5, it is written in exponential form as 5^3. This exponential term is read as "5 raised to the power of 3". Consider another example, "2 raised to the power of 10", i.e. 2 multiplied by itself 10 times. This is written in exponential form as 2^10. So, in general, any number b that is multiplied by itself k times is written as b^k, and the term is read as "b raised to the power of k". The number b is called the base, and the number k is called the exponent. In short, an exponential term is written as base^exponent."

Please note that the teacher had to use a circumflex accent (caret) to indicate superscript. This is an unwritten convention she was forced into because Unicode lacks proper superscript support. To make the text available to a wider audience and still retain its meaning, the teacher has to rely partly on Unicode, partly on the markup available in her students' text viewers, partly on the markup available in her peer reviewers' text editors, and partly on unwritten conventions (such as the caret). This conundrum can be resolved only if the Unicode Standard supports superscript and subscript in general.

The standard already has a section for superscripts and subscripts. Generalising and extending this support would help other languages and scripts. General support for all characters, words and sentences could be achieved with just three new formatting characters, e.g. SCR, SUP and SUB, similar to the way other formatting characters such as ZWSP, ZWJ and ZWNJ are defined.
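The existing superscript characters cover the ASCII digits and a handful of Latin letters and symbols, but nothing in Devanagari. A minimal Python sketch makes the gap concrete. It tries to turn the teacher's caret notation into real superscripts, and it can only do so when the exponent is made of ASCII digits. The function name `caret_to_superscript` is hypothetical, for illustration only.

```python
import re

# Superscript digits the Unicode Standard already encodes:
# U+2070, U+00B9, U+00B2, U+00B3 and U+2074..U+2079 (for 0..9).
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)

def caret_to_superscript(text: str) -> str:
    """Hypothetical helper: replace '^' + ASCII digits with superscript digits.

    Any other exponent (Devanagari digits, Devanagari letters, conjuncts such
    as क्ष, or Latin words) has no encoded superscript form, so it is left
    in caret notation.
    """
    return re.sub(r"\^([0-9]+)", lambda m: m.group(1).translate(SUPERSCRIPT_DIGITS), text)

print(caret_to_superscript("5^3 and 2^10"))  # 5³ and 2¹⁰
print(caret_to_superscript("५^३ and २^१०"))  # ५^३ and २^१० (unchanged)
print(caret_to_superscript("ब^क्ष"))         # ब^क्ष (unchanged)
```

The English version of the notes converts cleanly. The Marathi version, which uses Devanagari digits and letters, has to keep the caret.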
The new formatting characters could be defined as follows:

SCR: In a character stream, all characters following this formatting character shall be treated as normal text until either the end of the stream or the next SUP or SUB character is reached. This shall be the default: if no marker is specified, the text shall be treated as normal text until either the end of the stream or the next SUP or SUB character is reached.

SUP: In a character stream, all characters following this formatting character shall be treated as superscript text until either the end of the stream or the next SCR or SUB character is reached.

SUB: In a character stream, all characters following this formatting character shall be treated as subscript text until either the end of the stream or the next SCR or SUP character is reached.

General support within Unicode for superscripting and subscripting text (characters and words) would greatly help languages and scripts other than English/Latin.

Thanks and kind regards,
~Plug

>>
>> Hi,
>>
>> I am trying to understand if there is a way to use Devanagari
>> characters (and grapheme clusters) as subscript and/or superscript in
>> unicode text. It will help if someone could please direct me to any
>> document that explains how to achieve that. Is there a unicode marker
>> that will treat the next grapheme cluster in the unicode text as
>> super/subscript? For e.g. if one wants to represent "ब raise to क्ष"
>> how does one achieve that; is there a marker to represent it as
>> follows: ब + SUP + क + ् + ष
>> where SUP acts as a marker for superscripting the next grapheme
>> cluster. Similar for subscripting.
>>
>> Sorry if this is not the right place to ask this question; in that
>> case please could you direct me to the right forum?
>>
>> Thanks and kind regards
>>
>> ~Plug
>>
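The stateful semantics proposed above for SCR, SUP and SUB can be sketched in a few lines of Python. SCR, SUP and SUB are not encoded in Unicode, so this sketch uses hypothetical stand-ins from the Private Use Area (U+E000 to U+E002). The names `split_runs` and `to_html` are likewise illustrative only. Note that this follows the proposal, where a marker applies until the next marker. It does not follow the earlier question, where SUP applies only to the next grapheme cluster.

```python
import html
from typing import List, Tuple

# HYPOTHETICAL stand-ins: SCR, SUP and SUB are not part of Unicode.
# Private Use Area code points are used here purely for illustration.
SCR = "\uE000"  # back to normal (baseline) text
SUP = "\uE001"  # superscript until the next marker
SUB = "\uE002"  # subscript until the next marker

MODE_FOR_MARKER = {SCR: "normal", SUP: "sup", SUB: "sub"}

def split_runs(stream: str) -> List[Tuple[str, str]]:
    """Split a character stream into (mode, text) runs.

    Text is 'normal' by default. Each marker switches the mode for all
    following characters until the next marker or the end of the stream.
    The markers themselves do not appear in any run.
    """
    runs: List[Tuple[str, str]] = []
    mode = "normal"
    current: List[str] = []
    for ch in stream:
        if ch in MODE_FOR_MARKER:
            if current:  # close the run that was open before this marker
                runs.append((mode, "".join(current)))
                current = []
            mode = MODE_FOR_MARKER[ch]
        else:
            current.append(ch)
    if current:
        runs.append((mode, "".join(current)))
    return runs

def to_html(stream: str) -> str:
    """Render the runs as HTML, wrapping non-normal runs in <sup>/<sub>."""
    parts = []
    for mode, text in split_runs(stream):
        escaped = html.escape(text)
        parts.append(escaped if mode == "normal" else f"<{mode}>{escaped}</{mode}>")
    return "".join(parts)

# "ब raised to क्ष": SUP switches to superscript for the rest of the stream.
example = "ब" + SUP + "क्ष"
print(split_runs(example))  # [('normal', 'ब'), ('sup', 'क्ष')]
print(to_html(example))     # ब<sup>क्ष</sup>
```

Converting the runs to HTML reproduces the markup Martin suggested, so the plain-text form and the HTML form carry the same information. Because the markers switch a mode rather than open and close a pair, no closing marker is needed. The trade-off is that a consumer has to look back to the most recent marker, or to the start of the stream, to know how a given character should be displayed.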