2016-09-30 17:54 GMT+02:00 Jukka K. Korpela <jkorp...@cs.tut.fi>:

> Using HTML, for example, the way to achieve that at present would be to
> use markup like <span class="sub">...</span> (to avoid the problems caused
> by the default formatting of <sub> and <sup>) and to use a CSS style sheet
> that sets font-family suitably and uses OpenType font feature settings to
> select subscript or superscript glyphs. In practice, you would need to use
> @font-face to embed a suitable OpenType font. So it’s doable, but not
> trivial like just slapping <sub> and </sub> around some text.
>

Not needed. The <sub> and <sup> elements in HTML can be styled directly as well (also with CSS), with clear implied semantics, without creating a custom class on a non-semantic <span> element.

Here the intent in the formula was clearly to designate a subscript notation, as opposed to a superscript, whose meaning in formulas after the symbol of a variable is generally an exponent. A superscript after other symbols (such as a summation operator) generally designates something else (an upper bound). After some operators such as "C", it means a cardinality for a set from which all possible unordered combinations (distinct subsets) are counted.

In chemical formulas, superscripts and subscripts are used before or after an element to indicate some physical state (total charge, charge of the nucleus, total weight, 3D configuration for compound elements and crystalline forms, orientation, number of occurrences for subgroups in complex compounds...).

In formulas, the superscripts and subscripts are parsed according to the context after which they occur (which will remap these subscripts or superscripts by assigning them a specific role), but alone they are just sub/superscripts with no other semantics added (while still keeping all the semantics of their content).

For complex compounds, these subscripts/superscripts are not enough and specific layouts and symbols are needed, but you cannot use simple linear plain text to represent them without defining a specific notation convention and defining annotation terms inserted in the custom formula. Plain-text encoding will not solve the problem of representation at the character level: you'll need an upper-layer protocol. There are infinitely many ways to define these protocols, but they are out of scope for Unicode, which will not encode them (the same way that it does not encode orthographic conventions or script conventions for specific languages: the conventions for technical notations create their own language).
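
To come back to the first point, here is a minimal sketch of what styling the elements directly could look like, with no custom class. The font name "ExampleOpenTypeFont" is hypothetical, and the sketch assumes that the font in use really contains the OpenType 'subs' and 'sups' features; if it does not, the text will simply render at normal size on the baseline.

```css
/* Cancel the usual default formatting of <sub>/<sup>
   (shifted baseline and reduced font size). */
sub, sup {
  vertical-align: baseline;
  font-size: inherit;
  /* Hypothetical font name: replace with a font that is known
     to provide real subscript/superscript glyphs. */
  font-family: "ExampleOpenTypeFont", serif;
}

/* Ask the font for its own subscript glyphs (OpenType 'subs'). */
sub {
  font-feature-settings: "subs" 1;
}

/* Ask the font for its own superscript glyphs (OpenType 'sups'). */
sup {
  font-feature-settings: "sups" 1;
}
```

With such a style sheet, markup like H<sub>2</sub>O keeps the semantic element and still gets the font's designed glyphs. Where it is supported, the higher-level property font-variant-position (values sub and super) could be used instead of the low-level font-feature-settings.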