Christopher John Fynn writes:
> In Unicode U+0BBE, U+0BC6 and U+0BCA are all dependent vowel signs.
> IE is probably treating a base character and any dependent vowels as a
> single unit. Since in some fonts a base character + combining vowel mark
> might be displayed by a single ligature glyph, it makes sense to apply
> the formatting of a base character to any dependent combining characters
> as well.
>
> In Mozilla you may be completely breaking the font lookups by separately
> formatting the different parts of a conjunct.
>
> In legacy glyph-based Tamil encodings there was a simple one-to-one
> correspondence between characters and glyphs, so it is straightforward to
> apply different formatting to different characters.

Still, this is an interesting problem: some texts want to show a diacritic added to a base letter in a distinct color, notably linguistic texts about grammar or orthography. For example, you might want to show the difference between the two French words "désert" and "dessert" by coloring the accent of the first word or the second s of the second; or, even more precisely, between "bailler" (concéder un bail, des baux) and "bâiller" (ouvrir en grand), where the presence or absence of the circumflex on the letter 'a' marks a difference of both meaning and pronunciation.

However, this is not a problem of Unicode itself, but of the rich-text format used to add style to a given text. In Unicode (and even in HTML and SGML), a letter 'a' followed by a combining circumflex is canonically equivalent to the precomposed letter 'a' with circumflex. But if you add tags between a base letter and its diacritics, you create separate texts, and the second string, which starts with the circumflex, is then a defective combining sequence. For Unicode, this circumflex will logically attempt to form a combining sequence with whatever precedes it, that is, with the closing '>' of the preceding HTML, SGML or XML tag. This will break many parsers that apply the Unicode rules when handling files encoded with a Unicode encoding scheme such as UTF-8.

Creating a text that uses this HTML "feature" is very hazardous, as the interpretation and rendering of defective combining sequences is implementation-specific: an application may render the diacritic on a dotted-circle base glyph, or display it on an empty base glyph, or attach the defective combining sequence to the previous combining sequence, or simply be unable to render it, because the previous combining sequence may not be accessible in the current rendering context.

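To make the two points above concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper names `canonically_equivalent` and `is_defective_start` are made up for this illustration, and the "defective" test is a simplification: it only checks whether a string begins with a character of general category M (a combining mark).

```python
import unicodedata

COMPOSED = "b\u00e2iller"      # "bâiller" with precomposed U+00E2
DECOMPOSED = "ba\u0302iller"   # same word: 'a' + U+0302 COMBINING CIRCUMFLEX ACCENT


def canonically_equivalent(s, t):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)


def is_defective_start(s):
    """Simplified check (an assumption of this sketch): the string starts
    with a combining mark, so the mark has no base character in front of it."""
    return bool(s) and unicodedata.category(s[0]).startswith("M")


# Cutting the text between the base letter and its mark, as a tag placed
# there would do, leaves a second piece that begins with the circumflex.
HEAD, TAIL = DECOMPOSED[:2], DECOMPOSED[2:]

if __name__ == "__main__":
    print(canonically_equivalent(COMPOSED, DECOMPOSED))
    print(is_defective_start(HEAD), is_defective_start(TAIL))
```

The two spellings compare equal only after normalization; as raw code point sequences they differ, which is why a tag can be slipped between 'a' and U+0302 in one spelling but not in the other.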
If you really want to add style to diacritics only, it is not in Unicode that you must look for a solution, but in the styling or tagging language itself (though defining such a style rule would be extremely tricky, and doing it with intermediate tags does not conform to the W3C recommendation to separate text from style). So that is an interesting question to submit to the W3C for its CSS specification... I think that Unicode will not allow you to define anything else.

For now you can use a conforming solution consisting of HTML code like this (here, to render the circumflex above the a in red):

    a<span style="position: relative; left: -6pt; color: red;"> ̂</span>

or better, with a style sheet:

    <style><!--
    .diac-red {position: relative; left: -6pt; color: red;}
    --></style>
    ...
    a<span class="diac-red"> ̂</span>

This code does not contain any defective sequence, because the circumflex inside the span is preceded by a space that serves as its base. It treats the diacritic as a separate graphic unit (which it really is, if you need a style to detach it from the regular text). The -6pt offset is only a guess that depends on the font and its size.
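As a rough way to check whether a piece of markup avoids defective sequences, the following Python sketch collects the text runs between tags with the standard `html.parser` module and reports those that begin with a combining mark. The names `RunCollector` and `defective_runs` are hypothetical, and "begins with a character of general category M" is a simplifying assumption about what counts as defective.

```python
import unicodedata
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Collects each text run found between tags."""

    def __init__(self):
        super().__init__()
        self.runs = []

    def handle_data(self, data):
        self.runs.append(data)


def defective_runs(markup):
    """Return the text runs that start with a combining mark."""
    parser = RunCollector()
    parser.feed(markup)
    parser.close()
    return [
        run for run in parser.runs
        if run and unicodedata.category(run[0]).startswith("M")
    ]


if __name__ == "__main__":
    # Tag inserted directly between the base letter and U+0302.
    print(defective_runs('a<span style="color: red">\u0302</span>'))
    # Circumflex carried by a space inside the span.
    print(defective_runs('a<span class="diac-red"> \u0302</span>'))
```

A run that starts with a space followed by U+0302 is not reported, since the space acts as the base character of the mark.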