2011/7/2 Jukka K. Korpela <jkorp...@cs.tut.fi>:
> And there is really no guarantee that programs support the soft hyphen. For
> one, Microsoft Word doesn’t—it treats it as just another printable
> character.
You're wrong, it DOES. I just tested it (in Microsoft Word 2010 on Windows 7) inside a random long word (aaaaaaaaaa....), and the SHY is recognized and produces the intended hyphenation break. SHY also does not break the spell checker in a long non-random word (I tried it within "anticonstitutionnellement", often cited as the longest word in French). It can effectively be used in a discretionary way, including at non-canonical positions where the default hyphenator proposes some other position or does not hyphenate at all.

However, your point is correct: SHY is not an *orthographic* character. It is strictly a formatting character, intended as a hint for the typesetting of documents.

Regarding the previous comment about the Danish "aa": given that normal Danish orthography now uses "å" in all cases where the legacy digram "aa" would have been used, there's no need to insert any format control for other accidental occurrences of "aa" as separate letters. The default for Danish is certainly to disable recognition of the legacy digram "aa" when "å" is directly usable in the same context. The legacy use in Danish would have been in old ASCII-encoded texts, but in that context you would not have any format control anyway, and no choice but to leave the digram ambiguous.

Note that I also don't think it's necessary to specially encode any joiner or disjoiner control in the middle of candidate digrams/trigrams. If such a control is used, it must be discretionary, within specific documents (already typeset in their rich-text format), and it should be clearly ignorable in case the text is exported and reimported into another document.
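To illustrate what "discretionary" and "ignorable" mean here, below is a minimal sketch. It assumes a toy fixed-width, greedy line breaker; this is not how Word or any real typesetter works. The function names `break_at_soft_hyphens` and `strip_format_hints` are hypothetical. A SHY (U+00AD) stays invisible unless the line actually breaks at that position, where a visible hyphen is shown. Stripping the format hints on export leaves the plain word intact for spell checking or search. The set of controls treated as strippable (SHY, ZWNJ, ZWJ, CGJ) is an assumption chosen for Latin-script text like the French and Danish examples. ZWJ/ZWNJ are *not* safely removable in scripts or emoji sequences that depend on them.

```python
SHY = "\u00ad"   # SOFT HYPHEN
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
CGJ = "\u034f"   # COMBINING GRAPHEME JOINER

# Assumption: a hypothetical export policy for Latin-script text only.
FORMAT_HINTS = {SHY, ZWNJ, ZWJ, CGJ}


def break_at_soft_hyphens(word: str, max_width: int) -> list[str]:
    """Greedily break one word into lines, only at SHY positions.

    A SHY is rendered as a visible "-" only where a line break is
    actually taken; everywhere else it is invisible.
    """
    parts = word.split(SHY)
    lines: list[str] = []
    current = ""
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        # A non-final part may later be followed by a break, which costs
        # one extra column for the visible hyphen.
        needed = len(current) + len(part) + (0 if is_last else 1)
        if current and needed > max_width:
            lines.append(current + "-")  # break here: SHY becomes visible
            current = part
        else:
            current += part              # no break: SHY stays invisible
    lines.append(current)
    return lines


def strip_format_hints(text: str) -> str:
    """Drop ignorable formatting hints, e.g. when exporting plain text."""
    return "".join(ch for ch in text if ch not in FORMAT_HINTS)


if __name__ == "__main__":
    word = "anti\u00adconsti\u00adtution\u00adnelle\u00adment"
    print(break_at_soft_hyphens(word, 12))
    print(break_at_soft_hyphens(word, 100))
    print(strip_format_hints(word))
```

With a wide enough line, no hyphen appears at all. With a narrow line, hyphens appear only at the chosen SHY positions, which may differ from where a default hyphenator would break.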