Is there a definition or guideline for the distinction between plain text and rich text?
For example, in the expression 3², the exponent is a single character, "superscript two". Semantically, this expression is equivalent to 3^2, which uses a visible character to indicate exponentiation and leaves the exponent in normal notation. Both seem to me clear examples of plain text. But if the circumflex were replaced by an invisible character that meant "the following number should be superscripted", would that still be plain text? Or would it be formatting that should be relegated to markup?

What about a character that inhibited the composition of following Hangul jamo into a syllable? That seems to me to be markup, but if it could be replaced by a medial ZWNJ, I'm no longer sure. Is the ZWNJ another tricky case? One could say that it's an invisible formatting character whose only role is to control how other characters are displayed, so shouldn't it be markup? For that matter, perhaps the normal space is a type of markup, especially when it triggers the use of a final variant of the preceding character. Finally, aren't the LTR and RTL characters markup? What if we wanted characters that put a run of text into vertical directionality?

One candidate guideline would be that plain text never includes anything that affects non-adjacent characters. But isn't that just the equivalent of requiring the markup to be repeated for each character? For example, if you wanted to write 3²⁽ⁿ⁺¹⁾ with m instead of n, the plain text would be 3^2^(^m^+^1^), using ^ as a superscripting prefix. If that is acceptable as plain text, then perhaps the Unicode superscripted characters should all decompose into a superscripting prefix followed by the base character.

Maybe I just need more sleep...
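To make the superscripting-prefix idea concrete, here is a rough sketch in Python. It assumes the standard `unicodedata` module, which reports compatibility decompositions tagged `<super>` for the superscript characters. The function name `to_caret_prefix` and the choice of ^ as the prefix are purely illustrative, not anything Unicode defines. It also looks up the general category of ZWNJ, the directional marks, and the ordinary space, since those are the other cases in question.

```python
import unicodedata


def to_caret_prefix(text: str, prefix: str = "^") -> str:
    """Rewrite each superscript character as prefix + base character.

    Hypothetical helper: it relies on the "<super>" tag in the
    character's compatibility decomposition and leaves everything
    else untouched.
    """
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)  # e.g. "<super> 0032" for U+00B2
        if decomp.startswith("<super> "):
            # The fields after the tag are hex code points of the base characters.
            bases = [chr(int(cp, 16)) for cp in decomp.split()[1:]]
            out.append("".join(prefix + b for b in bases))
        else:
            out.append(ch)
    return "".join(out)


# The per-character repetition discussed above.
print(to_caret_prefix("3\u00b2\u207d\u207f\u207a\u00b9\u207e"))  # 3^2^(^n^+^1^)

# General categories of the "invisible" candidates: Cf is "format", Zs is "space separator".
for name in ("ZERO WIDTH NON-JOINER", "LEFT-TO-RIGHT MARK", "RIGHT-TO-LEFT MARK", "SPACE"):
    ch = unicodedata.lookup(name)
    print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)}")
```

In this sketch the superscript characters already behave like "prefix plus base" under compatibility decomposition, except that the prefix is a tag in the character database rather than a character in the text. ZWNJ and the directional marks are classed as format characters (Cf), while the ordinary space is a separator (Zs), which does not settle the plain text versus markup question but does show where the existing data draws its lines.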