RE: A last missing link for interoperable representation

Tex via Unicode Fri, 11 Jan 2019 02:46:30 -0800

Martin,

James is making the case that there is demand, or a user need, and that the 
proof is that users are using inconsistent tactics to simulate a solution to 
their problem.


The response that:
"Almost by definition, styled text isn't plain text, even if it's simulated by 
something else."
is a bit like Humpty Dumpty saying that words mean whatever he wants them to 
mean.

Most emoji aren't plain text either, and Unicode has them in abundance. Ruby 
text is also not plain text. They were included because users needed 
consistency and interoperability. The original emoji had inconsistent encodings 
and were a problem for interchange as well as for search and rendering. Their 
existence and popularity then became a problem of its own, requiring further 
styling (e.g. coloring) and a greatly expanded enumeration (foods, animals, 
etc.). Let's be honest and admit that the actual demand for some of these 
latter objects in plain text is marginal, and certainly lower than the 
prevalence of italics.

The response that:
"the simulation is highly limited, as the voicing examples and the fact that 
the math alphanumerics only cover basic Latin have shown."
is, unless I misunderstand your meaning, the argument that because we encoded 
only these, the use case is limited to these.

In a different message you say:
"Also, in contrast to the issue discussed in the current thread, there's no 
consistent or widely deployed solution for such CJK variants in rich text 
scenarios such as HTML."
I don't see how a rich text solution has any bearing on plain text. We could 
take the point to mean that if there were no need in HTML to solve the problem, 
then there wasn't demand justifying the need in Unicode. :-)
I understand your actual intent was to say that there was a need for CJK 
variants and there was no other solution. However, the fact that there is a 
rich text solution for italics isn't helpful to plain text users.
HTML had bidirectional isolates, and after the fact Unicode encoded them as 
well.

The fact that there isn't a consistent way to represent stress or the other 
uses for italics (or obliques, bold, etc.) does make certain searches across 
large numbers of plain texts problematic. Just as it is sometimes important to 
distinguish capitalized text when searching (polish vs. Polish), it would be 
helpful to do the same for italicized text. For example, if I am searching for 
the movie title "Contact" vs. all the places where texts reference a personal 
"Contact", distinguishing italicized titles would help. And to the extent that 
users are inserting non-standardized punctuation or other characters for 
"styling", it makes reliable searching difficult. As James mentioned, a 
standard representation would help with interoperability as well.
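
To make the search problem concrete, here is a minimal Python sketch. It 
assumes a user has simulated an italic title with the mathematical italic 
letters mentioned earlier in this thread. The helper to_math_italic and the 
sample sentence are made up for illustration; unicodedata.normalize is the 
only library call used.

```python
import unicodedata

def to_math_italic(text):
    """Hypothetical helper: map ASCII letters to Mathematical Italic letters."""
    out = []
    for ch in text:
        if ch == "h":
            # Italic small h is encoded separately, as U+210E PLANCK CONSTANT.
            out.append("\u210e")
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            # Everything else is left unchanged in this sketch.
            out.append(ch)
    return "".join(out)

title = to_math_italic("Contact")
text = "We watched " + title + " and then I updated my Contact list."

# A plain search only sees the unstyled word.
plain_hits = text.count("Contact")

# NFKC folding finds both occurrences, but erases the title/non-title distinction.
folded_hits = unicodedata.normalize("NFKC", text).count("Contact")

# Searching for the styled form works only if every author used this same simulation.
title_hits = text.count(title)

print(plain_hits, folded_hits, title_hits)
```

None of the three searches gives a reliable answer to "find the movie title" 
across texts whose authors each picked their own workaround.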

In the '90s it made sense to resist styling plain text. In the 2020s, with 
more than 100k characters, numerous pictures and character adornments, it seems 
anachronistic to be arguing against a handful of control characters that would 
standardize a common text requirement. Most rendering systems will handle it 
easily, and any plain text editor or other software that supports a combining 
strikethrough character would easily adapt to a combining italic or a combining 
bold character.
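
As a rough sketch of what that adaptation might look like, the Python below 
applies and removes U+0336 COMBINING LONG STROKE OVERLAY, which is one existing 
character commonly used for plain-text strikethrough. The function names are 
made up, and the idea that a combining italic or bold character would be 
handled the same way is an assumption, since no such character is encoded.

```python
STROKE = "\u0336"  # COMBINING LONG STROKE OVERLAY (an existing character)

def strike(text):
    """Append the combining overlay after every character."""
    return "".join(ch + STROKE for ch in text)

def unstrike(text):
    """Drop the overlay so the text can be searched or compared."""
    return text.replace(STROKE, "")

styled = strike("Contact")

# The styled form no longer contains the plain substring...
found_raw = "Contact" in styled

# ...but removing one known, standardized mark restores it.
found_clean = "Contact" in unstrike(styled)

print(found_raw, found_clean)
```

The point of the sketch is that software only has to know about one agreed 
code point per style, rather than guess at each author's workaround.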

tex
