Re: Plain text

Asmus Freytag Wed, 28 Jul 2010 23:02:10 -0700

On 7/28/2010 9:32 PM, Doug Ewell wrote:

> Murray Sargent <murrays at exchange dot microsoft dot com> wrote:
>> It's worth remembering that plain text is a format that was introduced due to the limitations of early computers. Books have always been rendered with at least some degree of rich text. And due to the complexity of Unicode, even Unicode plain text often needs to be rendered with more than one font.
>
> I disagree with this assessment of plain text. When you consider the basic equivalence of the "same" text written in longhand by different people, typed on a typewriter, finger-painted by a child, spray-painted through a stencil, etc., it's clear that the "sameness" is an attribute of the underlying plain text. None of these examples has anything to do with computers, old or new.

That may be, but the way Unicode plain text is designed is based on the concept of plain text in computers, and what that means was hashed out long before Unicode arrived on the scene. To a large measure, what Unicode did was extend that concept to additional writing systems (and to historic or rarely used nooks and crannies of some of the existing writing systems).

In the process, your definition of plain text was pulled out, dusted off, and used as a philosophical underpinning of the enterprise, but the technologists in the effort did not first discard any notions of computer-based plain text before proceeding. In other words, claiming a clean break between the existing "ASCII" plain text and Unicode would be a falsification.

> I do agree that rich text has existed for a long time, possibly as long as plain text (though I doubt that, when you consider really early writing technologies like palm leaves), but I don't think that refutes the independent existence of plain text. And I don't think the need to use more than one font to render some Unicode text implies it isn't plain text. I think that has more to do with aesthetics (a rich-text concept) and technical limits on font size.

No, it's not headings and the like. If you pull together a selection of ordinary books in the English language and remove the rich text attributes, you will find that a considerable fraction of the works exhibit subtle changes in meaning: these works require italics to mark emphasis in places where the same sequence of words can be read in different ways.

Scholarly works require italics for citations. Absent italics, some other method would need to be introduced to mark titles; without any designation, there can and will be ambiguities.


Hence, not all texts can be expressed as plain text.

If you take a German text, rendered (by a human typesetter) in Fraktur and rendered (by a later typesetter) in Antiqua, you will find that the second version has less information in it when you encode both texts on a computer. And many texts that can be represented as plain text if they are to be rendered in Antiqua cannot be plain text if they are to be rendered according to the rules of typesetting a work in the Fraktur style. Again, we are talking about ordinary running text: no headings, bibliographies or anything.

The additional information is not of an aesthetic or stylistic nature, but is tied to the meaning of certain words, which is what Unicode calls semantic. In other words, the text, as rendered in Antiqua, allows for potential ambiguities: not necessarily fatal ones, because context may easily resolve them, but they are there, nevertheless.

This is just one example of how the concept of the abstract content of a piece of text is not nearly as clear-cut as you might think.

By contrast, the technical definition of Unicode plain text is straightforward: a sequence of Unicode characters without any style information.

A./
