On 7/28/2010 9:32 PM, Doug Ewell wrote:
Murray Sargent <murrays at exchange dot microsoft dot com> wrote:

It's worth remembering that plain text is a format that was introduced due to the limitations of early computers. Books have always been rendered with at least some degree of rich text. And due to the complexity of Unicode, even Unicode plain text often needs to be rendered with more than one font.

I disagree with this assessment of plain text. When you consider the basic equivalence of the "same" text written in longhand by different people, typed on a typewriter, finger-painted by a child, spray-painted through a stencil, etc., it's clear that the "sameness" is an attribute of the underlying plain text. None of these examples has anything to do with computers, old or new.
That may be, but the way Unicode plain text is designed is based on the concept of plain text in computers, and what that means was hashed out long before Unicode arrived on the scene. To a large measure, what Unicode did was extend that concept to additional writing systems (and to historic or rarely used nooks and crannies of some of the existing writing systems).

In the process, your definition of plain text was pulled out, dusted off, and used as a philosophical underpinning of the enterprise - but the technologists in the effort did not first discard any notions of computer-based plain text before proceeding. In other words, claiming a clean break between the existing "ASCII" plain text and Unicode would be a falsification.

I do agree that rich text has existed for a long time, possibly as long as plain text (though I doubt that, when you consider really early writing technologies like palm leaves), but I don't think that refutes the independent existence of plain text. And I don't think the need to use more than one font to render some Unicode text implies it isn't plain text. I think that has more to do with aesthetics (a rich-text concept) and technical limits on font size.
No, it's not headings and the like. If you pull together a selection of ordinary books in the English language and remove rich text attributes, you will find a considerable fraction of the works will exhibit subtle changes in meaning - these works require italics to mark emphasis in places where the same sequence of words can be read in different ways.

Scholarly works require italics for citations - absent italics, some other method would need to be introduced to mark titles. Without any designation, there can and will be ambiguities.

Hence, not all texts can be expressed as plain text.

If you take a German text, rendered (by a human typesetter) in Fraktur and rendered (by a later typesetter) in Antiqua, you will find that the second version has less information in it, when you encode both texts on a computer. And many texts that can be represented as plain text if they are to be rendered in Antiqua cannot be plain text if they are to be rendered according to the rules of typesetting a work in the Fraktur style - again, we are talking ordinary running text, no headings, bibliographies or anything.

The additional information is not of an aesthetic or stylistic nature, but tied to the meaning of certain words - that which Unicode calls semantic. In other words, the text, as rendered in Antiqua, allows for potential ambiguities - not necessarily fatal ones, because context may easily resolve them, but they are there, nevertheless.
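A small sketch may make the information loss concrete. It assumes that one such distinction is the long s (U+017F) versus the round s, which Fraktur typesetting keeps apart and Antiqua typically does not. The word pair and the helper to_antiqua are illustrative assumptions, not something named in this message.

```python
import unicodedata

# Hypothetical example words, chosen for illustration only.
# In Fraktur setting, long s (U+017F) opens the second element of a compound,
# while round s closes the first element.
fraktur_guard_room = "Wach\u017ftube"  # Wach + Stube ("guard room")
fraktur_wax_tube = "Wachstube"         # Wachs + Tube ("wax tube")


def to_antiqua(text: str) -> str:
    """Fold long s to round s, as a typical Antiqua setting would (assumed rule)."""
    return text.replace("\u017f", "s")


# The two Fraktur spellings are distinct character sequences.
print(fraktur_guard_room == fraktur_wax_tube)

# After folding, both collapse to the same sequence; the distinction is gone.
print(to_antiqua(fraktur_guard_room) == to_antiqua(fraktur_wax_tube))

# The long s is a separately encoded character, not a styled "s".
print(unicodedata.name("\u017f"))
```

The fold is one-way: nothing in the Antiqua string says which of the two readings was intended, so a reader (or a program) has to fall back on context.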

This is just one example of how the concept of an abstract content of a piece of text is not nearly as clear-cut as you might think.

On the contrary, the definition of Unicode plain text is straightforward: a sequence of Unicode characters without any style information.

A./
