On Tue, Jan 8, 2019 at 11:58 PM James Kass via Unicode <[email protected]> wrote:
>
> David Starner wrote,
>
>  > Can some books be mostly handled with Unicode plain text
>  > and italics? Sure. HTML can handle them quite nicely. ...
>
> Yes, many books can be handled very well with HTML using simple
> mark-up. If I were producing a computer file to reproduce an old
> fiction novel, that's how I'd do it. Not because it's better or simpler
> than plain text, but because it can't really be done in plain text at
> this time. But if a section of the text is copy/pasted from the screen
> into an editor, some of the original information may be lost.
>

Looking at the Encyclopedia Brown book at hand, you'd lose any marking that "The Case of the Headless Ghost" is the chapter header. The picture of the treasure chest may be gratuitous, but "he hung his sign outside the garage:" is followed by an image of said sign that says "BROWN DETECTIVE AGENCY...". If you copy/paste that without carrying the original image along, some of the original information will be lost.

In the Gmail editor, I see buttons to make the text bold, italic, or underlined, and to change the color, text size and font. English users tend to see italics as part and parcel of the text formatting. One can argue that's an accident of history, and that italics is somehow different from bold, underline, font and text size changes. But when the standard perception conveniently matches how Unicode encodes the script, there doesn't seem to be much point in changing things, especially with terabytes of existing text that encodes italics separately from the plain text matter.

Frequently, copy/pasting material does preserve non-plain text features; if I paste a title from Wikipedia into here, it will show up much larger than the rest of the text. It's a pain, because I want the underlying text, not how it was displayed in the original context.

Honestly, I could argue that case should not be encoded. It would simplify so much processing of Latin script text, and most of the time case-sensitive operations are just wrong.
Case is clearly a headache that has to be dealt with in plain text, but it certainly doesn't encourage me to add another set of characters that are basically the same but not.
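
As a rough illustration of the "basically the same but not" problem, here is a minimal Python sketch. It uses the existing Mathematical Italic letters as a stand-in for a hypothetical set of italic characters; `same_text` is a made-up helper name, and combining NFKC normalization with case folding is just one plausible way to compare such strings, not a recommendation from this thread.

```python
import unicodedata

plain = "case"
# MATHEMATICAL ITALIC SMALL C, A, S, E (U+1D450, U+1D44E, U+1D460, U+1D452)
italic = "\U0001D450\U0001D44E\U0001D460\U0001D452"


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare after NFKC normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


# A naive comparison treats the italic letters as unrelated characters.
print(plain == italic)            # False
# Compatibility normalization maps them back to the ASCII letters.
print(same_text(plain, italic))   # True
# Case already needs the same kind of extra step.
print("Case" == "case")           # False
print(same_text("Case", "case"))  # True
```

Every search, sort, or comparison that should ignore the distinction has to remember a step like this, which is the cost already paid for case.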

