Re: Another take on the English apostrophe in Unicode

Marcel Schneider Mon, 15 Jun 2015 02:53:58 -0700

On Mon, Jun 15, 2015 at 10:19 AM, Mark Davis ☕️  wrote:

> On Mon, Jun 15, 2015 at 9:17 AM, Marcel Schneider  wrote:


>> When we take the topic down again from linguistics to the core mission of 
>> Unicode, that is, character encoding and text processing standardisation, 
>> ellipsis and the Swedish abbreviation colon differ from the single closing 
>> quotation mark in that they are not to be processed.

>> Linguistics, however, delivered the foundation on which Unicode issued its 
>> first recommendation on what character to use for the apostrophe. The result 
>> was neither a matter of opinion nor of probabilities.

>> Actually, the choice is between perpetuating confusion in word processing 
>> and getting people confused for a short time by announcing that U+2019 for 
>> the apostrophe was a mistake.


> Quite nice of you to inform me of the core mission of Unicode—I must have 
> somehow missed that.

> More seriously, it is not all so black and white. As we developed Unicode, we 
> considered whether to separate characters by function, e.g., an END OF SENTENCE 
> PERIOD, ABBREVIATION PERIOD, DECIMAL PERIOD, NUMERIC GROUPING PERIOD, etc. Or 
> DIAERESIS vs UMLAUT. We quickly concluded that the costs far, far outweighed 
> the benefits.

> In practice, whenever characters are essentially identical (and by that I mean 
> that the overlap between the acceptable glyphs for each character is very 
> high), people will inevitably mix up the characters on entry. So any processing 
> that depends on that distinction is forced to correct the data anyway. And 
> separating them causes even simple things like searching for a character on a 
> page to get screwed up without having equivalence classes.

> So we only separated essentially identical characters in limited cases, such 
> as letters from different scripts.
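The equivalence-class search mentioned above can be sketched as follows. This is a minimal illustration, assuming a search tool that folds U+0027, U+2019 and U+02BC into a single class before matching; the fold table and the function names `fold` and `find_all` are hypothetical and not taken from any particular library.

```python
# Hypothetical fold table: every member of the "apostrophe" equivalence class
# is mapped to U+0027 before comparison.
APOSTROPHE_CLASS = {
    "\u0027": "'",  # APOSTROPHE (the ASCII key)
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC": "'",  # MODIFIER LETTER APOSTROPHE
}
_FOLD = str.maketrans(APOSTROPHE_CLASS)


def fold(text: str) -> str:
    """Map each member of the apostrophe class to U+0027."""
    return text.translate(_FOLD)


def find_all(haystack: str, needle: str) -> list[int]:
    """Return start offsets of needle in haystack, matching modulo the fold."""
    h, n = fold(haystack), fold(needle)
    hits = []
    start = h.find(n)
    while start != -1:
        hits.append(start)
        start = h.find(n, start + 1)
    return hits


print(find_all("don\u2019t and don\u02BCt and don't", "don't"))  # [0, 10, 20]
```

Because each class member folds to exactly one character, offsets found in the folded text are also valid offsets in the original text, so a search UI could highlight matches directly.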

 

It was a very good idea to disambiguate the apostrophe from the single quote 
as well, and I feel the price paid for it was not too high, because it greatly 
simplified the processing of quotation marks in English, I mean, the replacement 
of each pair of one kind by a pair of another kind. When I search for quotes in a 
text, I don't want to be distracted by apostrophes. Don't worry about equivalence 
classes: they already present a word without an apostrophe as equivalent to the 
same letters with an apostrophe or quote in between. It is always better for the 
computer to know exactly what a character is, even when it doesn't need to tell us 
at output, than for it to come up with a useless mix-up.
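The simplification claimed for replacing quote pairs can be sketched as follows, assuming apostrophes are already encoded as U+02BC, so that U+2018 and U+2019 occur only as quotation marks. Converting English single quotes to, say, single guillemets then reduces to a character-for-character swap. The function name `swap_single_quotes` and the choice of target marks are illustrative only.

```python
OPEN_SINGLE = "\u2018"   # LEFT SINGLE QUOTATION MARK
CLOSE_SINGLE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def swap_single_quotes(text: str, new_open: str, new_close: str) -> str:
    """Replace every U+2018/U+2019 pair by new_open/new_close.

    Assumes apostrophes are encoded as U+02BC, so U+2018/U+2019 are only
    ever quotation marks and no context analysis is needed.
    """
    return text.translate({ord(OPEN_SINGLE): new_open,
                           ord(CLOSE_SINGLE): new_close})


# Apostrophe as U+02BC: only the quotation marks change.
print(swap_single_quotes("\u2018Don\u02BCt,\u2019 she said.", "\u2039", "\u203A"))
```

With U+2019 used for the apostrophe, the same call would also rewrite the apostrophe in "Don’t" into a closing guillemet, which is exactly the mix-up the paragraph describes; avoiding it requires the kind of context guessing that a distinct apostrophe character makes unnecessary.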


 

You just brought up another good idea too: period-terminated abbreviations are 
listed as exceptions in word processors. Another list could contain all words 
with a leading apostrophe and all words with a trailing apostrophe. This might 
make it possible to filter search results and to separate apostrophes reliably 
from single comma quotation marks. And at input, smart quotes algorithms would 
become even smarter. Really smart, say.
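The exception-list idea could look roughly like this at input time. It is a minimal sketch under stated assumptions: the two word lists `LEADING` and `TRAILING` are hypothetical and deliberately tiny (a real list would be far longer and language-specific), and the context rules are a simplification of what a word processor might do. A straight ' inside a word or attached to a listed word becomes U+02BC; the remaining ones become U+2018 or U+2019 depending on context.

```python
import re

APOS, OPEN, CLOSE = "\u02BC", "\u2018", "\u2019"

# Hypothetical exception lists (lower-case, apostrophe omitted).
LEADING = {"tis", "twas", "em"}          # 'tis, 'twas, 'em
TRAILING = {"goin", "nothin", "runnin"}  # goin', nothin', runnin'

LETTERS = r"[^\W\d_]+"  # a run of letters (word characters minus digits and _)


def smarten(text: str) -> str:
    """Replace each straight ' by U+02BC, U+2018 or U+2019 using the lists."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if prev.isalpha() and nxt.isalpha():
            out.append(APOS)  # word-internal: don't, o'clock
        elif nxt.isalpha():
            # Start of a word: apostrophe only if the word is a listed exception.
            word = re.match(LETTERS, text[i + 1:]).group(0).lower()
            out.append(APOS if word in LEADING else OPEN)
        elif prev.isalpha():
            # End of a word: apostrophe only if the word is a listed exception.
            word = re.search(LETTERS + "$", text[:i]).group(0).lower()
            out.append(APOS if word in TRAILING else CLOSE)
        elif prev.isspace():
            out.append(OPEN)   # e.g. before another quotation mark
        else:
            out.append(CLOSE)  # after punctuation, e.g. ,'
    return "".join(out)


print(smarten("'Twas nothin', she said: 'don't go.'"))
```

A plural possessive such as "students'" is not in either list and still falls through to U+2019, which illustrates that such lists can make smart quotes considerably smarter without settling every case.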


 

I don't believe working people would mix up the letter apostrophe and the closing 
quote if both were on the keyboard. And even now that they aren't, people don't, 
because they just hit the apostrophe key, which, without any dumb smart quotes 
algorithm, always leads to visually satisfying results, as shown in the Unicode 
documentation. For good desktop publishing, people must work hard anyway, so it 
would be nice to give them the means rather than overburden them with routine 
tasks caused by deficient text encoding.

The way things work today is not satisfactory as far as the English apostrophe 
is concerned. I still can't believe that the Unicode committees were wrong when 
they recommended U+02BC. Restoring this advantage today will be to the credit of 
all parties involved, and we and future generations will thank you very much.

If there are any future generations, that is.

Best regards,


Marcel Schneider
