Re: Global apostrophe solution? (Part of: A new take on the English apostrophe in Unicode; Keyman Developer for free?; Input methods at the age of Unicode)

Marcel Schneider Thu, 23 Jul 2015 01:54:33 -0700

As I donʼt know if the apostrophe issue** has been satisfactorily resolved, Iʼd 
like to briefly check on that by making a few statements to agree or disagree 
with:


1 - We are all allowed to use U+02BC for the English apostrophe.  U+2019 is 
only a de facto preference, mainly with respect to end-users and WYSIWYG word 
processing.  Unicode is thus a user-oriented standard.  However, we must also 
take the font-related issues into consideration: U+02BC may be missing, or vary 
in shape following different expectations, as in these three sans-serif fonts 
(tested in LibreOffice):



2 - UAX #29 is not intended to work well for English as is, so English 
implementations need to be tailored. These two statements are inferred from the 
Notes at § 4.1.1. This tailoring is however often not completed, as we can 
deduce from the behavior of word processors applying the UAX #29 recommendation:
| A further complication is the use of the same character as an apostrophe
| and as a quotation mark. Therefore leading or trailing apostrophes
| are best excluded from the default definition of a word.



3 - Since in English a leading U+2019 is never a quotation mark (as opposed to 
Scandinavian usage), leading apostrophes should be included in the word 
definition, at the same level as in-word apostrophes.  Only the trailing 
possessive apostrophe would end up being left out.  This however is 
inconsistent, so a complete tailoring of UAX #29 for English must include 
algorithms that take a trailing U+2019 as a quote only if it is preceded by 
U+2018 within a number of words... but this too is incomplete.
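
The rule described in point 3 can be sketched in a few lines of Python.  This 
is only an illustration under stated assumptions, not a UAX #29 tailoring: the 
function name classify_right_marks and the window size max_words are 
hypothetical, and "a number of words" is approximated by whitespace splitting.

```python
LEFT_QUOTE = "\u2018"   # LEFT SINGLE QUOTATION MARK
RIGHT_MARK = "\u2019"   # RIGHT SINGLE QUOTATION MARK (apostrophe or close quote)


def classify_right_marks(text, max_words=12):
    """Label every U+2019 in text as 'apostrophe' or 'quote'.

    Hypothetical heuristic: a U+2019 directly followed by a letter is a
    leading or in-word apostrophe; a trailing U+2019 counts as a closing
    quote only if an unmatched U+2018 occurs at most max_words words earlier.
    Returns a list of (index, label) pairs.
    """
    results = []
    open_quote_at = None  # index of the most recent unmatched U+2018
    for i, ch in enumerate(text):
        if ch == LEFT_QUOTE:
            open_quote_at = i
        elif ch == RIGHT_MARK:
            next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
            if next_is_letter:
                label = "apostrophe"
            elif (open_quote_at is not None
                  and len(text[open_quote_at:i].split()) <= max_words):
                label = "quote"
                open_quote_at = None  # the open quote is now matched
            else:
                label = "apostrophe"
            results.append((i, label))
    return results


sample = "\u2018It\u2019s late,\u2019 she said. The dogs\u2019 bowls, \u2019tis true."
labels = [label for _, label in classify_right_marks(sample)]
```

On the sample, the four marks come out as apostrophe, quote, apostrophe, 
apostrophe.  The heuristic fails as soon as a trailing possessive sits inside 
a quotation: in "‘The dogs’ bowls’" the possessive mark is taken for the close 
quote, which is exactly the incompleteness mentioned above.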



4 - Conversion of British single quotes to double quotes needs special 
processing to identify the close-quotes: applying a number of search rules, 
submitting each instance to the operator for validation.  This routine task is 
very annoying but remains limited to technicians (editors, typesetters), while 
the disambiguation of the apostrophe would affect the public as a whole.  As 
Mark Davis wrote on Mon, Jun 15, 2015 at 10:19 AM:

> In practice, whenever characters are essentially identical—and by that I mean 
> that the overlap between the acceptable glyphs for each character is very 
> high—people will inevitably mix up the characters on entry. So any processing 
> that depends on that distinction is forced to correct the data anyway.

Consequently, the introduction of U+02BC in English usage would not produce 
reliable data.



5 - The use of angle quotation marks for quotations in English (both British 
and American) would eliminate the apostrophe problem and bring a number of 
substantial advantages:

+ Quotations, especially when consisting of single words, are better 
highlighted and are no longer confusable with the use of scare quotes.

+ This may shift the psychological relationship people have with quotations 
and quoting, which could eventually improve the handling of intellectual 
property.  A certain menace in this domain, due to word processing and the 
internet, has been detected by the Roman linguist Raffaele Simone.

+ British and American English would use the same quotes convention, so no 
quotes conversion would be necessary any longer.  This process streamlining 
could facilitate exchanges, locale barriers being overcome while localesʼ 
“flavour” (Iʼm quoting, not scaring, hereʼs my source: 
http://babelstone.blogspot.fr/2006/03/unicode-character-names-part-2-name-is.html)
 will be preserved through word orthography.

+ Scare quotes would always have the same appearance, inside as well as outside 
of quotations. Their meaning is independent of quotation, so it seems 
consistent that they be not affected by their environment.



6 - Additionally, the use of U+0027 could be preferred for highlighting words, 
a usage found in technical documents like the Unicode documentation.  (However, 
even the inword apostrophe is in most cases represented by U+0027.)
As a result, U+2018 is no longer needed and should be strongly 
discouraged, at least in languages like English and French, to prevent U+2019 
from being used as a quotation mark.  This is far easier and more feasible 
than completing all fonts with U+02BC, urging users to deal with *two* 
different but identical-looking “squiggles” (quotation), and tracking incorrect 
use.  Having an old and a new quotation mark convention visibly side by side 
would probably be less cumbersome than having two apostrophes that look 
identical in most of the complete fonts but behave differently.



7 - As an input method for angle quotation marks, we can use autocorrect 
while waiting for this and nested-quote management to be implemented in word 
processors.  To achieve this, six entries may be required:
<  → «
«< → ‹
‹< → <
>  → »
»> → ›
›> → >
In Microsoft Word (which supports punctuation and symbols as autocorrect 
triggers), this yields the double quotes with one keystroke, the single quotes 
(less used) with two keystrokes, and finally the less-than/greater-than signs 
with three keystrokes.
Depending on user preferences, the less-than/greater-than signs may be left 
untouched, in which case only four entries are required:
<< → «
«< → ‹
>> → »
»> → ›
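
To check that the two entry sets behave as described, here is a minimal 
Python simulation.  It is a sketch only: the function type_keys and the idea 
that a rule fires immediately after each keystroke are assumptions of this 
example, not a description of how Microsoft Word implements autocorrect.

```python
# The two rule sets from the lists above (trigger -> replacement).
SIX_ENTRY_RULES = {
    "<": "«", "«<": "‹", "‹<": "<",
    ">": "»", "»>": "›", "›>": ">",
}
FOUR_ENTRY_RULES = {
    "<<": "«", "«<": "‹",
    ">>": "»", "»>": "›",
}


def type_keys(keys, rules):
    """Feed keystrokes one by one; after each, apply at most one rule.

    Longer triggers are tried first so that "«<" takes precedence over "<".
    """
    out = ""
    for key in keys:
        out += key
        for trigger in sorted(rules, key=len, reverse=True):
            if out.endswith(trigger):
                out = out[: -len(trigger)] + rules[trigger]
                break
    return out


one = type_keys("<", SIX_ENTRY_RULES)      # double angle quote
two = type_keys("<<", SIX_ENTRY_RULES)     # single angle quote
three = type_keys("<<<", SIX_ENTRY_RULES)  # plain less-than sign
```

With the six entries, one, two and three keystrokes give «, ‹ and < 
respectively.  With the four entries, a single < stays a less-than sign, and 
the double quotes need two keystrokes.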

For a solution working in *all* applications, we can program extended keyboard 
layouts, notably using Keyman Developer, software that I see as an important 
part of Unicode implementation thanks to its easy-to-understand and flexible 
layout programming, matching expectations that were expressed soon after the 
first releases of the Unicode Standard.



8 - I (or even: We) still do not know why the apostrophe has not been 
disambiguated from one of the quotation marks, while the hyphen-minus 
(mentioned in the parent thread) has been (U+2010 vs U+2212).  Iʼm not sure I 
buy the argument that “essential identity” (this is derived quotation, not 
scaring!) can be deduced from glyphic resemblance.  And indeed it hasnʼt been 
many times in Unicode history, given that the purpose is “to encode characters, 
not glyphs.”  The following quotation of TUS does not have exactly this 
meaning: (§ 1.3, p. 6) “the standard defines how characters are interpreted, 
not how glyphs are rendered”.  

In the case of “that squiggle” '’', TUS doesnʼt fully define how it is 
interpreted, only whether itʼs a letter (U+02BC) or a punctuation (U+2019), but 
*not* whether itʼs an apostrophe or a single closing quote, even while the two 
are essentially different (not in appearance, but in what philosophers called 
“essentia”, which is “the being”).  They “are the same in outward form but 
different in essence.”  To prove that to ourselves, we may look at German 
usage: single quotes are U+201A and U+2018, apostrophe is U+2019.  If the same 
principles had been applied, U+201A should have been merged with the comma, 
because we canʼt tell the difference: ‚,‚,‚, (the 1st, 3rd and 5th are 
quotation marks).  And here at least, the semantics would have been legible 
even for computers: leading comma is quote, trailing comma is comma.  The 
actual apostrophe convention in English is illegible semantics.
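
The letter/punctuation distinction mentioned above is visible in the Unicode 
character properties.  The following Python sketch only reads the General 
Category from the standard unicodedata module; the helper name describe is an 
assumption of this example.  The regular expression at the end uses Pythonʼs 
own definition of a word character, which is *not* an implementation of 
UAX #29 and is shown only to illustrate how the two code points behave 
differently in processing.

```python
import re
import unicodedata

MARKS = ["\u0027", "\u002C", "\u02BC", "\u2018", "\u2019", "\u201A"]


def describe(ch):
    """Return (code point, character name, General Category, is-letter flag)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isalpha(),
    )


for mark in MARKS:
    print(describe(mark))

# Tokenizing with Python's \w (letters, digits, underscore):
with_letter = re.findall(r"\w+", "don\u02BCt")  # U+02BC is a letter (Lm)
with_punct = re.findall(r"\w+", "don\u2019t")   # U+2019 is punctuation (Pf)
```

U+02BC is reported as Lm (modifier letter) and U+2019 as Pf (final 
punctuation), while U+201A is Ps and the comma is Po.  Nothing in these 
properties says whether a given U+2019 is an apostrophe or a closing quote.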

The curly apostropheʼs misfortune might have been to be encoded at the same 
time as the curly quote, while the (curly) comma existed before its curly 
quote counterpart.  Ultimately, the punctuation apostrophe has *not* been 
encoded in Unicode.  Hence the *original* recommendation to use the letter 
apostrophe, which is very consistent with English usage.  Even more, we already 
learned that since 1983, the apostrophe may be considered as the 27th letter of 
the Latin alphabet: http://unicode.org/pipermail/unicode/2015-June/001914.html



9 - By not encoding the punctuation apostrophe, Unicode could rely upon the 
typographical tradition, realizing some economies of scale and making the 
Standard more end-user friendly in some way.  This however reflects a tendency 
to prioritize appearance.  In Unicode this tendency is far from omnipresent; 
it is surely very marginal, and its presence is due to the influence of the 
software industry, where that tendency is naturally more widespread for 
economic reasons: part of the demand on the usersʼ side treats appearance as a 
satisfactory good and asks for no more than that a given item look fine, no 
matter whatʼs behind...

Actually, as far as the English apostrophe is concerned, the burden is 
moved from input to processing.  Users can enter text without bothering, while 
on the other side, other people must work hard to fix a number of recurrent 
problems...



Now the goal would be to know if a part of the problem is suitably 
resolved, and if there is agreement on some of the different points listed 
above.  Ted Clancy and all who launched and responded to the parent thread are 
invited to share their feelings and how they see the topic today.

Best regards,

Marcel

** Note for archive readers:  Please refer to Ted Clancyʼs blogpost and the 
subsequent discussion:
http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0047.html
