Just a remark for fun:

- You'll also note that this talk is all about the apostrophe, and if Kazakhstan wants to introduce it in 2019, that year will match exactly the code point U+2019 [ ’ ]...
- This year 2018 is also the year to discuss and reverse the apostrophe decision, and it matches the code point U+2018 [ ‘ ] for the reversed apostrophe.

Happy new years to ‘Kazakhstan’! But now we have a new way to memorize the code point values for these apostrophes!

2018-01-25 16:40 GMT+01:00 Philippe Verdy <[email protected]>:

> Such an example shows that ignoring umlauts makes the document
> counterintuitive. Nobody is able to infer whether "Paper" refers to a
> person here or actually means a paper sheet/article...
> At least he should have written "Paeper", which would be more correct (if
> "Christoph Päper" is German, the umlaut is equivalent to a following "e"),
> or even "Christoph Paper".
>
> Apply that to the Kazakh language, and attempt to drop the apostrophes
> (because they very commonly cause various technical issues in software):
> I'm sure you'll see problems of interpretation or too many synonyms, which
> the use of the acute instead would have avoided.
>
> Nearly all software today is "8-bit clean" and supports at least
> ISO 8859-1 or Windows-1252, if it doesn't support multibyte UTF-8; the
> time of 7-bit ASCII ended long ago, except in very old systems, which were
> anyway not used at all for Kazakh in Cyrillic; so acute accents are more
> likely than ASCII apostrophes to survive technical software constraints,
> notably if the Latin letters with accents come from the ISO 8859-1 subset,
> whose code points also fit in 8 bits in Unicode. Even in UTF-8, these Latin
> letters with accents (from ISO 8859-1 and most other ISO 8859-* subsets)
> are 2 bytes wide, so exactly the same encoding size as basic letter +
> ASCII quote, and the encoding size is definitely not an issue anywhere
> (all existing Kazakh Cyrillic letters already use 2-byte encoding in
> UTF-8, as all their assigned code point values are higher than 0x7F but
> lower than 0x800).
>
> Choosing the ASCII quote for this "apostrophe" will not save anything;
> but the regular Unicode apostrophe U+2019 would need... 3 bytes after the
> 1-byte basic Latin letter from ASCII (so it is worse!).
>
> Choosing the acute accent above Latin letters from ISO 8859-* would avoid
> this issue, because they are precomposed, and in UTF-8 the usual preferred
> representation is the NFC form, using a single code point. JavaScript,
> Java, or C/C++ "wide string" types will also handle these characters with
> a single code unit (so the measured string "length" matches the number of
> letters). You will also avoid the SQL code injection problems that arise
> on web sites if you have to allow ASCII quotes unfiltered in data input
> forms to represent the proposed Kazakh orthography: with the acute, you
> can still continue to reject all ASCII quotes from software input forms,
> and people won't be forced to use the alternate U+2019, not found on their
> basic keyboards, nor will they substitute a hyphen or space for it or drop
> it completely; they'll just type letters with acute accents with a single
> keystroke on their Latinized keyboard.
>
>
> 2018-01-25 13:15 GMT+01:00 Andrew West via Unicode <[email protected]>:
>
>> On 23 January 2018 at 00:55, James Kass via Unicode <[email protected]>
>> wrote:
>> >
>> > Regular American users simply don't type umlauts, period.
>>
>> Not even the president of the Unicode Consortium when referring to
>> Christoph Päper:
>>
>> http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf
>>
>> Andrew
>>
>>
>
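
For anyone who wants to check the byte and code unit counts discussed in the quoted message, here is a minimal Python sketch. The helper names (utf8_len, utf16_units, nfc) and the sample strings are only illustrative assumptions: "a" stands in for any basic Latin letter, U+00E1 for any precomposed letter with acute, and U+04D9 (Cyrillic small letter schwa) for a Kazakh Cyrillic letter. It is not taken from any proposal.

```python
import unicodedata


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def utf16_units(text: str) -> int:
    """Number of 16-bit code units, i.e. the string "length" that
    Java or JavaScript would report for the same text."""
    return len(text.encode("utf-16-le")) // 2


def nfc(text: str) -> str:
    """Normalize the text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# Illustrative samples only (assumed for this sketch).
SAMPLES = {
    "letter + ASCII apostrophe U+0027": "a\u0027",
    "letter + apostrophe U+2019": "a\u2019",
    "precomposed a-acute U+00E1": "\u00e1",
    "Cyrillic schwa U+04D9": "\u04d9",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label}: {utf8_len(text)} UTF-8 bytes, "
              f"{utf16_units(text)} UTF-16 code units")

    # A decomposed sequence (letter + combining acute U+0301)
    # becomes a single code point under NFC.
    decomposed = "a\u0301"
    print(len(decomposed), "code points before NFC,",
          len(nfc(decomposed)), "after")
```

With these samples, the letter followed by U+2019 takes 4 bytes in UTF-8 (1 + 3), while the precomposed letter, the Cyrillic letter, and the letter followed by an ASCII apostrophe each take 2. In UTF-16 code units, the precomposed letter counts as 1 and either letter + apostrophe pair counts as 2.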

