Re: Contrastive use of kratka and breve

2014-07-02 Thread Leo Broukhis
И with lunate breve is not a letter of the alphabet, the breve is just an indication to the reader of the dictionary that the И in this particular word is pronounced short. While there may be homographs that differ in pronunciation only by the vowel length, the alphabet doesn't provide for that dis

Re: Missing Nenets letters?

2014-07-02 Thread Leo Broukhis
Thank you, but how convenient! Calling a letter a "modifier" allows one to avoid re-encoding the same shape in various alphabets. Leo On Wed, Jul 2, 2014 at 3:02 PM, Jean-François Colson wrote: > > On 02/07/14 22:33, Leo Broukhis wrote: > > http://www.omniglot.com/writing/nenets.htm >> >> sho

Re: Contrastive use of kratka and breve

2014-07-02 Thread Philippe Verdy
The angle and form (straight or curved, with wedge, with rounded bowl or not, attached or detached from the letter) of the acute accent is not really defined, all variants are possible, including the Czech/Polish form. All that matters is the main direction of slanting. The only unacceptable rende

Re: Corrigendum #9

2014-07-02 Thread Richard Wordingham
On Wed, 2 Jul 2014 21:19:16 +0200 Philippe Verdy wrote: > 2014-07-02 20:19 GMT+02:00 David Starner : > > > I might argue b for 0x00 in UTF-8 would be technically > > legal > But the same C libraries are also using -1 as end-of-stream values > and if they are converted to bytes, they wi

Re: Contrastive use of kratka and breve

2014-07-02 Thread Richard Wordingham
On Wed, 2 Jul 2014 13:08:42 -0700 Leo Broukhis wrote: > The difference is real and intentional, but isn't it akin to the > difference between (IIRC a discussion several years ago) the > Polish/Czech acute and the French acute - the former is more > vertical? Not really. Look at the third entry

Re: Contrastive use of kratka and breve

2014-07-02 Thread Richard Wordingham
On Wed, 2 Jul 2014 11:48:06 -0700 Leo Broukhis wrote: > If the font happens to have lunar breve at U+0306, whereas the letter > й has the rounded bowl breve, using CGJ should guarantee to achieve > distinctive rendering, because <и, CGJ, U+0306> is not canonically > equivalent to <и, U+0306> (cf

Re: Contrastive use of kratka and breve

2014-07-02 Thread Richard Wordingham
On Wed, 02 Jul 2014 21:13:30 +0300 "Jukka K. Korpela" wrote: > 2014-07-02 20:34, Philippe Verdy wrote: > > CGJ would be better used to prevent canonical compositions but it > > won't normally give a distinctive semantic. > In the question, visual difference was desired. The Unicode FAQ says: >

Re: Missing Nenets letters?

2014-07-02 Thread Jean-François Colson
On 02/07/14 22:33, Leo Broukhis wrote: http://www.omniglot.com/writing/nenets.htm shows two letters (’ and ”) in both versions of the Cyrillic Nenets alphabet ("voiced taser”" and "unvoiced taser”") that don't seem to be encoded as letters. Should they be encoded, or are U+2019 and U+201D good

Missing Nenets letters?

2014-07-02 Thread Leo Broukhis
http://www.omniglot.com/writing/nenets.htm shows two letters (’ and ”) in both versions of the Cyrillic Nenets alphabet ("voiced taser”" and "unvoiced taser”") that don't seem to be encoded as letters. Should they be encoded, or are U+2019 and U+201D good enough? Leo
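The question of whether a character is "encoded as a letter" can be checked against its General_Category. The sketch below uses Python's standard `unicodedata` module to compare the two quotation marks named in the message with two modifier letters. The choice of U+02BC and U+02EE as candidates is an assumption made for illustration; the excerpt above does not name them.

```python
import unicodedata

# Code points named in the message: right single and right double quotation marks.
QUOTES = ["\u2019", "\u201D"]
# Assumed candidates (not named in the excerpt): modifier-letter apostrophes.
MODIFIERS = ["\u02BC", "\u02EE"]

def describe(ch):
    """Return (code point label, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in QUOTES + MODIFIERS:
    print(describe(ch))
```

A category beginning with `P` marks punctuation and one beginning with `L` marks a letter, which affects word-boundary detection and identifier rules in software that follows the character properties.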

Re: Contrastive use of kratka and breve

2014-07-02 Thread Leo Broukhis
The difference is real and intentional, but isn't it akin to the difference between (IIRC a discussion several years ago) the Polish/Czech acute and the French acute - the former is more vertical? It has been decided that there is no need for two combining acute signs. Leo On Wed, Jul 2, 2014 at

Re: Contrastive use of kratka and breve

2014-07-02 Thread Philippe Verdy
2014-07-02 20:55 GMT+02:00 Jukka K. Korpela : > I think the idea of using CGJ is more wrong than the idea of using ZWNJ. > I think exactly the opposite. CGJ brings the distinction that it prohibits the canonical combination. As the resulting string is not canonically equivalent, it is also semantic

Re: Corrigendum #9

2014-07-02 Thread Philippe Verdy
2014-07-02 20:19 GMT+02:00 David Starner : > I might argue b for 0x00 in UTF-8 would be technically > legal It is not. UTF-8 specifies the effective value of each 8-bit byte, if you store b in that byte you have exactly the same result as when storing 0xFF or -1 (unless your syst
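The point about storing 0xFF or -1 in a byte can be illustrated with a small sketch. It assumes an 8-bit two's-complement byte, and the helper name `can_appear_in_utf8` is hypothetical. The set of byte values that never occur in well-formed UTF-8 (0xC0, 0xC1, 0xF5 to 0xFF) is taken from the Unicode Standard's table of well-formed byte sequences.

```python
def can_appear_in_utf8(byte_value: int) -> bool:
    """Return True if this byte value can occur somewhere in well-formed UTF-8."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("not an 8-bit value")
    # 0xC0 and 0xC1 would only start overlong forms; 0xF5..0xFF exceed U+10FFFF.
    return not (byte_value in (0xC0, 0xC1) or byte_value >= 0xF5)

# An end-of-stream marker of -1, truncated to 8 bits, becomes 0xFF.
eof_as_byte = -1 & 0xFF

print(hex(eof_as_byte), can_appear_in_utf8(eof_as_byte))
print(hex(0x00), can_appear_in_utf8(0x00))
```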

Re: Contrastive use of kratka and breve

2014-07-02 Thread Kent Karlsson
Sounds to me that what you really want is to have two different breve characters (assuming that the distinction is real and intentional, and not a happenstance). That would require encoding a new combining character, AFAICT... /Kent K On 2014-07-02 20:48, "Leo Broukhis" wrote: > Jukka, > >

Re: Contrastive use of kratka and breve

2014-07-02 Thread Jukka K. Korpela
2014-07-02 19:11, Leo Broukhis wrote: Here https://upload.wikimedia.org/wikipedia/commons/a/a4/Contrastive_use_of_kratka_and_breve.JPG is an example of й and и + U+0306 COMBINING BREVE used contrastively (/j/ vs short /i/) thanks to a difference in typographic style of Cyrillic breve (kratka) an

Re: Contrastive use of kratka and breve

2014-07-02 Thread Leo Broukhis
Jukka, If the font happens to have lunar breve at U+0306, whereas the letter й has the rounded bowl breve, using CGJ should guarantee to achieve distinctive rendering, because <и, CGJ, U+0306> is not canonically equivalent to <и, U+0306> (cf. "The sequences and are not canonically equivalent.")
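The canonical-equivalence claim in this message can be checked with Python's standard `unicodedata` module. The sketch below only shows how normalization treats the two sequences. Whether a given font or renderer then displays them differently is a separate question that this code does not address.

```python
import unicodedata

I_CYR = "\u0438"    # CYRILLIC SMALL LETTER I (и)
SHORT_I = "\u0439"  # CYRILLIC SMALL LETTER SHORT I (й)
BREVE = "\u0306"    # COMBINING BREVE
CGJ = "\u034F"      # COMBINING GRAPHEME JOINER

plain = I_CYR + BREVE
with_cgj = I_CYR + CGJ + BREVE

# Without CGJ, NFC composes the pair into the precomposed letter.
nfc_plain = unicodedata.normalize("NFC", plain)
# With CGJ in between, the breve is blocked from composing with the base.
nfc_cgj = unicodedata.normalize("NFC", with_cgj)

print(nfc_plain == SHORT_I)   # the plain sequence is equivalent to й
print(nfc_cgj == with_cgj)    # the CGJ sequence is left unchanged
```

Because the CGJ sequence survives normalization, a process that normalizes text will not silently merge it with й, which is the property the message relies on.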

Re: Contrastive use of kratka and breve

2014-07-02 Thread Philippe Verdy
Aren't we in such a case where the distinction (supposed to be guessed contextually) would be needed only to facilitate contextual analysis of text (such as counting syllables, or transforming the text to count them in a later process, or searching text phonologically, even if the look of the rende

Re: Contrastive use of kratka and breve

2014-07-02 Thread Leo Broukhis
> The alternative would be to encode a separate CYRILLIC COMBINING "LUNAR" BREVE for the case of the initial /j/, or to encode that letter /j/ specifically. This is in effect what they are proposing on the wiki discussion page. A correction: the lunar breve is for the short /i/ sound, and the rou

Re: Corrigendum #9

2014-07-02 Thread David Starner
On Wed, Jul 2, 2014 at 8:02 AM, Karl Williamson wrote: > In > UTF-8, an example would be that Sun, I'm told, and for reasons I've > forgotten or never knew, did not want raw NUL bytes to appear in text > streams, so used the overlong sequence \xC0\x80 to represent them; overlong > sequences genera
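As a hedged illustration of the overlong sequence mentioned here, the sketch below feeds both the ordinary NUL byte and the two-byte form \xC0\x80 to Python's strict UTF-8 decoder. The helper name `try_decode` is hypothetical, and the behavior shown is that of Python's built-in codec, not of any Sun software.

```python
def try_decode(data: bytes):
    """Decode as strict UTF-8; return the string, or None if the bytes are ill-formed."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

# The one-byte encoding of U+0000 is well-formed.
print(repr(try_decode(b"\x00")))
# The two-byte overlong form of U+0000 is rejected by a conformant decoder.
print(repr(try_decode(b"\xC0\x80")))
```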

Re: Contrastive use of kratka and breve

2014-07-02 Thread Jukka K. Korpela
2014-07-02 20:34, Philippe Verdy wrote: CGJ would be better used to prevent canonical compositions but it won't normally give a distinctive semantic. In the question, visual difference was desired. The Unicode FAQ says: “The semantics of CGJ are such that it should impact only searching and s

Re: Contrastive use of kratka and breve

2014-07-02 Thread Philippe Verdy
The alternative would be to encode a separate CYRILLIC COMBINING "LUNAR" BREVE for the case of the initial /j/, or to encode that letter /j/ specifically. However, in your examples, that letter /j/ only occurs in word-initial position, where phonology transforms the long /i/ into /j/. Contextual

Re: Contrastive use of kratka and breve

2014-07-02 Thread Philippe Verdy
ZWNJ is not supposed to join or disjoin combining diacritics from a base letter (even if it has such limited use in Indic scripts, but only between letters to prevent clusters with subjoined letters); CGJ would be better used to prevent canonical compositions but it won't normally give a distinctive

Re: Corrigendum #9

2014-07-02 Thread Asmus Freytag
On 7/2/2014 8:02 AM, Karl Williamson wrote: Corrigendum #9 has changed this so much that people are coming to me and saying that inputs may very well have non-characters, and that the default should be to pass them through. Since we have no published wording for how the TUS will absorb Corrige
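For readers unfamiliar with the term, a minimal sketch of a noncharacter test follows. The function name `is_noncharacter` is hypothetical. The ranges are the 66 noncharacters defined by the Unicode Standard: U+FDD0 to U+FDEF plus the last two code points of each plane. The round trip at the end shows that Python's UTF-8 codec passes a noncharacter through unchanged, which is one possible default and not a statement about what any particular library should do.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if the code point is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    in_arabic_block_range = 0xFDD0 <= cp <= 0xFDEF
    last_two_of_plane = (cp & 0xFFFE) == 0xFFFE
    return in_arabic_block_range or last_two_of_plane

# Passing a noncharacter through an encode/decode round trip.
text = "a" + chr(0xFFFE) + "b"
round_tripped = text.encode("utf-8").decode("utf-8")

print(is_noncharacter(0xFFFE), round_tripped == text)
```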

Contrastive use of kratka and breve

2014-07-02 Thread Leo Broukhis
Here https://upload.wikimedia.org/wikipedia/commons/a/a4/Contrastive_use_of_kratka_and_breve.JPG is an example of й and и + U+0306 COMBINING BREVE used contrastively (/j/ vs short /i/) thanks to a difference in typographic style of Cyrillic breve (kratka) and regular breve. For me in Win7 using и +

Re: Corrigendum #9

2014-07-02 Thread Karl Williamson
On 06/12/2014 11:14 PM, Peter Constable wrote: From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Karl Williamson Sent: Wednesday, June 11, 2014 9:30 PM I have something like a library that was written a long time ago (not by me) assuming that noncharacters were illegal in open i

Unencoded cased scripts and unencoded titlecase letters

2014-07-02 Thread Karl Williamson
It's my sense that there are very few cased scripts in existence that are ever likely to be encoded by Unicode that haven't already been so-encoded. I also suspect that very few new titlecase letters will ever be added to Unicode, as I believe these all come to maintain roundtrip compa
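To see which titlecase letters exist today, one can enumerate the code points whose General_Category is `Lt`. This is a minimal sketch using Python's standard `unicodedata` module. The result depends on the Unicode version bundled with the interpreter, and the function name `titlecase_letters` is hypothetical.

```python
import sys
import unicodedata

def titlecase_letters():
    """Return every character whose General_Category is Lt (Letter, titlecase)."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Lt"
    ]

letters = titlecase_letters()
print(len(letters), "titlecase letters in Unicode", unicodedata.unidata_version)

# One digraph with distinct lowercase, titlecase and uppercase forms.
dz_lower = "\u01C6"
print(dz_lower, dz_lower.title(), dz_lower.upper())
```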

Re: Thai unalom symbol

2014-07-02 Thread Philippe Verdy
These guidelines are quite old (1999). But even with these, I'm convinced that the proposed symbol is OK for encoding, and that it should harmonize with glyphs for letters of the Thai script. The dictionary example is convincing enough for me, as it is hard to see that just as an illustration. It

Re: Thai unalom symbol

2014-07-02 Thread James Clark
On Wed, Jul 2, 2014 at 2:18 PM, Jukka K. Korpela wrote: > > Is there evidence of its use in text? This should be an essential question > when discussing whether it should be defined as a Unicode character. Use as > “logo” or, rather, as a standalone graphic symbol does not really mean it > is use

Re: Thai unalom symbol

2014-07-02 Thread Christopher Fynn
On 02/07/2014, James Clark wrote: > The Royal Institute Thai Dictionary (the authoritative dictionary for the > Thai language) has an entry for unalom showing the symbol: > https://pbs.twimg.com/media/BrdB2IsCYAAu4gP.jpg:large Are there other dictionaries and books which use this symbol in te

Re: Thai unalom symbol

2014-07-02 Thread Jukka K. Korpela
2014-07-02 6:10, James Clark wrote: The unalom is widespread in Thailand. For example, the Thai Red Cross Society was originally founded as the Red Unalom Society, and its logo was a red Unalom combined with a cross. It forms the main component of the seal of Rama I (founder of the current Thai