И with lunate breve is not a letter of the alphabet, the breve is just an
indication to the reader of the dictionary that the И in this particular
word is pronounced short. While there may be homographs that differ in
pronunciation only by the vowel length, the alphabet doesn't provide for
that dis
Thank you, but how convenient!
Calling a letter a "modifier" allows one to avoid re-encoding the same shape
in various alphabets.
Leo
On Wed, Jul 2, 2014 at 3:02 PM, Jean-François Colson wrote:
>
> On 02/07/14 22:33, Leo Broukhis wrote:
>
> http://www.omniglot.com/writing/nenets.htm
>>
>> sho
The angle and form (straight or curved, with wedge, with rounded bowl or
not, attached or detached from the letter) of the acute accent is not
really defined, all variants are possible, including the Czech/Polish form.
All that matters is the main direction of slanting. The only unacceptable
rende
On Wed, 2 Jul 2014 21:19:16 +0200
Philippe Verdy wrote:
> 2014-07-02 20:19 GMT+02:00 David Starner :
>
> > I might argue b for 0x00 in UTF-8 would be technically
> > legal
> But the same C libraries are also using -1 as end-of-stream values
> and if they are converted to bytes, they wi
On Wed, 2 Jul 2014 13:08:42 -0700
Leo Broukhis wrote:
> The difference is real and intentional, but isn't it akin to the
> difference between (IIRC a discussion several years ago) the
> Polish/Czech acute and the French acute - the former is more
> vertical?
Not really. Look at the third entry
On Wed, 2 Jul 2014 11:48:06 -0700
Leo Broukhis wrote:
> If the font happens to have lunar breve at U+0306, whereas the letter
> й has the rounded bowl breve, using CGJ should guarantee to achieve
> distinctive rendering, because <и, CGJ, U+0306> is not canonically
> equivalent to <и, U+0306> (cf
On Wed, 02 Jul 2014 21:13:30 +0300
"Jukka K. Korpela" wrote:
> 2014-07-02 20:34, Philippe Verdy wrote:
> > CGJ would be better used to prevent canonical compositions but it
> > won't normally give a distinctive semantic.
> In the question, visual difference was desired. The Unicode FAQ says:
>
On 02/07/14 22:33, Leo Broukhis wrote:
http://www.omniglot.com/writing/nenets.htm
shows two letters (’ and ”) in both versions of the Cyrillic Nenets
alphabet ("voiced taser”" and "unvoiced taser”") that don't seem to be
encoded as letters. Should they be encoded, or are 2019 and 201D good
enough?
Leo
The difference is real and intentional, but isn't it akin to the difference
between (IIRC a discussion several years ago) the Polish/Czech acute and
the French acute - the former is more vertical? It has been decided that
there is no need for two combining acute signs.
Leo
On Wed, Jul 2, 2014 at
2014-07-02 20:55 GMT+02:00 Jukka K. Korpela :
> I think the idea of using CGJ is more wrong than the idea of using ZWNJ.
>
I think exactly the opposite. CGJ brings the distinction that it prohibits
the canonical combination. As the resulting string is not canonically
equivalent, it is also semantic
2014-07-02 20:19 GMT+02:00 David Starner :
> I might argue b for 0x00 in UTF-8 would be technically
> legal
It is not. UTF-8 specifies the effective value of each 8-bit byte, if you
store b in that byte you have exactly the same result as when
storing 0xFF or -1 (unless your syst
Sounds to me that what you really want is to have two different breve
characters
(assuming that the distinction is real and intentional, and not a
happenstance).
That would require encoding a new combining character, AFAICT...
/Kent K
On 2014-07-02 20:48, "Leo Broukhis" wrote:
> Jukka,
>
>
2014-07-02 19:11, Leo Broukhis wrote:
Here
https://upload.wikimedia.org/wikipedia/commons/a/a4/Contrastive_use_of_kratka_and_breve.JPG
is an example of й and и + U+0306 COMBINING BREVE used contrastively
(/j/ vs short /i/) thanks to a difference in typographic style of
Cyrillic breve (kratka) an
Jukka,
If the font happens to have lunar breve at U+0306, whereas the letter й has
the rounded bowl breve, using CGJ should guarantee to achieve distinctive
rendering, because <и, CGJ, U+0306> is not canonically equivalent to <и,
U+0306> (cf. "The sequences and are not
canonically equivalent.")
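The non-equivalence claimed above can be checked with Python's standard unicodedata module. This is a minimal sketch that is not part of the original thread, and all variable names are illustrative. It only shows that CGJ (U+034F) blocks canonical composition of и + U+0306 into й; whether a given font then renders the two breves differently is a separate question that this code cannot answer.

```python
import unicodedata

CYR_I = "\u0438"        # CYRILLIC SMALL LETTER I (и)
CYR_SHORT_I = "\u0439"  # CYRILLIC SMALL LETTER SHORT I (й)
BREVE = "\u0306"        # COMBINING BREVE
CGJ = "\u034F"          # COMBINING GRAPHEME JOINER

plain = CYR_I + BREVE           # <и, U+0306>
with_cgj = CYR_I + CGJ + BREVE  # <и, CGJ, U+0306>

# Without CGJ, NFC composes the pair into the precomposed letter й.
composes = unicodedata.normalize("NFC", plain) == CYR_SHORT_I

# With CGJ in between, NFC leaves the sequence untouched.
cgj_preserved = unicodedata.normalize("NFC", with_cgj) == with_cgj

# Two strings are canonically equivalent when their NFD forms are identical.
canonically_equivalent = (
    unicodedata.normalize("NFD", plain) == unicodedata.normalize("NFD", with_cgj)
)

print(composes, cgj_preserved, canonically_equivalent)
```

Because the two sequences stay distinct under every normalization form, a process that normalizes text will not silently merge them.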
Aren't we in such a case where the distinction (supposed to be guessed
contextually) would be needed only to facilitate contextual analysis of
text (such as counting syllables, or transforming the text to count them in
a later process, or searching text phonologically, even if the look of the
rende
> The alternative would be to encode a separate CYRILLIC COMBINING "LUNAR"
BREVE for the case of the initial /j/, or to encode that letter /j/
specifically.
This is in effect what they are proposing on the wiki discussion page.
A correction: the lunar breve is for the short /i/ sound, and the rou
On Wed, Jul 2, 2014 at 8:02 AM, Karl Williamson wrote:
> In
> UTF-8, an example would be that Sun, I'm told, and for reasons I've
> forgotten or never knew, did not want raw NUL bytes to appear in text
> streams, so used the overlong sequence \xC0\x80 to represent them; overlong
> sequences genera
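As an illustration of the overlong NUL encoding mentioned in this message, here is a minimal Python sketch that is not part of the original thread. The helper names are hypothetical. It shows that a strict UTF-8 decoder rejects the byte pair C0 80, and one possible way to accept it deliberately by mapping it to a real NUL before decoding.

```python
OVERLONG_NUL = b"\xc0\x80"  # two-byte overlong form of U+0000


def decodes_strictly(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_overlong_nul(data: bytes) -> str:
    """Hypothetical helper: turn C0 80 into a real NUL, then decode strictly.

    The byte 0xC0 never occurs in well-formed UTF-8, so the replacement
    cannot alter any valid sequence.
    """
    return data.replace(OVERLONG_NUL, b"\x00").decode("utf-8")


print(decodes_strictly(b"\x00"))          # a raw NUL byte is well-formed
print(decodes_strictly(OVERLONG_NUL))     # the overlong form is not
print(repr(decode_overlong_nul(b"a\xc0\x80b")))
```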
2014-07-02 20:34, Philippe Verdy wrote:
CGJ would be better used to prevent canonical compositions but it won't
normally give a distinctive semantic.
In the question, visual difference was desired. The Unicode FAQ says:
“The semantics of CGJ are such that it should impact only searching and
s
The alternative would be to encode a separate CYRILLIC COMBINING "LUNAR"
BREVE for the case of the initial /j/, or to encode that letter /j/
specifically.
However in your examples, that letter /j/ only occurs in the word initial
position where phonology transforms the long /i/ into /j/. Contextual
ZWNJ is not supposed to join or disjoin combining diacritics from a base
letter (even if it has such limited use in Indic scripts, but only between
letters to prevent clusters with subjoined letters),
CGJ would be better used to prevent canonical compositions but it won't
normally give a distinctive
On 7/2/2014 8:02 AM, Karl Williamson wrote:
Corrigendum #9 has changed this so much that people are coming to me
and saying that inputs may very well have non-characters, and that the
default should be to pass them through. Since we have no published
wording for how the TUS will absorb Corrige
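For readers who want to see what "passing noncharacters through" means in practice, here is a minimal Python sketch that is not part of the original thread. The function name is illustrative. It identifies the 66 noncharacter code points (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) and shows that Python's UTF-8 codec round-trips one of them without complaint.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # U+FDD0..U+FDEF, or a code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Count every noncharacter in the code space: 32 + 2 * 17.
noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))

# Python's UTF-8 codec passes a noncharacter through unchanged.
sample = "\ufdd0"
round_trips = sample.encode("utf-8").decode("utf-8") == sample

print(noncharacter_count, round_trips)
```

A library that wants the older, stricter behavior could use a check like this to reject or replace such code points on input, as an explicit option rather than the default.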
Here
https://upload.wikimedia.org/wikipedia/commons/a/a4/Contrastive_use_of_kratka_and_breve.JPG
is an example of й and и + U+0306 COMBINING BREVE used contrastively (/j/
vs short /i/) thanks to a difference in typographic style of Cyrillic breve
(kratka) and regular breve.
For me in Win7 using и +
On 06/12/2014 11:14 PM, Peter Constable wrote:
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Karl Williamson
Sent: Wednesday, June 11, 2014 9:30 PM
I have something like a library that was written a long time ago
(not by me) assuming that noncharacters were illegal in open i
It's my sense that there are very few cased scripts in existence that
are ever likely to be encoded by Unicode that haven't already been
so-encoded.
I also suspect that very few new titlecased letters will ever be
added to Unicode, as I believe these all come to maintain roundtrip
compa
These guidelines are quite old (1999). But even with these, I'm convinced
that the proposed symbol is OK for encoding, and that it should harmonize
with glyphs for letters of the Thai script.
The dictionary example is convincing enough for me, as it is hard to see
that just as an illustration. It
On Wed, Jul 2, 2014 at 2:18 PM, Jukka K. Korpela wrote:
>
> Is there evidence of its use in text? This should be an essential question
> when discussing whether it should be defined as a Unicode character. Use as
> “logo” or, rather, as a standalone graphic symbol does not really mean it
> is use
On 02/07/2014, James Clark wrote:
> The Royal Institute Thai Dictionary (the authoritative dictionary for the
> Thai language) has an entry for unalom showing the symbol:
> https://pbs.twimg.com/media/BrdB2IsCYAAu4gP.jpg:large
Are there other dictionaries and books which use this symbol in te
2014-07-02 6:10, James Clark wrote:
The unalom is widespread in Thailand. For example, the Thai Red Cross
Society was originally founded as the Red Unalom Society, and its logo
was a red Unalom combined with a cross. It forms the main component of
the seal of Rama I (founder of the current Thai