Somya asked:
> I have unicode C application. I am using the following macro
> to define my string
> to 2 byte width characters.
>
> #ifdef UNICODE
> #define _T(x) L##x
>
> But I see that GCC compiler maps 'L' to wchar_t, which is 4 byte on Linux. I
> have used -fshort-wchar option
> on Linux
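For the 2-byte string question above, one alternative to `-fshort-wchar` is C11's `char16_t` type and `u""` literals from `<uchar.h>`. The GCC manual warns that `-fshort-wchar` produces code that is not binary compatible with code built without it, so glibc's `wcslen` and similar functions no longer match. The sketch below assumes a C11 compiler. The `T16` macro and `u16_len` helper are hypothetical names, not part of the original poster's code.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical counterpart to the poster's _T() macro. In C11 the u
   prefix makes a char16_t string literal (UTF-16 code units), so the
   unit width does not depend on sizeof(wchar_t). */
#define T16(x) u##x

/* The C library has no strlen for char16_t strings, so count UTF-16
   code units up to (not including) the terminating zero unit. */
size_t u16_len(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Usage: `const char16_t *msg = T16("hello");`. Note that a supplementary character such as U+1F600 occupies two code units (a surrogate pair) in such a literal.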
Asmus replied:
> On 11/15/2010 2:24 PM, Kenneth Whistler wrote:
> >> FA47 is a "compatibility character", and would have a
> >> compatibility mapping.
> > Faulty syllogism.
>
> Formally correct answer but only because of something of a design flaw
> FA47 is a "compatibility character", and would have a compatibility mapping.
Faulty syllogism.
FA47 is a CJK Compatibility character, which means it was encoded
for compatibility purposes -- in this case to cover the round-trip
mapping needed for JIS X 0213.
However, it has a *canonical* deco
Mark Davis wrote:
> What are also tricky are the 'almost' supersets, where there are only a few
> different characters. Those definitely cause problems because the difference
> in data is almost undetectable.
For example, Mark is referring to cases such as ISO 8859-1 and 8859-15.
Those share all
Nagesh Chigurupati asked:
> I have a question regarding some of the contextual rules in RFC5892. For
> example the contextual rule in appendix A.4 Greek Lower Numeral Sign
> (U+0375), states the following:
>
> If Script(After(cp)) .eq. Greek Then True;
>
> If the Greek Lower Numeral Sign (U+037
Gy. Dobner asked:
> But my original question was not how to encode a combining macron in one
> more possible way but how to encode a length mark that would display as
> something _visually_ _distinguishable_ _from_ _a_ _macron_ (because the
> macron is functionally ambiguous and hence unsuitable f
> > What is the position regarding the 32-bit code point space
> > above U+10FFFF please?
> > Does the Unicode Consortium and/or ISO or indeed anyone else
> > make any claims upon it?
> Yes, the claim is that if you use it, you're generating invalid Unicode.
>
> Don't do it, don't contempla
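The quoted answer says anything above U+10FFFF is invalid Unicode. A decoder or API can enforce that with a simple range check. The sketch below also excludes the surrogate range, since surrogate code points are not scalar values and cannot be encoded on their own in any UTF. The function name is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* A Unicode scalar value is any code point in U+0000..U+10FFFF except the
   surrogates U+D800..U+DFFF. Values above U+10FFFF are not code points at
   all, so no Unicode encoding form can represent them. */
bool is_unicode_scalar(uint32_t v)
{
    return v <= 0x10FFFF && (v < 0xD800 || v > 0xDFFF);
}
```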
Asmus,
> >> I'm curious if any thought was given to this, and what code points I'm
> >> missing in my analysis.
> > U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN
> > SMALL LETTER E), also used for Euler's number. See also U+2147.
>
> Now you are confusing Euler's constant - also dep
Karl Williamson asked:
> The Unicode standard only gives numeric values to rational numbers. Is
> the reason for this merely because of the difficulty of representing
> irrational ones?
No. Primarily it is because the Unicode Standard is a *character*
encoding standard, and not a standard for
> Exploring the dictionary with the search engine (which is operational
> since today morning ...) I discovered two occurences of an unexplained
> abbreviation which refers to a language in which "silvir" means
> "silver" and "ses" means "six". The name of the language is
> abbreviated as "Kimr."
> I am thinking of where a poet might specify an ending version
> of a glyph at the end of the last word on some lines, yet not
> on others, for poetic effect. I think that it would be good
> if one could specify that in plain text.
Why can't a poet find a poetic means of doing that, instead o
> > That statement is incorrect. The UCA currently specifies that
> > ill-formed code unit sequences and *noncharacters* are mapped
> > to [....], but unassigned code points are not.
>
> This is exactly equivalent: if you use strength level 3, they are
> both [...], ...
> > But an approach that abstracts the name, then tries to re-imagine a
> > representation from scratch is, in my view, very much misguided.
>
> Recall that many of the emojis 1) have changed glyphs quite a lot from
> the source glyphs, and 2) are to quite an extent defined from the *source*
> (J
Martin,
> In a discussion about a new protocol, there was some issue about how to
> replace illegal bytes in UTF-8 with U+FFFD. That let me remember that
> there was once a Public Review Issue about this, and that as a result, I
> added something to the Ruby (programming language) codebase. I t
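On the question of replacing ill-formed UTF-8 with U+FFFD: one widely used policy is "substitution of maximal subparts" as described in Chapter 3 of the Unicode Standard. The decoder emits one U+FFFD for the longest prefix of a sequence that could still have been valid, then resumes at the first byte that broke it. The sketch below implements that policy from the table of well-formed UTF-8 byte sequences. It is an illustration, not Ruby's code or the text of the Public Review Issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n bytes of UTF-8 into out (which needs room for n code points),
   emitting one U+FFFD per maximal subpart of an ill-formed sequence.
   Returns the number of code points written. */
size_t utf8_decode_replace(const unsigned char *s, size_t n, uint32_t *out)
{
    size_t i = 0, k = 0;
    while (i < n) {
        unsigned char b = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        uint32_t cp;
        int len;

        if (b < 0x80) {                     /* ASCII */
            out[k++] = b;
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            len = 2; cp = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3; cp = b & 0x0F;
            if (b == 0xE0) lo = 0xA0;       /* reject overlong forms */
            if (b == 0xED) hi = 0x9F;       /* reject surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4; cp = b & 0x07;
            if (b == 0xF0) lo = 0x90;       /* reject overlong forms */
            if (b == 0xF4) hi = 0x8F;       /* reject > U+10FFFF */
        } else {                            /* 80..C1, F5..FF: never valid */
            out[k++] = 0xFFFD;
            i++;
            continue;
        }

        /* Consume trailing bytes while they are in range; stop at the
           first byte that cannot continue the sequence. */
        size_t j = i + 1;
        int complete = 1;
        for (int m = 1; m < len; m++, j++) {
            unsigned char min = (m == 1) ? lo : 0x80;
            unsigned char max = (m == 1) ? hi : 0xBF;
            if (j >= n || s[j] < min || s[j] > max) {
                complete = 0;
                break;
            }
            cp = (cp << 6) | (s[j] & 0x3F);
        }
        out[k++] = complete ? cp : 0xFFFD;
        i = j;   /* the offending byte, if any, is re-examined as a lead */
    }
    return k;
}
```

With this policy, `E0 80` yields two U+FFFD (E0 cannot be followed by 80), while a truncated `F0 9F 98` at end of input yields just one.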
Philippe Verdy said:
> Implicit weights for unassigned code points and other characters that
> are NOT ill-formed are suboptimal, as noted in the proposed update.
To follow up on Mark's response on this thread...
>
> It should take into account their existing default properties, notably :
[ lo
Luke asked:
> Given this scenario, is it proper to encode perhaps one set of TONAL MODIFIER
> LETTER SMALL _ suitable for use,
No.
> are we stuck using these mismatching existing
> encodings,
No, although if I were representing this data, that is probably
what I would use.
> or perhaps some
Frédéric Grosshans asked:
> Why did you chose the "fleur" words ? The question discussed about the
> accent do not seem to arise here.
I was struck by the issues about space, hyphen (or lack thereof)
and alternate spellings that could be illustrated by that
stretch of topics, so used that as the
A couple of weeks ago, in this thread Philippe Verdy said:
> Breaking on words, even if it requirs a very modest buffering,
> will significantly improve the processing time,
> because each word in the long texts will be scanned only
> once, and all the rest will occur within the small and
> co
Philippe Verdy noted:
>
> Everywhere below, the Unicode property value alias is missing an 'l'.
>
> - In HTML table 1:
> Egyp 050 Egyptian hieroglyphs hiéroglyphes égyptiens Egyptian_Hierogyphs 2009-06-01
etc.
These errors in the tables have been corrected by the Registration
Aut
Karl Williamson asked:
> Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT
> does?
> They are U+2107 and U+210E respectively.
Because U+210E PLANCK CONSTANT is, to quote the standard,
"simply a mathematical italic h". It serves as the filler for
the gap in the run of m
C. E. Whitehead said:
> I've not gone through many character charts though so I can't
> really speak as an expert as you all can; sorry I've not gotten
> to more; I will try to ...
For people who wish to pursue this issue further, the relevant
information is neatly summarized in the extracted p
Sharma asked:
> I have a question about VS characters and the default ignorable property.
>
> TUS 5.2 ch 16.4 clearly states that VS characters are default ignorable.
> Ch 5.21 states that default ignorable characters are to be ignored in
> rendering (except in specialized modes which show hidd
> > On this date, Unicode had received proposals for same purpose
> > form non-insiders too -- as you know this is true because India
> > is a nation of over a billion populations.
>
> I have seen no other proposals to encode the character, submitted
> either to the UTC or to WG2.
Actually, t
Philippe Verdy said:
> A side note about this preliminary proposal for allocating blocks in
> the SMP for the two Pau Cin Hau scripts (including one for the large
> "logographic" script, with 1050 signs):
>
> http://std.dkuug.dk/JTC1/SC2/WG2/docs/n3865.pdf
>
> (authored by Anshuman Pandey, in MI
Arno Schmitt noted:
> The marks in the Arabic bloc are not well organized;
A well-known fact that has resulted from the prior legacy for
Arabic encoding brought into Unicode, followed by twenty years
of incremental encoding of additional marks, as evidence has
been brought to bear and proposals f
> So what do we do with all these names?
> Can't we ask Mark to use a lottery to pick one and go from there? ...
So whaddya say, Mark? Have a go at the roulette wheel?
Ladies and gentlemen... step right up and place your bets!!
Bengali, Bangla, Bengalese, Bangladeshi, Bengalian, Bengalish,
Beng
Philippe Verdy said:
> A basic word-breaker using inly the space separator would marvelously
> improve the speed of French sorting even if backwards ordering occurs,
> just because it would significantly improve the data locality in the
> implementation and would considerably reduce the reallocati
Philippe Verdy wrote:
> "Kenneth Whistler" wrote:
> > Huh? That is just preprocessing to delete portions of strings
> > before calculating keys. If you want to do so, be my guest,
> > but building in arbitrary rules of content suppression into
> > the
Philippe Verdy said:
> If we don't limit the backwards reordering, then all accents in the
> full sentences will be reordered, so this is the final word that will
> drive the order. not only this is incorrect,
I understand that you think that the ordering should be done
word-by-word, with the Fre
[ snipping all the word breaking discussion, which I am not going
to comment on ... ]
CE Whitehead said:
> I collate as follows (note that i' is equivalent to i with accent grave):
>
> (EXAMPLE 1 -- my sort)
> di Silva, Fred,
> di Silva, John
> di Si'lva, Fred
> di Si'lva, John
> Disilva, Fr
William Overington asked:
> Will the Unicode Standard version 6.0 include mention of
> the unification of characters from the emoji set used in
> mobile telephones with earlier Unicode characters, also
> including a list of those characters of the emoji set
> that have been unified and where t
> On Fri, 25 Jun 2010, I wrote
>
> > Even in the year 2010, the euro sign (¤) doesn't work reliably.
>
> in both the Unicode list and in the newsgroup de.test.
>
> unicode.org shows a euro sign:
> http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html
>
> groups.google.com shows a cur
A small aside on one suggestion by Philippe Verdy:
> This also suggests a new separate general category for the abstract
> symbols/traits encoded for such complex scripts, instead of assigning
> them in "gc=Lo" or defining them as unrelated symbols in "gc=S*" :
> possibly "gc=Lx" ?
That would run
> John -> If I define a symbol (variable or constant) named ɸ and some
> user types 'φ' or 'ϕ' instead, it won't match.
>
> Can you please post the names for the other two, i.e., 'φ' or 'ϕ' ?
John was referring to:
U+0278 LATIN SMALL LETTER PHI
U+03C6 GREEK SMALL LETTER PHI
U+03D5 GREEK P
Steve,
> All of this writing can be encoded using 1280 code points. I
> have a 12-bit encoding with bi-directional conversion with UTF-8 working
> for planes 1, 15, or 16.
A minor point, but I suggest you not use "bi-directional"
in that context.
"Bidirectional" is a term of art in Unicode ch
> But again, I'm not talking about programming. My four year old can grasp
> tonal
> just as well as she could decimal had I been teaching that. Now if I were
> using the a-f notation, she would be (reasonably) confused as to why *some*
> numbers are unique, but *other* numbers are also letter
> On Friday 04 June 2010 08:51:05 am Otto Stolz wrote:
> > In any case, you have to know the base of every number
> > you are going to parse. This stems from the fact that
> > the same digits are used for all number systems.
Luke-Jr replied:
>
> But you first need to know if it is a number or a
> I'm not sure how much longer we should continue to wait for Tengwar and
> Cirth.
Three words: Squeaky wheel -- grease.
Don't expect this to "just happen". The corporate members of
the Unicode Consortium are mostly concerned about economically
significant sets of characters that impact their b
> > Note that as of 1993, the only "LAMDA" or "LAMBDA" characters
> > in the standard were:
> >
> > 039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER LAMBDA;;;03BB;
> > 03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER LAMBDA;;039B;;039B
> > 019B;LATIN SMALL LE
> Why not? I thought the names of some things have changed
> between versions, and other database items have changed substantially.
See "Name Stability" on the Unicode Character Encoding Stability Policy
page:
http://www.unicode.org/policies/stability_policy.html
--Ken
> > Names sometimes d
Robert Abel noted:
> It seems U+019B is the only instance where "lambda" is used. All other
> instances use "lamda". So it seems the slip-up is the other way around,
> whatever the initial reasoning for using "lamda" was.
It was not a slip-up. It was deliberate at the time (1993).
Note that as
John Dlugosz asked:
> Why does the code chart call the plain Greek letter (upper and
> lower case) "LAMDA" rather than "LAMBDA"?
Because ISO 8859-7 called it "LAMDA" rather than "LAMBDA".
Note that Unicode 1.0 called it "LAMBDA", but synchronization
of names for Unicode 1.1 (in 1993) was towar
Marcin Kowalczyk noted:
> Unicode has the following property. Consider sequences of valid
> Unicode characters: from the range U+0000..U+10FFFF, excluding
> non-characters (i.e. U+nFFFE and U+nFFFF for n from 0 to 0x10 and
> U+FDD0..U+FDEF) and surrogates. Any such sequence can be encoded
> in any
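The noncharacter set Marcin lists (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) can be tested with a short predicate. The sketch below follows that definition directly. The function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Noncharacters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF for each plane
   n = 0x0..0x10. The bitmask test checks whether the low 16 bits are
   FFFE or FFFF. */
bool is_noncharacter(uint32_t cp)
{
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    return cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
}
```

That gives 32 + 17 * 2 = 66 noncharacters in total.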
Lars asked:
> BTW, what are the properties of U+FFFD? In English please, do not point me
> to the standard.
?!
It has the general category of "Symbol Other" [gc=So].
> Like, can it be a part of an identifier,
It does not have the ID_Start or the ID_Continue property, which
you could determin
Lars said:
> According to UTC, you need to keep processing
> the UNIX filenames as BINARY data. And, also according to UTC, any UTF-8
> function is allowed to reject invalid sequences. Basically, you are not
> supposed to use strcpy to process filenames.
This is a very misleading set of statement
Lars Kristan stated:
> I said, the choice is yours. My proposal does not prevent you from doing it
> your way. You don't need to change anything and it will still work the way
> it worked before. OK? I just want 128 codepoints so I can make my own
> choice.
You have them: U+EE80..U+EEFF, which a
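If an application does adopt a private-use range such as U+EE80..U+EEFF to carry undecodable bytes, the natural convention is byte b (0x80..0xFF) to U+EE00 + b and back. The sketch below assumes that hypothetical convention. It is only unambiguous if the data never contains genuine U+EE80..U+EEFF characters, which is exactly the private agreement such a scheme depends on.

```c
#include <stdint.h>

/* Hypothetical convention: undecodable byte 0x80..0xFF <-> U+EE80..U+EEFF.
   Returns -1 for bytes below 0x80, which are always valid ASCII in UTF-8. */
int32_t byte_to_pua(uint8_t b)
{
    return b >= 0x80 ? (int32_t)(0xEE00 + b) : -1;
}

/* Recover the original byte; returns 1 on success, 0 if cp is not in
   the reserved private-use range. */
int byte_from_pua(uint32_t cp, uint8_t *b)
{
    if (cp < 0xEE80 || cp > 0xEEFF)
        return 0;
    *b = (uint8_t)(cp - 0xEE00);
    return 1;
}
```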
Philippe,
> RSVP is a French acronym for "Répondez, s'il vous plait".
Yes, we know that.
But it is also a reanalyzed English verb which means
"reply to a message (or invitation)".
That it has been morphologically reanalyzed is demonstrated by the
fact that it takes regular English verb endings, a
Tim Greenwood asked:
> > ... a perfectly normal linguistic process of
> > attributive disambiguation of a term which had grown ambiguous
> > in usage.
>
> Is that like the 'Please RSVP' that I see all too often? Or should
> that not be excused?
*grins* Well, technically, that is not a case of at
> If any
> criticism was present, it referred to the redundant "US-" prefix in
> "US-ASCII", not to Unicode, and even that wasn't really criticism, just my
> lack of understanding /why/.
In addition to Doug's historical clarification, you need to
understand this as a perfectly normal linguistic
Peter Kirk noted:
> I was reviewing the Roadmap for the SMP
> (http://www.unicode.org/roadmaps/smp/), in comparison with the list of
> proposed new scripts, and found a few anomalies.
>
> "Hittite (Anatolian) Hieroglyphs/Luvian" is listed as a proposed new
> script, with a draft proposal, but
Lars responded:
> > ... Whatever the solutions
> > for representation of corrupt data bytes or uninterpreted data
> > bytes on conversion to Unicode may be, that is irrelevant to the
> > concerns on whether an application is using UTF-8 or UTF-16
> > or UTF-32.
> The important fact is that if you
Marcin asked:
> The general trouble is that numeric character references can only
> encode individual code points
By design.
> rather than graphemes (is this a correct
> term for a non-combining code point with a sequence of combining code
> points?).
No. The correct term is "combining characte
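Because a numeric character reference names exactly one code point, a combining character sequence such as e + U+0301 COMBINING ACUTE ACCENT is written as two references. The sketch below, with a hypothetical function name, emits hexadecimal references one per code point.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write each code point as a hexadecimal numeric character reference
   ("&#x...;"). Returns the number of chars written (excluding the NUL),
   or 0 if the buffer is too small. */
size_t to_ncrs(const uint32_t *cps, size_t n, char *buf, size_t size)
{
    size_t used = 0;
    if (size > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, size - used, "&#x%lX;",
                         (unsigned long)cps[i]);
        if (w < 0 || (size_t)w >= size - used)
            return 0;               /* truncated */
        used += (size_t)w;
    }
    return used;
}
```

For `{0x65, 0x301}` this produces `&#x65;&#x301;`: two references for one combining character sequence.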
John Cowan responded:
> > Storage of UNIX filenames on Windows databases, for example,
^^
O.k., I just quoted this back from the original email, but
it really is a complete misconception of the issue for
databases. "Windows databases" is a misn
Lars,
I'm going to step in here, because this argument seems to
be generating more heat than light.
> I never said it doesn't violate any existing rules. Stating that it does,
> doesn't help a bit. Rules can be changed.
> I ask you to step back and try to see the big picture.
First, I'm going
Philippe continued:
> As if Unicode had to be bound on
> architectural constraints such as the requirement of representing code units
> (which are architectural for a system) only as 16-bit or 32-bit units,
Yes, it does. By definition. In the standard.
> ignoring the fact that technologies do
Philippe stated, and I need to correct:
> UTF-24 already exists as an encoding form (it is identical to UTF-32), if
> you just consider that encoding forms just need to be able to represent a
> valid code range within a single code unit.
This is false.
Unicode encoding forms exist by virtue of
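As an illustration of what an encoding form defines, namely a fixed code unit size plus a mapping from scalar values to code unit sequences, here is a sketch of the UTF-16 mapping. Supplementary code points need two 16-bit code units (a surrogate pair). The function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one scalar value as UTF-16. Returns the number of code units
   written (1 or 2), or 0 if cp is not a Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* BMP: one code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 becomes D83D DE00.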
Peter,
> This was in fact my question: will the amendment be
> passed automatically if there is a majority in favour, or does it go
> back for further discussion until a consensus is reached? You have
> clarified that the latter is true. And I am glad to hear it.
The relevant applicable clause
John Cowan clarified the JTC1 process:
> The result of a
> "no" vote is that the process loops until all such votes are resolved.
All comments on a formal JTC1 ballot receive a *disposition*.
As far as possible, that disposition is done by committee consensus,
which usually means, in practice, th
Allen Haaheim provided some further detailed clarification:
> Note that Han characters are logographic, not ideographic. That is,
> they are graphemes that represent words (or at least morphemes),
> not ideas.
This correctly states the situation for the normal case for
Chinese characters used w
John Hudson responded to Jony Rosenne:
> The idea that the position of such text on a page -- as a marginal
> note -- somehow demotes
> it from being text, is particularly nonsensical.
I think you two (Jony and John) are talking at cross-purposes
on this particular point.
The *content* of marg
Michael Norton (a.k.a. Flarn) asked:
> What's an ideograph? Also, what's a radical?
> Are they the same thing?
No, they aren't.
In the Unicode context, the simplest answer is that
an "ideograph" or a "CJK ideograph" is simply to be
taken as a synonym for "a Chinese character".
A "radical" is on
Mark Davis said (in reference to a long set of comments by
Philippe Verdy on this thread):
> The statements below are incorrect
And Philippe asked:
> Which "statements"? My message is mostly a read as a question, not as an
> affirmation...
And I will attempt the fact-finding...
> CGJ is a com
Philippe Verdy responded to John Cowan:
> From: "John Cowan" <[EMAIL PROTECTED]>
> > the need to encode Dutch
> > ij as a single character, which is neither necessary nor practical.
> > (U+0132 and U+0133 are encoded for compatibility only.) In cases where
> > ij is a digraph in Dutch text, i+ZWN
Otto Stolz asked:
> In German, however, a ligature must not span a syllable break.
> How should I code plain text, w.r.t. hyphenation and ligatures?
> - "Huf" + ZWNJ + "lattich"
> - "Huf" + SYH + "lattich"
> - "Huf" + SYH + ZWNJ + "lattich"
> - "Huf" + ZWNJ + SYH + "lattich"
You should code it as
Tim Greenwood asked:
> > All of the spacing combining marks (general category Mc) except
> > musical symbols have a canonical combining class of 0. So, for example
> >
> > 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left
> > of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on th
Harshal Trivedi asked:
> How can i make sure that UTF-8 format string has terminated while
> encoding it, as compared to C program string which ends with '\0'
> (NULL) character?
You don't need to do anything special at all when using UTF-8
in C programs, as far as string termination goes. UTF-8
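The reason NUL termination keeps working is that in UTF-8 every byte of a multi-byte sequence is 0x80 or above, so a 0x00 byte can only be U+0000. Ordinary `strlen` therefore still finds the end of the string, though it counts bytes rather than characters. The sketch below (hypothetical name) counts code points in well-formed UTF-8 by skipping continuation bytes of the form 10xxxxxx.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated, well-formed UTF-8 string.
   Continuation bytes (10xxxxxx) are skipped; every other byte starts
   a new code point. */
size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "caf\xC3\xA9" (café), `strlen` gives 5 bytes and this function gives 4 code points.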
Peter Kirk suggested:
> I am suggesting that the best way to get the job done properly is to lay
> the conceptual foundation properly first, instead of trying to build a
> structure on a foundation which doesn't match...
Part of the problem that I think some people are having here,
including Pete
Elaine Keown asked:
> Supposedly this list has >600 people.
>
> Just of curiosity, how many of you are NOT font
> designers?
And since a number of people are declaring their
backgrounds, I'll chime in, too. ;-)
I am not a font designer, although I have designed fonts
(many years ago) for ling
Theo,
Further following up from what Mark Davis responded...
> Mark Davis wrote:
> > All comments are reviewed at the next UTC meeting. Due to the volume, we
> > don't reply to each and every one what the disposition was. If actions were
> > taken, they are recorded in the minutes of the meetings
Elaine,
[Feel free to forward this on to the Hebrew lists you
copied on your original inquiry, if you think it appropriate.]
> Peter Constable replied on the Unicode list:
> >Which items? There were three at the June meeting:
> >- atnah hafukh
> >- lower dot and nun hafukha
> >- qamats qatan
> Jon Hanna wrote:
>
> >>imported UTF-8 sequences like [U+0065][U+0303] get
> >>remapped
> >>internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
> >>
> >>Is this kind of behavior what one would expect?
> >>
> >>
> >
> >That's conformant, if it causes problems with any other process (in
> At 06:04 PM 9/30/2004, Michael Everson wrote:
> > see no reason given for us not to unify the handwritten symbol we have
> > seen with BREVE ABOVE.
and Asmus responded:
> Functionally, the symbol is not a breve. Visually, the sample does not look
> like a standard breve, and the font resou
Jonathan Coxhead asked:
> >>Then could/should we use the sequence <200C, 062D, 20DD, 200C>?
> >
> >
> > You *could* use that sequence, and if your rendering implementation
> > were sophisticated enough, it *might* render what you were
> > expecting.
>
> So here's my question ...
>
> If
Antoine asked:
> On Tuesday, September 21st, 2004 18:50 Kenneth Whistler va escriure:
> >
> > With this change in place, it seems to me that the case is
> > quite clear *for* separate encoding of any circled Arabic
> > letter used as a symbol. If the sequence <062D
Kent wrote:
> Kenneth Whistler wrote:
>
> > Second, there is the question of cursive joining for Arabic.
> > I don't know anything in the Unicode Standard that states that
> > a combining enclosing mark breaks cursive ligation. It stands
> > to reason that it
Incidentally, for those interested, the website of the National
Court Reporters Association has a brief history of
shorthand (skewed of course to the English language-based
developments):
http://www.ncraonline.org/about/history/shorthand.shtml
A summary of the development of the Stenograph machin
>> There is no specific allocation
> > for Gregg or Pitman or any other particular system, but
> > 11E00..11FFF is currently blocked out for shorthands, simply
> > as a placeholder to indicate that we know such systems
> > exist and that somebody might bring forth a proposal and
> > that if success
Michael Everson responded to Christopher Fynn's question:
> At 13:46 +0100 2004-09-19, Christopher Fynn wrote:
>
> >So, am I right in assuming that were someone put together a decent
> >proposal for one or more shorthand scripts, there is no particular
> >reason in principle why it would be rej
Asmus responded:
> >It's a simple combining character. Even if you can't do arbitrary circles
> >around characters, you can take one character sequence and map it to the
> >glyph in a font. Systems that can't do even that need to be fixed.
>
> In other words, you would like to treat this as a man
Philippe waxed lyrical about the advantages of platform-independent
development:
> Isn't Java hiding most of these platform details, by providing unified
> support for platform-specific look and feel? Aren't there now many PLAF and
> themes manager available with automatic default selection of t
Philippe asked:
> http://www.omniglot.com/writing/albanian.htm
> shows two historic scripts that have been used to write Albanian (Shqip):
> - the Elsaban script in the 18th century, which looks like Old Greek for the
> language Tosk variant. However there are lots of unique letter forms, and
>
> On 05/09/2004 18:27, John Cowan wrote:
>
> >The following links show L-shaped marks, apparently combining
> >characters, that indicate the change-of-pitch position in Japanese
> >words written in romaji. Are these novel characters, or can they
> >be identified with existing Unicode characters?
> >>One
> >>such situation is Holam Male which never takes an additional combining
> >>mark*. So why can't we represent it as ?
> >>
> >>
> >
> >Because the UTC has ruled out as interpretable sequences.
> >
> >
>
> Is there a better reason than "because we say so"? You don't have to
>
Peter Kirk continued:
> I did read it, but it didn't deal with the issue I was concerned about,
> of multiple combining marks. And I was concerned about that issue
> because that was the major concern expressed in the earlier discussion
> on variation selectors, and presented as the decisive re
Peter Kirk wrote:
> > At 11:02 AM 7/13/2004, Peter Kirk wrote:
> >
> >> I was surprised to see that WG2 has accepted a proposal made by the
> >> US National Body to use CGJ to distinguish between Umlaut and Tréma
> >> in German bibliographic data.
And Asmus responded:
> > You raise some intere
> Subject: Impotance of diacritics (was: Looking for transcription ...)
^
It's a good thing this discussion of the impotence of diacritics
from bushmanush didn't also mention \/|å.G4ä, and talked about
*tran*scription, instead of *pre*scription, or my spam filter
would certainl
> Subject: Re: Changing UCA primarly weights (bad idea)
Correcting the subject, just because it bugs me...
> You are certainly right that this is not a slam-dunk; there are reasons for
> and against it. A
Peter Kirk said:
> I made a serious point, not apparently made in the UTR draft, that
> diacritic folding may be useful for spam filtering and similar
> applications including finding misleading URIs.
This seems like a reasonable point to make and to add to the discussion
of folding in UTR #30
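A deliberately minimal sketch of diacritic folding for matching purposes such as the spam-filter case: it assumes the input has already been decomposed (NFD) and drops only the Combining Diacritical Marks block U+0300..U+036F. Real folding as discussed for UTR #30 needs full decomposition data and covers many more marks. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy folding: copy in[0..n) to out, dropping code points in
   U+0300..U+036F. Assumes the input is already in NFD. Returns the
   number of code points written. */
size_t fold_diacritics(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (in[i] < 0x0300 || in[i] > 0x036F)
            out[k++] = in[i];
    return k;
}
```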
> the versions in the main Greek and
> Coptic block (or has it been officially renamed just "Greek"?)
No, the block name won't be changed, in part because changing
block names is another destabilization in the standard that
really serves nobody well, but mostly because the existing
14 Coptic lett
> I have a (hopefully) short question about "polytonic" Greek support.
> Does anyone know what the idea was behind encoding Greek vowel+acute
> combinations (without apirates, etc.) twice: first in the Basic
> Greek section as vowel+tonos, for the second time in the Extended
> Greek section as vow
Elaine asked:
> Quotes below from the SMP .pdf---I can't put the three
> quotes below together intelligibly.
>
> Do the quotes mean that the Linear B syllabary and Old
> Italic and Ugaritic are already in permanent locations
> in the SMP, or do they mean something else?
You should start with th
> I like to use the decomposed version of Unicode characters Đ, đ, Ł and
> ł (U+0110, U+0111, U+0141 and U+0142).
> For example, d followed by a combining_diacritical_mark should generate
> đ (d with stroke).
>
> What combining_diacritical_mark should be used for this case ?
As Michael and Clark
> On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote:
>
> > Depite the oft-mentioned cutesy Hong Kong race horse names,
> > idiosyncratic
> > invented Han ideographs are a negligible component of the encoded CJK
> > repertoire. In my opinion there are thousands, possibly tens of
> > thousands, o
> Peter Constable wrote,
>
> > Don't forget canonical equivalence (I forgot about this as well): the
> > double-width diacritics have a combining class of 234 rather than 230.
> > This means that 0251 0361 0302 028A is canonically equivalent to 0251
> > 0302 0361 028A. Therefore, the first (for be
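The equivalence Peter Constable describes comes from canonical ordering: within a run of combining marks, characters are stably sorted by canonical combining class, and starters (class 0) never move. The sketch below hard-codes a hypothetical two-entry class table (U+0302 = 230, U+0361 = 234) just for this example. A real implementation reads Canonical_Combining_Class from the Unicode Character Database.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, tiny ccc table covering only the marks in the example. */
static int ccc(uint32_t cp)
{
    switch (cp) {
    case 0x0302: return 230;   /* COMBINING CIRCUMFLEX ACCENT */
    case 0x0361: return 234;   /* COMBINING DOUBLE INVERTED BREVE */
    default:     return 0;
    }
}

/* Canonical ordering: stable insertion sort of each non-starter by ccc.
   A mark moves left only past marks with a strictly higher class, and
   never past a starter (ccc 0). */
void canonical_order(uint32_t *s, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t c = s[i];
        int cc = ccc(c);
        if (cc == 0)
            continue;
        size_t j = i;
        while (j > 0 && ccc(s[j - 1]) > cc) {
            s[j] = s[j - 1];
            j--;
        }
        s[j] = c;
    }
}
```

Both 0251 0361 0302 028A and 0251 0302 0361 028A reorder to 0251 0302 0361 028A, which is why the two are canonically equivalent.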
Michael,
And now you are answering arguments with irrelevancies.
> >But the argument in this particular case hinges on a particular,
> >nonce set of characters.
>
> You use "nonce" very easily.
Nonce: Occurring, used, or made only once or for a special occasion.
You can, of course, quibble tha
> > Simply because some images appear in some
> > documents does not mean that they automatically should be
> > represented as encoded
> > characters.
>
> These aren't images. They're clearly letters; they occur in running texts and
> represent
> the sounds of a spoken language.
Well, I agree
Peter,
> There is no consensus that this Phoenician proposal is necessary. I
> and others have also put forward several mediating positions e.g.
> separate encoding with compatibility decompositions
>
> >>>Which was rejected by Ken for good technical reasons.
> >>>
> >>I don't r
António noted:
> Dunno about the others, but spanish play cards suit symbols are
> clearly "style" variations of U+2660, U+2663, U+2665 and U+2666.
>
> (BTW, I'm right asuming that U+2660, U+2663, U+2665 and U+2666 are the
> "actual" suit symbols, while U+2661, U+2662, U+2664 and U+2667 are
> jus
Ted Hopp responded:
> On Tuesday, May 25, 2004 5:23 AM, Michael Everson wrote:
> > >At what point is it more practical to say 'use a graphic'?
> >
> > When they are just pictures of things. Not when they are coherent
> > sets of things with structure, used by people for well over a century
> > to
Peter Constable responded to Peter Kirk:
> > From: Peter Kirk [mailto:[EMAIL PROTECTED]
> > Sent: Friday, May 28, 2004 1:40 PM
>
>
> > Well, I understood the semantic content of a text to be the meaning of
> > the words...
[Kirk continuing, to provide more context...
> > , not the indication o