OT: apologizing (was RE: Chemistry in chinesse (Only in chinesse?

2001-01-29 Thread Marco Cimarosti
Marco Cimarosti wrote: > MOST Chinese dictionaries that I have seen bear a table of > chemical elements at the end. > Perhaps you would have found out earlier going in a public > library. [...] And a lot of other unneeded "clarifications" about a "misunderstan

RE: Benefits of Unicode

2001-01-29 Thread Marco Cimarosti
Mark Davis wrote: > > [...] Invoice or ticketing applications can print native language names. > [...] Drop "and ticketing" No, please, don't drop that! I was about to show it to my boss, doctor Giuseppe Còmpani Mèneger ("You see, Joe? I told you it can also be used on POS tickets."). > > that

RE: Benefits of Unicode

2001-01-29 Thread Marco Cimarosti
Richard Cook wrote: > Has anybody played devil's advocate to this, with a list of > "Failings of > Unicode"? Are there any? :-) This question might in fact result in a > longer Benefits list Although I've always been a Unicode fan, Richard's invitation is too tempting. :-) I'll add these t

RE: Chemistry in chinesse (Only in chinesse?)

2001-01-26 Thread Marco Cimarosti
Erik Garrés wrote: > Now that thanks to Pierpaolo BERNARDI who found a book (...) > (dictionary) where shows what I was mentioning, MOST Chinese dictionaries that I have seen bear a table of chemical elements at the end. Perhaps you would have found out earlier going in a public library. > here

RE: Chemistry on chinesse. (CJK)

2001-01-24 Thread Marco Cimarosti
Michael Everson wrote: > There is no reason the Chinese or anyone else cannot write > this with LATIN CAPITAL LETTER O and SUBSCRIPT TWO. I think there is a misunderstanding, probably on my side. In his Spanish version, Erik claimed that the chemical elements were missing "en el contexto de lo

RE: Chemistry on chinesse. (CJK)

2001-01-24 Thread Marco Cimarosti
Erik Garrés wrote: > The elements of the periodical table (chemistry) are > missing, and they are specially needed on chinesse > because they don't have alphabet, so they need > them as a graphical representation. Some of these characters are quite common in modern life (e.g., "oxygen" is certai

RE: Greek questions, on- and off-topic

2001-01-23 Thread Marco Cimarosti
> My Greek textbook has acute, grave, and circumflex (called by > those names), > but I'm not sure what these correspond to in the Greek and > Greek Extended > blocks (there seem to be many more diacriticals than those). > Is there an on-line guide somewhere? There are in fact other diacriti

RE: anyone recognise this?

2001-01-22 Thread Marco Cimarosti
Peter Constable: > Does anybody recognise the script in the attached sample.gif? I already tried with handwritten Devanagari (without the top bar), but an expert on another list said that it is unlikely. I thought too it could be Georgian, but then I was unable to match any single letter. Severa

RE: Teletext mappings

2001-01-19 Thread Marco Cimarosti
Rob Hardy wrote: > I'm preparing some mappings of teletext character sets to Unicode. From : 0x60 0x2010 # HYPHEN (or is it a dash?) I think that 0x60 should be U+0640 (ARABIC TATWEEL): a character used to extend

RE: conjucts beginning with independent vowel?

2001-01-18 Thread Marco Cimarosti
Peter Constable wrote: > A couple of people asked to see some samples. I've posted a PDF at > ftp://ftp.sil.org/unicode/sylhoti/Syloti_VC_conjuncts.pdf. Now I see! Sorry for arriving after everybody else... I wonder then, how about renaming "Syloti Nagri Ng" as "Syloti Nagri anusvara"? And possi

RE: conjucts beginning with independent vowel?

2001-01-18 Thread Marco Cimarosti
Peter Constable wrote: >In the better known Indic scripts, are there ever cases of conjuncts formed >with independent vowels and a following consonant? >I know this may sound weird. The idea would be a VC syllable like "al". >Things that are more familiar are to have CC conjuncts, which would have

RE: An unexpected sight...

2001-01-17 Thread Marco Cimarosti
John Jenkins wrote: > (...) The CD is recorded by the Kölner Violen-Consort, with the words > in all caps on the album cover and the umlaut inside the O. > I cannot help but wonder how common this sort of thing is. In fraktur, I also have seen the umlaut depicted as a small "e" (lowercase), positi

RE: Transcriptions of "Unicode"

2001-01-15 Thread Marco Cimarosti
{Notice: way off-topic} Mark Davis wrote: > There was a period well after the Norman invasion where a > large number of words came into English directly from > Latin, which was still in widespread use among scholars. Right. And it also was the language of priests, on both sides of the Channel.

RE: Transcriptions of "Unicode"

2001-01-15 Thread Marco Cimarosti
Mark Davis wrote: >Much as I admire and appreciate the French language (second only to Italian), >the proximate derivation of "Unicode" was not from that language, and the >transcription should not match the French pronunciation. Instead, it has >solid Northern Californian roots (even though not e

RE: Transcriptions of "Unicode"

2001-01-12 Thread Marco Cimarosti
Peter Constable wrote: > I'd add the square brackets, an off-glide on the "o", and > aspiration (02b0) after the "k". Is that k aspirated? I do hear an aspiration when [p], [t] or [k] are at the *beginning* of "words" (mainly because teachers told me I was supposed to notice it), but I don't feel

Re: Transcriptions of "Unicode"

2001-01-12 Thread Marco Cimarosti
en off the list for a while. And, about points 2 and 3 above, beware that I am a second language English speaker and that I don't have much experience of American pronunciation. Ciao. Marco Cimarosti

Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Marco Cimarosti
Lukas Pietsch wrote: > a lot was said in this thread about intelligent rendering > mechanisms, [...] > I figure that people are mostly thinking of the technology > called "Open Type", is that right? Right, but quite partial. There are several major technologies for rendering "complex Unicode scr

[totally OT] Unicode terminology (was Re: string vs. char [was Re: Java and Unicode])

2000-11-20 Thread Marco Cimarosti
David Starner wrote: > Sent: 20 Nov 2000, Mon 16.18 > To: Unicode List > Subject: Re: string vs. char [was Re: Java and Unicode] > > On Mon, Nov 20, 2000 at 06:54:27AM -0800, Michael (michka) > Kaplan wrote: > > From: "Marco Cimarosti" <[EMAIL PROTECTED]>

Re: string vs. char [was Re: Java and Unicode]

2000-11-20 Thread Marco Cimarosti
Antoine Leca wrote: > Marco Cimarosti wrote: > > Actually, C does have different types for characters within > strings and for > > characters in isolation. > > That is not my point of view. > There is a special case for 'H', that holds int type rather >

RE: string vs. char [was Re: Java and Unicode]

2000-11-17 Thread Marco Cimarosti
Ooops! In my previous message, I wrote: > wchar_t * _wcschr_32(const wint_t * s, wchar_t c); > wchar_t * _wcsrchr_32(const wint_t * s, wchar_t c); What I actually wanted to write is: wchar_t * _wcschr_32(const wchar_t * s, wint_t c); wchar_t * _wcsrchr_32(const wchar_t * s, wint_t c); Sorry i

RE: string vs. char [was Re: Java and Unicode]

2000-11-17 Thread Marco Cimarosti
Addison P. Phillips wrote: > I ended up deciding that the Unicode API for this OS will only work in > strings. CTYPE replacement functions (such as isalpha) and > character based > replacement functions (such as strchr) will take and return > strings for > all of their arguments. > > Internally, m

RE: Java and Unicode

2000-11-15 Thread Marco . Cimarosti
Elliotte Rusty Harold wrote: > One thing I'm very curious about going forward: Right now character > values greater than 65535 are purely theoretical. However this will > change. It seems to me that handling these characters properly is > going to require redefining the char data type from two

Re: Devanagari question

2000-11-14 Thread Marco Cimarosti
Antoine Leca wrote: > Marco Cimarosti wrote: > > > > I think that the original idea behind having combining > marks in Unicode was > > that *any* combination of base + diacritic should be permitted, > > The fact that it is permitted (as I said, they "are not

RE: Devanagari question

2000-11-13 Thread Marco Cimarosti
Antoine Leca wrote: > My understanding is that there are a number of similar cases, > which are not > officially prohibited (AFAIK), but does not carry any sense. > For example, how about digits followed by accents (as > combining marks)? > Or the kana voicing/voiceless combining marks, when they

RE: Greek Prosgegrammeni

2000-11-08 Thread Marco Cimarosti
John Jenkins wrote: > On Tuesday, November 7, 2000, at 01:14 PM, James Kass wrote: > > > Does anyone know why the iota subscript has been used > > with the capital vowels on the charts? > > > > It was an error and has been fixed with the 3.0 charts. But, even if you do so, we are left with a "wr

RE: Q.'s about hanja

2000-11-07 Thread Marco Cimarosti
John Cowan wrote: > Marco Cimarosti wrote: > > > Do you mean that some hanja have a polysyllabic > pronunciation in Korean? > > Yes. Of the 9033 Unihan characters with Korean readings > given in the Unihan.txt > file, there are 689 with two-syllable mappings, 13 wit

RE: Q.'s about hanja

2000-11-07 Thread Marco Cimarosti
John Cowan wrote: > > 3) How often are hanja used today, however? (...) > > I believe they are still common in newspaper headlines, > because of the greater > degree of compression they permit. Do you mean that some hanja have a polysyllabic pronunciation in Korean? I thought that any single han

RE: Q.'s about hanja

2000-11-07 Thread Marco Cimarosti
SoHee Kim wrote: > > 1) Is it correct to say that hanja are only used for words > derived from > > Chinese, and never for genuinely Korean words? > > What do you mean by genuinely Korean words? It was just a poor expression. I meant "Korean words that were not derived from Chinese". (Of course

Q.'s about hanja

2000-11-07 Thread Marco Cimarosti
I have some questions about the usage of hanja (Chinese characters) in Korean. 1) Is it correct to say that hanja are only used for words derived from Chinese, and never for genuinely Korean words? 2) Is it true that hanja have been abolished in North Korea? When did this happen? 3) How often

RE: Is there an example of web site (or page) encoded in Unicode?

2000-11-07 Thread Marco Cimarosti
Paul Deuter wrote: > So can anyone point me to a web-site or page that is encoded > in Unicode (UTF-16 or UCS-2)? I have seen one single example of a web page in UTF-16 (but I can't remember the URL), and never saw one in UCS-2. It is much more likely to find Unicode web pages in the form of UTF-

RE: Indian Languges Fonts

2000-11-07 Thread Marco Cimarosti
Charlie Jolly wrote: > Can anybody supply me with Indian language fonts that work > with IE5.5 and > UTF-8 encoding. > > Primarily I am looking for Punjabi, Gujerati, Bengali and Urdu. The only Unicode font working with "Indic" scripts that I heard of is Microsoft's Mangal, which has to be used i

Stacking Thai marks (RE: FW: Greek questions)

2000-11-06 Thread Marco . Cimarosti
Timothy Partridge wrote: > (...) Thai combiners keep a fixed > distance from the base line, so although they stack they don't > (need to) move. This is in fact the behavior of many Thai systems (computer fonts, typewriters, etc.), but I think that it has to be seen as an approximation, rather than

RE: Unicode Character not Printing

2000-11-02 Thread Marco Cimarosti
Flask Eric wrote: > I have installed the Unicode versions of Arial and Times New > Roman on Windows > 98 running Office 97 on several PCs. Everything works fine > but on two separate > occasions I found out that when printing the Maltese > Characters on particular > printers, the Maltese Character

RE: Number separators

2000-10-30 Thread Marco . Cimarosti
Mike Ayers wrote: > I discovered this weekend that Chinese, despite grouping large > numbers by ten thousands [...], write their digits with comma > separators every 3 digits [...] This may be different in different operating systems, but I too was convinced that they grouped four digits a

RE: FW: WIDOWS POLICES ??

2000-10-27 Thread Marco . Cimarosti
Alain LaBonté (Alan TheGoodness) wrote: > Interesting, isn't it, in particular in the context of > character coding? Fascinating, I would say! But one thing beats me: why did he write "des polices windows"? That's not "logiciel": it should have been "des polices fenêtres"! And the subject itsel

RE: WIDOWS POLICES ??

2000-10-27 Thread Marco . Cimarosti
Sorry to disappoint you: it means "font". > -Original Message- > From: Magda Danish (Unicode) [mailto:[EMAIL PROTECTED]] > Sent: 27 Oct 2000, Fri 19.37 > To: Unicode List > Subject: FW: WIDOWS POLICES ?? > > > I received this email inquiry in French. I translated it to > the best of my

RE: Convincing executives of character code perils

2000-10-24 Thread Marco . Cimarosti
Well, my executives are mostly Italians or Dutchmen, so they are quite used to the perils of their own languages. Ouch! I have just bitten my tongue in the attempt of pronouncing a very dangerous Italian phoneme! I need medical assistance, fast! _ Ma?co > -Original Message- > From: J. P

RE: Character properties

2000-10-23 Thread Marco . Cimarosti
Marcin Kowalczyk wrote: > isDigit:Nd > isHexDigit: '0'..'9', 'A'..'F', 'a'..'f' > isDecDigit: '0'..'9' > isOctDigit: '0'..'7' The definition "Nd" is what I would have proposed for isDecDigit. In general, I would consider any script's digit for decimal and octal numbers. Not so for hex numbers
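A minimal sketch of the distinction discussed above, assuming the predicates are meant to behave as this message suggests: decimal and octal digits are accepted from any script (general category Nd), while hex digits stay ASCII-only. The function names are illustrative, not taken from the library under discussion.

```python
import unicodedata

def is_dec_digit(ch):
    """True for a decimal digit of any script (general category Nd)."""
    return len(ch) == 1 and unicodedata.category(ch) == "Nd"

def is_oct_digit(ch):
    """True for a decimal digit of any script whose value is 0..7."""
    return is_dec_digit(ch) and unicodedata.digit(ch) < 8

def is_hex_digit(ch):
    """True only for the ASCII hex digits 0-9, A-F, a-f."""
    return len(ch) == 1 and ch in "0123456789ABCDEFabcdef"

# ARABIC-INDIC DIGIT THREE (U+0663) is a decimal and octal digit here,
# but not a hex digit.
print(is_dec_digit("\u0663"), is_oct_digit("\u0663"), is_hex_digit("\u0663"))
```

Keeping hex digits ASCII-only is a design assumption of this sketch; other scripts have no conventional letters for the values ten to fifteen.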

RE: [Very OT] Japanese economy failing -- it's the Japanese langu

2000-10-20 Thread Marco . Cimarosti
Patrick Andries wrote, quoting from the Frankfurter Allgemeine Zeitung: > [...] drei völlig getrennte Schriftsysteme gewissermaßen in bunter Mischung [...] I am not sure which "three completely separate writing system" the author had in mind. There are several possible ways of counting "Japanese

Re: CJK combining components: MOVING TO OTHER ML

2000-10-20 Thread Marco . Cimarosti
rams", "holograms", etc.), and how (and whether) this analysis could be useful for encoding text on computers, building software fonts, and other computer-related fall downs. Then I (Marco Cimarosti) wrote: > Anyway. I think that everybody probably had quite enough of this > dayd

RE: Colours

2000-10-20 Thread Marco . Cimarosti
[EMAIL PROTECTED] wrote on [EMAIL PROTECTED]: > Are there languages you might need to encode where > colour is important? (such as, if a certain shape > in red is one letter, but in blue it is a different > letter) I think this is the case for the Nahuatl (Aztec) script, where color is a primary

RE: query

2000-10-19 Thread Marco . Cimarosti
Carl W. Brown wrote: > Double byte enabling DOS is no minor feat. It is not a > driver but a new > operating system. If you are tight on memory your > applications may not run > because the DBCS support adds overhead. About 5 years ago we > gave up on > DBCS DOS projects because they were to

RE: CJK combining components (was RE: "Giga ...)

2000-10-19 Thread Marco . Cimarosti
James E. Agenbroad wrote: > If I had to make a guess it would be that transforming the > glyphs of parts of characters so they will fit together in > a pleasing fashion would take about as much effort (or > more) than designing separate glyphs for each new character. Perhaps. I am a programmer,

RE: "Giga Character Set": Nothing but noise

2000-10-19 Thread Marco . Cimarosti
Jon Babcock wrote: > BTW, Marco, as near as I can recall, the above quotation is not from > me. Did it again! Shame on me! Sorry! _ Marco

RE: CJK combining components

2000-10-18 Thread Marco . Cimarosti
Doug Ewell wrote: > Marco Cimarosti <[EMAIL PROTECTED]> wrote: > > Carl W. Brown: > >> An article in the October 12, 2000 issue of Linux Weekly News > >> <http://lwn.net/bigpage.php3> tries to explain the benefit... > > Actually, that quote from Lin

RE: "Giga Character Set": Nothing but noise

2000-10-18 Thread Marco . Cimarosti
Jon Babcock wrote: > It seems to me that if not for that, how could anyone > make a Chinese font? Who is going to sit down and > draw a *myriad* or more characters? Since elements > recur, this reduces the amount of labour required > greatly. I too would have bet that all CJK foundries used some

RE: CJK combining components (was "Giga Character Set": Nothing b

2000-10-16 Thread Marco . Cimarosti
Carl W. Brown: > An article in the October 12, 2000 issue of Linux Weekly News > tries to explain the benefit: "Many > Asian characters are composites, made up of one or more simpler > characters. Unicode simply makes a big catalog of characters, without > recognizi

RE: [OT] problem with shift_jis

2000-10-12 Thread Marco . Cimarosti
Raghu Kolluru wrote: > My email delivery programs works with most of the charsets > but not with > shift_jis. > Here are the steps that I do, > 1) I get a text file from Japan which as the content in the > encoded charset. > 2) I paste this content in web based UI and store it in SQL server > 3)

RE: "Giga Character Set": Anything besides noise

2000-10-12 Thread Marco . Cimarosti
John Cowan wrote (in ASCII(tm), by the way): > In fact, of course, every extant Klingon text can be written > with Unicode, and indeed with ISO 646:1983. Well, it can -- provided that you properly *registered* your copy of ASCII(tm) (http://www.wholehog.fsnet.co.uk/robert/ascii/), and paid your

RE: When does toUpperCase(ch) == ch ?

2000-10-10 Thread Marco . Cimarosti
John O'Conner wrote: > I intend on testing this with a few perl scripts later using > the db, but > thought I'd pose the question to see if anyone has a quick answer: > > Is it true that if a character's general category is neither > Ll nor Lt, > then the uppercase character is simply the chara
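The question above can be checked empirically. The following is only a sketch in Python rather than the Perl scripts the poster mentions, and it relies on whatever Unicode version the running interpreter ships; note also that `str.upper()` applies full case mappings, which is an assumption that differs slightly from the simple one-to-one mappings in the character database.

```python
import sys
import unicodedata

def uppercase_exceptions():
    """List characters that are neither Ll nor Lt yet change under upper()."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in ("Ll", "Lt"):
            continue  # these are the categories expected to have uppercase forms
        if ch.upper() != ch:
            found.append(ch)
    return found
```

Calling `uppercase_exceptions()` returns a non-empty list: for example SMALL ROMAN NUMERAL ONE (U+2170, category Nl) and CIRCLED LATIN SMALL LETTER A (U+24D0, category So) both have uppercase mappings, so the category test alone is not sufficient.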

RE: UTF-8 and UTF-16

2000-10-06 Thread Marco . Cimarosti
I muttered this incomprehensible paragraph: > - UTF-16 has 16-bit units ("words") and uses 1 or 2 units per > character. Characters 00 to 00 use the corresponding > word; higher values use a pair of "surrogates", the first one > ("high") being in . It too exists in the same 3 variants a
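Since the numeric ranges in the quoted paragraph did not survive in this archive, here is a small sketch of the UTF-16 rule it describes. The ranges in the code come from the Unicode Standard's definition of UTF-16, not from the truncated message, and the function name is only illustrative.

```python
def utf16_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        # Values up to U+FFFF are stored directly in a single unit.
        return [code_point]
    # Higher values are split into 10 + 10 bits across a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # "high" surrogate, D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)   # "low" surrogate, DC00..DFFF
    return [high, low]

print([hex(u) for u in utf16_units(0x41)])
print([hex(u) for u in utf16_units(0x10400)])
```

The result can be cross-checked against Python's own codec with `chr(cp).encode("utf-16-be")`.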

RE: UTF-8 and UTF-16

2000-10-06 Thread Marco . Cimarosti
George Zeigler wrote: > someone send me a FAQ page that explains the difference > between UTF-8 and Unicode (UTF-16 I suppose). You should perhaps read it again ;-) > UTF-8 if I understand correctly only supports > European characters, where as UTF-16 supports all major > characters world

RE: Locale ID's again: simplified vs. traditional

2000-10-04 Thread Marco . Cimarosti
Ayers, Mike wrote: > Correct me if I'm wrong, but isn't such a designator > unnecessary? I'll dare to correct you, then. :-) The reason for "language" tagging is not --should not be-- to clarify the interpretation of characters. At the character level, the semantics of text should be as l

RE: New Name Registry Using Unicode

2000-10-04 Thread Marco . Cimarosti
Carl W. Brown wrote: > It would certainly seem that the optimal solution would be to > carry the locale. Not at all, and for a good reason: I need that, whenever and wherever I type in a certain string, I reach the same web site. Scenario: Imagine that I am a customer of Äöü, a (fictional) I

UTF-8 and UTF-16 (was help me !!!)

2000-10-04 Thread Marco . Cimarosti
Karambir Rohilla wrote: > Please help me anyone > waht is UTF8 & UTF16 ? I found these to be well written and helpful: - "Forms of Unicode" (http://www-4.ibm.com/software/developer/library/utfencodingforms/index.html ) by Mark Davis. - "Unicode Transformation Formats: UTF-8 & Co." (http://czybo

RE: Locale ID's again: simplified vs. traditional

2000-10-04 Thread Marco . Cimarosti
Jukka Korpela wrote: > Does Unicode encode traditional and simplified Chinese characters > separately, or is the difference considered as glyph variation only, > to be indicated (if desired) at higher protocol levels? They are encoded separately, at different code points. What you heard about l

RE: Locale ID's again: simplified vs. traditional

2000-10-04 Thread Marco . Cimarosti
I wrote this blunder: > *Spell checking* is one of these cases, that we are all quite > familiar with. If I have to write a text using traditional > hanzi in Unicode, I can tag it as "Chinese-simplified", so > that my spell-checker can assist me signaling simplified > characters that slipped i

RE: New Name Registry Using Unicode

2000-10-04 Thread Marco . Cimarosti
Doug Ewell wrote: > Marco Cimarosti <[EMAIL PROTECTED]> wrote: > [...] > > a, B, c, e, H, i, j, K, M, n, o, p, s, T, u, x, or y to be > [...] > > This is a potential can of worms, because "look the same" is not a > Boolean property for glyphs. What about

RE: New Name Registry Using Unicode

2000-10-02 Thread Marco . Cimarosti
[EMAIL PROTECTED] wrote: > Just to clarify, I have no connection with the XNS project > (other than as a > user), but posted the info about it as of possible interest > [...] I am certainly one of those who made the impression of addressing Tom himself, as if he was the author of the proposal.

RE: New Name Registry Using Unicode

2000-10-02 Thread Marco . Cimarosti
Hi, Carl. (You replied privately; was this intentional? If not, you can resend it to the list, and I will re-send this one). > >A better choice, IMHO, would be to normalize by *decomposition*. In this > >way, the problem above would be addressed by rule 3 below. > I think you have a very good p
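To illustrate what normalizing by decomposition buys for name comparison, here is a hedged sketch using canonical decomposition (NFD) from Python's standard library. The registry's actual rules are not visible in this excerpt, so `same_name` is a hypothetical helper, not the XNS algorithm.

```python
import unicodedata

def same_name(a, b):
    """Compare two names after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Precomposed U+00E9 versus "e" followed by COMBINING ACUTE ACCENT (U+0301):
print(same_name("caf\u00e9", "cafe\u0301"))
# A plain "e" is still a different name:
print(same_name("cafe", "caf\u00e9"))
```

Once both strings are decomposed, any later rule that strips or ignores combining marks can operate on the marks as separate code points.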

RE: New Name Registry Using Unicode

2000-09-29 Thread Marco . Cimarosti
Antoine Leca wrote: > [EMAIL PROTECTED] wrote: > > For purposes of name registration uniqueness, the only significant > > characters are numbers and letter as defined by the Java > isLetterOrDigit > > function returning TRUE. This function determines if a > character is a > > letter or digit acc

RE: New Name Registry Using Unicode

2000-09-29 Thread Marco . Cimarosti
[EMAIL PROTECTED] wrote: > In XNS 1.0, XNS personal, business, and general names all > follow the same normalization rules: These normalization rules only work for ASCII, so why bother using Unicode? After all, they can all keep on using ASCII (cmp. http://www.trigeminal.com/samples/provincial.

RE: halp me!!!!

2000-09-28 Thread Marco . Cimarosti
Karambir Rohilla wrote: > wath is maping of unicode font in indian language? Sorry, your question is too clumsy. I think that no one will be able to give you an answer. You should first make some points clear to yourself, then try and ask the question differently. The things that make your ques

RE: Character properties

2000-09-25 Thread Marco . Cimarosti
Marcin Kowalczyk wrote: > Thu, 21 Sep 2000 23:55:24 +0330 (IRT), Roozbeh Pournader > <[EMAIL PROTECTED]> pisze: > > I disagree with the isDigit case, simply because my main language, > > Persian, uses alternate digits when written. I agree with Roozbeh in disagreeing (sorry for the pun), even if

FWD: Unicode on a website

2000-09-22 Thread Marco . Cimarosti
Hi, Santosh. I am forwarding your questions to the Unicode List, as there are many people there that know much more than me about web programming and databases. What I can say, is that I don't see how Unicode could have any noticeable impact on the performance of a web site: after all it is just

RE: [very OT] "Slavic"

2000-09-21 Thread Marco . Cimarosti
Jörg Knappen wrote: > No, in german "welsch" always means a romance language (in most > cases french, but also italian and even romanian can fill in). Note > also "rotwelsch". > The "generic" term for slavonic languages is "wendisch" or "windisch" > derived form the formerly slavonic "Wenden", se

RE: [very OT] Welsch (was: [very OT] "Slavic")

2000-09-21 Thread Marco . Cimarosti
Otto Stolz wrote: > Buon giorno Marco, Guten Tag, Otto. > am 2000-09-21 um 8:34 h UCT hat [EMAIL PROTECTED] geschrieben: > > I read that the German dialectal word "Welsch" means "Italian" > > (a *Romance* language) to Austrians and German-speaking Italians; > Actually, it is standard German, and

[very OT] "Slavic"

2000-09-21 Thread Marco . Cimarosti
Peter Constable wrote: > On 09/16/2000 12:56:31 PM Doug Ewell wrote: > >MKJ is the Ethnologue code for both 'Macedonian' and 'Slavic'. > >Absolutely *everyone* knows there is no one 'Slavic' > language; the name > >refers to an entire language family. This is much more > imprecise than > >any o

RE: [idn] nameprep forbidden characters

2000-09-20 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: > It is not that simple... what if someone else registers the > domain that uses > the common orthographic variants? Well, I assume that it would not be possible because, by those hypothetic collation rules, the two domains would be considered the same -- like tryi

uax uts dutr?

2000-09-19 Thread Marco . Cimarosti
Out of curiosity, when did the acronym "UTR" ("Unicode Technical Report") mutate to those "UAX", "UTS", "DUTR" that I see in ? And, BTW, how is it that a "Superseded UTR" is not, say, a "SUTR"? _ Marco

RE: [idn] nameprep forbidden characters

2000-09-19 Thread Marco . Cimarosti
Edwin F. Hart asked: > Is there a need for a "fuzzy" comparison where names with and without > points in Hebrew? Is there a similar need for other scripts such as > Arabic? Mark Davis replied > UCA (#10) already handles that. You will get a "fuzzy" compare if you > mask off less important weight

RE: Ligatured characters

2000-09-15 Thread Marco . Cimarosti
Roozbeh Pournader wrote: > This sequence, ZWJ ZWNJ ZWJ, really worries me. In the Arabic > script, my > interest, this is always the case. The ZWNJ is not enough in any case, > since it disconnects the letters. > > And this also means some change in many simple rendering > programs that use > o

RE: Printing issues

2000-09-15 Thread Marco . Cimarosti
Dieter Hoffmann wrote: > Are there known issues between the way AMD K6/2 handles > Unicode when sent to printer by Office97? > > In the Windows98 SE environment whence originates this > question, Wordpad98 document containing > Greek and other special characters prints correctly, but when > ha

RE: Tagging orthographic systems

2000-09-14 Thread Marco . Cimarosti
Michael Everson wrote: > Tire Center (US) > Tire Centre (CA) > Tyre Centre (GB) > civilization (US) > civilization (GB) Oxford recommendation > civilisation (GB) Lots of folks (Ouch! The e-mail spellchecker had a lot to complain about the above quotation :-) Out of curiosity: is no "en-IE" tag n

Re: Tamil glyphs

2000-09-13 Thread Marco Cimarosti
Antoine Leca wrote: > > 1) should render the "half[...] > Yes, and irrelevant on this matter (but I shall return on it > later). I admit. It was the first chain of lng example. > Paragraph 6 page 214, titled "explicit virama", says: "[...] > placing the character U+200C zero width non join

FWD: Unicode & Indian languages (was Re: Tamil glyphs)

2000-09-13 Thread Marco Cimarosti
e best _ Marco --Original Message-- From: "mlinguist" <[EMAIL PROTECTED]> To: "Marco Cimarosti" <[EMAIL PROTECTED]> Sent: September 12, 2000 1:55:59 PM GMT Subject: Re: Tamil glyphs Dear Mr.Marco, Sorry for sending an unsolicited mail to you. I am interested in

RE: surrogate terminology (was Re: Surrogate support in *ML?

2000-09-13 Thread Marco . Cimarosti
Peter Constable wrote: > - code values: integers within the space of some encoding > form; d800 - dfff > *are* code values, but not codepoints > - surrogate: I'm inclined to say that this should refer > *only* to a UTF-16 > code value in the range d800 - dfff; equal to "surrogate code value" > -

Re: Tamil glyphs

2000-09-12 Thread Marco Cimarosti
Please ignore my previous message (subj "[EMAIL PROTECTED]", to Antoine, cc [EMAIL PROTECTED]). Sorry about that. Antoine Leca wrote: > [EMAIL PROTECTED] wrote: > [...] > > In ordinary cases, a ZW[N]J inside a consonant cluster does > not prevent > > matra reordering. E.g., in Devanagari: > > >

RE: Tamil glyphs

2000-09-11 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: > From: "Rick McGowan" <[EMAIL PROTECTED]> > [...] > > I suppose if you just want to display the non-ligature type > thing in a > situation where the font wants to give you the ligature type > thing, you > might be able to use a ZWNJ or ZWNBSP between the chars.

RE: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion

2000-09-08 Thread Marco . Cimarosti
Sure: uniconv.exe by Basis Technology. It is distributed for free as a demo of the Rosette library; download from . The version I have (quite old) does not support UTF-16, bu

RE: Tamil glyphs

2000-09-07 Thread Marco . Cimarosti
Antoine Leca wrote: > Michael (michka) Kaplan wrote: > [...] > > The Monotype font and Latha in Windows 2000 are the way > that my client got > > both display types. > > I believe this is a rather special need that your client > have: as I understand, > he wants, at the same time, some renderi

RE: [unicode] More ways to encode U+FEFF (was: Re: Designing a

2000-09-06 Thread Marco . Cimarosti
Markus Scherer wrote: > of this list, only UTF-EBCDIC is a viable encoding form. > the others are either deprecated, never made it beyond draft, > or are unofficial discussion pieces that never made it > anywhere (i proposed one of them :-). Please notice that at least one of these has never ev

RE: Armenian numbers

2000-09-04 Thread Marco . Cimarosti
Elliotte Rusty Harold wrote: > Is anyone here familiar with Armenian? The CSS Level 2 specification > from the W3C makes reference to "Traditional Armenian numbering" but > Unicode doesn't seem to include any Armenian numbers, at least as > such. Is this another language like Hebrew where the

RE: Same language, two locales

2000-09-04 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: > The one irrevocable thing that LCIDs give you is a collation > choice (the regional options do not allow you to specify a separate default > collation choice). Another important setting that is hard-wired with Windows locale is language. This affects some standard

RE: Same language, two locales (RE: Locale string for Norwegian -

2000-09-01 Thread Marco . Cimarosti
Antoine Leca joked: > Neither you nor I would accept that our national language are tagged, > respectively, la-ital and la-fran... ;-) > Similarly, I believe Norwegians and Danes will not accept to > have their > present 2-letter codes replaced with cascaded ones in the form > "Norse"-n? or "Nors

RE: Same language, two locales (RE: Locale string for Norwegian -

2000-08-31 Thread Marco . Cimarosti
Addison P. Phillips wrote: > Differences in writing systems are much more problematic than the > Norwegian example. The Simplified/Traditional Chinese thing > leaps to mind, of course, [...] Right. I just notice that, in Unicode, this is not a display difference but an encoding one: correspondin

Same language, two locales (RE: Locale string for Norwegian - Bok

2000-08-31 Thread Marco . Cimarosti
Addison P. Phillips wrote: > This is a weakness of the locale model used on the Web and most UNIX > systems: the hierarchy is based on the ISO 639 language codes > and the ISO 3166 country codes. It doesn't cover such minutiae as > "inside-a-country" variation easily nor does it deal well with su

RE: Zero-width ligator

2000-08-10 Thread Marco . Cimarosti
Roozbeh Pournader wrote: > That seems problematic to me, when used for Arabic. How should one use > ZWNJ between two Arabic letters to stop the ligature? They'll get > disconnected! Good point. ZWJ+ZWNJ+ZWJ comes to mind, but it is really not the maximum of elegance... _ Marco

RE: Braille rendering of Unicode [OT 50%]

2000-08-10 Thread Marco . Cimarosti
Steven R. Loomis wrote: > [...] Presumably the unicode codepoints in braille > would make a great format for these translations on their way to a > printer. One would hope they would get such use and not simply for > braille-looking characters on paper or screen. You are right, I didn't catch it

RE: Swiss numerical format [OT]

2000-08-10 Thread Marco . Cimarosti
Jörg Knappen wrote: > Are there good (authorative) references on the so called > swiss numerical format with its peculiar thousand separator? Why not compare the locale settings of the main operating systems? I think that at least WinNT, Apple, Linux, and other Unixes are widely represented on this

RE: Which languages are supported in basic latin

2000-08-10 Thread Marco . Cimarosti
Halldor G. Gestsson: > Can I find a list where all languages supported in the basic latin > (0x-0x00FF)? > [...] > Wich languages uses the latin extensions A,B and C? Page contains the information to build your lists. _ Marco

RE: Braille rendering of Unicode [OT 50%]

2000-08-09 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: > Is not > http://www.hclrss.demon.co.uk/unicode/braille_patterns.html > or alternately > http://charts.unicode.org/Web/U2800.html > already covering this? No. These are at most the building blocks for braille. A better parallel would be to consider these "presentat

Braille rendering of Unicode [OT 50%]

2000-08-09 Thread Marco . Cimarosti
In the next few years, most or all electronic texts (including web pages and e-mail) will be in Unicode. I was wondering: how about blind people? What will a braille display do when presented with multilingual text? Does anyone know of projects for defining a general "braille rendering" of arbitr

RE: Arabic shaping behavior questions

2000-08-09 Thread Marco . Cimarosti
Bob Hallissy wrote: > 1) Is the Arabic Joining Class [...] normative or informative? Like it or not, it is normative. See , that reads: ... ArabicShaping.txt (Section 8.2) Basic Arabic and Syriac character shaping

RE: Unicode String literals on various

2000-08-08 Thread Marco . Cimarosti
Hi, Antoine. > I can continue to dissert on this subject (all of this should > finally be > cooked in a FAQ anyway), but I do not want to flood the list > with a marginally interesting subject. Merci beaucoup. It was very informative! Ciao. Marco P.S. You should not be so shy:

RE: Unicode String literals on various

2000-08-08 Thread Marco . Cimarosti
Antoine Leca wrote: > char C_thai[] = > "\u0E40\u0E02\u0E17\u0E32\u0E49\u0E1B\u0E07\u0E1C\u0E33"; Would the Unicode values be converted to the local SBCS/MBCS character set? If yes: Is the definition of this locale info part of the C99 standard itself, or is it operating system's locale? An

RE: is there any way to change already defined character codes?

2000-08-08 Thread Marco . Cimarosti
Sandro Karumidze wrote: > The issue is that in Unicode there is a sequence of Georgian > characters different > from what this people think should be. > [...] In beginning of this century 5 characters were dropped > [...] > In Unicode this 5 characters follow 33. There is a different > point of

RE: Encodings for SQL Databases

2000-08-07 Thread Marco . Cimarosti
((( Sorry to those who see a mangled subject. It should read "RE: Encodings for SQL Databases" ))) Jon Peck wrote: > Most of the major databases now support Unicode at some > level, but what is > the best way to encode SQL statements for various database > access apis? [...] According to the
