Re: Korean linebreking and UTR14(was Re: extracting words)

2001-02-13 Thread Mark Davis
Unicode List" <[EMAIL PROTECTED]> Sent: Monday, February 12, 2001 20:30 Subject: Re: Korean linebreking and UTR14(was Re: extracting words) > > > On Mon, 12 Feb 2001, Mark Davis wrote: > > Thank you for your answer. > > > Asmus Freytag is the one to talk to; he

Re: Korean linebreking and UTR14(was Re: extracting words)

2001-02-12 Thread Mark Davis
tracting words) > > > > On Sun, 11 Feb 2001, Mark Davis wrote: > > MD> Please read TUS Chapter 5 and the Linebreak TR before proceeding, as I > MD> recommended in my last message. The Unicode standard is online, as is the > MD> TR. Both can be found by going t

Re: Unicode collation algorithm - interpretation]

2001-02-11 Thread Mark Davis
I agree with Tex that the algorithm is small, if implemented in the straightforward way. I also agree with his #1, #2, and #3. I will add two things: 1. Where performance is important, and where people start adding options (e.g. uppercase < lowercase vs. the reverse), the implementation of collati

Re: extracting words

2001-02-11 Thread Mark Davis
Please read TUS Chapter 5 and the Linebreak TR before proceeding, as I recommended in my last message. The Unicode standard is online, as is the TR. Both can be found by going to www.unicode.org, and selecting the right topic. The TR in particular discusses the recommended approach to line break i

Re: extracting words

2001-02-11 Thread Mark Davis
Word break is *very* different than linebreak; see Chapter 5 of TUS, and the Linebreak TR. For linebreak the only tricky language is Thai, since it requires a dictionary lookup (much like hyphenation in English). Java (and ICU) supply linebreak mechanisms as a part of the standard API. They also s

Navigator 6 question

2001-02-11 Thread Mark Davis
I have a few JavaScript pages for doing code charts, UTF conversion, and displaying Unicode glyphs. These work on IE, and on NN 4.7 (although the layout is not great), but someone complained that on NN 6 they don't work at all. Anyone have an idea what is happening? There seems to be a problem wri

Re: Unicode collation algorithm - Khmer/Cambodian

2001-02-10 Thread Mark Davis
I have not been following this discussion up until now. Typically the issue with syllables is like that with word-sorting. With word sorting, no matter what is in the second word, any difference in the first word swamps it. Example: ab xyz ghi abc def ghi In many cases, UCA does handle syllabic
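The word-sorting example in this message can be sketched as follows. This is only an illustration using plain code point comparison, not the UCA; the helper names `word_key` and `run_key` are hypothetical.

```python
def word_key(s):
    # Compare word by word: a difference in an earlier word decides the
    # order, whatever the later words contain.
    return s.split()

def run_key(s):
    # Compare as one run of letters, with the spaces ignored.
    return s.replace(" ", "")

phrases = ["abc def ghi", "ab xyz ghi"]
by_word = sorted(phrases, key=word_key)  # "ab" < "abc" settles it
by_run = sorted(phrases, key=run_key)    # "abx..." vs "abc...": 'c' < 'x'
print(by_word)
print(by_run)
```

With word sorting the first word decides, so "ab xyz ghi" comes first; when the spaces are ignored, the third letter decides instead and the order flips.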

Re: The normalization form of the result of a dyadic operation.

2001-02-09 Thread Mark Davis
The whole principle of tagging individual strings with NF* is a bit odd to me; not sure I like it. The K forms in particular are really a folding operation, much like casing. I would not expect to find a model where someone tagged every string in a database with its Case, and then had some e
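The comparison between the K forms and case folding can be made concrete with a small sketch. It assumes Python's standard `unicodedata` module; the helper names are hypothetical and the sample strings are chosen only for illustration.

```python
import unicodedata

def compat_fold(s):
    # NFKC replaces compatibility characters with their plain equivalents,
    # discarding a distinction in much the same way case folding does.
    return unicodedata.normalize("NFKC", s)

def case_fold(s):
    # Case folding discards the upper/lower distinction.
    return s.casefold()

ligature = "\ufb01le"   # begins with U+FB01 LATIN SMALL LIGATURE FI
circled = "\u2460"      # U+2460 CIRCLED DIGIT ONE
print(compat_fold(ligature), compat_fold(circled), case_fold("File"))
```

NFC, by contrast, leaves the ligature alone, which is why the K forms read more like a folding than like a property one would tag on every stored string.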

Proposed Update: UAX #19: UTF-32

2001-02-07 Thread Mark Davis
a Unicode Standard Annex. However, it has not undergone final editorial review: it is not a stable document and may not be used as reference material nor cited as a normative reference from another document. Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED

Re: Surrogate space in Unicode

2001-02-06 Thread Mark Davis
It is the set of code points that can be addressed using surrogate code points. For more information, see the glossary at www.unicode.org. Mark - Original Message - From: "nikita k" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Tuesday, February 06, 2001 01:51 Subject:

Re: [OT] Unicode-compatible SQL?

2001-02-05 Thread Mark Davis
The topic came up in a UTC meeting some time ago, a "UTF-8S". The motivation was for performance (having a form that reproduces the binary order of UTF-16). We have yet to see a formal proposal for this, though. Mark - Original Message - From: "J M Sykes" <[EMAIL PROTECTED]> To: "Unicode
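The ordering difference behind that motivation can be shown with a short sketch. It does not implement "UTF-8S" (no formal definition is given here); it only compares the binary order of standard UTF-8 and UTF-16 for one BMP character and one supplementary character. The function names are hypothetical.

```python
def utf8_order(chars):
    # Binary order of the UTF-8 bytes (the same as code point order).
    return sorted(chars, key=lambda c: c.encode("utf-8"))

def utf16_order(chars):
    # Binary order of the big-endian UTF-16 code units.
    return sorted(chars, key=lambda c: c.encode("utf-16-be"))

chars = ["\uff5e", "\U00010000"]
print([hex(ord(c)) for c in utf8_order(chars)])
print([hex(ord(c)) for c in utf16_order(chars)])
```

In UTF-16 the supplementary character starts with the code unit D800, which sorts below FF5E, so the two orders disagree.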

Re: Property error for U+2118?

2001-02-01 Thread Mark Davis
u, Feb 01, 2001 at 10:14:04AM -0800, Mark Davis wrote: > > If you had made almost any reasonable attempt whatsoever you would have > > found this. To find out about a character you first look in the charts and > > block descriptions. In this particular case, there is an annotati

Re: Time Intervals

2001-02-01 Thread Mark Davis
ng a strftime to ICU date format conversion routine and noticed that ICU has no week based year support. Fortunately I don't think my client needs it. Carl -Original Message- From: Mark Davis [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 25, 2001 9:18 PM To: Carl W. Brown; Unicode L

Re: Property error for U+2118?

2001-02-01 Thread Mark Davis
John, > It's interesting how we find ways to get around rules that bother us This is a misrepresentation. The symbol was always intended to be the Weierstrass elliptic function. It was misnamed, and is thus annotated with the correct information. Nobody is winking. > ... If I had read the U

Re: Unicode 3.1: UTF-8

2001-02-01 Thread Mark Davis
This is not an omission. This issue was debated at great length in the Unicode technical committee, and the precise wording was agreed to by the committee. Mark - Original Message - From: "John Cowan" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Wednesday, January 31,

Re: Benefits of Unicode

2001-01-29 Thread Mark Davis
Title: Unicode Benefits >Allows for multilingual documents using any or all the languages you desire. Invoice or ticketing applications can print native language names. *"multilingual documents" are rare -- as most people understand the term 'documents'. What more people care about is that

Re: Time Intervals

2001-01-25 Thread Mark Davis
RE: Time Intervals Mark, Date calculations are much easier if you start on a March 1 date such as March 1 1900. This is because the months are 31,30,31,30,31 31,30,31,30,31 31,xx Putting February last makes leap year calculations easier. Carl -Original Message- From: Mark Davis [mailto
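A sketch of the March-based trick Carl describes, assuming the proleptic Gregorian calendar and counting days from 1970-01-01. This is one well-known formulation, not necessarily the one used in any particular library; the function names are hypothetical.

```python
def days_before_month(mp):
    # mp is the month index counted from March (March = 0 ... February = 11).
    # The formula reproduces the cumulative lengths 31,30,31,30,31,31,30,...
    return (153 * mp + 2) // 5

def days_from_civil(y, m, d):
    # Days since 1970-01-01 for a proleptic Gregorian date.
    if m <= 2:
        y -= 1                      # January and February belong to the previous March-based year
    era = y // 400                  # 400-year cycle
    yoe = y - era * 400             # year within the cycle, 0..399
    mp = (m + 9) % 12               # March = 0
    doy = days_before_month(mp) + d - 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468

print(days_from_civil(1970, 1, 1))
print(days_from_civil(2000, 3, 1))
```

Because February is last, the leap day falls at the very end of the year and never shifts the offset of any other month.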

Re: Unicode 3.1: IDS and ZW(N)J

2001-01-24 Thread Mark Davis
It doesn't add any value to insert joiners. Just add the IDS itself to the font table. Mark - Original Message - From: "John Cowan" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Wednesday, January 24, 2001 11:21 Subject: Re: Unicode 3.1: IDS and ZW(N)J > John Jenkins

Re: Time Intervals

2001-01-22 Thread Mark Davis
This appears to have bounced the first time I sent it. - Original Message - From: "Mark Davis" <[EMAIL PROTECTED]> To: "Unicore" <[EMAIL PROTECTED]>; "Unicode" <[EMAIL PROTECTED]> Sent: Monday, January 22, 2001 08:04 Subject: Time Interva

Re: PDUTR #27: Unicode 3.1

2001-01-22 Thread Mark Davis
BTW, we have settled on a term for characters with code points above FFFF. See http://www.unicode.org/glossary/#supplementary_character http://www.unicode.org/glossary/#supplementary_code_point Mark - Original Message - From: "David Starner" <[EMAIL PROTECTED]> To: "Unicode List" <[EM

Re: A real bug in bidi

2001-01-17 Thread Mark Davis
Yes, I have already proposed an agenda item for the next UTC, to get this fix into 3.1. Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED] http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014 Roozbeh P

Re: UNICODE application on IBM Mainframe

2001-01-17 Thread Mark Davis
Unicode is always serialized in a UTF: UTF-8, UTF-16*, or UTF-32*. The definition of each of these is invariant across systems: in UTF-8 an 'a' is always stored as 0x61. There is a special UTF for use on EBCDIC systems. Check out the technical reports and FAQs on www.unicode.org. Mark - Orig
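The invariance claim is easy to check with a small sketch using Python's built-in codecs (the EBCDIC-oriented UTF is not covered, since the standard library has no codec for it). The helper name `serialize` is hypothetical.

```python
def serialize(ch):
    # Hex dump of one character in each encoding form and byte order.
    names = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
    return {name: ch.encode(name).hex() for name in names}

forms = serialize("a")
for name, hexbytes in forms.items():
    print(name, hexbytes)
```

The UTF-8 form of 'a' is the single byte 61 on any platform; only the 16- and 32-bit forms have byte-order variants.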

Re: A real bug in bidi

2001-01-16 Thread Mark Davis
optimization strategy. (we here don't use that strategy, by the way). We think that the implementation strategy could be changed to still work, but for now we would recommend removing the characters. Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [

Re: Transcriptions of "Unicode"

2001-01-16 Thread Mark Davis
; To: "Unicode List" <[EMAIL PROTECTED]> Sent: Monday, January 15, 2001 07:39 Subject: RE: Transcriptions of "Unicode" > {Notice: way off-topic} > > Mark Davis wrote: > > There was a period well after the Norman invasion where a > > large number of w

Re: Transcriptions of "Unicode"

2001-01-15 Thread Mark Davis
- Original Message - From: "Marco Cimarosti" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Monday, January 15, 2001 00:15 Subject: RE: Transcriptions of "Unicode" > Mark Davis wrote: > >Much as I admire and appre

Re: Transcriptions of "Unicode"

2001-01-15 Thread Mark Davis
" Ar 07:28 -0800 2001-01-12, scríobh Mark Davis: >According to the references I have, the prefix "uni" is directly from >Latin while the word "code" is through French. The Indo-European would >have been *oi-no-kau-do ("give one strike"): *kau apparently

Re: Transcriptions of "Unicode": Still Missing scripts

2001-01-12 Thread Mark Davis
": Still Missing scripts > On Thu, 11 Jan 2001, Mark Davis wrote: > > > By the way, I am still missing the following. If anyone can supply them, I'd > > appreciate it. > > > > [BOPOMOFO] > [snip] > >[MONGOLIAN] > [snip] > > See http://www.macchia

Re: Transcriptions of "Unicode"

2001-01-12 Thread Mark Davis
Thanks for your detailed note; I'll have to think it over. ... > But there's another inconsistency in the transcription: the vowels in the > first ("u-") and third ("-code") syllable are both phonemically long. > Either you put the length mark on both (recommended for *phonetic* > transcription),

Re: Transcriptions of "Unicode"

2001-01-12 Thread Mark Davis
d.   Mark   - Original Message - From: "Marco Cimarosti" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Friday, January 12, 2001 03:11 Subject: Re: Transcriptions of "Unicode" > Hallo everybody!> > I don't full

Transcriptions of "Unicode": Still Missing scripts

2001-01-11 Thread Mark Davis
By the way, I am still missing the following. If anyone can supply them, I'd appreciate it. [BOPOMOFO] [KHMER] [MONGOLIAN] [MYANMAR] [SINHALA] [SYRIAC] [THAANA] [THAI] [TIBETAN] [YI] See http://www.macchiato.com/unicode/Unicode_transcriptions.html for details.

Re: Reverse Bidi Algorithm

2001-01-08 Thread Mark Davis
ICU offers a reverse BIDI algorithm. (http://oss.software.ibm.com/icu/) Mark - Original Message - From: "Roozbeh Pournader" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Cc: "Behdad Esfahbod" <[EMAIL PROTECTED]> Sent: Monday, January 08, 2001 20:12 Subject: Reverse Bidi Algor

Re: GBK, HZ and EUC-TW

2001-01-08 Thread Mark Davis
In specific cases you may use one character conversion mapping instead of two, but you should be very careful about that. See http://www.unicode.org/unicode/reports/tr22/, especially "1.2.1 Best-Fit Mappings" Mark - Original Message - From: "Lars Marius Garshol" <[EMAIL PROTECTED]> To: "

Re: (SC22WG20.3292) 14651 draft table updated

2001-01-02 Thread Mark Davis
Those have been added, and their weights are now > >reasonable. (Look under the respective Arabic letters.) > > > > I have a question outstanding among Inuktitut experts regarding the > > ordering of some elements of UCAS for Nunavut and Nunavik. More > > on that later.

New Unicode Website items

2000-12-31 Thread Mark Davis
We'd like to call people's attention to a few recent items on the Unicode site. UTF-8 Corrigendum - The Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP characters, and clarified some of the conform

Re: sqare root in java applets

2000-12-29 Thread Mark Davis
Magda, for questions like this, it would be helpful if you ask people to read the following. If they still have questions afterwards, you could forward them on to this list. http://www.unicode.org/help/display_problems.html http://www.unicode.org/unicode/faq/ (relevant "FAQ Pages") Mark - O

Re: Bug in Bidi

2000-12-26 Thread Mark Davis
he *opposite* of the embedding; the embedding marks that the embedded text is to be given a *different* direction than the surrounding text. Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED] http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10

Re: Bug in Bidi

2000-12-19 Thread Mark Davis
I am swamped right now -- I will have more time after the 25th to comment. Mark - Original Message - From: "Roozbeh Pournader" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Cc: "Unicode List" <[EMAIL PROTECTED]> Sent: Tuesday, December 19, 2000 02:32 Subject: Bug in Bidi

Beta Unicode Character Database 3.1

2000-12-15 Thread Mark Davis
e/timesens/calendar.html) Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED] http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014

Re: Transcriptions of Unicode

2000-12-14 Thread Mark Davis
That matches what I have on http://www.macchiato.com/unicode/Unicode_transcriptions.html, right? (circle?) Mark - Original Message - From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]> To: "Mark Davis" <[EMAIL PROTECTED]>; "Unicode List" <[

Re: Transcriptions of Unicode

2000-12-14 Thread Mark Davis
D]> Sent: Tuesday, December 12, 2000 09:01 Subject: Re: Transcriptions of Unicode Ar 07:11 -0800 2000-12-12, scríobh Mark Davis: >ARMENIAN BULGARIAN >CHEROKEE >ETHIOPIC GREEK >GUJARATI >GURMUKHI INUKTITUT >OGHAM >RUNIC RUSSIAN >SINHALA >UCAS See http://www.egt

Transcriptions of Unicode

2000-12-12 Thread Mark Davis
Some people were kind enough to send me extra transcriptions for http://www.macchiato.com/unicode/Unicode_transcriptions.html I am still missing confirmation on the Russian and Greek, and (at least one language in) the following scripts. Any help from native speakers would be appreciated. ARMEN

Re: displaying Unicode text (was Re: Transcriptions of "Unicode")

2000-12-07 Thread Mark Davis
07, 2000 00:30 Subject: Re: displaying Unicode text (was Re: Transcriptions of "Unicode") > Mark Davis wrote: > > > > Let's take an example. > > > > - The page is UTF-8. > > - It contains a mixture of German, dingbats and Hindi text. > > - My l

displaying Unicode text (was Re: Transcriptions of "Unicode")

2000-12-06 Thread Mark Davis
<[EMAIL PROTECTED]> Sent: Monday, December 04, 2000 22:08 Subject: Re: Transcriptions of "Unicode" > Mark Davis wrote:> > > > What wasn't clear from his message> > is whether Mozilla picks a reasonable font if the language is not there.> > Sorry

TR22

2000-12-04 Thread Mark Davis
As per the instructions of the Unicode Technical Committee, TR#22: Character Mapping Markup Language (CharMapML) has been advanced from draft TR to full TR. See http://www.unicode.org/unicode/reports/tr22/ for more information. Note: The UTC intends to continue development of this TR to also encomp

Re: Transcriptions of "Unicode"

2000-12-04 Thread Mark Davis
Gatos, CA, USA mailto:[EMAIL PROTECTED] > > +1 408.210.3569 (mobile) +1 408.904.4762 (fax) > ======= > Globalization Engineering & Consulting Services > > On Sat, 2 Dec 2000, Mark Davis wrote: > > > Won't Modzilla pick fonts based on charact

Re: Transcriptions of "Unicode"

2000-12-02 Thread Mark Davis
ibutes, Mozilla/Netscape 6 will use > the fonts that have been set up for those languages. E.g.: > > ... > > Erik > > Mark Davis wrote: > > > > Done. > > > > From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]> > > > > > > I would suggest adding a > > > > > > > Mark Davis wrote: > > > > > > > > > http://www.macchiato.com/unicode/Unicode_transcriptions.html >

Re: Transcriptions of "Unicode"

2000-12-02 Thread Mark Davis
PROTECTED]> Cc: "Unicode List" <[EMAIL PROTECTED]> Sent: Friday, December 01, 2000 22:46 Subject: Re: Transcriptions of "Unicode" > Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use > the fonts that have been set up for those languages.

Re: Transcriptions of "Unicode"

2000-12-01 Thread Mark Davis
> > To: "Unicode List" <[EMAIL PROTECTED]> > Cc: "Unicode List" <[EMAIL PROTECTED]> > Sent: Friday, December 01, 2000 2:30 PM > Subject: Re: Transcriptions of "Unicode" > > > > Sad to report, my browser (Netscape 4.7) shows the Yidd

Transcriptions of "Unicode"

2000-12-01 Thread Mark Davis
I am interested in collecting transcriptions of the word "Unicode" in different scripts (and languages). If you are fluent in a language other than Unicode, I'd appreciate any suggestions. What I have so far is at: http://www.macchiato.com/unicode/Unicode_transcriptions.html Mark

Re: display problems on browser

2000-12-01 Thread Mark Davis
Have you tried looking at the Unicode home page, at "Display Problems", or the FAQ "Unicode on the Web"? - Original Message - From: "sreekant" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Thursday, November 30, 2000 22:27 Subject: display problems on browser > hi, >

Re: sequences and stuff

2000-11-30 Thread Mark Davis
The soft hyphen is not sufficient, since in other languages the case where two letters must be distinguished in collation may not fall on a syllable boundary, or allow hyphenation between them. The UTC looked at all the possible existing boundary-control characters; none of them really work for t

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Mark Davis
"G. Adam Stanislav" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Wednesday, November 29, 2000 22:42 Subject: Re: UTF-8 Corrigendum, new Glossary > At 21:08 29-11-2000 -0800, Mark Davis wrote: > >1. The Unicode Technical Committee has modified t

UTF-8 Corrigendum, new Glossary

2000-11-29 Thread Mark Davis
We would like to call two items to people's attention. 1. The Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP characters, and clarified some of the conformance clauses. For more information, see htt
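What "non-shortest form" means in practice can be sketched with Python's strict UTF-8 decoder, which rejects such sequences. The helper name is hypothetical, and the byte strings are illustrative samples rather than examples taken from the corrigendum.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    # Python's strict decoder refuses non-shortest (overlong) sequences.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

shortest = b"\x2f"           # '/' in its one-byte form
overlong2 = b"\xc0\xaf"      # the same code point padded to two bytes
overlong3 = b"\xe0\x80\xaf"  # ... and padded to three bytes
print(is_well_formed_utf8(shortest),
      is_well_formed_utf8(overlong2),
      is_well_formed_utf8(overlong3))
```

Rejecting the padded forms matters for security checks: a filter that looks for the byte 2F would otherwise miss a '/' smuggled in as C0 AF.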

Re: Unicode Case Mappings UTR #21

2000-11-29 Thread Mark Davis
These are good points. TR 21 deliberately does not specify the language conventions for using titlecase, which as you note will change the effect of its use (see http://www.unicode.org/unicode/reports/tr21/#TitlecaseCaveats). Most products will have some smarts, but also leave it up to the user w

Re: string vs. char [was Re: Java and Unicode]

2000-11-20 Thread Mark Davis
The UTC will be using the terms "supplementary code points", "supplementary characters" and "supplementary planes". The term it is "deprecating with extreme prejudice" is "surrogate characters". See http://www.unicode.org/glossary/ for more information. Mark - Original Message - From: "

Re: Greek Prosgegrammeni

2000-11-19 Thread Mark Davis
I haven't had time to read this list recently, so here is a somewhat belated response. >But, even if you do so, we are left with a "wrong" canonical decomposition: >1FBC;GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI;Lt;0;L;0391 0345N1FB3; >According to James' statement (which is not to

Re: string vs. char [was Re: Java and Unicode]

2000-11-16 Thread Mark Davis
We have found that it works pretty well to have a uchar32 datatype, with uchar16 storage in strings. In ICU (C version) we use macros for efficient access; in ICU (C++) version we use method calls, and for ICU (Java version) we have a set of utility static methods (since we can't add to the Java S
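The general idea of 32-bit code point values over 16-bit storage can be sketched as below. This is not ICU's actual macro or method set; `to_units` and `code_point_at` are hypothetical names, and the list of integers merely stands in for a uchar16 array.

```python
def to_units(s):
    # Stand-in for 16-bit string storage: a list of UTF-16 code units.
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]

def code_point_at(units, i):
    # Return (code_point, next_index) for the character starting at units[i].
    lead = units[i]
    if 0xD800 <= lead <= 0xDBFF and i + 1 < len(units):
        trail = units[i + 1]
        if 0xDC00 <= trail <= 0xDFFF:
            cp = 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
            return cp, i + 2
    # BMP character, or an unpaired surrogate returned as-is.
    return lead, i + 1

units = to_units("a\U00010123b")
print([hex(u) for u in units])
print(code_point_at(units, 1))
```

Callers work with whole code points while the string itself stays compact, which is the trade-off the message describes.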

Re: [idn] Javascript code charts, unicode converter, show-characters

2000-11-16 Thread Mark Davis
That agrees with the results I get on http://www.macchiato.com/unicode/convert.html.   Mark - Original Message - From: J. William Semich To: Mark Davis ; Rick H Wesson Cc: Unicore ; Unicode ; w3c-i18n-ig Sent: Wednesday, November 15, 2000 22:46 Subject: Re

Re: [idn] Javascript code charts, unicode converter, show-characters

2000-11-15 Thread Mark Davis
programmatically, the program is wrong.   Mark - Original Message - From: J. William Semich To: Rick H Wesson ; Mark Davis Cc: Unicore ; Unicode ; [EMAIL PROTECTED] ; w3c-i18n-ig Sent: Wednesday, November 15, 2000 09:32 Subject: Re: [idn] Javascript code charts

Javascript code charts, unicode converter, show-characters

2000-11-15 Thread Mark Davis
I just made some fixes in my Javascript Unicode pages (insomnia again) that may be of interest.   http://www.macchiato.com/unicode/convert.html has UTF, RACE and LACE conversions, with a bit better error checking.   http://www.macchiato.com/unicode/charts.html has Unicode charts, plus a new "

Re: Devanagari question

2000-11-13 Thread Mark Davis
The Unicode Standard does define the rendering of such combinations, which is in the absence of any other information to stack outwards. Implementations that can't do that will either overstrike, or use some other fallback rendering. A sophisticated rendering will use positioning such as control

Re: Character counter?

2000-11-11 Thread Mark Davis
Doug is right, if you are counting *encoded characters*. This is fine for programmers, so if that is the purpose, you can use that method. (If the text is not well-formed, then you probably want to filter (e.g. not count) isolated half-surrogates, ill-formed UTF-8, and noncharacters.) However, if
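A minimal sketch of counting encoded characters with the filtering suggested above, assuming the input is already a Python string (so ill-formed UTF-8 would have been caught at decode time). The function names are hypothetical, and the noncharacter test assumes the set FDD0..FDEF plus the last two code points of each plane.

```python
def is_noncharacter(cp):
    # U+FDD0..U+FDEF, and U+xxFFFE / U+xxFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def count_characters(s):
    count = 0
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            continue  # isolated half-surrogate: not counted
        if is_noncharacter(cp):
            continue  # noncharacter: not counted
        count += 1
    return count

print(count_characters("a\U00010123b"))
print(count_characters("a\ud800b"))
```

This counts code points, not what a user would perceive as characters; combining sequences still count once per code point.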

Re: National Languages Support in Windows

2000-11-10 Thread Mark Davis
ICU has a list of these. If you take a look at http://oss.software.ibm.com/icu/charset/CharMaps-HTML/windows-1252-2000.html , for example, you will see some other interesting cases. Mark - Original Message - From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL P

Re: Fw: Normative vs Informative

2000-11-06 Thread Mark Davis/Cupertino/IBM
OTECTED]> Sent: Monday, November 06, 2000 04:03 Subject: Re: Normative vs Informative Ar 01:57 -0800 2000-11-06, scríobh Mark Davis: >I was looking for a different message from Michael, and ran across this one. >This issue was not followed up on this list, so I thought I would report the >

Re: Normative vs Informative

2000-11-06 Thread Mark Davis
t: Thursday, October 26, 2000 03:15 Subject: Re: Normative vs Informative Ar 00:04 -0700 2000-10-26, scríobh Mark Davis: >I am leery of using normative your way unless we find strong evidence of >this. Well, that's just wrong, Mark. (Sorry, it's beat-up Mark day I guess.) Ken explained

Re: Unicode Character not Printing

2000-11-02 Thread Mark Davis
We appreciate any submissions of FAQ questions, and this is a good one. Reformat it as a Q... A... pair (plain text is fine), and send to [EMAIL PROTECTED], with the title "FAQ submission". The editorial committee will then look at it. Mark - Original Message - From: "Marco Cimarosti"

Re: Fonts that support the ORNL rendering of Tamil?

2000-10-31 Thread Mark Davis
Can someone write up a description of the proposed change, with the attendant glyphs? There is a UTC meeting next week in San Diego, so now's the time. Mark - Original Message - From: "Antoine Leca" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Tuesday, October 31, 2000

Re: IUC17 Talks and Papers

2000-10-23 Thread Mark Davis
Thanks, Misha and Ian. Two quick notes on my papers: The title of "What's New in Unicode 3.0" should be "What's New in Unicode 3.0.1". My keynote is also on the same site: "Unicode Myths". Mark - Original Message - From: "Misha Wolf" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECT

Re: [Very OT] Japanese economy failing -- it's the Japanese language and

2000-10-20 Thread Mark Davis
>Zumindest die Hälfte der Namen im Lande kann so oder auch so ausgesprochen werden > - je nachdem, wie es der Namensträger wünscht. Much the same in America; you very often don't know how someone's last name is pronounced (or spelt): Stein => shtyn? styn? steen? - Original Message - Fro

Re: utf-8 != latin-1

2000-10-17 Thread Mark Davis
One of the main features of XML is that it has quite strict rules about how to handle errors. The goal, I believe, is to ensure that we are not awash in malformed files that have no clear interpretation. And this is clearly an error: the acceptable code points are quite clearly stated: http://ww
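Assuming the reference is to the Char production of XML 1.0 (the link is cut off here, so this is a guess at the intended rule), the set of acceptable code points can be sketched as a simple range test. The function name is hypothetical.

```python
def is_xml10_char(cp):
    # XML 1.0 Char production:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

for cp in (0x41, 0x0, 0x1B, 0xD800, 0xFFFE, 0x10000):
    print(hex(cp), is_xml10_char(cp))
```

A conforming parser must treat a document containing any other code point as an error rather than guess at an interpretation.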

Re: Mail list archive

2000-10-13 Thread Mark Davis
You may already be aware of this, but there is an eGroups that archives the Unicode mail. (It is also searchable: for example, "Etruscan" comes up with about 15 messages. "Help" comes up with many many screenfulls.) It is described on http://www.unicode.org/unicode/consortium/distlist.html Mark

Re: Character properties

2000-10-11 Thread Mark Davis
Here is my take on the way Unicode general categories should be mapped to POSIX ones. 1. As a reminder, the Unicode General Categories are: L* (letters): Lu, Ll, Lt , Lm, Lo M* (marks): Mn, Mc, Me N* (numbers): Nd, Nl, No P* (punctuation): Pc, Pd, Ps, Pe, Pi, Pf, Po S* (symbols): Sm, Sc, Sk, So

Re: When does toUpperCase(ch) == ch ?

2000-10-10 Thread Mark Davis
In general, you can't depend on any of the following: toUppercase(x) == x iff Cat(x) == Lu toTitlecase(x) == x iff Cat(x) == Lt toLowercase(x) == x iff Cat(x) == Ll There are counterexamples to these, even using the simple 1-1 mappings. Take a look at the casing charts, at http://www.unicode.org
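A few counterexamples can be checked with Python's `unicodedata` module. Note that Python's `str.upper()` and `str.lower()` apply full case mappings rather than the simple 1-1 mappings discussed here; the characters below were picked because they have no case mapping of either kind in that direction. The helper names are hypothetical.

```python
import unicodedata

def upper_fixed_but_not_Lu(ch):
    # toUppercase(x) == x even though the category is not Lu.
    return ch.upper() == ch and unicodedata.category(ch) != "Lu"

def lower_fixed_but_not_Ll(ch):
    # toLowercase(x) == x even though the category is not Ll.
    return ch.lower() == ch and unicodedata.category(ch) != "Ll"

kra = "\u0138"           # LATIN SMALL LETTER KRA: Ll, no uppercase form
roman_one = "\u2160"     # ROMAN NUMERAL ONE: Nl, yet the uppercase of U+2170
upsilon_hook = "\u03d2"  # GREEK UPSILON WITH HOOK SYMBOL: Lu, no lowercase form
print(upper_fixed_but_not_Lu(kra),
      upper_fixed_but_not_Lu(roman_one),
      lower_fixed_but_not_Ll(upsilon_hook))
```

Code that needs "is this character uppercase?" should therefore consult the character properties directly instead of comparing a character with its own case mapping.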

Re: Correct definition for an "isLatin1()" function

2000-10-05 Thread Mark Davis
For the purpose specified, isLatin1 should just test for <= 0xFF. After all, one would not want to exclude TAB, CR or LF ☺ Mark - Original Message - From: "John Cowan" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Thursday, October 05, 2000 10:33 Subject: Re: Correct de
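The suggested test amounts to a single comparison per character. A minimal sketch, with the function name following the thread title:

```python
def is_latin1(s):
    # Every code point up to U+00FF qualifies, controls such as TAB,
    # CR and LF included.
    return all(ord(ch) <= 0xFF for ch in s)

print(is_latin1("caf\u00e9\t\r\n"))
print(is_latin1("\u0100"))
```

Any string that passes can be encoded with Python's "latin-1" codec without error, since that codec maps code points 0..FF directly to bytes.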

Re: UTF-8 and UTF-16

2000-10-05 Thread Mark Davis
UTF-8, UTF-16, and UTF-32 all support exactly the same character repertoire. Please look at www.unicode.org, on the front page is a link to the FAQs. Mark - Original Message - From: "George Zeigler" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Thursday, October 05, 20

What is Unicode?

2000-10-04 Thread Mark Davis
Thanks to the industriousness of volunteer translators and to Magda and Julie's editorial work, we have many more translations of "What is Unicode" on www.unicode.org (all in UTF-8, of course). Check out http://www.unicode.org/unicode/standard/WhatIsUnicode.html. If you have problems displaying a

Re: Starter questions

2000-10-04 Thread Mark Davis
Please take a look at the FAQ and material on www.unicode.org to see if it answers your questions. - Original Message - From: "Jennifer Nguyen" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Wednesday, October 04, 2000 08:02 Subject: Starter questions > Hi, > > I'm new

Re: help me !!!

2000-10-03 Thread Mark Davis
Please take a look at www.unicode.org - Original Message - From: "Karambir Rohilla" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Tuesday, October 03, 2000 21:17 Subject: help me !!! > > > > hello > > Please help me anyone > waht is UTF8 & UTF16 ? > regard > kara

Re: [OT] Word select in Microsoft products?

2000-10-03 Thread Mark Davis
going forward.   Mark - Original Message - From: Mark Davis To: Unicode List Sent: Tuesday, October 03, 2000 07:30 Subject: Re: [OT] Word select in Microsoft products? Thanks for the detailed message. I tried it out, and if I have a sentence like  

Re: lag time in Unicode implementations in OS, etc?

2000-10-03 Thread Mark Davis
If there are specific areas where the BIDI algorithm has flaws, that should be communicated to the UTC bidi subcommittee, ideally with a proposal to fix the problem. Mark - Original Message - From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent:

Re: [OT] Word select in Microsoft products?

2000-10-03 Thread Mark Davis
eems to work. Thanks!   Mark - Original Message - From: Chris Pratley To: 'Mark Davis' ; Unicode List Sent: Monday, October 02, 2000 13:12 Subject: RE: [OT] Word select in Microsoft products? There are two edit controls in the apps you mention

Re: lag time in Unicode implementations in OS, etc?

2000-10-03 Thread Mark Davis
It would be more accurate to say that it does not support all of Unicode 3.0. Just using the phrase "doesn't support 3.0" suggests that it is not compliant. A product can be compliant to a particular version of Unicode while only supporting a subset of the characters. Even compliant products with

[OT] Word select in Microsoft products?

2000-10-02 Thread Mark Davis
[Off topic -- just looking for information from a broad audience.]   Anyone know how to turn off the extremely annoying automatic word select (AWS) in Microsoft products? This is the "feature" that causes dragging outside of a word to behave like double-click. I often want to select part of

Re: New Name Registry Using Unicode

2000-10-02 Thread Mark Davis
There are a number of similarities between this XNS and IDN, so http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-00.txt would be worth reading. On locales: using them is dangerous for matching. The only reason to add locale is if it were to make a difference which letters match. But th

Re: Cyrillic -

2000-09-29 Thread Mark Davis
Are you sure about that? On http://www.unicode.org/charts/PDF/U0400.pdf there is 0483 COMBINING CYRILLIC TITLO Mark - Original Message - From: "Valeriy E. Ushakov" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Cc: "Aleksandar Poposki" <[EMAIL PROTECTED]> Sent: Friday, Sept

Re: TATAP => TATAR

2000-09-19 Thread Mark Davis
If those can be confirmed, then the SpecialCasing file should be modified to add them. Could you verify this in time for the next UTC? Mark Cathy Wissink wrote: > I believe Azeri also uses the dotless i/dotted i Turkish-style casing. > > Cathy > > -Original Message- > From: Carl W. Brow

Re: [idn] nameprep forbidden characters

2000-09-19 Thread Mark Davis
ar need for other scripts such as > > Arabic? > > Mark Davis replied > > UCA (#10) already handles that. You will get a "fuzzy" compare if you > > mask off less important weights, and you will get a much > > better ordering > > than binary compare

Re: [idn] nameprep forbidden characters

2000-09-19 Thread Mark Davis
UCA (#10) already handles that. You will get a "fuzzy" compare if you mask off less important weights, and you will get a much better ordering than binary compare as well. Mark Hart, Edwin F. wrote: > Is there a need for a "fuzzy" comparison where names with and without > points in Hebrew? Is
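Python's standard library has no UCA implementation, so the sketch below swaps in a much cruder technique to give the flavor of a "fuzzy" compare: decompose, drop nonspacing marks, and case-fold. It produces keys for matching only, not a UCA ordering, and `fuzzy_key` is a hypothetical name.

```python
import unicodedata

def fuzzy_key(s):
    # Rough stand-in for comparing at primary strength only: remove
    # nonspacing marks (category Mn, which covers the Hebrew points)
    # and fold case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()

pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"  # Hebrew word written with points
plain = "\u05e9\u05dc\u05d5\u05dd"                      # the same letters without points
print(fuzzy_key(pointed) == fuzzy_key(plain))
```

A real UCA implementation achieves the same effect by ignoring the secondary and tertiary weights, and additionally yields a linguistically sensible order.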

Re: http://www.unicode.org/unicode/standard/standard.html

2000-09-18 Thread Mark Davis
Controlling Ligatures", in TUC 3.0, p. 318. > > Am 2000-09-15 um 14:40 UCT hat Mark Davis geschrieben: > > I'd like to remind everyone to look at the latest version of the Unicode > > Standard, especially when looking at fine points. To cite Unicode 3.0.1 > > (ht

Re: [idn] nameprep forbidden characters

2000-09-17 Thread Mark Davis
more >on the pronunciation rather than the exact spelling. I didn't quite get the last sentence. I had thought that the vowel marks were used to get the exact pronunciation. If that is not true, it may be part of my misunderstanding of the situation. > Jony > > > -Orig

Re: [OT] Ethnologue / Swiss German

2000-09-17 Thread Mark Davis
> the Ethnologue staff created separate entries for "Allemanisch," > "Alsatian," and "Schwyzerdütsch," which *may* appease nationalistic > preferences but definitely *does* result in inconsistency and > confusion. Interesting example. Some time ago I lived in eastern Switzerland for 4 years, and

Re: [idn] nameprep forbidden characters

2000-09-17 Thread Mark Davis
I am curious why you feel so strongly that the Hebrew points should be ignored in domain names. Prima facie, it seems that there is little harm in treating them no differently from other characters. What problem would arise if the domain was ABC.COM and I could not get it by typing AB*C.COM? (Here

Re: Ligatured characters

2000-09-15 Thread Mark Davis
I'd like to remind everyone to look at the latest version of the Unicode Standard, especially when looking at fine points. To cite Unicode 3.0.1 (http://www.unicode.org/unicode/standard/versions/Unicode3.0.1.html) "Section 13.2 Controlling Ligatures, page 318: the text is superseded by the follow

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-15 Thread Mark Davis
I share the concern about combinatorial explosions. Look at Spanish, Arabic or English, for example: http://oss.software.ibm.com/developerworks/opensource/icu/localeexplorer/ I agree that de-*-sp1996 makes more sense. For us, the variant should go before the country only if the variant is -- in ge

Re: surrogate terminology

2000-09-13 Thread Mark Davis
Not all code points are assigned (or even assignable) to characters. U+xx is used to refer to code points, which range from 0 to 10FFFF. Of these code points, some are assigned to characters (including regular characters, control characters, format characters, and private use characters [whose

Re: Counting characters or bytes in UTF-8?

2000-09-11 Thread Mark Davis
In general, we've found it far better to have low-level routines always have APIs in terms of code units that they implement (e.g. bytes in this case), and add higher-level routines that provide other interesting boundary information (e.g. code point boundaries, grapheme boundaries, word boundarie
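The layering described here can be sketched for UTF-8: the low-level data stays in bytes (code units), and a separate higher-level helper reports where the code point boundaries fall. The function name is hypothetical and the input is assumed to be well-formed UTF-8.

```python
def code_point_starts(data: bytes):
    # A byte begins a code point unless it is a continuation byte
    # (bit pattern 10xxxxxx). Offsets are in bytes, the code unit of UTF-8.
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]

data = "a\u00e9\U00010123".encode("utf-8")  # 1-byte, 2-byte and 4-byte characters
print(len(data), code_point_starts(data))
```

Lengths and offsets stay in bytes throughout, so they can be handed to storage or I/O routines unchanged; the number of code points is simply `len(code_point_starts(data))`.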

Re: Surrogate support in *ML?

2000-09-08 Thread Mark Davis
Good point. In the past, I have used "surrogate characters" to refer to the characters encoded above FFFF, and surrogate code units to refer to the UTF-16 units D800-DFFF. However, I think that leads to confusion. Nobody has come up with a good term for all characters above FFFF. "Plane 1-16 chara

Re: Unicode on a non-Unicode web page

2000-09-08 Thread Mark Davis
Take a look at the Unicode FAQ on the web, at www.unicode.org "Gary P. Grosso" wrote: > Hi Unicoders, > > I am working on software to emit HTML in the encoding > and character set of the user's choice, from SGML/XML > documents which can contain any Plane 1 Unicode character. > The question is w

[Fwd: Unicode Conversions]

2000-09-07 Thread Mark Davis
Mark Davis wrote: > > > > > Hello all, > > I have been trying to input unicode from a browser and store it in a database. >The problem is the different encodings used to represent the unicode. > > The input text is in the UTF-8 format. I have read on the Mic

Re: Surrogate support in *ML?

2000-09-07 Thread Mark Davis
In HTML or XML you always use the code point (e.g. UTF-32), not a series of code units (UTF-8 or UTF-16). Thus you would use: &#x10123; not &#xD800;&#xDD23; from UTF-16 nor &#xF0;&#x90;&#x84;&#xA3; from UTF-8 Mark Brendan Murray/DUB/Lotus wrote: > How can one encode a surrogate character as an entity in HTML/XML? Should > it be as t
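The difference between the three candidates can be worked through in a short sketch. The code point U+10123 is taken as the sample here, and hexadecimal notation is an assumption (decimal references work equally well); the function names are hypothetical.

```python
def numeric_character_reference(cp):
    # The correct form: one reference carrying the code point itself.
    return "&#x{:X};".format(cp)

def utf16_units(cp):
    # The surrogate pair a supplementary code point has in UTF-16
    # (shown only to illustrate what must NOT be written as references).
    if cp < 0x10000:
        return [cp]
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]

def utf8_bytes(cp):
    # The UTF-8 bytes of the code point (likewise not valid as references).
    return list(chr(cp).encode("utf-8"))

cp = 0x10123
print(numeric_character_reference(cp))
print([hex(u) for u in utf16_units(cp)])
print([hex(b) for b in utf8_bytes(cp)])
```

Only the first line of output belongs in markup; the other two show the code units that a mistaken per-unit escape would be built from.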
