Fw: 27th Unicode Conference - Call for Papers - Berlin, Germany - April 6-8, 2005

2004-10-12 Thread Lisa Moore
Unicoders, read below. Lisa. Send in your submissions now! Call for Papers! Twenty-seventh Internationalization and Unicode Conference (IUC27) Theme: Unicode, Cultural Diversity and Multilingual Computing See Call for Papers at:

Re: bit notation in ISO-8859-x is wrong

2004-10-12 Thread Peter Kirk
On 12/10/2004 00:10, Mike Ayers wrote: From: Hohberger, Clive [mailto:[EMAIL PROTECTED] Sent: Monday, October 11, 2004 11:08 AM I agree with you... almost.. I think that AD and BC are really ordinal numbers, which denote relative position in a series from a 1-origin point. I thought 1 AD

Re: UTF-8 stress test file?

2004-10-12 Thread Philippe Verdy
From: Doug Ewell [EMAIL PROTECTED] Theodore H. Smith delete at elfdata dot com wrote: - the file mixes UTF-8 and UTF-16 Does this file mix UTF-8 and UTF-16? I thought it just had surrogates encoded into UTF-8? Of course a surrogate should never exist in UTF-8. You are right. Philippe's statement

Re: bit notation in ISO-8859-x is wrong

2004-10-12 Thread Christopher Fynn
Years were frequently written with Roman numerals - which of course have no zero. - Chris

Public Review Issues Update

2004-10-12 Thread Rick McGowan
The Unicode Technical Committee has posted three new public review issues. Details are on the following web page: http://www.unicode.org/review/ Briefly the new issues are: 47 Changes to default collation of Latin in UCA In collation, searching, and matching according to the Unicode

Re: UTF-8 stress test file?

2004-10-12 Thread Clark Cox
On Tue, 12 Oct 2004 20:25:16 +0200, Philippe Verdy [EMAIL PROTECTED] wrote: From: Doug Ewell [EMAIL PROTECTED] Theodore H. Smith delete at elfdata dot com wrote: - the file mixes UTF-8 and UTF-16 Does this file mix UTF-8 and UTF-16? I thought it just had surrogates encoded into UTF-8?
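To make the point under discussion concrete, here is a minimal sketch, assuming Python 3 and its built-in strict "utf-8" codec (the helper name is_strict_utf8 is hypothetical, not something from the thread). It shows a surrogate pair written out as two 3-byte sequences, the CESU-8-like form, being rejected by a strict decoder, while the 4-byte form of the same code point is accepted.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# U+10000 as the surrogate pair D800 DC00, each half encoded as 3 bytes.
# This is the CESU-8-like form; it is not legal UTF-8.
cesu_like = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])

# The same code point in well-formed UTF-8 is a single 4-byte sequence.
proper = "\U00010000".encode("utf-8")

# "surrogatepass" is a standard Python error handler that deliberately
# lets lone surrogates through; used here only to reproduce the bad bytes.
reproduced = "\ud800\udc00".encode("utf-8", "surrogatepass")

print(is_strict_utf8(cesu_like))  # strict decoder rejects it
print(is_strict_utf8(proper))     # strict decoder accepts it
print(reproduced == cesu_like)
```

A decoder that accepts the first byte string is being lenient about surrogates, which is the behaviour a stress test file is meant to expose.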

Re: UTF-8 stress test file?

2004-10-12 Thread Philippe Verdy
From: Clark Cox [EMAIL PROTECTED] unless the file was used as a test for CESU-8 The whole point of the CESU-8-like section is that it is not legal UTF-8. Except that the document does not even cite CESU-8 but only UTF-16! The text itself is puzzling as well as nearly all its suggestions about

Arcana

2004-10-12 Thread Chris Jacobs
- Original Message - From: Christopher Fynn [EMAIL PROTECTED] To: Unicode List [EMAIL PROTECTED] Sent: Tuesday, October 12, 2004 8:34 PM Subject: Re: bit notation in ISO-8859-x is wrong Years were frequently written with Roman numerals - which of course have no zero. Major arcana

Public Review Issues Update (correction)

2004-10-12 Thread Rick McGowan
There has been a further update to the document for Public Review Issue #48 (Directional Run) to clarify and expand the proposed definition. If you have already reviewed the document, I apologize for the inconvenience. The revised document is linked from the review page:

Re: bit notation in ISO-8859-x is wrong

2004-10-12 Thread Werner LEMBERG
But for certain purposes e.g. historical astronomical calculations (used for establishing chronology from records of eclipses etc) the year numbers used are effectively negative numbers (and zero) AD. Well, astronomers normally convert everything to Julian Day (JD) numbers, starting at
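The "negative numbers (and zero)" convention mentioned here is usually called astronomical year numbering: 1 BC is year 0, 2 BC is year -1, and so on. A small sketch of the conversion, assuming Python and hypothetical function names chosen for illustration only:

```python
def bc_to_astronomical(year_bc: int) -> int:
    """Convert a BC year (1, 2, 3, ...) to astronomical numbering."""
    if year_bc < 1:
        raise ValueError("BC years start at 1; there is no year 0 BC")
    return 1 - year_bc

def astronomical_to_label(year: int) -> str:
    """Convert an astronomical year number back to an AD/BC label."""
    return f"{year} AD" if year >= 1 else f"{1 - year} BC"

print(bc_to_astronomical(1))      # 0
print(bc_to_astronomical(2))      # -1
print(astronomical_to_label(0))   # 1 BC
```

With a year zero in place, the interval between two years is a plain subtraction, which is why the convention suits eclipse calculations.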

RE: bit notation in ISO-8859-x is wrong

2004-10-12 Thread Mike Ayers
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Werner LEMBERG Sent: Tuesday, October 12, 2004 2:26 PM But for certain purposes e.g. historical astronomical calculations (used for establishing chronology from records

Re: UTF-8 stress test file?

2004-10-12 Thread Philipp Reichmuth
Philippe Verdy wrote: Examples of bad assumptions that a reader could make: - [quote](...) Experience so far suggests that most first-time authors of UTF-8 decoders find at least one serious problem in their decoder by using this file.[/quote] This suggests to the reader that if its browser or

outside decomposed, inside precomposed

2004-10-12 Thread Richard Cook
Using a certain newly Unicode-aware database application which shall remain nameless (FileMaker 7): imported UTF-8 sequences like [U+0065][U+0303] e, tilde get remapped internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE. Is this kind of behavior what one would expect? It's problematic (and
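The remapping described matches what Unicode normalization to NFC (canonical composition) would produce. Whether FileMaker actually applies NFC internally is an assumption, not something the message confirms. A minimal sketch using Python's standard unicodedata module:

```python
import unicodedata

# Decomposed input: LATIN SMALL LETTER E + COMBINING TILDE.
decomposed = "\u0065\u0303"

# NFC composes the pair into the single precomposed code point U+1EBD.
composed = unicodedata.normalize("NFC", decomposed)

# NFD goes the other way and recovers the two-code-point sequence.
round_trip = unicodedata.normalize("NFD", composed)

print([f"U+{ord(c):04X}" for c in composed])    # ['U+1EBD']
print([f"U+{ord(c):04X}" for c in round_trip])  # ['U+0065', 'U+0303']
```

The two forms are canonically equivalent, so a search that normalizes both sides before comparing treats them as the same string. An application that stores one form and exports it unchanged will not hand back the exact code points it was given.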

Re: UTF-8 stress test file?

2004-10-12 Thread Philippe Verdy
From: Philipp Reichmuth [EMAIL PROTECTED] Don't you think you are stretching things a bit? This is a UTF-8 parser stress test file. If an application opens it in a different encoding, well, of course the results will be different, and things will not look UTF-8-ish. Again, this is a