RE: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Kent Karlsson
Michael (michka) Kaplan: ... > then the conversion will simply strip the errant characters. Note that > either solution meets the needs of refusal to interpret the errant > sequences. Simply stripping the errant byte sequences means that they are each interpreted as the empty string of character

Impossible combinations?

2003-03-02 Thread Kevin Brown
I'm working on a Latin-based font that's got a large number of kerning pairs already defined and I'm trying to pare this list of pairs down to the bare minimum. There seem to be many pairs which are unlikely ever to be used. These pairs all involve a lowercase on the left with an uppercase on t

Re: Impossible combinations?

2003-03-02 Thread Michael Everson
At 22:41 +1030 2003-03-02, Kevin Brown wrote: Does anyone know of a Latin-based language in which it is possible to have a lowercase immediately followed by an uppercase in the SAME word? Yes. It happens in Irish all the time. -- Michael Everson * * Everson Typography * * http://www.evertype.c

Some of Andy's assertions

2003-03-02 Thread Michael Everson
1. The sequence 'Vowel+Virama+Ya...' is illogical to scholars of Bengali and indeed Indic languages in general. I refuted this yesterday by indication that this usage is an innovation. 2. Such sequences are not semantically equivalent to the intended ... sentence fragment. 3. There are no other

Re: Please see my latest proposal

2003-03-02 Thread Michael Everson
Andy, Your BENGALI LETTER OPEN O can be encoded already with the sequence U+0985 U+09CD U+09AF. Your BENGALI LETTER CENTRAL E can be encoded already with the sequence U+098F U+09CD U+09AF. There is no need to "bring the Bengali code block in line with the Devanagari block". -- Michael Everson
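For readers who want to inspect the two sequences named in this message, here is a minimal Python sketch. It only spells out the code points quoted above and looks up their standard character names; the variable names `open_o` and `central_e` are made up for illustration and come from neither the message nor the proposal.

```python
import unicodedata

# The two sequences quoted in the message, written as code point escapes.
# Variable names are illustrative only.
open_o = "\u0985\u09CD\u09AF"     # U+0985 U+09CD U+09AF
central_e = "\u098F\u09CD\u09AF"  # U+098F U+09CD U+09AF

def describe(seq):
    """Return (code point, character name) pairs for each character in seq."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in seq]

for label, seq in (("open o", open_o), ("central e", central_e)):
    print(label, describe(seq))
```

Both sequences are a base vowel letter followed by the virama and the letter YA, which is the 'Vowel+Virama+Ya' pattern discussed elsewhere in the thread.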

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Mark Davis
I agree with Kent that it is somewhat less robust to simply remove ill-formed sequences, since it removes any indication that the data was corrupted. Either better to signal an error, or insert some other indication like a REPLACEMENT CHARACTER or SUB at that point. (And in my reading, C12a does a
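The three behaviours under discussion (silently stripping, inserting a REPLACEMENT CHARACTER, signalling an error) can be compared in a few lines of Python. This is only a sketch using Python's built-in decoder error handlers, not the API the posters are referring to, and the sample bytes are invented for illustration.

```python
# Invented sample: the byte 0xFF can never occur in well-formed UTF-8.
data = b"abc\xffdef"

# 1. Strip the ill-formed byte: no trace of the corruption remains.
stripped = data.decode("utf-8", errors="ignore")

# 2. Substitute U+FFFD REPLACEMENT CHARACTER: the damage stays visible.
replaced = data.decode("utf-8", errors="replace")

# 3. Signal an error and refuse to continue.
try:
    data.decode("utf-8", errors="strict")
    signalled = False
except UnicodeDecodeError:
    signalled = True

print(repr(stripped), repr(replaced), signalled)
```

With stripping, the result is indistinguishable from input that was never corrupted, which is the robustness concern raised above. The other two options keep some indication that something went wrong.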

Re: Impossible combinations?

2003-03-02 Thread Roozbeh Pournader
On Sun, 2 Mar 2003, Kevin Brown wrote: > Does anyone know of a Latin-based language in which it is possible to > have a lowercase immediately followed by an uppercase in the SAME word? That happens in many common names, like McGowan. It will also be used in tech terms that need to avoid space for

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Michael (michka) Kaplan
From: "Mark Davis" <[EMAIL PROTECTED]> > I agree with Kent that it is somewhat less robust to simply remove > ill-formed sequences, since it removes any indication that the data was > corrupted. Nice that the API gives one the option to choose, huh? ;-) The notion of continuing (even if one is

Re: Impossible combinations?

2003-03-02 Thread Michael Everson
At 21:01 +0330 2003-03-02, Roozbeh Pournader wrote: That happens in many common names, like McGowan. Noble names, Roozbeh. ;-) -- Michael Everson * * Everson Typography * * http://www.evertype.com

Re: Impossible combinations?

2003-03-02 Thread Kenneth Whistler
> On Sun, 2 Mar 2003, Kevin Brown wrote: > > > Does anyone know of a Latin-based language in which it is possible to > > have a lowercase immediately followed by an uppercase in the SAME word? In addition to the examples pointed out by Roozbeh and Michael, this pattern is growing increasingly co

Re: Impossible combinations?

2003-03-02 Thread John Hudson
At 04:11 AM 3/2/2003, Kevin Brown wrote: I'm working on a Latin-based font that's got a large number of kerning pairs already defined and I'm trying to pare this list of pairs down to the bare minimum. There seem to be many pairs which are unlikely ever to be used. These pairs all involve a lowerc

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Asmus Freytag
At 07:21 AM 3/2/03 -0800, Mark Davis wrote: >"C12a When a process interprets a code unit sequence which > purports to be in a Unicode character encoding form, it > shall treat ill-formed code unit sequences as an error > condition, and shall not interpret such sequences as > cha