Re: UTF-7 signature

2002-04-11 Thread Doug Ewell
Markus Scherer <[EMAIL PROTECTED]> wrote: > On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list > a UTF-7 signature byte sequence of +/v8- (which was news to me). I don't remember ever reading a recommendation, or even a suggestion, to use +/v8- as a signature for UTF-7. But that w

Re: Concerning proposals

2002-04-11 Thread John Hudson
At 18:40 4/11/2002, ろ〇〇〇〇 ろ〇〇〇 wrote: >Why does the printed word get so much more respect than the written word? > >It would be like saying that for a spoken language to be accepted into a >registry, one must make a speech synthesizer for the

Re: Concerning proposals

2002-04-11 Thread John Hudson
At 17:23 4/11/2002, David Starner wrote: >pfaedit's a free font editor for Unix. Or one could write out a >PostScript font by hand - it's not completely unreasonable, especially >if you're doing something like a few math characters. I love the fact that there are still people out there who would

Re: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread George W Gerrity
This thread seems just about ended, and I don't want to be the person to revive it, but there have been numerous related topics in the past six months, and nothing in them answers the question that has been nagging me. The question is "Considering the difficulty of actually getting access to

Re: Concerning proposals

2002-04-11 Thread ろ〇〇〇〇 ろ〇〇〇
>This is a barrier erected for three reasons: > > 1. If a proposed character can't pass the font test -- i.e., nobody can > come up with a usable font that contains it -- then it may be of > rather marginal usefulness, since apparently people *aren't* using it. > Of course, histo

Re: Concerning proposals

2002-04-11 Thread David Starner
pfaedit's a free font editor for Unix. Or one could write out a PostScript font by hand - it's not completely unreasonable, especially if you're doing something like a few math characters. -- David Starner - [EMAIL PROTECTED] "It's not a habit; it's cool; I feel alive. If you don't have it you'

Re: Concerning proposals

2002-04-11 Thread James H. Cloos Jr.
> "Stefan" == Stefan Persson <[EMAIL PROTECTED]> writes: Stefan> Is there some free font program out there that can be used for Stefan> this purpose? There is pfaedit at: http://pfaedit.sf.net/ and for bdf bitmap fonts xmbdfed at: http://crl.nmsu.edu/~mleisher/xmbdfed.html Pfaedi

Re: Concerning proposals

2002-04-11 Thread John Hudson
At 15:49 4/11/2002, Kenneth Whistler wrote: > > Is there some free font program out there that can be used for this > purpose? > >I'll let somebody else on the list who knows about font tools answer >that one. I'm not aware of any free tools that I would trust to do the job. The cheapest optio

Re: Concerning proposals

2002-04-11 Thread Kenneth Whistler
Juuitchan donned sackcloth and ashes and wailed: > >It seems that I have to make a font containing any characters that I want > to > >propose for inclusion. > > > > Oy gevalt. So I can't propose anything. Fabulous. Just fabulous. Well, get serious. The Unicode Standard is serious business. (E

Re: Concerning proposals

2002-04-11 Thread Kenneth Whistler
Stefan asked: > It seems that I have to make a font containing any characters that I want to > propose for inclusion. Or provide a font already made by someone else containing them, or get someone else who has the relevant tools to produce it. This is a barrier erected for three reasons: 1.

Re: Inherent "a"

2002-04-11 Thread Kenneth Whistler
> From [EMAIL PROTECTED] Thu Apr 11 13:45:37 2002 > X-Originating-IP: [62.30.112.2] > To: <[EMAIL PROTECTED]> > Subject: Re: Inherent "a" Sinnathurai Srivas wrote: > May I assume u+0b85 as official? No. That is U+0B85 TAMIL LETTER A -- just the ordinary, standalone letter /a/. You are, of cour

Re: Concerning proposals

2002-04-11 Thread ろ〇〇〇〇 ろ〇〇〇
>From: "Stefan Persson" <[EMAIL PROTECTED]> >To: "Unicode-listan" <[EMAIL PROTECTED]> >Subject: Concerning proposals >Date: Thu, 11 Apr 2002 23:57:55 +0200 > >It seems that I have to make a font containing any characters that I want to >propose for inclusion. > Oy gevalt. So I can't propose a

Re: Unicode Myths

2002-04-11 Thread Peter_Constable
Mark: A suggestion: On slide 5, I would be inclined not to differentiate surrogates from non-characters. That only confuses people, I think, regarding the relationships between codepoints and the various encoding forms. Even if they are formally still distinguished in the Std, I contend that the

Re: When was U+xxxx added?

2002-04-11 Thread Markus Scherer
ICU 2.1 will have an API for this, uchar.h/u_charAge(). markus Kenneth Whistler wrote: > Frank asked: >>Given a Unicode encoding value U+xxxx (or whatever for non-BMP), how can >>I find out the version of the Unicode standard in which this character >>first appeared? > > http://www.unicode.org

Concerning proposals

2002-04-11 Thread Stefan Persson
It seems that I have to make a font containing any characters that I want to propose for inclusion. Do the characters have to be encoded to the correct code points, or can they be encoded to just about any code point? Is there some free font program out there that can be used for this purpose?

Re: Vietnamese Nom Text

2002-04-11 Thread Stefan Persson
- Original Message - From: "Tom Gewecke" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: den 11 april 2002 22:56 Subject: Re: Vietnamese Nom Text > >see: > > > > http://www.columbia.edu/kermit/utf8.html > > > >which has an interesting new entry: Vietnamese Nôm, the first entry > >con

Re: Vietnamese Nom Text

2002-04-11 Thread Tom Gewecke
>see: > > http://www.columbia.edu/kermit/utf8.html > >which has an interesting new entry: Vietnamese Nôm, the first entry >containing non-BMP characters (probably will not be entirely visible to >most people) Can *anyone* see it properly? Last I checked no browser could read UTF-8 beyond the

Re: Inherent "a"

2002-04-11 Thread Rick McGowan
Avarangal wrote: > Dear Doug Ewell, William Overington, James E. Agenbroad, and Maurice > Bauhahn, > > Thank you all for the reply. > > May I assume u+0b85 as official? Whoa, hang on here! Official WHAT? u+0b85 is definitely in Unicode: U+0B85 TAMIL LETTER A It is _NOT_ an "inherent a"

Re: Inherent "a"

2002-04-11 Thread Avarangal
Dear Doug Ewell, William Overington, James E. Agenbroad, and Maurice Bauhahn, Thank you all for the reply. May I assume u+0b85 as official? Some explanations for the need for a visible "a". In Tamil, a/ dependent "ai", and "au" has ligatures. in fact "au" and "ou" at present utilise the same li

Re: When was U+xxxx added?

2002-04-11 Thread Frank da Cruz
Ken answered: > Frank asked: > > From [EMAIL PROTECTED] Thu Apr 11 12:12:33 2002 > > Date: Thu, 11 Apr 2002 14:58:48 EDT > > Given a Unicode encoding value U+xxxx (or whatever for non-BMP), how can > > I find out the version of the Unicode standard in which this character > > first appeared? > >

Re: When was U+xxxx added?

2002-04-11 Thread Kenneth Whistler
Frank asked: > From [EMAIL PROTECTED] Thu Apr 11 12:12:33 2002 > Date: Thu, 11 Apr 2002 14:58:48 EDT > Given a Unicode encoding value U+xxxx (or whatever for non-BMP), how can > I find out the version of the Unicode standard in which this character > first appeared? At last, a question for whic

When was U+xxxx added?

2002-04-11 Thread Frank da Cruz
Given a Unicode encoding value U+xxxx (or whatever for non-BMP), how can I find out the version of the Unicode standard in which this character first appeared? - Frank
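
[Editor's sketch] One way to answer this kind of lookup, assuming a data file laid out like the Unicode Character Database's DerivedAge.txt (code point or range, a semicolon, then the version). The SAMPLE text is a hand-written excerpt in that layout, not lines copied from the real file, and the names parse_ages and char_age are hypothetical. This is not necessarily the method recommended in the replies.

```python
# Hand-written sample in the DerivedAge.txt layout (assumption: not the real file).
SAMPLE = """\
# range or single code point ; version in which it first appeared
0041..005A    ; 1.1 # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
20AC          ; 2.1 # EURO SIGN
"""


def parse_ages(text):
    """Return a list of (first, last, version) tuples from DerivedAge-style text."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, age = (field.strip() for field in line.split(";"))
        first, _, last = span.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo    # single code point: range of one
        ranges.append((lo, hi, age))
    return ranges


def char_age(code_point, ranges):
    """Return the version string for code_point, or None if it is not listed."""
    for lo, hi, age in ranges:
        if lo <= code_point <= hi:
            return age
    return None


ages = parse_ages(SAMPLE)
print(char_age(0x20AC, ages))
```

A linear scan is enough for a sketch; with the full file, a sorted list and bisect would be the more usual choice.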

Re: UTF-7 signature

2002-04-11 Thread Markus Scherer
Shlomi Tal wrote: > UTF-7, it shocked me how Greek "Sokrates" and "S o k r a t e s" (with > spaces between each Greek letter in the latter) would have different > encodings for the same Unicode characters. That is not unusual for stateful encodings. It's the same with BOCU-1 (not in this part

Re: UTF-7 signature

2002-04-11 Thread Shlomi Tal
Markus Scherer wrote: >+/v8 is the encoding of U+FEFF as the first code point in a text. So far, >so good. >The '-' as the next byte switches UTF-7 back to direct-encoding of a subset >of US-ASCII. > >What if there is no '-' there? What if a non-ASCII code point immediately >follows the U+FEFF

UTF-7 signature

2002-04-11 Thread Markus Scherer
On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list a UTF-7 signature byte sequence of +/v8- (which was news to me). (Subject "MS/Unix BOM FAQ again (small fix)") I "meditated" some over this - +/v8 is the encoding of U+FEFF as the first code point in a text. So far, so good. The '
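
[Editor's sketch] The observation that +/v8 encodes U+FEFF as the first code point can be checked by hand: UTF-7 writes a shifted run as '+' followed by the unpadded base64 of the UTF-16BE code units. The helper names utf7_shifted and signature_prefixes are hypothetical, and the four follower characters are arbitrary picks whose leading two bits are 00, 01, 10 and 11. The sketch suggests that when another shifted character follows U+FEFF directly, the fourth byte of the signature depends on that character.

```python
import base64


def utf7_shifted(text):
    """Encode text as one UTF-7 shifted run, without the closing '-'."""
    raw = text.encode("utf-16-be")  # UTF-7 base64-encodes UTF-16BE code units
    return "+" + base64.b64encode(raw).decode("ascii").rstrip("=")


def signature_prefixes():
    """First four bytes of U+FEFF followed by characters with differing top bits."""
    followers = ("\u03a3", "\u4e00", "\uac00", "\uff21")  # top bits 00, 01, 10, 11
    return [utf7_shifted("\ufeff" + ch)[:4] for ch in followers]


print(utf7_shifted("\ufeff"))
print(signature_prefixes())
```

U+FEFF fills 16 bits, but three base64 characters carry 18, so the last of the three also holds the top two bits of whatever comes next (or zero padding when the run ends).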

RE: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread jarkko . hietaniemi
> Mark Davis <[EMAIL PROTECTED]> wrote: > > > - when one of the BOM-allowing UTFs starts with a BOM, you know the > > encoding*, and you strip off the BOM when you get the content. > > > > *assuming that no UTF-16 file has U+ as the first character. > > In the real world, this is a pretty go
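
[Editor's sketch] The rule quoted here (a leading BOM tells you the encoding, and you strip it before using the content) might look like this in Python. The function name sniff_and_strip_bom is hypothetical; the BOM constants come from the standard codecs module. The sketch makes the same assumption the thread is debating: that a leading U+FEFF is a signature, not content.

```python
import codecs

# Longer signatures go first: the UTF-32LE BOM starts with the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_and_strip_bom(data):
    """Return (encoding name or None, data with any leading BOM removed)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data  # no signature: the caller must decide the encoding


encoding, body = sniff_and_strip_bom(b"\xef\xbb\xbfhello")
print(encoding, body)
```

One known ambiguity: UTF-16LE text that begins with a BOM followed by U+0000 has the same first four bytes as a UTF-32LE BOM, so this ordering would report it as UTF-32LE.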

OT: Definitions of Unicode

2002-04-11 Thread Mark Davis
I thought some of the choices in the following were amusing: http://m-w.com/cgi-bin/dictionary/?va=Unicode Mark — Γνῶθι σαυτόν — Θαλῆς [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] http://www.macchiato.com

Re: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread Mark Davis
It is a pretty good assumption; but if BOMs are used on smaller fields the probability goes up. And to be perfectly reliable, you can't assume it. That is one reason that the WORD JOINER was encoded, so that eventually we can use FEFF solely as a BOM. Mark — Γνῶθι σαυτόν — Θαλῆς [For transl

Re: MS/Unix BOM FAQ again (small fix)

2002-04-11 Thread Otto Stolz
Doug Ewell wrote: > As Shlomi points out, Microsoft products do not treat UTF-7 > specially, except that IE recognizes the UTF-7 BOM and sets its encoding > accordingly (but this is true for any UTF-7 sequence, not just the BOM; > try loading a text file containing only the 11 ASCII characters >

Re[2]: Discrepancy in ch03.pdf?

2002-04-11 Thread Anton Tagunov
Hello, Doug! I) AT> http://www.unicode.org/unicode/uni2book/ch03.pdf AT> 1. AT> - A single abstract character may correspond to more than one code AT> value - for example, U+00C5 ... LATIN CAPITAL LETTER A WITH RING and U+212B ... ANGSTROM SIGN 2. AT> - Multiple code values may be