Re: How many printable characters in 3.2.0?

2002-04-23 Thread Doug Ewell
Mark Davis <[EMAIL PROTECTED]> wrote: > Perhaps what is meant by the original request would be satisfied by > the property: > > Default_Ignorable_Code_Point > > defined in > http://www.unicode.org/Public/3.2-Update/DerivedCoreProperties-3.2.0.txt > > These are essentially characters that have no

Re: browsers and unicode surrogates

2002-04-23 Thread Martin Duerst
Just a very small correction: At 07:19 02/04/22 -0400, James H. Cloos Jr. wrote: >There are other ways as well. Apache will already (if you use the >default configs) add the Content-Language header if you use a filename >like foo.en.html. You could have it also add the charset via a >similar m

Re: browsers and unicode surrogates

2002-04-23 Thread Martin Duerst
At 22:25 02/04/19 +0100, Steffen Kamp wrote: >However, when giving the validator an ASCII-only document with a META tag >specifying UTF-16 as encoding (just for testing) it says that it does not >yet support this encoding, so I don't fully trust the validator in this case. The validator indeed doe

UTR #18: Unicode Regular Expression Guidelines

2002-04-23 Thread Mark Davis
There is a new version of UTR #18 that rolls in the latest changes from the UTC plus updates for Unicode 3.2. See http://www.unicode.org/reports/tr18/ Mark

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-23 Thread Mark Davis
You say: > Lemme see, that's 0x4B 0x00 0x65 0x00 0x6E 0x00. > > There's no BOM, and no external tagging as "UTF-16LE," and since this is > the Internet, we don't know the endianness of the originating machine. > > So, based on last week's discussion between Ken, Mark Davis, and me, I > am *requir
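[Editor's illustration, not part of the original message.] The byte sequence quoted above can be decoded under both byte orders to see why the missing BOM matters. This is a minimal Python sketch; the variable names are made up for the example, and the only input is the six bytes quoted in the message.

```python
# Byte sequence quoted in the message above.
raw = bytes([0x4B, 0x00, 0x65, 0x00, 0x6E, 0x00])

# Explicit little-endian decoding: each pair is (low byte, high byte).
as_le = raw.decode("utf-16-le")

# Explicit big-endian decoding of the same bytes, for comparison.
as_be = raw.decode("utf-16-be")

print(repr(as_le))                            # 'Ken'
print([f"U+{ord(ch):04X}" for ch in as_be])   # ['U+4B00', 'U+6500', 'U+6E00']
```

Read as little-endian, the bytes spell "Ken"; read as big-endian, they are three unrelated CJK code points, which is the ambiguity the message is poking at.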

RE: browsers and unicode surrogates

2002-04-23 Thread Yves Arrouye
> | I am surprised by the "must only be used". It seems I am not > | conforming by including a meta statement in the utf-16 HTML page. I > | should either remove the statement or encode the HTML up to and > | including that statement as ascii. I'll check on this. > > It doesn't make much sense to

Re: unidata is big

2002-04-23 Thread Mark Davis
One of the Dublin papers talks about how this is done in ICU: http://www.unicode.org/iuc/iuc21/a347.html Mark — Γνῶθι σαυτόν — Θαλῆς [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] http://www.macchiato.com - Original Message - From: "Geoffrey Waigh" <[EMAIL PR

Re: How many printable characters in 3.2.0?

2002-04-23 Thread Mark Davis
Perhaps what is meant by the original request would be satisfied by the property: Default_Ignorable_Code_Point defined in http://www.unicode.org/Public/3.2-Update/DerivedCoreProperties-3.2.0.txt These are essentially characters that have no visible glyphs and no advance width, but may have a d

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-23 Thread Florian Weimer
[EMAIL PROTECTED] writes: > FYI: http://linguistlist.org/issues/13/13-1106.html#3 And I thought the Unicode bomber was %u9090%u6858%ucbd3... guy!

Re: OT Korean spam

2002-04-23 Thread David Starner
On Mon, Apr 22, 2002 at 11:34:47PM -0700, Curtis Clark wrote: > Somehow I got on a Korean spam list a while back, and I get between 10 and > 20 emails a day in euc-kr. The majority have subject lines that start with > U+AD11 U+ACE0. If it's not obscene, could someone tell me what that means? I

Re: How many printable characters in 3.2.0?

2002-04-23 Thread James E. Agenbroad
On Mon, 22 Apr 2002, Doug Ewell wrote: > Zsigri Gyula <[EMAIL PROTECTED]> wrote: > > > How many printable characters are there in Unicode 3.2.0? I tried > > desperately to find the answer at the Unicode web site but could > > not. > > There are 95,156 total assigned characters. > > To find the

RE: Variant locales?

2002-04-23 Thread Suzanne M. Topping
> -Original Message- > From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]] > > Plus there is a disdain for corporate interest (or there has > been in many > past postings) since they want something independent? The group does not disdain corporate interest, although some individu

OT Korean spam

2002-04-23 Thread Curtis Clark
Somehow I got on a Korean spam list a while back, and I get between 10 and 20 emails a day in euc-kr. The majority have subject lines that start with U+AD11 U+ACE0. If it's not obscene, could someone tell me what that means? (Thanks to SC Unipad, I can see the Hangul, although I don't read Kore
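[Editor's illustration, not part of the original message.] For readers without a Hangul font, the two code points named above can be inspected with Python's standard `unicodedata` module. This is only a sketch; the variable names are hypothetical, and the EUC-KR bytes are computed by Python's `euc_kr` codec rather than taken from any actual spam message.

```python
import unicodedata

# The two code points cited in the message: U+AD11 U+ACE0.
subject_prefix = "\uAD11\uACE0"

# Character names from the Unicode Character Database.
names = [unicodedata.name(ch) for ch in subject_prefix]

# The same two syllables as they would appear on the wire in EUC-KR.
euc_kr_bytes = subject_prefix.encode("euc_kr")

print(names)               # ['HANGUL SYLLABLE GWANG', 'HANGUL SYLLABLE GO']
print(euc_kr_bytes.hex())  # b1a4b0ed
```

The two syllables together read "gwanggo", the Korean word for "advertisement".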