traditional vs simplified chinese

2003-02-13 Thread Paul Hastings
i suppose this is a really simple minded question but is there any way of telling if an incoming chunk of text (say from a browser form) is traditional or simplified chinese? thanks. Paul Hastings [EMAIL PROTECTED] Director

Re: traditional vs simplified chinese

2003-02-13 Thread Zhang Weiwu
Zhang Weiwu from Xiamen China - Original Message - From: "Paul Hastings" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, February 13, 2003 7:35 PM Subject: traditional vs simplified chinese > i suppose this is a really simple minded question but

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
Paul Hastings wrote: > i suppose this is a really simple minded question but is > there any way of telling if an incoming chunk of text > (say from a browser form) is traditional or simplified > chinese? Please notice that the classification you want is not always meaningful. E.g., what if the in

Re: traditional vs simplified chinese

2003-02-13 Thread Zhang Weiwu
- Original Message - From: "Paul Hastings" <[EMAIL PROTECTED]> To: "Zhang Weiwu" <[EMAIL PROTECTED]> Sent: Thursday, February 13, 2003 9:16 PM Subject: Re: traditional vs simplified chinese > >meaning "for" (wei in Mandarin pinyin) is th

Re: traditional vs simplified chinese

2003-02-13 Thread John H. Jenkins
On Thursday, February 13, 2003, at 07:18 AM, Marco Cimarosti wrote: > 3) All other characters listed in Unihan.txt are *both* "Traditional" and "Simplified". Actually, this is not quite true. Even though the current set of traditional/simplified data is much better than it's ever been, we
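The Unihan-based classification discussed here can be sketched roughly as follows. This is only an illustration, not the method from the thread: it assumes the tab-separated layout of the Unihan variant data (code point, field name, value) and the `kSimplifiedVariant` / `kTraditionalVariant` fields. The rule "has only a simplified variant, so it is a traditional form" (and the reverse) is an assumed reading, since only point 3 of the original list is visible above. `SAMPLE`, `parse_variants` and `classify_char` are hypothetical names. As the reply notes, the underlying data is not perfect, so the result is a hint at best.

```python
import io

# Illustrative excerpt in the Unihan variant layout: code point, field, value.
SAMPLE = """\
# comment lines start with '#'
U+500B\tkSimplifiedVariant\tU+4E2A
U+4E2A\tkTraditionalVariant\tU+500B
U+9019\tkSimplifiedVariant\tU+8FD9
U+8FD9\tkTraditionalVariant\tU+9019
"""

def parse_variants(lines):
    """Return (simp_of, trad_of): char -> set of variant chars."""
    simp_of, trad_of = {}, {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        cp, field, value = parts
        if field not in ("kSimplifiedVariant", "kTraditionalVariant"):
            continue
        source = chr(int(cp[2:], 16))
        # Values are space-separated "U+XXXX" tokens; drop any "<source" suffix.
        targets = {chr(int(v.split("<")[0][2:], 16)) for v in value.split()}
        table = simp_of if field == "kSimplifiedVariant" else trad_of
        table.setdefault(source, set()).update(targets)
    return simp_of, trad_of

def classify_char(ch, simp_of, trad_of):
    """Assumed rule: a char with only a simplified variant is a traditional form."""
    has_simp, has_trad = ch in simp_of, ch in trad_of
    if has_simp and has_trad:
        return "both"
    if has_simp:
        return "traditional"
    if has_trad:
        return "simplified"
    return "neutral"  # no variant data: usable in either kind of text

simp_of, trad_of = parse_variants(io.StringIO(SAMPLE))
```

With real data, `parse_variants` would be fed the opened variant file instead of the `SAMPLE` string.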

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
logically correct documents" that could contain both characters: - a bibliography containing books published in Mainland China and in Taiwan; - an article about the Chinese writing system; - the table of traditional vs. simplified Chinese characters; - discussions on a Chinese newsgroup or ma

Re: traditional vs simplified chinese

2003-02-13 Thread Edward H Trager
Hi, Paul, On Thu, 13 Feb 2003, Zhang Weiwu wrote: > - Original Message - > From: "Paul Hastings" <[EMAIL PROTECTED]> > To: "Zhang Weiwu" <[EMAIL PROTECTED]> > Sent: Thursday, February 13, 2003 9:16 PM > Subject: Re: traditional vs simplifie

Re: traditional vs simplified chinese

2003-02-13 Thread Paul Hastings
> So I think Zhang Weiwu is suggesting a heuristic algorithm for > discriminating a unicode text which is already known, or assumed to be, in > Chinese. well the site will deliver chinese content w/doublechecking browser locale, etc. so yes, most likely chinese users. >to encounter at least o

Re: traditional vs simplified chinese

2003-02-13 Thread Paul Hastings
> Please notice that the classification you want is not always meaningful. > E.g., what if the incoming text is in Spanish? Would you classify it as > traditional or simplified Chinese?... as spanish i guess. the website will deliver chinese content & with some browser locale checking should be ok

Re: traditional vs simplified chinese

2003-02-13 Thread Andrew C. West
On Thu, 13 Feb 2003 09:48:45 -0800 (PST), "Zhang Weiwu" wrote: > Take it easy, if you find one 500B (the measure word) it is usually enough to > say it is traditional Chinese, one 4E2A (measure word) is in simplified > Chinese. They never happen together in a logically correct document. Marco i

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
Edward H Trager wrote: > [...] > If I were going to write such an algorithm, I would: > > * First, ensure that the incoming text stream to be classified was > sufficiently long to be probabilistically classifiable. In other > words, what's the shortest stream of Hanzi characters needed, on
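The first step quoted above (check that there is enough Hanzi to classify at all) might look like the sketch below. The threshold `MIN_HANZI` is a made-up placeholder, since the quoted message asks what the right minimum is rather than stating one. The function names are hypothetical, and only the main CJK Unified Ideographs block and Extension A are counted.

```python
MIN_HANZI = 20  # hypothetical threshold; the thread does not give a number

def count_hanzi(text):
    """Count characters in CJK Unified Ideographs (U+4E00..U+9FFF)
    or CJK Extension A (U+3400..U+4DBF)."""
    return sum(
        1
        for ch in text
        if "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"
    )

def long_enough(text, minimum=MIN_HANZI):
    """True if the text holds at least `minimum` Han characters."""
    return count_hanzi(text) >= minimum
```

A short search phrase from a browser form will often fail this check, which is one reason a simple marker-based guess may be the only practical option for such input.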

RE: traditional vs simplified chinese

2003-02-13 Thread Rick Cameron
Paul wrote: > To: Edward H Trager > > Marco Cimarosti has questioned, why do you need to classify > > text as being simplified or traditional? > > if i understand their needs correctly, it's to implement a > search system with search phrases o

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
Paul wrote: > To: Edward H Trager > > Marco Cimarosti has questioned, why do you need to classify > > text as being simplified or traditional? > > if i understand their needs correctly, it's to implement a > search system with search phrases of either "type" of > chinese--content would be in bot

Re: traditional vs simplified chinese

2003-02-13 Thread Edward H Trager
On Thu, 13 Feb 2003, Andrew C. West wrote: > On Thu, 13 Feb 2003 09:48:45 -0800 (PST), "Zhang Weiwu" wrote: > > > Take it easy, if you find one 500B (the measure word) it is usually enough to > > say it is traditional Chinese, one 4E2A (measure word) is in simplified > > Chinese. They never ha

Re: traditional vs simplified chinese

2003-02-13 Thread Edward H Trager
On Fri, 14 Feb 2003, Paul Hastings wrote: > > So I think Zhang Weiwu is suggesting a heuristic algorithm for > > discriminating a unicode text which is already known, or assumed to be, in > > Chinese. > > well the site will deliver chinese content w/doublechecking browser locale, > etc. so yes, m

RE: traditional vs simplified chinese

2003-02-13 Thread Edward H Trager
On Thu, 13 Feb 2003, Rick Cameron wrote: > The Win32 API includes a function that can do this folding, on Windows > NT/2000/XP: LCMapString, with the option LCMAP_SIMPLIFIED_CHINESE or > LCMAP_TRADITIONAL_CHINESE. > > I know little about Chinese, but I have the impression that it is much more >
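The folding idea (map both the query and the content to one form before comparing) can be illustrated without the Win32 call. The sketch below does not use `LCMapString`. It relies on Python's `str.translate` with a tiny hand-picked table, and `FOLD_TO_SIMPLIFIED`, `fold` and `matches` are hypothetical names. A real table would need thousands of entries. The last two rows show the many-to-one case mentioned in the quoted message.

```python
# Tiny illustrative table; a real one would be far larger.
FOLD_TO_SIMPLIFIED = str.maketrans({
    "\u500b": "\u4e2a",  # 個 -> 个
    "\u9019": "\u8fd9",  # 這 -> 这
    "\u70ba": "\u4e3a",  # 為 -> 为
    "\u767c": "\u53d1",  # 發 -> 发 (many-to-one ...)
    "\u9aee": "\u53d1",  # 髮 -> 发 (... both fold to the same character)
})

def fold(text):
    """Fold known traditional characters to their simplified forms."""
    return text.translate(FOLD_TO_SIMPLIFIED)

def matches(query, content):
    """Substring search that ignores the traditional/simplified distinction."""
    return fold(query) in fold(content)
```

Folding in this direction loses information: after folding, a search for 發 also finds 髮. That is the price of the many-to-one mapping.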

RE: traditional vs simplified chinese

2003-02-13 Thread Rick Cameron
> -Original Message- > From: Edward H Trager [mailto:[EMAIL PROTECTED]] > > On Thu, 13 Feb 2003, Rick Cameron wrote: > > > The Win32 API includes a function that can do this folding, > on Windows > > NT/2000/XP: LCMapString, with the option > LCMAP_SIMPLIFIED_CHINESE or > > LCMAP_TRAD

Re: traditional vs simplified chinese

2003-02-13 Thread David Oftedal
I say live with it. This happens in Japanese as well, and it gets even worse when searching in romazi (European letters), because there are so many different ways of spelling things, and all the Chinese loanwords mean and sound exactly the same. But when the whole point of the system is to s

Re: traditional vs simplified chinese

2003-02-13 Thread Zhang Weiwu
"Andrew C. West" <[EMAIL PROTECTED]> wrote on Friday, February 14, 2003 2:29 AM Subject: Re: traditional vs simplified chinese > On Thu, 13 Feb 2003 09:48:45 -0800 (PST), "Zhang Weiwu" wrote: > > > Take it easy, if you find one 500B (the measure word) it i

RE: traditional vs simplified chinese

2003-02-14 Thread jarkko.hietaniemi
> > I know little about Chinese, but I have the impression that it is much more > > common for several traditional characters to correspond to one simplified > > character than vice versa. If that's true, it seems to me that it would make > > most sense to fold to simplified. > > Hmmm ... Suppose I

Re: traditional vs simplified chinese

2003-02-14 Thread Andrew C. West
On Fri, 14 Feb 2003 01:23:42 -0800 (PST), "Zhang Weiwu" wrote: > I never saw 500B and 4E2A in the same printed document as I lived in China for > 20 years. (Well, need to remove the years I cannot read:) Unless you have an > obvious reason to do so, to print a book with Traditional characters is >

Re: traditional vs simplified chinese

2003-02-14 Thread John Cowan
Andrew C. West scripsit: > Interestingly, the dictionary quotes Zheng Xuan, writing in the 2nd century > A.D., as stating that U+4E2A (the modern "simplified" form) is the correct form > of the character, and that U+500B (the modern "traditional" form) is a vulgar > substitute ! IIRC this is true

Re: traditional vs simplified chinese

2003-02-14 Thread Thomas Chan
On Thu, 13 Feb 2003, Zhang Weiwu wrote: >Take it easy, if you find one 500B (the measure word) it is usually enough to >say it is traditional Chinese, one 4E2A (measure word) is in simplified >Chinese. They never happen together in a logically correct document. Others have already given examples

Re: traditional vs simplified chinese

2003-02-14 Thread Andrew C. West
On Fri, 14 Feb 2003 07:45:44 -0800 (PST), Thomas Chan wrote: > I think zhe4 'this' (simp U+8FD9 / trad U+9019) might be better for a very > simple heuristic for modern text, since it occupies position #11 in at > least one frequency list (compared to #15 for the above-cited ge4), and as > far as I
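The very simple heuristic under discussion, using the ge4 pair (U+500B / U+4E2A) and the zhe4 pair (U+9019 / U+8FD9) as markers, could be sketched like this. It assumes the input is already known to be Chinese. The names and the "mixed" / "unknown" outcomes are additions for illustration. As others in the thread point out, both forms can legitimately occur in one document, so the result is a guess and not a proof.

```python
from collections import Counter

TRADITIONAL_MARKERS = {"\u500b", "\u9019"}  # 個 (ge4), 這 (zhe4)
SIMPLIFIED_MARKERS = {"\u4e2a", "\u8fd9"}   # 个 (ge4), 这 (zhe4)

def guess_script(text):
    """Guess 'traditional', 'simplified', 'mixed' or 'unknown'
    from the presence of a few high-frequency marker characters."""
    counts = Counter(text)
    trad = sum(counts[c] for c in TRADITIONAL_MARKERS)
    simp = sum(counts[c] for c in SIMPLIFIED_MARKERS)
    if trad == 0 and simp == 0:
        return "unknown"   # no marker seen, e.g. text too short or not Chinese
    if trad > 0 and simp > 0:
        return "mixed"     # e.g. a bibliography or an article about the script
    return "traditional" if trad else "simplified"
```

Adding more high-frequency pairs to the two marker sets makes "unknown" less likely on short input, at the cost of maintaining a longer list.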

Re: traditional vs simplified chinese

2003-02-14 Thread John Cowan
Andrew C. West scripsit: > On a related matter, I was wondering about language tagging for Chinese. "zh-CN" > and "zh-TW" are used quite frequently, but what do they imply ? They are usually (mis)used to mean "Mandarin, simplified characters" and "Mandarin, traditional characters" respectively.

Re: traditional vs simplified chinese

2003-02-15 Thread David Oftedal
Haha, I just realized I stole my new sig from your page. Haha, neat! -- New Norwegian (Nynorsk) is essentially the speech of Norwegian peasants as mutilated by a schoolteacher with a poor understanding of Icelandic. --Halldór Laxness, via B. Philip Jonsson Swedish, Norwegian and Danish are actual